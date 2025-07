AI models struggle with subtle changes in medical ethics scenarios, raising concerns about their reliability in clinical decision-making.

✔ ✔ Trusted Source

Pitfalls of large language models in medical ethics reasoning



Go to source Trusted Source

Did You Know?

AI systems can still insist on outdated assumptions, even when directly told otherwise-such as assuming a male surgeon can’t be a child’s father. #medindia #ai #medicalbias’

AI systems can still insist on outdated assumptions, even when directly told otherwise-such as assuming a male surgeon can’t be a child’s father. #medindia #ai #medicalbias’

Advertisement

Patterned Thinking in Artificial Intelligence

Advertisement

Overlooked Nuance in Clinical Contexts

Persistent Bias in Ethical Judgments

Next Steps Toward AI Assurance

Pitfalls of large language models in medical ethics reasoning - (https://www.nature.com/articles/s41746-025-01792-y)

Even today’s most advanced artificial intelligence models can make surprisinglyaccording to findings from the Icahn School of Medicine at Mount Sinai, working with Rabin Medical Center in Israel and others. These results raise critical questions about how large language models like ChatGPT should be used in health care decision-making ().The findings, which raise important questions about how and when to rely on large language models (LLMs), such as ChatGPT, in health care settings, were reported in the issue ofThe research team was inspired by Daniel Kahneman’s book “Thinking, Fast and Slow,” whichIt has been observed that large language models (LLMs)Building on this insight, the study tested how well AI systems shift between these two modes when confronted with well-known ethical dilemmas that had been deliberately tweaked.“AI can be very powerful and efficient, but our study showed that it may default to the most familiar or intuitive answer, even when that response overlooks critical details,” says co-senior author Eyal Klang, M.D., Chief of Generative AI in the Windreich Department of Artificial Intelligence and Human Health at the Icahn School of Medicine at Mount Sinai.“In everyday situations, that kind of thinking might go unnoticed. But in health care, whereTo explore this tendency, the research team tested several commercially available LLMs using a combination ofIn one example, they adapted the classica widely cited 1970s puzzle that highlights implicit gender bias. In the original version, a boy is injured in a car accident with his father and rushed to the hospital, where the surgeon exclaims, “I can’t operate on this boy—he’s my son!”though many people don’t consider that possibility due to gender bias. In the researchers’ modified version, they explicitly stated that the boy’s father was the surgeon, removing the ambiguity. Even so, some AI models still responded that the surgeon must be the boy’s mother. TheIn another example to test whether LLMs rely on familiar patterns, the researchers drew from a classic ethical dilemma in which religious parents refuse a life-saving blood transfusion for their child. Even when the researchers altered the scenario to state that the parents had already consented,“Our findings don’t suggest that AI has no place in medical practice, but they do highlight the need forsays co-senior corresponding author Girish N. Nadkarni, M.D., MPH, Chair of the Windreich Department of Artificial Intelligence and Human Health, Director of the Hasso Plattner Institute for Digital Health, Irene and Dr. Arthur M. Fishberg Professor of Medicine at the Icahn School of Medicine at Mount Sinai, and Chief AI Officer of the Mount Sinai Health System.“Naturally, these tools can bePhysicians and patients alike should understand that AI is best used as a complement to enhance clinical expertise, not a substitute for it, particularlyUltimately, the goal is to build more reliable and ethically sound ways to integrate AI into patient care.”“Simple tweaks to familiar cases exposed blind spots that clinicians can’t afford,” says lead author Shelly Soffer, M.D., a Fellow at the Institute of Hematology, Davidoff Cancer Center, Rabin Medical Center. “It underscores why human oversight must stay central when we deploy AI in patient care.”Next, the research team plans to expand their work by testing a wider range of clinical examples. They’re also developing an “AI assurance lab” to systematically evaluate how well different models handle real-world medical complexity.Source-Eurekalert