Can AI help doctors avoid missed diagnoses? A new study suggests yes

In some of medicine’s toughest cases, the hardest part isn’t choosing the right diagnosis. It’s thinking of it at all. Artificial intelligence may now be better at that than doctors, a new study suggests.

“We’re witnessing a really profound change in technology that will reshape medicine,” Harvard University biomedical data scientist Arjun Manrai said in an April 28 news conference.

That change is driven by advances in large language models, the same technology OpenAI’s ChatGPT is built on. New versions, called reasoning models, can work through complex problems step by step. As of 2025, 1 in 5 doctors and nurses worldwide used AI for a second opinion on complex cases, and over half want to use it for this purpose, according to a survey of more than 2,000 clinicians. But how well the technology works in a medical setting has been debated.

Manrai and colleagues tested OpenAI’s o1-preview model on a range of medical cases, including classic sets of symptoms used in medical training as well as real-world data directly from the charts of 76 patients who visited an emergency room in Boston. Across those clinical reasoning tests, the AI model was more likely than physicians to include the correct diagnosis, or something very close to it, among its possible answers, the researchers report April 30 in Science.

Not all researchers are convinced that this means we should trust AI with our diagnoses, arguing that AI reasoning is still far from what human doctors can do. “When we say clinical reasoning, it doesn’t mean the same thing as moral reasoning,” says Arya Rao, a researcher at Harvard Medical School, who was not involved in the study. “These models have been optimized to do this kind of sequential thought that we call reasoning, but it’s not at all the same thing as how we teach medical students to reason.”

Manrai is not opposed to the critique, noting AI technology should assist rather than replace people in medical roles. “Ultimately, I think humans want humans to guide them … through challenging treatment decisions,” he said.

Is AI better at medical diagnoses?

An AI model outperforms doctors on identifying correct diagnoses

P. Brodeur et al/Science 2026

Researchers looked at three methods for diagnosing patient cases: AI models built on large language models (dark blue), specialized software for determining a diagnosis (light blue) and human clinicians (brown). The AI reasoning model o1-preview outperformed them all, including the correct diagnosis in its response almost 80 percent of the time. Some of these data came from prior studies, so not all of the systems were looking at the exact same cases. But all of the systems examined some subset of a long-running series of challenging real-world patient cases published in the New England Journal of Medicine.

Still, the results show that this type of AI “works for making diagnoses in the real world,” coauthor Adam Rodman, a doctor at Beth Israel Deaconess Medical Center in Boston, said at the news conference.

He described a patient who came into the emergency room with what seemed like routine respiratory symptoms. The patient had recently undergone an organ transplant, was immunosuppressed and turned out to have a dangerous flesh-eating infection requiring surgery. “The model actually was suspicious of this [infection] from the very beginning, probably 12 to 24 hours before the human physician would have become suspicious of this,” Rodman said.

Rao applauds the team for presenting [AI] “as an extension of a physician, not a replacement.” She calls the study “rigorous and thoughtful.” However, she does not think there’s enough evidence to say that AI models have aced clinical reasoning.

Her team released a study April 13 that tested 21 AI models at each step of the process toward reaching a diagnosis. Reasoning models got the highest scores overall. But when Rao’s team drilled down to identify which parts of the diagnostic process were trickiest for AI, the researchers found a weak point that persisted from the oldest models to the newest. That’s the process of considering several different uncertain diagnoses.

AI models based on LLMs tend to jump to conclusions. “Their reasoning is brittle precisely where uncertainty and nuance matter most,” Rao and her team wrote in their paper. Their conclusion was that LLMs are not yet ready to make decisions in medical settings.

These two studies evaluated different AI models in different ways. Yet, the results aren’t as opposed as they may seem on the surface, both teams say. They agree that the next step should be more research.

Manrai’s team is planning clinical trials to help answer the question: “How do we safely and thoughtfully integrate [AI] into care?” Rao likes that approach. So many people “don’t have enough access to care,” she says. Someday, she notes, “I think AI can be a great equalizer.”
