Reconciling how clinical reasoning is learned in the age of artificial intelligence
npj | digital medicine
Editorial
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-026-02873-2
Two recent studies examine the role
of AI in supporting medical students’
learning – a longitudinal study of self-reported outcomes from supervised
real-world AI use, and a simulated
experiment comparing the benefits of
AI explanations against the risks of
plausible misinformation. Collectively,
their findings demonstrate that the
effects of AI may depend on the
individual and educational
environment, which has implications
for how we design medical curricula
and define clinical competence.
Artificial intelligence (AI) is becoming
increasingly ubiquitous, both in daily
life and in healthcare settings. Narrow
AI health technologies are gradually
being introduced into clinical practice across
multiple specialties1,2. Since ChatGPT was
publicly released in November 2022, large language models (LLMs) have also captured the
collective imagination, resulting in ambient
scribes, specialized clinician-facing platforms
that ground LLM-generated responses in the
peer-reviewed literature (e.g. OpenEvidence),
and a growing ecosystem of innovative applications. Ease of access also means that some of
these tools are increasingly being used as personalized “pocket tutors” without supervision
or institutional scaffolding3,4.
Recent reports have highlighted the risks of
“deskilling” for experienced clinicians following
the introduction of AI, while cautioning against
the twin threats of “never skilling” and “mis-skilling” for the next generation5–7. This appears
superficially analogous to previous moral panics
about new technologies, such as the introduction
of calculators in schools in the 1960s, which drove
fears about destroying students’ foundational
mathematical skills and inducing intellectual
laziness. In practice, these fears proved to be
unfounded, with educational systems adapting to
preserve students’ core competencies while
shifting the emphasis towards higher-order reasoning. However, there are concerns that this
analogy is imperfect rather than reassuring,
npj Digital Medicine | (2026)9:435
because of how contemporary AI systems are
increasingly encroaching on the domains of
judgement, interpretation, and communication
that are central to clinical expertise. Reconciling
these competing viewpoints will become
increasingly essential as the pace of AI development and clinical integration continues to accelerate. This is particularly vital in healthcare
settings, where the consequences of impaired
clinical reasoning extend far beyond the learner.
AI as a cognitive scaffold versus the
plausibility trap
In this editorial, we discuss two distinct but
complementary papers that add to this debate.
One introduces AI as a useful “cognitive scaffold”,
while the other cautions against the “plausibility
trap” induced by convincing misinformation.
While their objectives differ, both share the same
central question: does engaging with AI improve
or impair how medical students learn clinical
reasoning?
Xin et al. present a longitudinal survey-based study of 372 senior medical students
across 12 months of clinical rotations, in
which they utilized AI tools integrated into the
clinical workflow to support diagnostic
decision-making under the supervision of
senior clinicians8. These tools included both
computer-aided tools for radiological image
interpretation and electronic health
record-based clinical decision support systems. The authors studied temporal associations between engagement in AI-assisted
diagnosis, AI literacy, and critical thinking
in medical contexts (measured by self-reported surveys) using cross-lagged panel
models, a statistical method for inferring
reciprocal causal effects between variables
over time9,10.
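To make the idea of a cross-lagged analysis concrete, the following is a minimal two-wave sketch, not the model fitted by Xin et al. It assumes simulated data and hypothetical variable names (`x` for AI engagement, `y` for AI literacy), and it estimates each path with a separate standardized regression, whereas published cross-lagged panel models are usually fitted jointly in a structural equation framework over several waves.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def standardized_paths(pred_a, pred_b, outcome):
    """Standardized coefficients of `outcome` regressed on two predictors."""
    r_ab = pearson(pred_a, pred_b)
    r_ao = pearson(pred_a, outcome)
    r_bo = pearson(pred_b, outcome)
    denom = 1.0 - r_ab ** 2
    return (r_ao - r_bo * r_ab) / denom, (r_bo - r_ao * r_ab) / denom


def simulate_two_waves(n=2000, seed=0):
    """Hypothetical data: wave-1 engagement (x1) raises wave-2 literacy (y2),
    while wave-1 literacy (y1) has no effect on wave-2 engagement (x2)."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    y1 = [0.3 * x + rng.gauss(0, 1) for x in x1]
    x2 = [0.5 * x + rng.gauss(0, 0.5) for x in x1]
    y2 = [0.3 * x + 0.4 * y + rng.gauss(0, 0.5) for x, y in zip(x1, y1)]
    return x1, y1, x2, y2


x1, y1, x2, y2 = simulate_two_waves()
x1_to_x2, y1_to_x2 = standardized_paths(x1, y1, x2)
x1_to_y2, y1_to_y2 = standardized_paths(x1, y1, y2)

paths = {
    "x1_to_x2": x1_to_x2,  # autoregressive path (stability of engagement)
    "y1_to_y2": y1_to_y2,  # autoregressive path (stability of literacy)
    "x1_to_y2": x1_to_y2,  # cross-lagged path: engagement -> later literacy
    "y1_to_x2": y1_to_x2,  # cross-lagged path: literacy -> later engagement
}
```

In this simulated setting the engagement-to-literacy path should be clearly positive and the reverse path close to zero, which is the kind of asymmetry a cross-lagged design is intended to detect. Such a pattern supports temporal ordering but does not by itself rule out unmeasured confounding.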
Teng et al. conducted a randomized controlled trial of 111 pre-clinical students, with the
aim of simulating exposure to plausible-sounding errors to evaluate students’ susceptibility to LLM-generated misinformation11.
Diagnostic performance on 25 USMLE clinical
vignette multiple choice questions was tested
across three study arms – control (no LLM
assistance), reliable LLM assistance, and
authoritative-but-flawed LLM assistance.
Confidence calibration (the alignment between
students’ confidence in their performance and
their actual performance) was also measured.
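As an illustration of what confidence calibration captures, the sketch below computes two common summaries from per-question confidence ratings and correctness: the gap between mean confidence and accuracy, and the Brier score. The function name, the 0 to 1 confidence scale, and the example data are assumptions made for illustration; they are not necessarily the measures used by Teng et al.

```python
def calibration_summary(confidences, correct):
    """Summarize calibration from per-question confidence (0 to 1)
    and correctness (1 = correct, 0 = incorrect)."""
    n = len(confidences)
    mean_confidence = sum(confidences) / n
    accuracy = sum(correct) / n
    # Brier score: mean squared gap between confidence and outcome (lower is better).
    brier = sum((c - o) ** 2 for c, o in zip(confidences, correct)) / n
    return {
        "mean_confidence": mean_confidence,
        "accuracy": accuracy,
        "overconfidence": mean_confidence - accuracy,
        "brier": brier,
    }


# Hypothetical students with identical accuracy on four questions.
answers = [1, 0, 1, 0]
uniformly_confident = calibration_summary([0.9, 0.9, 0.9, 0.9], answers)
discriminating = calibration_summary([0.9, 0.3, 0.8, 0.2], answers)
```

Both hypothetical students answer half of the questions correctly, but the first is equally confident whether right or wrong, so the overconfidence gap and Brier score are larger than for the second, whose confidence tracks correctness.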
Xin et al. found that higher engagement with
AI assistance was prospectively associated with
higher AI literacy, which was in turn associated
with higher critical thinking scores8. In contrast,
Teng et al. reported that correct AI explanations,
despite being rigorously verified and medically
accurate, produced no improvement, and that
misleading but plausible explanations significantly degraded diagnostic accuracy relative
to the control group11. In addition, while students’
confidence rose with LLM assistance, the group
receiving misleading explanations tended to be
confident whether they were correct or not.
Why the findings may not be
contradictory
When read in isolation, these studies might
appear somewhat contradictory. One found that
AI engagement supports the development of
cognitive skills to critically appraise AI outputs
over time. The other showed that accurate AI
outputs did not improve diagnostic accuracy, and
that misinformation undermines both diagnostic
reasoning and internal uncertainty signals that
prompt the student to seek advice.
The difference may lie in the conditions of engagement. Supervision is the most
obvious point of reconciliation – perhaps AI was
beneficial in the first study because of supervision from experienced clinicians, whereas no
supervision occurred in the second? In the
context of AI use, supervision has been theorised
to support the development of core reasoning
skills rather than black-and-white thinking12.
However, the benefits accounted for only 38% of
the total longitudinal association between AI
engagement, AI literacy, and critical thinking in
this study, and were attenuated for students with
limited AI experience or those who lacked
performance-oriented goals, suggesting that
while supervision provides the scaffolding
within which active interrogation of AI outputs
can occur, this does not suffice in itself.
Mode of engagement might offer a more
nuanced explanation. The first study provided a
safe framework wherein students could learn to
actively work through cases with AI support
under supervision as part of a real-world diagnostic workflow. In contrast, the second reflects
more artificial test-taking scenarios. The finding by Teng et al.
that even expert-verified LLM outputs did not
provide any benefit is consistent with recent
experimental evidence – learners risk developing
shallower knowledge when presented with fluent
pre-digested answers which preclude active
l (...truncated)