Reconciling how clinical reasoning is learned in the age of artificial intelligence

npj Digital Medicine | (2026)9:435 | Editorial
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-026-02873-2

Two recent studies examine the role of AI in supporting medical students’ learning: a longitudinal study of self-reported outcomes from supervised real-world AI use, and a simulated experiment comparing the benefits of AI explanations against the risks of plausible misinformation. Collectively, their findings suggest that the effects of AI may depend on the individual learner and the educational environment, which has implications for how we design medical curricula and define clinical competence.

Artificial intelligence (AI) is becoming increasingly ubiquitous, both in daily life and in healthcare settings. Narrow AI health technologies are gradually being introduced into clinical practice across multiple specialties[1,2]. Since ChatGPT was publicly released in November 2022, large language models (LLMs) have also captured the collective imagination, giving rise to ambient scribes, specialized clinician-facing platforms that ground LLM-generated responses in the peer-reviewed literature (e.g. OpenEvidence), and a growing ecosystem of other applications. Ease of access also means that some of these tools are increasingly being used as personalized “pocket tutors” without supervision or institutional scaffolding[3,4].

Recent reports have highlighted the risk of “deskilling” among experienced clinicians following the introduction of AI, while cautioning against the twin threats of “never-skilling” and “mis-skilling” for the next generation[5–7]. This appears superficially analogous to previous moral panics about new technologies, such as the introduction of calculators in schools in the 1960s, which prompted fears that they would erode students’ foundational mathematical skills and induce intellectual laziness. In practice, these fears proved unfounded: educational systems adapted to preserve students’ core competencies while shifting the emphasis towards higher-order reasoning. However, there are concerns that this analogy is imperfect rather than reassuring, because contemporary AI systems are increasingly encroaching on the domains of judgement, interpretation, and communication that are central to clinical expertise. Reconciling these competing viewpoints will become increasingly important as the pace of AI development and clinical integration continues to accelerate. This is particularly vital in healthcare, where the consequences of impaired clinical reasoning extend far beyond the learner.

AI as a cognitive scaffold versus the plausibility trap

In this editorial, we discuss two distinct but complementary papers that add to this debate. One presents AI as a useful “cognitive scaffold”, while the other cautions against the “plausibility trap” created by convincing misinformation. Although their objectives differ, both address the same central question: does engaging with AI improve or impair how medical students learn clinical reasoning?

Xin et al. present a longitudinal survey-based study of 372 senior medical students across 12 months of clinical rotations, during which the students used AI tools integrated into the clinical workflow to support diagnostic decision-making under the supervision of senior clinicians[8]. These included computer-aided tools for radiological image interpretation and electronic health record-based clinical decision support systems. The authors examined temporal associations between engagement in AI-assisted diagnosis, AI literacy, and critical thinking in medical contexts (all measured by self-reported surveys) using cross-lagged panel models, a statistical method for inferring reciprocal effects between variables over time[9,10].

Teng et al.
conducted a randomized controlled trial of 111 pre-clinical students that simulated exposure to plausible-sounding errors in order to evaluate their susceptibility to LLM-generated misinformation[11]. Diagnostic performance on 25 USMLE clinical vignette multiple-choice questions was tested across three study arms: control (no LLM assistance), reliable LLM assistance, and authoritative-but-flawed LLM assistance. Confidence calibration (the alignment between students’ confidence in their performance and their actual performance) was also measured.

Xin et al. found that higher engagement with AI assistance was prospectively associated with higher AI literacy, which was in turn associated with higher critical thinking scores[8]. In contrast, Teng et al. reported that correct AI explanations, despite being rigorously verified and medically accurate, produced no improvement in diagnostic accuracy, and that misleading but plausible explanations significantly degraded diagnostic accuracy relative to the control group[11]. In addition, while students’ confidence rose with LLM assistance, the group receiving misleading explanations tended to be confident whether they were correct or not.

Why the findings may not be contradictory

When read in isolation, these studies might appear contradictory. One suggests that AI engagement supports the development of the cognitive skills needed to critically appraise AI outputs over time. The other showed that accurate AI outputs did not improve diagnostic accuracy, and that misinformation undermines both diagnostic reasoning and the internal uncertainty signals that prompt students to seek advice. The difference may lie in the conditions of engagement. Supervision is the most obvious point of reconciliation: was AI beneficial in the first study because students were supervised by experienced clinicians, whereas no supervision occurred in the second?
In the context of AI use, supervision has been theorised to support the development of core reasoning skills rather than black-and-white thinking[12]. However, in the first study the benefits accounted for only 38% of the total longitudinal association between AI engagement, AI literacy, and critical thinking, and were attenuated for students with limited AI experience or without performance-oriented goals. This suggests that while supervision provides the scaffolding within which active interrogation of AI outputs can occur, it is not sufficient in itself.

Mode of engagement might offer a more nuanced explanation. The first study provided a safe framework wherein students could learn to actively work through cases with AI support under supervision as part of a real-world diagnostic workflow. In contrast, the second reflects a more artificial test-taking scenario. The finding of Teng et al. that even expert-verified LLM outputs provided no benefit is consistent with recent experimental evidence: learners risk developing shallower knowledge when presented with fluent, pre-digested answers which preclude active l (...truncated)