Reconciling how clinical reasoning is learned in the age of artificial intelligence

npj Digital Medicine, Jun 2026

Ariel Yuhan Ong, Margaret Sui, Kyra L. Rosen, Joseph C. Kvedar


Editorial. Published in partnership with Seoul National University Bundang Hospital.
https://doi.org/10.1038/s41746-026-02873-2

Two recent studies examine the role of AI in supporting medical students’ learning – a longitudinal study of self-reported outcomes from supervised real-world AI use, and a simulated experiment comparing the benefits of AI explanations against the risks of plausible misinformation. Collectively, their findings demonstrate that the effects of AI may depend on the individual and educational environment, which has implications for how we design medical curricula and define clinical competence.

Artificial intelligence (AI) is becoming increasingly ubiquitous, both in daily life and in healthcare settings. Narrow AI health technologies are gradually being introduced into clinical practice across multiple specialties1,2. Since ChatGPT was publicly released in November 2022, large language models (LLMs) have also captured the collective imagination, resulting in ambient scribes, specialized clinician-facing platforms that ground LLM-generated responses in the peer-reviewed literature (e.g. OpenEvidence), and a growing ecosystem of innovative applications. Ease of access also means that some of these tools are increasingly being used as personalized “pocket tutors” without supervision or institutional scaffolding3,4. Recent reports have highlighted the risks of “deskilling” for experienced clinicians following the introduction of AI, while cautioning against the twin threats of “never skilling” and “misskilling” for the next generation5–7.

This appears superficially analogous to previous moral panics about new technologies, such as the introduction of calculators in schools in the 1960s, which drove fears about destroying students’ foundational mathematical skills and inducing intellectual laziness. In practice, these fears proved to be unfounded, with educational systems adapting to preserve students’ core competencies while shifting the emphasis towards higher-order reasoning. However, there are concerns that this analogy is imperfect rather than reassuring, because of how contemporary AI systems are increasingly encroaching on the domains of judgement, interpretation, and communication that are central to clinical expertise. Reconciling these competing viewpoints will become increasingly essential as the pace of AI development and clinical integration continues to accelerate. This is particularly vital in healthcare settings, where the consequences of impaired clinical reasoning extend far beyond the learner.

npj Digital Medicine | (2026)9:435

AI as a cognitive scaffold versus the plausibility trap

In this editorial, we discuss two distinct but complementary papers that add to this debate. One introduces AI as a useful “cognitive scaffold”, while the other cautions against the “plausibility trap” induced by convincing misinformation. While their objectives differ, both share the same central question: does engaging with AI improve or impair how medical students learn clinical reasoning?

Xin et al. present a longitudinal survey-based study of 372 senior medical students across 12 months of clinical rotations, in which they utilized AI tools integrated into the clinical workflow to support diagnostic decision-making under the supervision of senior clinicians8. These tools included both computer-aided tools for radiological image interpretation and electronic health record-based clinical decision support systems.
The authors studied temporal associations between engagement in AI-assisted diagnosis, AI literacy, and critical thinking in medical contexts (measured by self-reported surveys) using cross-lagged panel models, a statistical method for inferring reciprocal causal effects between variables over time9,10.

Teng et al. conducted a randomized controlled trial of 111 pre-clinical students, with the aim of simulating exposure to plausible-sounding errors to evaluate students’ susceptibility to LLM-generated misinformation11. Diagnostic performance on 25 USMLE clinical vignette multiple-choice questions was tested across three study arms – control (no LLM assistance), reliable LLM assistance, and authoritative-but-flawed LLM assistance. Confidence calibration (the alignment between students’ confidence in their performance and their actual performance) was also measured.

Xin et al. found that higher engagement with AI assistance was prospectively associated with higher AI literacy, which was in turn associated with higher critical thinking scores8. In contrast, Teng et al. reported that correct AI explanations, despite being rigorously verified and medically accurate, produced no improvement, and that misleading but plausible explanations significantly degraded diagnostic accuracy relative to the control group11. In addition, while students’ confidence rose with LLM assistance, the group receiving misleading explanations tended to be confident whether they were correct or not.

Why the findings may not be contradictory

When read in isolation, these studies might appear somewhat contradictory. One found that AI engagement supports the development of cognitive skills to critically appraise AI outputs over time. The other showed that accurate AI outputs did not improve diagnostic accuracy, and that misinformation undermines both diagnostic reasoning and internal uncertainty signals that prompt the student to seek advice.
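
To make the notion of confidence calibration concrete, the following is a minimal sketch only, and not the analysis used by Teng et al. The function name, the 0 to 1 confidence scale, the two summary measures, and the example responses are all hypothetical assumptions introduced for illustration. It contrasts overconfidence (mean confidence minus accuracy) with discrimination (how much more confident a learner is when right than when wrong); a learner who is "confident whether correct or not" would show discrimination close to zero.

```python
from statistics import mean


def calibration_summary(responses):
    """Summarise confidence calibration for one learner.

    `responses` is a list of (confidence, is_correct) pairs, where
    confidence is a self-rating between 0 and 1 (an assumed scale) and
    is_correct is a bool. All names here are illustrative.
    """
    if not responses:
        raise ValueError("at least one response is required")

    accuracy = mean(1.0 if ok else 0.0 for _, ok in responses)
    mean_confidence = mean(conf for conf, _ in responses)

    conf_when_correct = [conf for conf, ok in responses if ok]
    conf_when_wrong = [conf for conf, ok in responses if not ok]

    # Discrimination is undefined if every answer was right or every
    # answer was wrong, so report None in that case.
    if conf_when_correct and conf_when_wrong:
        discrimination = mean(conf_when_correct) - mean(conf_when_wrong)
    else:
        discrimination = None

    return {
        "accuracy": accuracy,
        "mean_confidence": mean_confidence,
        # Positive values mean confidence exceeds actual performance.
        "overconfidence": mean_confidence - accuracy,
        # Near zero means confidence does not track correctness.
        "discrimination": discrimination,
    }


if __name__ == "__main__":
    # Made-up data: both learners answer half the questions correctly.
    tracks_correctness = [(0.9, True), (0.8, True), (0.3, False), (0.4, False)]
    uniformly_confident = [(0.9, True), (0.9, False), (0.9, False), (0.9, True)]
    print(calibration_summary(tracks_correctness))
    print(calibration_summary(uniformly_confident))
```

In this made-up example both learners have the same accuracy, but only the first retains a confidence signal that distinguishes right from wrong answers, which is the kind of internal uncertainty signal that might prompt a student to seek advice.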

The difference may lie in the conditions of engagement. Supervision is the most obvious point of reconciliation – perhaps AI was beneficial in the first study because of supervision from experienced clinicians, whereas no supervision occurred in the second? In the context of AI use, supervision has been theorised to support the development of core reasoning skills rather than black-and-white thinking12. However, the benefits only accounted for 38% of the total longitudinal association between AI engagement, AI literacy, and critical thinking in this study, and were attenuated for students with limited AI experience or those who lacked performance-oriented goals. This suggests that while supervision provides the scaffolding within which active interrogation of AI outputs can occur, it does not suffice in itself.

Mode of engagement might offer a more nuanced explanation. The first study provided a safe framework wherein students could learn to actively work through cases with AI support under supervision as part of a real-world diagnostic workflow. In contrast, the second reflects more artificial test-taking scenarios. Their finding that even expert-verified LLM outputs did not provide any benefit is consistent with recent experimental evidence – learners risk developing shallower knowledge when presented with fluent pre-digested answers which preclude active l (...truncated)


This is a preview of a remote PDF: https://www.nature.com/articles/s41746-026-02873-2.pdf
Article home page: https://www.nature.com/articles/s41746-026-02873-2

Ariel Yuhan Ong, Margaret Sui, Kyra L. Rosen, Joseph C. Kvedar. Reconciling how clinical reasoning is learned in the age of artificial intelligence, npj Digital Medicine, 2026, DOI: 10.1038/s41746-026-02873-2