Human–AI collaboration for dysphagia rehabilitation from effectiveness to implementation complexity: a systematic review
npj | digital medicine
Article
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-026-02729-9
Wenwen Yang, Sufang Li, Yifei Du, Mengran Chen, Funa Yang, Fan Zhang, Ji Zhao, Yanqing Li & Xiaoxia Xu
Oropharyngeal dysphagia affects over half of neurological and oncological populations, yet
rehabilitation is constrained by a global therapist shortage that human–AI collaboration has not
demonstrably addressed. Here we report a systematic review of 31 studies (1012 participants;
PROSPERO: CRD420251115997) evaluating AI-augmented swallowing rehabilitation in adults with
oropharyngeal dysphagia, or in healthy volunteers testing systems designed for clinical application.
We synthesised findings by aetiology and collaboration mode, assessing risk of bias and certainty of
evidence (Grading of Recommendations, Assessment, Development and Evaluation, GRADE). AI-augmented interventions produce short-term gains in functional oral intake and physiological
measures (GRADE moderate/low certainty), but these effects attenuate within weeks of cessation, and
adherence declines sharply once clinician supervision is withdrawn. Analysis using the NASSS (nonadoption, abandonment, scale-up, spread and sustainability) framework
reveals a central paradox: the adopter domain—digital literacy, cognitive impairment, interface
usability—is the dominant implementation barrier (61.3% rated high), meaning the populations with
the greatest need face the steepest barriers to adoption. AI algorithm performance is rated at very low
certainty, with validation largely confined to healthy volunteers. These findings support advancement
to pragmatic trials for supervised post-stroke rehabilitation but underscore that evidence for other
aetiologies, unsupervised settings, and sustained outcomes remains insufficient.
Safe swallowing requires sub-second coordination of over thirty craniocervical muscle pairs1,2, yet its rehabilitation remains critically under-resourced worldwide. Oropharyngeal dysphagia affects up to 30% of
community-dwelling older adults and exceeds 50% in most neurological
and oncological populations studied, including stroke, Parkinson’s disease,
age-related frailty and head and neck cancer (HNC)3–8. Regardless of
aetiology, dysphagia independently raises the risk of aspiration pneumonia,
malnutrition, and death—post-stroke dysphagia alone confers a more than
fourfold increase in pneumonia risk9,10, and aspiration pneumonia remains
the leading cause of death in advanced Parkinson’s disease11,12. Because
swallowing dysfunction frequently persists or progresses beyond the acute
phase13,14, rehabilitation needs are chronic and escalating—accounting for
an estimated US$4.3–7.1 billion in excess dysphagia-related inpatient costs
per year in the United States15,16, a burden set to intensify as the global
population aged 60 and older approaches 2.1 billion by 205017. Intensive
swallowing rehabilitation promotes neuroplastic recovery and functional
improvement18,19, but delivering it at adequate intensity hinges on a specialist workforce whose numbers fall far short of global need—and the
deficit is widening.
In the United States, speech-language pathologist employment is
projected to grow by 15% between 2024 and 2034, well above the occupational average, yet still fall short of demand20. In low- and middle-income countries the deficit is orders of magnitude larger: the WHO estimates fewer than ten skilled rehabilitation practitioners per million population, and only 17% of low-income countries have even one
speech–language therapist per million21,22. The consequence is that a large
proportion of patients worldwide cannot access swallowing rehabilitation at
sufficient intensity. Technology may help close this gap—but only if it
augments, rather than supplants, the clinical expertise on which safe practice
depends.
The Affiliated Cancer Hospital of Zhengzhou University & Henan Cancer Hospital, Zhengzhou, China.
npj Digital Medicine | (2026)9:404
Fig. 1 | NASSS framework-based complexity assessment of implementation
barriers to human–AI collaborative dysphagia intervention. Barriers are mapped
onto seven NASSS domains: D1 (Condition), D2 (Technology), D3 (Value Proposition), D4a (Adopters: Patients), D4b (Adopters: Therapists), and D5–7 (Organisation, Wider System, and Embedding). Surrounding panels detail domain-specific
barriers synthesised from the included studies. Directional annotations between
domains indicate cross-domain cascading interactions. Solid arrows denote primary
influence pathways; dashed curved arrows denote cross-domain interactions; the
dashed boundary at the bottom indicates additional complexity amplification in
low- and middle-income country contexts.
Human–AI collaboration in rehabilitation embodies this principle,
positioning AI not as an autonomous decision-maker but as computational
support—real-time physiological monitoring, pattern recognition across
multiple signal streams, and individualised dosage adaptation—while
clinicians retain contextual judgement, oversight of aspiration risk, and the
therapeutic relationship23–25. We define a human–AI collaborative rehabilitation system as one integrating (i) at least one AI-enhanced component,
such as adaptive parameter adjustment, multi-parameter pattern recognition, personalised protocol generation, or algorithm-driven real-time
feedback and risk alerting; and (ii) at least one form of human clinical involvement, such as treatment plan formulation or approval, intervention
supervision, parameter adjustment, or exception management. This definition spans a spectrum of collaboration intensity, from continuous clinician oversight with AI-augmented assessment to semi-autonomous AI
operation under periodic clinical review. Swallowing lends itself to such
collaboration: it generates multimodal physiological signals—electromyographic activity, lingual pressure, laryngeal excursion, deglutition
acoustics26–28—that encode the rapid biomechanical sequences governing
airway protection and bolus transit. These sequences unfold on millisecond
timescales, beyond the reach of unaided clinical observation2 but amenable
to computational analysis. Preliminary validation indicates that systems
built on these signals achieve acceptable detection accuracy in specific
populations and improve training precision in controlled settings29–31.
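The two-part definition given above can be read as a simple eligibility rule: a system qualifies only if it has at least one AI-enhanced component and at least one form of human clinical involvement. The Python sketch below is an illustration of that rule under stated assumptions, not the review's actual screening procedure. The names `AIComponent`, `HumanInvolvement`, `SystemDescription` and `is_human_ai_collaborative` are hypothetical, and the enumerated members only mirror the examples listed in the definition, which introduces them with "such as" and so is not exhaustive.

```python
from dataclasses import dataclass
from enum import Enum, auto


class AIComponent(Enum):
    """Example AI-enhanced components from the definition (not exhaustive)."""
    ADAPTIVE_PARAMETER_ADJUSTMENT = auto()
    MULTI_PARAMETER_PATTERN_RECOGNITION = auto()
    PERSONALISED_PROTOCOL_GENERATION = auto()
    REAL_TIME_FEEDBACK_AND_RISK_ALERTING = auto()


class HumanInvolvement(Enum):
    """Example forms of human clinical involvement (not exhaustive)."""
    TREATMENT_PLAN_FORMULATION_OR_APPROVAL = auto()
    INTERVENTION_SUPERVISION = auto()
    PARAMETER_ADJUSTMENT = auto()
    EXCEPTION_MANAGEMENT = auto()


@dataclass(frozen=True)
class SystemDescription:
    """Hypothetical record of what a rehabilitation system includes."""
    name: str
    ai_components: frozenset = frozenset()
    human_involvement: frozenset = frozenset()


def is_human_ai_collaborative(system: SystemDescription) -> bool:
    """Apply the two-part rule: >=1 AI component AND >=1 human involvement."""
    return len(system.ai_components) >= 1 and len(system.human_involvement) >= 1


# Invented examples for illustration only; neither is a system from the review.
supervised = SystemDescription(
    name="hypothetical supervised biofeedback trainer",
    ai_components=frozenset({AIComponent.REAL_TIME_FEEDBACK_AND_RISK_ALERTING}),
    human_involvement=frozenset({HumanInvolvement.INTERVENTION_SUPERVISION}),
)
ai_only = SystemDescription(
    name="hypothetical fully autonomous app",
    ai_components=frozenset({AIComponent.ADAPTIVE_PARAMETER_ADJUSTMENT}),
)

print(is_human_ai_collaborative(supervised))  # True
print(is_human_ai_collaborative(ai_only))     # False
```

Because the rule checks only for the presence of each element, it deliberately does not capture collaboration intensity; the spectrum from continuous oversight to periodic review would need a separate descriptor.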
Whether these early capabilities translate into real-world clinical
benefit remains unclear. Systematic reviews show that specific swallowing
interventions yield favourable group-level effects on impairment4,32–34, yet a
Cochrane review—assessing functional endpoints—found no demonstrable
reduction in mortality or long-term disability, with substantial interindividual response heterogeneity that conventional clinic (...truncated)