A multi-center study on the adaptability of a shared foundation model for electronic health records

https://www.nature.com/articles/s41746-024-01166-w.pdf


npj Digital Medicine | Article
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-024-01166-w

Lin Lawrence Guo (1,7), Jason Fries (2,7), Ethan Steinberg (2), Scott Lanyon Fleming (2), Keith Morse (3), Catherine Aftandilian (4), Jose Posada (5), Nigam Shah (2,8) & Lillian Sung (1,6,8)

Foundation models are transforming artificial intelligence (AI) in healthcare by providing modular components adaptable for various downstream tasks, making AI development more scalable and cost-effective. Foundation models for structured electronic health records (EHR), trained on coded medical records from millions of patients, demonstrated benefits including increased performance with fewer training labels, and improved robustness to distribution shifts. However, questions remain on the feasibility of sharing these models across hospitals and their performance in local tasks. This multi-center study examined the adaptability of a publicly accessible structured EHR foundation model (FMSM), trained on 2.57 M patient records from Stanford Medicine. Experiments used EHR data from The Hospital for Sick Children (SickKids) and Medical Information Mart for Intensive Care (MIMIC-IV). We assessed both adaptability via continued pretraining on local data, and task adaptability compared to baselines of locally training models from scratch, including a local foundation model. Evaluations on 8 clinical prediction tasks showed that adapting the off-the-shelf FMSM matched the performance of gradient boosting machines (GBM) locally trained on all data while providing a 13% improvement in settings with few task-specific training labels.
Continued pretraining on local data showed FMSM required fewer than 1% of training examples to match the fully trained GBM’s performance, and was 60 to 90% more sample-efficient than training local foundation models from scratch. Our findings demonstrate that adapting EHR foundation models across hospitals provides improved prediction performance at less cost, underscoring the utility of base foundation models as modular components to streamline the development of healthcare AI.

1 Program in Child Health Evaluative Sciences, The Hospital for Sick Children, Toronto, ON, Canada.
2 Stanford Center for Biomedical Informatics Research, Stanford University, Palo Alto, CA, USA.
3 Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University, Palo Alto, CA, USA.
4 Division of Hematology/Oncology, Department of Pediatrics, Stanford University, Palo Alto, CA, USA.
5 Universidad del Norte, Barranquilla, Colombia.
6 Division of Haematology/Oncology, The Hospital for Sick Children, Toronto, ON, Canada.
7 These authors contributed equally: Lin Lawrence Guo, Jason Fries.
8 These authors jointly supervised this work: Nigam Shah, Lillian Sung.

npj Digital Medicine (2024) 7:171

Foundation models [1], large-scale artificial intelligence (AI) models trained on massive amounts of unlabeled data using self-supervised learning, mark a paradigm shift for healthcare AI by moving away from bespoke, single-purpose models to generalist and more easily adaptable medical AI [2]. Foundation models open new opportunities to improve diagnostic and predictive capabilities, enable proactive interventions and improve patient care using a range of modalities including natural language [3,4], imaging [5], genomics [6,7], and structured data from electronic health records (EHRs) [8–11].

Structured EHR foundation models, trained on tabular, timestamped event data (for example, procedures, diagnoses, medications, and lab values), offer distinct representational abilities over other modalities by focusing on encoding patients’ longitudinal medical history. This enables generating feature representations that summarize a patient’s entire medical history up to a specific time point, facilitating downstream tasks such as risk stratification and time-to-event modeling. Recent EHR foundation models report state-of-the-art accuracy, require fewer labeled examples for task adaptation, and have demonstrated improved robustness to distribution shifts across time and patient subpopulations [12,13].

With model hubs (centralized repositories for pretrained model weights) playing a key role in modern AI development, sharing EHR foundation models across sites offers many practical advantages by providing a less expensive route for local hospitals to adapt a foundation model for their specific needs. More importantly, key properties of foundation models, such as their skills, domain knowledge, and biases, are highly dependent on the specific data used for pretraining [14,15]. Since large-scale EHR datasets (>1 million patients) are challenging to obtain for most researchers, sharing EHR foundation model weights becomes critical to advancing research into mitigating biases, improving robustness, and other properties intrinsic to a specific set of pretrained model weights. Finally, given recent arguments for regulatory oversight and quality assurance of healthcare AI models by public-private entities [16], access to foundation model weights that have undergone some certification process may become a prerequisite for model deployment.

Adapting and improving existing foundation models (rather than pretraining from scratch) is the predominant workflow in domains such as NLP and computer vision. However, the absence of public structured EHR foundation models has hampered similar progress in EHR settings [17].
This creates challenges in advancing label/sample efficiency, few-shot learning, and general methods for improving EHR foundation models without access to the original pretraining data [18]. For example, work in other modalities has found that pretraining on large-scale, heterogeneous data generally improves robustness [19] and that continued pretraining of existing models using in-domain data further improves performance in a target domain [20]. This offers a promising route to improving existing EHR foundation models at local hospitals but introduces potential challenges around catastrophic forgetting and other issues that have been underexplored due to the lack of large-scale, shared EHR models. Although there is a growing body of work evaluating pretrained models across different hospital systems (GenHPF [21], TransformEHR [22]) and transfer from EHR data to insurance claims (Med-BERT [9]), prior studies have focused on private foundation models, pretrained from scratch, and the role architectural choices play in transfer learning performance in downstream task adaptation. There has been limited exploration of label efficiency in EHR settings, where encoder-only/BERT-style models perform poorly on few-shot tasks. For example, Med-BERT required an average of 200–1000 training examples per adapted task (...truncated)