Development and external validation of a pretrained deep learning model for the prediction of non-accidental trauma
www.nature.com/npjdigitalmed
David Huang1, Steven Cogill2, Renee Y. Hsia3, Samuel Yang4 and David Kim4✉
Non-accidental trauma (NAT) is deadly and difficult to predict. Transformer models pretrained on large datasets have recently
produced state-of-the-art performance on diverse prediction tasks, but the optimal pretraining strategies for diagnostic prediction
are not known. Here we report the development and external validation of Pretrained and Adapted BERT for Longitudinal
Outcomes (PABLO), a transformer-based deep learning model with multitask clinical pretraining, to identify patients who will
receive a diagnosis of NAT in the next year. We develop a clinical interface to visualize patient trajectories, model predictions, and
individual risk factors. In two comprehensive statewide databases, approximately 1% of patients experience NAT within one year of
prediction. PABLO predicts NAT events with an area under the receiver operating characteristic curve (AUROC) of 0.844 (95% CI
0.838–0.851) in the California test set and 0.849 (95% CI 0.846–0.851) on external validation in Florida, outperforming comparator
models. Multitask pretraining significantly improves model performance. Attribution analysis identifies substance use, psychiatric, and
injury diagnoses, in the context of age and racial demographics, as influential predictors of NAT. As a clinical decision support
system, PABLO can identify high-risk patients and patient-specific risk factors, which can be used to target secondary screening and
preventive interventions at the point of care.
npj Digital Medicine (2023)6:131 ; https://doi.org/10.1038/s41746-023-00875-y
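AUROC is the probability that a randomly chosen patient who goes on to receive an NAT diagnosis is assigned a higher risk score than a randomly chosen patient who does not. How the 95% confidence intervals above were computed is not specified here; the Python sketch below assumes a percentile bootstrap over resampled patients, which is one common choice and not necessarily the authors' method. The functions `auroc` and `bootstrap_ci` are illustrative names.

```python
import random
from typing import Sequence, Tuple

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive (label 1) outscores a random
    negative (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels: Sequence[int], scores: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap interval for AUROC, resampling patients with
    replacement. Assumes 0/1 labels with both classes present."""
    if set(labels) != {0, 1}:
        raise ValueError("labels must contain both 0 and 1")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if 0 < sum(y) < n:  # skip resamples that contain only one class
            stats.append(auroc(y, [scores[i] for i in idx]))
    stats.sort()
    k = int(round(alpha / 2 * n_boot))  # number of values trimmed from each tail
    return stats[k], stats[n_boot - 1 - k]
```

The pairwise comparison in `auroc` is quadratic in cohort size, so at statewide scale a rank-based formula would be preferable; the pairwise form is used here because it states the definition directly. With a prevalence near 1%, many resamples of a small cohort would contain few positives, which is one reason intervals are tight only for large cohorts.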
INTRODUCTION
Non-accidental trauma (NAT) comprises a heterogeneous set of
diagnoses, including many forms of assault, abuse, maltreatment,
and neglect. NAT is a leading cause of injury and death,
particularly for children and adolescents1,2, pregnant and postpartum women3,4, and disadvantaged social groups5,6. NAT is also
difficult to predict because of its heterogeneous presentation, complex
and rapidly changing epidemiology7, and concentration in
historically understudied populations8. Screening for NAT during
clinical encounters is not routine and may be associated with
discomfort for both providers and patients9. Consequently, the effects of
routine screening for NAT have not been well demonstrated,
except in special populations10.
An automated screening algorithm using existing electronic
health record (EHR) data could enable universal screening to identify
high-risk patients without requiring additional clinical resources or
imposing a social burden. Few studies have
evaluated the predictability of NAT. Most research has been
restricted to identifying broad risk factors, such as psychiatric
illness and prior involvement in violence11,12. Prior research has
also associated NAT with specific clinical contexts, such as fracture
patterns in children that are suggestive of NAT13. One previous
modeling study evaluated a Bayesian classifier for predicting a
variety of NAT diagnoses14. This approach used a flattened
representation of patient histories, which precludes use of information
about the sequence and tempo of visits. More generally, prior
studies of NAT prediction have not reported external validation15, an
important assessment of a model's ability to generalize to
different patient populations and clinical environments.
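To illustrate why a flattened representation discards the sequence and tempo of visits, the Python sketch below compares a bag-of-codes encoding with an ordered encoding that keeps the gap in days between consecutive visits. The trajectories and code names are hypothetical placeholders, not data or diagnosis codes from the study.

```python
from collections import Counter
from typing import List, Tuple

# A trajectory is a list of (day, diagnosis_code) pairs (hypothetical codes).
Trajectory = List[Tuple[int, str]]

def flatten(trajectory: Trajectory) -> Counter:
    """Bag-of-codes representation: counts each code, discarding order and timing."""
    return Counter(code for _, code in trajectory)

def sequence(trajectory: Trajectory) -> List[Tuple[int, str]]:
    """Ordered representation that keeps visit order and the gap since the previous visit."""
    ordered = sorted(trajectory)
    gaps = [0] + [b[0] - a[0] for a, b in zip(ordered, ordered[1:])]
    return [(gap, code) for gap, (_, code) in zip(gaps, ordered)]

# Same codes, different order and tempo.
a = [(0, "injury"), (30, "substance_use"), (35, "psychiatric")]
b = [(0, "psychiatric"), (300, "substance_use"), (600, "injury")]

print(flatten(a) == flatten(b))    # True: indistinguishable once flattened
print(sequence(a) == sequence(b))  # False: order and tempo differ
```

A classifier trained on `flatten` output sees the two patients as identical, whereas a sequence model such as BERT receives the ordering and spacing as input.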
Deep learning models based on transformer architectures16
have achieved state-of-the-art performance in multiple
domains17,18, including disease prognostication19,20. Bidirectional
Encoder Representations from Transformers21 (BERT) is a popular
architecture that can effectively learn long-range patterns in
sequence data, making it promising for predicting
longitudinal outcomes. Such models benefit from general
pretraining on massive datasets, which improves performance
when the model is subsequently fine-tuned on a specific
prediction task21. However, previous studies of disease prediction
have focused primarily on generic pretraining methods, such as
masked language modeling (MLM)19 and contrastive learning22,
that are not tailored to the characteristics of medical
trajectories. One study used a domain-adapted pretraining task to
predict prolonged length of stay20, but it did not address specific
diagnoses or outcomes.
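For reference, generic MLM pretraining as introduced for BERT selects roughly 15% of input positions as prediction targets; of those, 80% are replaced with a mask token, 10% with a random token, and 10% are left unchanged. The Python sketch below applies that recipe to a sequence of diagnosis codes. The function `mask_codes` and the code names are hypothetical, and this is the generic baseline objective, not PABLO's multitask pretraining.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"

def mask_codes(codes: List[str], vocab: List[str], rate: float = 0.15,
               rng: Optional[random.Random] = None
               ) -> Tuple[List[str], List[Optional[str]]]:
    """Generic BERT-style masking over a sequence of diagnosis codes.

    Each position is selected as a target with probability `rate`. A selected
    position becomes [MASK] 80% of the time, a random vocabulary code 10% of
    the time, and is left unchanged 10% of the time. Returns the corrupted
    input and the targets (None marks positions that are not scored)."""
    if rng is None:
        rng = random.Random()
    inputs: List[str] = list(codes)
    targets: List[Optional[str]] = [None] * len(codes)
    for i, code in enumerate(codes):
        if rng.random() >= rate:
            continue  # not selected: no loss at this position
        targets[i] = code  # the model must recover the original code here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)
        # else: keep the original code in place

    return inputs, targets

# Usage with hypothetical codes.
codes = ["injury", "psychiatric", "substance_use", "injury", "psychiatric"]
inputs, targets = mask_codes(codes, vocab=sorted(set(codes)), rng=random.Random(0))
```

Because the objective only asks the model to fill in corrupted codes, it carries no explicit signal about when diagnoses occur relative to one another, which is the gap that domain-adapted pretraining aims to close.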
Our goal is to develop a clinically relevant prediction framework
for NAT and other challenging diagnoses, using longitudinal
patient trajectories containing patient demographics, diagnoses,
and procedures. Using statewide data on millions of encounters,
we pretrain a BERT-based model adapted for longitudinal
diagnostic prediction with a multitask pretraining objective. By
flexibly predicting the temporal relationships of diagnoses, we aim
to develop a generally applicable base model for diagnostic
forecasting. We fine-tune and externally validate our model for the
prediction of NAT. We compare model performance to a
traditional machine learning algorithm and a previously published
BERT-based model. We develop an interactive clinical interface for
understanding model predictions and individual risk factors,
which can be implemented as a clinical decision support system.
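A multitask pretraining objective is commonly implemented as a weighted sum of per-task losses. The specific tasks PABLO uses are not detailed in this section, so the Python sketch below assumes two hypothetical heads: a masked-code classifier scored with softmax cross-entropy, and a binary head asking whether a code occurs within a time window, scored with logistic loss. The task names, logits, and weights are illustrative only.

```python
import math
from typing import Dict, List

def cross_entropy(logits: List[float], target: int) -> float:
    """Softmax cross-entropy for one categorical prediction (e.g., a masked code)."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def binary_cross_entropy(logit: float, label: int) -> float:
    """Logistic loss for one yes/no prediction (e.g., 'does code X occur within the window?')."""
    # Equals log(1 + exp(-z)) with z = logit for label 1 and z = -logit for label 0,
    # rewritten so that exp() never receives a large positive argument.
    z = logit if label == 1 else -logit
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))

def multitask_loss(task_losses: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of per-task losses; tasks missing from `weights` get weight 1."""
    return sum(weights.get(name, 1.0) * loss for name, loss in task_losses.items())

# Usage with hypothetical task names and model outputs.
losses = {
    "masked_code": cross_entropy([2.0, 0.5, -1.0], target=0),
    "within_window": binary_cross_entropy(1.2, label=1),
}
total = multitask_loss(losses, weights={"masked_code": 1.0, "within_window": 0.5})
```

In a real model the per-task losses would be averaged over a batch and backpropagated through a shared encoder; the weights trade off how strongly each task shapes the shared representation.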
RESULTS
Study overview
We developed Pretrained and Adapted BERT for Longitudinal
Outcomes (PABLO), which we fine-tuned and validated for the
prediction of NAT within one year. Cohort creation for the pretraining,
development, test, and external validation datasets is shown in
Fig. 1. Cohort characteristics are reported in Table 1. Figure 2
summarizes the modeling approach, which encodes the longitudinal structure of patients’ medical trajectories.
1Department of Computer Science, Stanford University, Stanford, CA, USA. 2Department of Veterans Affairs, Seattle, WA, USA. 3Department of Emergency Medicine, UCSF School
of Medicine, San Francisco, CA, USA. 4Department of Emergency Medicine, Stanford University, Stanford, CA, USA. ✉email:
Published in partnership with Seoul National University Bundang Hospital
D. Huang et al.
Fig. 1 Cohort creation. Our CA dataset (left) was divided into development and test splits at a 95:5 ratio for pretraining and a 9:1 ratio for
fine-tuning, using random sampling at the patient level. For pretraining, we included trajectories with two or more visits. For fine-tuning, we
included trajectories with three or more visits. We also created a CA test dataset for “first NAT” that excluded trajectories with previous NAT
diagnoses. Our FL external validation datasets (right) were created with the same inclusion and exclusion criteria as the CA test datasets.
CA = California. FL = Florida. ED = Emergency Department. NAT = non-accidental trauma.
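The patient-level split described for Fig. 1 and the one-year NAT target can be sketched in Python as below. This is a minimal illustration under stated assumptions: `patient_level_split` and `nat_within_one_year` are hypothetical helpers, the NAT code set is a placeholder (the diagnosis codes defining NAT are not listed in this section), and the one-year window is taken to be the 365 days after the prediction date.

```python
import random
from datetime import date, timedelta
from typing import List, Sequence, Set, Tuple

def patient_level_split(patient_ids: Sequence[str], dev_fraction: float,
                        seed: int = 0) -> Tuple[Set[str], Set[str]]:
    """Shuffle unique patient IDs and cut once, so every visit of a given
    patient lands in the same split (e.g., dev_fraction=0.9 for a 9:1 split)."""
    ids = sorted(set(patient_ids))  # sort first so the seed alone fixes the result
    random.Random(seed).shuffle(ids)
    cut = round(len(ids) * dev_fraction)
    return set(ids[:cut]), set(ids[cut:])

Visit = Tuple[date, Set[str]]  # (visit date, diagnosis codes recorded at that visit)

def nat_within_one_year(visits: List[Visit], prediction_date: date,
                        nat_codes: Set[str], horizon_days: int = 365) -> int:
    """Label 1 if any visit strictly after the prediction date and within the
    horizon carries at least one NAT code; otherwise 0."""
    end = prediction_date + timedelta(days=horizon_days)
    return int(any(prediction_date < d <= end and bool(codes & nat_codes)
                   for d, codes in visits))

# Usage with hypothetical IDs and placeholder codes.
dev, test = patient_level_split([f"p{i}" for i in range(100)], dev_fraction=0.9)
visits = [(date(2020, 1, 1), {"X1"}), (date(2020, 6, 1), {"NAT_PLACEHOLDER"})]
label = nat_within_one_year(visits, date(2020, 1, 1), {"NAT_PLACEHOLDER"})  # 1
```

Splitting by patient rather than by visit prevents the same person's history from appearing in both development and test data, which would otherwise inflate test performance.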
Prediction of non-accidental trauma
In each cohort, 1% of patients experienced NAT within one year of
prediction (Table 1). PABLO pre (...truncated)