Uncertainty-aware ensemble of foundation models differentiates glioblastoma from its mimics
Article
https://doi.org/10.1038/s41467-025-64249-6
Received: 23 April 2025
Accepted: 9 September 2025
Junhan Zhao 1,2,18, Shih-Yen Lin1,18, Raphaël Attias 1, Liza Mathews1,
Christian Engel1, Guillaume Larghero1, Dmytro Vremenko1, Ting-Wan Kao1,
Tsung-Hua Lee1, Yu-Hsuan Wang3, Cheng Che Tsai1, Eliana Marostica 1,
Ying-Chun Lo 4, David Meredith5, Keith L. Ligon 6, Omar Arnaout7,
Thomas Roetzer-Pejrimovsky 8, Shih-Chieh Lin9, Natalie NC Shih10,
Nipon Chaisuriya 4,11, David J. Cook 4, Jung-Hsien Chiang 3,
Chia-Jen Liu 1,12,13, Adelheid Woehrer 8,14, Jeffrey A. Golden15,
MacLean P. Nasrallah 10 & Kun-Hsing Yu 1,5,16,17
Accurate pathological diagnosis is crucial in guiding personalized treatments
for patients with central nervous system cancers. Distinguishing glioblastoma
and primary central nervous system lymphoma is particularly challenging due
to their overlapping pathology features, despite the distinct treatments
required. To address this challenge, we establish the Pathology Image Characterization Tool with Uncertainty-aware Rapid Evaluations (PICTURE) system
using 2141 pathology slides collected worldwide. PICTURE employs Bayesian
inference, deep ensemble, and normalizing flow to account for the uncertainties in its predictions and training set labels. PICTURE accurately diagnoses
glioblastoma and primary central nervous system lymphoma with an area
under the receiver operating characteristic curve (AUROC) of 0.989, with the
results validated in five independent cohorts (AUROC = 0.924-0.996). In
addition, PICTURE identifies samples belonging to 67 types of rare central
nervous system cancers that are neither gliomas nor lymphomas. Our
approaches provide a generalizable framework for differentiating pathological
mimics and enable rapid diagnoses for central nervous system cancer patients.
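The AUROC values quoted above summarize how well a classifier's scores rank one class above the other. As a minimal sketch, assuming one probability-like score per slide and binary labels (1 = glioblastoma, 0 = PCNSL; this labeling convention and the toy scores are hypothetical, not data from this study), AUROC equals the probability that a randomly chosen positive slide scores higher than a randomly chosen negative one, with ties counted as half:

```python
from itertools import product


def auroc(labels, scores):
    """Rank-based AUROC: the Mann-Whitney U statistic divided by n_pos * n_neg.

    labels: iterable of 0/1 (1 = positive class, hypothetically glioblastoma)
    scores: iterable of floats, higher = more likely positive
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p, n in product(pos, neg):  # compare every positive/negative pair
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5  # ties contribute half a win
    return wins / (len(pos) * len(neg))


# Toy slide-level scores (made up for illustration, not study data).
labels = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.40, 0.50, 0.20, 0.10]
print(round(auroc(labels, scores), 3))  # prints 0.889 (8 of 9 pairs ranked correctly)
```

The pairwise loop costs O(n_pos × n_neg); for large cohorts, a sort-based rank computation or `sklearn.metrics.roc_auc_score` yields the same value more efficiently.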
A full list of affiliations appears at the end of the paper.
Nature Communications | (2025)16:8341
More than 86,000 patients in the U.S. are diagnosed with central nervous system (CNS) neoplasms annually, leading to over 16,000 deaths each year1. The 2021
WHO Classification of CNS Tumors (WHO CNS5)2 identifies 109 distinct
tumor subtypes based on pathology and molecular profiles3. Because
treatments and prognoses of different CNS tumors vary
considerably4–7, obtaining accurate pathological diagnoses is critical.
Glioblastoma, the most common brain cancer in the U.S., has a dismal
median survival of 8 months1,5, and surgical resection remains the
cornerstone of initial treatment7. Notably, previous studies showed
that primary central nervous system lymphoma (PCNSL) is the cancer
type most frequently misdiagnosed as glioblastoma8–12. This misclassification has important clinical implications: patients with PCNSL
have a median survival of more than three years following diagnosis
and often respond well to radiotherapy4,5. Although patients’ age,
immune status, and imaging features from magnetic resonance imaging influence clinicians’ initial diagnostic assessments, pathology
evaluation using tumor samples provides the final diagnosis6,7. When
PCNSL is diagnosed during surgery with the intent for tumor removal,
neurosurgeons will usually discontinue further surgical intervention to
preserve neurological function and refer patients for radiotherapy
combined with chemotherapy7–9. In addition, final diagnosis using
formalin-fixed, paraffin-embedded (FFPE) tissue confirms tumor types
and guides long-term treatment planning2. Thus, accurate distinction
between glioblastoma and PCNSL at both intraoperative and final
diagnostic stages is essential to avoid unnecessary surgery
and ensure timely initiation of appropriate therapy.
Several challenges have hindered the accurate pathological
diagnosis of CNS neoplasms1,6. A central difficulty in diagnosing glioblastoma and PCNSL is the inherent variability and uncertainty of
both frozen section and FFPE evaluations13,14. Intraoperative frozen
section diagnostics are invaluable for immediate assessment during
brain cancer surgeries. However, prior studies have reported that 9.7%
to 46.2% of frozen section diagnoses differ from the final FFPE-based
diagnoses9–11,15,16. Recent studies have reported an inter-observer disagreement rate of up to 16% in FFPE diagnoses13,14. Although the definitive
diagnosis of brain cancers relies on FFPE tissue analysis, which enables
thorough evaluation of the morphological patterns of CNS
neoplasms, the microscopic findings of different cancer types are sometimes distinct and sometimes overlapping. For example, glioblastoma pathology is highly variable and shares features with other
tumors, including PCNSL. Glioblastomas typically manifest as infiltrating hypercellular neoplasms with nuclear pleomorphism, microvascular proliferation, and necrosis with or without surrounding
pseudopalisading17. The neoplastic cells may be fibrillary, epithelioid,
or round cells, the latter mimicking lymphoma cells. Further complicating diagnoses, PCNSL may also exhibit nuclear pleomorphism,
necrosis, increased mitotic activity, and a perivascular propensity that
can mimic pseudopalisading18. In addition, the atypia of reactive glia in
PCNSL and infiltrating lymphoma cells within the brain parenchyma
can lead to misinterpretation18,19.
Weakly supervised machine learning applied to pathology images
has demonstrated the potential to assist cancer cell detection, subtype
classification, and prognostic prediction20. Nevertheless, current deep
learning-based approaches for neuro-oncological diagnostics remain
largely confined to radiological applications. Existing pathological
diagnostic models focus on differentiating glioma types or on applying
few-shot learning techniques to rarer subtypes because of limited
data availability. Moreover, models trained on cohorts without sufficient diversity often suffer substantial performance decay when
applied to new patient populations because of differences in sample preparation and slide scanning protocols21. Owing to the morphological
heterogeneity of glioblastoma17, previous studies showed substantial variation in AI
models’ diagnostic performance for this deadly cancer22. In addition,
standard machine learning models inevitably assign any new sample
to one of the categories they were trained on, regardless of
the nature of that sample23. These caveats have limited the
application of AI models in cancer diagnosis24.
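The closed-set limitation described above can be made concrete. A softmax classifier's outputs always sum to one over its training classes, so even a sample from an unseen tumor type receives a "best" label. The sketch below contrasts that forced choice with one common uncertainty-aware alternative from the deep-ensemble literature: split the ensemble's predictive entropy into the average member entropy (aleatoric) and the mutual information between prediction and ensemble member (epistemic disagreement), and abstain when the epistemic term is large. The two-class label set, the member logits, the `max_epistemic` threshold, and the helper names are all hypothetical illustrations, not PICTURE's actual procedure.

```python
import math

CLASSES = ["glioblastoma", "PCNSL"]  # hypothetical closed label set


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]  # always sums to 1 over CLASSES


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)  # in nats


def ensemble_uncertainty(member_logits):
    """Mean probabilities plus total, aleatoric, and epistemic uncertainty."""
    member_probs = [softmax(l) for l in member_logits]
    k = len(member_probs)
    mean_probs = [sum(p[c] for p in member_probs) / k for c in range(len(CLASSES))]
    total = entropy(mean_probs)                            # predictive entropy
    aleatoric = sum(entropy(p) for p in member_probs) / k  # expected member entropy
    epistemic = total - aleatoric                          # mutual information
    return mean_probs, total, aleatoric, epistemic


def predict(member_logits, max_epistemic=0.1):
    """Return (forced label, uncertainty-aware label, epistemic uncertainty)."""
    mean_probs, _, _, epistemic = ensemble_uncertainty(member_logits)
    forced = CLASSES[mean_probs.index(max(mean_probs))]  # closed-set argmax
    aware = "uncertain / refer for review" if epistemic > max_epistemic else forced
    return forced, aware, epistemic


# Members agree: both labels are "glioblastoma".
print(predict([[4.0, -4.0], [3.5, -3.0], [4.2, -3.8]]))
# Members disagree sharply: the forced label is still "glioblastoma",
# but the uncertainty-aware rule abstains.
print(predict([[4.0, -4.0], [-4.0, 4.0], [3.0, -3.0]]))
```

Mutual information is nonnegative and grows only when members disagree, so it flags inputs the ensemble has not learned to handle consistently; it does not by itself indicate which unseen class a sample belongs to.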
In this study, we present the Pathology Image Characterization
Tool with Uncertainty-aware Rapid Evaluations (PICTURE). PICTURE
leverages epistemic uncertainty quantification25,26 to identify atypical
pathology manifestations and uses diverse pathology images presented in medical literature to guide the development of self-supervised deep neural networks. We successfully validate the PICTURE system and show that it significantly outper (...truncated)