A whole-slide foundation model for digital pathology from real-world data
Article
https://doi.org/10.1038/s41586-024-07441-w
Received: 30 November 2023
Accepted: 19 April 2024
Published online: 22 May 2024
Hanwen Xu1,2,7, Naoto Usuyama1,7, Jaspreet Bagga1, Sheng Zhang1, Rajesh Rao1,
Tristan Naumann1, Cliff Wong1, Zelalem Gero1, Javier González1, Yu Gu1, Yanbo Xu1, Mu Wei1,
Wenhui Wang1, Shuming Ma1, Furu Wei1, Jianwei Yang1, Chunyuan Li1, Jianfeng Gao1,
Jaylen Rosemon3, Tucker Bower3, Soohee Lee4, Roshanthi Weerasinghe4, Bill J. Wright4,
Ari Robicsek4, Brian Piening3,5, Carlo Bifulco3,5 ✉, Sheng Wang2,6 ✉ & Hoifung Poon1 ✉
Open access
Digital pathology poses unique computational challenges, as a standard gigapixel
slide may comprise tens of thousands of image tiles1–3. Prior models have often
resorted to subsampling a small portion of tiles for each slide, thus missing the
important slide-level context4. Here we present Prov-GigaPath, a whole-slide
pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image
tiles in 171,189 whole slides from Providence, a large US health network comprising
28 cancer centres. The slides originated from more than 30,000 patients covering 31
major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision
transformer architecture for pretraining gigapixel pathology slides. To scale
GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath
adapts the newly developed LongNet5 method to digital pathology. To evaluate
Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer
subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data6. With
large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains
state-of-the-art performance on 25 out of 26 tasks, with significant improvement
over the second-best method on 18 tasks. We further demonstrate the potential of
Prov-GigaPath on vision–language pretraining for pathology7,8 by incorporating
the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model
that achieves state-of-the-art performance on various digital pathology tasks,
demonstrating the importance of real-world data and whole-slide modelling.
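The abstract states that GigaPath adapts LongNet so that it can model tens of thousands of tiles per slide. The sketch below illustrates why this helps by listing which tile-token pairs are scored under LongNet-style dilated attention.

It is a simplified, hypothetical sketch, not the GigaPath or LongNet code. It assumes a single head with offset 0, whereas LongNet shifts the offset across heads. It also only enumerates the sparsity pattern and omits the softmax and the weighting LongNet uses to combine configurations. The function names and the toy (w, r) values are illustrative choices.

```python
from itertools import product


def dilated_pairs(n_tokens: int, segment_len: int, dilation: int) -> set[tuple[int, int]]:
    """(query, key) pairs scored under one dilated-attention config (w, r).

    Simplified assumption: single head, offset 0. The sequence is cut into
    segments of length w; inside each segment every r-th token is kept, and
    the kept tokens attend densely to one another.
    """
    pairs: set[tuple[int, int]] = set()
    for start in range(0, n_tokens, segment_len):
        kept = range(start, min(start + segment_len, n_tokens), dilation)
        pairs.update(product(kept, repeat=2))  # dense attention among kept tokens
    return pairs


def mixed_dilated_pairs(n_tokens: int, configs: list[tuple[int, int]]) -> set[tuple[int, int]]:
    """Union of the sparsity patterns of several (w, r) configurations."""
    pairs: set[tuple[int, int]] = set()
    for w, r in configs:
        pairs |= dilated_pairs(n_tokens, w, r)
    return pairs


if __name__ == "__main__":
    n = 16  # toy "slide" with 16 tile tokens (hypothetical size)
    configs = [(4, 1), (8, 2), (16, 4)]  # w and r both grow geometrically
    for w, r in configs:
        print(f"w={w:>2}, r={r}: {len(dilated_pairs(n, w, r))} pairs")
    mixed = mixed_dilated_pairs(n, configs)
    print(f"mixed: {len(mixed)} of {n * n} dense pairs")
```

Each configuration scores about N·w/r² pairs rather than N². Pairing longer segments with larger dilation therefore keeps the cost manageable. The coarse configurations still connect distant tiles directly; in this toy case, token 0 is paired with token 12.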
Computational pathology has the potential to transform cancer diagnostics by empowering diverse clinical applications, including cancer subtyping2,9,10, cancer staging1,11–13, diagnostic prediction14–17 and
prognostic prediction18–23. Despite the encouraging performance of
existing computational approaches, these approaches are often developed for a
specific application and require a large amount of annotated data for
supervised learning. Data annotation is expensive and time-consuming
and has emerged as an important bottleneck for computational pathology. Recently, self-supervised learning has shown promising results in
leveraging unlabelled data to pretrain a foundation model, which can
substantially reduce the demand for task-specific annotations24–28.
Owing to their strong generalizability, foundation models have been
developed for biomedical domains where labelled data are scarce but
unlabelled data are abundant, a situation that aptly describes computational pathology29–33.
There are three major challenges that hinder the development and
use of pathology foundation models for real-world clinical applications. First, publicly available pathology data are relatively scarce and
of varying quality, which limits the performance of foundation models
pretrained on such data. For example, existing pathology foundation
models were mainly pretrained on whole-slide images (WSIs) from The
Cancer Genome Atlas (TCGA), an expert-curated dataset comprising
approximately 30,000 slides and 208 million image tiles. Although
TCGA is a tremendous resource, its data might not be sufficiently
large to fully address the challenges of real-world digital pathology in clinical practice, such as heterogeneity and noise artefacts34.
As a result, TCGA-based predictive models and biomarkers can suffer a
substantial performance drop on out-of-distribution samples.
Second, it remains challenging to design a model architecture that can
effectively capture both local patterns in individual tiles and global patterns across whole slides35–39. Existing models often treat each image
tile as an independent sample and formulate slide-level modelling as
multiple instance learning4,40–43, thus limiting their ability to model
complex global patterns in gigapixel whole slides. A notable exception
is Hierarchical Image Pyramid Transformer (HIPT), which explores
hierarchical self-attention over the tiles35. Third, in the rare cases in
which pretraining has been conducted on large-scale real-world patient
data, the resulting foundation models are typically not accessible to
the public, thus limiting their broader applicability in clinical research
and applications.
1Microsoft Research, Redmond, WA, USA. 2Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. 3Providence Genomics, Portland, OR, USA. 4Providence Research Network, Renton, WA, USA. 5Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR, USA. 6Department of Surgery, University of Washington, Seattle, WA, USA. 7These authors contributed equally: Hanwen Xu, Naoto Usuyama. ✉e-mail: ; ;
Nature | Vol 630 | 6 June 2024 | 181
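Multiple instance learning treats a slide as a bag of independently encoded tiles and pools them into a single slide vector. Attention-based MIL pooling (Ilse et al., 2018) is one widely used pooling operator, and a minimal sketch of it makes the limitation described above concrete. Each tile's unnormalized score depends only on that tile's own embedding, so the aggregator cannot represent interactions between tiles.

This is an illustrative example, not the method of any specific model cited here. The names `attention_mil_pool`, `V` and `w` are hypothetical, and in practice `V` and `w` are learned.

```python
import math
from typing import Sequence

Vector = list[float]


def matvec(m: Sequence[Vector], x: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mi * xi for mi, xi in zip(row, x)) for row in m]


def attention_mil_pool(
    tiles: Sequence[Vector], V: Sequence[Vector], w: Vector
) -> tuple[Vector, Vector]:
    """Attention-based MIL pooling, tanh variant.

    tiles: per-tile embeddings h_k (length D), each computed independently
    V: L x D projection matrix; w: length-L scoring vector
    Returns (slide_embedding, attention_weights).
    """
    # Score for tile k is w^T tanh(V h_k); it never looks at other tiles.
    scores = [sum(wi * math.tanh(vi) for wi, vi in zip(w, matvec(V, h))) for h in tiles]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the tiles in the bag
    dim = len(tiles[0])
    # Slide embedding is the attention-weighted average of tile embeddings.
    slide = [sum(a * h[d] for a, h in zip(weights, tiles)) for d in range(dim)]
    return slide, weights


if __name__ == "__main__":
    tiles = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy 2-D tile embeddings
    V = [[1.0, 0.0], [0.0, 1.0]]
    w = [1.0, 1.0]
    slide, weights = attention_mil_pool(tiles, V, w)
    print("weights:", [round(a, 3) for a in weights])
    print("slide embedding:", [round(x, 3) for x in slide])
```

Because the pooled vector is a weighted sum, reordering the tiles leaves the slide embedding unchanged. That permutation invariance is convenient, but it also shows that spatial arrangement and tile-to-tile context are lost unless another model component supplies them.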
We have developed Prov-GigaPath, an open-weight pathology foundation model, to address these three challenges (Supplementary
Table 1). First, Prov-GigaPath is pretrained on Prov-Path, a large digital
pathology dataset from the Providence health network across 28 cancer centres. Prov-Path contains 1,384,860,229 image tiles from 171,189
haematoxylin and eosin (H&E)-stained and immunohistochemistry
pathology slides, which originated from biopsies and resections in
more than 30,000 patients, covering 31 major tissue types. Prov-Path
is more than five times larger than TCGA in terms of the number of
image tiles and more than two times larger than TCGA in terms of the
number of patients. Our pretraining leverages all 1.3 billion image tiles,
which, to our knowledge, constitutes the largest pretraining effort to
date. These large, diverse, real-world data serve as the foundation
for pretraining Prov-GigaPath. Prov-Path also encompasses a hierarchy of valuable information, including histopathology findings,
cancer staging and genomic mutation profiles, as well as the associated
pathology reports.
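The scale comparison above can be checked with a quick back-of-the-envelope calculation. It uses only figures stated in the text. The TCGA values are the rounded numbers given earlier ("approximately 30,000 slides and 208 million image tiles"), so the derived ratios are approximate. The variable names are illustrative.

```python
# Dataset sizes as stated in the text; TCGA values are rounded figures.
PROV_PATH_TILES = 1_384_860_229
PROV_PATH_SLIDES = 171_189
TCGA_TILES = 208_000_000  # "208 million image tiles"
TCGA_SLIDES = 30_000      # "approximately 30,000 slides"

# Average number of 256 x 256 tiles per slide in each dataset.
tiles_per_slide = PROV_PATH_TILES / PROV_PATH_SLIDES
tcga_tiles_per_slide = TCGA_TILES / TCGA_SLIDES

# How much larger Prov-Path is than TCGA.
tile_ratio = PROV_PATH_TILES / TCGA_TILES
slide_ratio = PROV_PATH_SLIDES / TCGA_SLIDES

print(f"Prov-Path tiles per slide: {tiles_per_slide:,.0f}")
print(f"TCGA tiles per slide:      {tcga_tiles_per_slide:,.0f}")
print(f"Tile-count ratio:          {tile_ratio:.1f}x")
print(f"Slide-count ratio:         {slide_ratio:.1f}x")
```

With these inputs, Prov-Path averages about 8,090 tiles per slide, compared with about 6,930 for TCGA. The tile-count ratio is about 6.7, which is consistent with the statement that Prov-Path is "more than five times larger" in tiles.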
Second, to capture both local and global patterns across the entire
slide, we propose GigaPath, a novel vision transformer for pretraining
large pathology foundation models on gigapixel pathology slides.
The key idea is to embed image tiles as visual tokens, thus turning a
slide into a long sequence of tokens. Transformer44 is a powerful neural architecture for sequence modelling that can distil arbitrarily complex patterns among the tokens. Howev (...truncated)