A whole-slide foundation model for digital pathology from real-world data

Nature, Jun 2024

Digital pathology poses unique computational challenges, as a standard gigapixel slide may comprise tens of thousands of image tiles1,2,3. Prior models have often resorted to subsampling a small portion of tiles for each slide, thus missing the important slide-level context4. Here we present Prov-GigaPath, a whole-slide pathology foundation model pretrained on 1.3 billion 256 × 256 pathology image tiles in 171,189 whole slides from Providence, a large US health network comprising 28 cancer centres. The slides originated from more than 30,000 patients covering 31 major tissue types. To pretrain Prov-GigaPath, we propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides. To scale GigaPath for slide-level learning with tens of thousands of image tiles, GigaPath adapts the newly developed LongNet5 method to digital pathology. To evaluate Prov-GigaPath, we construct a digital pathology benchmark comprising 9 cancer subtyping tasks and 17 pathomics tasks, using both Providence and TCGA data6. With large-scale pretraining and ultra-large-context modelling, Prov-GigaPath attains state-of-the-art performance on 25 out of 26 tasks, with significant improvement over the second-best method on 18 tasks. We further demonstrate the potential of Prov-GigaPath on vision–language pretraining for pathology7,8 by incorporating the pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modelling.
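The abstract says GigaPath adapts LongNet so that slide-level attention stays tractable over tens of thousands of tile tokens. As a rough illustration only, here is a minimal pure-Python sketch of one branch of LongNet-style dilated attention. It splits the token sequence into segments of length `segment_len`, keeps every `dilation`-th position inside each segment, and runs ordinary softmax attention only among the kept positions. All names are hypothetical. The sketch assumes a single head, a single (segment length, dilation) pair, and no learned projections. It is not the GigaPath or LongNet implementation, which combine several such branches with different segment lengths, dilation rates and per-head offsets so that every position is covered.

```python
import math


def dilated_groups(n, segment_len, dilation, offset=0):
    """Split positions 0..n-1 into segments of length `segment_len`, then keep
    every `dilation`-th position in each segment, starting at `offset`.
    Each returned group attends only within itself."""
    groups = []
    for start in range(0, n, segment_len):
        segment = range(start, min(start + segment_len, n))
        kept = [i for i in segment if (i - start) % dilation == offset % dilation]
        if kept:
            groups.append(kept)
    return groups


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dilated_attention(q, k, v, segment_len, dilation, offset=0):
    """One dilated-attention branch over lists of query/key/value vectors.
    Positions not selected by this branch get an all-zero output row."""
    n, d_v = len(q), len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    out = [[0.0] * d_v for _ in range(n)]
    for group in dilated_groups(n, segment_len, dilation, offset):
        for i in group:
            weights = softmax([dot(q[i], k[j]) * scale for j in group])
            out[i] = [sum(w * v[j][c] for w, j in zip(weights, group)) for c in range(d_v)]
    return out


def attention_pairs(n, segment_len, dilation):
    """Number of query-key scores computed by one branch."""
    return sum(len(g) ** 2 for g in dilated_groups(n, segment_len, dilation))
```

For an illustrative sequence of 16,384 tile tokens, full self-attention computes 16,384² = 268,435,456 scores. `attention_pairs(16384, 2048, 4)` gives 2,097,152, which is N·w/r² for segment length w and dilation r. So for a fixed (w, r), the cost grows linearly rather than quadratically with slide length.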

PDF: https://www.nature.com/articles/s41586-024-07441-w.pdf

https://doi.org/10.1038/s41586-024-07441-w
Received: 30 November 2023 | Accepted: 19 April 2024 | Published online: 22 May 2024 | Open access
Nature | Vol 630 | 6 June 2024 | 181

Hanwen Xu1,2,7, Naoto Usuyama1,7, Jaspreet Bagga1, Sheng Zhang1, Rajesh Rao1, Tristan Naumann1, Cliff Wong1, Zelalem Gero1, Javier González1, Yu Gu1, Yanbo Xu1, Mu Wei1, Wenhui Wang1, Shuming Ma1, Furu Wei1, Jianwei Yang1, Chunyuan Li1, Jianfeng Gao1, Jaylen Rosemon3, Tucker Bower3, Soohee Lee4, Roshanthi Weerasinghe4, Bill J. Wright4, Ari Robicsek4, Brian Piening3,5, Carlo Bifulco3,5 ✉, Sheng Wang2,6 ✉ & Hoifung Poon1 ✉

1Microsoft Research, Redmond, WA, USA. 2Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA, USA. 3Providence Genomics, Portland, OR, USA. 4Providence Research Network, Renton, WA, USA. 5Earle A. Chiles Research Institute, Providence Cancer Institute, Portland, OR, USA. 6Department of Surgery, University of Washington, Seattle, WA, USA. 7These authors contributed equally: Hanwen Xu, Naoto Usuyama.

Computational pathology has the potential to transform cancer diagnostics by empowering diverse clinical applications, including cancer subtyping2,9,10, cancer staging1,11–13, diagnostic prediction14–17 and prognostic prediction18–23. Despite the encouraging performance of existing computational approaches, these are often developed for a specific application and require a large amount of annotated data for supervised learning. Data annotation is expensive and time-consuming and has emerged as an important bottleneck for computational pathology. Recently, self-supervised learning has shown promising results in leveraging unlabelled data to pretrain a foundation model, which can substantially reduce the demand for task-specific annotations24–28. Owing to their strong generalizability, foundation models have been developed for biomedical domains where labelled data are scarce but unlabelled data are abundant, a situation that aptly describes computational pathology29–33.

There are three major challenges that hinder the development and use of pathology foundation models for real-world clinical applications. First, publicly available pathology data are relatively scarce and of varying quality, which limits the performance of foundation models pretrained on such data. For example, existing pathology foundation models were mainly pretrained on whole-slide images (WSIs) from The Cancer Genome Atlas (TCGA), an expert-curated dataset comprising approximately 30,000 slides and 208 million image tiles. Although they are a tremendous resource, TCGA data might not be sufficiently large to fully address the challenges around real-world digital pathology in clinical practice, such as heterogeneity and noise artefacts34, leading to a substantial performance drop when using TCGA-based predictive models and biomarkers on out-of-distribution samples. Second, it remains challenging to design a model architecture that can effectively capture both local patterns in individual tiles and global patterns across whole slides35–39. Existing models often treat each image tile as an independent sample and formulate slide-level modelling as multiple instance learning4,40–43, thus limiting their ability to model complex global patterns in gigapixel whole slides. A notable exception is Hierarchical Image Pyramid Transformer (HIPT), which explores hierarchical self-attention over the tiles35. Third, in the rare cases in which pretraining has been conducted on large-scale real-world patient data, the resulting foundation models are typically not accessible to the public, thus limiting their broader applicability in clinical research and applications.

We have developed Prov-GigaPath, an open-weight pathology foundation model, to address these three challenges (Supplementary Table 1). First, Prov-GigaPath is pretrained on Prov-Path, a large digital pathology dataset from the Providence health network across 28 cancer centres.
Prov-Path contains 1,384,860,229 image tiles from 171,189 haematoxylin and eosin (H&E)-stained and immunohistochemistry pathology slides, which originated from biopsies and resections in more than 30,000 patients, covering 31 major tissue types. Prov-Path is more than five times larger than TCGA in terms of the number of image tiles and more than two times larger than TCGA in terms of the number of patients. Our pretraining leverages all 1.3 billion image tiles, which, to our knowledge, constitutes the largest pretraining effort to date. These large, diverse, real-world data serve as the foundation for pretraining Prov-GigaPath. Prov-Path also encompasses a hierarchy of valuable information, including histopathology findings, cancer staging and genomic mutation profiles, along with the associated pathology reports.

Second, to capture both local and global patterns across the entire slide, we propose GigaPath, a novel vision transformer for pretraining large pathology foundation models on gigapixel pathology slides. The key idea is to embed image tiles as visual tokens, thus turning a slide into a long sequence of tokens. Transformer44 is a powerful neural architecture for sequence modelling that can distil arbitrarily complex patterns among the tokens. Howev (...truncated)
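The idea of turning a slide into a sequence of tile tokens can be sketched as follows. This is a minimal illustration under assumptions of our own, not the GigaPath pipeline. Tiles are 256 × 256 pixels, as stated in the text. The slide is cut into a row-major grid of full tiles, with partial edge tiles dropped (an assumption). Each kept tile is passed through a tile encoder, producing one token. `encode_tile` and `keep_tile` are hypothetical stand-ins for a tile-level encoder and a tissue/background filter. The (row, col) position is stored with each token because a slide-level model needs to know where each tile sits. How GigaPath actually encodes position is not described in this excerpt.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

TILE = 256  # tile edge length in pixels (256 x 256 tiles, as in the text)


@dataclass
class TileToken:
    row: int                # tile row in the slide grid
    col: int                # tile column in the slide grid
    embedding: List[float]  # output of the (hypothetical) tile encoder


def tile_grid(width: int, height: int, tile: int = TILE) -> List[Tuple[int, int]]:
    """Row-major (row, col) indices of the full tiles covering a slide.
    Partial tiles at the right and bottom edges are dropped."""
    return [(r, c) for r in range(height // tile) for c in range(width // tile)]


def slide_to_sequence(
    width: int,
    height: int,
    encode_tile: Callable[[int, int], List[float]],
    keep_tile: Callable[[Tuple[int, int]], bool] = lambda rc: True,
    tile: int = TILE,
) -> List[TileToken]:
    """Turn one slide into a sequence of tile tokens in row-major order,
    skipping tiles rejected by `keep_tile` (e.g. background)."""
    return [
        TileToken(r, c, encode_tile(r, c))
        for r, c in tile_grid(width, height, tile)
        if keep_tile((r, c))
    ]


# A hypothetical 60,000 x 40,000 pixel slide (2.4 gigapixels):
print(len(tile_grid(60_000, 40_000)))  # 36504 tiles = 234 columns x 156 rows
```

The usage line gives a feel for the scale: a slide of a few gigapixels already yields tens of thousands of tokens before any background filtering. That sequence length is the setting in which a long-context architecture is needed.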


This is a preview of a remote PDF: https://www.nature.com/articles/s41586-024-07441-w.pdf
Article home page: https://www.nature.com/articles/s41586-024-07441-w

Xu, Hanwen, Usuyama, Naoto, Bagga, Jaspreet, Zhang, Sheng, Rao, Rajesh, Naumann, Tristan, Wong, Cliff, Gero, Zelalem, González, Javier, Gu, Yu, Xu, Yanbo, Wei, Mu, Wang, Wenhui, Ma, Shuming, Wei, Furu, Yang, Jianwei, Li, Chunyuan, Gao, Jianfeng, Rosemon, Jaylen, Bower, Tucker, Lee, Soohee, Weerasinghe, Roshanthi, Wright, Bill J., Robicsek, Ari, Piening, Brian, Bifulco, Carlo, Wang, Sheng, Poon, Hoifung. A whole-slide foundation model for digital pathology from real-world data, Nature, DOI: 10.1038/s41586-024-07441-w