HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings


npj Digital Medicine | Article
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-025-02003-4

Aakash Tripathi (1,2,4), Asim Waqas (1,3,4), Matthew B. Schabath (3), Yasin Yilmaz (2) & Ghulam Rasool (1,2)

Harmonized ONcologY Biomedical Embedding Encoder (HONeYBEE) is an open-source framework that integrates multimodal biomedical data for oncology applications. It processes clinical data (structured and unstructured), whole-slide images, radiology scans, and molecular profiles to generate unified patient-level embeddings using domain-specific foundation models and fusion strategies. These embeddings enable survival prediction, cancer-type classification, patient similarity retrieval, and cohort clustering. Evaluated on 11,400+ patients across 33 cancer types from The Cancer Genome Atlas (TCGA), clinical embeddings showed the strongest single-modality performance with 98.5% classification accuracy and 96.4% precision@10 in patient retrieval. They also achieved the highest survival prediction concordance indices across most cancer types. Multimodal fusion provided complementary benefits for specific cancers, improving overall survival prediction beyond clinical features alone. Comparative evaluation of four large language models revealed that general-purpose models like Qwen3 outperformed specialized medical models for clinical text representation, though task-specific fine-tuning improved performance on heterogeneous data such as pathology reports.
Recent advances in computational oncology have been fueled by the increasing digitization of diverse biomedical data, including structured clinical variables (such as demographics, tumor staging, and laboratory results), unstructured clinical narratives (such as pathology reports, radiology reports, and physician notes), medical imaging (radiology scans and whole-slide images or WSI), and high-dimensional molecular profiles [1–6]. This wealth of multimodal data offers unprecedented opportunities to improve patient stratification, predict treatment response, and model disease progression [2,3,6]. In parallel, the adaptation of deep learning techniques from computer vision and natural language processing has enabled powerful solutions in these domains [3,7]. However, a fundamental challenge remains: the absence of robust, generalizable methods for integrating these heterogeneous data sources into unified representations that capture the biological complexity of cancer and support predictive modeling [8].

Although large-scale biomedical data is increasingly available and actively analyzed in oncology, it remains fragmented across distinct modalities, such as clinical data (structured variables and unstructured narratives), radiological and pathological imaging, and molecular profiles, which are typically processed separately. This siloed approach limits the ability to integrate complementary information across modalities for unified, patient-centered analysis [8,9].

Availability of large-scale datasets and advances in self-supervised learning have enabled the development of foundation models (FMs) [1,10,11]. These models, pretrained on text, imaging, or molecular data, have advanced feature extraction within individual modalities by learning latent representations that capture domain-specific patterns. These modality-specific embeddings can be adapted for downstream oncology tasks such as cancer classification or overall survival (OS) prediction.
However, in practice, these models are typically applied within single- or dual-modality workflows, leaving the complementary information across modalities underutilized [12].

1 Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL, USA. 2 Department of Electrical Engineering, University of South Florida, Tampa, FL, USA. 3 Departments of Cancer Epidemiology, Moffitt Cancer Center & Research Institute, Tampa, FL, USA. 4 These authors contributed equally: Aakash Tripathi, Asim Waqas. e-mail: aakash.tripathi@moffitt.org

npj Digital Medicine (2025) 8:622

While multimodal data availability continues to expand in oncology, a critical bottleneck remains: the absence of standardized, scalable frameworks that integrate modality-specific embeddings into unified, patient-level representations that capture multimodal patient similarity and support downstream oncology tasks.

We hypothesize that integrating FM-derived embeddings from multiple data modalities can yield richer and more clinically informative patient representations, particularly in settings where clinical data are incomplete or less structured. Rather than relying solely on model scaling or increasing parameter counts, we propose that fusing complementary information from diverse biomedical data types offers a powerful, orthogonal approach to enhance predictive performance in oncology. To test this hypothesis, we present HONeYBEE, or Harmonized ONcologY Biomedical Embedding Encoder (https://lab-rasool.github.io/HoneyBee/). HONeYBEE is an open-source framework that generates individual patient-level embeddings from (i) structured and unstructured clinical data, (ii) pathology reports, (iii) radiologic images, (iv) WSIs, and (v) molecular profiles using modality-specific FMs.
HONeYBEE integrates these embeddings via concatenation, mean pooling, and Kronecker product fusion strategies to create unified, multimodal representations optimized for downstream oncology tasks, including cancer subtype classification, patient clustering, OS prediction, and patient similarity retrieval.

While numerous models and pipelines exist for analyzing clinical, imaging, and molecular data, most current tools remain modality-specific and lack the flexibility to support unified, end-to-end multimodal workflows [1,13]. Existing methods are typically implemented as isolated codebases with rigid dependencies, domain-specific interfaces, and limited extensibility, which complicates reproducibility and impedes multimodal experimentation [14]. Moreover, the absence of standardized pipelines for modality-specific embedding generation, harmonization, and flexible fusion introduces substantial technical barriers, slowing the development of clinically meaningful AI models [15]. Addressing these limitations requires not only access to multimodal data but also modular infrastructure capable of generating, integrating, and utilizing diverse patient-level embeddings in scalable, reproducible ways.

HONeYBEE directly addresses this gap by providing a modular, open-source framework for multimodal embedding generation and integration. Built around domain-specific FMs, HONeYBEE supports the standardized preprocessing and representation of five key oncology data modalities [8,16–20]. Each modality is processed through dedicated pipelines, producing modality-specific embe (...truncated)
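
The three fusion strategies named in this section (concatenation, mean pooling, and Kronecker product) can be illustrated with a minimal NumPy sketch. This is not the HONeYBEE API: the function names `fuse_concat`, `fuse_mean`, and `fuse_kronecker` are hypothetical, the toy vectors are invented for illustration, and the sketch assumes each modality has already been reduced to a single 1-D patient-level vector. Mean pooling additionally assumes that all modalities share the same embedding dimension.

```python
from functools import reduce

import numpy as np


def fuse_concat(embeddings):
    """Concatenate modality embeddings end to end; dimensions may differ."""
    return np.concatenate(embeddings)


def fuse_mean(embeddings):
    """Average modality embeddings element-wise.

    np.stack raises ValueError if the vectors do not share one shape,
    which enforces the equal-dimension assumption.
    """
    return np.stack(embeddings).mean(axis=0)


def fuse_kronecker(embeddings):
    """Chain Kronecker products across modalities.

    The output length is the product of the input lengths, so this
    grows quickly and is shown here only on tiny toy vectors.
    """
    return reduce(np.kron, embeddings)


# Hypothetical 2-dimensional embeddings for one patient (illustrative values).
clinical = np.array([1.0, 2.0])
pathology = np.array([3.0, 4.0])
molecular = np.array([0.5, -1.0])
modalities = [clinical, pathology, molecular]

concat = fuse_concat(modalities)   # length 2 + 2 + 2 = 6
mean = fuse_mean(modalities)       # length 2
kron = fuse_kronecker(modalities)  # length 2 * 2 * 2 = 8

print(concat.shape, mean.shape, kron.shape)
```

The trade-off the sketch makes visible is dimensionality: concatenation grows additively and preserves each modality's features, mean pooling keeps the dimension fixed but requires aligned embedding spaces, and the Kronecker product captures pairwise feature interactions at a multiplicative cost in size.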