HONeYBEE: enabling scalable multimodal AI in oncology through foundation model-driven embeddings
npj | digital medicine
Article
Published in partnership with Seoul National University Bundang Hospital
https://doi.org/10.1038/s41746-025-02003-4
Aakash Tripathi (1,2,4), Asim Waqas (1,3,4), Matthew B. Schabath (3), Yasin Yilmaz (2) & Ghulam Rasool (1,2)
Harmonized ONcologY Biomedical Embedding Encoder (HONeYBEE) is an open-source framework
that integrates multimodal biomedical data for oncology applications. It processes clinical data
(structured and unstructured), whole-slide images, radiology scans, and molecular profiles to generate
unified patient-level embeddings using domain-specific foundation models and fusion strategies.
These embeddings enable survival prediction, cancer-type classification, patient similarity retrieval,
and cohort clustering. Evaluated on 11,400+ patients across 33 cancer types from The Cancer
Genome Atlas (TCGA), clinical embeddings showed the strongest single-modality performance with
98.5% classification accuracy and 96.4% precision@10 in patient retrieval. They also achieved the
highest survival prediction concordance indices across most cancer types. Multimodal fusion provided
complementary benefits for specific cancers, improving overall survival prediction beyond clinical
features alone. Comparative evaluation of four large language models revealed that general-purpose
models like Qwen3 outperformed specialized medical models for clinical text representation, though
task-specific fine-tuning improved performance on heterogeneous data such as pathology reports.
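As a rough illustration of the precision@10 retrieval metric cited above, the sketch below scores each patient by the fraction of their k nearest neighbors that share their label. It makes assumptions the text does not confirm: cosine similarity as the distance measure, cancer type as the relevance label, nonzero embeddings, and k smaller than the cohort size. The function name `precision_at_k` and the toy data are hypothetical and are not part of HONeYBEE.

```python
import numpy as np

def precision_at_k(embeddings, labels, k=10):
    """Mean fraction of each patient's k nearest neighbors (cosine
    similarity, query patient excluded) that share the query's label."""
    x = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Assumption: cosine similarity; the source does not name the metric.
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T
    np.fill_diagonal(sim, -np.inf)            # never retrieve the query itself
    top_k = np.argsort(-sim, axis=1)[:, :k]   # indices of the k most similar
    hits = labels[top_k] == labels[:, None]   # True where the label matches
    return hits.mean()

# Toy cohort (hypothetical): two patients per cancer type.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = ["BRCA", "BRCA", "LUAD", "LUAD"]
print(precision_at_k(emb, labels, k=1))  # 1.0
print(precision_at_k(emb, labels, k=2))  # 0.5
```

With k=2 every query retrieves one same-type and one different-type patient, which is why the score falls to 0.5.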
Recent advances in computational oncology have been fueled by the
increasing digitization of diverse biomedical data, including structured
clinical variables (such as demographics, tumor staging, and laboratory
results), unstructured clinical narratives (such as pathology reports, radiology reports, and physician notes), medical imaging (radiology scans and
whole-slide images, or WSIs), and high-dimensional molecular profiles1–6.
This wealth of multimodal data offers unprecedented opportunities to
improve patient stratification, predict treatment response, and model disease progression2,3,6. In parallel, the adaptation of deep learning techniques
from computer vision and natural language processing has enabled powerful solutions in these domains3,7. However, a fundamental challenge
remains: the absence of robust, generalizable methods for integrating these
heterogeneous data sources into unified representations that capture the
biological complexity of cancer and support predictive modeling8.
Although large-scale biomedical data is increasingly available and
actively analyzed in oncology, it remains fragmented across distinct modalities, such as clinical data (structured variables and unstructured narratives), radiological and pathological imaging, and molecular profiles, which
are typically processed separately. This siloed approach limits the ability to
integrate complementary information across modalities for unified, patient-centered analysis8,9. Availability of large-scale datasets and advances in
self-supervised learning have enabled the development of foundation models
(FMs)1,10,11. These models, pretrained on text, imaging, or molecular data,
have advanced feature extraction within individual modalities by learning
latent representations that capture domain-specific patterns. These
modality-specific embeddings can be adapted for downstream oncology
tasks such as cancer classification or overall survival (OS) prediction.
However, in practice, these models are typically applied within single- or
dual-modality workflows, leaving the complementary information across
modalities underutilized12. While multimodal data availability continues to
expand in oncology, a critical bottleneck remains: the absence of standardized, scalable frameworks that integrate modality-specific embeddings into
unified, patient-level representations that capture multimodal patient
similarity and support downstream oncology tasks.
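Overall survival prediction, one of the downstream tasks named above, is commonly scored with a concordance index. The following is a minimal sketch of Harrell's C-index, assuming right-censored data and ignoring pairs with tied survival times; the text does not state which variant the authors use, and the function name and toy values are hypothetical.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C: share of comparable patient pairs whose predicted risk
    ordering agrees with their observed survival ordering."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:  # i had the event before j's last follow-up
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0   # higher risk, shorter survival: agrees
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data (hypothetical): follow-up times, event flags (1 = death), risks.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.5, 0.1]
print(concordance_index(times, events, risks))  # 0.8
```

Here five pairs are comparable and four are ordered correctly; the one miss is the second patient, who is assigned a lower risk than the censored third patient despite dying earlier.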
1 Department of Machine Learning, Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
2 Department of Electrical Engineering, University of South Florida, Tampa, FL, USA.
3 Departments of Cancer Epidemiology, Moffitt Cancer Center & Research Institute, Tampa, FL, USA.
4 These authors contributed equally: Aakash Tripathi, Asim Waqas.
e-mail: aakash.tripathi@moffitt.org
npj Digital Medicine | (2025)8:622
We hypothesize that integrating FM-derived embeddings from multiple data modalities can yield richer and more clinically informative patient
representations, particularly in settings where clinical data are incomplete or
less structured. Rather than relying solely on model scaling or increasing
parameter counts, we propose that fusing complementary information from
diverse biomedical data types offers a powerful, orthogonal approach to
enhance predictive performance in oncology. To test this hypothesis, we
present HONeYBEE, or Harmonized ONcologY Biomedical Embedding
Encoder (https://lab-rasool.github.io/HoneyBee/). HONeYBEE is an open-source framework that generates individual patient-level embeddings from
(i) structured and unstructured clinical data, (ii) pathology reports, (iii)
radiologic images, (iv) WSIs, and (v) molecular profiles using modality-specific FMs. HONeYBEE integrates these embeddings via concatenation,
mean pooling, and Kronecker product fusion strategies to create unified,
multimodal representations optimized for downstream oncology tasks,
including cancer subtype classification, patient clustering, OS prediction,
and patient similarity retrieval.
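The three fusion strategies named above can be sketched on toy vectors as follows. This is an illustration under stated assumptions, not HONeYBEE's actual API: the function names are hypothetical, each modality is taken to be a 1-D NumPy vector, and mean pooling assumes all modalities share one embedding dimension (the text does not say how unequal dimensions are handled).

```python
import numpy as np

def fuse_concat(embeddings):
    """Concatenation: stack modality vectors end to end (length d1 + d2 + ...)."""
    return np.concatenate(embeddings)

def fuse_mean(embeddings):
    """Mean pooling: element-wise average; assumes equal dimensions."""
    return np.mean(np.stack(embeddings), axis=0)

def fuse_kronecker(embeddings):
    """Kronecker product: all cross-modality feature products (length d1 * d2 * ...)."""
    fused = embeddings[0]
    for e in embeddings[1:]:
        fused = np.kron(fused, e)
    return fused

# Toy 2-D embeddings for two modalities (hypothetical values).
clinical = np.array([1.0, 2.0])
pathology = np.array([3.0, 4.0])
print(fuse_concat([clinical, pathology]))     # [1. 2. 3. 4.]
print(fuse_mean([clinical, pathology]))       # [2. 3.]
print(fuse_kronecker([clinical, pathology]))  # [3. 4. 6. 8.]
```

The output sizes show the trade-off: concatenation grows linearly with the number of modalities, mean pooling keeps a fixed size, and the Kronecker product grows multiplicatively, which can become costly for high-dimensional embeddings.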
While numerous models and pipelines exist for analyzing clinical,
imaging, and molecular data, most current tools remain modality-specific
and lack the flexibility to support unified, end-to-end multimodal
workflows1,13. Existing methods are typically implemented as isolated
codebases with rigid dependencies, domain-specific interfaces, and limited
extensibility, which complicates reproducibility and impedes multimodal
experimentation14. Moreover, the absence of standardized pipelines for
modality-specific embedding generation, harmonization, and flexible
fusion introduces substantial technical barriers, slowing the development of
clinically meaningful AI models15. Addressing these limitations requires not
only access to multimodal data but also modular infrastructure capable of
generating, integrating, and utilizing diverse patient-level embeddings in
scalable, reproducible ways.
HONeYBEE directly addresses this gap by providing a modular, open-source framework for multimodal embedding generation and integration.
Built around domain-specific FMs, HONeYBEE supports the standardized
preprocessing and representation of five key oncology data modalities8,16–20.
Each modality is processed through dedicated pipelines, producing
modality-specific embe (...truncated)