Long noncoding RNAs (lncRNAs) participate in gene regulation underlying development and disease. To overcome the inherent limitations of bulk-sequencing-based lncRNA analysis, we leveraged single-cell and spatial transcriptomics (ST) data to analyze 219,442 potential lncRNAs identified by the TAR-scRNA-seq pipeline across 13 cancer types. The lncRNA functions were assessed by identifying...
Advances in machine learning have transformed structural biology, enabling swift and accurate prediction of protein structure from sequence. However, key challenges persist in modeling side-chain packing, condition-dependent conformational changes and biomolecular interactions, largely because of limited high-quality training data. At the same time, emerging experimental...
Biomolecular embeddings serve as efficient representations of sequence and structure, enabling tasks such as similarity searches, structure and function prediction and estimation of biophysical properties. However, relying on embeddings without assessing their ability to accurately represent biomolecules is a critical flaw—akin to using a scalpel in surgery without verifying its...
Sequence-based deep learning models have become the state of the art for analyzing the genomic regulatory code. Particularly for enhancers, these models excel at deciphering sequence grammar that underlies their activity. To enable end-to-end enhancer modeling and design, we developed a software package called CREsted (cis-regulatory element sequence training, explanation and...
Spontaneously blinking fluorophores toggle between nonfluorescent and fluorescent forms without caging groups or redox buffers, enabling super-resolution imaging. The intrinsic blinking of such dyes is governed by molecular structure and modulated by environment; there is no one-size-fits-all fluorophore suitable for every imaging context. We report dyes with tuned on:off ratios...
Genetically encoded calcium (Ca2+) indicators (GECIs) are essential tools for monitoring neuronal activity, but the performance of red fluorescent GECIs has remained limited. In particular, many red indicators are relatively dim, produce low signal-to-noise ratios and can undergo unwanted photoswitching when exposed to blue light, restricting their use in all-optical experiments...
The extent to which an RNA folds into structure ensembles and how different structures in the ensemble regulate eukaryotic gene expression are not fully understood. Here, we coupled chemical probing with direct RNA sequencing to identify structure modifications along a single RNA molecule (sm-PORE-cupine). We used direct signal alignment in addition to base mapping to increase the...
Most proteins act through interactions with other molecules, yet predicting how single mutations perturb these interactions—defined as ‘protein codes’—remains a central challenge in computational biology. Here we introduce eSIG-Net, the edgetic mutation sequence-based interaction grammar network, a language model that integrates protein sequence embeddings with syntax-aware and...
Spatial transcriptomics enables high-resolution gene expression mapping in intact tissues. Xenium is widely adopted for its reliability, accessibility and data quality, yet the properties and limitations of Xenium-derived data remain poorly characterized. Here we present one of the most comprehensive Xenium datasets so far, encompassing over 40 breast and lung tumor sections...
Lipids play a central role in a multitude of biological functions associated with cancer, obesity, diabetes, and cardiovascular and neurological pathologies. However, sensing and mapping of lipid classes in living cells remain challenging. Here we introduce a label-free approach to lipid imaging, which differentiates lipid species in living cells by hyperspectral mid-infrared...
Protein language models (PLMs) have recently emerged as a promising approach for next-generation variant-effect prediction (VEP). Most high-performing VEP methods currently combine PLMs with additional information, such as homology, protein structure and population genetics data, to improve prediction accuracy. This performance gain, however, comes with added complexity...
Oligodendrocytes enable rapid central nervous system signaling by myelinating axons. Here, to model key biomechanical cues regulating myelination, we developed a tunable hydrogel-based micropillar array system that mimics the three-dimensional architecture and softness of axons. This platform supports the long-term culture of oligodendrocytes and robust formation of multilayered...
The rapid advancement of spatial multi-omics technologies has opened opportunities for deciphering intricate spatial heterogeneity; however, current computational approaches struggle to comprehensively integrate diverse molecular and spatial information. Here we propose 3d-OT, a deep geometry-aware framework that leverages spatial geometric and multi-omics information for...
Relating billions of proteins across the tree of life remains a challenging task for comparative biosphere genomics and artificial intelligence-driven structure prediction. Here we present DIAMOND DeepClust, a cascaded, ultra-fast clustering method enabling planetary-scale organization of protein space, scaling to trillions of sequences while retaining sensitivity at low identity...
Bottom-up proteomics relies predominantly on collision-induced dissociation (CID) for peptide sequencing, which has achieved remarkable sensitivity and efficiency, now enabling single-cell analysis. However, CID shows limitations in characterizing post-translational modifications and complex proteoforms. Here we have developed an integrated mass spectrometry platform enabling...
Histopathological data are foundational in both biological research and clinical diagnostics but remain siloed from modern multimodal and single-cell frameworks. Here we introduce LazySlide, an open-source Python package built on the scverse ecosystem for efficient whole-slide image analysis and multimodal integration. By leveraging vision–language foundation models and adhering...
Tissue clearing has been widely used for fluorescence imaging of fixed tissues, but its application to live tissues has been limited by toxicity. Here we develop minimally invasive optical clearing media for fluorescence imaging of live mammalian tissues. Light scattering is minimized by adding spherical polymers with low osmolarity to the extracellular medium. A clearing medium...
Identification of small-molecule binding sites in proteins is an important task for drug discovery. Despite previous homology- and machine-learning-based approaches to this problem, true de novo binding-site prediction remains a challenge. Here we use features from a pretrained neural network to train a logistic regression model, AF2BIND, for accurate prediction of de novo...
Structured RNAs play many roles in cells and emerging biotechnology. While large RNAs and ribonucleoprotein complexes often benefit from high-resolution structural analysis through cryogenic-sample electron microscopy (cryoEM), single-domain RNAs, particularly those smaller than ~100 nt (33 kDa), have proven challenging. Here we address this methodological gap by engineering two...
The big data era in biology is underway, but the study of organismal form has been slow to capitalize on advances in imaging and computation. Imaging approaches can digitize whole organisms, but low throughput has limited the effort to document morphological diversity. Here, within the open science initiative ‘Antscan’, we applied high-throughput synchrotron X-ray microtomography...
Deconvolution algorithms estimate cell-type abundances from tissue-level data, enabling systematic cellular analysis of large cohorts. However, most deconvolution algorithms are specifically designed for single-omics data, thereby limiting their generalizability and scalability for various omics data from different cohorts. Here we present DECODE, a universal deconvolution...
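
As a generic illustration of the deconvolution idea mentioned in the last abstract (estimating cell-type abundances from tissue-level data), the sketch below fits non-negative mixing weights by projected gradient descent and normalizes them to fractions. This is a textbook non-negative least-squares formulation, not the DECODE method, whose details are not given here. The function name `deconvolve`, the signature matrix and the bulk profile are all hypothetical toy values chosen for illustration.

```python
# Minimal sketch of reference-based deconvolution via non-negative least squares.
# Assumption: bulk expression is a linear mixture of known cell-type signatures.
# All names and numbers below are hypothetical; this is NOT the DECODE algorithm.

def deconvolve(signature, bulk, n_iter=2000):
    """Estimate cell-type fractions from a bulk profile.

    signature: list of rows, one per gene; each row holds the expression
               of that gene in every cell type (genes x cell types).
    bulk:      list with the tissue-level expression of each gene.
    """
    n_genes = len(signature)
    n_types = len(signature[0])

    # The squared Frobenius norm bounds the largest eigenvalue of S^T S,
    # so this step size is small enough for gradient descent to converge.
    frob_sq = sum(v * v for row in signature for v in row)
    step = 1.0 / (2.0 * frob_sq)

    weights = [0.0] * n_types
    for _ in range(n_iter):
        # Residual of the current fit: S w - b
        residual = [
            sum(signature[g][k] * weights[k] for k in range(n_types)) - bulk[g]
            for g in range(n_genes)
        ]
        # Gradient of ||S w - b||^2 with respect to w: 2 S^T (S w - b)
        grad = [
            2.0 * sum(signature[g][k] * residual[g] for g in range(n_genes))
            for k in range(n_types)
        ]
        # Gradient step followed by projection onto the non-negative orthant
        weights = [max(0.0, w - step * gr) for w, gr in zip(weights, grad)]

    # Convert non-negative weights to fractions that sum to 1
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else weights


# Toy example: 4 genes, 3 cell types, mixed at known fractions.
SIGNATURE = [
    [10.0, 1.0, 0.0],
    [1.0, 8.0, 1.0],
    [0.0, 2.0, 9.0],
    [5.0, 5.0, 5.0],
]
TRUE_FRACTIONS = [0.5, 0.3, 0.2]
BULK = [
    sum(s * f for s, f in zip(row, TRUE_FRACTIONS)) for row in SIGNATURE
]

estimated = deconvolve(SIGNATURE, BULK)
print([round(f, 3) for f in estimated])
```

In this noise-free toy case the estimated fractions match the mixing fractions used to build the bulk profile; real data would add noise, many more genes and signatures that must themselves be estimated.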