Direct RNA sequencing and signal alignment reveal RNA structure ensembles in a eukaryotic cell
nature methods
Article
https://doi.org/10.1038/s41592-026-03069-y
Received: 15 February 2025
Accepted: 19 March 2026
Jiaxu Wang1,2,5, Jian Han2,5, Wen Ting Tan2, Anthony Youzhi Cheng2,
Jong Ghut Ashley Aw2, Yue Wang3,4, Guisheng Zeng3,
Niranjan Nagarajan2,4 & Yue Wan2
Published online: xx xx xxxx
The extent to which an RNA folds into structure ensembles and how
different structures in the ensemble regulate eukaryotic gene expression
is not fully understood. Here, we coupled chemical probing with direct
RNA sequencing to identify structure modifications along a single RNA
molecule (sm-PORE-cupine). We used direct signal alignment in addition
to base mapping to increase the percentage of mappable sequences
and showed that Bernoulli mixture model clustering can separate
structure ensembles accurately. We applied sm-PORE-cupine to identify
isoform-specific structure ensembles along the SARS-CoV-2 genome
and structure ensembles in the Candida albicans transcriptome. We
observed that RNAs are more structurally homogeneous in vitro, at higher
temperatures and in the 3′ untranslated regions of C. albicans. Structure
ensembles are associated with changes in translation efficiency and decay
in C. albicans, and we validated translation changes using reporter assays.
sm-PORE-cupine expands the existing toolbox for studying RNA structure
and function in diverse transcriptomes.
Although there are many examples of functional RNA structures in
bacteria and viruses, how RNA structures regulate gene expression in eukaryotic cells remains an important open question. Because RNAs
can fold into different conformations in vitro and in vivo, identifying
functional RNA structures, tracking how they change and determining which
structure in a population ensemble is biologically important will help
us understand fundamental RNA biology and how RNA can be targeted in diseases1,2.
To obtain single-molecule RNA structure information, several
methods, including DRACO3, DREEM4, DANCE-MaP5 and Da Vinci6,
have been developed to couple dimethyl sulfate (DMS) or selective
2′-hydroxyl acylation analyzed by primer extension (SHAPE) chemical
probing with high-throughput sequencing to read out RNA modifications along a single RNA read. These modification patterns are then
clustered using different algorithms to identify potential structure
clusters that are present inside cells or in solution. Single-molecule
RNA structure studies have identified structure ensembles along viral
RNAs, such as in HIV4 and SARS-CoV-2 genomes3,7; human RNAs, such
as in 7SK5; and plant RNAs, including in COOLAIR6, to better study
functional RNA structures that could impact gene regulation. Although
useful, most of these strategies require the conversion of RNA into
cDNA molecules for analysis.
1Institute of Medical Genetics and Development, Key Laboratory of Reproductive Genetics (Ministry of Education) and Women’s Hospital, Zhejiang
University School of Medicine, Zhejiang, China. 2Genome Institute of Singapore, A*STAR, Singapore, Singapore. 3A*STAR Infectious Diseases Labs (A*STAR
ID Labs), Agency for Science, Technology and Research (A*STAR), Singapore, Singapore. 4Department of Biochemistry, Yong Loo Lin School of Medicine,
National University of Singapore, Singapore, Singapore. 5These authors contributed equally: Jiaxu Wang, Jian Han.
We previously coupled SHAPE chemical probing with nanopore direct RNA sequencing to determine aggregate RNA structure
information on gene-linked RNA isoforms (PORE-cupine)8. In the present study, we further optimized SHAPE chemical probing and used
signal alignments from direct RNA sequencing to identify RNA modifications along a single RNA molecule. We also tested different clustering strategies to group modified RNA reads into their corresponding
structures. We named our single-molecule RNA structure probing
method ‘sm-PORE-cupine’. Using sm-PORE-cupine, we show that we
can accurately separate riboswitch structure ensembles, identify RNA
structure ensembles along different SARS-CoV-2 subgenomic RNAs
(sgRNAs) and determine structural heterogeneity in the C. albicans
transcriptome. Our results demonstrate that RNAs are highly heterogeneous in transcriptomes and that changes in structural ensembles
can influence gene regulation inside cells.
Results
High modification rates and low false-positive rates enable
accurate clustering of RNA structure populations
To identify paired and unpaired bases along an RNA, we previously
sequenced RNAs that are modified and unmodified with the SHAPE
compound NAI-N3 (ref. 9), using direct RNA sequencing8. We observed
that NAI-N3-modified bases shift the current mean and standard
deviation during direct RNA sequencing relative to unmodified
bases, which allowed us to derive an aggregate structure signal across many
molecules of a particular RNA. Because the same RNA can fold into different conformations and direct RNA sequencing threads one molecule
through a pore at a time, we asked whether we could
separate structure populations based on their per-molecule modification
patterns (Fig. 1a). As many factors could influence our ability to
cluster RNA modification patterns along each molecule, we performed
simulation experiments using synthetic reads to determine the impact
of structural similarity, read depth, read length and modification rate on
our ability to form structural clusters accurately (Extended Data Fig. 1a).
We observed that increasing the sequencing depth, read length and modification rate each improved our ability to separate structural
populations, and that a modification rate of >1.5% coupled with
read lengths of >750 bases can separate structural populations at a
sequencing depth of 1,000 reads per transcript (Extended Data Fig. 1a).
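The kind of simulation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the two structure profiles, the function names (`simulate_reads`, `fit_bmm`, `two_cluster_accuracy`) and all parameter values are hypothetical, `mod_rate` is taken to be the probability that an unpaired base is called modified, and the clustering step is a plain expectation-maximization fit of a Bernoulli mixture model.

```python
import numpy as np


def simulate_reads(profiles, n_reads, mod_rate, fpr, rng):
    """Draw binary per-read modification calls from K hypothetical structures.

    profiles: (K, L) 0/1 array, 1 = unpaired (SHAPE-reactive) base.
    mod_rate: assumed probability that an unpaired base is called modified.
    fpr:      assumed probability that any base is called modified in error.
    """
    labels = rng.integers(0, profiles.shape[0], size=n_reads)
    p_true = profiles[labels] * mod_rate          # shape (n_reads, L)
    p_call = p_true + (1.0 - p_true) * fpr        # true hit OR false positive
    reads = (rng.random(p_call.shape) < p_call).astype(float)
    return reads, labels


def fit_bmm(reads, k=2, n_iter=50, n_init=3, seed=0):
    """Cluster reads with a Bernoulli mixture model fitted by EM.

    Returns hard assignments from the restart with the highest log-likelihood.
    """
    rng = np.random.default_rng(seed)
    length = reads.shape[1]
    best_ll, best_assign = -np.inf, None
    for _restart in range(n_init):
        # Start each component near the column means, jittered to break symmetry.
        jitter = rng.uniform(0.5, 1.5, size=(k, length))
        theta = np.clip(reads.mean(axis=0) * jitter, 1e-4, 1 - 1e-4)
        pi = np.full(k, 1.0 / k)
        for _step in range(n_iter):
            # E-step: log joint probability of each read under each component.
            log_p = (reads @ np.log(theta).T
                     + (1.0 - reads) @ np.log1p(-theta).T
                     + np.log(pi))
            log_norm = np.logaddexp.reduce(log_p, axis=1, keepdims=True)
            resp = np.exp(log_p - log_norm)
            # M-step: update mixing weights and per-position modification rates.
            nk = resp.sum(axis=0) + 1e-9
            pi = nk / nk.sum()
            theta = np.clip(resp.T @ reads / nk[:, None], 1e-4, 1 - 1e-4)
        ll = float(log_norm.sum())
        if ll > best_ll:
            best_ll, best_assign = ll, resp.argmax(axis=1)
    return best_assign


def two_cluster_accuracy(assign, labels):
    """Fraction of reads assigned correctly, up to swapping the two labels."""
    agree = float(np.mean(assign == labels))
    return max(agree, 1.0 - agree)


# Toy example: two maximally different structures of 200 bases.
rng = np.random.default_rng(1)
profiles = np.zeros((2, 200), dtype=int)
profiles[0, :100] = 1   # hypothetical structure A: first half unpaired
profiles[1, 100:] = 1   # hypothetical structure B: second half unpaired
reads, labels = simulate_reads(profiles, n_reads=500, mod_rate=0.2, fpr=0.01, rng=rng)
accuracy = two_cluster_accuracy(fit_bmm(reads), labels)
print(f"clustering accuracy: {accuracy:.3f}")
```

The toy values are deliberately easy to separate. Lowering `mod_rate`, shortening the profiles, reducing `n_reads` or making the two profiles more similar gives a rough feel for how each factor erodes cluster recovery.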
Additionally, because different methods of identifying SHAPE modifications
from direct RNA sequencing data, such as PORE-cupine or Tombo10,
can differ in sensitivity and false-positive rate (FPR) when
detecting structure modifications, we also tested the effect of FPR on our
ability to separate synthetic RNA populations (Extended Data Fig. 1b).
We observed that a high FPR decreases our ability to cluster RNA structure
populations. To determine which of the tested parameters matter
most, we performed linear regression and found that high
modification rates and low FPRs have the largest impact on our ability
to separate RNA structure populations accurately (Fig. 1b).
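One common way to compare the influence of parameters measured on very different scales is to regress the outcome on z-scored predictors and rank the absolute coefficients. The sketch below illustrates that idea only; whether the authors standardized their predictors is an assumption, the helper `standardized_effects` is hypothetical, and the input is toy data generated from a made-up formula, not results from the study.

```python
import numpy as np


def standardized_effects(X, y, names):
    """Ordinary least squares on z-scored predictors.

    Returns {name: coefficient}; a larger absolute coefficient means a larger
    change in y per standard deviation of that parameter.
    """
    Xz = (X - X.mean(axis=0)) / X.std(axis=0)
    design = np.column_stack([np.ones(len(y)), Xz])   # intercept + predictors
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return dict(zip(names, coef[1:]))


# Toy input only: random simulation settings and a made-up accuracy formula.
rng = np.random.default_rng(0)
n = 500
names = ["mod_rate", "fpr", "read_length", "read_depth", "similarity"]
X = np.column_stack([
    rng.uniform(0.005, 0.03, n),   # modification rate
    rng.uniform(0.0, 0.02, n),     # false-positive rate
    rng.uniform(250, 1500, n),     # read length (bases)
    rng.uniform(200, 2000, n),     # reads per transcript
    rng.uniform(0.5, 0.95, n),     # similarity between structures
])
y = (0.5 + 10 * X[:, 0] - 12 * X[:, 1] + 5e-5 * X[:, 2]
     + 2e-5 * X[:, 3] - 0.1 * X[:, 4] + rng.normal(0, 0.01, n))

effects = standardized_effects(X, y, names)
ranking = sorted(effects, key=lambda name: abs(effects[name]), reverse=True)
print(ranking)
```

Standardizing first means the coefficients are comparable across parameters, so a bar chart of `effects` can be read as a ranking of influence, with the sign showing whether a parameter helps or hurts.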
Fig. 1 | Experimental and analytical workflow of sm-PORE-cupine. a, Schematic
of the experimental workflow to determine RNA structure ensembles from
single-molecule RNA structure probing data using nanopore direct RNA
sequencing. b, Bar chart shows the effects of each parameter (modification
rate, read length, read depth, similarity and FPR) in ena (...truncated)