Functional analysis of structural variants in single cells using Strand-seq
nature biotechnology
Article
https://doi.org/10.1038/s41587-022-01551-4
Received: 29 October 2021
Accepted: 7 October 2022
Published online: 24 November 2022
Hyobin Jeong1,18,20, Karen Grimes1,2,20, Kerstin K. Rauwolf3,
Peter-Martin Bruch4,5,6, Tobias Rausch1,5, Patrick Hasenfeld1, Eva Benito1,
Tobias Roider1,4,5, Radhakrishnan Sabarinathan7, David Porubsky8,9,19,
Sophie A. Herbst4,5, Büşra Erarslan-Uysal5,10, Johann-Christoph Jann11,
Tobias Marschall12, Daniel Nowak11, Jean-Pierre Bourquin3,
Andreas E. Kulozik5,10, Sascha Dietrich4,5,6,13, Beat Bornhauser3,
Ashley D. Sanders1,14,15,16,21 & Jan O. Korbel1,5,17,21
Somatic structural variants (SVs) are widespread in cancer, but their
impact on disease evolution is understudied due to a lack of methods
to directly characterize their functional consequences. We present a
computational method, scNOVA, which uses Strand-seq to perform
haplotype-aware integration of SV discovery and molecular phenotyping
in single cells by using nucleosome occupancy to infer gene expression as
a readout. Application to leukemias and cell lines identifies local effects of
copy-balanced rearrangements on gene deregulation, and consequences
of SVs on aberrant signaling pathways in subclones. We discovered distinct
SV subclones with dysregulated Wnt signaling in a chronic lymphocytic
leukemia patient. We further uncovered the consequences of subclonal
chromothripsis in T cell acute lymphoblastic leukemia, which revealed
c-Myb activation, enrichment of a primitive cell state and informed
successful targeting of the subclone in cell culture, using a Notch inhibitor.
By directly linking SVs to their functional effects, scNOVA enables systematic
single-cell multiomic studies of structural variation in heterogeneous cell
populations.
The mutational landscapes of numerous cancers were recently cataloged1,2, revealing that somatic SVs represent around 55% of driver
mutations2,3. Somatic mutational processes generate a broad spectrum of SVs from simple (for example, deletions and inversions) to
complex classes (for example, chromothripsis)4–8, and these SVs are
important drivers of malignancy, metastasis and relapse9–12. However,
with the exception of focal deletions and amplifications, somatic SVs
have proven difficult to characterize functionally in cancer genomic
surveys1–3,13. Studies integrating transcriptome and whole genome
sequencing (WGS) data have inferred SV functional outcomes13–16, but
these typically require large cohorts and do not account for intratumor
heterogeneity (ITH)3. Instead, SV effects can be measured directly by
reading both genotype and molecular phenotype in the same cell,
using single-cell multiomics17–21. Several such methods have been
developed17–20, but these do not presently account for small (<10 Mb)
somatic copy number alterations (SCNAs), balanced SVs and complex rearrangement events like chromothripsis4,5,7,22, which has limited
efforts to functionally characterize the most common class of driver
mutations in cancer.
A full list of affiliations appears at the end of the paper.
To address this, we developed scNOVA (single-cell nucleosome
occupancy and genetic variation analysis)—a method enabling functional characterization of the full spectrum of somatic SV classes.
scNOVA uses Strand-seq23 in two ways: (1) it uses the DNA fragmentation pattern resulting from micrococcal nuclease (MNase) digestion23
to directly measure nucleosome occupancy (NO) and indirectly infer
patterns of gene activity, and (2) it couples this ‘molecular phenotype’
Nature Biotechnology | Volume 41 | June 2023 | 832–844
with SVs discovered by single-cell tri-channel processing (scTRIP, which
jointly models read-orientation, read depth and haplotype-phase24)
in the same cell. MNase digests the linker DNA between nucleosomes,
leaving nucleosome-protected DNA intact, to enable genome-wide
inference of NO by measuring sequence read counts25–28. Previous work
has shown that active enhancers and transcribed genes exhibit reduced
NO25–30. However, the relationships between NO and SV landscapes in
cancer remain unexplored. scNOVA addresses this by integrating SVs
and NO along the genome of a cell, to functionally characterize SVs in
heterogeneous samples.
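The read-count principle described above can be illustrated with a minimal Python sketch. This is not the scNOVA implementation. It assumes that fragment midpoints on a single chromosome are already available as integers; the function name `binned_occupancy`, the 150 bp bin width and the counts-per-million scaling are hypothetical choices made only for illustration.

```python
from collections import Counter


def binned_occupancy(read_midpoints, bin_size=150):
    """Count fragment midpoints per fixed-width bin, scaled to counts per million.

    Illustrative assumption: read depth in a bin is used as a proxy for
    nucleosome occupancy, because MNase leaves nucleosome-protected DNA intact.
    Bins without any reads are simply absent from the returned dict.
    """
    counts = Counter(pos // bin_size for pos in read_midpoints)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b: c * 1_000_000 / total for b, c in sorted(counts.items())}


# Toy fragment midpoints (hypothetical coordinates on one chromosome).
midpoints = [10, 75, 140, 160, 470, 480, 490, 495]
occupancy = binned_occupancy(midpoints, bin_size=150)
for bin_index, cpm in occupancy.items():
    print(f"bin {bin_index}: {cpm:.0f} counts per million")
```

Scaling by the total number of reads keeps libraries of different sequencing depth comparable, which matters when single cells are sequenced to a few hundred thousand reads each.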
Results
NO classifies cell types and predicts gene activity changes
Strand-seq data reveals NO. We hypothesized that NO patterns
derived from MNase fragmentation during Strand-seq library preparation could represent a readout to allow functional characterization
of SVs (Fig. 1a and Extended Data Fig. 1). To test this, we evaluated
whether Strand-seq data revealed nucleosome positioning through
comparison with bulk MNase-seq data. We used the NA12878 lymphoblastoid cell line (LCL), which has both datatypes available, and pooled
95 Strand-seq libraries (sequenced to a median of 540,379 mapped
nonduplicate reads per single cell31; Supplementary Table 1), into a
‘pseudobulk’ track, allowing direct comparison with the corresponding MNase-seq dataset (sequenced to 19-fold genomic coverage32).
We measured NO along the genome (Methods) and found Strand-seq
and MNase-seq were highly concordant in terms of uniformity of
coverage and inferred nucleosome positions at DNase I hypersensitive sites (Spearman’s r = 0.68) (Fig. 1b,c). Nucleosome positioning
near the binding site of CTCF26,28 (a key chromatin organizer) closely
matched between both assays (Fig. 1d and Supplementary Fig. 1),
and estimated nucleosome repeat lengths28 were highly concordant
(Supplementary Fig. 1). In addition, both assays measured NO in all 15
chromatin states identified by the Roadmap Epigenome Consortium33.
Among these chromatin states, Strand-seq and MNase-seq revealed
the highest NO signals on average for the polycomb-repressed state
and the bivalent enhancer state, whereas the lowest average NO signals were consistently seen for the active transcription start site (TSS)
state (Extended Data Fig. 2). This indicates that Strand-seq enables
direct measurement of NO to reveal a ‘molecular readout’. We thus
developed the scNOVA framework, which harnesses Strand-seq to
measure NO genome-wide and couples this with SVs discovered in
the same sequenced cell (Fig. 1a).
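The pseudobulk comparison described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the analysis code used in the study: per-cell read counts are assumed to be precomputed over the same genomic bins, the toy numbers are invented, and the helper names `pool_pseudobulk`, `average_ranks` and `spearman` are hypothetical. Spearman's correlation is computed here as the Pearson correlation of tie-averaged ranks.

```python
def pool_pseudobulk(per_cell_counts):
    """Sum per-bin read counts across single-cell libraries into one track."""
    n_bins = len(per_cell_counts[0])
    return [sum(cell[i] for cell in per_cell_counts) for i in range(n_bins)]


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the full run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant track")
    return cov / (vx * vy) ** 0.5


# Toy data: three single-cell libraries over five bins (invented values).
cells = [[0, 1, 0, 2, 1], [1, 0, 0, 3, 0], [0, 2, 1, 1, 1]]
pseudobulk = pool_pseudobulk(cells)
bulk_mnase = [5, 9, 4, 20, 12]  # invented bulk MNase-seq counts per bin
rho = spearman(pseudobulk, bulk_mnase)
print(pseudobulk, round(rho, 2))
```

A rank-based measure is a natural choice for this kind of comparison because the two assays differ greatly in sequencing depth, so only the relative ordering of bins is expected to agree.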
As Strand-seq resolves its measurements by haplotype31, we considered that haplotype-specific differences in NO (haplotype-specific
NO) resulting from random monoallelic expression, germline SNPs
and local effects of SVs could be harnessed for scNOVA. To assess the
utility of haplotype-resolved NO, we phased 24,652,658 of the 49,205,197
(50.1%) NA12878 Strand-seq read fragments and pooled these
reads to generate pseudobulk NO tracks for each chromosomal
haplotype (denoted ‘H1’ and ‘H2’, respectively; Fig. 1b). Using the
female-derived NA12878 cell line, we compared haplotype-specific
NO to haplotype-resolved gene expression measureme (...truncated)