Large-scale integration of single-cell transcriptomic data captures transitional progenitor states in mouse skeletal muscle regeneration
ARTICLE
https://doi.org/10.1038/s42003-021-02810-x
OPEN
David W. McKellar1, Lauren D. Walter2, Leo T. Song1, Madhav Mantri3, Michael F. Z. Wang3,
Iwijn De Vlaminck1,4 ✉ & Benjamin D. Cosgrove1,4 ✉
Skeletal muscle repair is driven by the coordinated self-renewal and fusion of myogenic stem
and progenitor cells. Single-cell gene expression analyses of myogenesis have been hampered by the poor sampling of rare and transient cell states that are critical for muscle repair,
and do not capture the spatial context that is important for myogenic differentiation. Here, we
demonstrate how large-scale integration of single-cell and spatial transcriptomic data can
overcome these limitations. We created a single-cell transcriptomic dataset of mouse skeletal
muscle by integration, consensus annotation, and analysis of 23 newly collected single-cell
RNA-sequencing (scRNAseq) datasets and 88 publicly available scRNAseq and single-nucleus
RNA-sequencing (snRNAseq) datasets. The resulting dataset includes more than 365,000 cells and spans
a wide range of ages, injury, and repair conditions. Together, these data enabled identification
of the predominant cell types in skeletal muscle, and resolved cell subtypes, including
endothelial subtypes distinguished by vessel-type of origin, fibro-adipogenic progenitors
defined by functional roles, and many distinct immune populations. The representation of
different experimental conditions and the depth of transcriptome coverage enabled robust
profiling of sparsely expressed genes. We built a densely sampled transcriptomic model of
myogenesis, from stem cell quiescence to myofiber maturation, and identified rare, transitional states of progenitor commitment and fusion that are poorly represented in individual
datasets. We performed spatial RNA sequencing of mouse muscle at three time points after
injury and used the integrated dataset as a reference to achieve a high-resolution, local
deconvolution of cell subtypes. We also used the integrated dataset to explore ligand-receptor co-expression patterns and identify dynamic cell-cell interactions in muscle injury
response. We provide a public web tool to enable interactive exploration and visualization of
the data. Our work supports the utility of large-scale integration of single-cell transcriptomic
data as a tool for biological discovery.
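The abstract describes using the integrated single-cell dataset as a reference to deconvolve spatial RNA sequencing spots into cell subtypes. The specific deconvolution method is not described in this section, so the sketch below only illustrates the general idea of reference-based deconvolution, using non-negative least squares (NNLS, via `scipy.optimize.nnls`) as a stand-in technique. The function name `deconvolve_spot`, the signature matrix, and the three cell types are hypothetical toy values chosen for illustration; real spot data would also need normalization, gene selection, and a noise model.

```python
import numpy as np
from scipy.optimize import nnls


def deconvolve_spot(signatures: np.ndarray, spot: np.ndarray) -> np.ndarray:
    """Estimate cell-type proportions for one spatial spot by NNLS (illustrative only).

    signatures: (n_genes, n_cell_types) mean expression of each cell type,
        assumed here to be averaged over an annotated single-cell reference.
    spot: (n_genes,) expression vector measured at a single spot.
    Returns non-negative weights rescaled to sum to 1.
    """
    weights, _residual_norm = nnls(signatures, spot)
    total = weights.sum()
    return weights / total if total > 0 else weights


# Hypothetical toy reference: 4 genes x 3 cell types (made-up values).
signatures = np.array([
    [10.0, 0.5, 0.0],  # gene mostly expressed by cell type A
    [0.2, 8.0, 0.1],   # gene mostly expressed by cell type B
    [0.0, 0.3, 6.0],   # gene mostly expressed by cell type C
    [1.0, 1.0, 1.0],   # broadly expressed gene
])
true_props = np.array([0.6, 0.3, 0.1])
spot = signatures @ true_props  # noise-free synthetic spot mixing the three types

est_props = deconvolve_spot(signatures, spot)
print(np.round(est_props, 3))
```

With this noise-free synthetic spot, the estimated proportions should closely match `true_props`. With real, noisy spots, plain NNLS gives only an approximate, unregularized estimate.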
1 Meinig School of Biomedical Engineering, Cornell University, Ithaca, NY 14853, USA. 2 Department of Molecular Biology & Genetics, Cornell University,
Ithaca, NY 14853, USA. 3 Department of Computational Biology, Cornell University, Ithaca, NY 14853, USA. 4 These authors contributed equally: Iwijn De
Vlaminck, Benjamin D. Cosgrove.
COMMUNICATIONS BIOLOGY | (2021)4:1280 | https://doi.org/10.1038/s42003-021-02810-x | www.nature.com/commsbio
Muscle stem cells (MuSCs) are essential for muscle
homeostasis and repair. MuSCs are typically quiescent
in homeostasis and are activated after muscle damage.
Their subsequent proliferation, differentiation, commitment, and
fusion replenish skeletal muscle tissue in a complex, coordinated process1–3. MuSCs are a rare cell type, accounting for less
than 1% of the cells within skeletal muscle at homeostasis. Even
rarer are the transitional states that quiescent MuSCs pass through
as they differentiate into myofibers. Consequently, MuSCs
and muscle progenitor cells (myoblasts and myocytes) are difficult to study in their native tissue context. Conventional strategies
to study MuSCs and muscle progenitor cells rely on enrichment
by fluorescence-activated cell sorting using a transgenic reporter
or prospective isolation markers4. However, these methods are ill-suited to capturing the subtle, continuous cell state transitions
that are critical for myogenesis, owing to a paucity of highly stage-specific cell isolation markers and the rarity of these cells.
Single-cell RNA sequencing (scRNAseq) enables a detailed characterization of cell types and states in complex tissues without the
need for targeted cell enrichment5–8. Skeletal muscle has been the
focus of a number of recent scRNAseq studies, which have aimed to
catalog its dynamic and heterogeneous constituent cell types and the
progression of myogenic stem and progenitor cell regulation in
muscle development and repair7. Single-nucleus RNA sequencing
(snRNAseq) has been used to capture transcriptomic signatures
from mature myofiber nuclei, which are largely lost during cell
isolation required for scRNAseq9–13. Yet, despite advances in the
scale of sc/snRNAseq technologies (10³–10⁴ cells per experiment),
these methods still sample rare cell types and transient cell
states too sparsely to characterize them in detail unless the cells are purified,
and purification can introduce marker bias and technical artefacts14. For example, we previously used scRNAseq
to study the dynamics of hindlimb skeletal muscle regeneration in
adult mice and resolved ~12 muscle-resident cell types from ~35,000
single-cell transcriptomes15. However, we observed fewer than 100
committed and fusing myogenic cells even though we sampled key
time points of myogenic differentiation post-injury15. Other studies
similarly reported infrequent sampling of committed myogenic
progenitors from whole muscle samples15–17.
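The point that even 10³–10⁴ cells per experiment under-sample rare states can be made concrete with a simple binomial model. This is an idealization assumed for illustration, not an analysis from the study: each captured cell is treated as an independent draw that belongs to the state of interest with a fixed frequency `freq`. The 0.2% frequency mirrors the share that committed myoblasts represent in the integrated compendium (reported in the following paragraph), 35,000 echoes the size of the earlier study cited above, and the threshold of 10 cells is an arbitrary, hypothetical cutoff for "enough cells to characterize".

```python
import math


def expected_count(n_cells: int, freq: float) -> float:
    """Expected number of cells of a state present at fraction `freq`."""
    return n_cells * freq


def prob_fewer_than(k: int, n_cells: int, freq: float) -> float:
    """P(X < k) for X ~ Binomial(n_cells, freq): chance of capturing fewer than k cells."""
    return sum(
        math.comb(n_cells, i) * freq**i * (1 - freq) ** (n_cells - i)
        for i in range(k)
    )


freq = 0.002  # illustrative ~0.2% state frequency (assumption)
for n in (1_000, 10_000, 35_000):
    print(n, expected_count(n, freq), round(prob_fewer_than(10, n, freq), 4))
```

Under these assumptions, a 10³-cell experiment expects about two cells in the state and almost certainly captures fewer than ten, while larger pools make the state reliably observable. This is the intuition behind pooling many datasets by integration.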
To overcome these challenges, we used large-scale integration of
single-cell transcriptomics data. We measured ~95,000 single-cell
transcriptomes from 23 new samples of regenerating mouse hindlimb muscles in older mice. We then leveraged recent improvements in batch-correction algorithms18,19 to incorporate 88 publicly
available sc/snRNAseq datasets from 18 prior studies in our
analysis9,11,15–17,20–32. This led to a dataset that included ~365,000
cells/nuclei after quality filtering and allowed us to study the cellular
composition and dynamics in response to skeletal muscle injury over
a wide range of experimental conditions. The depth of transcriptome
coverage achieved by large-scale integration of single-cell transcriptomic data enabled us to robustly characterize rare, short-lived
cell states on the myogenic cell differentiation trajectory. We identified transcription factors and surface markers that distinguish
committed myoblasts (~5 per sample, on average) and fusing
myocytes (~15 per sample, on average), which represent only 0.2%
and 0.5% of all cells in the integrated muscle compendium,
respectively. We performed spatial RNA sequencing of mouse
muscle at three time points after injury and used the integrated
compendium as a reference to achieve a high-resolution, local
deconvolution of cell subtypes. Our analysis provides insights into the
dynamics of stromal and immune cell colocalization with transient
myogenic cell states.
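The per-sample averages and the compendium percentages reported above can be cross-checked with simple arithmetic. This sketch assumes, since the text does not say so explicitly, that the "per sample" averages run over all 111 datasets (23 newly collected plus 88 publicly available) and that the filtered compendium holds ~365,000 cells; `share_of_compendium` is a hypothetical helper name.

```python
N_DATASETS = 23 + 88  # newly collected + publicly available datasets (assumed = "samples")
N_CELLS = 365_000     # approximate size of the quality-filtered compendium


def share_of_compendium(per_sample: float, n_samples: int = N_DATASETS,
                        n_cells: int = N_CELLS) -> float:
    """Percentage of all cells implied by an average per-sample count."""
    return 100 * per_sample * n_samples / n_cells


myoblast_pct = share_of_compendium(5)   # committed myoblasts, ~5 per sample
myocyte_pct = share_of_compendium(15)   # fusing myocytes, ~15 per sample
print(f"{myoblast_pct:.2f}% {myocyte_pct:.2f}%")  # prints 0.15% 0.46%
```

Rounded to one decimal place, these give 0.2% and 0.5%, consistent with the values stated in the text.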
Results
Large-scale integration enables a high-resolution view of skeletal muscle. To profile skelet (...truncated)