A comprehensive joint analysis of the long and short RNA transcriptomes of human erythrocytes
Doss et al. BMC Genomics (2015) 16:952
DOI 10.1186/s12864-015-2156-2
RESEARCH ARTICLE
Open Access
Jennifer F. Doss1,2, David L. Corcoran2, Dereje D. Jima2,3, Marilyn J. Telen4, Sandeep S. Dave2,3 and Jen-Tsan Chi1,2*
Abstract
Background: Human erythrocytes are terminally differentiated, anucleate cells long thought to lack RNAs. However,
previous studies have shown the persistence of many small-sized RNAs in erythrocytes. To comprehensively define
the erythrocyte transcriptome, we used high-throughput sequencing to identify both short (18–24 nt) and long
(>200 nt) RNAs in mature erythrocytes.
Results: Analysis of the short RNA transcriptome with miRDeep identified 287 known and 72 putative novel
microRNAs. Unexpectedly, we also uncovered an extensive repertoire of long erythrocyte RNAs that encode many
proteins critical for erythrocyte differentiation and function. Additionally, the erythrocyte long RNA transcriptome
is significantly enriched in the erythroid progenitor transcriptome. Joint analysis of both short and long RNAs
identified several loci with co-expression of both microRNAs and long RNAs spanning microRNA precursor regions.
Within the miR-144/451 locus previously implicated in erythroid development, we observed unique co-expression
of several primate-specific noncoding RNAs, including a lncRNA, and miR-4732-5p/-3p. We show that miR-4732-3p
targets both SMAD2 and SMAD4, two critical components of the TGF-β pathway implicated in erythropoiesis.
Furthermore, miR-4732-3p represses SMAD2/4-dependent TGF-β signaling, thereby promoting cell proliferation
during erythroid differentiation.
Conclusions: Our study presents the most extensive profiling of erythrocyte RNAs to date, and describes
primate-specific interactions between the key modulator miR-4732-3p and TGF-β signaling during human
erythropoiesis.
Keywords: Erythrocyte, microRNA, Long noncoding RNA, TGF-β
Background
Human erythrocytes provide gas transport throughout
the body and comprise the majority of cells in whole
blood. Human diseases involving erythrocytes, also called red blood cells (RBCs),
such as anemia and malaria, affect hundreds of millions of people worldwide and present major
health concerns. Although we have gained a significant
understanding of how these diseases occur and many
treatment options are available, we still cannot explain
many aspects of these erythrocyte diseases. Since the
precise regulation of both coding and noncoding RNAs
is essential for erythrocyte development, these erythrocyte diseases are often accompanied by significant transcriptome changes. During erythropoiesis, microRNAs
actively regulate proliferation and/or differentiation of
erythroid cells during physiological and pathological
adaptations [1]. Dysregulation of various other long and
short RNAs also leads to several disease states such as
ineffective erythropoiesis and anemias. Circulating erythrocytes can be easily obtained by a blood draw. Accordingly, a detailed analysis of the erythrocyte transcriptome
may provide an accessible window into the developmental
history and pathophysiology of erythrocytes. However,
such an analysis was long deemed impossible because
circulating erythrocytes were thought to lack any genetic
material.
* Correspondence:
1 Department of Molecular Genetics and Microbiology, Duke University, Durham, NC 27710, USA
2 Center for Genomic and Computational Biology, Duke University, Durham, NC 27708, USA
Full list of author information is available at the end of the article
© 2015 Doss et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
During terminal maturation of erythrocytes, the nucleus is extruded from progenitors, leading to anucleate
cells with no further RNA production. Therefore, mature
erythrocytes were once thought to lack RNAs, and they give
significantly lower signals with RNA-binding dyes such
as methylene blue. However, many studies have shown that
erythrocytes do contain diverse and abundant small
RNA species [2], including noncoding RNAs Y1 and Y4
[3] as well as microRNAs [4, 5]. We have previously
shown that higher levels of miR-144 and miR-451 reflect
the hemolytic phenotype and malaria resistance of sickle
erythrocytes, respectively [6, 7]. Additionally, both miR-451
and miR-144 reside in a locus that is regulated by GATA-1,
are highly induced during erythroid differentiation, and are
critical to erythropoiesis [8]. Therefore, extensive profiling
of the RBC transcriptome is critical for an understanding
of erythrocyte biology. The RNA composition of erythrocytes may also change during long-term storage for future
blood transfusion [9].
However, previous transcriptomic analyses were limited to known erythrocyte microRNAs using microarrays
[4], or known microRNAs from mixed reticulocyte
(immature red blood cell) and erythrocyte populations
using sequencing [10]. In addition, it is not clear whether
erythrocytes also contain long (large-sized) RNAs that
may provide valuable insights into their development and
adaptations. With the recent advances in high-throughput
sequencing technologies, it is possible to perform RNA-Seq to identify both known and unknown transcripts.
Here, we employed high-throughput sequencing to
characterize both short (small, 18–24 nt) and long
(large, >200 nt) RNAs in human erythrocytes. For long
RNA profiling, we prepared RNA-Seq libraries using a
protocol that allows for identification of both polyadenylated and non-polyadenylated RNAs. A total of 6843 transcripts were expressed in all three analyzed erythrocyte
samples. While this number is far lower than that of typical
nucleated cells, these analyses established a surprisingly
diverse RBC transcriptome. In parallel, we prepared short RNA sequencing libraries and used the miRDeep pipeline to identify both known and putative
microRNAs. These analyses revealed an abundant, diverse
set of microRNAs in mature erythrocytes. The joint
analysis of transcriptomes identified several loci with
expression of both long and short RNAs, suggesting their
coordinated regulation of expression or processing.
Furthermore, we performed a functional investigation of
the uncharacterized, primate-specific miR-4732-3p within
the miR-144/451 locus. MiR-4732-3p was predicted to
target both SMAD2 and SMAD4 (...truncated)