Variation in RNA-Seq Transcriptome Profiles of Peripheral Whole Blood from Healthy Individuals with and without Globin Depletion
et al. (2014) Variation in RNA-Seq Transcriptome Profiles of Peripheral Whole Blood from Healthy
Individuals with and without Globin Depletion. PLoS ONE 9(3): e91041. doi:10.1371/journal.pone.0091041
Heesun Shin
Casey P. Shannon
Nick Fishbane
Jian Ruan
Mi Zhou
Robert Balshaw
Janet E. Wilson-McManus
Raymond T. Ng
Bruce M. McManus
Scott J. Tebbutt
for the 0
Kai Wang, University of Southern California, United States of America
1 NCE CECR PROOF Centre of Excellence, Vancouver, British Columbia, Canada, 2 UBC Department of Medicine (Division of Respiratory Medicine), University of British Columbia, Vancouver, British Columbia, Canada, 3 UBC Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada, 4 UBC Department of Computer Science, University of British Columbia, Vancouver, British Columbia, Canada, 5 UBC Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada, 6 UBC James Hogg Research Centre & Institute for HEART + LUNG Health, University of British Columbia, Vancouver, British Columbia, Canada
Background: The molecular profile of circulating blood can reflect physiological and pathological events occurring in other tissues and organs of the body and delivers a comprehensive view of the status of the immune system. Blood has been useful in studying the pathobiology of many diseases. It is accessible and easily collected, making it ideally suited to the development of diagnostic biomarker tests. The blood transcriptome has a high complement of globin RNA that could potentially saturate next-generation sequencing platforms, masking lower abundance transcripts. Methods to deplete globin mRNA are available, but their effect has not been comprehensively studied in peripheral whole blood RNA-Seq data. In this study we aimed to assess technical variability associated with globin depletion in addition to assessing general technical variability in RNA-Seq from whole blood-derived samples.
Results: We compared technical and biological replicates with and without globin depletion and found that the globin depletion protocol employed removed approximately 80% of globin transcripts, improved the correlation of technical replicates, allowed reliable detection of thousands of additional transcripts, and generally increased transcript abundance measures. Differential expression analysis revealed thousands of genes significantly upregulated as a result of globin depletion. In addition, globin depletion resulted in the down-regulation of genes involved in both iron and zinc metal ion binding.
Conclusions: Globin depletion appears to meaningfully improve the quality of peripheral whole blood RNA-Seq data, and may improve our ability to detect true biological variation. Some concerns remain, however, key amongst them the significant reduction in RNA yields following globin depletion. More generally, our investigation of technical and biological variation with and without globin depletion finds that high-throughput sequencing by RNA-Seq is highly reproducible within a large dynamic range of detection and provides an accurate estimation of RNA concentration in peripheral whole blood. High-throughput sequencing is thus a promising technology for whole blood transcriptomics and biomarker discovery.
Funding: The authors are grateful for funding from the MITACS Accelerate program. The funders had no role in study design, data collection and analysis,
decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
These authors contributed equally to this work.
Molecular profiles of circulating blood can be associated with
physiological and pathological events occurring in other tissues
and organs of the body [1,2]. Peripheral whole blood is therefore a
highly desirable tissue for developing diagnostic biomarker tests,
due to its ease of accessibility and the low risk associated with its
collection, as compared to invasive organ biopsies. To identify
transcripts in peripheral blood that can potentially be used as
diagnostic biomarkers, it is advantageous to utilize a technology
that is highly sensitive and provides accurate quantification of
RNA species. Conventional microarray technologies have been
widely used for such purposes [3-6]. High-throughput DNA
sequencing is a promising alternative transcriptome profiling
technology that provides the greater sensitivity, transcript coverage
and range, and data quality required for such investigations [7,8].
In addition, such data may generate a more complete and
comprehensive understanding of changes in transcript populations
present in peripheral whole blood that are associated with disease,
potentially providing insight into the molecular processes involved.
Globin dominates the peripheral whole blood transcriptome,
accounting for 80-90% of transcripts. This overabundance
may affect our ability to accurately detect other transcripts,
particularly those with lower expression. This concern is not new.
Experimental methods to specifically deplete globin RNA (globin
depletion; GD) have been proposed as a possible solution and
assessed on various technology platforms, including microarrays
[9-12] and deep Serial Analysis of Gene Expression (SAGE) [7],
but their effect on RNA-Seq data has not previously been
characterized. This is of particular interest given the rapid adoption
of RNA-Seq technology and the popularity of blood as a tissue for
investigation. Arguably, globin transcript abundance is of
particular concern in high-throughput sequencing applications, which
rely on random sampling of the entire transcript pool to assess
relative expression.
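
The sampling argument above can be made concrete with a small calculation. The sketch below is purely illustrative: it assumes reads are drawn in proportion to transcript abundance with no other bias, and the sequencing depth, the globin fraction (0.85, within the 80-90% range quoted above), the depletion efficiency (0.80, the approximate figure reported in the abstract) and the abundance of the low-expression transcript are assumed values, not measurements from this study. The function names are hypothetical.

```python
def expected_reads(depth, transcript_fraction):
    """Expected read count for a transcript under proportional random sampling."""
    return depth * transcript_fraction


def fraction_after_depletion(transcript_fraction, globin_fraction, depletion_efficiency):
    """Relative abundance of a non-globin transcript after partial globin removal.

    Removing globin shrinks the total transcript pool, so each remaining
    non-globin transcript occupies a larger share of what is sequenced.
    """
    remaining_pool = 1.0 - globin_fraction * depletion_efficiency
    return transcript_fraction / remaining_pool


# All values below are assumptions chosen for illustration.
depth = 20_000_000            # reads per library
globin_fraction = 0.85        # share of the pool that is globin before depletion
depletion_efficiency = 0.80   # share of globin transcripts removed
rare_fraction = 1e-7          # share of the pool held by a low-abundance transcript

before = expected_reads(depth, rare_fraction)
after = expected_reads(
    depth,
    fraction_after_depletion(rare_fraction, globin_fraction, depletion_efficiency),
)
fold_gain = after / before

print(f"expected reads without depletion: {before:.2f}")
print(f"expected reads with depletion:    {after:.2f}")
print(f"fold gain at equal depth:         {fold_gain:.2f}")
```

Under these assumptions the same sequencing depth yields roughly three times as many reads for the low-abundance transcript after depletion. The calculation ignores the reduction in RNA yield that accompanies depletion, which the study flags as a separate concern.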
Chemical processing of delicate and often limited mRNA
samples can potentially introduce variability, skewing data
acquisition and preventing an accurate and consistent assessment
of the data. The minimization of sample variation is of particular
concern when attempting to identify and validate potential
biomarkers for specific clinical purposes. In this study we
investigate the applicability of RNA-Seq for transcriptome analysis
from whole blood samples. Using a widely available globin
depletion method, we ask whether globin depletion can reveal
low-abundance transcripts otherwise masked by globin transcripts, and
we assess technical and biological variability associated with using
globin depletion. We investigate the level of technical variability
inherent in RNA-Seq data production and biological variability
across transcriptome samples donated by six healthy individuals.
Finally, we perform a limited differential gene expression analysis
between globin-depleted (GD) and non-globin-depleted (NGD)
samples in order to study any systematic effe (...truncated)