tDRmapper: challenges and solutions to mapping, naming, and quantifying tRNA-derived RNAs from human small RNA-sequencing data
Selitsky and Sethupathy BMC Bioinformatics (2015) 16:354
DOI 10.1186/s12859-015-0800-0
METHODOLOGY ARTICLE
Open Access
tDRmapper: challenges and solutions
to mapping, naming, and quantifying
tRNA-derived RNAs from human small
RNA-sequencing data
Sara R. Selitsky1,2,3* and Praveen Sethupathy1,2,4
Abstract
Background: Small RNA-sequencing has revealed the diversity and high abundance of small RNAs derived from
tRNAs, referred to as tRNA-derived RNAs. However, at present, there is no standardized nomenclature and there are
no methods for accurate annotation and quantification of these small RNAs. tRNA-derived RNAs have unique
features that limit the utility of conventional alignment tools and quantification methods.
Results: We describe here the challenges of mapping, naming, and quantifying tRNA-derived RNAs and present a
novel method that addresses them, called tDRmapper. We then use tDRmapper to perform a comparative analysis
of tRNA-derived RNA profiles across different human cell types and diseases. We found that (1) tRNA-derived RNA
profiles can differ dramatically across different cell types and disease states, (2) that positions and types of chemical
modifications of tRNA-derived RNAs vary by cell type and disease, and (3) that entirely different tRNA-derived RNA
species can be produced from the same parental tRNA depending on the cell type.
Conclusion: tDRmappernot only provides a standardized nomenclature and quantification scheme, but also
includes graphical visualization that facilitates the discovery of novel tRNA and tRNA-derived RNA biology.
Keywords: tRNA, tDR, Sequencing, RNA modifications, Bioinformatics
Background
Transfer RNAs (tRNAs) are non-coding RNAs that
deliver amino acids to ribosomes during translation.
tRNA-derived RNAs (tDRs) are small RNAs that are
enzymatically processed from either nascent pre-tRNA
transcripts or mature tRNAs [1]. Their regulated biogenesis and well-defined 5′ and 3′ ends indicate that they
are not products of tRNA degradation [2]. tDRs are generated in organisms from all domains of life [3]. They
are derived from most tRNA genes and produced in
varying abundance, in a variety of different sizes, and
from different regions of the tRNA. Several functions have
been attributed to tDRs such as post-transcriptional [4, 5]
and translational repression [6], stress granule formation
* Correspondence:
1
Bioinformatics and Computational Biology Curriculum, University of North
Carolina, Chapel Hill, NC, USA
2
Departments of Genetics, University of North Carolina, Chapel Hill, NC, USA
Full list of author information is available at the end of the article
[7], and protection from apoptosis [8]; however, all of these
have been in the context of cell culture. The role of tDRs
in human health is only now starting to emerge. tDRs may
play a role in neurodegeneration [9], cancer [5, 10], as well
as immune modulation [11], and we previously showed
that tDRs are significantly increased in the liver tissue of
patients with chronic viral hepatitis and decreased in liver
cancer [2].
Despite the potential biomedical significance of tDRs,
the field is lagging behind other small RNA fields in
terms of genomic annotation and strategies for quantification from small RNA-sequencing (small RNA-seq)
data. This is due in large part to: (a) the unique computational challenges of mapping tDRs from small RNAseq data and (b) the lack of a standardized nomenclature
for tDRs.
© 2015 Selitsky and Sethupathy. Open Access This article is distributed under the terms of the Creative Commons Attribution
4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Selitsky and Sethupathy BMC Bioinformatics (2015) 16:354
a) While small RNA-seq has enabled the discovery of
tDRs, these small RNAs are difficult to map accurately
for at least three reasons:
(1)Exact copies of tRNA genes are present in
numerous locations throughout the human
genome, and annotation of tRNAs in the human
genome is still incomplete. This means that small
RNA-seq reads corresponding to tRNA-derived
RNAs can map with equal fidelity to numerous
locations throughout the genome (multi-mapping),
which leads to ambiguity about the precise origin
of the tRNA-derived RNA.
(2)tDRs are derived from both the nascent
pre-tRNAs and the processed, mature tRNAs.
The maturation process of eukaryotic tRNAs
includes several steps, such as the removal
of 5′ leader sequence and 3′ trailer
sequence, the addition of a non-templated
“CCA” to the 3′ end, and the excision of
introns. These changes during maturation
need to be accounted for when mapping
tRNA-derived RNA reads. For example,
spliced reads (those derived from the sequence
flanking the spliced intron) or reads that
contain a non-templated 3′-end “CCA” will
not map to the genome.
(3)tRNAs are subject to extensive chemical
modifications at specific nucleotide positions
during maturation [12]. As a result, tDRs most
likely harbor these modifications, which can lead
to errors during cDNA synthesis due to reverse
transcriptase pausing and mis-incorporation of
nucleosides [13]. These errors manifest in small
RNA-seq reads as mismatches and deletions
relative to the reference tRNA sequence.
These mismatches/deletions will be referred to
as “error type.”
b) There is no standardized nomenclature for tDRs.
tDRs are produced in a variety of different sizes,
from a variety of different tRNAs, and from a variety
of different locations within the tRNAs, all of which
present challenges for a coherent naming system.
A standard naming scheme is critical to facilitate
future research. For example, it is at present
extremely difficult to use published studies to define
the bio-distribution of specific tDRs (the tissues and
conditions in which specific tDRs are expressed) because the same or similar tDR is often referred to by
completely different names (and in some cases
a name is not given at all).
In this study we introduce a tool designed to
address the challenges of mapping, naming, and
quantifying tDRs, called tDRmapper. tDRmapper was
Page 2 of 13
designed specifically for human small RNA-seq data
(single-end, 50x) generated on the Illumina sequencing platform using cDNA libraries that were prepared using the Illumina TruSeq protocol. We used
tDRmapper to analyze publically available small RNAseq datasets (total n = 45) from four categories of cell
types/tissues. These analyses helped shape the final
version of the tool and also led to the discovery of
new types of tDR species as well as novel insights
about potentially varying pa (...truncated)