Comparison of circular RNA prediction tools
Nucleic Acids Research
Thomas B. Hansen 0
Morten T. Venø 0
Christian K. Damgaard 0
Jørgen Kjems 0
0 Department of Molecular Biology and Genetics (MBG) and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, DK-8000 Aarhus C, Denmark
CircRNAs are novel members of the non-coding RNA family. Although circRNAs have been known to exist for several decades, their widespread abundance has only recently become appreciated. Annotation of circRNAs depends on sequencing reads that span the backsplice junction and therefore map as nonlinear reads in the genome. Several pipelines have been developed to specifically identify these nonlinear reads and consequently predict the landscape of circRNAs based on deep sequencing datasets. Here, we use common RNAseq datasets to scrutinize and compare the output from five different algorithms: circRNA_finder, find_circ, CIRCexplorer, CIRI and MapSplice, and evaluate the levels of bona fide and false positive circRNAs based on RNase R resistance. By this approach, we observe surprisingly dramatic differences between the algorithms, specifically regarding the highly expressed circRNAs and the circRNAs derived from proximal splice sites. Collectively, this study emphasizes that circRNA annotation should be handled with care and that several algorithms should ideally be combined to achieve reliable predictions.
INTRODUCTION
Long non-coding RNAs (lncRNAs) belong to a diverse
class of transcripts whose common feature is that they are
predicted not to function as messengers for protein
translation. Instead, lncRNAs typically function as regulators
of protein coding gene expression. The modulation
mediated by lncRNAs can take place at every step in the gene
expression pathway from transcription and chromatin
remodelling to translation as well as through regulation of
resulting protein function involving a wide range of
different mechanisms. The mechanisms discovered to date span
from lncRNAs serving as guides for proteins to lncRNAs
that act as molecular scaffolds with gene regulatory
properties, thereby facilitating formation of active regulatory
complexes. Additionally, lncRNAs can act as target decoys
by redirecting binding of either microRNAs (miRNAs) or
DNA-/RNA-binding proteins from the intended target as
well as bind to and allosterically modify the function of
regulatory proteins (1). Hence, lncRNAs contribute to
correct and timely regulation of protein expression and are
essential for the survival and maintenance of diverse cell
functions.
Circular RNA (circRNA) constitutes a particularly
intriguing class of recently recognized lncRNAs. Although
the presence of circRNAs in human cells was established
more than twenty years ago (2–5), the prevalence and
abundance of these circular RNAs in human cells have only
recently been revealed (6–8). Since many large-scale RNA
sequencing applications rely on accessible termini or
poly(A) tail purification steps, circRNAs have evaded recognition
or simply been discarded as artefacts during standard
processing, which involves alignment to the ‘linear’ genome
(9). CircRNAs are all characterized by a non-linear
‘backsplicing’ event between a splice donor (SD) and an
upstream splice acceptor (SA) in contrast to a downstream
SA in conventional linear splicing. Hence, elucidation of
circRNA abundance requires application of dedicated
bioinformatic pipelines directed to search specifically for
circRNAs in datasets generated from deep-sequencing of
eukaryotic rRNA-depleted RNA (6–8,10–12). These pipelines
all identify circRNAs based on the presence of backsplice
junction-spanning reads. As a consequence, large numbers
of circRNAs derived mainly from exonic regions, but also
from intronic, intergenic and UTR regions, lncRNA loci
and antisense to known transcripts were identified (6,7).
These analyses also revealed that multiple circRNAs may
arise from the same gene locus, a phenomenon termed
alternative circularization (3,6,8,10) and that circRNAs may
comprise single to multiple exons (10). Although the
number of circRNAs identified varies widely from >25 000 in
one study (6) to a few thousand in others (7,8), it has
become clear that circRNA constitutes an abundant and
fascinating class of lncRNA. While most circRNAs are
modestly expressed in cells, specific circRNA species are
highly abundant (8), including CDR1as/ciRS-7, which
is highly and widely expressed in the brain (13). Aside from
CDR1as/ciRS-7, which acts as a miR-7 sponge (7,14), and
circMbl, which acts as a decoy for its own protein product
muscleblind (15), not much is currently known regarding
the functional importance of circRNA.
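The backsplice criterion described above (a splice donor joined to an upstream rather than a downstream splice acceptor) can be illustrated with a minimal Python sketch. It is not taken from any of the pipelines compared here. The `Junction` record, its field names and the helper functions are hypothetical, and the sketch assumes that the donor and acceptor coordinates and the transcript strand of each junction-spanning read have already been determined by an aligner.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """Hypothetical record for one junction-spanning read."""
    chrom: str
    donor: int     # genomic coordinate of the splice donor (SD)
    acceptor: int  # genomic coordinate of the splice acceptor (SA)
    strand: str    # '+' or '-' (strand of the transcript)


def is_backsplice(j: Junction) -> bool:
    """Return True if the SA lies upstream of the SD in transcript orientation.

    On the '+' strand, upstream means a smaller genomic coordinate;
    on the '-' strand, upstream means a larger genomic coordinate.
    """
    if j.strand == "+":
        return j.acceptor < j.donor
    if j.strand == "-":
        return j.acceptor > j.donor
    raise ValueError(f"unknown strand: {j.strand!r}")


def count_backsplice_reads(junctions):
    """Count supporting reads per distinct backsplice junction."""
    return Counter(j for j in junctions if is_backsplice(j))


# Toy input (made-up coordinates, for illustration only).
reads = [
    Junction("chr1", 1000, 2000, "+"),  # linear: SA downstream of SD
    Junction("chr1", 2000, 1000, "+"),  # backsplice: SA upstream of SD
    Junction("chr1", 2000, 1000, "+"),  # second read on the same junction
    Junction("chr2", 5000, 9000, "-"),  # backsplice on the minus strand
]
counts = count_backsplice_reads(reads)
```

In this toy input, `counts` holds two reads for the chr1 backsplice junction and one for the chr2 junction, while the linear junction is ignored. Real pipelines apply further filters, such as splice-site signals, mapping quality and a maximum spanning distance, which this sketch deliberately leaves out.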
A repository of circRNA has been developed, termed
circBase (16), containing all annotation information on
circRNAs predicted and identified thus far. To ensure that the
circBase repository only describes bona fide circular RNAs, it
is important that the predic (...truncated)