Comparison of circular RNA prediction tools

Nucleic Acids Research, Apr 2016

CircRNAs are novel members of the non-coding RNA family. CircRNAs have been known to exist for several decades; however, their widespread abundance has only recently become appreciated. Annotation of circRNAs depends on sequencing reads that span the backsplice junction and therefore map to the genome as non-linear reads. Several pipelines have been developed to specifically identify these non-linear reads and consequently predict the landscape of circRNAs based on deep sequencing datasets. Here, we use common RNAseq datasets to scrutinize and compare the output from five different algorithms (circRNA_finder, find_circ, CIRCexplorer, CIRI and MapSplice) and evaluate the levels of bona fide and false positive circRNAs based on RNase R resistance. By this approach, we observe surprisingly dramatic differences between the algorithms, specifically regarding the highly expressed circRNAs and the circRNAs derived from proximal splice sites. Collectively, this study emphasizes that circRNA annotation should be handled with care and that several algorithms should ideally be combined to achieve reliable predictions.

Thomas B. Hansen, Morten T. Venø, Christian K. Damgaard, Jørgen Kjems

Department of Molecular Biology and Genetics (MBG) and Interdisciplinary Nanoscience Center (iNANO), Aarhus University, DK-8000 Aarhus C, Denmark

INTRODUCTION

Long non-coding RNAs (lncRNAs) belong to a diverse class of transcripts whose common feature is that they are predicted not to function as messengers for protein translation. Instead, lncRNAs typically function as regulators of protein coding gene expression. The modulation mediated by lncRNAs can take place at every step in the gene expression pathway, from transcription and chromatin remodelling to translation, as well as through regulation of resulting protein function, involving a wide range of different mechanisms.
The mechanisms discovered to date range from lncRNAs serving as guides for proteins to lncRNAs that act as molecular scaffolds with gene regulatory properties, thereby facilitating formation of active regulatory complexes. Additionally, lncRNAs can act as target decoys by redirecting binding of either microRNAs (miRNAs) or DNA-/RNA-binding proteins away from the intended target, and can bind to and allosterically modify the function of regulatory proteins (1). Hence, lncRNAs contribute to correct and timely regulation of protein expression and are essential for the survival and maintenance of diverse cell functions.

Circular RNA (circRNA) constitutes a particularly intriguing class of recently recognized lncRNAs. Although the presence of circRNAs in human cells was established more than twenty years ago (2–5), the prevalence and abundance of these circular RNAs in human cells has only recently been revealed (6–8). Since many large-scale RNA sequencing applications rely on accessible termini or poly(A) tail purification steps, circRNAs have evaded recognition or simply been discarded as artefacts during standard processing, which involves alignment to the ‘linear’ genome (9). CircRNAs are all characterized by a non-linear ‘backsplicing’ event between a splice donor (SD) and an upstream splice acceptor (SA), in contrast to the downstream SA used in conventional linear splicing. Hence, elucidation of circRNA abundance requires dedicated bioinformatic pipelines that search specifically for circRNAs in datasets generated from deep sequencing of eukaryotic rRNA-depleted RNA (6–8,10–12). These pipelines all identify circRNAs based on the presence of backsplice junction-spanning reads. As a consequence, large numbers of circRNAs were identified, derived mainly from exonic regions, but also from intronic, intergenic and UTR regions, lncRNA loci and regions antisense to known transcripts (6,7).
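
To make the notion of a backsplice junction-spanning read concrete, the following is a minimal sketch of how a read aligned in two pieces could be labelled by the genomic order of those pieces. It does not reproduce any of the five pipelines compared in this study. The `Segment` record, the function names and the coordinates are hypothetical, and the sketch assumes that read offsets are expressed in forward-genome orientation (as in SAM records), so the same test applies to both strands. Real pipelines apply further filters that are not modelled here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Segment:
    """One aligned piece of a split read (hypothetical record layout)."""
    read_offset: int  # position of the piece within the read, 0-based
    chrom: str
    strand: str       # '+' or '-'
    start: int        # genomic start, 0-based inclusive
    end: int          # genomic end, 0-based exclusive


def classify_split_read(first: Segment, second: Segment) -> str:
    """Label a two-piece read as 'colinear', 'backsplice' or 'other'.

    `first` must precede `second` within the read. Assumes offsets are
    given in forward-genome orientation, so strand does not flip the test.
    """
    if first.read_offset >= second.read_offset:
        raise ValueError("`first` must precede `second` within the read")
    if (first.chrom, first.strand) != (second.chrom, second.strand):
        return "other"
    if first.end <= second.start:
        # Read order matches genome order: consistent with linear splicing.
        return "colinear"
    if second.end <= first.start:
        # The later part of the read maps upstream: non-linear order.
        return "backsplice"
    return "other"  # overlapping pieces are left unclassified


def backsplice_span(first: Segment, second: Segment) -> Optional[Tuple[str, int, int]]:
    """Return (chrom, start, end) of the putative circRNA, or None."""
    if classify_split_read(first, second) != "backsplice":
        return None
    return (first.chrom, second.start, first.end)


# Toy example with made-up coordinates.
downstream = Segment(read_offset=0, chrom="chr1", strand="+", start=5000, end=5050)
upstream = Segment(read_offset=50, chrom="chr1", strand="+", start=1000, end=1050)
print(classify_split_read(downstream, upstream))
print(backsplice_span(downstream, upstream))
```

The span returned for a backsplice read runs from the start of the upstream piece to the end of the downstream piece, which is the region enclosed by the two splice sites under the assumptions above.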
These analyses also revealed that multiple circRNAs may arise from the same gene locus, a phenomenon termed alternative circularization (3,6,8,10), and that circRNAs may comprise single or multiple exons (10). Although the number of circRNAs identified varies widely, from >25 000 in one study (6) to a few thousand in others (7,8), it has become clear that circRNAs constitute an abundant and fascinating class of lncRNA. While most circRNAs are modestly expressed in cells, specific circRNA species are highly abundant (8), including CDR1as/ciRS-7, which is highly and widely expressed in the brain (13). Aside from CDR1as/ciRS-7, which acts as a miR-7 sponge (7,14), and circMbl, which acts as a decoy for its own protein product muscleblind (15), not much is currently known regarding the functional importance of circRNAs. A repository of circRNAs, termed circBase (16), has been developed, containing all annotation information on circRNAs predicted and identified thus far. To ensure that the circBase repository only describes bona fide circular RNAs, it is important that the predic (...truncated)
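
The abstract recommends combining several algorithms and uses RNase R resistance as evidence for bona fide circRNAs. The sketch below illustrates one simple way this could be done and is not the procedure used in the study. The voting threshold, the pseudocount, the ratio cut-off and all function names are assumptions made for illustration. It also assumes that the junction coordinates from all tools have already been converted to a common convention and that read counts are normalized for library size.

```python
from collections import Counter
from typing import Dict, Set, Tuple

# (chrom, start, end, strand); all tools assumed to share one convention.
Junction = Tuple[str, int, int, str]


def consensus_junctions(calls_by_tool: Dict[str, Set[Junction]],
                        min_tools: int = 2) -> Set[Junction]:
    """Keep junctions reported by at least `min_tools` tools (assumed rule)."""
    votes: Counter = Counter()
    for junctions in calls_by_tool.values():
        votes.update(set(junctions))  # each tool votes once per junction
    return {j for j, n in votes.items() if n >= min_tools}


def rnase_r_ratio(untreated: float, treated: float,
                  pseudocount: float = 1.0) -> float:
    """Ratio of junction reads after versus before RNase R treatment."""
    return (treated + pseudocount) / (untreated + pseudocount)


def is_rnase_r_resistant(untreated: float, treated: float,
                         min_ratio: float = 1.0) -> bool:
    """Flag a junction as resistant if it is not depleted (assumed cut-off)."""
    return rnase_r_ratio(untreated, treated) >= min_ratio


# Toy example with made-up junctions and placeholder tool names.
calls = {
    "tool_a": {("chr1", 1000, 5050, "+"), ("chr2", 200, 900, "-")},
    "tool_b": {("chr1", 1000, 5050, "+")},
    "tool_c": {("chr1", 1000, 5050, "+"), ("chr3", 10, 400, "+")},
}
print(consensus_junctions(calls, min_tools=2))
print(is_rnase_r_resistant(untreated=9, treated=19))
```

Requiring agreement between tools trades sensitivity for specificity: junctions reported by only one tool are dropped even if they are genuine, so the threshold should be chosen with the downstream use in mind.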


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/44/6/e58.full.pdf

Thomas B. Hansen, Morten T. Venø, Christian K. Damgaard, Jørgen Kjems. Comparison of circular RNA prediction tools, Nucleic Acids Research, 2016, pp. e58-e58, 44/6, DOI: 10.1093/nar/gkv1458