Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data

Nucleic Acids Research, Mar 2016

Background: Fusion transcripts are formed by either fusion genes (DNA level) or trans-splicing events (RNA level). They have been recognized as promising tools for diagnosing, subtyping and treating cancers. RNA-seq has become a precise and efficient standard for genome-wide screening of such aberrant events. Many fusion transcript detection algorithms have been developed for paired-end RNA-seq data, but their performance has not been comprehensively evaluated to guide practitioners. In this paper, we evaluated 15 popular algorithms by their precision and recall trade-off, accuracy of supporting reads and computational cost. We further combined top-performing methods for improved ensemble detection.

Results: Fifteen fusion transcript detection tools were compared using three synthetic data sets under different coverage, read length, insert size and background noise, and three real data sets with selected experimental validations. No single method dominated, but SOAPfuse generally performed well, followed by FusionCatcher and JAFFA. We further demonstrated the potential of a meta-caller algorithm that combines top-performing methods to re-prioritize candidate fusion transcripts with high confidence, which can then be followed by experimental validation.

Conclusion: Our results provide recommendations for applying individual tools or combining top performers to identify fusion transcript candidates.
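To make the evaluation criteria concrete, the following is a minimal sketch of how precision and recall could be computed for a set of fusion calls, and how calls from several tools could be combined by simple voting. The voting rule, the function names, the `min_votes` threshold and the toy call sets (including the placeholder fusion GENEA-GENEB) are all assumptions for illustration; the abstract does not describe the paper's meta-caller in enough detail to reproduce it here.

```python
from collections import Counter


def precision_recall(called, truth):
    """Return (precision, recall, F1) of called fusions against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def vote_meta_caller(calls_by_tool, min_votes=2):
    """Rank fusions by how many tools report them (hypothetical voting rule).

    Fusions reported by fewer than `min_votes` tools are dropped; ties are
    broken alphabetically so the ranking is deterministic.
    """
    votes = Counter()
    for calls in calls_by_tool.values():
        votes.update(set(calls))  # each tool votes at most once per fusion
    ranked = sorted(votes.items(), key=lambda item: (-item[1], item[0]))
    return [(fusion, n) for fusion, n in ranked if n >= min_votes]


# Toy, made-up call sets; these are NOT results from the paper.
calls = {
    "SOAPfuse": ["TMPRSS2-ERG", "BCR-ABL1", "GENEA-GENEB"],
    "FusionCatcher": ["TMPRSS2-ERG", "BCR-ABL1"],
    "JAFFA": ["TMPRSS2-ERG", "EML4-ALK"],
}
truth = {"TMPRSS2-ERG", "BCR-ABL1", "EML4-ALK"}

consensus = vote_meta_caller(calls, min_votes=2)
print(consensus)
print(precision_recall([fusion for fusion, _ in consensus], truth))
```

In this toy example, raising `min_votes` trades recall for precision: a fusion reported by only one tool is dropped even if it is real, while calls supported by several tools rise to the top of the list for follow-up validation.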

Silvia Liu (1,2), Wei-Hsiang Tsai (0), Ying Ding (1,2), Rui Chen (2), Zhou Fang (2), Zhiguang Huo (2), SungHwan Kim (2), Tianzhou Ma (2), Ting-Yu Chang (6), Nolan Michael Priedigkeit (5), Adrian V. Lee (4), Jianhua Luo (3), Hsei-Wei Wang (0,6,7), I-Fang Chung (0,7), George C. Tseng (1,2)

0 Institute of Biomedical Informatics, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
1 Department of Computational and Systems Biology, School of Medicine, University of Pittsburgh, Biomedical Science Tower 3, 3501 Fifth Avenue, Pittsburgh, PA 15213, USA
2 Department of Biostatistics, Graduate School of Public Health, University of Pittsburgh, 130 De Soto Street, Pittsburgh, PA 15261, USA
3 Department of Pathology, School of Medicine, University of Pittsburgh, 3550 Terrace Street, Pittsburgh, PA 15261, USA
4 Magee-Women's Research Institute, 204 Craft Avenue, Pittsburgh, PA 15213, USA
5 Molecular Pharmacology, School of Medicine, University of Pittsburgh, 3550 Terrace Street, Pittsburgh, PA 15261, USA
6 Institute of Microbiology and Immunology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan
7 Center for Systems and Synthetic Biology, National Yang-Ming University, No. 155, Sec. 2, Linong Street, Beitou District, Taipei 112, Taiwan

INTRODUCTION

A fusion gene results from a chromosomal insertion, deletion, translocation or inversion that joins two otherwise separate genes. Fusion genes are often oncogenes that play an important role in the development of many cancers. Trans-splicing is an event in which two different primary RNA transcripts are ligated together. Both fusion genes (DNA level) and trans-splicing events (RNA level) can form fusion transcripts. These events usually arise from different types of post-transcriptional aberrations and chromosomal rearrangements: large segment deletion (e.g. the well-known fusion TMPRSS2-ERG in prostate cancer (1)), chromosome translocation (e.g. the well-known fusions BCR-ABL1 in chronic myeloid leukemia (2) and EML4-ALK in non-small-cell lung cancer (3)), trans-splicing (4) or readthrough (two adjacent genes) (5). To date, many fusion transcripts have been found and collected in public databases. For example, there are 10 890 fusions in COSMIC (release 72) (6), 1374 fusion sequences found in human tumors (involving 431 different genes) in TICdb (release 3.3) (7), 2327 gene fusions in the Mitelman database (updated in Feb 2015) (8) and 29 159 chimeric transcripts in ChiTaRS (version 2.1) (9,10). Some databases (such as COSMIC, TICdb and ChiTaRS) collect fusion gene sequences, and some (e.g. COSMIC and ChiTaRS) offer further summaries of the original tissue types.

Advances in Massively Parallel Sequencing (MPS) have enabled the sequencing of hundreds of millions of short reads, and MPS is now routinely applied in genomic and transcriptomic studies. Its per-base sequencing resolution has provided a precise and efficient standard for fusion transcript detection, especially using paired-end RNA-seq platforms (11). For example, Berger et al. detected and verified 11 fusion transcripts in melanoma samples, and also identified 12 novel chimeric readthrough transcripts (12). McPherson et al. verified 45 out of 268 detected fusion transcripts in ovarian and sarcoma samples (13). (...truncated)


This is a preview of a remote PDF: https://nar.oxfordjournals.org/content/44/5/e47.full.pdf

Silvia Liu, Wei-Hsiang Tsai, Ying Ding, Rui Chen, Zhou Fang, Zhiguang Huo, SungHwan Kim, Tianzhou Ma, Ting-Yu Chang, Nolan Michael Priedigkeit, Adrian V. Lee, Jianhua Luo, Hsei-Wei Wang, I-Fang Chung, George C. Tseng. Comprehensive evaluation of fusion transcript detection algorithms and a meta-caller to combine top performing methods in paired-end RNA-seq data, Nucleic Acids Research, 2016, pp. e47-e47, 44/5, DOI: 10.1093/nar/gkv1234