A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies

Briefings in Bioinformatics, Mar 2019

Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases. Bioinformatics workflows used to perform these studies can be divided into two groups: those finding changes in absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have considered changes in absolute and relative isoform expression levels separately. Consequently, no consensus exists about the best practices and appropriate workflows to analyse alternative and differential splicing. To assist in choosing an adequate pipeline, we present here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over different experimental scenarios where changes in absolute and relative isoform expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and of the magnitude of the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on DESeq2, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular, we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.


Briefings in Bioinformatics, 20(2), 2019, 471–481
doi: 10.1093/bib/bbx122
Advance Access Publication Date: 13 October 2017
Paper

Gabriela A. Merino, Ana Conesa and Elmer A. Fernández

Corresponding authors: Ana Conesa, Genomics of Gene Expression Lab, Centro de Investigaciones Príncipe Felipe, Eduardo Primo Yúfera 3, 42012, Valencia, España. Tel.: +34 96 328 96 80; and Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, 2033 Mowry Road, FL 32610, Gainesville. Tel.: +1 352 2738127. Elmer A. Fernández, Centro de Investigación y Desarrollo en Inmunología y Enfermedades Infecciosas (CIDIE), CONICET, Av. Armada Argentina 3555, X5016DHK, Córdoba, Argentina. Tel.: +54 351 4938094.

Gabriela A. Merino is a PhD student in Engineering Sciences and a Professor of Mathematical Analysis II at the National University of Córdoba, Argentina. Her doctoral research focuses on bioinformatics and statistical analysis of next-generation sequencing data and is funded by the National Council of Science and Technology of Argentina (CONICET).

Ana Conesa is a bioinformatician and computational biologist, developer of many different bioinformatics tools for functional annotation and analysis of transcriptomics data, quality assessment of short-/long-read sequencing data and multiomics integration. She holds a shared position as Head of the Genomics of Gene Expression Lab at the Centro de Investigación Príncipe Felipe (Valencia, Spain), and as Professor of Bioinformatics at the Microbiology and Cell Sciences Department of the University of Florida at Gainesville, FL, USA.

Elmer A. Fernández is an Independent Researcher at the National Council of Science and Technology of Argentina (CONICET). He is a bioengineer from the National University of Entre Rios and received his PhD in Advanced Computing and Artificial Intelligence from the University of Santiago de Compostela, Spain. He currently leads the Bioscience Data Mining Group at CIDIE-CONICET-UCC in Córdoba, Argentina. He also leads the BioDataMining node of the National Bioinformatic Platform.

Submitted: 4 July 2017; Received (in revised form): 20 August 2017

© The Author 2017. Published by Oxford University Press. All rights reserved.

Key words: alternative splicing; differential expression; RNA-seq; analysis workflow

Introduction

In higher eukaryotes, many genes can produce multiple transcripts through alternative splicing (AS), a post-transcriptional regulatory mechanism responsible for the functional complexity and protein diversity obtained from a small number of genes [1, 2]. Splicing patterns are constantly changing, allowing organisms to respond to [...]

Methods

Definition of expression changes at the isoform level

Let us suppose that there are three experimental conditions, A, B and C, and a gene g with two isoforms, gI and gII, having the expression values listed in Table 1.

Table 1. Illustration of changes in absolute and relative isoform expression occurring across three experimental conditions. The comparison of conditions A and B reflects the occurrence of differential absolute expression, keeping relative isoform proportions. The comparison of conditions B and C reflects alterations in the AS mechanism causing significant changes in isoform proportions.

Gene  Isoform  Expression in A   Expression in B   Expression in C
               Abs   Rel (%)     Abs   Rel (%)     Abs   Rel (%)
g     gI       10    66.67       20    66.67       20    80
      gII      5     33.33       10    33.33       5     20

Abs, absolute expression value; Rel, relative expression value.

The comparison of conditions A and B reveals changes in the absolute expression of gI and gII without modifications in their proportions, which is an example of differential isoform expression (DIE). Note that DIE refers to absolute changes in isoform expression, and hence DIE methods use count matrices at the transcript level.
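The distinction between absolute and relative changes can be checked directly on the toy values of Table 1. The following is a minimal sketch, not part of the benchmarked workflows: the names `table1`, `relative_expression` and `compare` are hypothetical, and an exact comparison of the toy values stands in for the statistical tests that real DIE and DS tools apply.

```python
# Absolute isoform expression values from Table 1 (gene g, isoforms gI and gII).
table1 = {
    "A": {"gI": 10, "gII": 5},
    "B": {"gI": 20, "gII": 10},
    "C": {"gI": 20, "gII": 5},
}


def relative_expression(absolute):
    """Return each isoform's share of the gene total, in percent."""
    total = sum(absolute.values())
    return {iso: 100.0 * value / total for iso, value in absolute.items()}


def compare(cond_1, cond_2, tol=1e-9):
    """List the isoforms whose absolute or relative expression differs.

    Illustrative only: exact comparison of toy values, not a statistical test.
    """
    abs_1, abs_2 = table1[cond_1], table1[cond_2]
    rel_1, rel_2 = relative_expression(abs_1), relative_expression(abs_2)
    return {
        "absolute_changed": sorted(i for i in abs_1 if abs_1[i] != abs_2[i]),
        "relative_changed": sorted(
            i for i in rel_1 if abs(rel_1[i] - rel_2[i]) > tol
        ),
    }


if __name__ == "__main__":
    for pair in (("A", "B"), ("B", "C"), ("A", "C")):
        print(pair, compare(*pair))
```

For A versus B, both isoforms change in absolute terms while the proportions stay the same, which corresponds to a change in absolute expression only. For B versus C and for A versus C, the proportions of both isoforms change, which is the pattern associated with altered splicing.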
When conditions A and C are compared, there are significant changes in isoform proportions accompanied by small changes in absolute expression. This comparison reveals alterations in the AS mechanism in condition C with respect to condition A, a phenomenon known as differential splicing (DS). Changes in the proportions of the isoforms of the same gene are usually evaluated by measuring changes in the gene's exon usage.

Workflows for DE analysis

Seven commonly used methods for differential expression (DE) analysis, based on different approaches, were chosen to analyse DIE and DS. The selected methods were EBSeq, DESeq2, NOISeq, SplicingCompass, Limma, DEXSeq and Cuffdiff2. Specific pipelines were designed for them (Figure 1). The evaluated workflows were called Cufflinks, DESeq2, EBSeq, Limma and NOISeq for DIE analysis (solid arrows), and CufflinksDS, DEXSeq, LimmaDS and SplicingCompass for the DS study (dashed arrows). It is worth noting that only the Cuffdiff2 and Limma DE tools are able to analyse both DIE and DS.

DIE workflows

This group of pipelines takes as input data isoform expression levels obtained by quantification methods based on probabilistic isoform resolut (...truncated)


This is a preview of a remote PDF: https://academic.oup.com/bib/article-pdf/20/2/471/28833911/bbx122.pdf
Article home page: https://academic.oup.com/bib/article/20/2/471/4524048

Merino, Gabriela A, Conesa, Ana, Fernández, Elmer A. A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies, Briefings in Bioinformatics, 2019, pp. 471-481, Volume 20, Issue 2, DOI: 10.1093/bib/bbx122