A benchmarking of workflows for detecting differential splicing and differential expression at isoform level in human RNA-seq studies
Briefings in Bioinformatics, 20(2), 2019, 471–481
doi: 10.1093/bib/bbx122
Advance Access Publication Date: 13 October 2017
Paper
Gabriela A. Merino, Ana Conesa and Elmer A. Fernández
Corresponding authors: Ana Conesa, Genomics of Gene Expression Lab, Centro de Investigaciones Príncipe Felipe, Eduardo Primo Yúfera 3, 42012, Valencia,
España. Tel.: +34 96 328 96 80; and Microbiology and Cell Science Department, Institute for Food and Agricultural Research, University of Florida, 2033 Mowry
Road, FL 32610, Gainesville. Tel.: +1 352 2738127; E-mail: ; Elmer A. Fernández, Centro de Investigación y Desarrollo en Inmunología y
Enfermedades Infecciosas (CIDIE), CONICET, Av. Armada Argentina 3555, X5016DHK, Córdoba, Argentina. Tel.: +54 351 4938094; E-mail:
Abstract
Over the last few years, RNA-seq has been used to study alterations in alternative splicing related to several diseases.
Bioinformatics workflows used to perform these studies can be divided into two groups: those finding changes in the absolute isoform expression and those studying differential splicing. Many computational methods for transcriptomics analysis
have been developed, evaluated and compared; however, there are not enough reports of systematic and objective assessment of processing pipelines as a whole. Moreover, comparative studies have been performed considering separately the
changes in absolute or relative isoform expression levels. Consequently, no consensus exists about the best practices and
appropriate workflows to analyse alternative and differential splicing. To assist in choosing an adequate pipeline, we present
here a benchmarking of nine commonly used workflows to detect differential isoform expression and splicing. We evaluated workflow performance over different experimental scenarios where changes in absolute and relative isoform
expression occurred simultaneously. In addition, the effects of the number of isoforms per gene and of the magnitude of
the expression change on pipeline performance were also evaluated. Our results suggest that workflow performance is
influenced by the number of replicates per condition and the heterogeneity of the conditions. In general, workflows based on
DESeq2, DEXSeq, Limma and NOISeq performed well over a wide range of transcriptomics experiments. In particular,
we suggest the use of workflows based on Limma when high precision is required, and DESeq2 and DEXSeq pipelines to
prioritize sensitivity. When several replicates per condition are available, NOISeq and Limma pipelines are indicated.
Key words: alternative splicing; differential expression; RNA-seq; analysis workflow
Introduction
In higher eukaryotes, many genes can produce multiple transcripts
through alternative splicing (AS), a post-transcriptional regulatory
mechanism responsible for the functional complexity and protein
diversity made from a small number of genes [1, 2]. Splicing patterns are constantly changing, allowing organisms to respond to
Gabriela A. Merino is a PhD student in Engineering Sciences and a Professor of Mathematical Analysis II at the National University of Córdoba, Argentina.
Her doctoral research focuses on bioinformatics and statistical analysis of next-generation sequencing data and is funded by the National Council of
Science and Technology of Argentina (CONICET).
Ana Conesa is a bioinformatician and computational biologist developer of many different bioinformatics tools for functional annotation and analysis of
transcriptomics data, quality assessment of short-/long-read sequencing data and multiomics integration. She enjoys a shared position as Head of the
Genomics of Gene Expression Lab at the Centro de Investigación Príncipe Felipe (Valencia, Spain), and as Professor of Bioinformatics at the Microbiology and
Cell Sciences Department of the University of Florida at Gainesville, FL, USA.
Elmer A. Fernández is an Independent Researcher at the National Council of Science and Technology of Argentina (CONICET). He is a bioengineer from the
National University of Entre Rios and obtained his PhD in Advanced Computing and Artificial Intelligence from the University of Santiago de Compostela,
Spain. He currently leads the Bioscience Data Mining Group at CIDIE-CONICET-UCC in Córdoba, Argentina. He also leads the BioDataMining node of the
National Bioinformatic Platform.
Submitted: 4 July 2017; Received (in revised form): 20 August 2017
© The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please email:
Methods
Definition of expression changes at the isoform level
Let us suppose that there are three experimental conditions, A, B and C, and a gene g with two isoforms, gI and gII, having the
expression values listed in Table 1. The comparison of conditions A and B reveals changes in the absolute expression of gI and gII
without modifications in their proportions, which is an example of differential isoform expression (DIE). Note that DIE refers to
absolute changes in isoform expression; hence, DIE methods use count matrices at the transcript level. When conditions A and C are
compared, significant changes in isoform proportions, involving small changes in absolute expression, are present. This comparison
reveals alterations in the AS mechanism in condition C with respect to condition A, a phenomenon known as differential splicing (DS).
The changes in the proportions of the isoforms from the same gene are usually evaluated by measuring the changes in the gene's exon usage.

Table 1. Illustration of changes in absolute and relative isoform expression across three experimental conditions. The comparison of conditions A and B reflects the occurrence of differential
absolute expression, keeping relative isoform proportions. The comparison of conditions B and C reflects alterations in the AS mechanism causing significant changes in isoform proportions

Gene  Isoform  Expression in A    Expression in B    Expression in C
               Abs    Rel (%)     Abs    Rel (%)     Abs    Rel (%)
g     gI       10     66.67       20     66.67       20     80
      gII      5      33.33       10     33.33       5      20

Abs, Absolute expression value; Rel, relative expression value.
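
The distinction between absolute and relative changes can be checked with a few lines of arithmetic. The following is a minimal illustrative sketch in Python, not part of the benchmarked workflows: it recomputes the relative values of Table 1 from the absolute ones and reports, for each pair of conditions, whether absolute and relative isoform expression differ. The names `ABS`, `relative` and `compare` are hypothetical, and the comparison is exact equality rather than a statistical test.

```python
# Absolute isoform expression values from Table 1 (gene g, isoforms gI and gII).
ABS = {
    "A": {"gI": 10, "gII": 5},
    "B": {"gI": 20, "gII": 10},
    "C": {"gI": 20, "gII": 5},
}


def relative(expr):
    """Return each isoform's share of the gene total, in percent (2 decimals)."""
    total = sum(expr.values())
    return {iso: round(100 * value / total, 2) for iso, value in expr.items()}


def compare(cond1, cond2):
    """Return (absolute_changed, relative_changed) for two conditions.

    Assumption: plain equality on the toy values, no statistical testing.
    """
    absolute_changed = ABS[cond1] != ABS[cond2]
    relative_changed = relative(ABS[cond1]) != relative(ABS[cond2])
    return absolute_changed, relative_changed


for pair in [("A", "B"), ("B", "C"), ("A", "C")]:
    print(pair, compare(*pair))
```

Under these toy values, A versus B changes only the absolute expression, whereas both comparisons involving C also change the isoform proportions.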
Workflows for DE analysis
Seven commonly used methods for DE analysis based on different approaches were chosen to analyse DIE and DS. The selected methods were: EBSeq, DESeq2, NOISeq, SplicingCompass,
Limma, DEXSeq and Cuffdiff2. Specific pipelines for them were
designed (Figure 1). The evaluated workflows were called:
Cufflinks, DESeq2, EBSeq, Limma and NOISeq, in the case of DIE
analysis (solid arrows), and CufflinksDS, DEXSeq, LimmaDS and
SplicingCompass, for DS study (dashed arrows). It is worth
noting that only the Cuffdiff2 and Limma DE tools are able to perform the analysis of both DIE and DS.
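
As a bookkeeping aid, the workflow names listed above can be written down as a small lookup structure. The sketch below is illustrative only. The pairing of each workflow with its underlying tool is an assumption inferred from the names and from the statement that only Cuffdiff2 and Limma serve both analyses (for instance, that the Cufflinks and CufflinksDS workflows both rely on Cuffdiff2); the names `WORKFLOWS` and `analyses_per_tool` are hypothetical.

```python
from collections import defaultdict

# (workflow name, underlying DE tool, analysis type).
# Assumption: the tool column is inferred from the workflow names in the text.
WORKFLOWS = [
    ("Cufflinks", "Cuffdiff2", "DIE"),
    ("DESeq2", "DESeq2", "DIE"),
    ("EBSeq", "EBSeq", "DIE"),
    ("Limma", "Limma", "DIE"),
    ("NOISeq", "NOISeq", "DIE"),
    ("CufflinksDS", "Cuffdiff2", "DS"),
    ("DEXSeq", "DEXSeq", "DS"),
    ("LimmaDS", "Limma", "DS"),
    ("SplicingCompass", "SplicingCompass", "DS"),
]

# Collect the set of analysis types each tool is used for.
analyses_per_tool = defaultdict(set)
for _workflow, tool, analysis in WORKFLOWS:
    analyses_per_tool[tool].add(analysis)

# Tools that appear in both a DIE workflow and a DS workflow.
both = sorted(t for t, kinds in analyses_per_tool.items() if kinds == {"DIE", "DS"})
print(both)
```

With this mapping, nine workflows are built on seven tools, and the two tools used for both analysis types are Cuffdiff2 and Limma, in agreement with the text.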
DIE workflows
This group of pipelines takes as input data isoform expression
levels obtained by quantification methods based on probabilistic isoform resolut (...truncated)