JETTA: junction and exon toolkits for transcriptome analysis
BIOINFORMATICS APPLICATIONS NOTE
Genome analysis
Vol. 28 no. 9 2012, pages 1274–1275
doi:10.1093/bioinformatics/bts134
Advance Access publication March 19, 2012
Junhee Seok1,2, Weihong Xu1, Hong Gao1, Ronald W. Davis1 and Wenzhong Xiao1,3,∗
1 Stanford Genome Technology Center, 855 California Street, Palo Alto, CA 94304,
2 Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305 and
3 Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA
Associate Editor: Martin Bishop
Received on January 6, 2012; revised on February 22, 2012;
accepted on March 13, 2012
1 INTRODUCTION
Recent developments in high-throughput technologies, such as
exon–junction arrays (Clark et al., 2007; Xu et al., 2011) and
RNA-Seq (Wang et al., 2008), have extended transcriptome studies
in biomedical research beyond the scope of gene expression.
These genomic platforms enable the genome-wide measurements
of alternative splicing (AS), the process by which individual exons
of the same gene are spliced to produce different isoforms of mRNA
transcripts, in clinical samples and animal disease models. This, in
turn, requires computational algorithms and software tools for data
analyses and visualization.
We have developed an integrated software package, JETTA, that
provides a one-stop solution for gene expression and AS analyses of
microarray data, from raw data files to visualization. The software
provides many options for array normalization, probe selection,
background correction, expression index computation, AS detection
and data visualization. It can also potentially be used for AS
detection from RNA-Seq data. Here, we describe JETTA in an
analysis of human liver and muscle tissues assayed on a custom
exon–junction GG-H array (Xu et al., 2011).
2 SOFTWARE OVERVIEW
JETTA consists of two major modules: array calculator and AS
analyzer (Fig. 1A). The array calculator supports commercial and
custom exon and exon–junction arrays compatible with Affymetrix
oligonucleotide array design. It calculates the expression indices
of genes, exons and junctions from raw probe intensities. The AS
analyzer detects and visualizes AS events between conditions. In
addition, JETTA takes as input pre-calculated expression indices
of individual samples measured by RNA-Seq and performs AS
analyses between study groups.
∗ To whom correspondence should be addressed.
The array calculator computes the expression indices through the
steps of background correction, normalization and summarization.
For background modeling, GCBIN (Clark et al., 2007) and MAT
(Kapur et al., 2007) are included, both of which use linear
models based on probe sequences to estimate the background
level. Similarly, quantile (Irizarry et al., 2003) and
median-scaling normalization (Kapur et al., 2007) are the two options for
normalization. The processed probe signals are then summarized
to gene-, exon- or junction-level expression levels using either
Li–Wong fitting (Li and Wong, 2001), probe selection (Xing et al., 2006)
or median-polish algorithms (Irizarry et al., 2003).
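As a rough illustration of two of the options named above, the sketch below scales each array to a common median and then summarizes a probe set by Tukey's median polish. It is not JETTA's code: the function names `median_scale` and `median_polish` are hypothetical, the choice of the median of array medians as the scaling target is an assumption, and the inputs are plain Python lists (log2 intensities for the polish step).

```python
from statistics import median

def median_scale(arrays, target=None):
    """Scale each array of probe intensities so that its median equals a
    common target (assumed here: the median of the per-array medians)."""
    medians = [median(a) for a in arrays]
    if target is None:
        target = median(medians)
    return [[x * target / m for x in a] for a, m in zip(arrays, medians)]

def median_polish(matrix, max_iter=10, tol=1e-6):
    """Tukey median polish on a probes x samples matrix of log2 intensities.
    Returns one expression index per sample (overall level + sample effect)."""
    resid = [list(row) for row in matrix]
    n_rows, n_cols = len(resid), len(resid[0])
    overall = 0.0
    row_eff = [0.0] * n_rows   # probe effects
    col_eff = [0.0] * n_cols   # sample effects
    for _ in range(max_iter):
        # Sweep row (probe) medians out of the residuals.
        row_med = [median(r) for r in resid]
        for i in range(n_rows):
            resid[i] = [x - row_med[i] for x in resid[i]]
            row_eff[i] += row_med[i]
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep column (sample) medians out of the residuals.
        col_med = [median(resid[i][j] for i in range(n_rows))
                   for j in range(n_cols)]
        for i in range(n_rows):
            resid[i] = [resid[i][j] - col_med[j] for j in range(n_cols)]
        col_eff = [c + m for c, m in zip(col_eff, col_med)]
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        # Stop once a full sweep no longer changes anything.
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return [overall + c for c in col_eff]
```

For a probe set whose signal is exactly additive in probe and sample effects, the polish recovers the sample profile of the median probe; real data leave non-zero residuals that the medians make robust to outlying probes.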
The AS analyzer takes in expression indices from either the
output of the array calculator for microarray data or similar
results calculated by other tools for RNA-Seq data such as rSeq
(Jiang and Wong, 2009). The analyzer estimates AS signals that
include Splicing Index, i.e. fold changes of each exon and junction
normalized to the gene expression level (Clark et al., 2002, 2007),
and the corresponding P-values using the algorithms of detection
above background (DABG), microarray analysis of differential
splicing (MADS) or microarray detection of alternative splicing
(MIDAS). DABG calculates the probability of the presence of each
probe set based on the non-parametric distribution of background
probe intensities (Clark et al., 2007), and MADS (Xing et al.,
2008) and MIDAS (Affymetrix white paper: Alternative Transcript
Analysis Methods for Exon Arrays v1.1) calculate the P-value of
differential expression for each exon and junction relative to the
corresponding gene expression levels between two conditions. For
RNA-Seq data, Splicing Index and MIDAS are calculated, as DABG
and MADS are specific to arrays.
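The Splicing Index described above can be sketched as follows, assuming that expression indices are on a log2 scale so that normalizing to the gene level becomes a subtraction. The function name and argument layout are hypothetical and are not taken from JETTA; the P-value methods (DABG, MADS, MIDAS) are not reproduced here.

```python
from statistics import mean

def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Log2 fold change of an exon (or junction) between conditions A and B
    after normalizing each sample to its gene-level expression.

    All arguments are per-sample lists of log2 expression indices;
    exon_x[i] and gene_x[i] must come from the same sample."""
    # Gene-normalized intensity per sample: log2(exon / gene).
    norm_a = [e - g for e, g in zip(exon_a, gene_a)]
    norm_b = [e - g for e, g in zip(exon_b, gene_b)]
    # Difference of group means on the log2 scale = log2 fold change.
    return mean(norm_a) - mean(norm_b)
```

A value near zero means the exon tracks the gene in both conditions; a large absolute value suggests differential inclusion that is not explained by a change in overall gene expression.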
The analyzer then detects AS events by allowing custom filtering
on one or a combination of the provided AS signals. For example, to
reduce false-positive detections, it is helpful to require that each detected
event be supported by the signal of the exon as well as at least
one of its connecting junctions when analyzing exon–junction arrays
(Xu et al., 2011).
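One way to express such a combined filter is sketched below. The record layout (`midas_p`, `dabg_p` keyed by tissue) and the `links` map from exons to their connecting junctions are hypothetical structures chosen for illustration, not JETTA's data model, and the sketch checks only significance, not whether the junction signal agrees in direction with the exon signal.

```python
def passes(signal, alpha=0.01):
    """True if a probe set is differentially spliced (MIDAS) and detected
    above background (DABG) in at least one condition."""
    return signal["midas_p"] < alpha and min(signal["dabg_p"].values()) < alpha

def supported_exons(exons, junctions, links, alpha=0.01):
    """Return IDs of exons that pass the filter themselves and have at
    least one connecting junction that also passes it."""
    hits = []
    for exon_id, signal in exons.items():
        if not passes(signal, alpha):
            continue
        connected = [j for j in links.get(exon_id, []) if j in junctions]
        if any(passes(junctions[j], alpha) for j in connected):
            hits.append(exon_id)
    return hits
```

Keeping the per-probe-set test in its own function makes it easy to swap in a different combination of AS signals without touching the exon-junction logic.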
JETTA also visualizes AS signals (Fig. 1B). In
addition, it is integrated with cisGenome Browser (Ji et al., 2008)
to allow the examination of raw signals of exons and junctions.
The software is implemented as an R package as well as standalone
software with a graphical user interface.
ABSTRACT
Summary: High-throughput genome-wide studies of alternatively
spliced mRNA transcripts have become increasingly important in
clinical research. Consequently, easy-to-use software tools are
required to process data from these studies, for example, using
exon and junction arrays. Here, we introduce JETTA, an integrated
software package for the calculation of gene expression indices
as well as the identification and visualization of alternative splicing
events. We demonstrate the software using data of human liver and
muscle samples hybridized on an exon–junction array.
Availability: JETTA and its demonstrations are freely available at
http://igenomed.stanford.edu/~junhee/JETTA/index.html
Contacts:
© The Author 2012. Published by Oxford University Press. All rights reserved. For Permissions, please email:
3 RESULTS AND DISCUSSION
JETTA is demonstrated using exon–junction array data from quadruplicate
human liver and muscle samples (Xu et al., 2011). Low-level data
processing was performed through GCBIN correction,
median-scaling normalization and median-polish summarization.
Alternatively spliced exons between liver and muscle were then
identified through the following steps. First, exon probe sets were
selected with MIDAS P-values <0.01 and DABG P-values <0.01
(in at least one tissue), resulting in 13 150 candidate exons. Second,
significant junction probe sets were identified that satisfied the same
criteria of MIDAS and DABG P-values as above. Third, since
exon probes alone are sometimes not sufficient for reliable analysis
of splicing (Xing et al., 2008), to increase the confidence of the
findings, we retained only those candidate exons supported by
at least one significant connecting junction that corroborated the AS
signal of the exon. This gave a final result of 6461 exons in 2 (...truncated)