JETTA: junction and exon toolkits for transcriptome analysis

BIOINFORMATICS APPLICATIONS NOTE
Genome analysis
Vol. 28 no. 9 2012, pages 1274–1275
doi:10.1093/bioinformatics/bts134
Advance Access publication March 19, 2012

JETTA: junction and exon toolkits for transcriptome analysis

Junhee Seok1,2, Weihong Xu1, Hong Gao1, Ronald W. Davis1 and Wenzhong Xiao1,3,∗
1 Stanford Genome Technology Center, 855 California Street, Palo Alto, CA 94304, 2 Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305 and 3 Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, USA

Associate Editor: Martin Bishop
Received on January 6, 2012; revised on February 22, 2012; accepted on March 13, 2012

1 INTRODUCTION

Recent developments in high-throughput technologies, such as exon–junction arrays (Clark et al., 2007; Xu et al., 2011) and RNA-Seq (Wang et al., 2008), have extended transcriptome studies in biomedical research beyond the scope of gene expression. These genomic platforms enable genome-wide measurement of alternative splicing (AS), the process by which individual exons of the same gene are spliced to produce different isoforms of mRNA transcripts, in clinical samples and animal disease models. This, in turn, requires computational algorithms and software tools for data analysis and visualization.

We have developed an integrated software package, JETTA, that provides a one-stop solution for gene expression and AS analyses of microarray data, from raw data files to visualization. The software provides many options for array normalization, probe selection, background correction, expression index computation, AS detection and data visualization. It can potentially also be used to detect AS from RNA-Seq data. Here, we describe JETTA in an analysis of human liver and muscle tissues assayed on a custom exon–junction GG-H array (Xu et al., 2011).
2 SOFTWARE OVERVIEW

JETTA consists of two major modules: the array calculator and the AS analyzer (Fig. 1A). The array calculator supports commercial and custom exon and exon–junction arrays compatible with the Affymetrix oligonucleotide array design. It calculates the expression indices of genes, exons and junctions from raw probe intensities. The AS analyzer detects and visualizes AS events between conditions. In addition, JETTA takes as input pre-calculated expression indices of individual samples measured by RNA-Seq and performs AS analyses between study groups.

∗To whom correspondence should be addressed.

The array calculator computes the expression indices through the steps of background correction, normalization and summarization. For background modeling, GCBIN (Clark et al., 2007) and MAT (Kapur et al., 2007) are included, both of which use linear models based on probe sequences to estimate the background level. Similarly, quantile (Irizarry et al., 2003) and median-scaling normalization (Kapur et al., 2007) are the two options for normalization. The processed probe signals are then summarized to gene-, exon- or junction-level expression levels using either Li–Wong fitting (Li and Wong, 2001), probe selection (Xing et al., 2006) or median-polish algorithms (Irizarry et al., 2003).

The AS analyzer takes in expression indices from either the output of the array calculator for microarray data or similar results calculated by other tools for RNA-Seq data, such as rSeq (Jiang and Wong, 2009). The analyzer estimates AS signals that include the Splicing Index, i.e. fold changes of each exon and junction normalized to the gene expression level (Clark et al., 2002, 2007), and the corresponding P-values using the algorithms of detection above background (DABG), microarray analysis of differential splicing (MADS) or microarray detection of alternative splicing (MIDAS).
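As an illustration of the Splicing Index described above, the following is a minimal Python sketch, not JETTA's own code. It assumes that expression indices are already on a log2 scale, so that normalizing an exon to its gene becomes a subtraction and the fold change between conditions becomes a difference of group means. The function name `splicing_index` and all sample values are hypothetical.

```python
from statistics import mean


def splicing_index(exon_a, gene_a, exon_b, gene_b):
    """Log2 splicing index of one exon (or junction) between conditions A and B.

    Each argument is a list of log2 expression indices, one value per sample;
    exon_x[i] and gene_x[i] are assumed to come from the same sample.
    """
    # Gene-normalized level per sample: log2(exon / gene) = exon - gene.
    norm_a = [e - g for e, g in zip(exon_a, gene_a)]
    norm_b = [e - g for e, g in zip(exon_b, gene_b)]
    # Difference of group means = log2 fold change of the normalized level.
    return mean(norm_a) - mean(norm_b)


# Hypothetical log2 values for four liver and four muscle samples.
liver_exon = [8.0, 8.2, 7.9, 8.1]
liver_gene = [9.0, 9.1, 9.0, 9.1]
muscle_exon = [6.0, 6.1, 5.9, 6.0]
muscle_gene = [9.0, 9.0, 9.1, 8.9]

si = splicing_index(liver_exon, liver_gene, muscle_exon, muscle_gene)
print(round(si, 2))
```

With these made-up values the gene level is nearly identical in both tissues while the exon level differs, so the index is close to 2 on the log2 scale, i.e. roughly a four-fold higher relative inclusion in liver.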
DABG calculates the probability of the presence of each probe set based on the non-parametric distribution of background probe intensities (Clark et al., 2007), and MADS (Xing et al., 2008) and MIDAS (Affymetrix white paper: Alternative Transcript Analysis Methods for Exon Arrays v1.1) calculate the P-value of differential expression for each exon and junction relative to the corresponding gene expression levels between two conditions. For RNA-Seq data, Splicing Index and MIDAS are calculated, as DABG and MADS are specific to arrays.

The analyzer then detects AS events by allowing custom filtering on one or a combination of the provided AS signals. For example, to reduce false positive detections when analyzing exon–junction arrays, it is helpful to require that each detected event be supported by the signal of the exon as well as that of at least one of its connecting junctions (Xu et al., 2011). JETTA also provides visualization of AS signals (Fig. 1B). In addition, it is integrated with CisGenome Browser (Ji et al., 2008) to allow the examination of raw signals of exons and junctions. The software is implemented as an R package as well as standalone software with a graphical user interface.

ABSTRACT

Summary: High-throughput genome-wide studies of alternatively spliced mRNA transcripts have become increasingly important in clinical research. Consequently, easy-to-use software tools are required to process data from these studies, for example, using exon and junction arrays. Here, we introduce JETTA, an integrated software package for the calculation of gene expression indices as well as the identification and visualization of alternative splicing events. We demonstrate the software using data of human liver and muscle samples hybridized on an exon–junction array.
Availability: JETTA and its demonstrations are freely available at http://igenomed.stanford.edu/~junhee/JETTA/index.html
Contacts:

© The Author 2012. Published by Oxford University Press. All rights reserved.
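The junction-support filter described in Section 2 (an exon is reported only when its own signal and that of at least one connecting junction are significant) can be sketched in Python as follows. This is a hedged illustration, not JETTA's interface: the function `supported_exons`, the dictionary layout, the probe-set IDs and the default cutoff of 0.01 are all assumptions. Each probe set is assumed to carry one MIDAS P-value and one DABG P-value (for example, the smaller of the per-tissue DABG values), and the sketch does not check that the junction signal agrees in direction with the exon signal.

```python
def supported_exons(exon_p, junction_p, exon_junctions, alpha=0.01):
    """Return the sorted IDs of exons that pass the cutoff and have support.

    exon_p and junction_p map probe-set IDs to (midas_p, dabg_p) tuples;
    exon_junctions maps an exon ID to the IDs of its connecting junctions.
    All names and the data layout are hypothetical.
    """
    def passes(p_values):
        midas_p, dabg_p = p_values
        return midas_p < alpha and dabg_p < alpha

    # Junction probe sets that meet the same criteria as the exons.
    significant_junctions = {j for j, p in junction_p.items() if passes(p)}

    # Keep an exon only if it passes and touches a significant junction.
    return sorted(
        e for e, p in exon_p.items()
        if passes(p)
        and any(j in significant_junctions for j in exon_junctions.get(e, ()))
    )


# Made-up example: E1 is supported by J1, E2 only touches the
# non-significant J2, and E3 fails the MIDAS cutoff itself.
exon_p = {"E1": (0.001, 0.002), "E2": (0.004, 0.001), "E3": (0.2, 0.001)}
junction_p = {"J1": (0.003, 0.001), "J2": (0.5, 0.3)}
exon_junctions = {"E1": ["J1", "J2"], "E2": ["J2"], "E3": ["J1"]}

hits = supported_exons(exon_p, junction_p, exon_junctions)
print(hits)
```

Building the set of significant junctions once keeps the per-exon check cheap even when many exons share junctions.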
3 RESULTS AND DISCUSSION

JETTA is demonstrated using exon–junction array data from quadruplicate human liver and muscle samples (Xu et al., 2011). Low-level data processing was performed with GCBIN background correction, median-scaling normalization and median-polish summarization. Alternatively spliced exons between liver and muscle were then identified through the following steps. First, exon probe sets were selected with MIDAS P-values <0.01 and DABG P-values <0.01 (in at least one tissue), resulting in 13 150 candidate exons. Second, significant junction probe sets were identified that satisfied the same MIDAS and DABG P-value criteria as above. Third, since exon probes alone are sometimes not sufficient for reliable analysis of splicing (Xing et al., 2008), we increased the confidence of the findings by selecting, among the candidate exons, those supported by at least one significant connecting junction that corroborated the AS signal of the exon. This gave a final result of 6461 exons in 2 (...truncated)