FIRMA: a method for detection of alternative splicing from exon array data

Bioinformatics, Aug 2008

Motivation: Analyses of EST data show that alternative splicing is much more widespread than once thought. The advent of exon and tiling microarrays means that researchers now have the capacity to experimentally measure alternative splicing on a genome-wide level. New methods are needed to analyze the data from these arrays.

Results: We present a method, finding isoforms using robust multichip analysis (FIRMA), for detecting differential alternative splicing in exon array data. FIRMA has been developed for Affymetrix exon arrays, but could in principle be extended to other exon arrays, tiling arrays or splice junction arrays. We have evaluated the method using simulated data, and have also applied it to two datasets: a panel of 11 human tissues and a set of 10 pairs of matched normal and tumor colon tissue. FIRMA is able to detect exons in several genes confirmed by reverse transcriptase PCR.

Availability: R code implementing our methods is contributed to the package aroma.affymetrix.

Contact: epurdom{at}stat.berkeley.edu

Supplementary information: Supplementary data are available at Bioinformatics online.


E. Purdom (3), K. M. Simpson (1), M. D. Robinson (0, 1), J. G. Conboy (4), A. V. Lapuk (4), T. P. Speed (1, 3)

Associate Editor: David Rocke

(0) Department of Medical Biology, University of Melbourne, Parkville, Victoria 3010, Australia
(1) The Walter and Eliza Hall Institute, 1G Royal Parade, Parkville, Victoria, 3050
(3) Department of Statistics, University of California at Berkeley, 367 Evans Hall #3860, Berkeley, CA 94720-3860, USA
(4) Life Sciences Division, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA

1 INTRODUCTION

Alternative splicing is thought to have several roles in complex organisms, primarily in increasing protein diversity (Maniatis and Tasic, 2002). It can affect the intracellular localization, binding properties or stability of a protein, or regulate its expression via nonsense-mediated decay (NMD) (Stamm et al., 2005).
These events usually occur in a regulated manner, but if an aberrant splicing event occurs, it can be causative for, or symptomatic of, disease. More than 15% of heritable human diseases are known to be associated with mutations in splice sites or in splicing regulatory elements (Matlin et al., 2005). In particular, aberrant pre-mRNA splicing events are known to be implicated in several types of cancer (Brinkman, 2004; Venables, 2004).

Previously thought to be a relatively uncommon phenomenon, alternative splicing has recently been shown to be widespread throughout the genome. Analyses of data on human expressed sequence tags (ESTs) give estimated lower bounds between 35% and 59% for the proportion of genes which have at least one splice variant (Modrek and Lee, 2002). The frequency of functional alternative splicing events is probably lower than this. Several groups have searched for alternative splicing events conserved between human and mouse, and their results suggest that the proportion of functionally alternatively spliced genes is 10% (Sorek et al., 2004; Sugnet et al., 2004; Yeo et al., 2005). A weakness of all EST-based methods is that they are biased towards genes which have greater EST coverage (Modrek and Lee, 2002).

Several kinds of alternative splicing have been observed (see Black, 2003, for a recent review). The most common form is skipping or inclusion of one or more cassette exons (roughly 40-50% of cases based on bioinformatic evidence; Clark and Thanaraj, 2002; Sugnet et al., 2004), these being exons which are wholly present in some transcripts and wholly absent in others. Alternatively, mutually exclusive cassette exon usage can take place; e.g. exon A or exon B forms part of the transcript, but never A and B together (more generally, multiple exons can exhibit mutual exclusivity). Usage of alternative 3' or 5' splice sites can result in shortening or lengthening of an exon.
Other types of alternative splicing that have been observed are alternative promoter usage, alternative polyadenylation sites and intron retention. Additionally, any combination of the above may occur in an alternatively spliced transcript (Black, 2003).

Skipping or inclusion of internal cassette exons is the most common kind of alternative splicing, and possibly the easiest to detect and verify. For this reason, we have focused on identifying specific exons showing patterns of differential alternative expression and have not approached the problem of reconstructing more complicated transcript patterns.

Our algorithm FIRMA has been developed for analyzing the Affymetrix exon array (Affymetrix, Santa Clara, California, USA), which queries the expression level of well-annotated as well as predicted exons. In brief, FIRMA scores each exon as to whether its probes systematically deviate from the expected gene expression level. With a small number of probes per exon (four or fewer), this is a challenging microarray platform to analyze: such deviations can come from a myriad of biological and technical factors unrelated to alternative splicing. We show that FIRMA performs well in detecting exon-specific changes in expression and therefore can contribute substantially to the detection of regulated alternative splicing. Of course, a single scoring method can only be one step in the analysis, and any results must be evaluated in the light of these other complications.

The GeneChip Human Exon 1.0 ST (sense target) array is a whole-genome array, containing over 1.4 million probesets of up to four perfect match (PM) probes each, spread across exons from all known genes, plus a number of additional regions based on other annotation sources, including GENSCAN predictions and ESTs from dbEST. In the design phase, sequences from all the annotation sources were mapped to the July 2003 version of the human genome (UCSC hg16, NCBI 34).
Regions which had some evidence from one or more sources for being transcribed were divided into probe selection regions (PSRs) according to the presence of canonical splice sites, CDS start and stop positions or polyadenylation sites. Probes were then selected from within PSRs >25 bp in length. Each PSR corresponds to a probeset, which generally contains four possibly overlapping probes (sometimes fewer). About a quarter of the probesets are based solely on EST evidence, while another quarter are based solely on GENSCAN predictions (GeneChip Exon Array Design Technical Note, Affymetrix). The array contains only PM probes, with a small number of generic mismatch probes for the purposes of background correction. There are no probes which span exon-exon junctions.

Association of probesets with genes is not made at design time. Instead, these main-design probesets are annotated afterwards, using their alignment to the genome (Exon Probeset Annotations Whitepaper, Affymetrix). This process has been undertaken by Affymetrix, first for NCBI Build 34 of the genome, and more recently for Build 35. The result is (...truncated)
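
As an illustration of the exon-level scoring idea stated in the introduction (checking whether an exon's probes systematically deviate from the expected gene expression level), the following Python sketch may help. It is not the authors' R implementation in aroma.affymetrix. It assumes that the expected gene-level signal is obtained by fitting an additive probe-plus-sample model to log2 intensities with Tukey's median polish, and that an exon's score in a sample is the median residual of its probes. The function names `median_polish_residuals` and `exon_deviation_scores` and the toy data are hypothetical.

```python
from statistics import median


def median_polish_residuals(table, n_iter=10):
    """Residuals of an additive row + column fit (Tukey's median polish).

    `table` is a list of rows (probes), each a list of log2 intensities,
    one per sample. Assumption: this robust fit stands in for the
    gene-level model; the source text does not spell out the fit.
    """
    resid = [list(row) for row in table]
    n_rows, n_cols = len(resid), len(resid[0])
    for _ in range(n_iter):
        # Sweep out row (probe) medians.
        for r in range(n_rows):
            m = median(resid[r])
            resid[r] = [v - m for v in resid[r]]
        # Sweep out column (sample) medians.
        for c in range(n_cols):
            m = median(resid[r][c] for r in range(n_rows))
            for r in range(n_rows):
                resid[r][c] -= m
    return resid


def exon_deviation_scores(log2_intensity, probe_exon):
    """Median residual per exon and sample (hypothetical scoring rule).

    `probe_exon[p]` is the exon label of probe row p.
    Returns {exon: [score for each sample]}.
    """
    resid = median_polish_residuals(log2_intensity)
    n_samples = len(log2_intensity[0])
    scores = {}
    for exon in sorted(set(probe_exon)):
        rows = [resid[p] for p, label in enumerate(probe_exon) if label == exon]
        scores[exon] = [median(row[s] for row in rows) for s in range(n_samples)]
    return scores


# Toy gene (invented data): 3 exons with 4 probes each, 4 samples.
probe_exon = ["E1"] * 4 + ["E2"] * 4 + ["E3"] * 4
probe_effect = [-1.0, -0.5, 0.0, 0.5] * 3
sample_effect = [0.0, 1.0, 2.0, 3.0]
toy = [[8.0 + p + s for s in sample_effect] for p in probe_effect]
for p in range(4, 8):
    toy[p][3] -= 3.0  # exon E2 is "skipped" in the last sample

scores = exon_deviation_scores(toy, probe_exon)
print(scores["E2"])
```

In this noise-free toy example the only nonzero score is that of exon E2 in the last sample (a drop of 3 log2 units); all other exon scores are zero. On real arrays, with at most four probes per exon, such residuals also reflect the technical and biological factors mentioned above, so a score of this kind would be only one step in an analysis.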


This is a preview of a remote PDF: https://bioinformatics.oxfordjournals.org/content/24/15/1707.full.pdf
Article home page: http://bioinformatics.oxfordjournals.org/content/24/15/1707.abstract

E. Purdom, K. M. Simpson, M. D. Robinson, J. G. Conboy, A. V. Lapuk, T.P. Speed. FIRMA: a method for detection of alternative splicing from exon array data, Bioinformatics, 2008, pp. 1707-1714, 24/15, DOI: 10.1093/bioinformatics/btn284