The evolution, impact and properties of exonic splice enhancers

Article PDF: http://genomebiology.com/content/pdf/gb-2013-14-12-r143.pdf

Eva Fernández Cáceres, Laurence D Hurst
Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
Genome Biology

Background: In humans, much of the information specifying splice sites is not at the splice site. Exonic splice enhancers are one of the principal non-splice site motifs. Four high-throughput studies have provided a compendium of motifs that function as exonic splice enhancers, but only one, RESCUE-ESE, has been generally employed to examine the properties of enhancers. Here we consider these four datasets to ask whether there is any consensus on the properties and impacts of exonic splice enhancers.

Results: While only about 1% of all the identified hexamer motifs are common to all analyses, we can define reasonably sized sets that are found in most datasets. We presume that these consensus intersection datasets reflect the true properties of exonic splice enhancers. Given prior evidence for the properties of enhancers and splice-associated mutations, we ask for all datasets whether the exonic splice enhancers considered are purine enriched; enriched near exon boundaries; able to predict trends in relative codon usage; slow evolving at synonymous sites; rare in SNPs; associated with weak splice sites; and enriched near longer introns. While the intersect datasets match expectations, only one original dataset, RESCUE-ESE, does. Unexpectedly, a fully experimental dataset identifies motifs that commonly behave opposite to the consensus, for example, being enriched in exon cores where splice-associated mutations are rare.

Conclusions: Prior analyses that used the RESCUE-ESE set of hexamers captured the properties of consensus exonic splice enhancers. We estimate that at least 4% of synonymous mutations are deleterious owing to an effect on enhancer functioning.
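The idea of a consensus intersection dataset can be illustrated with a minimal sketch. It assumes each study's output is available as a plain set of hexamer strings. The function name `consensus_hexamers`, the dataset labels, and the toy motifs are all hypothetical placeholders, not the published hexamer lists or the authors' own procedure.

```python
from collections import Counter
from itertools import chain


def consensus_hexamers(datasets, min_sets):
    """Return hexamers reported by at least `min_sets` of the given datasets."""
    # Count, for every hexamer, how many datasets contain it.
    counts = Counter(chain.from_iterable(set(s) for s in datasets.values()))
    return {hexamer for hexamer, n in counts.items() if n >= min_sets}


# Toy placeholder motifs (assumption): NOT the real ESE lists.
toy = {
    "set_A": {"GAAGAA", "AAGAAG", "TGAAGA"},
    "set_B": {"GAAGAA", "AAGAAG", "CTGAAG"},
    "set_C": {"GAAGAA", "TGAAGA", "GGAGGA"},
    "set_D": {"GAAGAA", "AAGAAG", "ACGCGT"},
}

in_all = consensus_hexamers(toy, 4)    # common to all four sets
in_most = consensus_hexamers(toy, 3)   # found in at least three sets
union = set().union(*toy.values())     # every hexamer reported anywhere
share_in_all = len(in_all) / len(union)
```

Lowering `min_sets` trades stringency for set size, which mirrors the choice between requiring agreement across all analyses and accepting motifs found in most of them.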
Background

The identification of splice sites in long metazoan transcripts requires more than splice site sequences alone. Indeed, it is estimated that only around 50% of the information specifying splice sites is at the splice site itself [1]. In addition, there are short stretches within the immature RNA that function as either enhancers or suppressors of splicing. These can be within either the exons or the introns. Here we concentrate on the exonic splice enhancers (ESEs). ESE motifs function in part by binding SR proteins to aid exonic splice site recognition [2,3]. In addition, they may function to help retain unspliced pre-mRNAs in the nucleus [4].

We concentrate on ESEs because, unlike exonic splice suppressors [5], ESEs are claimed to have a profound influence on protein and gene evolution [3,5-7]. ESEs are thought to be enriched near splice sites [8-10], potentially explaining why exon ends are slower evolving at both synonymous [6] and non-synonymous sites [5] and why SNP density is lower [10,11]. Closer analysis indeed supports the view that this slower evolution is in large part owing to the impact of purifying selection on ESEs in proximity to exon ends [5-7,10,11]. Consistent with this, SNPs responsible for altered splicing are enriched at exonic ends [10].

ESEs appear to be under particularly strong selective constraint up to 50 to 100 bp from exon ends [11]. As the average human exon is approximately 130 bp, this means that for many human exons all of the exon is exon-end, in the sense that it is a domain in which ESEs can be functional. Given the possibility that ESEs are at a high density (approximately 30% to 40%) at exon ends and that they are evolutionarily conserved, their impact on amino acid and codon usage is of considerable interest to molecular evolutionists.
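The exon-end arithmetic above can be made concrete with a short sketch. It assumes, purely for illustration, that the constrained domain is a fixed flank measured inward from each exon end; the function name `exon_end_fraction` is hypothetical.

```python
def exon_end_fraction(exon_len, flank):
    """Fraction of an exon lying within `flank` bp of either exon end."""
    if exon_len <= 0:
        raise ValueError("exon_len must be positive")
    # Two flanks, one per end; they overlap once 2 * flank >= exon_len.
    return min(1.0, 2 * flank / exon_len)


# Average human exon of about 130 bp, flanks spanning the 50 to 100 bp range.
for flank in (50, 65, 100):
    print(flank, round(exon_end_fraction(130, flank), 2))
```

Under this assumption a 130 bp exon is about 77% exon-end with 50 bp flanks and entirely exon-end once the flank reaches 65 bp, which is why an average-sized exon can be treated as exon-end throughout.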
However, most of the above work comes with a potentially serious caveat: nearly all prior work on the evolutionary impact of ESEs has employed the same set of putative ESEs (for an exception see [10]), namely that deduced by Fairbrother et al. [9,12]. Given that there now exist three other systematic attempts to define sets of putative ESEs, it is relevant to ask whether these sets agree on the properties of ESEs and to what extent they concur as to which hexamers function as ESEs. Considering the sets of hexamers agreed on, at least in part, by the various methods also provides an opportunity to characterise the properties and impact of ESEs. Such issues are not only of interest to the molecular evolutionary community. If, for example, ESEs are under purifying selection, then mutations disrupting ESEs are possible causes of splice-associated diseases [3]. Understanding the commonality of purifying selection on ESEs is then of relevance to medical genetics and diagnostics.

There have been several different approaches to describing sequences that function as ESEs. Early in-depth approaches identified ESEs by looking for splice-altered disease alleles [13], by mutagenesis of mini-genes and by systematic evolution of ligands by exponential enrichment (SELEX) [14-16]. Given the binding motifs of a series of SR proteins, applications such as ESEfinder attempt to identify possible ESEs in any given sample of sequence [17]. We do not consider these analyses but rather concentrate on the four systematic attempts to define ESEs.

The majority of systematic attempts to define ESEs employ computational approaches, confirmed with experimental support. Typically these approaches start with a presumption about the distribution of ESEs and look for the sequences that most strongly show these trends. For example, Fairbrother et al. presumed that ESEs will be enriched in constitutive exons compared with introns and more abundant in exons with weak splice sites than in those with strong splice sites [9,12]. Looking for 6-mers enriched on both of these axes led to a candidate set of ESEs. A similar approach, but one avoiding potential confounding with amino acid coding, was taken by Zhang and Chasin [18]. This group identified motifs enriched in internal non-coding exons of protein coding genes compared to unspliced pseudo-exons and 5′ untranslated regions. Goren et al. [19] took an alternative approach and, supposing that functional ESEs should be slow evolving, looked for motifs that were more conserved than expected at synonymous sites. Combining this with evidence for enrichment compared with background codon usage rates led to the identification of a set of exonic splice regulatory motifs, the majority of which proved on experimental confirmation to be ESEs. While a minority were exonic splice inhibitors, the precise numbers are uncertain, not least because ESEs can also function as exonic splice inhibitors depending on their position and context within the exon [20].

While these three predominantly computational approaches have provided an extensive compendium of ESE sequences, it is possible that they are not exhaustive. Given, too, the possibility of conditional and position dependent effects, recently Ke et al. [20] adopted an experimental high (...truncated)