The evolution, impact and properties of exonic splice enhancers
Cáceres and Hurst Genome Biology
Eva Fernández Cáceres 0
Laurence D Hurst 0
0 Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
Background: In humans, much of the information specifying splice sites is not at the splice site. Exonic splice enhancers are one of the principal non-splice site motifs. Four high-throughput studies have provided a compendium of motifs that function as exonic splice enhancers, but only one, RESCUE-ESE, has been generally employed to examine the properties of enhancers. Here we consider these four datasets to ask whether there is any consensus on the properties and impacts of exonic splice enhancers.
Results: While only about 1% of all the identified hexamer motifs are common to all analyses, we can define reasonably sized sets that are found in most datasets. We presume that these consensus intersection datasets reflect the true properties of exonic splice enhancers. Given prior evidence for the properties of enhancers and splice-associated mutations, we ask for all datasets whether the exonic splice enhancers considered are purine enriched; enriched near exon boundaries; able to predict trends in relative codon usage; slow evolving at synonymous sites; rare in SNPs; associated with weak splice sites; and enriched near longer introns. While the intersect datasets match expectations, only one original dataset, RESCUE-ESE, does. Unexpectedly, a fully experimental dataset identifies motifs that commonly behave opposite to the consensus, for example, being enriched in exon cores where splice-associated mutations are rare.
Conclusions: Prior analyses that used the RESCUE-ESE set of hexamers captured the properties of consensus exonic splice enhancers. We estimate that at least 4% of synonymous mutations are deleterious owing to an effect on enhancer functioning.
Background
The identification of splice sites in long metazoan
transcripts requires more than splice site sequences alone. Indeed, it
is estimated that only around 50% of the information
specifying splice sites is at the splice site itself [1]. In
addition there are short stretches within the immature
RNA that function as either enhancers or suppressors
of splicing. These can be either within the exons or
the introns. Here we concentrate on the exonic splice
enhancers (ESEs). ESE motifs function in part by binding
SR proteins to aid exonic splice site recognition [2,3].
In addition they may function to help retain unspliced
pre-mRNAs in the nucleus [4].
We concentrate on ESEs because, unlike exonic splice
suppressors [5], ESEs are claimed to have a profound
influence on protein and gene evolution [3,5-7]. ESEs are
thought to be enriched near splice sites [8-10], potentially
explaining why exon ends are slower evolving at both
synonymous [6] and non-synonymous sites [5] and why
SNP density is lower [10,11]. Closer analysis indeed
supports the view that this slower evolution is in large part
owing to the impact of purifying selection on ESEs in
proximity to exon ends [5-7,10,11]. Consistent with this,
SNPs responsible for altered splicing are enriched at exonic
ends [10]. ESEs appear to be under particularly strong
selective constraint up to 50 to 100 bp from exon ends
[11]. As the average human exon is approximately 130 bp,
this means that for many human exons all of the exon is
exon-end in the sense that it is a domain in which ESEs
can be functional. Given the possibility that ESEs are at a
high density (approximately 30% to 40%) at exon ends and
that they are evolutionarily conserved, their impact on
amino acid and codon usage is of considerable interest to
molecular evolutionists.
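The arithmetic behind the claim that many human exons are "all exon-end" can be made explicit with a minimal sketch. The function name and the treatment of the constrained zone as a fixed flank measured from each end are illustrative assumptions, not part of the original analysis.

```python
def exon_end_fraction(exon_length: int, flank: int) -> float:
    """Fraction of an exon lying within `flank` bp of either exon end.

    Assumes (for illustration only) that the zone in which ESEs can be
    functional is a fixed-width flank measured from each end of the exon.
    """
    if exon_length <= 0 or flank < 0:
        raise ValueError("exon_length must be positive and flank non-negative")
    # The two flanks overlap once 2 * flank reaches the exon length.
    covered = min(exon_length, 2 * flank)
    return covered / exon_length


# An exon of approximately average length (130 bp), using the
# 50 bp and 100 bp limits mentioned in the text.
for flank in (50, 100):
    print(flank, round(exon_end_fraction(130, flank), 3))
```

With a 130 bp exon, any flank of 65 bp or more places the whole exon within reach of one end or the other.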
However, most of the above work comes with a
potentially serious caveat, in that nearly all prior work on the
evolutionary impact of ESEs has employed the same set of
putative ESEs (for an exception see [10]), namely that deduced by
Fairbrother et al. [9,12]. Given that there now exist three
other systematic attempts to define sets of putative ESEs, it
is relevant to ask whether these sets agree on the properties
of ESEs and to what extent they concur as to which
hexamers function as ESEs. Considering the sets of hexamers
agreed on, at least in part, by the various methods also
provides an opportunity to characterise the properties and
impact of ESEs. Such issues are not only of interest to the
molecular evolutionary community. If, for example, ESEs
are under purifying selection then mutations disrupting
ESEs are possible causes of splice-associated diseases [3].
Understanding the commonality of purifying selection on
ESEs is then of relevance to medical genetics and
diagnostics.
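The notion of hexamers agreed on, at least in part, by several methods can be sketched as a support count across sets. This is a minimal illustration rather than the procedure used here: the function name, the `min_support` parameter, the set labels and the toy hexamers are all hypothetical placeholders, not real ESE data.

```python
from collections import Counter


def consensus_hexamers(datasets: dict[str, set[str]], min_support: int) -> set[str]:
    """Return the motifs present in at least `min_support` of the given sets."""
    support = Counter()
    for motifs in datasets.values():
        # Each set contributes at most one vote per motif.
        support.update(set(motifs))
    return {motif for motif, n in support.items() if n >= min_support}


# Toy placeholder sets (NOT real ESE hexamers).
toy_sets = {
    "set_1": {"AAAAAA", "CCCCCC", "GGGGGG"},
    "set_2": {"AAAAAA", "CCCCCC"},
    "set_3": {"AAAAAA", "GGGGGG"},
    "set_4": {"AAAAAA", "TTTTTT"},
}

in_all_four = consensus_hexamers(toy_sets, min_support=4)
in_at_least_two = consensus_hexamers(toy_sets, min_support=2)
```

Lowering `min_support` trades stringency for set size, which is the tension between motifs common to all analyses and motifs found in most datasets.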
There have been several different approaches to describe
sequences that function as ESEs. Early in-depth approaches
identified ESEs by looking for splice-altered disease alleles
[13], by mutagenesis of mini-genes and by systematic
evolution of ligands by exponential enrichment (SELEX)
[14-16]. Given the binding motifs of a series of SR proteins,
applications such as ESEfinder attempt to identify possible
ESEs in any given sample of sequence [17]. We do not
consider these analyses but rather concentrate on the four
systematic attempts to define ESEs.
The majority of systematic attempts to define ESEs
employ computational approaches, confirmed with
experimental support. Typically these approaches start with a
presumption about the distribution of ESEs and look for
the sequences most enriched in these trends. For example,
Fairbrother et al. presumed that ESEs will be enriched in
constitutive exons compared with introns and more
abundant in exons with weak splice sites than in those
with strong splice sites [9,12]. Looking for 6-mers enriched
on both of these axes led to a candidate set of ESEs. A
similar approach, but one avoiding potential confounding
with amino acid coding, was taken by Zhang and Chasin
[18]. This group identified motifs enriched in internal
non-coding exons of protein coding genes compared to
unspliced pseudo-exons and 5' untranslated regions. Goren
et al. [19] took an alternative approach and, supposing that
functional ESEs should be slow evolving, looked for motifs
that were more conserved than expected at synonymous
sites. Combining this with evidence for enrichment
compared with background codon usage rates led to the
identification of a set of exonic splice regulatory motifs, the
majority of which proved on experimental confirmation to
be ESEs. While a minority were exonic splice inhibitors,
the precise numbers are uncertain not least because ESEs
can also function as exonic splice inhibitors depending on
their position and context within the exon [20].
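The logic of looking for 6-mers enriched on two axes, as described above for the approach of Fairbrother et al., can be illustrated with a simplified sketch. It uses plain differences in relative frequency and a single hypothetical `threshold`, which is an assumption made for brevity and not the published statistic; the function names and toy sequences are likewise illustrative only.

```python
from collections import Counter


def hexamer_frequencies(sequences, k=6):
    """Relative frequency of each overlapping k-mer across a list of sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()} if total else {}


def enriched_on_both_axes(exons, introns, weak_exons, strong_exons, threshold=0.0):
    """Hexamers more frequent in exons than introns AND in exons with weak
    splice sites than in exons with strong splice sites.

    Simplification (assumption): enrichment is a raw frequency difference
    compared against one threshold on both axes.
    """
    f_exon = hexamer_frequencies(exons)
    f_intron = hexamer_frequencies(introns)
    f_weak = hexamer_frequencies(weak_exons)
    f_strong = hexamer_frequencies(strong_exons)
    candidates = set()
    for kmer, freq in f_exon.items():
        exon_vs_intron = freq - f_intron.get(kmer, 0.0)
        weak_vs_strong = f_weak.get(kmer, 0.0) - f_strong.get(kmer, 0.0)
        if exon_vs_intron > threshold and weak_vs_strong > threshold:
            candidates.add(kmer)
    return candidates


# Toy sequences chosen only to exercise the code (not real exons or introns).
toy_candidates = enriched_on_both_axes(
    exons=["GAAGAAGAAG"],
    introns=["TTTTTTTTTT"],
    weak_exons=["GAAGAAGAAG"],
    strong_exons=["CCCCCCCCCC"],
)
```

A hexamer must pass both comparisons to be retained, so a motif that is merely common in exons, for instance because of amino acid coding, is not selected unless it is also associated with weak splice sites.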
While these three predominantly computational
approaches have provided an extensive compendium of ESE
sequences, it is possible that they are not exhaustive. Given,
too, the possibility of conditional and position-dependent
effects, recently Ke et al. [20] adopted an experimental
high (...truncated)