Estimating the prevalence of functional exonic splice regulatory information
Rosina Savisaar 0
Laurence D. Hurst 0
0 The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK
In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1-4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the
assumption that constraint only operates on exon ends, the
conservation-based methods can be overly conservative.
A potentially important insight from the past couple of
decades of work on mammalian genomes has been that
genetic information is not always stored serially, with
different kinds of elements arranged one after the other in
neatly separated compartments (e.g. promoter
compartments, which contain regulatory signals, followed by genic
compartments, which contain coding information, with no
overlaps between different open reading frames). Instead,
our genomes are fundamentally multi-layered: not only
can open reading frames overlap each other (Lazar et al.
1989; Makalowska et al. 2005; Michel et al. 2012;
Miyajima et al. 1989; Sanna et al. 2008; Stallmeyer et al. 1999;
Veeramachaneni et al. 2004), they also routinely overlap
various kinds of regulatory elements (Itzkovitz et al. 2010;
Lin et al. 2011; Shabalina et al. 2013). An example of the
latter would be a microRNA binding site embedded inside
a coding sequence (CDS) (Fang and Rajewski 2011;
Forman et al. 2008; Hausser et al. 2013; Hurst 2006; Lewis
et al. 2005; Liu et al. 2015). Such overlaps imply that the
evolution of CDSs depends not only on selection pressures
related to protein structure and function but also on
selection on overlapping regulatory signals.
Exonic splice enhancers (ESEs) are the class of
regulatory signals whose impact on CDS evolution has been
most thoroughly demonstrated [although other kinds of
non-coding information have also been studied (e.g.
Agoglia and Fraser 2016; Birnbaum et al. 2014; Cakiroglu et al.
2016; Hurst 2006; Itzkovitz et al. 2010; Lin et al. 2011; Liu
et al. 2015; Shabalina et al. 2013; Stergachis et al. 2013;
Warnecke et al. 2008a; Xing and He 2015)]. ESEs are short
RNA motifs that promote the splicing of the exon in which
they are contained. They mostly represent binding sites
for RNA-binding proteins (RBPs) that contact the exonic
regions of the pre-mRNA (Fu and Ares 2014; Lee and Rio
2015). They have been repeatedly shown to be under
purifying selection using both divergence (Cáceres and Hurst
2013; Parmley et al. 2006; Sterne-Weiler et al. 2011) and
population genetic data (Cáceres and Hurst 2013; Carlini
and Genut 2006; Fairbrother et al. 2004; Majewski and Ott
2002), and their disruption can cause disease (e.g. Collin
et al. 2008; Lim et al. 2011; Moseley et al. 2002; Ramser
et al. 2005; Sterne-Weiler et al. 2011; see Wen et al. 2016
for a database of disease-associated synonymous
mutations in general). The effect of ESEs might even extend to
the level of protein structure: evidence suggests that
protein regions where the underlying RNA sequence contains
splice regulatory information have greater rates of
structural disorder (Macossay-Castillo et al. 2014; Pancsa and
Tompa 2016; Smithers et al. 2015). In addition, amino acid
usage in protein regions encoded by exon ends, where
ESE density is highest, appears to be biased by the
underlying ESE presence (Parmley et al. 2007). However, despite
ample evidence that ESEs are functional and do indeed
play a role in CDS evolution, the scale of the phenomenon
remains uncertain. How prevalent is functional exonic
splice regulatory information (enhancing or inhibitory)?
Is the need to preserve (...truncated)