Estimating the prevalence of functional exonic splice regulatory information

Human Genetics, Apr 2017

In addition to coding information, human exons contain sequences necessary for correct splicing. These elements are known to be under purifying selection and their disruption can cause disease. However, the density of functional exonic splicing information remains profoundly uncertain. Several groups have experimentally investigated how mutations at different exonic positions affect splicing. They have found splice information to be distributed widely in exons, with one estimate putting the proportion of splicing-relevant nucleotides at >90%. These results suggest that splicing could place a major pressure on exon evolution. However, analyses of sequence conservation have concluded that the need to preserve splice regulatory signals only slightly constrains exon evolution, with a resulting decrease in the average human rate of synonymous evolution of only 1–4%. Why do these two lines of research come to such different conclusions? Among other reasons, we suggest that the methods are measuring different things: one assays the density of sites that affect splicing, the other the density of sites whose effects on splicing are visible to selection. In addition, the experimental methods typically consider short exons, thereby enriching for nucleotides close to the splice junction, such sites being enriched for splice-control elements. By contrast, in part owing to correction for nucleotide composition biases and to the assumption that constraint only operates on exon ends, the conservation-based methods can be overly conservative.
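The point about short exons can be made concrete with a little arithmetic. The sketch below is not taken from the paper: it assumes, purely for illustration, that splice-control elements are concentrated within a fixed flank of each exon end (the 50 nt value and the exon lengths are hypothetical choices), and computes what share of an exon's nucleotides fall inside those flanks.

```python
def fraction_near_junction(exon_length: int, flank: int) -> float:
    """Fraction of an exon's nucleotides within `flank` nt of either exon end.

    Illustrative model only: `flank` is an assumed window size, not a value
    reported in the paper.
    """
    if exon_length <= 0 or flank < 0:
        raise ValueError("exon_length must be positive and flank non-negative")
    # The two flanks together cover at most the whole exon.
    covered = min(exon_length, 2 * flank)
    return covered / exon_length


if __name__ == "__main__":
    # Hypothetical exon lengths, chosen only to show the trend.
    for length in (100, 150, 300, 1000):
        share = fraction_near_junction(length, flank=50)
        print(f"exon of {length} nt: {share:.2f} of sites lie near a junction")
```

Under this toy model, the shorter the exon, the larger the share of its sites that sit close to a splice junction, which is why a sample of short exons would be expected to overstate the exome-wide density of splicing-relevant nucleotides.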

Rosina Savisaar, Laurence D. Hurst

The Milner Centre for Evolution, Department of Biology and Biochemistry, University of Bath, Bath BA2 7AY, UK

A potentially important insight from the past couple of decades of work on mammalian genomes has been that genetic information is not always stored serially, with different kinds of elements arranged one after the other in neatly separated compartments (e.g. promoter compartments, which contain regulatory signals, followed by genic compartments, which contain coding information, with no overlaps between different open reading frames). Instead, our genomes are fundamentally multi-layered: not only can open reading frames overlap each other (Lazar et al. 1989; Makalowska et al. 2005; Michel et al. 2012; Miyajima et al. 1989; Sanna et al. 2008; Stallmeyer et al. 1999; Veeramachaneni et al. 2004), they also routinely overlap various kinds of regulatory elements (Itzkovitz et al. 2010; Lin et al. 2011; Shabalina et al. 2013). An example of the latter would be a microRNA binding site embedded inside a coding sequence (CDS) (Fang and Rajewski 2011; Forman et al. 2008; Hausser et al. 2013; Hurst 2006; Lewis et al. 2005; Liu et al. 2015). Such overlaps imply that the evolution of CDSs depends not only on selection pressures related to protein structure and function but also on selection on overlapping regulatory signals.

Exonic splice enhancers (ESEs) are the class of regulatory signals whose impact on CDS evolution has been most thoroughly demonstrated [although other kinds of non-coding information have also been studied (e.g. Agoglia and Fraser 2016; Birnbaum et al. 2014; Cakiroglu et al. 2016; Hurst 2006; Itzkovitz et al. 2010; Lin et al. 2011; Liu et al. 2015; Shabalina et al. 2013; Stergachis et al. 2013; Warnecke et al. 2008a; Xing and He 2015)]. ESEs are short RNA motifs that promote the splicing of the exon in which they are contained. They mostly represent binding sites for RNA-binding proteins (RBPs) that contact the exonic regions of the pre-mRNA (Fu and Ares 2014; Lee and Rio 2015).
ESEs have been repeatedly shown to be under purifying selection using both divergence (Cáceres and Hurst 2013; Parmley et al. 2006; Sterne-Weiler et al. 2011) and population genetic data (Cáceres and Hurst 2013; Carlini and Genut 2006; Fairbrother et al. 2004; Majewski and Ott 2002), and their disruption can cause disease (e.g. Collin et al. 2008; Lim et al. 2011; Moseley et al. 2002; Ramser et al. 2005; Sterne-Weiler et al. 2011; see Wen et al. 2016 for a database of disease-associated synonymous mutations in general). The effect of ESEs might even extend to the level of protein structure: evidence suggests that protein regions where the underlying RNA sequence contains splice regulatory information have greater rates of structural disorder (Macossay-Castillo et al. 2014; Pancsa and Tompa 2016; Smithers et al. 2015). In addition, amino acid usage in protein regions encoded by exon ends, where ESE density is highest, appears to be biased by the underlying ESE presence (Parmley et al. 2007). However, despite ample evidence that ESEs are functional and do indeed play a role in CDS evolution, the scale of the phenomenon remains uncertain. How prevalent is functional exonic splice regulatory information (enhancing or inhibitory)? Is the need to preserve (...truncated)


Rosina Savisaar, Laurence D. Hurst. Estimating the prevalence of functional exonic splice regulatory information, Human Genetics, 2017, pp. 1-20, DOI: 10.1007/s00439-017-1798-3