Clusters of Nucleotide Substitutions and Insertion/Deletion Mutations Are Associated with Repeat Sequences

PLoS Biology, Jun 2011

The authors propose that short repeat sequences may play an important role in causing the pervasive clustering of mutations across diverse genomes from prokaryotes to humans.


Citation: McDonald MJ, Wang W-C, Huang H-D, Leu J-Y (2011) Clusters of Nucleotide Substitutions and Insertion/Deletion Mutations Are Associated with Repeat Sequences. PLoS Biol 9(6): e1000622. doi:10.1371/journal.pbio.1000622

Michael J. McDonald, Wei-Chi Wang, Hsien-Da Huang, Jun-Yi Leu

1 Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan; 2 Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan; 3 Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan

Academic Editor: Kenneth H. Wolfe, Trinity College Dublin, Ireland

Abstract

The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.

Funding: MJM was supported by Academia Sinica Postdoctoral Fellowship and JYL was funded by Academia Sinica of Taiwan, Taiwan National Science Council (grant NSC99-2321-B-001-031) and Human Frontier Science Program (grant RGY53/2007); WCW and HDH were funded by the National Science Council (Taiwan) (grants NSC-99-2911-I-009-101, NSC 98-2311-B-009-004, and NSC 99-2627-B-009-003). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Abbreviations: D, nucleotide diversity; Db, background divergence; Indel, insertion/deletion mutation.

These authors contributed equally to this work.

Introduction

A major challenge of evolutionary genetics is to determine the mechanisms underlying cryptic patterns of mutation rate variation and how they influence evolutionary outcomes [1]. One of the most striking of these trends is the association between indel mutations and nucleotide substitutions [2–7]. Inter-species genome comparisons have revealed this trend to be universal to all prokaryotic and eukaryotic genomes examined thus far [4–6]. The prevailing explanation for this association is that indels, as universal mutators [4], cause the accumulation of nucleotide substitutions in the hundreds of base pairs of sequence surrounding the indel [4,6].
Although such studies have been unable to unequivocally determine if the clusters are due to a single multimutational event (multiple mutation hypothesis), the indel per se (the mutagenic indel hypothesis), or the region of sequence in which the indel is found (the regional differences hypothesis), the mutagenic indel hypothesis has been adopted by workers in the field [8–12]. The mechanism of indel mutagenicity proposed by Tian and co-workers is that indels, when heterozygous, cause paired chromosomes to form heteroduplex DNA during meiosis [4]. This is posited to cause error-prone DNA repair systems to target indel-containing regions, leading to an increased likelihood of nucleotide substitution in the sequence surrounding the indel. Over time, this increase in mutation rate is predicted to leave as its signature the clustering of nucleotide substitutions in the DNA surrounding indels, while corresponding non-indel-containing orthologous sequences should have a lower number of substitutions, in accordance with the background substitution rate. In addition, because the proposed mutagenic effect of the indel is postulated to be dependent on its heterozygosity, the accumulation of substitutions should cease as soon as the indel becomes homozygous in the population. These predictions contrast with the regional differences hypothesis; regional effects are predicted to cause both indel and non-indel haplotypes to accumulate substitutions whether the indel is heterozygous or not. The multiple mutations hypothesis differs from both the regional and indel hypotheses in that clusters of mutations are due to a one-off mutation event. Determining whether mutations have accumulated over time or are due to a single mutation event is difficult without the ability to examine indel divergence on a temporal scale. Here we use a population genomics approach to (...truncated)

Author Summary

An intriguing observation made during the comparison of genomes is that insertion and deletion mutations (indels) cluster together with nucleotide substitutions. Two (not mutually exclusive) hypotheses have been proposed to explain this phenomenon. The first postulates that an indel mutation causes an increase in the likelihood of the surrounding sequence incurring nucleotide substitutions, while the second claims that the region of DNA in which such a cluster is located is more likely to sustain both indels and substitutions. Here, we present evidence suggesting that the region of DNA, and not the indel, is associated with the accumulation of clusters of mutations over evolutionary time scales. We find that repeat sequences are closely associated with a large proportion of indels and that the abundance of repeat sequences is linked with regions of increased nucleotide diversity. By analysing molecular data and measuring the mutation rates of genes engineered to contain repeats, we find that the mutation rate can be manipulated by the insertion of long repeat sequences. On the basis of these results, we propose a model in which repeat sequences are prone to cause stalling of the high-fidelity DNA polymerase, leading to the recruitment of error-prone repair polymerases which then replicate the surrounding sequence with a higher-than-average error rate.
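
The link described above between repeat abundance and nucleotide diversity can be illustrated with a small sketch. This is not the authors' pipeline: the function names, the window size, and the repeat definition used here (a unit of 1 to 6 bp present in at least three tandem copies) are assumptions chosen only for illustration. The sketch computes, for each window of an alignment, the average pairwise difference per compared site and the fraction of the first sequence covered by short tandem repeats.

```python
import re
from itertools import combinations


def nucleotide_diversity(seqs):
    """Average pairwise differences per compared site (gap positions skipped)."""
    n_sites = len(seqs[0])
    if any(len(s) != n_sites for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    diffs = 0
    compared = 0
    for a, b in combinations(seqs, 2):
        for x, y in zip(a, b):
            if x == "-" or y == "-":
                continue  # ignore alignment gaps for this pair
            compared += 1
            diffs += x != y
    return diffs / compared if compared else 0.0


def repeat_fraction(seq, max_unit=6, min_copies=3):
    """Fraction of bases inside tandem repeats.

    max_unit and min_copies are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    # Shortest unit of 1..max_unit bases followed by >= min_copies-1 more copies.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_copies - 1))
    covered = sum(m.end() - m.start() for m in pattern.finditer(seq))
    return covered / len(seq)


def window_summary(alignment, window=10):
    """Per-window (start, diversity, repeat fraction of the first sequence)."""
    rows = []
    for start in range(0, len(alignment[0]), window):
        cols = [s[start:start + window] for s in alignment]
        reference = cols[0].replace("-", "")
        rows.append((start, nucleotide_diversity(cols), repeat_fraction(reference)))
    return rows


if __name__ == "__main__":
    # Toy alignment: a repeat-rich window followed by a repeat-free window.
    toy = [
        "ACACACACAC" + "GATTCAGCTA",
        "ACATACACGC" + "GATTCAGCTA",
        "ACACACTCAC" + "GATTCAGCTA",
    ]
    for start, diversity, repeats in window_summary(toy):
        print(start, round(diversity, 3), round(repeats, 3))
```

In the toy alignment the repeat-rich window is also the only one carrying substitutions, which mirrors the kind of association the text describes. A real analysis would need a proper repeat finder, population-scale alignments, and a statistical comparison against background diversity.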


Article home page: http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.1000622

Michael J. McDonald, Wei-Chi Wang, Hsien-Da Huang, Jun-Yi Leu. Clusters of Nucleotide Substitutions and Insertion/Deletion Mutations Are Associated with Repeat Sequences, PLoS Biology, 2011, Volume 9, Issue 6, DOI: 10.1371/journal.pbio.1000622