Clusters of Nucleotide Substitutions and Insertion/Deletion Mutations Are Associated with Repeat Sequences
Leu J-Y (2011) Clusters of Nucleotide Substitutions and Insertion/Deletion Mutations Are Associated with Repeat
Sequences. PLoS Biol 9(6): e1000622. doi:10.1371/journal.pbio.1000622
Michael J. McDonald
Wei-Chi Wang
Hsien-Da Huang
Jun-Yi Leu
Academic Editor: Kenneth H. Wolfe, Trinity College Dublin, Ireland
1 Institute of Molecular Biology, Academia Sinica, Taipei, Taiwan, 2 Institute of Bioinformatics and Systems Biology, National Chiao Tung University, Hsinchu, Taiwan, 3 Department of Biological Science and Technology, National Chiao Tung University, Hsinchu, Taiwan
The genome-sequencing gold rush has facilitated the use of comparative genomics to uncover patterns of genome evolution, although their causal mechanisms remain elusive. One such trend, ubiquitous to prokarya and eukarya, is the association of insertion/deletion mutations (indels) with increases in the nucleotide substitution rate extending over hundreds of base pairs. The prevailing hypothesis is that indels are themselves mutagenic agents. Here, we employ population genomics data from Escherichia coli, Saccharomyces paradoxus, and Drosophila to provide evidence suggesting that it is not the indels per se but the sequence in which indels occur that causes the accumulation of nucleotide substitutions. We found that about two-thirds of indels are closely associated with repeat sequences and that repeat sequence abundance could be used to identify regions of elevated sequence diversity, independently of indels. Moreover, the mutational signature of indel-proximal nucleotide substitutions matches that of error-prone DNA polymerases. We propose that repeat sequences promote an increased probability of replication fork arrest, causing the persistent recruitment of error-prone DNA polymerases to specific sequence regions over evolutionary time scales. Experimental measures of the mutation rates of engineered DNA sequences and analyses of experimentally obtained collections of spontaneous mutations provide molecular evidence supporting our hypothesis. This study uncovers a new role for repeat sequences in genome evolution and provides an explanation of how fine-scale sequence contextual effects influence mutation rates and thereby evolution.
Funding: MJM was supported by an Academia Sinica Postdoctoral Fellowship, and JYL was funded by Academia Sinica of Taiwan, the Taiwan National Science Council
(grant NSC99-2321-B-001-031), and the Human Frontier Science Program (grant RGY53/2007); WCW and HDH were funded by the National Science Council (Taiwan)
(grants NSC-99-2911-I-009-101, NSC 98-2311-B-009-004, and NSC 99-2627-B-009-003). The funders had no role in study design, data collection and analysis,
decision to publish, or preparation of the manuscript.
Competing Interests: The authors have declared that no competing interests exist.
Abbreviations: D, nucleotide diversity; Db, background divergence; Indel, insertion/deletion mutation
These authors contributed equally to this work.
A major challenge of evolutionary genetics is to determine the
mechanisms underlying cryptic patterns of mutation rate variation
and how they influence evolutionary outcomes [1]. One of the
most striking of these trends is the association between indel
mutations and nucleotide substitutions [2-7]. Inter-species
genome comparisons have revealed this trend to be universal to all
prokaryotic and eukaryotic genomes examined thus far [4-6]. The
prevailing explanation for this association is that indels, as
universal mutators [4], cause the accumulation of nucleotide
substitutions in the hundreds of base pairs of sequence surrounding
the indel [4,6]. Although such studies have been unable to
unequivocally determine if the clusters are due to a single
multimutational event (multiple mutation hypothesis), the indel
per se (the mutagenic indel hypothesis), or the region of sequence
in which the indel is found (the regional differences hypothesis),
the mutagenic indel hypothesis has been adopted by workers in the
field [8-12].
The mechanism of indel mutagenicity proposed by Tian and
co-workers is that indels, when heterozygous, cause paired
chromosomes to form heteroduplex DNA during meiosis [4].
This is posited to cause error-prone DNA repair systems to target
indel-containing regions, leading to an increased likelihood of
nucleotide substitution in the sequence surrounding the indel.
Over time, this increase in mutation rate is predicted to leave as its
signature the clustering of nucleotide substitutions in the DNA
surrounding indels, while corresponding non-indel-containing
orthologous sequences should have a lower number of
substitutions, in accordance with the background substitution rate. In
addition, because the proposed mutagenic effect of the indel is
postulated to be dependent on its heterozygosity, the accumulation
of substitutions should cease as soon as the indel becomes
homozygous in the population. These predictions contrast with
the regional differences hypothesis; regional effects are predicted to
cause both indel and non-indel haplotypes to accumulate
substitutions whether the indel is heterozygous or not. The
multiple mutations hypothesis differs from both the regional and
indel hypotheses in that clusters of mutations are due to a one-off
mutation event. Determining whether mutations have
accumulated over time or are due to a single mutation event is difficult
without the ability to examine indel divergence on a temporal
scale.
Author Summary
An intriguing observation made during the comparison of
genomes is that insertion and deletion mutations (indels)
cluster together with nucleotide substitutions. Two (not
mutually exclusive) hypotheses have been proposed to
explain this phenomenon. The first postulates that an indel
mutation causes an increase in the likelihood of the
surrounding sequence incurring nucleotide substitutions,
while the second claims that the region of DNA in which
such a cluster is located is more likely to sustain both
indels and substitutions. Here, we present evidence
suggesting that the region of DNA, and not the indel, is
associated with the accumulation of clusters of mutations
over evolutionary time scales. We find that repeat
sequences are closely associated with a large proportion
of indels and that the abundance of repeat sequences is
linked with regions of increased nucleotide diversity. By
analysing molecular data and measuring the mutation
rates of genes engineered to contain repeats, we find that
the mutation rate can be manipulated by the insertion of
long repeat sequences. On the basis of these results, we
propose a model in which repeat sequences are prone to
cause stalling of the high-fidelity DNA polymerase, leading
to the recruitment of error-prone repair polymerases
which then replicate the surrounding sequence with a
higher-than-average error rate.
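The clustering pattern that these hypotheses seek to explain can be made concrete with a small sketch. The Python below is an illustration only, not the authors' pipeline: the function names, the gap character, and the bin size are assumptions chosen for the example. Given two aligned sequences, it measures the fraction of mismatched sites as a function of distance from the nearest indel.

```python
from collections import Counter

GAP = "-"  # assumed gap symbol in the pairwise alignment


def indel_columns(a, b):
    """Return alignment columns where exactly one sequence has a gap."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if (x == GAP) != (y == GAP)]


def substitution_density(a, b, bin_size=100):
    """Hypothetical helper: mismatch fraction per distance bin.

    Each ungapped column is assigned to a bin according to its distance
    (in alignment columns) from the nearest indel column.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    gaps = indel_columns(a, b)
    if not gaps:
        raise ValueError("alignment contains no indel")
    sites, subs = Counter(), Counter()
    for i, (x, y) in enumerate(zip(a, b)):
        if x == GAP or y == GAP:
            continue  # gapped columns carry no substitution information
        k = min(abs(i - g) for g in gaps) // bin_size
        sites[k] += 1
        if x != y:
            subs[k] += 1
    return {k: subs[k] / sites[k] for k in sorted(sites)}


# Toy alignment with one indel at column 4 and mismatches at columns 1 and 8.
print(substitution_density("ACGT-ACGT", "ATGTAACGA", bin_size=2))
# prints {0: 0.0, 1: 0.25, 2: 0.5}
```

On real data, the association described above would appear as a higher mismatch fraction in the bins closest to the indel. A pairwise profile of this kind cannot by itself separate the three hypotheses, because it carries no information about which haplotype accumulated the substitutions or when.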
Here we use a population genomics approach to (...truncated)