Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins
Open Access
Simon and Hancock 2009, Volume 10, Issue 6, Article R59
Research
Michelle Simon and John M Hancock
Address: Bioinformatics Group, MRC Harwell, Mammalian Genetics Unit, Harwell Science and Innovation Campus, Harwell, Oxfordshire,
OX11 0RD, UK.
Correspondence: John M Hancock. Email:
Published: 1 June 2009
Genome Biology 2009, 10:R59 (doi:10.1186/gb-2009-10-6-r59)
Received: 19 March 2009
Accepted: 1 June 2009
The electronic version of this article is the complete one and can be
found online at http://genomebiology.com/2009/10/6/R59
© 2009 Simon and Hancock; licensee BioMed Central Ltd.
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which
permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Amino acid repeats and disorder
Analysis of amino acid repeats in four mammalian and one bird genome shows that many are associated preferentially with intrinsically unstructured regions.
Abstract
Background: Amino acid repeats (AARs) are common features of protein sequences. They often
evolve rapidly and are involved in a number of human diseases. They also show significant
associations with particular Gene Ontology (GO) functional categories, particularly transcription,
suggesting they play some role in protein function. It has been suggested recently that AARs play a
significant role in the evolution of intrinsically unstructured regions (IURs) of proteins. We
investigate the relationship between the frequency and evolution of AARs and their localization
within proteins, based on a set of 5,815 orthologous proteins from four mammalian genomes (human,
chimpanzee, mouse and rat) and one bird genome (chicken). We consider two classes of AAR: tandem
repeats, and cryptic repeats (regions of proteins containing overrepresentations of short amino
acid repeats).
Results: Mammals show very similar repeat frequencies but chicken shows lower frequencies of
many of the cryptic repeats common in mammals. Regions flanking tandem AARs evolve more
rapidly than the rest of the protein containing the repeat and this phenomenon is more pronounced
for non-conserved repeats than for conserved ones. GO associations are similar to those
previously described for the mammals, but chicken cryptic repeats show fewer significant
associations. Comparing the overlaps of AARs with IURs and protein domains showed that up to
96% of some AAR types are associated preferentially with IURs. However, no more than 15% of
IURs contained an AAR.
Conclusions: Their location within IURs explains many of the evolutionary properties of AARs.
Further study is needed on the types of IURs containing AARs.
Background
Amino acid repeats (AARs) are segments of proteins made up
of simple patterns of amino acids, often strings of a single
amino acid. They have long been recognized to be common
features of eukaryotic proteins [1-4]. Polyglutamine repeats,
the most intensively studied class because of their association
with human diseases such as Huntington's [5], tend to be evolutionarily labile, especially when encoded by pure repeats of
the codon CAG [6,7]. Because of this lability, AARs have often
been considered to be evolutionarily neutral structures [8].
However, a number of experimental studies [9-12] suggest
that AARs play an important role in protein function. Studies
of the functions of AAR-containing proteins also suggest that
they are preferentially found within certain classes of proteins. From the earliest reports through to the most recent
genome-wide surveys in Saccharomyces cerevisiae [3,13,14]
and mammals [15] a consistent pattern of association with
transcription has emerged for the most common tandem
repeat types. Additional associations, notably with protein
kinases [13], suggest possible involvement in cellular signaling networks, which in turn suggests that repeats could play a
significant role in the evolution of such networks [16]. Finally,
studies of the relationship between morphology and repeat
length in dog breeds [17] have shown that variation at repeat
loci can have evolutionarily significant effects on phenotype.
Polyalanine repeats have also been found to be involved in a
number of genetic diseases, in this case involving developmental defects [18]. Removing a polyalanine tract from
murine Hoxd-13 has a direct effect on bone phenotype [19],
again indicating involvement of an AAR in an important biological process.
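The simplest tandem AARs described above, runs of a single amino acid such as polyglutamine or polyalanine tracts, can be located with a short scan of the protein sequence. The following is a minimal sketch only, not the detection method used in this study: the function name, the toy sequence and the minimum run length of five residues are all assumptions made for illustration.

```python
import re
from typing import List, Tuple


def find_homopolymer_runs(protein: str, min_length: int = 5) -> List[Tuple[str, int, int]]:
    """Return (residue, start, end) for every run of a single amino acid.

    Coordinates are 0-based and end-exclusive. min_length is an assumed,
    illustrative threshold, not a value taken from the paper.
    """
    if min_length < 2:
        raise ValueError("min_length must be at least 2")
    # One captured residue followed by at least (min_length - 1) copies of itself.
    pattern = re.compile(r"([A-Z])\1{%d,}" % (min_length - 1))
    return [(m.group(1), m.start(), m.end()) for m in pattern.finditer(protein.upper())]


# Made-up toy sequence containing a polyglutamine and a polyalanine tract.
toy_protein = "MSTQQQQQQQPLAAAAAAGKR"
for residue, start, end in find_homopolymer_runs(toy_protein):
    print(residue, start, end, end - start)
```

A scan of this kind finds only perfect single-residue runs; interrupted tandem repeats and cryptic repeats would need a different, composition-based approach.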
AAR size difference between orthologous human and mouse
proteins correlates with protein nonsynonymous substitution
rate [20]. A study of the factors contributing to the evolutionary expansion of polyglutamine repeats in a limited number
of human-mouse orthologues [21] concluded that labile
repeats, which are encoded by homogeneous runs of a single
codon [6], have a strong tendency to arise in regions of proteins subject to weaker purifying selection than the protein as
a whole, while repeats that are more conserved did not show
this tendency. This has been supported recently by a largescale study of human, mouse and rat repeats [22]. These
observations suggest a model for repeat evolution whereby
initially labile repeats become fixed when they reach some
optimal length range [21]. Human polyglutamine disease
genes might then still be evolving towards such an optimum.
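The distinction drawn above between repeats encoded by homogeneous runs of a single codon and those encoded by mixed codons can be made concrete with a simple purity score. This is a hedged sketch rather than a measure defined in the cited studies: the function name, the definition of purity as the share of the most common codon, and both toy tracts are assumptions. It relies only on the fact that CAG and CAA both encode glutamine.

```python
from collections import Counter


def codon_purity(coding_dna: str) -> float:
    """Fraction of codons in a repeat-encoding tract that match its most common codon.

    A value of 1.0 means the tract is a homogeneous run of one codon.
    This definition is an illustrative assumption, not the paper's measure.
    """
    seq = coding_dna.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("tract length must be a positive multiple of 3")
    # Split the in-frame tract into consecutive codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    most_common_count = Counter(codons).most_common(1)[0][1]
    return most_common_count / len(codons)


# Two made-up tracts that both encode eight glutamines.
pure_tract = "CAG" * 8
interrupted_tract = "CAGCAGCAACAGCAGCAACAGCAG"

print(codon_purity(pure_tract))
print(codon_purity(interrupted_tract))
```

Under this definition the pure CAG tract scores higher than the tract interrupted by CAA codons, although both encode the same polyglutamine repeat at the protein level.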
Intrinsically unstructured regions (IURs), also called disordered regions, are regions of protein, ranging in size from
short loops to complete proteins, that do not form a compact
tertiary structure under normal solvation conditions [23].
They have been suggested to be involved in protein-ligand
binding, including protein-protein interactions, forming
compact structures only when bound to a cognate ligand [24].
Tompa [25] pointed out that many IURs contain AARs and
suggested that IURs may evolve to a considerable extent by
the expansion of such repeats. Disordered proteins - that is,
proteins primarily made up of IURs - have also been suggested to have lower sequence complexity than ordered proteins [26]. Tompa's suggestion [25] would be consistent with
the relatively rapid sequence evolution of many IURs [27,28],
the observation that highly connected (hub) proteins in protein interaction networks appear to be enriched in AARs and
in proteins containing IURs [29], and the suggestion that evolution of AARs could have an effect on network evolution by
altering protein-protein affinities [16]. As Tompa [25] analyzed only a relatively small set of IURs, his hypothesis raises
the question of whether AARs show a preferential location in
IURs, and whether any such preference could account for the
evolutionary properties of the bulk of A (...truncated)