Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins

Jun 2009

Background: Amino acid repeats (AARs) are common features of protein sequences. They often evolve rapidly and are involved in a number of human diseases. They also show significant associations with particular Gene Ontology (GO) functional categories, particularly transcription, suggesting they play some role in protein function. It has been suggested recently that AARs play a significant role in the evolution of intrinsically unstructured regions (IURs) of proteins. We investigate the relationship between AAR frequency and evolution and their localization within proteins based on a set of 5,815 orthologous proteins from four mammalian genomes (human, chimpanzee, mouse and rat) and one bird genome (chicken). We consider two classes of AAR: tandem repeats and cryptic repeats (regions of proteins containing overrepresentations of short amino acid repeats).

Results: Mammals show very similar repeat frequencies, but chicken shows lower frequencies of many of the cryptic repeats common in mammals. Regions flanking tandem AARs evolve more rapidly than the rest of the protein containing the repeat, and this phenomenon is more pronounced for non-conserved repeats than for conserved ones. GO associations are similar to those previously described for the mammals, but chicken cryptic repeats show fewer significant associations. Comparing the overlaps of AARs with IURs and protein domains showed that up to 96% of some AAR types are associated preferentially with IURs. However, no more than 15% of IURs contained an AAR.

Conclusions: The location of AARs within IURs explains many of their evolutionary properties. Further study is needed on the types of IURs containing AARs.


Open Access. Research. Simon and Hancock, 2009, Volume 10, Issue 6, Article R59.

Michelle Simon and John M Hancock

Address: Bioinformatics Group, MRC Harwell, Mammalian Genetics Unit, Harwell Science and Innovation Campus, Harwell, Oxfordshire, OX11 0RD, UK.

Correspondence: John M Hancock.

Received: 19 March 2009. Accepted: 1 June 2009. Published: 1 June 2009.

Genome Biology 2009, 10:R59 (doi:10.1186/gb-2009-10-6-r59)

The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2009/10/6/R59

© 2009 Simon and Hancock; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Amino acid repeats and disorder. Analysis of amino acid repeats in four mammalian and one bird genome shows that many are associated preferentially with intrinsically unstructured regions.

Background

Amino acid repeats (AARs) are segments of proteins made up of simple patterns of amino acids, often strings of a single amino acid. They have long been recognized to be common features of eukaryotic proteins [1-4]. Polyglutamine repeats, the most intensively studied class because of their association with human diseases such as Huntington's [5], tend to be evolutionarily labile, especially when encoded by pure repeats of the codon CAG [6,7]. Because of this lability, AARs have often been considered to be evolutionarily neutral structures [8]. However, a number of experimental studies [9-12] suggest that AARs play an important role in protein function. Studies of the functions of AAR-containing proteins also suggest that they are preferentially found within certain classes of proteins.
From the earliest reports through to the most recent genome-wide surveys in Saccharomyces cerevisiae [3,13,14] and mammals [15], a consistent pattern of association with transcription has emerged for the most common tandem repeat types. Additional associations, notably with protein kinases [13], suggest possible involvement in cellular signaling networks, which in turn suggests that repeats could play a significant role in the evolution of such networks [16]. Finally, studies of the relationship between morphology and repeat length in dog breeds [17] have shown that variation at repeat loci can have evolutionarily significant effects on phenotype.

Polyalanine repeats have also been found to be involved in a number of genetic diseases, in this case involving developmental defects [18]. Removing a polyalanine tract from murine Hoxd-13 has a direct effect on bone phenotype [19], again indicating involvement of an AAR in an important biological process.

AAR size difference between orthologous human and mouse proteins correlates with protein nonsynonymous substitution rate [20]. A study of the factors contributing to the evolutionary expansion of polyglutamine repeats in a limited number of human-mouse orthologues [21] concluded that labile repeats, which are encoded by homogeneous runs of a single codon [6], have a strong tendency to arise in regions of proteins subject to weaker purifying selection than the protein as a whole, while repeats that are more conserved did not show this tendency. This has been supported recently by a large-scale study of human, mouse and rat repeats [22]. These observations suggest a model for repeat evolution whereby initially labile repeats become fixed when they reach some optimal length range [21]. Human polyglutamine disease genes might then still be evolving towards such an optimum.
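The distinction drawn above between repeats encoded by homogeneous runs of a single codon and repeats encoded by mixed codons can be illustrated with a short Python sketch. It is not the method used in this study: the function name `find_tandem_repeats`, the `Repeat` record, the `codon_purity` measure and the minimum run length of five residues are all assumptions introduced here for illustration only.

```python
import re
from typing import List, NamedTuple


class Repeat(NamedTuple):
    residue: str          # single-letter amino acid code of the run
    start: int            # 0-based start position in the protein
    length: int           # number of consecutive identical residues
    codon_purity: float   # fraction of the run encoded by its commonest codon


def find_tandem_repeats(protein: str, cds: str, min_len: int = 5) -> List[Repeat]:
    """Find single-residue runs and score how homogeneous their codons are.

    `min_len` is an illustrative threshold, not a value taken from the paper.
    `cds` is assumed to be the in-frame coding sequence for `protein`.
    """
    if len(cds) < 3 * len(protein):
        raise ValueError("coding sequence is shorter than 3 * protein length")
    # (.)\1{n,} matches one residue followed by at least n more copies of it.
    pattern = re.compile(r"(.)\1{%d,}" % (min_len - 1))
    repeats = []
    for match in pattern.finditer(protein):
        start = match.start()
        length = match.end() - start
        # Collect the codon that encodes each residue in the run.
        codons = [cds[3 * i: 3 * i + 3] for i in range(start, start + length)]
        commonest = max(codons.count(c) for c in set(codons))
        repeats.append(Repeat(match.group(1), start, length, commonest / length))
    return repeats
```

Under these assumptions, a polyglutamine run encoded entirely by CAG, for example `find_tandem_repeats("MAQQQQQQK", "ATGGCC" + "CAG" * 6 + "AAA")`, has a codon purity of 1.0, whereas a run that mixes CAG and CAA codons scores lower. A purity near 1.0 corresponds to the homogeneous codon runs described above as labile.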
Intrinsically unstructured regions (IURs), also called disordered regions, are regions of protein, ranging in size from short loops to complete proteins, that do not form a compact tertiary structure under normal solvation conditions [23]. They have been suggested to be involved in protein-ligand binding, including protein-protein interactions, forming compact structures only when bound to a cognate ligand [24]. Tompa [25] pointed out that many IURs contain AARs and suggested that IURs may evolve to a considerable extent by the expansion of such repeats. Disordered proteins - that is, proteins primarily made up of IURs - have also been suggested to have lower sequence complexity than ordered proteins [26]. Tompa's suggestion [25] would be consistent with the relatively rapid sequence evolution of many IURs [27,28], the observation that highly connected (hub) proteins in protein interaction networks appear to be enriched in AARs and in proteins containing IURs [29], and the suggestion that evolution of AARs could have an effect on network evolution by altering protein-protein affinities [16]. As Tompa [25] analyzed only a relatively small set of IURs, his hypothesis raises the question whether AARs show a preferential location in IURs, and whether any such preference could account for the evolutionary properties of the bulk of A (...truncated)


This is a preview of a remote PDF: http://genomebiology.com/content/pdf/gb-2009-10-6-r59.pdf
Article home page: http://genomebiology.com/2009/10/6/R59

Michelle Simon, John M Hancock. Tandem and cryptic amino acid repeats accumulate in disordered regions of proteins. Genome Biology, 2009, 10:R59. DOI: 10.1186/gb-2009-10-6-r59