Impact of Indels on the Flanking Regions in Structural Domains

Molecular Biology and Evolution, Jan 2011

Amino acid substitutions and insertions/deletions (indels) are two common events in protein evolution; however, current knowledge of indels is limited. In this study, we investigated the effects of indels on the flanking regions in protein structure superfamilies. Comprehensive analysis of Structural Classification of Proteins (SCOP) superfamilies revealed that indels lead to a series of changes in the flanking regions, including the following: 1) structural shift in the tertiary structure, with a first-order exponential decay relation between structural shift and the distance to indels, 2) instability of the secondary structure elements in which parts of the α helix and β sheet are destroyed, and 3) an increase in the amino acid substitution rate of the primary structure and in the nonsimilar amino acid substitution rate. In general, these quality changes are due to the combined effects of the “regional-inherent effect,” “indel-accompanied effect,” and “indel-following effect.” Furthermore, these quality changes reflect changes in selective pressure. Indels are more likely to be preserved in regions with low selective pressure, and indels can further reduce the selective pressure on the flanking regions. These findings improve our understanding of the role of indels in protein evolution.

Full text (PDF): https://mbe.oxfordjournals.org/content/28/1/291.full.pdf


Zheng Zhang, Jie Huang, Zengfang Wang, Lushan Wang, and Peiji Gao

State Key Laboratory of Microbial Technology, Shandong University, Jinan, China

© The Author 2010. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved.

Insertions/deletions (indels) and amino acid substitutions are the most common events in protein evolution; the structural evolution of proteins is considered to be the combined effect of amino acid substitutions and indels (Grishin 2001). Over the past years, many studies have provided substantial insight into amino acid substitutions (Yang et al. 1994; Whelan et al. 2001).
However, indels remain less well understood, and many biological questions about them are still unanswered. Early studies showed that indels occur about one order of magnitude less frequently than amino acid substitutions. Indels are more likely to occur on the protein surface, often in reverse turns or in coils within loops (Pascarella and Argos 1992; Benner et al. 1993). Subsequently, several statistical models of indels were established (Thorne et al. 1991; Mitchison 1999; McGuire et al. 2001; Chang and Benner 2004). Recent studies have shown that ancient domain families favor insertions, whose length has gradually increased during evolution (Wolf et al. 2007). In contrast to deletions, a succession of insertions together with rapid evolution appears to be a realistic route to the emergence of novel protein architectures (Aravind et al. 2002; Blouin et al. 2004). The functional divergence of homologous proteins is also thought to arise from indels in outlying regions of the protein structure (Reeves et al. 2006; Chan et al. 2007; Jiang and Blouin 2007; Chen et al. 2009). Recently, indels in genomes were found to increase the substitution rate of their flanking regions (Tian et al. 2008). Moreover, heterozygosity for an indel was found to be mutagenic to flanking sequences (Dawkins et al. 1999; Longman-Jacobsen et al. 2003). These discoveries suggest that indels and their impact on the flanking regions may play an important role in molecular evolution. In this paper, we report qualitative changes in the flanking regions that arise from indels in protein structural domains. The study is based on homologous superfamilies in the Structural Classification of Proteins (SCOP) database (Andreeva et al. 2008).
Indels and their flanking regions were detected with secondary structure matching (SSM), a widely used structure alignment program (Krissinel and Henrick 2004). The results indicated that indels in proteins lead to structural shifts in the flanking regions and to the disruption of some secondary structure elements. These changes are due to the combined effects of the regional-inherent effect (RIE), the indel-accompanied effect (IDAE), and the indel-following effect (IDFE). We also investigated the effect of indels on the substitution rate of the flanking regions. In contrast to the commonly accepted view, we found that such substitutions could be induced by a reduction in the intensity of selective pressure.

Pairwise Structure Comparison in SCOP Superfamilies

Data on homologous protein superfamilies were obtained from the SCOP (1.73) structural classification database (Andreeva et al. 2008). The samples are all domains in the first five SCOP classes: all alpha proteins, all beta proteins, alpha and beta proteins (α/β), alpha and beta proteins (α+β), and multidomain proteins. From these domains, we used the ASTRAL95 nonredundant structural database (Chandonia et al. 2004), in which any two sequences share less than 95% sequence identity (SI). Pairwise structural alignments were performed within every superfamily containing at least two nonredundant domains, using the SSM online alignment service (http://www.ebi.ac.uk/msd-srv/ssm/). The immunoglobulin superfamily, with 841 nonredundant domains, was excluded because it would have produced too many pairwise alignments. In total, 395,695 structural alignments were performed among 12,573 nonredundant domains from 1,053 superfamilies, and all alignments were downloaded for local analysis.

Extracting Indels and the Flanking Regions

We termed regions that had no matching region in the alignment file gaps.
These regions arise from indels. The aligned regions on both sides of a gap were termed flanking regions; these may have been affected by the indel during evolution. All gaps and flanking regions in the alignments were extracted by a program and saved as separate files. To avoid counting the same sites twice, the aligned region between two neighboring gaps was divided equally into two parts, one assigned to each gap (fig. 1A and B). Pseudo gaps produced by incomplete domains were excluded. This yielded a Gap database of 2,924,980 gap files, which is freely available at http://indel.bioinfo.sdu.edu.cn. Gap files with at least ten aligned sites in the flanking region on at least one side were then selected. Because each flanking region covers only half of the aligned region up to the next gap, no other gap lies within 20 aligned sites on that side, which attenuates the overlapping effect of neighboring gaps. This selection retained 406,000 gap files, which formed the Gap-10 database; in 30,408 of them, the flanking regions on both sides qualified.

Detection of Insertions and Deletions

For some of the gap files in the Gap database, an outgroup test can determine whether the gap is due to an insertion or a deletion; these gaps were termed confirmative insertions and confirmative deletions, respectively. For comparison, this analysis was conducted with two methods.

Method 1

Gaps between proteins of the same family can be classified by analyzing proteins from different families within the same superfamily. Assume that protein a and protein b in figure 1 are from the same family and that protein c is from another family within the same superfamily. If Gap 1 and Gap 2 have the same length and the same position, they quite possibly arise from the same insertion or deletion.
In that case, the gap between protein a and protein b results from a deletion (fig. 1E); insertions can be detected in the same way. Most gaps have many outgroup proteins suitable for this test. To be rigorous, however, a gap was accepted as a confirmative deletion or insertion only if all outgroup proteins gave the same result. This method detected 32,021 confirmative insertions and 23,199 confirmative deletions, of which 13,812 confirmative insertions and 9,937 confirmative deletions were included in the Gap-10 database. These formed the Insertion FA and Deletion FA databases, respectively.

Method 2

Gaps can also be classified by SI. In figure 1, assume that the SI between protein a and protein b is larger than both the SI between protein a and protein c and the SI between protein b and protein c. Again, if Gap 1 and Gap 2 have the same length and the same position, they quite possibly arose from the same insertion or deletion, so this gap may have arisen from a deletion in protein a (fig. 1E). Insertions can be detected in the same way. To be rigorous, a gap was classified as a confirmative insertion or confirmative deletion only when at least 80% of the suitable outgroup proteins gave the same result and more than three such outgroup proteins were available. This method detected 44,235 confirmative insertions and 26,610 confirmative deletions, of which 18,124 confirmative insertions and 10,528 confirmative deletions were included in the Gap-10 database. These formed the Insertion SI and Deletion SI databases, respectively. Some insertions and deletions were detected by both methods: 5,039 confirmative insertions (36% of Insertion FA and 28% of Insertion SI) and 2,921 confirmative deletions (29% of Deletion FA and 28% of Deletion SI) in the Gap-10 database.
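The gap and flanking-region bookkeeping described above (splitting the aligned stretch between two neighboring gaps in half, then keeping sides with at least ten aligned sites for Gap-10) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: the alignment is taken as two equal-length strings with '-' for unmatched columns, gaps touching either end of the alignment are treated as pseudo gaps and dropped, a terminal aligned stretch belongs wholly to its single gap, and the middle site of an odd-length shared stretch is left unassigned. All function names are hypothetical.

```python
"""Hypothetical sketch of gap / flanking-region extraction (not the authors' code)."""

MIN_SITES = 10  # Gap-10 threshold: aligned sites required on a flanking side


def find_gaps(seq_a: str, seq_b: str) -> list:
    """Return (start, end) column ranges, end exclusive, where either row has '-'."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    gaps, start = [], None
    for col, (a, b) in enumerate(zip(seq_a, seq_b)):
        unmatched = a == "-" or b == "-"
        if unmatched and start is None:
            start = col
        elif not unmatched and start is not None:
            gaps.append((start, col))
            start = None
    if start is not None:
        gaps.append((start, len(seq_a)))
    return gaps


def flanking_regions(seq_a: str, seq_b: str) -> list:
    """Length of the left and right flanking region of every internal gap."""
    n = len(seq_a)
    gaps = find_gaps(seq_a, seq_b)
    # Assumption: gaps touching an alignment end are pseudo gaps (incomplete domains).
    lo = gaps[0][1] if gaps and gaps[0][0] == 0 else 0
    hi = gaps[-1][0] if gaps and gaps[-1][1] == n else n
    internal = [g for g in gaps if g[0] > 0 and g[1] < n]
    flanks = []
    for k, (start, end) in enumerate(internal):
        has_prev, has_next = k > 0, k + 1 < len(internal)
        left = start - (internal[k - 1][1] if has_prev else lo)
        right = (internal[k + 1][0] if has_next else hi) - end
        # A stretch shared by two gaps is split equally so no site is counted twice.
        if has_prev:
            left //= 2
        if has_next:
            right //= 2
        flanks.append({"gap": (start, end), "left": left, "right": right})
    return flanks


def gap10(flanks: list) -> list:
    """Keep gaps with at least MIN_SITES aligned sites on at least one side."""
    return [f for f in flanks if f["left"] >= MIN_SITES or f["right"] >= MIN_SITES]


if __name__ == "__main__":
    a = "M" * 25 + "GGG" + "M" * 12 + "-" + "M" * 30
    b = "M" * 25 + "---" + "M" * 12 + "K" + "M" * 30
    for f in flanking_regions(a, b):
        print(f)
```

In the toy alignment, the first gap keeps its whole 25-site left flank, while the 12-site stretch between the two gaps is split into 6 sites per gap, so each gap would enter Gap-10 on its long side only. Halving shared stretches is what guarantees that a side with at least ten sites has no other gap within 20 aligned sites.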
A few gaps were assigned to conflicting categories by the two methods: 201 gap files appeared in both Insertion FA (1.5%) and Deletion SI (1.9%), and 309 gap files appeared in both Insertion SI (1.7%) and Deletion FA (3.1%).

Setting Up of the Control Groups

Some of the alignment files within the same superfamily were divided into three groups according to the following criteria. First, the flanking regions of all confirmative insertions and confirmative deletions formed Set1. Second, for alignments in which neither sequence had the confirmative insertion or deletion at the corresponding position, the corresponding flanking regions were placed in Set0 (fig. 1C). Third, for alignments in which both sequences had the confirmative insertion or deletion at the corresponding position, the corresponding flanking regions were placed in Set2 (fig. 1D). In addition, each alignment in Set0 and Set2 shared one sequence with the corresponding alignment in Set1 (fig. 1E). The flanking regions in Set0 and Set2 correspond to those of Set1; they likewise cover half of the internal region between two adjacent gaps (fig. 1C and D) and contain at least ten aligned sites. The number of flanking regions and the average sequence nonidentity (SNI) of the alignment files in each Set are shown in table 1.

Statistical Analysis

Statistical analysis was limited to the ten residues of each flanking region nearest to the indel. Sites at the same distance from the indel were pooled into the same data set. Because every analyzed flanking region contained at least ten aligned sites, all data sets contained the same number of sites. Based on these data sets, the qualities of the flanking regions (root mean square deviation [RMSD], SNI, nonsimilar substitution [NS]/similar substitution [SS], etc.)
were calculated to obtain the original data (supplementary tables S1-S5, Supplementary Material online). These original data were then normalized. First, to isolate the indel-related effects, background values unrelated to indels were subtracted. Next, because the evolutionary relationships between the proteins differed from Set to Set, all data were weighted relative to Set1. We normalized all data as follows:

$$z_i^{\mathrm{Set}\,j} = \left(x_i^{\mathrm{Set}\,j} - \bar{x}^{\mathrm{Set}\,j}\right)\frac{\bar{x}^{\mathrm{Set}\,1}}{\bar{x}^{\mathrm{Set}\,j}}$$

Here, i denotes a site in the flanking regions, j denotes a Set, z is the normalized value of a given quality, x is the original value of that quality (RMSD, SNI, NS/SS, etc.), and x̄ is its background value, obtained by averaging the values at the eighth, ninth, and tenth sites of the flanking regions.

Results

Structural Shift Model in the Flanking Regions of Indels

Indels add or remove part of the amino acid sequence, which inevitably affects the flanking regions. To characterize this impact, structural alignments were performed among 12,573 nonredundant domains of 1,053 superfamilies in the SCOP (1.73) database to search for indels and their flanking regions (see Methods). Strict quality screening of these indels and their flanking regions identified 406,000 indels with 436,408 flanking regions. Composition analysis showed that in these indel regions, the proportions of hydrophilic residues and of nonstructural elements were much higher than that of hydrophobic residues (supplementary figures S1 and S2, Supplementary Material online). This is consistent with previous findings (Pascarella and Argos 1992; Benner et al. 1993; Chang and Benner 2004; de la et al.
2007) and also confirms that these results are not artifacts of the structure alignment algorithm. Statistical analysis of the flanking regions revealed a structural shift in the flanking regions of indels (fig. 2): the shorter the distance to the indel, the greater the deviation of the Cα atoms (expressed as RMSD) (fig. 2A). The structural shift decreased with the distance S to the indel, and the decrease fitted an exponential decay function:

$$\mathrm{RMSD} = c_1 \exp\left(-\frac{S}{c_2}\right) + c_3$$

where c1, c2, and c3 are empirical parameters. According to this relationship, at most the seven residues nearest to the indels were notably influenced. To investigate how RMSD changes with indel length, the flanking regions were divided into ten groups by indel length (fig. 2B). RMSD follows a first-order exponential decay with S regardless of indel length, although the structural shift for single-residue indels is notably smaller than for longer indels. These results show that indels are notably correlated with structural shifts in the flanking regions. On the one hand, because adjacent residues are highly interdependent, indels may directly cause the structural shift of the flanking regions. On the other hand, the relationship may be indirect. Compared with noncoding DNA, coding regions are under higher selective pressure (Clark et al. 2007; de la et al. 2007). As a consequence, most indels are preserved in regions on the hydrophilic protein surface that have no recognizable secondary structure (Pascarella and Argos 1992; Benner et al. 1993; Chang and Benner 2004).
These regions contribute relatively little to the structural stability of proteins. Moreover, they are more flexible, and their positions can easily change during protein evolution. Therefore, to explore the mechanisms by which indels affect the flanking regions, we designed a control experiment.

Three Factors Leading to the Structural Shift of the Flanking Region

In SCOP superfamilies, we used two methods to detect insertions and deletions, termed confirmative insertions and confirmative deletions (see Methods). All confirmative insertions and confirmative deletions were grouped in Set1, which captures the differences in the flanking regions between proteins with and without indels. In addition, further alignments in the superfamilies were selected on the basis of the confirmative insertions or confirmative deletions. In Set0 alignments, neither sequence has the specific insertion or deletion, so Set0 represents the difference between homologous proteins without indels. Conversely, in Set2 alignments, both sequences have the specific insertion or deletion, so Set2 represents the difference between homologous proteins after the indel occurred. Figure 3 shows the structural shift in the flanking regions of indels for Set0, Set1, and Set2. All data were normalized to eliminate background differences (see Methods). The two detection methods gave almost the same results (fig. 3A-D), confirming that the results are not artifacts of the analytical method. The structural shift in the flanking regions was much larger in Set1 than in Set0. This held for both confirmative insertions and confirmative deletions, indicating that in homologous protein superfamilies, structural shift is notably larger in proteins with indels than in proteins without indels.
This indicates that indels can induce structural shifts in the flanking regions, that is, that the relationship is causal. We then compared Set2 with Set0, and here confirmative insertions and confirmative deletions behaved differently. For confirmative insertions (fig. 3A and C), the Set2 and Set0 curves were almost identical, indicating that the structural shift was similar in proteins with and without the insertion. For the three sites nearest to confirmative deletions (fig. 3B and D), however, the RMSD of Set2 was notably larger than that of Set0, indicating a larger structural shift in proteins with deletions than in proteins without deletions. Thus, deletions readily led to structural shifts in the flanking regions during evolution, whereas insertions did not. Set0 alone also showed a structural shift in the flanking regions (fig. 3): the shift increased as the distance to the indel decreased, suggesting that, under selective pressure, indels are more likely to be preserved in regions that shift easily (i.e., more flexible regions). It is therefore also possible that reduced selective pressure indirectly leads to the structural shift of the flanking regions. Overall, the structural shift in the flanking regions of indels (fig. 2) can be explained by the combined effects of three factors: the inherent quality differences in the flanking regions, termed RIE; changes in the flanking regions that accompany the occurrence of indels, termed IDAE; and lasting effects in the flanking regions after the occurrence of indels (especially deletions), termed IDFE.
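The first-order exponential decay relation between RMSD and the distance S to the indel (fig. 2), RMSD = c1·exp(−S/c2) + c3, can be fitted without a nonlinear optimizer: for a fixed c2 the model is linear in c1 and c3, so a grid search over c2 combined with ordinary least squares for c1 and c3 suffices for a sketch. The paper does not state its fitting procedure, so this is an illustration, not the authors' method. It also assumes, hypothetically, that per-site RMSD is the root mean square of the Cα deviations pooled over all flanking regions at the same distance (S = 1 for the site nearest the indel). Function names and the grid range are assumptions.

```python
import math


def per_site_rmsd(deviations):
    """Pool Cα deviations (Å) by distance: deviations[k][s] is flank k, site s+1."""
    n_sites = min(len(flank) for flank in deviations)
    return [
        math.sqrt(sum(flank[s] ** 2 for flank in deviations) / len(deviations))
        for s in range(n_sites)
    ]


def fit_exp_decay(dist, y, c2_grid=None):
    """Fit y ≈ c1 * exp(-S / c2) + c3 and return (c1, c2, c3).

    For each candidate c2 the model is linear in c1 and c3, which are found by
    solving the 2x2 least-squares normal equations; the c2 with the smallest
    residual sum of squares is kept.
    """
    if c2_grid is None:
        c2_grid = [k / 10 for k in range(1, 201)]  # 0.1 to 20.0 residues
    n = len(dist)
    best = None
    for c2 in c2_grid:
        u = [math.exp(-s / c2) for s in dist]
        su, suu = sum(u), sum(v * v for v in u)
        sy, suy = sum(y), sum(v * w for v, w in zip(u, y))
        det = n * suu - su * su
        if abs(det) < 1e-12:  # u nearly constant: c1 and c3 not identifiable
            continue
        c1 = (n * suy - su * sy) / det
        c3 = (suu * sy - su * suy) / det
        sse = sum((c1 * v + c3 - w) ** 2 for v, w in zip(u, y))
        if best is None or sse < best[0]:
            best = (sse, c1, c2, c3)
    if best is None:
        raise ValueError("no identifiable fit on this grid")
    return best[1], best[2], best[3]


if __name__ == "__main__":
    # Synthetic, noise-free data (not from the paper) with c1=1.5, c2=2.0, c3=0.4.
    S = list(range(1, 11))
    rmsd = [1.5 * math.exp(-s / 2.0) + 0.4 for s in S]
    print(fit_exp_decay(S, rmsd))
```

With real, noisy per-site values a finer grid (or a refinement step around the best c2) would be advisable; the grid keeps the sketch dependency-free.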
Estimating RIE, IDAE, and IDFE

Although the combined effects of RIE, IDAE, and IDFE on the flanking regions are easy to analyze qualitatively, it is difficult to separate the individual effects and estimate them quantitatively. We estimated RIE, IDAE, and IDFE using the normalized data of Set0, Set1, and Set2, from which the indel-unrelated background and the differences due to evolutionary distance had already been removed. In Set0, neither protein had the indel, so the indel-related quality differences reflect RIE only. In Set1, only one protein had the indel, so the indel-related quality differences reflect the RIE of each of the two flanking regions plus the IDAE and IDFE of the flanking region of the insertion or deletion. In Set2, both proteins carry the same indel, so IDAE is not visible in the alignment, and the indel-related quality differences reflect the RIE and IDFE of the flanking regions. Therefore, the normalized data in Set0 contain two RIE terms; those in Set1 contain two RIE terms, one IDAE term, and one IDFE term; and those in Set2 contain two RIE terms and two IDFE terms. The three effects can thus be estimated as follows:

$$\mathrm{RIE}_i = \frac{z_i^{\mathrm{Set}\,0}}{2}, \qquad \mathrm{IDFE}_i = \frac{z_i^{\mathrm{Set}\,2} - z_i^{\mathrm{Set}\,0}}{2}, \qquad \mathrm{IDAE}_i = z_i^{\mathrm{Set}\,1} - \frac{z_i^{\mathrm{Set}\,0} + z_i^{\mathrm{Set}\,2}}{2}$$

Here, i denotes a site in the flanking regions and z denotes the normalized value of a given quality (RMSD, SNI, NS/SS, etc.). We estimated the RIE, IDAE, and IDFE contributing to the structural shift in the flanking regions of confirmative insertions and confirmative deletions (fig. 4). For confirmative insertions, the effect of IDAE on the flanking regions was notably larger than that of RIE, and the effect of IDFE was approximately 0. For confirmative deletions, IDAE was the primary effect, and the effects of IDFE and RIE were quite similar.
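The normalization from the Methods and the three estimators amount to a few lines of per-site arithmetic. The sketch below subtracts the background (mean of sites 8 to 10), weights by the ratio of the Set1 background to the Set background, and then applies the estimators, including the IDAE expression implied by the Set1 composition (two RIE, one IDAE, one IDFE). The function names and the one-list-per-Set data layout are hypothetical; each list holds one quality (for example RMSD) for sites 1 to 10.

```python
def background(values):
    """Background value: mean of the 8th, 9th, and 10th sites (list index 7-9)."""
    return sum(values[7:10]) / 3


def normalize(x_set, x_set1):
    """z_i = (x_i - bg(Set j)) * bg(Set1) / bg(Set j) for every site i."""
    bg, bg1 = background(x_set), background(x_set1)
    return [(x - bg) * bg1 / bg for x in x_set]


def decompose(z0, z1, z2):
    """Split normalized Set0/Set1/Set2 values into per-site RIE, IDAE, IDFE.

    Composition assumed from the text: Set0 = 2 RIE,
    Set1 = 2 RIE + IDAE + IDFE, Set2 = 2 RIE + 2 IDFE.
    """
    rie = [a / 2 for a in z0]
    idfe = [(c - a) / 2 for a, c in zip(z0, z2)]
    idae = [b - (a + c) / 2 for a, b, c in zip(z0, z1, z2)]
    return rie, idae, idfe


if __name__ == "__main__":
    # Synthetic per-site effects (not the paper's data), sites 1-3 only.
    rie, idae, idfe = [0.2, 0.1, 0.05], [0.8, 0.4, 0.1], [0.1, 0.05, 0.0]
    z0 = [2 * r for r in rie]
    z1 = [2 * r + a + f for r, a, f in zip(rie, idae, idfe)]
    z2 = [2 * r + 2 * f for r, f in zip(rie, idfe)]
    print(decompose(z0, z1, z2))
```

Building z0, z1, and z2 from known effects and recovering them is a quick consistency check that the three estimators invert the assumed Set compositions exactly.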
All these effects fitted a first-order exponential decay curve (eq. 3; table 2), except for the effect of IDFE on confirmative insertions, which was almost negligible. Overall, the insertion-related structural shift in the flanking regions was due to approximately 19% RIE and 81% IDAE, and the deletion-related structural shift to 13% RIE, 73% IDAE, and 14% IDFE.

Potential of Indels to Alter the Secondary Structure Elements of Their Flanking Regions

As discussed above, indels lead to structural shifts in the flanking regions. These shifts are accompanied by changes in hydrogen bonding between residues, which in turn alter the secondary structure elements (α helix and β-pleated sheet) stabilized by inter-residue hydrogen bonds. The effects of indels on secondary structure elements are shown by data from Set0, Set1, and Set2 (fig. 5). Both confirmative insertions and confirmative deletions notably increased the α helix change rate (fig. 5A and B) and the β-pleated sheet change rate (fig. 5C and D) of their flanking regions, consistent with the expectation that indels destabilize the secondary structure elements of their flanking regions. As with structural shift, the secondary structure changes were also influenced by RIE, IDAE, and IDFE (fig. 5). IDAE had the primary effect on the secondary structure change rate, and this effect was confined to at most four sites around the indels. Moreover, IDAE affected the β-pleated sheet change rate more than the α helix change rate. IDFE notably affected the secondary structure elements of the three sites nearest to deletions. For insertions, IDFE had no effect on the α helix change rate of the flanking regions and notably affected the β-pleated sheet change rate only at the first site.
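The α helix and β-pleated sheet change rates, and the change directions summarized in figure 6, can be tallied site by site from paired secondary structure assignments. The paper does not spell out its exact definition, so the sketch below assumes one: states are DSSP-style codes ('H' helix, 'E' strand, 'C' other), each flanking region is a list of (state in the protein without the indel, state in the protein with the indel) pairs ordered from the indel outward, and the change rate of a state at a site is the fraction of pairs involving that state in which only one residue has it. All names are hypothetical.

```python
from collections import Counter


def change_rate(flanks, state, site):
    """Fraction of pairs at `site` (0 = nearest the indel) involving `state`
    in which exactly one of the two residues is in `state`."""
    involved = changed = 0
    for pairs in flanks:
        if site >= len(pairs):
            continue
        before, after = pairs[site]
        if state in (before, after):
            involved += 1
            if before != after:
                changed += 1
    return changed / involved if involved else float("nan")


def transitions(flanks, site):
    """Count directed state changes (before, after) at `site`, e.g. ('H', 'C')."""
    counts = Counter()
    for pairs in flanks:
        if site < len(pairs) and pairs[site][0] != pairs[site][1]:
            counts[pairs[site]] += 1
    return counts


if __name__ == "__main__":
    flanks = [
        [("H", "C"), ("H", "H")],
        [("H", "H"), ("E", "C")],
        [("C", "H"), ("E", "E")],
    ]
    print(change_rate(flanks, "H", 0), transitions(flanks, 0))
```

Keeping the pairs ordered (without indel, with indel) is what makes the transition counts directional, so that helix-to-coil and coil-to-helix changes can be reported separately.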
RIE also affected the secondary structure change rate of the flanking regions, especially for insertions. We further analyzed the direction of secondary structure changes around the indels (fig. 6). For confirmative insertions (fig. 6A), insertions disrupted most secondary structure elements (α helix and β-pleated sheet) of the flanking regions, converting them into nonsecondary structures, whereas a small number of nonsecondary structures changed into α helices. For confirmative deletions (fig. 6B), more secondary structure elements were destroyed and converted into nonsecondary structures than for insertions, and fewer nonsecondary structures were converted into secondary structures. A new β-pleated sheet rarely formed in the flanking regions after insertions or deletions. These results reveal that in protein structures, the secondary structure elements of the flanking regions are destroyed as a direct consequence of indels, and that deletions destroy these elements more strongly and more persistently than insertions do.

Impact of Indels on Amino Acid Substitution in the Flanking Regions

A recent study on genomes revealed that indels increase the base substitution rate in their flanking regions, which is supported by the observation that heterozygosity for an indel is mutagenic (Tian et al. 2008). A weak but real relationship between tertiary structure and molecular evolution has also been demonstrated (Choi et al. 2007). Therefore, in this study, we examined the effect of indels on the substitution rate of the flanking regions using structure-based sequence alignments within homologous superfamilies.
We observed that the sequence divergence of the flanking regions of indels was also affected by the combined effects of RIE, IDAE, and IDFE (fig. 7). For SNI (fig. 7A and B), the effect of IDAE on the flanking regions was notable, indicating that indels are accompanied by an increase in the amino acid substitution rate of the flanking regions, as has also been observed in genomes. Moreover, at the three sites nearest to deletions, IDFE also increased SNI, indicating that deletions have lasting effects that raise the amino acid substitution rate. This parallels the effect of deletions on the structure of the flanking regions described above. We also calculated the ratio of nonsimilar amino acid substitutions (NS) to similar amino acid substitutions (SS) in the flanking regions (fig. 7C and D); the NS/SS ratio is likely to be associated with the level of selective pressure. NS and SS were defined by the structural and chemical similarity of the aligned residues obtained from SSM. The effects of IDAE on the NS/SS ratio of the flanking regions were notably larger than those of RIE. Additionally, at the site nearest to the indels, the IDAE of insertions was notably larger than that of deletions. This indicates that, compared with spontaneous amino acid substitutions in the flanking regions, NSs account for a higher proportion of the substitutions that accompany indels (especially insertions). Furthermore, at the three flanking sites nearest to deletions, IDFE also increased the NS/SS ratio. This indicates that after deletions occurred in protein structures, NSs were more readily retained in the flanking regions, reflecting a decrease in selective pressure.
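Per-site SNI and NS/SS can be computed in the same site-by-site way. SSM's own similarity criterion is not reproduced here; as an explicit, hypothetical stand-in, the sketch counts two different residues as a similar substitution (SS) when they fall in the same coarse physicochemical group defined below, and as a nonsimilar substitution (NS) otherwise. The grouping, the data layout (a list of aligned residue pairs per flanking region, nearest site first), and the function name are illustrative assumptions only.

```python
import math

# Hypothetical physicochemical groups standing in for SSM's similarity criterion.
GROUPS = ("AVLIMC", "FWY", "STNQ", "DE", "KRH", "GP")
GROUP_OF = {aa: g for g, members in enumerate(GROUPS) for aa in members}


def site_divergence(flanks, site):
    """Return (SNI, NS/SS) at `site` (0 = nearest the indel).

    flanks: list of flanking regions, each a list of aligned residue pairs.
    """
    total = subs = ns = ss = 0
    for pairs in flanks:
        if site >= len(pairs):
            continue
        a, b = pairs[site]
        total += 1
        if a == b:
            continue  # identical residues: no substitution
        subs += 1
        if a in GROUP_OF and GROUP_OF.get(b) == GROUP_OF[a]:
            ss += 1
        else:
            ns += 1
    sni = subs / total if total else math.nan
    # NS/SS is undefined (nan) without substitutions and infinite with NS only.
    ns_ss = ns / ss if ss else (math.inf if ns else math.nan)
    return sni, ns_ss


if __name__ == "__main__":
    flanks = [[("L", "I")], [("L", "D")], [("K", "K")], [("S", "T")], [("F", "E")]]
    print(site_divergence(flanks, 0))
```

Swapping in a different similarity criterion only requires replacing `GROUP_OF`; the per-site pooling matches the Statistical Analysis description, in which sites at the same distance from the indel form one data set.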
These results reveal that indels in protein structures make amino acid substitutions more likely in the flanking regions. However, this phenomenon does not rely solely on the mutagenic property of indels in the genome, because mutations in protein structures are also subject to strong selective pressure. Likewise, indels in protein structures increased the NS/SS ratio of their flanking regions, reflecting that at the protein level, reduced selective pressure on the flanking regions is also an important factor affecting the amino acid substitution rate.

Effects of RIE, IDAE, and IDFE on the Flanking Regions

Compared with amino acid substitutions, indels are generally considered to play a primary role in protein structure evolution (Grishin 2001; Hormozdiari et al. 2009). Moreover, insertions are believed to contribute more than deletions to the structural and functional divergence of proteins (Jiang and Blouin 2007; Wolf et al. 2007). In this study, we found three kinds of changes in the flanking regions of indels (structural shift in the tertiary structure, changes in secondary structure elements, and amino acid sequence divergence), all resulting from the combined effects of RIE, IDAE, and IDFE. These effects behaved similarly across the different qualities of the flanking regions. For example, all of them decreased as the distance to the indel increased; IDAE had the largest effects, whereas IDFE affected only the flanking regions of deletions. This suggests an intrinsic connection between the different qualities of the flanking regions. The effect of RIE on the qualities of the flanking regions may arise from selective pressure against indels in protein structures.
Indels in protein structures are more easily preserved in regions that shift easily, because the geometric constraints imposed by secondary structure are weak in such regions and amino acid substitutions, especially NSs, are also easily preserved there. Such regions are often nonsecondary structures on the protein's hydrophilic surface. Moreover, RIE has a greater impact on the flanking regions of insertions than on those of deletions, indicating that insertions are more easily preserved on the protein surface. IDAE is the primary factor driving quality changes in the flanking regions. Indels are often accompanied by structural shift and secondary structure damage. This may be due to tension release: because indels add or remove a region of the protein sequence, the structure must be locally rebuilt to accommodate the change. Furthermore, more NSs are preserved when indels occur, which further supports a link between these substitutions and indels. This may reflect reconstruction compensation: indels accompanied by more NSs are more easily preserved because these NSs can compensate for the structural disruption caused by tension release. Insertions appear to depend more on reconstruction compensation than deletions do. IDFE affects only the flanking regions of deletions and has lasting effects on their qualities. Because deletions remove a region of the protein structure, they may lead to larger structural shifts in the flanking regions during evolution and may also disrupt secondary structure elements to some extent. The resulting decrease in the structural constraints on the flanking regions accompanies a decrease in selective pressure and further increases the probability that amino acid substitutions (particularly NSs) are preserved. Although IDFE has a smaller effect than IDAE on the flanking regions, its effect on proteins is lasting.
The Selective Pressure of the Flanking Regions of Indels
In genomes, it has been proposed that indels increase the substitution rate of nearby bases (Tian et al. 2008). Indel heterozygosity is expected to affect local chromosome pairing during meiosis and might target the region for mutational repair (Bonneau and Longy 2000; Silva and Kondrashov 2002). Indels in protein-coding regions are subject to different levels of selective pressure depending on their structural impact on the amino acid sequence (de la et al. 2007). In protein structural domains, the increases in tertiary structure shift, secondary structure element damage, and amino acid substitution in the flanking regions of indels reflect a decrease in selective pressure. Because these qualities of the flanking regions are affected by RIE, IDAE, and IDFE, the decrease in selective pressure can likewise be regarded as the combined effect of these three factors. RIE reflects the fact that indels are more easily preserved in regions with low selective pressure, whereas IDAE reflects a marked decrease in the selective pressure on the flanking regions when indels occur. IDFE reflects a slight decrease in selective pressure after deletions occur. A domain whose structural core has drifted is unlikely to remain stable and functional (Jiang and Blouin 2007). The functional divergence of homologous structures is based primarily on structural changes, accompanied by quality changes at certain sites. Functional changes in proteins therefore often arise from the combined effects of indels and substitutions. If indels can increase the amino acid substitution rate in their flanking regions, especially the NS rate, then indels and function-altering substitutions are quite likely to occur together.
Therefore, if function-altering mutations do not occur independently, the chance of producing new functions increases, which accelerates the adaptation of gene products to the environment.
In contrast to traditional studies that concentrated on the indels themselves (Pascarella and Argos 1992; Benner et al. 1993), this study focuses on the impact of indels on their flanking regions. In proteins, indels are associated with a series of quality changes in the flanking regions, including the following: 1) structural shift in the tertiary structure, with a first-order exponential decay relation between structural shift and the distance to indels; 2) instability of the secondary structure elements, in which parts of the α helix and β sheet are destroyed; and 3) an increase in the amino acid substitution rate of the primary structure and in the NS rate. By comparing the situation before and after the occurrence of indels, we found that all these qualities of the flanking regions were affected by three factors: RIE, IDAE, and IDFE. RIE reflects inherent quality differences in the flanking regions. IDAE accompanies the occurrence of indels and is the primary factor driving changes in the flanking regions. IDFE reflects lasting effects on the flanking regions and generally affects only the flanking regions of deletions. The increased mutation rate in the flanking regions of indels previously observed in genomes is also observed in proteins, which experience high levels of selective pressure. Interestingly, the NS/SS ratio of the flanking regions of indels also increases, indicating that at the protein level indels can decrease the selective pressure on their flanking regions. In summary, these results show that indels actively influence the qualities of their flanking regions, which improves our understanding of the role of indels in protein evolution.
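The first-order exponential decay relation between structural shift and distance to indels is commonly written as shift(d) = y0 + A·exp(-d/t), where d is the distance to the indel and t is the decay constant. This section does not give the exact parameterization used in the paper, so that form is an assumption here. The sketch below fits it in plain Python. It scans candidate values of t (the grid range and the name `fit_exp_decay` are hypothetical) and, for each t, solves for y0 and A by ordinary least squares, because the model is linear in those two parameters once t is fixed.

```python
import math


def fit_exp_decay(dists, shifts, t_grid=None):
    """Fit shift(d) = y0 + A * exp(-d / t) to (distance, shift) pairs.

    For a fixed decay constant t the model is linear in y0 and A, so each
    candidate t is solved by ordinary least squares and the t with the
    smallest residual sum of squares is kept.
    """
    if len(dists) != len(shifts) or len(dists) < 3:
        raise ValueError("need at least three paired observations")
    if t_grid is None:
        t_grid = [0.1 * k for k in range(1, 201)]  # assumed range: 0.1 to 20 residues
    n = len(dists)
    sy = sum(shifts)
    best = None
    for t in t_grid:
        e = [math.exp(-d / t) for d in dists]
        se = sum(e)
        see = sum(v * v for v in e)
        sey = sum(v * y for v, y in zip(e, shifts))
        det = n * see - se * se
        if abs(det) < 1e-12:
            continue  # basis nearly collinear for this t; skip it
        a = (n * sey - se * sy) / det
        y0 = (sy - a * se) / n
        sse = sum((y0 + a * v - y) ** 2 for v, y in zip(e, shifts))
        if best is None or sse < best[0]:
            best = (sse, y0, a, t)
    if best is None:
        raise ValueError("no usable decay constant in t_grid")
    _, y0, a, t = best
    return y0, a, t


if __name__ == "__main__":
    # Noise-free synthetic data (y0 = 0.5, A = 2.0, t = 3.0), not data from the study.
    d = list(range(1, 11))
    s = [0.5 + 2.0 * math.exp(-x / 3.0) for x in d]
    print(fit_exp_decay(d, s))
```

A grid search keeps the example dependency-free. A general nonlinear optimizer would be the usual choice on real, noisy per-distance means, and the resolution of `t_grid` limits how precisely t is recovered.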
Supplementary Material
Supplementary tables S1-S5 and Supplementary figures S1-S2 are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).
We thank Cai Jiang and Ting Peng for their helpful discussions and Yangyang Liu for programming support. We also thank Martin Kreitman, Jeff Thorne (a reviewer), and an anonymous reviewer for their comments and suggestions. This work was supported by the National Natural Science Foundation of China, Nos. 30870044 and 30970092.
References



Zheng Zhang, Jie Huang, Zengfang Wang, Lushan Wang, Peiji Gao. Impact of Indels on the Flanking Regions in Structural Domains, Molecular Biology and Evolution, 2011, 291-301, DOI: 10.1093/molbev/msq196