Understanding human disease mutations through the use of interspecific genetic variation

https://hmg.oxfordjournals.org/content/10/21/2319.full.pdf

Mark P. Miller, Sudhir Kumar

Department of Biology, Arizona State University, Tempe, AZ 85287-1501, USA

Data on replacement mutations in genes of disease patients exist in a variety of online resources. In addition, genome sequencing projects and individual gene sequencing efforts have led to the identification of disease gene homologs in diverse metazoan species. The availability of these two types of information provides unique opportunities to investigate factors that are important in the development of genetically based disease by contrasting long- and short-term molecular evolutionary patterns. We therefore analyzed disease-associated human genetic variation in seven disease genes: the cystic fibrosis transmembrane conductance regulator, glucose-6-phosphate dehydrogenase, the neural cell adhesion molecule L1, phenylalanine hydroxylase, paired box 6, the X-linked retinoschisis gene and TSC2/tuberin. Our analyses indicate that disease mutations show clear patterns when examined from an evolutionary perspective. Human replacement mutations resulting in disease are overabundant at the amino acid positions most conserved throughout the long-term history of metazoans. In contrast, human polymorphic replacement mutations and silent mutations are randomly distributed across sites with respect to the level of conservation of amino acid sites within genes. Furthermore, disease-causing amino acid changes are of types usually not observed among species. Using Grantham's chemical difference matrix, we find that amino acid changes observed in disease patients are far more radical than the variation found among species and in non-diseased humans. Overall, our results demonstrate the usefulness of evolutionary analyses for understanding patterns of human disease mutations and underscore the biomedical significance of the sequence data currently being generated by various model organism genome sequencing projects.
One central purpose of genome sequencing projects is to bring about a better understanding of the genetics of disease and to assist with the identification of disease-associated genes (1–3). However, many human mutation databases containing genetic variation found in disease patients already exist, and new databases and database entries are accumulating rapidly (4,5). Analyzing these two types of information together provides unique opportunities to identify intrinsic attributes of disease-associated human genetic variation, leading to a better understanding of the relationship between mutations and the development of disease phenotypes. The information contained in alignments of homologous disease-associated genes has long been recognized as important for understanding contemporary deleterious genetic variation in humans (4,6). For example, in a given set of homologous genes, a large fraction of amino acid sites will be conserved even among distantly related species that diverged hundreds of millions of years ago. Variants that arose at such positions during evolutionary history have evidently been removed from populations by strong purifying selection, suggesting that the existing amino acid residues at invariant positions are critical for proper gene function. Thus, interspecific alignments can point to residues in gene products that are likely to produce disease if mutated in humans. Conversely, some positions in protein sequences vary among species, and such variable sites may indicate positions that are under less severe selective constraint. These variable positions mark sites where residue changes are tolerated by natural selection and provide insights into the types of amino acids that can be exchanged without negatively affecting protein function.
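The distinction between invariant and variable alignment positions can be made concrete with a short sketch. It assumes a gapped multiple alignment of homologous protein sequences is already available as equal-length strings. The toy sequences and the helper names `site_variability` and `classify_sites` are hypothetical and used only for illustration. Each column is scored by how many distinct amino acids it contains: a score of 1 marks an invariant site, and larger scores mark sites where substitutions have been tolerated.

```python
# Sketch: score each aligned amino acid position by how variable it is across species.
# Assumes the alignment is a list of equal-length strings with "-" marking gaps.


def site_variability(alignment, gap="-"):
    """Return the number of distinct residues at each alignment column.

    Gaps are treated as missing data, so a column with no residues scores 0.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    return [len({seq[i] for seq in alignment if seq[i] != gap}) for i in range(length)]


def classify_sites(alignment, gap="-"):
    """Label each column 'invariant' (1 residue type), 'variable' (2+) or 'gapped' (0)."""
    labels = []
    for n_types in site_variability(alignment, gap):
        if n_types == 0:
            labels.append("gapped")
        elif n_types == 1:
            labels.append("invariant")
        else:
            labels.append("variable")
    return labels


# Hypothetical toy alignment of three homologs (not real sequence data).
toy_alignment = ["MKVLA", "MKILA", "MRV-A"]
print(site_variability(toy_alignment))  # [1, 2, 2, 1, 1]
print(classify_sites(toy_alignment))
```

Treating gaps as missing data, rather than as a 21st residue type, is a design choice of this sketch. Under that choice, the fourth column counts as invariant even though one species lacks a residue there.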
Because researchers often use this logic to judge whether an observed amino acid change is likely to produce disease in humans (6–10), we conducted a study to evaluate directly the extent to which interspecific sequence alignments reveal common attributes of the deleterious mutations observed in humans. We performed three types of analyses using disease mutation data and homologous gene sequences from seven disease-associated genes (Table 1 and Fig. 1): cystic fibrosis transmembrane conductance regulator (CFTR), glucose-6-phosphate dehydrogenase (G6PD), neural cell adhesion molecule L1 (L1CAM), phenylalanine hydroxylase (PAH), paired box 6 (PAX6), the X-linked retinoschisis gene and a gene associated with tuberous sclerosis (TSC2). First, we determined the association between the prevalence of disease mutations and the extent to which the corresponding amino acid sites have been conserved in other species throughout the evolutionary history of metazoans. Second, we compared the frequency of each type of amino acid change in disease patients with the frequencies obtained from interspecific comparisons. Finally, we compared the chemical property differences of amino acid changes seen among species and in non-diseased humans with those observed in disease patients.

Table 1. Number of mutations analyzed (disease/polymorphic/silent) [a]

[a] Disease mutations refer to those amino acid changes that produce a disease phenotype. Polymorphic mutations are amino acid changes that are presumably not disease related. Silent mutations are DNA sequence changes that do not alter the encoded amino acid.
[b] The database analyzed contained 48 type I mutations that result in chronic non-spherocytic hemolytic anemia and 62 less severe type II, III or IV mutations.
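The second and third analyses can be sketched in the same spirit. The code below assumes each amino acid change is recorded as a pair of one-letter residue codes. It treats pairs as unordered, because a difference between two species has no known direction without an ancestral reconstruction. The distance table passed to `mean_distance` stands in for Grantham's chemical difference matrix. The numbers in the example are hypothetical placeholders, not Grantham's published values, and all function names are illustrative.

```python
from collections import Counter

# Sketch: compare the kinds of amino acid changes seen in disease patients with
# those seen among species, and compare their average chemical distance.


def pair_key(a, b):
    """Unordered key for a residue pair: ('W', 'R') and ('R', 'W') both map to ('R', 'W')."""
    return tuple(sorted((a, b)))


def change_frequencies(changes):
    """Relative frequency of each change type; identical pairs are not replacements and are skipped."""
    counts = Counter(pair_key(a, b) for a, b in changes if a != b)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()} if total else {}


def fraction_seen_among_species(disease_changes, species_changes):
    """Fraction of disease replacements whose change type also occurs among species."""
    seen = {pair_key(a, b) for a, b in species_changes if a != b}
    disease = [pair_key(a, b) for a, b in disease_changes if a != b]
    return sum(pair in seen for pair in disease) / len(disease) if disease else 0.0


def mean_distance(changes, distance):
    """Average chemical distance of replacements; `distance` maps unordered pairs to values."""
    values = [distance[pair_key(a, b)] for a, b in changes if a != b]
    return sum(values) / len(values) if values else 0.0


# Hypothetical placeholder distances (NOT Grantham's values; load the real matrix instead).
toy_distance = {("I", "L"): 1.0, ("K", "R"): 2.0, ("R", "W"): 10.0}
disease = [("R", "W"), ("W", "R"), ("L", "I")]
species = [("K", "R"), ("I", "L")]

print(change_frequencies(disease))                    # R<->W makes up 2/3 of disease changes
print(fraction_seen_among_species(disease, species))  # only the I<->L change is shared: 0.333...
print(mean_distance(disease, toy_distance), mean_distance(species, toy_distance))  # 7.0 1.5
```

In practice, these functions would be applied separately to each gene's disease, polymorphic and interspecific change lists, so that the comparisons stay within a gene.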
RESULTS AND DISCUSSION

The association of disease mutations and evolutionarily conserved amino acid residues

A null hypothesis describing the distribution of human genetic variation among amino acid sites in a gene can be generated by assuming that point mutations occur randomly throughout that gene. If a set of mutations found in a population reflects this random mutational process, then the number of mutations observed at a given type of site should be proportional to the frequency with which sites of that type occur in the sequence. Using information from interspecific comparisons, we tested the null hypothesis that disease-associated replacement mutations are randomly distributed among classes of amino acid sites defined by their variability among extant metazoans. This analysis permits a direct assessment of the claim that disease mutations are more common at evolutionarily conserved residues. If the null hypothesis of random association is not rejected for a set of disease mutations, then there is no evidence that mutations at conserved sites are more important than those at variable sites for the development of the disease phenotype. In contrast, analyses will illustrate the importance of replacement mutations at conserved sites if the null hypothesis is rejected due to an overabundance of disease mutations at conserved positions and a (...truncated)
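Under this null hypothesis, the expected number of disease mutations in each site class is the total number of mutations multiplied by the fraction of sites in that class. The following minimal sketch of the comparison assumes a Pearson chi-square goodness-of-fit test; the specific test used in the study is not shown in this excerpt. It uses hypothetical helper names and toy counts. The p-value helper covers only the two-class case (one degree of freedom). With more site classes, `scipy.stats.chisquare` computes the same statistic together with its p-value.

```python
import math
from collections import Counter

# Sketch: test whether mutations fall into site classes in proportion to how
# common each class is in the gene (the "random placement" null hypothesis).


def expected_counts(site_classes, n_mutations):
    """Expected mutations per class if every site is equally likely to be hit."""
    class_sizes = Counter(site_classes)
    n_sites = len(site_classes)
    return {cls: n_mutations * size / n_sites for cls, size in class_sizes.items()}


def chi_square(observed, expected):
    """Pearson chi-square statistic summed over the classes in `expected`."""
    return sum((observed.get(cls, 0) - exp) ** 2 / exp for cls, exp in expected.items())


def p_value_one_df(statistic):
    """Upper-tail p-value for a chi-square with 1 degree of freedom (two classes only)."""
    return math.erfc(math.sqrt(statistic / 2))


# Hypothetical gene: 60 invariant and 40 variable sites, one class label per position.
site_classes = ["invariant"] * 60 + ["variable"] * 40
# Hypothetical positions of 50 disease mutations: 45 at invariant, 5 at variable sites.
mutated_positions = list(range(45)) + list(range(60, 65))

observed = Counter(site_classes[pos] for pos in mutated_positions)
expected = expected_counts(site_classes, len(mutated_positions))
stat = chi_square(observed, expected)
print(expected)                    # {'invariant': 30.0, 'variable': 20.0}
print(stat, p_value_one_df(stat))  # 18.75 and a p-value well below 0.001
```

A significant statistic only shows that the distribution departs from random placement. Comparing `observed` with `expected` shows the direction, which in this toy example is an excess of mutations at invariant sites.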