Evolutionary and Structural Analyses of GDAP1, Involved in Charcot-Marie-Tooth Disease, Characterize a Novel Class of Glutathione Transferase-Related Genes

Molecular Biology and Evolution, Jan 2004

Mutations in the Ganglioside-induced differentiation-associated protein-1 (GDAP1) gene cause autosomal recessive Charcot-Marie-Tooth disease type 4A. The protein encoded by GDAP1 shows clear similarity to glutathione transferases (also known as glutathione S-transferases or GSTs). The human genome contains a paralog of GDAP1 called GDAP1L1. Using comparative genomics, we show that orthologs of GDAP1 and GDAP1L1 are found in mammals, birds, amphibians, and fishes. Likely orthologs of those genes in invertebrates and a low but consistent similarity with some plant and eubacterial genes have also been found. We demonstrate that GDAP1 and GDAP1L1 do not belong to any of the known classes of GST genes. In addition to having distinctive sequences, GDAP1 and its relatives are also characterized by an extended region in GST domain II, absent in most other GSTs, and by a C-terminal end predicted to contain transmembrane domains. Mutations affecting any of those characteristic domains are known to cause Charcot-Marie-Tooth disease. These features define the GDAP1 class of GST-like proteins.


Antonio Marco, Ana Cuesta, Laia Pedrola, Francesc Palau, and Ignacio Marín

Departamento de Genética, Universidad de Valencia, Valencia, Spain; and Laboratory of Genetics and Molecular Medicine, Instituto de Biomedicina, Consejo Superior de Investigaciones Científicas, Valencia, Spain

Introduction

Charcot-Marie-Tooth (CMT) disease is the name of a group of common hereditary neuropathies of heterogeneous genetic origin. They are characterized by slow and progressive weakness and atrophy of muscles, primarily affecting peroneal and distal leg muscles and later those in arms and forearms, often accompanied by distal sensory deficits and foot deformities (reviewed in Kuhlenbäumer et al. 2002). CMT neuropathies are generally divided into two main classes.
On one hand, there are CMT syndromes associated with axon demyelination that are characterized by slow motor nerve conduction velocities (demyelinating CMT or CMT1). On the other hand, CMT can also be caused by axonal loss (axonal CMT or CMT2). Most cases of CMT follow an autosomal dominant or X-linked dominant type of inheritance. Autosomal-recessive CMT neuropathies are much less common (Kuhlenbäumer et al. 2002).

In recent years, a substantial number of genes have been characterized whose mutations lead to the different types of CMT disease (reviewed in Berger, Young, and Suter 2002; and Shy, Garbern, and Kamholz 2002). The analysis of families with members that suffer from autosomal-recessive CMT disease has allowed the determination of at least 10 loci involved in those rare pathologies (Berger, Young, and Suter 2002). Several of the genes that correspond to those loci have already been characterized. In particular, mutations of one of those genes, called Ganglioside-induced differentiation-associated protein 1 (GDAP1 [Liu et al. 1999]), were shown to generate either axonal (Cuesta et al. 2002; Sevilla et al. 2003) or demyelinating (Baxter et al. 2002) autosomal-recessive CMT type 4A (CMT4A) neuropathy. Additional mutations in GDAP1 have recently been shown to generate intermediate phenotypes, where both axonal and demyelinating anomalies are observed (Nelis et al. 2002; Azzedine et al. 2003; Boerkoel et al. 2003; De Sandre-Giovannoli et al. 2003; Senderek et al. 2003).

GDAP1 protein has sequence similarity with glutathione S-transferases (GSTs) (Cuesta et al. 2002; Baxter et al. 2002). GSTs belong to many different classes, each defined by its members sharing high sequence similarity as well as common structural and functional features (reviewed in Sheehan et al. 2001; and Sherratt and Hayes 2002).
However, we had already detected that the GDAP1 sequence was different enough from those of known GST classes to suggest that GDAP1 could belong to a novel, still undescribed, class (Cuesta et al. 2002). In addition to its distinctive primary sequence, GDAP1 protein shows structural features that are absent in canonical GSTs. Thus, two or three carboxyl-terminal transmembrane domains were predicted in GDAP1 protein (Baxter et al. 2002; Cuesta et al. 2002), whereas most GSTs are cytosolic enzymes and lack those domains. However, the precise characterization of the relationships among GDAP1 genes and GSTs is a complex task because GSTs are very numerous and highly heterogeneous. Thus, only a careful comparison with all the GST variants can establish whether GDAP1 can truly be considered a member of a novel class of GST-like genes.

In this study, we performed a comprehensive analysis of all known classes of GSTs to determine the origin and evolution of the GDAP1 gene. We demonstrate that GDAP1 belongs to an ancient, clearly defined monophyletic group of genes, distinct from all other GSTs. In addition, precise structural analyses provide clues about the function of GDAP1 and other related proteins and the significance of known human GDAP1 mutations in a biochemical context.

Materials and Methods

There are thousands of GST sequences available in public databases. This large number precludes performing exhaustive analyses in which all sequences are compiled. We thus used alternative strategies for building a comprehensive database of significant GST proteins. We started by selecting single members of each of the known GST classes and, using their sequences as queries, performing systematic TBlastN (Altschul et al. 1997) database searches against the National Center for Biotechnology Information (NCBI) databases (online at http://www.ncbi.nlm.nih.gov/).
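The idea of repeating database searches until they become saturated can be pictured as a fixed-point expansion: every newly found sequence is used as a query in turn, and the process stops when a round returns nothing new. The sketch below is only an illustration of that idea, not the procedure actually used in this study. The function `search_to_saturation`, the `search` callable, and the toy hit table are hypothetical names introduced here, and a real analysis would replace `toy_search` with an actual similarity search plus a similarity cutoff.

```python
from typing import Callable, Iterable, Set


def search_to_saturation(seeds: Iterable[str],
                         search: Callable[[str], Set[str]],
                         max_rounds: int = 50) -> Set[str]:
    """Use every newly found sequence as a query until no new hits appear."""
    found = set(seeds)
    frontier = set(seeds)
    for _ in range(max_rounds):
        new_hits: Set[str] = set()
        for query in frontier:
            # Keep only hits that were not already collected.
            new_hits |= search(query) - found
        if not new_hits:
            break  # saturated: this round added nothing
        found |= new_hits
        frontier = new_hits
    return found


# Hypothetical stand-in for a database search (illustration only).
TOY_HITS = {
    "seed_mu": {"mu_2", "pi_1"},
    "mu_2": {"seed_mu", "alpha_1"},
    "pi_1": {"mu_2"},
    "alpha_1": set(),
}


def toy_search(query: str) -> Set[str]:
    return TOY_HITS.get(query, set())


result = search_to_saturation(["seed_mu"], toy_search)
print(sorted(result))
```

The `max_rounds` guard is a design choice of this sketch: it bounds the loop in case a permissive search keeps pulling in ever more distant sequences.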
Once those searches became saturated, we selected some representative members of each class, characterized by having a very high sequence similarity. We also included those sequences found in multiple searches with a low similarity to those of the main GST classes. Those sequences could belong to still undescribed classes of the GST protein superfamily. In parallel, using the protein sequence of human GDAP1 as query, we performed exhaustive searches against all databases available at the NCBI Web site and those compiled in the GOLD database (http://igweb.integratedgenomics.com/GOLD/) to detect all the proteins most similar to human GDAP1 in other species. Only a few additional sequences with high similarity to GDAP1 were detected, all of them in the databases generated for the genome projects of model vertebrate species that are still in progress, such as the fish Takifugu rubripes or the bird Gallus gallus.

Results obtained from each of those searches were merged, and protein sequences were aligned using ClustalX version 1.83 (Thompson et al. 1997). We then generated a preliminary phylogenetic tree using the Neighbor-Joining (NJ [Saitou and Nei 1987]) routine available in ClustalX 1.83. Once that primary tree was obtained, duplicates and partial sequences were detected and eliminated. We kept only a few partial sequences because of their relevance to this study, as they are clearly very similar to human GDAP1 (those sequences are detailed below). In some cases, only one or a few sequences were found together in an independent branch outside of the known GST families. Those sequences were then reanalyzed, generating new TBlastN searches, to determine whether additional members of a previously undetected small family or subfamily could be found.

We were unable to generate a reliable alignment of canonical GSTs or GDAP1 proteins with members of two highly dissimilar classes of GSTs, namely the mitochondrial kappa class (Pemble, Wardle, and Taylor 1996; Jowsey et al. 2003) and the microsomal class (Morgenstern, Guthenberg, and DePierre 1982). Those classes were already known to be extremely different from the rest (Snyder and Maddison 1997). In particular, microsomal GSTs are now included in a different superfamily of proteins known as MAPEG (membrane-associated proteins in eicosanoid and glutathione metabolism [Jakobsson et al. 1999; Hayes and Strange 2000]). Therefore, members of those two classes were excluded from our analyses.

Our final database was generated in April 2003 and contained 289 sequences. We then used the most conserved region of those sequences, which includes most of GST domains I and II (Hayes and Pulford 1995) and corresponds to amino acids 26 to 283 of the 358-amino acid human GDAP1 protein (fig. 1), to generate a final multiple-protein alignment, again using ClustalX 1.83. Alignments were then manually corrected using GeneDoc version 2.6 (Nicholas and Nicholas 1997). In some ambiguous cases, new Blast searches were performed to establish the most likely alignment for particular pairs or groups of sequences.

Phylogenetic trees were obtained by both the NJ and maximum-parsimony (MP) methods, using the routines available in MEGA version 2.1 (Kumar et al. 2001). For NJ, sites with gaps were included and Kimura's correction was used, whereas for MP, the parameters were as follows: (1) all sites included, (2) 10 initial randomly generated trees used as seeds, and (3) heuristic search using close-neighbor interchange with search level equal to 3. Support for the topologies obtained with those two methods was determined using bootstrap with 1,000 replicates. To check the robustness of our results, we performed a second type of analysis using, instead of the full-length GST core, only the two most conserved regions found in all the GSTs included in our final multiple-protein alignment (corresponding to amino acids 26 to 101 and 216 to 283 in human GDAP1 protein; see fig. 1 for details).
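Restricting an alignment to regions defined by residue numbers of one reference protein requires translating those residue numbers into alignment columns, because gaps shift the two coordinate systems apart. The following is a minimal sketch of that bookkeeping, assuming the alignment is held as a dictionary of equal-length gapped strings. The function names, the `ref_offset` parameter, and the toy alignment are hypothetical and introduced only for illustration; the two residue ranges in `GDAP1_CONSERVED` are the ones quoted above.

```python
# Residue ranges (inclusive, human GDAP1 numbering) quoted in the text.
GDAP1_CONSERVED = [(26, 101), (216, 283)]


def residue_to_column(aligned_ref: str, gap: str = "-") -> dict:
    """Map 1-based residue indices of the aligned reference to 0-based columns."""
    mapping = {}
    residue = 0
    for col, ch in enumerate(aligned_ref):
        if ch != gap:
            residue += 1
            mapping[residue] = col
    return mapping


def extract_regions(alignment: dict, ref_name: str, regions, ref_offset: int = 1) -> dict:
    """Keep only the columns spanned by each residue range of the reference.

    ref_offset is the residue number (in the full-length protein) of the first
    reference residue present in the alignment, e.g. 26 if the alignment
    starts at residue 26 of the reference.
    """
    mapping = residue_to_column(alignment[ref_name])
    cols = []
    for start, end in regions:
        first = mapping[start - ref_offset + 1]
        last = mapping[end - ref_offset + 1]
        # Columns where the reference has a gap inside the range are kept too.
        cols.extend(range(first, last + 1))
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


# Toy alignment (invented sequences, not real GST data).
toy = {
    "ref":   "MK-TLLAG",
    "other": "MRSTL-AG",
}
sub = extract_regions(toy, "ref", [(2, 3), (6, 7)])
print(sub)
```

For an alignment of the GST core that begins at residue 26 of human GDAP1, the call would look like `extract_regions(alignment, ref_name, GDAP1_CONSERVED, ref_offset=26)`, where `ref_name` is whatever label the reference sequence carries in the alignment.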
These conserved regions were used to generate NJ and MP phylogenetic trees using the same procedures as those for the full-length GST core trees.

To predict the secondary structure of GDAP1 and its relatives, we used the SSPRO (Pollastri et al. 2002), SOPM (Geourjon and Deleage 1994), and Prof version 1.0 (Ouali and King 2000) programs. Three-dimensional structure was predicted using Swiss-Model (Peitsch 1996; online at http://swissmodel.expasy.org/), with the known structure of the GST3 protein of Zea mays (phi class; Protein Data Bank [PDB] code 1aw9 [Neuefeind et al. 1997]) as template. Swiss-Pdb viewer version 3.7 (Guex and Peitsch 1997) was used to generate a three-dimensional image.

We also thoroughly characterized the transmembrane regions present in GDAP1 and its relatives. We combined the results of three programs devised to predict whether transmembrane regions are present in protein sequences: (1) TMpred (Hofmann and Stoffel 1993 [available at http://searchlauncher.bcm.tmc.edu/]); (2) TMHMM version 2.0 (Krogh et al. 2001 [available at http://www.cbs.dtu.dk/services/TMHMM]); and (3) HMMTOP version 2.0 (Tusnady and Simon 2001 [available at http://www.enzim.hu/hmmtop/]).

Phylogenetic trees shown in figures 2 and 3 were generated using MEGA 2.1. GeneDoc version 2.6 was used for displaying and highlighting the multiple-sequence alignments shown in figures 1 and 4.

Results

Following the strategies described in the previous section, we generated a database containing members of all characterized GST classes, plus all the sequences closely related to human GDAP1. Figure 1 shows a selected part of our complete alignment of 289 protein sequences, in which we include the sequence of the GST conserved core of the human GDAP1 protein, the closest relatives of GDAP1 found in humans or other organisms, and representative members of mammalian GST classes.
We also indicate in figure 1 the two most conserved regions of that core used in some of the analyses that we describe below.

Figure 2A shows the topology of the unrooted tree obtained with the NJ method of phylogenetic reconstruction using the full-length GST core domain. Bootstrap values for well-supported or particularly significant branches found in the NJ analysis are also shown in figure 2. MP results were similar enough to be embedded into this tree, and, therefore, the MP bootstrap values are also presented in figure 2. Two main groups of GST proteins were detected, in agreement with previous studies (e.g., Snyder and Maddison 1997). On one hand, a strongly supported group contains GSTs of the mu, pi, and alpha classes that are very close relatives, together with more distant classes, including sigma, S-crystallins, and prostaglandin D synthases, among others. The second main group puts together the GDAP1 proteins with those of the zeta, phi, omega, lambda, tau, beta, and theta classes, plus a few minor sets of sequences. This second main group contains proteins of many different eukaryotic organisms, including animals, plants, and fungi, and also all the bacterial GST-like proteins that we detected.

It is significant that, apart from those two main groups, the precise phylogenetic relationships among GST classes cannot be ascertained with this data set (see the low bootstrap support for most inner branches of the tree shown in fig. 2). When the same analyses are performed not with the whole GST core, but with only the most conserved parts of the GST sequence core (the two regions underlined in fig. 1), the inner topology changes (fig. 2B). In conclusion, the data shown in figure 2 demonstrate that GDAP1 proteins do not belong to any of the hitherto described GST classes but do not enable us to establish which of the classes is the closest relative of GDAP1 genes.
As shown in figures 1 and 2, we have found genes very similar to GDAP1 in several vertebrate species, including mammals, birds, and fishes. This result strongly suggests that all those genes are orthologs, and, therefore, that the origin of this gene predates the fish-tetrapod split. In addition, figures 2A and 2B also show the close similarity of GDAP1 to an obvious paralog that has been called, in humans, GDAP1L1 (Ganglioside-induced differentiation-associated protein 1-like 1 [UniGene Cluster Hs.20977, located at chromosome 20q12-q13]). This gene also originated before the fish-tetrapod split, as we can deduce from the presence of a very similar sequence in the genome of the zebrafish, Danio rerio. A few additional sequences from other species had clear similarities with GDAP1 or GDAP1L1 but were too short to be included in our study. Among them, we found sequences from the mammals Sus scrofa (GenBank accession number BG833972) and Bos taurus (GenBank accession number AL212783), the amphibian Xenopus laevis (GenBank accession number BG813460), and the fish Tetraodon nigroviridis (GenBank accession number AL212873).

Although bootstrap support is low, the position in our general phylogenetic trees of single genes in the invertebrate model species Drosophila melanogaster (CG4623 gene) and Anopheles gambiae (agCP6058 gene) suggests they could also be GDAP1 orthologs in those organisms. These two sequences also appear as the closest relatives of GDAP1 and GDAP1L1 genes among all animal sequences when only the most conserved region is used to generate phylogenetic trees (fig. 2B).
To avoid the inherent difficulties in obtaining well-supported topologies when a very large data set of highly heterogeneous sequences is included, as well as to show more clearly the relative positions of some of the most significant and extensively studied GST proteins, we generated a second analysis that included only the sequences in our main database that belonged to model animal organisms with fully or very extensively sequenced genomes (Homo sapiens, Mus musculus, D. melanogaster, Anopheles gambiae, and Caenorhabditis elegans). The results of this second analysis are detailed in figure 3, where the NJ tree is presented and where again both the NJ and the MP bootstrap data are shown together. As can be seen by comparing figures 2 and 3, the topologies for the significant animal GSTs in both trees are almost identical, and where they differ, topologies are only weakly supported. In particular, the GDAP1 group proteins appear as clearly different from all the other GSTs, and the close similarity of Drosophila CG4623 and Anopheles agCP6058 with GDAP1 proteins (and not with other GSTs) is much more obvious. We conclude that GDAP1 belongs to a particular class of GST-like genes, having relatives in both vertebrates and invertebrates.

However, figure 2A also shows that some other sequences appear quite close to GDAP1 proteins in our phylogenetic trees. They are single genes in the plants Arabidopsis and Oryza and sequences of eubacterial origin, corresponding to the related genes PcpC (encoding tetrachlorohydroquinone [TCHQ] dehalogenase [Orser et al. 1993; Cai and Xun 2002; Habash et al. 2002]) and LinD (encoding 2,5-dichlorohydroquinone dehalogenase [Miyauchi et al. 1998]) of sphingomonad α-proteobacterial species (reviewed in Copley 1998, 2000). To determine whether the similarity of these proteins and GDAP1 is significant, we must first consider the information provided by structural analyses.
In figure 4, we summarize the features detected by the secondary structure prediction programs in the sequences of GDAP1 and GDAP1L1 genes and the invertebrate GDAP1-related sequences. As shown in figure 1, a particular feature of GDAP1, GDAP1L1, and their relatives is the presence of nucleotides encoding a long stretch of additional amino acids between the two most conserved regions of the protein. In figure 4, we show that secondary structure prediction programs place this long additional region between what in canonical GSTs are two alpha helices, called α4 and α5, respectively (see, e.g., Board et al. 2000; Thom et al. 2001). We will thus refer to the additional region as the α4-α5 loop. The proteins encoded by the closest GDAP1 relatives in invertebrates also possess an α4-α5 loop, which lends credibility to the hypothesis that GDAP1 genes may have originated before the protostome-deuterostome split.

Additionally, Baxter et al. (2002) and Cuesta et al. (2002) mentioned that two or perhaps three transmembrane domains were predicted at the C-terminal end of human GDAP1 proteins. Using three different programs that detect those domains, we have examined whether the proteins that appear as close relatives of human GDAP1 in our trees are also predicted to have transmembrane domains (see Materials and Methods). We have found that all three algorithms coincide in making the strongest prediction for a transmembrane domain at the most distal C-terminus of both GDAP1 and GDAP1L1 proteins in all vertebrate species for which those regions are available. This domain would correspond to amino acids 319 to 340 in human GDAP1 and to amino acids 339 to 362 in human GDAP1L1 (fig. 4). In addition, a second transmembrane domain, located N-terminally with respect to the first one, is predicted by the three programs in both GDAP1 and GDAP1L1 proteins (see also fig. 4), although support for this prediction is weaker.
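Agreement among several transmembrane predictors, as described above, can be made explicit with a simple per-residue vote. The sketch below is an illustration under stated assumptions and not the way the three programs were combined in this study: the function names, the `min_length` threshold, and all segment coordinates are invented, and the predictors are given placeholder names on a toy 120-residue protein rather than real program output.

```python
def support_per_residue(predictions: dict, length: int) -> list:
    """Count, for each 1-based residue, how many programs place it in a segment."""
    votes = [0] * (length + 1)  # index 0 is unused
    for segments in predictions.values():
        covered = set()
        for start, end in segments:
            covered.update(range(start, end + 1))
        for pos in covered:
            if 1 <= pos <= length:
                votes[pos] += 1
    return votes


def consensus_segments(votes: list, min_programs: int, min_length: int = 15) -> list:
    """Return (start, end) runs supported by at least min_programs programs."""
    segments = []
    start = None
    for pos in range(1, len(votes)):
        if votes[pos] >= min_programs:
            if start is None:
                start = pos
        elif start is not None:
            if pos - start >= min_length:
                segments.append((start, pos - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(votes) - start >= min_length:
        segments.append((start, len(votes) - 1))
    return segments


# Invented predictions for a toy 120-residue protein (not real output).
toy_predictions = {
    "program_A": [(60, 80), (90, 112)],
    "program_B": [(92, 112)],
    "program_C": [(62, 78), (90, 110)],
}
votes = support_per_residue(toy_predictions, 120)
unanimous = consensus_segments(votes, min_programs=3)
majority = consensus_segments(votes, min_programs=2)
print(unanimous, majority)
```

In this toy case the C-terminal segment is supported by all three placeholder programs, whereas the upstream segment only appears when two out of three are required, which mirrors the distinction between stronger and weaker predictions.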
Finally, the program TMpred predicts a third transmembrane domain in GDAP1L1 genes, just upstream of the other two, but this result is not confirmed by the other two programs. For the putative Drosophila GDAP1 ortholog, both TMpred and HMMTOP 2.0 also strongly support the presence of a single transmembrane domain at the C-terminus of its product, whereas TMHMM detects some evidence but considers it not significant. This discrepancy may be the result of TMHMM being more restrictive than the other two programs (Käll and Sonnhammer 2002). For the Anopheles sequence, only TMpred predicts a transmembrane domain. However, the putative C-terminus of the Anopheles protein lacks similarity with that of the Drosophila protein, and we cannot exclude at present that it may be incorrectly determined. In fact, the putative N-terminal end of this sequence is most likely incorrect (fig. 4).

In addition to GDAP1 and its closest relatives, the above-mentioned plant and eubacterial genes generate products that also present an α4-α5 loop (fig. 1). This might again suggest an evolutionary relationship, but the presence of this loop could also be interpreted as a convergent feature; that is, the sequences that encode this extra loop may have evolved independently in totally unrelated genes. This putative convergence could generate a spurious proximity of all those sequences in our phylogenetic trees when the full-length GST core is analyzed. Only a few GSTs have an extended region between helices α4 and α5, and those presenting it would have a greater chance of appearing clustered together. Thus, the analyses performed using only the most conserved regions of the GST core (fig. 2B), which do not include the α4-α5 loop (fig. 1), became crucial to determining whether the plant and eubacterial sequences and GDAP1 sequences are indeed related.
As can be seen in figure 2B, the relative proximity in the phylogenetic trees of GDAP1 proteins and the plant and eubacterial sequences holds true even when the extra region is eliminated. Thus, a relationship between GDAP1 genes and these sequences is supported, although certainly not strongly, by our data. Interestingly, neither the plant nor the bacterial putative relatives of GDAP1 present any potential transmembrane region.

FIG. 5. Predicted three-dimensional structure for human GDAP1 protein (light grey). Main GST alpha helices are indicated as in figure 4. The region presented as a backbone corresponds to part of the Zea mays GST used to model GDAP1 structure. This region (alpha helices 4 and 5 and the loop between them) cannot be properly analyzed for GDAP1 because of lack of sequence similarity. The region shown in white would correspond to the position of the GDAP1 α4-α5 loop, in the case that helices 4 and 5 conserve the same structure as in canonical GSTs. N indicates the N-terminal end. (C) indicates the approximate position of the C-terminal end.

The currently known mutations in human GDAP1 that cause CMT4A neuropathy are also shown in figure 4 (data derived from Baxter et al. 2002; Cuesta et al. 2002; Nelis et al. 2002; Azzedine et al. 2003; Boerkoel et al. 2003; De Sandre-Giovannoli et al. 2003; and Senderek et al. 2003). These mutations can be grouped into a few types: (1) those that generate truncated versions of the protein, affecting the GST domain (W31X, Q163X, S194X, L223X, and G262fsX284); (2) single amino acid changes in the GST-like core domain (R120Q; R282C); (3) a single mutation in the predicted first transmembrane domain (R310Q); (4) a mutation that generates a truncated version of the protein that lacks the C-terminal end (T288fsX290); (5) a single amino acid change that affects a residue that is conserved in both GDAP1 and GDAP1L1 genes, found in the α4-α5 loop (R161H); and (6) splice site mutations (IVS4+1G>A; IVS3-2A>G).
This last mutation apparently eliminates exon 4, being therefore equivalent to a deletion of part of the GST domain (see De Sandre-Giovannoli et al. 2003). These highly heterogeneous results demonstrate that all the characteristic regions of this protein (GST domain; C-terminal, putative transmembrane domain; and α4-α5 loop) can be mutated to generate an autosomal recessive CMT pathology.

Despite considerable primary sequence divergence, members of different GST classes have similar three-dimensional structures (summarized in Sheehan et al. 2001; and Board et al. 2000). We were interested in determining whether GDAP1 proteins may have a similar fold. We found that the sequence of GDAP1 is similar enough to those of GSTs to generate a prediction of most of its three-dimensional structure. We used Swiss-Model to reconstruct the three-dimensional structure of GDAP1 based on the most similar GST structure available, a plant phi-class GST (Neuefeind et al. 1997 [see Materials and Methods]). As shown in figure 5, GDAP1 is modeled as having the thioredoxin fold at its N-terminus (domain I), characterized by four beta strands and three alpha helices (α1 to α3, left side of fig. 5). This fold is highly conserved in all GSTs and GST-related proteins (Senderek et al. 2003). The structure of the C-terminally located domain II is more difficult to predict. Part of helix α5 and helices α6 and α7 could also acquire a three-dimensional structure similar to that found in GSTs (fig. 5, right). On the other hand, helix α4, part of α5, and, logically, the α4-α5 loop cannot be properly modeled using canonical GSTs as templates (fig. 5). Because canonical GSTs dimerize, we examined further whether the α4-α5 loop would be situated in the interaction surface by juxtaposing two models such as the one shown in figure 5. We found that, if the α4-α5 loop is indeed located more or less as the interhelical region between α4 and α5 in canonical GSTs (represented in white in fig. 5), it would clearly not be part of the interaction surface (not shown).

Discussion

We have demonstrated that GDAP1 belongs to a new group of GST-like proteins, quite different from all other GST classes, which we will now refer to as the GDAP1 class. Genes of this class, which includes human GDAP1 and GDAP1L1 genes, are characterized by three main features. (1) Their distinctive sequences appear as quite distant from all other GSTs in dendrograms (figs. 2 and 3). (2) They have a characteristic additional amino acid region between what are predicted to be helices α4 and α5 (α4-α5 loop) that is absent in most GSTs. (3) They also have C-terminal extensions that may correspond to transmembrane domains. Those domains are absent in canonical GSTs, which, in general, are cytosolic enzymes. Some exceptions, such as the microsomal variants (highly divergent and excluded from our analyses) or the Saccharomyces cerevisiae GTT1 protein (Choi, Lou, and Vancura 1998 [see figs. 2A and 2B for its position in the GST trees]), are totally unrelated both in sequence and structure to GDAP1. All these features suggest that, although it is likely that GDAP1-class proteins bind glutathione and perhaps function as glutathione transferases, the cellular context of their functions may be essentially different from that of canonical GSTs. The biochemistry of GDAP1-class proteins may thus offer totally new insights into the functional roles of GST-like proteins in animals.

Obvious orthologs of GDAP1 exist in many vertebrate species, including fishes. GDAP1 paralogs, corresponding to the closely related GDAP1L1 human gene and its orthologs, are also found in different vertebrates. Moreover, GDAP1-related genes, most likely with a common evolutionary origin, are detected in invertebrates. These results suggest that the GDAP1 class of GSTs may have originated before the protostome-deuterostome split, perhaps 700 MYA.
In addition, we have detected genes notably similar in sequence and also in structure to GDAP1 in a few plants and, even more surprisingly, in bacterial species. To explain the plant results, we favor the hypothesis that GDAP1-class genes may in fact have originated very early in eukaryotes, before the plant-animal split. Thus, we predict that GDAP1 genes could be found in other eukaryotic lineages, and that lack of these genes in some eukaryotes would be the result of secondary losses.

The classification of the PcpC and LinD genes has always been problematic. Their products were originally classified as theta-class GSTs. They were later ascribed to the zeta class, despite a low level of sequence similarity. This reassignment was based on two facts. The first is the presence of two catalytically essential serine and cysteine residues at the N-terminal ends of the proteins encoded by both these genes and in zeta-class GSTs. The second is the possibility that TCHQ dehalogenase (encoded by PcpC) works like maleylacetoacetate isomerase, a role performed by some zeta-class GSTs (Anandarajah et al. 2000 [reviewed in Copley 2000]). However, our results clearly show that, both in sequence and structurally, PcpC and LinD are more similar to GDAP1-class genes than to zeta-class GST genes, suggesting that their classification should be re-evaluated. If these bacterial genes are confirmed to be significantly related to GDAP1-class genes (something that our results do not fully prove and that will probably require the characterization of more genes of this class), this may be difficult to explain. We think that the presence of GDAP1-related genes in a few closely related proteobacteria could not be explained by conventional vertical transmission. We would have to hypothesize a relatively recent horizontal gene transfer event, perhaps from a eukaryote containing a GDAP1-like gene to a eubacterium.
Evidence for horizontal transfer events of genes that are involved in glutathione biosynthesis has recently been obtained (Copley and Dhillon 2002). Phylogenetic analyses of bacterial GSTs also suggest that horizontal gene transfers may have occurred (summarized in Vuilleumier and Pagni 2002). Thus, eukaryotes and those prokaryotes that use glutathione may have been able in the past to share significant biochemical novelties, through horizontal gene transfer, to better utilize that molecule. In any case, the study of those bacterial proteins is unlikely to provide insights into GDAP1 functions in animals. They have highly specific functions that are unlikely to exist in eukaryotes (reviewed in Copley 1998, 2000; and Vuilleumier and Pagni 2002).

The distribution of the known mutations in the GDAP1 gene that have been associated with CMT disease suggests that not only the characteristic GST homology region but also the most C-terminal end of the molecule, a short region that is predicted to contain two transmembrane domains, is important for the proper function of the GDAP1 protein. This is shown by the fact that truncations or amino acid substitutions in that region have pathological consequences. In addition, the existence of a single amino acid substitution (R161H) that generates CMT disease and affects a conserved residue located in the middle of the α4-α5 loop characteristic of GDAP1 and GDAP1-related genes demonstrates that this peculiar region is also very significant for protein function. The large number of available GST sequences gives obvious clues about significant, and thus highly conserved, amino acids in the GST core domain. Similarly, evolutionary conservation in both GDAP1 and GDAP1L1 may provide useful information about which residues in the α4-α5 loop are functionally significant.

Our analyses do not offer much novel information about GST evolution. Most of the results shown here were already anticipated by Snyder and Maddison (1997).
Those authors, who used a much smaller set of sequences, attributed the lack of resolution of their phylogenetic trees to the limited amount of available data, especially the fact that the phylogenetic range of the species included was narrow. This interpretation now appears to be incorrect, because neither the inclusion of many more sequences nor the combined use of different analytical methods resolves the relationships among the innermost branches of the tree. As far as we know, ours is the most comprehensive analysis of GST proteins ever performed, and we have even expanded it to include close to 500 GST sequences (unpublished data), but, in all cases, the relationships among GST classes are largely unresolved. We conclude that the information provided by the GST core is insufficient to generate well-supported trees for all GST classes. The alternative approach of concentrating the analyses on closely related classes, to understand the evolution of particular groups of GSTs, may thus be more fruitful. Finally, the fact that we have not found any other novel GST classes suggests that GDAP1 genes may be the last type of GST-like proteins to be described in mammals.

Acknowledgments

I.M. is supported by Fundació La Caixa (01/080-00) and the Spanish Ministry of Science and Technology (MCYT; projects GEN2001-4851-C06-02 [NEUROGENOMICA/CAGEPEP] and SAF2003-09506). F.P. is supported by grants from MCYT and Fundació La Caixa. A.M. and L.P. are the recipients of predoctoral fellowships from the MCYT. A.C. is the recipient of a predoctoral fellowship from the Instituto de Salud Carlos III.

David Irwin, Associate Editor

Accepted September 8, 2003


Antonio Marco, Ana Cuesta, Laia Pedrola, Francesc Palau, Ignacio Marín. Evolutionary and Structural Analyses of GDAP1, Involved in Charcot-Marie-Tooth Disease, Characterize a Novel Class of Glutathione Transferase-Related Genes, Molecular Biology and Evolution, 2004, 176-187, DOI: 10.1093/molbev/msh013