CGIN1: A Retroviral Contribution to Mammalian Genomes

Molecular Biology and Evolution, Oct 2009

This study describes the origin and structural features of a mammalian gene, CGIN1 (Cousin of GIN1). CGIN1 proteins contain an NYN domain, retroviral RNase H and integrase domains, and a domain of unknown function (CGIN1 domain) that is also present in two other genes (N4BP1 and KIAA0323). We suggest that CGIN1 derives from the fusion of a KIAA0323-like gene with retroviral sequences, which occurred prior to the marsupial–eutherian split. Sequence and structural analyses indicate that the CGIN1 integrase domain is inactive but still retains the 3D folding observed in retroviral integrases. We hypothesize that CGIN1 may contribute to retroviral resistance in mammals by regulating the ubiquitination of viral proteins.

Antonio Marco and Ignacio Marín

Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University; and Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (IBV-CSIC), Spain

Mammalian genomes are full of sequences that derive from retroviruses and retrotransposons, some of which have been recruited to perform cellular functions (Smit 1999; Makałowski 2000; Nekrutenko and Li 2001; Britten 2006). Not only do sequences derived from retroviral long terminal repeats (LTRs) act as promoters of some cellular genes (Stavenhagen and Robins 1988; Ting et al. 1992; Ling et al. 2002), but some coding sequences from retrotransposons and retroviruses have also been coopted to perform functions for the host. Among the best-known cases are those of the primate syncytin gene, essential for placentation, and the murine Fv1 and Fv4 genes, involved in resistance against retrovirus infection (Ikeda et al. 1985; Best et al. 1996; Kozak and Chakraborti 1996; Qi et al. 1998; Mi et al. 2000; Goff 2004; Bonnaud et al. 2005). Protection against infection was also hypothesized to be the function of GIN1 (Gypsy integrase 1), a cellular gene derived from the integrase of an LTR retrotransposon (Lloréns and Marín 2001).
More recently, several other genes of unknown function derived from retroviral or retrotransposon sequences have been characterized in vertebrates (Zdobnov et al. 2005; Campillos et al. 2006). When we recently performed a search for genes related to GIN1, we detected another mammalian gene with a similar integrase domain, which we have called Cousin of GIN1 (CGIN1; formerly KIAA1305). In humans, it is located at chromosome 14q11.2 and encodes a 1,898-amino-acid protein. Human CGIN1 is widely expressed, according to the data compiled in UniGene (http://www.ncbi.nlm.nih.gov/UniGene/). CGIN1 genes, very similar in sequence and structure, were found to be restricted to mammals, including the marsupial Monodelphis domestica (opossum; see supplementary results and supplementary fig. 1, Supplementary Material online). However, we did not detect any CGIN1 gene in monotremes, such as the platypus, Ornithorhynchus anatinus. These results suggest that CGIN1 emerged after the monotreme split from the rest of mammals, but before the marsupial–eutherian split, that is, 125–180 Ma.

In phylogenetic analyses using the sequences of integrase domains (see supplementary methods, Supplementary Material online), CGIN1 integrase domains appeared as a monophyletic group in an intermediate position between the integrases of retroviruses and gypsy retrotransposons (fig. 1). The sequences most similar to CGIN1 integrase domains were a few integrases detected in fishes and birds (fig. 1). Our findings refute a previous description of the gene CGIN1 as being related to Sushi retrotransposons (Youngson et al. 2005, which called the gene Sushi14C1): figure 1 shows that the integrases of Sushi elements and the CGIN1 integrase domains are totally unrelated. We found that the CGIN1-like sequences from fishes and birds corresponded to endogenous retroviruses (ERVs; see supplementary results, Supplementary Material online).
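
The dating argument for CGIN1 (not detected in the platypus, present in the opossum and in eutherians) can be illustrated with a minimal sketch. It assumes a single gain of the gene with no secondary loss, and that failure to detect the gene reflects true absence; the function name and data layout are hypothetical and are not part of the original analysis.

```python
# Presence/absence calls for CGIN1 as reported in the text.
presence = {"platypus": False, "opossum": True, "human": True}

# Nested clades, ordered from the oldest split to the most recent one.
# Theria groups marsupials (opossum) and eutherians (human).
clades = [
    ("Mammalia", {"platypus", "opossum", "human"}),
    ("Theria", {"opossum", "human"}),
]

def origin_clade(presence, clades):
    """Return the smallest listed clade that contains every carrier species.

    Under the single-gain, no-loss assumption, the gene arose on the
    branch leading to that clade.
    """
    carriers = {species for species, has_gene in presence.items() if has_gene}
    if not carriers:
        return None
    best = None
    for name, members in clades:
        if carriers <= members:
            best = name  # later clades are nested inside earlier ones
    return best

print(origin_clade(presence, clades))  # Theria
```

Placing the origin on the branch leading to Theria corresponds to the interval between the monotreme split and the marsupial–eutherian split.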
Additional analyses using reverse transcriptase sequences confirmed that the CGIN1-like sequences group with retroviruses and not with gypsy retrotransposons (supplementary fig. 2, Supplementary Material online). The simplest hypothesis to explain these results is therefore that part of CGIN1 has a retroviral origin.

The structure of the protein encoded by CGIN1 is complex. Combining Blast, Prosite, and InterProScan analyses (see supplementary methods, Supplementary Material online), we determined that the encoded protein contains four regions related to domains found in other proteins (amino acids 24–196, 790–926, 1308–1446, and 1609–1730, respectively, in human CGIN1 protein). The first conserved domain, previously undescribed, which we have called the CGIN1 domain, is present in two other human proteins, encoded by the genes N4BP1 and KIAA0323, as well as in the proteins encoded by the orthologs of those two genes in other species. The second conserved region corresponds to an NYN domain, a domain of unknown function described by Anantharaman and Aravind (2006) in multiple eukaryotic and prokaryotic proteins. Experimental data for NYN domain functions are not yet available. The third and fourth domains in CGIN1 contain an RNase H fold. The third domain may correspond to a highly divergent RNase H. The fourth corresponds to the integrase, already mentioned.

FIG. 2. Structures of human NYN domain–containing proteins. CGIN1: CGIN1 domain; NYN: NYN domain; RNaseH: Ribonuclease H domain; IN: Integrase domain; and CCCH: C3H zinc finger.

Figure 2 shows the structures deduced for all the human proteins that contain NYN domains. Phylogenetic analyses with NYN domain sequences indicate that CGIN1 and KIAA0323 are closely related (see supplementary fig. 3, Supplementary Material online). This is confirmed by the structures of the two genes, which only differ significantly in their final exons.
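
As a rough illustration of the domain layout of human CGIN1, the sketch below tabulates the four regions, taking their coordinates as residues 24–196, 790–926, 1308–1446, and 1609–1730 of the 1,898-residue protein. The helper names are hypothetical, and the coordinates are treated as 1-based and inclusive, which is an assumption.

```python
PROTEIN_LENGTH = 1898  # human CGIN1, as reported in the text

# (label, first residue, last residue); 1-based inclusive coordinates assumed
domains = [
    ("CGIN1 domain", 24, 196),
    ("NYN domain", 790, 926),
    ("RNase H (divergent)", 1308, 1446),
    ("Integrase", 1609, 1730),
]

def domain_lengths(domains):
    """Length in residues of each annotated region."""
    return {name: end - start + 1 for name, start, end in domains}

def unassigned_segments(domains, protein_length):
    """Lengths of the stretches outside any region, from N- to C-terminus."""
    gaps, previous_end = [], 0
    for _, start, end in domains:
        gaps.append(start - previous_end - 1)
        previous_end = end
    gaps.append(protein_length - previous_end)
    return gaps

lengths = domain_lengths(domains)
gaps = unassigned_segments(domains, PROTEIN_LENGTH)
print(lengths)  # region lengths: 173, 137, 139, 122
print(gaps)     # [23, 593, 381, 162, 168]
```

With these coordinates, the four regions cover 571 of the 1,898 residues, and both regions of retroviral origin lie in the C-terminal third of the protein.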
The last exon of CGIN1 contains the sequences of retroviral origin (i.e., both the putative RNase H domain–encoding sequences and the integrase domain–encoding sequences), whereas the last exon of KIAA0323 lacks those sequences (see supplementary fig. 1, Supplementary Material online). KIAA0323 is also mammalian specific. However, it is present not only in marsupials and eutherians, as CGIN1 is, but also in monotremes. It is therefore older than CGIN1. Significantly, KIAA0323 is found adjacent to CGIN1 in the human genome, in the same strand and orientation. These results indicate that CGIN1 is a KIAA0323 duplicate whose last exon was replaced by a fragment of an ERV.

The mode of CGIN1 emergence, as the product of a duplication plus a recombination event leading to the fusion of sequences of different origin, is the same as the one we described some time ago for the PARC gene (Marín and Ferrús 2002; Marín et al. 2004). However, in the case of PARC, recombination merged two genes that encoded potentially interacting proteins. That made it reasonable to postulate that such a fusion was a secondary event that provided the advantage of avoiding the independent regulation of two genes whose products could be needed in the same tissues and potentially at the same levels (Marín et al. 2004). In the case of CGIN1, such an interpretation cannot be proposed: it is a novel addition to the repertoire of mammalian genes and may thus provide an innovative function.

Figure 3 shows an alignment of the integrase domain encoded by CGIN1 and the sequences of several other integrases. Figure 1 showed that the integrase domain of CGIN1 is quite dissimilar in sequence to other integrases. Data in figure 3 show that such dissimilarity has functional implications. One of the characteristic features of the catalytic core of active integrases, the DDE motif, which is present not only in retroviral integrases but also in eukaryotic and prokaryotic transposases and is required for integrase activity (Haren et al. 1999), is disrupted in CGIN1: two of its three key amino acids have undergone nonconservative substitutions. This means that CGIN1 protein most probably lacks integrase activity. However, the critical residues in the HHCC domain, involved in integrase multimerization (see again the review by Haren et al. 1999), are intact. A model of the 3D structure of the integrase domain of CGIN1 suggests that it folds as a typical integrase, except in the DDE motif (supplementary fig. 4, Supplementary Material online).

FIG. 3. Sequences of representative CGIN1, CGIN1-like, retroviral, and retrotransposon integrases. The locations of the HHCC and DDE domains are indicated. Arrows point to the critical residues that gave name to those domains. Notice that CGIN1 proteins lack the two last acidic residues of the DDE motif.

We may ask what the function of CGIN1 could be, based on what is known of related genes. Some functional data exist for the N4BP1 protein, involved in the regulation of ubiquitination through its interaction with the ubiquitin ligase Itch. Oberst et al. (2007) showed that N4BP1 physically interacts with Itch, inhibiting further interactions with Itch substrates. We hypothesize that CGIN1 function may also be linked to the ubiquitination machinery, leading to a role in retroviral control. The enzymatically inactive integrase domain of CGIN1 could be incorporated into multimeric integrase complexes. After that (and given the inhibitory role described for N4BP1), CGIN1 might interfere with integrase complex ubiquitination and degradation. This may lead to repression of viral expression. It has been shown that ubiquitination and degradation of HIV-1 integrase is essential for transcription of viral genes after provirus integration (Mousnier et al. 2007).
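
The comparison of DDE residues described for figure 3 amounts to reading three fixed columns of an alignment. The sketch below shows that check on toy data: the aligned fragments and column indices are invented for illustration and are not the sequences or positions of figure 3, and the function name is hypothetical.

```python
def dde_status(aligned_seq, columns):
    """Return the residues at the three catalytic columns and whether
    they match the canonical D, D, E triad."""
    residues = [aligned_seq[c] for c in columns]
    return residues, residues == ["D", "D", "E"]

# Toy aligned fragments (invented, NOT real integrase sequences).
toy_alignment = {
    "active_integrase_toy": "AADQT--DNGSKFE",
    "cgin1_toy":            "AADQT--SNGSKFQ",
}
toy_columns = [2, 7, 13]  # 0-based alignment columns, hypothetical

for name, seq in toy_alignment.items():
    residues, intact = dde_status(seq, toy_columns)
    print(name, "".join(residues), "intact" if intact else "disrupted")
```

In the toy CGIN1 entry, the first aspartate is retained while the second and third positions are replaced, mirroring the pattern described for CGIN1 proteins.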
Interestingly, we suggested a related mechanism for GIN1, which may explain the paucity of active Gypsy elements in mammals (Lloréns and Marín 2001). Future experimental work may establish whether this hypothesis for CGIN1 protein function is correct.

Supplementary Material

Supplementary results, including four supplementary figures, and supplementary methods are available at Molecular Biology and Evolution online (http://www.mbe.oxfordjournals.org/).

Acknowledgments

This project was supported by grant BIO2008-05067 (Programa Nacional de Biotecnología; Ministerio de Ciencia e Innovación, Spain).


Full text (PDF): http://mbe.oxfordjournals.org/content/26/10/2167.full.pdf

Antonio Marco, Ignacio Marín. CGIN1: A Retroviral Contribution to Mammalian Genomes, Molecular Biology and Evolution, 2009, 2167-2170, DOI: 10.1093/molbev/msp127