CGIN1: A Retroviral Contribution to Mammalian Genomes
Center for Evolutionary Functional Genomics, The Biodesign Institute, Arizona State University; and Instituto de Biomedicina de Valencia, Consejo Superior de Investigaciones Científicas (IBV-CSIC)
This study describes the origin and structural features of a mammalian gene, CGIN1 (Cousin of GIN1). CGIN1 proteins contain an NYN domain, retroviral RNase H and integrase domains, and a domain of unknown function (CGIN1 domain) that is also present in two other genes (N4BP1 and KIAA0323). We suggest that CGIN1 derives from the fusion of a KIAA0323-like gene with retroviral sequences, which occurred prior to the marsupial-eutherian split. Sequence and structural analyses indicate that the CGIN1 integrase domain is inactive but still retains the 3D folding observed in retroviral integrases. We hypothesize that CGIN1 may contribute to retroviral resistance in mammals by regulating the ubiquitination of viral proteins.
Mammalian genomes are full of sequences that
derive from retroviruses and retrotransposons, some of
which have been recruited to perform cellular functions
(Smit 1999; Makałowski 2000; Nekrutenko and Li
2001; Britten 2006). Not only do sequences derived from
retroviral long terminal repeats (LTRs) act as promoters of
some cellular genes (Stavenhagen and Robins 1988; Ting
et al. 1992; Ling et al. 2002), but some coding
sequences from retrotransposons and retroviruses have also
been co-opted to perform functions for the host. Among the
best-known cases are those of the primate syncytin gene,
essential for placentation, and the murine Fv1 and Fv4
genes, involved in resistance against retrovirus infection
(Ikeda et al. 1985; Best et al. 1996; Kozak and Chakraborti
1996; Qi et al. 1998; Mi et al. 2000; Goff 2004; Bonnaud
et al. 2005). Protection against infection was also
hypothesized to be the function of GIN1 (Gypsy integrase 1),
a cellular gene derived from the integrase of an LTR
retrotransposon (Llorens and Marín 2001). More recently,
several other genes of unknown functions derived from
retroviral or retrotransposon sequences have been
characterized in vertebrates (Zdobnov et al. 2005; Campillos
et al. 2006).
When we recently performed a search for genes related
to GIN1, we detected another mammalian gene with a
similar integrase domain, which we have called Cousin of
GIN1 (CGIN1; formerly KIAA1305). In humans, it is
located on chromosome 14q11.2 and encodes
a 1,898-amino-acid protein. Human CGIN1 is widely
expressed, according to the data compiled in UniGene
(http://www.ncbi.nlm.nih.gov/UniGene/). CGIN1 genes,
very similar in sequence and structure, were found to be
restricted to mammals, including the marsupial
Monodelphis domestica (opossum; see supplementary results and
supplementary fig. 1, Supplementary Material online).
However, we did not detect any CGIN1 gene in
monotremes, such as the platypus, Ornithorhynchus anatinus.
These results suggest that CGIN1 emerged after the
monotreme split from the rest of mammals, but before the
marsupial–eutherian split, that is, 125–180 Ma.
In phylogenetic analyses using the sequences of
integrase domains (see supplementary methods,
Supplementary Material online), CGIN1 integrase domains appeared
as a monophyletic group in an intermediate position
between the integrases of retroviruses and gypsy
retrotransposons (fig. 1). The sequences most similar to CGIN1
integrase domains were a few integrases detected in fishes
and birds (fig. 1). Our findings refute a previous description
of the gene CGIN1 as being related to Sushi retrotransposons
(Youngson et al. 2005, who called the gene
Sushi14C1). Figure 1 shows that the integrases of Sushi elements
and the CGIN1 integrase domains are totally unrelated. We
found that these CGIN1-like sequences corresponded to
endogenous retroviruses (ERVs; see supplementary results,
Supplementary Material online). Additional analyses using
reverse transcriptase sequences confirmed that the
CGIN1-like sequences group with retroviruses and not with gypsy
retrotransposons (supplementary fig. 2, Supplementary
Material online). The simplest hypothesis to explain these
results is therefore that part of CGIN1 has a retroviral origin.
The structure of the protein encoded by CGIN1 is complex.
Combining Blast, Prosite, and InterProScan analyses (see
supplementary methods, Supplementary Material online),
we determined that the gene contains four regions related
to domains found in other proteins (amino acids 24–196,
790–926, 1308–1446, and 1609–1730, respectively, in the
human CGIN1 protein). The first conserved domain, previously
undescribed, which we have called the CGIN1 domain,
is present in two other human proteins, encoded by the
genes N4BP1 and KIAA0323, as well as in the proteins
encoded by the orthologs of those two genes in other species.
The second conserved region corresponds to an NYN
domain, a domain of unknown function described by
Anantharaman and Aravind (2006) in multiple eukaryotic and
prokaryotic proteins. Experimental data for NYN domain
functions are not yet available. The third and fourth
domains in CGIN1 contain an RNase H fold. The third domain
may correspond to a highly divergent RNase H. The fourth
corresponds to the integrase, already mentioned.
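The domain layout described above can be summarized in a small data structure. The following is only a minimal sketch, assuming that the coordinates given in the text are to be read as the ranges 24–196, 790–926, 1308–1446, and 1609–1730 in the 1,898-amino-acid human protein; the names `DOMAINS`, `domain_lengths`, and `check_layout` are hypothetical and used purely for illustration.

```python
# Human CGIN1 protein length and domain coordinates as read from the text.
# Assumption: the four coordinate pairs are inclusive amino acid ranges.
PROTEIN_LENGTH = 1898

DOMAINS = [
    ("CGIN1 domain", 24, 196),
    ("NYN domain", 790, 926),
    ("RNase H-like domain", 1308, 1446),
    ("Integrase domain", 1609, 1730),
]


def domain_lengths(domains):
    """Return the length in amino acids of each domain (inclusive ranges)."""
    return {name: end - start + 1 for name, start, end in domains}


def check_layout(domains, protein_length):
    """Return True if domains are ordered, non-overlapping, and inside the protein."""
    previous_end = 0
    for _name, start, end in domains:
        if not (previous_end < start <= end <= protein_length):
            return False
        previous_end = end
    return True


if __name__ == "__main__":
    for name, length in domain_lengths(DOMAINS).items():
        print(f"{name}: {length} aa")
    print("Layout consistent:", check_layout(DOMAINS, PROTEIN_LENGTH))
```

Under this reading, the two regions of retroviral origin (the RNase H-like and integrase domains) both lie in the C-terminal third of the protein, in line with their location in the last exon.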
FIG. 2. Structures of human NYN domain-containing proteins. CGIN1: CGIN1 domain; NYN: NYN domain; RNaseH: Ribonuclease H domain;
IN: Integrase domain; and CCCH: C3H zinc finger.
Figure 2 shows the structures deduced for all the
human proteins that contain NYN domains. Phylogenetic
analyses with NYN domain sequences indicate that CGIN1
and KIAA0323 are closely related (see supplementary fig. 3,
Supplementary Material online). This is confirmed by the
structures of the two genes, which only differ significantly
in their final exons. The last exon of CGIN1 contains the
sequences of retroviral origin (i.e., both the putative RNase H
domain-encoding sequences and the integrase domain-encoding
sequences), whereas the last exon of KIAA0323
lacks those sequences (see supplementary fig. 1,
Supplementary Material online). KIAA0323 is also mammalian specific.
However, it is present not only in marsupials and eutherians,
like CGIN1, but also in monotremes. It is therefore older than
CGIN1. Significantly, KIAA0323 is found adjacent to CGIN1
in the human genome, on the same strand and in the same orientation.
These results indicate that CGIN1 is a KIAA0323 duplicate
whose last exon was replaced by a fragment
of an ERV. The way in which CGIN1 emerged, as the
product of a duplication plus a recombination event leading to the
fusion of sequences of different origin, is identical to the one
that we described some time ago for the PARC gene (Marín
and Ferrús 2002; Marín et al. 2004). However, in the case of
PARC, recombination merged two genes that encoded potentially
interacting proteins. That made it reasonable to postulate
that the fusion was a secondary event that provided the
advantage of avoiding the independent regulation of two genes
whose products could be needed in the same tissues and
potentially at the same levels (Marín et al. 2004). In the case of
CGIN1, such an interpretation cannot be proposed: It is a novel
addition to the repertoire of mammalian genes and may thus
provide an innovative function.
FIG. 3. Sequences of representative CGIN1, CGIN1-like, retroviral, and retrotransposon integrases. The locations of the HHCC and
DDE domains are indicated. Arrows point to the critical residues that give those domains their names. Note that CGIN1 proteins lack the last two acidic
residues of the DDE motif.
Figure 3 shows an alignment of the integrase domain
encoded by CGIN1 and the sequences of several other
integrases. Figure 1 showed that the integrase
domain of CGIN1 has a sequence that is quite dissimilar to
that of other integrases. Data in figure 3 show that this
dissimilarity has functional implications. One of the
characteristic features of the catalytic core of active integrases is the
DDE motif, which is present not only in retroviral
integrases but also in eukaryotic and prokaryotic
transposases and is required for integrase activity (Haren et al.
1999). This motif is not conserved in CGIN1: two of its key amino acids
have undergone nonconservative substitutions. This means
that the CGIN1 protein most probably lacks integrase activity.
However, the critical residues in the HHCC domain,
involved in integrase multimerization (see again the review
by Haren et al. 1999), are intact. A model of the 3D
structure of the integrase domain of CGIN1 suggests that it folds
as a typical integrase, except in the DDE motif
(supplementary fig. 4, Supplementary Material online).
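The reasoning about the DDE motif amounts to inspecting the residues found at the three aligned catalytic positions. The sketch below illustrates that check under stated assumptions: the function `dde_status` is hypothetical, and the input `"DNQ"` is a made-up placeholder chosen only to mimic a motif that keeps the first acidic residue and has lost the last two; it is not the actual CGIN1 sequence, which should be read from figure 3.

```python
ACIDIC = {"D", "E"}          # aspartate and glutamate
EXPECTED = ("D", "D", "E")   # canonical catalytic residues of the DDE motif


def dde_status(residues):
    """Summarize conservation at the three aligned DDE positions.

    residues: three one-letter amino acid codes taken from the alignment
    columns that correspond to the D, D, and E of active integrases.
    """
    if len(residues) != 3:
        raise ValueError("expected exactly three residues")
    residues = [r.upper() for r in residues]
    conserved = [obs == exp for obs, exp in zip(residues, EXPECTED)]
    acidic = [r in ACIDIC for r in residues]
    return {"conserved": conserved, "acidic": acidic, "intact": all(conserved)}


if __name__ == "__main__":
    # Canonical motif, as found in active integrases.
    print(dde_status("DDE"))
    # Hypothetical placeholder residues, NOT the real CGIN1 residues.
    print(dde_status("DNQ"))
```

A sequence for which `intact` is false is a candidate for loss of catalytic activity, which is the inference drawn in the text; confirming it would still require experimental evidence.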
What could the function of CGIN1 be, based
on what is known of related genes? Some functional data
exist for the N4BP1 protein, involved in the regulation of
ubiquitination through its interaction with the ubiquitin ligase
Itch. Oberst et al. (2007) showed that N4BP1 physically
interacts with Itch, inhibiting further interactions with Itch
substrates. We hypothesize that CGIN1 function may also
be linked to the ubiquitination machinery, leading to a role
in retroviral control. The enzymatically inactive integrase
domain of CGIN1 could be incorporated into multimeric
integrase complexes. After that (and given the inhibitory role
described for N4BP1), CGIN1 might interfere with integrase
complex ubiquitination and degradation. This may lead to
repression of viral expression. It has been shown that
ubiquitination and degradation of HIV-1 integrase are essential for
transcription of viral genes after provirus integration
(Mousnier et al. 2007). Interestingly, we suggested a related
mechanism for GIN1, which may explain the paucity of
active Gypsy elements in mammals (Llorens and Marín
2001). Future experimental work may establish whether
this hypothesis for CGIN1 protein function is correct.
Supplementary results, including four supplementary
figures, and supplementary methods are available at
Molecular Biology and Evolution online (http://www.mbe.
This project was supported by grant BIO2008-05067
(Programa Nacional de Biotecnología; Ministerio de
Ciencia e Innovación, Spain).