A new example of viral intein in Mimivirus
Hiroyuki Ogata (1), Didier Raoult (0), Jean-Michel Claverie (1)
(0) Unite des Rickettsies, CNRS UPRESA 6020, Faculte de Medecine, 27 Boulevard Jean Moulin, 13385 Marseille Cedex 05, France
(1) Information Genomique et Structurale, UPR2589 CNRS, IBSM, IFR88, 31 chemin Joseph Aiguier, 13402 Marseille Cedex 20, France
Background: Inteins are "protein introns" that remove themselves from their host proteins through autocatalytic protein splicing. After their discovery, inteins were quickly identified in all domains of life, but only once to date in the genome of a eukaryote-infecting virus.
Results: Here we report the identification and bioinformatics characterization of an intein in the DNA polymerase PolB gene of the amoeba-infecting Mimivirus, the largest known double-stranded DNA virus, the origin of which has been proposed to predate the emergence of eukaryotes. The Mimivirus intein exhibits canonical sequence motifs and clearly belongs to a subclass of archaeal inteins always found at the same location in PolB genes. On the other hand, the Mimivirus PolB is most similar to eukaryotic Polδ sequences.
Conclusions: The intriguing association of an extremophilic archaeal-type intein with a mesophilic eukaryotic-like PolB in Mimivirus is consistent with the hypothesis that DNA viruses might have been the central reservoir of inteins throughout the course of evolution.
Background
Mimivirus, recently discovered in amoebae following the
inspection of a hospital cooling tower prompted by a
pneumonia outbreak [1], is the largest known virus, both in
particle size (>0.4 µm in diameter) and in genome length.
Its entire 1.2-Mbp genome sequence has since been
determined [2]. Extensive phylogenetic studies and gene
content analyses defined Mimivirus as a new family of
nucleocytoplasmic large DNA viruses (NCLDV) besides
Poxviridae, Iridoviridae, Phycodnaviridae and Asfarviridae,
and suggested its early origin, probably before the
individualization of the three domains of life [2].
While analyzing the Mimivirus genome sequence, we noticed
the unusual length of its putative DNA polymerase. A
detailed analysis identified an intein in this gene. After the
recent discovery of an intein in Chilo iridescent virus [3],
an insect-infecting NCLDV of Iridoviridae, this is the
second report of an intein sequence in a eukaryote-infecting
virus.
Inteins are "protein introns" that catalyze self-splicing at
the protein level. The splicing is defined by the
self-catalytic excision of an intervening sequence ("intein") from a
precursor host protein where it is located, and the
concomitant ligation of the flanking amino- and
carboxy-terminal fragments ("exteins") of the precursor. Inteins often
possess a homing endonuclease domain, and are
considered mobile elements. Since their first discovery in
1990 [4,5], inteins have been identified in a wide variety
of organisms, including bacteria, archaea, and unicellular
eukaryotes, albeit with a sporadic distribution (see
http://bioinformatics.weizmann.ac.il/~pietro/inteins/ for a
comprehensive list). For instance, they are relatively
abundant in some hyperthermophilic archaeal species (such as
Methanococcus jannaschii, which possesses nineteen inteins), but
absent in closely related species such as Methanococcus
maripaludis [6]. Similarly, they are observed in many
unrelated bacterial clades, but often appear limited to several
species within each clade. It was suggested that viruses
were potential "vectors" of inteins across species and
responsible for the sporadic distribution of inteins [3].
Accordingly, inteins have been identified in many
bacteriophages and prophages [7-10]. To our knowledge, the
sole published account of eukaryote-infecting viruses
harboring an intein concerns iridoviruses [3].
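The definition of protein splicing given above (excision of the intein and ligation of the two exteins) can be sketched as a simple string operation. This is only an illustration of the bookkeeping, not a model of the chemistry; the function name `splice` and the toy sequence are hypothetical and do not come from any real protein or library.

```python
def splice(precursor: str, start: int, end: int) -> tuple[str, str]:
    """Return (intein, ligated_exteins) for an intein occupying the
    1-based, inclusive positions start..end of the precursor."""
    if not (1 <= start <= end <= len(precursor)):
        raise ValueError("intein coordinates fall outside the precursor")
    n_extein = precursor[:start - 1]   # amino-terminal flanking fragment
    intein = precursor[start - 1:end]  # excised intervening sequence
    c_extein = precursor[end:]         # carboxy-terminal flanking fragment
    return intein, n_extein + c_extein


# Toy precursor (hypothetical): N-extein "MKV", intein "SAAAN", C-extein "TGG".
precursor = "MKV" + "SAAAN" + "TGG"
intein, mature = splice(precursor, 4, 8)
print(intein, mature)
```

The 1-based inclusive convention is chosen here because it matches how residue positions are usually quoted in sequence analyses.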
Results
Eukaryotic Polδ-like Mimivirus PolB
The Mimivirus genome sequence contains a putative ORF
(R322, 1740 amino acids long) corresponding to a family
B DNA polymerase (PolB). This ORF exhibits high-scoring
sequence homology (BLAST E-value < 10^-24)
to eukaryotic PolBs in the public databases. However,
this Mimivirus PolB is much larger than its eukaryotic and
viral homologues (about 1000 aa), and its optimal
alignment with the other PolB sequences reveals four
unmatched extraneous segments (Fig. 1A, Fig. S1).
Focusing on these extra segments, we identified a 351-aa intein
(position 1053 to 1403) in the Mimivirus PolB sequence.
After removing these four Mimivirus-specific insertions,
the Mimivirus PolB sequence exhibited the highest BLAST
score (E-value = 10^-125, 32% identity) against a soybean
DNA polymerase δ (SWISS-PROT: O48901), with an
alignment covering both the entire Mimivirus and the
target sequence. Nearly equivalent matches are observed with
a variety of eukaryotic (from yeast to human) family B
DNA polymerase sequences. The best viral homologues
were found in phycodnaviruses (E-value = 10^-116).
Conserved carboxylate residues (aspartate and glutamate) at
the exonuclease and polymerase active sites [11,12] were
all identified in the Mimivirus PolB (Fig. S1). There was
no other ORF encoding a putative PolB in the genome.
These observations suggest that R322 encodes a functional PolB.
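As a quick consistency check on the coordinates quoted above, the following sketch recomputes the intein length from its reported positions in the 1740-aa ORF. The variable names are illustrative; the derived extein and spliced lengths are plain arithmetic, assume that only the intein is removed, and therefore still include the three other Mimivirus-specific segments.

```python
# Values reported in the text for ORF R322.
ORF_LENGTH = 1740                       # amino acids
INTEIN_START, INTEIN_END = 1053, 1403   # 1-based, inclusive positions

intein_length = INTEIN_END - INTEIN_START + 1
n_extein_length = INTEIN_START - 1           # residues upstream of the intein
c_extein_length = ORF_LENGTH - INTEIN_END    # residues downstream of the intein
spliced_length = n_extein_length + c_extein_length

print(intein_length, n_extein_length, c_extein_length, spliced_length)
```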
Consistent with the homology search result, a phylogenetic
analysis places the Mimivirus PolB near the root of
eukaryotic Polδs (Fig. 1B). A similar branching position is
obtained for the seven universally conserved Mimivirus
genes [2]. Despite low bootstrap values for some of the
deep branches in Fig. 1B, this tree clearly indicates the
lack of any specific affinity between the Mimivirus PolB
and the archaeal PolB sequences containing inteins (bold
letters in Fig. 1B). It should also be noted that several
other large DNA viruses are known to possess PolBs with
a similar phylogenetic pattern [13].
Canonical/archaeal type Mimivirus intein
The Mimivirus intein sequence (351 aa) exhibits
significant sequence similarities to several known inteins
(E-value < 10^-4), all of which are from
thermophilic/halophilic archaea. The best matching intein
(E-value = 3 × 10^-8) is the second intein of the Thermococcus sp. PolB
(InBase: Tsp-GE8 Pol-2), with 24% amino acid sequence
identity. The Mimivirus sequence exhibits all the expected
features required for an active intein (Fig. 2). Sequence
motifs [14] characterizing the splicing domain (N1-4, C2,
C1) and the dodecapeptide LAGLIDADG
homing-endonuclease domain (EN1-4) were all identified in the
Mimivirus sequence except the N4 motif. The N4 motif is occasionally
absent in previously characterized active inteins [14].
Amino acid residues providing nucleophilic groups in
self-splicing reactions are all present: the first serine and
the last asparagine residues of the intein, and the first
threonine residue of the downstream extein. Accordingly, the
Mimivirus intein is a canonical "asparagine-type" intein,
whose close homologues have previously been
observed only in archaeal species. In contrast, the
previously reported Chilo iridescent virus intein is a
noncano (...truncated)