A hidden reservoir of integrative elements is the major source of recently acquired foreign genes and ORFans in archaeal and bacterial genomes
Genome Biology 2009, Volume 10, Issue 6, Article R65
Diego Cortez, Patrick Forterre, Simonetta Gribaldo
Address: Institut Pasteur, Département de Microbiologie, Unité de Biologie Moléculaire du Gène chez les Extrêmophiles, Rue du Dr Roux, 75724 Paris Cedex 15, France
Background: Archaeal and bacterial genomes contain a number of genes of foreign origin that arose from recent horizontal gene transfer, but the role of integrative elements (IEs), such as viruses, plasmids, and transposable elements, in this process has not been extensively quantified. Moreover, it is not known whether IEs play an important role in the origin of ORFans (open reading frames without matches in current sequence databases), whose proportion remains stable despite the growing number of completely sequenced genomes.

Results: We have performed a large-scale survey of potential recently acquired IEs in 119 archaeal and bacterial genomes. We developed an accurate in silico Markov model-based strategy to identify clusters of genes that show atypical sequence composition (clusters of atypical genes or CAGs) and are thus likely to be recently integrated foreign elements, including IEs. Our method identified a large number of new CAGs. Probabilistic analysis of gene content indicates that 56% of these new CAGs are likely IEs, whereas only 7% likely originated via horizontal gene transfer from distant cellular sources. Thirty-four percent of CAGs remain unassigned, which may reflect the still poor sampling of IEs associated with bacterial and archaeal diversity. Moreover, our study bears on the origin of ORFans, because 39% of them are found inside CAGs, many of which likely represent recently acquired IEs.

Conclusions: Our results strongly indicate that archaeal and bacterial genomes contain an impressive proportion of recently acquired foreign genes (including ORFans) coming from a still largely unexplored reservoir of IEs.
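The general idea behind composition-based detection of clusters of atypical genes can be illustrated with a minimal sketch. It assumes an order-2 Markov model trained on all genes of the genome, scores each gene by its mean per-base log-likelihood under that model, flags genes scoring more than 1.5 standard deviations below the genome-wide mean, and groups runs of at least two adjacent flagged genes into CAG-like clusters. The model order, the z-score cutoff, the minimum cluster size and all function names (`train_markov`, `mean_log_likelihood`, `flag_atypical`, `cluster_atypical`) are hypothetical choices for illustration, not the parameters or code used in this study.

```python
import math
import statistics
from collections import defaultdict
from itertools import product

BASES = "ACGT"


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(next base | preceding `order` bases), with pseudocounts."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            ctx, nxt = seq[i - order:i], seq[i]
            if nxt in BASES and all(b in BASES for b in ctx):
                counts[ctx][nxt] += 1
    model = {}
    for letters in product(BASES, repeat=order):
        ctx = "".join(letters)
        total = sum(counts[ctx].values()) + 4 * pseudocount
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in BASES}
    return model


def mean_log_likelihood(seq, model, order=2):
    """Mean log-probability per base; lower values mean more atypical composition."""
    seq = seq.upper()
    total, n = 0.0, 0
    for i in range(order, len(seq)):
        ctx, nxt = seq[i - order:i], seq[i]
        if ctx in model and nxt in BASES:  # skips ambiguous bases such as N
            total += math.log(model[ctx][nxt])
            n += 1
    if n == 0:
        raise ValueError("sequence too short or too ambiguous to score")
    return total / n


def flag_atypical(genes, order=2, z_cutoff=1.5):
    """Flag genes scoring more than `z_cutoff` SDs below the genome-wide mean."""
    model = train_markov(genes, order)  # hypothetical background: all genes of the genome
    scores = [mean_log_likelihood(g, model, order) for g in genes]
    mu, sd = statistics.mean(scores), statistics.pstdev(scores)
    return [s < mu - z_cutoff * sd for s in scores]


def cluster_atypical(flags, min_size=2):
    """Return (first, last) indices of runs of >= min_size adjacent atypical genes."""
    clusters, start = [], None
    for i, flag in enumerate(list(flags) + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_size:
                clusters.append((start, i - 1))
            start = None
    return clusters


if __name__ == "__main__":
    import random

    rng = random.Random(0)

    def random_gene(gc, length=2000):
        # i.i.d. bases with the requested G+C fraction (weights for A, C, G, T)
        w = [(1 - gc) / 2, gc / 2, gc / 2, (1 - gc) / 2]
        return "".join(rng.choices(BASES, weights=w, k=length))

    # toy chromosome: GC-rich host genes with an AT-rich 3-gene insert at positions 5-7
    genes = [random_gene(0.3 if 5 <= i <= 7 else 0.7) for i in range(23)]
    print(cluster_atypical(flag_atypical(genes)))
```

Any approach of this kind can only detect insertions whose composition still differs from that of the host; older transfers whose composition has converged toward the host genome will be missed, which is why such methods target recently acquired elements.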
Background
Integrative elements (IEs), such as viruses, plasmids and their associated hitchhiking elements, transposons, integrons, and so on, mediate the movement of DNA within and between genomes, and play a key role in the emergence of infectious diseases, antibiotic resistance, and the biotransformation of xenobiotics [1-3]. Traces of IE activity have been detected in many prokaryotic genomes, which carry different repertoires of inserted prophages, plasmids, transposons and/or genomic islands [4-7]. These few characterized IEs most likely reflect only a small part of a more diverse and still unknown IE universe that shapes bacterial and archaeal genomes [8].
The importance of IEs in the origin of ORFans (open reading frames (ORFs) without matches in current sequence databases) [9] is still controversial. Indeed, the source of ORFans remains a major mystery of the post-genomic era since, contrary to previous expectations, their proportion remains stable despite the increasing number of complete genome sequences available [10]. It has been suggested that ORFans are misannotated genes, rapidly evolving sequences, newly formed genes, or genes recently transferred from cellular or viral genomes that have not yet been sequenced [10,11]. The possibility that ORFans originate from the integration of elements of viral origin is appealing, since viral genomes themselves always contain a high proportion of ORFans [12,13].
Consistent with this hypothesis, Daubin and Ochman [14] noticed that ORFans from γ-Proteobacteria share several features with viral ORFans (for example, small size and AT richness) and suggested that 'ORFans in the genomes of free-living microorganisms apparently derive from bacteriophages and occasionally become established by assuming roles in key cellular functions.' However, Yin and Fisher [10] recently reported that, on average, only 2.8% of all cellular ORFans have homologues in current viral sequence databases, a finding that casts doubt on a viral origin of ORFans, and proposed that 'lateral transfer from viruses alone is unlikely to explain the origin of the majority of ORFans in the majority of prokaryotes and consequently, other, not necessarily exclusive, mechanisms are likely to better explain the origin of the increasing number of ORFans.' More recently, the same authors found that only 18% of viral ORFans (ORFs present in only one viral genome) have homologues in archaeal or bacterial genomes, and concluded that 'phage ORFans play a lesser role in horizontal gene transfer to prokaryotes' [12].
Several composition-based in silico methods have been devised in the past few years to identify foreign genes recently acquired by cellular genomes, such as atypical G+C content, atypical codon usage, Markov model (MM)-based approaches, and Bayesian model (BM)- (...truncated)