GOHTAM: a website for ‘Genomic Origin of Horizontal Transfers, Alignment and Metagenomics’
Copyedited by: TRJ
MANUSCRIPT CATEGORY: APPLICATIONS NOTE
BIOINFORMATICS APPLICATIONS NOTE
Genome analysis
Vol. 28 no. 9 2012, pages 1270–1271
doi:10.1093/bioinformatics/bts118
Advance Access publication March 15, 2012
GOHTAM: a website for ‘Genomic Origin of Horizontal Transfers,
Alignment and Metagenomics’
Sabine Ménigaud† , Ludovic Mallet∗† , Géraldine Picord, Cécile Churlaud, Alexandre Borrel
and Patrick Deschavanne∗
Molécules Thérapeutiques in silico, Institut National de la Santé et de la Recherche Médicale (INSERM) UMR-S 973,
Université Paris Diderot, Sorbonne Paris Cité, 35 rue Héléne Brion, 75013, Paris, France
Associate Editor: Martin Bishop
Received on August 16, 2011; revised on March 2, 2012; accepted
on March 6, 2012
1
INTRODUCTION
Horizontal transfers (HTs) are a major force of evolution (Keeling
and Palmer, 2008; Ochman et al., 2000) and this website
proposes methods for their detection. The genomic signature was
demonstrated to be species-specific (Deschavanne et al., 1999;
Sandberg et al., 2001) and allows HT detection in terms of
tetranucleotide frequencies (Dufraigne et al., 2005). Parametric
methods were designed to work only with the information contained
in genomic sequences. They rely either on the whole set of genes or
on local variations of genomic signature (Dufraigne et al., 2005;
Mallet et al., 2010). Recently, a benchmark has determined the
most efficient parametric methods in different conditions and has
proposed to use a combination of methods to analyze HTs in
genomes (Becq et al., 2010). This site provides user-friendly access
to such methods as well as some unique features including signaturebased phylogeny and potential origin of a set of metagenomics
sequences.
Fig. 1. Some partial screens of the website. (A) Window-based HT detection;
(B) table of neighbors; (C) signature-based phylogenetic tree; (D) species
signature; and (E) genome alignment.
2
2.1
whom correspondence should be addressed.
authors wish it to be known that, in their opinion, the first two authors
should be regarded as joint First Authors.
† The
HT detection
The two methods proposed can be used alone or in combination. The
first is a window-based signature method as described in Dufraigne
et al. (2005), except that the distance used is the Jensen–Shannon
divergence, a symmetric version of the Kullback–Leibler divergence
(Azad and Lawrence, 2007; Becq et al., 2010). Either sensitivity
or specificity can be increased by adjustable classification process
(Azad and Lawrence, 2007). A gene-based method is also proposed
with the same distance (Becq et al., 2010). Up to now, these methods
were never proposed for online genome analysis (Fig. 1A).
2.2
Bank of genomic signatures
A key feature of GOHTAM is the biggest bank of genomic signatures
to date. Instead of using only complete genomes (van Passel et al.,
2005; Teeling et al., 2004), this bank is based on the whole
set of sequences of Genbank (release 188, only sequences <1 kb
were discarded) and contains ∼248 000 tetranucleotide species
signatures. The bank is updated at each major release.
2.3
∗ To
GOHTAM SERVICES
ABSTRACT
Motivation: This website allows the detection of horizontal transfers
based on a combination of parametric methods and proposes an
origin by researching neighbors in a bank of genomic signatures.
This bank is also used to research an origin to DNA fragments from
metagenomics studies.
Results: Different services are provided like the possibility of
inferring a phylogenetic tree with sequence signatures or comparing
two genomes and displaying the rearrangements that happened
since their separation.
Availability and implementation: http://gohtam.rpbs.univ-parisdiderot.fr/
Contact: ; ludovic.mallet
@jouy.inra.fr
Supplementary information: Supplementary data are available at
Bioinformatics online http://gohtam.rpbs.univ-paris-diderot.fr:8080/
Data/bin/GOHTAM_bin.tgz
Origin of transferred regions
Each detected region signature is compared with the signatures of the
bank and the 10 closest neighbors are displayed with a confidence
rating depending on the length of both query and reference sequences
and the distance between the two signatures (Fig. 1B).
© The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/
by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
[15:35 10/4/2012 Bioinformatics-bts118.tex]
Page: 1270
1270–1271
Copyedited by: TRJ
MANUSCRIPT CATEGORY: APPLICATIONS NOTE
GOHTAM
2.4
Metagenomics
In the case of a metagenomics study, a sequence or a set of
sequences (multi-Fasta) is loaded; the signatures of these sequences
are compared as above to propose a species of origin.
2.5
Oligonucleotide content
The whole set of tetranucleotides of a sequence represents the
signature of a sequence (Deschavanne et al., 1999). This signature
of the 256 possible tetranucleotides is under the form of a 16×16
frequency matrix and can be displayed as a signature image
(Fig. 1D).
2.6
Phylogenetic tree of sequence signatures
2.7
Genome alignment
The website uses maximum unique matches (MUMs) to align
genomes. All rearrangements superiors to 1 kb between two
genomes are graphically displayed with the possibility to choose
a region or modify the length of MUMs (Fig. 1E; Delcher et al.
1999).
3
IMPLEMENTATION
Except for use of programs like the Phylip package (http://evolution
.gs.washington.edu/phylip.html) or Mummer (http://mummer
.sourceforge.net/), the original programs are written in Python, Perl
or R and available at: http://gohtam.rpbs.univ-paris-diderot.fr:8080/
Data/bin/GOHTAM_bin.tgz
An online help is available. Some analyses require time; HT
detection lasts ∼6 min and the research for neighbors ∼2 min
depending on the server load and the sequence length.
This site provides some unique features in terms of HT detection,
origin of HT regions, metagenomics studies as well as for
Funding: This work was supported by a grant from ANR MIE/TBHits 2010.
Conflict of Interest: none declared.
REFERENCES
Azad,R.K. and LawrenceJ.G. (2007) Detecting laterally transferred genes: use of
entropic clustering methods and genome position. Nucleic Acids Res., 35,
4629–4639.
Becq,J. et al. (2010) ‘A benchmark of parametric methods for horizontal transfers
detection’. PLoS ONE, 5, e9989.
Chapus,C. et al. (2005) ‘Exploration of phylogenetic data using a global sequence
analysis method’ BMC Evol. Biol., 5, 63–83.
Delcher,A.L. et al. (1999) ‘Alignment of whole genomes’ Nucleic Acids Res, 27,
2369–2376.
Deschavanne,P.J. et al. (1999). ‘Genomic signature: characterization and classification
of species assessed by Chaos Game Representation of sequences’. Mol. Biol. Evol.,
16, 1391–1399.
Dufraigne,C. et al. (2005) ‘Detection and character (...truncated)