GRAST: a new way of genome reduction analysis using comparative genomics
BIOINFORMATICS
ORIGINAL PAPER
Vol. 22 no. 13 2006, pages 1551–1561
doi:10.1093/bioinformatics/btl139
Genome analysis
Christina Toft and Mario A. Fares
Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth
ABSTRACT
Motivation: Establishment of intra-cellular life involved a profound
re-configuration of the genetic characteristics of bacteria, including genome reduction and rearrangements. Understanding the mechanisms
underlying these phenomena will shed light on the genome rearrangements essential for the development of an intra-cellular lifestyle.
Comparison of genomes with differences in their sizes poses statistical
as well as computational problems. Little effort has been made to
develop flexible computational tools with which to analyse genome
reduction and rearrangements.
Results: Investigation of genome reduction and rearrangements in
endosymbionts using a novel computational tool (GRAST) identified
gathering of genes with similar functions. Conserved clusters of functionally related genes (CGSCs) were detected. Heterogeneous gene
and gene cluster non-functionalization/loss were identified between genome regions, between functional gene categories and during evolution. Results
show that gene non-functionalization has accelerated during the last
50 MY of Buchnera’s evolution while CGSCs have remained static.
Availability: Software is available at http://biology.nuim.ie/staff/mfmolecevolandbioinf.shtml/
Contact:
INTRODUCTION
Intra-cellular bacteria are characterized by intimate biochemical and genetic relationships with the host that result in a pathogenic or symbiotic association. Symbiosis has been largely
associated with the emergence of metabolic, ecological and genetic
novelties in the host and the bacteria (Gil et al., 2002). The epidemiological behaviour of intracellular bacteria relies on specific
population genetics factors that have an enormous influence on the
mutational dynamics at the genome and proteome levels. The most
important of these factors are the strong bottlenecks to which
bacterial effective population sizes are subjected between generations and the absence of lateral gene transfer and recombination
(Tamas et al., 2002). This results in a high rate of fixation of slightly
deleterious mutations by genetic drift (Rispe and Moran, 2000).
This scenario has been confirmed through comparative genomic
analyses (Moran and Mira, 2001; Silva et al., 2001; Tamas
et al., 2002) and has been associated with the non-functionalization
of genes (Andersson and Kurland, 1998; McClelland et al., 2004)
followed by disintegration and genome reduction (Andersson and
Andersson, 1999; Gil et al., 2002; Silva et al., 2001). As a result,
intra-cellular bacteria are expected to form unstable biological
systems (Kondrashov, 1988; Lynch et al., 1993). Despite this,
the symbiotic relationship between the bacteria, such as the
endosymbiotic bacteria of aphids Buchnera aphidicola, and their
hosts has been successfully maintained for 100–150 MY. Mechanisms that compensate for the effects of slightly deleterious mutations
have thus been proposed (Moran, 1996) and demonstrated (Fares
et al., 2002a, b).
Effects attributable to intra-cellular life include reduced genome sizes and a high level of genome rearrangements (Belda
et al., 2005; Mira et al., 2001). Understanding the underlying mechanisms responsible for such genome dynamics is instrumental in
uncovering the genome rearrangement patterns and genes responsible for the establishment of the intra-cellular lifestyle. These mechanisms may also be crucial in defining the final outcome of the
interaction between the biological system of the host and of the
bacteria.
An increasing number of computational tools have been
developed to visualize genomes, locate genes, determine their
function and identify their replication direction (Ciria et al.,
2004; Ghai et al., 2004; Gibson and Smith, 2003; Stothard and
Wishart, 2005; Vernikos et al., 2003). Other programs identify conserved regions and rearrangements throughout the organism’s
evolution by linear genome comparison (Carver et al., 2005;
Chen et al., 2005; Leader, 2004; Xie and Hood, 2003; Yang
et al., 2003). Alternatives to these approaches have permitted the
comparison of genome lengths, the identification of common genes
and the presence and absence of genes or regions when comparing
circular genomes (‘Genome versus Genome Protein Hits’ from
TIGR CMR, http://www.tigr.org) (Romualdi et al., 2005).
Tools for gene order comparison between two genomes have
recently become available. The two genomes are BLASTed against
each other and the most significant hits are plotted onto a graph
where each of the axes represents one of the genomes (e.g. see Silva
et al., 2001). These computational tools differ in that some
programs plot all BLAST hits that satisfy a specific threshold,
normally set by the user (Celamkoti et al., 2004; Choi et al.,
2005), whereas others plot only the hits that are mutual top
BLAST hits (GenePlot from NCBI, http://www.ncbi.nlm.nih.gov/sutils/geneplot.cgi).
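The threshold-based variant of such a gene order plot can be illustrated with a minimal Python sketch. It is not taken from any of the cited tools: the `Hit` record, the function names and the use of gene rank along the genome as the plotting coordinate are assumptions made purely for illustration.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    """One BLAST hit between a gene of genome A and a gene of genome B (hypothetical record)."""
    query: str    # gene identifier in genome A
    subject: str  # gene identifier in genome B
    evalue: float


def hits_below_threshold(hits, max_evalue):
    """Keep every hit whose E-value does not exceed the user-set threshold."""
    return [h for h in hits if h.evalue <= max_evalue]


def dotplot_points(hits, order_a, order_b):
    """Convert hits to (x, y) points, where x and y are the ranks of the
    two genes along genome A and genome B respectively."""
    pos_a = {gene: i for i, gene in enumerate(order_a)}
    pos_b = {gene: i for i, gene in enumerate(order_b)}
    return [
        (pos_a[h.query], pos_b[h.subject])
        for h in hits
        if h.query in pos_a and h.subject in pos_b
    ]


if __name__ == "__main__":
    hits = [Hit("a1", "b2", 1e-50), Hit("a1", "b3", 1e-3), Hit("a2", "b1", 1e-40)]
    kept = hits_below_threshold(hits, max_evalue=1e-10)
    print(dotplot_points(kept, ["a1", "a2"], ["b1", "b2", "b3"]))
```

Points falling on a diagonal would indicate conserved gene order, whereas off-diagonal points would suggest rearrangements. A tool that plots only mutual top hits would replace the threshold filter with a reciprocity check.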
Websites such as NCBI (http://www.ncbi.nlm.nih.gov), Microbes
Online (Alm et al., 2005), STRING (von Mering et al., 2005),
KEGG (Arakawa et al., 2005), BuchneraBASE (Prickett et al.,
2006) and PLARCOM (Choi et al., 2005) perform a variety of
The Author 2006. Published by Oxford University Press. All rights reserved. For Permissions, please email:
Received on February 3, 2006; revised and accepted on April 4, 2006
Advance Access publication April 6, 2006
Associate Editor: Dmitrij Frishman
SYSTEMS AND METHODS
Orthologous pairs of genes between the reduced genome and the reference
genomes are identified by mutual BLASTP (Altschul et al., 1997) searches
of the genes of both genomes. Orthologous gene pairs are those in which each gene is the
other's top BLAST hit, with an E-value lower than a specified cut-off
value. In this analysis only orthologous functional genes are compared
between the two genomes.
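The mutual top-hit criterion described above can be sketched in Python as follows. This is an illustrative sketch rather than the GRAST implementation: the function names, the `(query, subject, evalue)` tuple layout and the default cut-off of 1e-5 are assumptions, and ties in E-value are resolved by keeping the first hit encountered.

```python
def best_hits(hits, max_evalue):
    """Return {query: subject} for the lowest-E-value hit of each query,
    considering only hits with an E-value strictly below the cut-off."""
    best = {}
    for query, subject, evalue in hits:
        if evalue >= max_evalue:
            continue
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def mutual_best_pairs(a_vs_b, b_vs_a, max_evalue=1e-5):
    """Return sorted (gene_a, gene_b) pairs in which each gene is the
    other's top hit. The default cut-off is an assumed placeholder."""
    forward = best_hits(a_vs_b, max_evalue)
    reverse = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Toy hit lists: (query, subject, E-value)
    a_vs_b = [("a1", "b1", 1e-80), ("a1", "b2", 1e-20), ("a2", "b2", 1e-30)]
    b_vs_a = [("b1", "a1", 1e-75), ("b2", "a1", 1e-35), ("b2", "a2", 1e-28)]
    print(mutual_best_pairs(a_vs_b, b_vs_a))
```

In this toy input, gene `a2` finds `b2` as its top hit, but the top hit of `b2` is `a1`, so the pair is not reciprocal and would not be treated as orthologous under this criterion.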
Genome sequences
In this study we have compared the genomes of the endosymbiotic bacterium
B.aphidicola from the aphid species Acyrthosiphon pisum (BAp; Accession
number: NC_002528), Schizaphis graminum (BSg; Accession number:
NC_004061) and Baizongia pistaciae (BBp; Accession number:
NC_004545) to those of their closest free-living relatives Escherichia coli
K12 (E.coli; Accession number: NC_000913) and Salmonella typhimurium
LT2 (S.typhimurium; Accession number: NC_003197). The two free-living
bacteria diverged 100–160 MY ago, a timescale similar to that of the
establishment of endosymbiosis in aphids.
Genome rearrangements
GRAST examines three ways in which genes can undergo rearrangements
(Fig. 1). First, two adjacent genes in the reference genome can be separated
in the rearranged genome by translocation ( (...truncated)