GRAST: a new way of genome reduction analysis using comparative genomics

Jul 2006

Motivation: Establishment of intra-cellular life involved a profound re-configuration of the genetic characteristics of bacteria, including genome reduction and rearrangements. Understanding the mechanisms underlying these phenomena will shed light on the genome rearrangements essential for the development of an intra-cellular lifestyle. Comparison of genomes with differences in their sizes poses statistical as well as computational problems. Little effort has been made to develop flexible computational tools with which to analyse genome reduction and rearrangements.


BIOINFORMATICS ORIGINAL PAPER Vol. 22 no. 13 2006, pages 1551–1561
doi:10.1093/bioinformatics/btl139
Genome analysis

Christina Toft and Mario A. Fares
Molecular Evolution and Bioinformatics Laboratory, Department of Biology, National University of Ireland, Maynooth

ABSTRACT
Results: Investigation of genome reduction and rearrangements in endosymbionts using a novel computational tool (GRAST) identified a gathering of genes with similar functions. Conserved clusters of functionally related genes (CGSCs) were detected. Heterogeneous gene and gene cluster non-functionalization/loss was identified between genome regions, between functional gene categories and during evolution. Results show that gene non-functionalization has accelerated during the last 50 MY of Buchnera’s evolution while CGSCs have been static.
Availability: Software is available at http://biology.nuim.ie/staff/mfmolecevolandbioinf.shtml/
Contact:

INTRODUCTION
Intra-cellular bacteria are characterized by their intimate biochemical and genetic relationships with the host, which have resulted in a pathogenic or symbiotic relationship. Symbiosis has been largely associated with the emergence of metabolic, ecological and genetic novelties in the host and the bacteria (Gil et al., 2002).
The epidemiological behaviour of intracellular bacteria relies on specific population genetics factors that have an enormous influence on the mutational dynamics at the genome and proteome levels. The most important of these factors are the strong bottlenecks to which the bacterial effective population sizes are subjected between generations and the absence of lateral gene transfer and recombination (Tamas et al., 2002). This results in a high rate of fixation of slightly deleterious mutations by genetic drift (Rispe and Moran, 2000). This scenario has been confirmed through comparative genomic analyses (Moran and Mira, 2001; Silva et al., 2001; Tamas et al., 2002) and has been associated with the non-functionalization of genes (Andersson and Kurland, 1998; McClelland et al., 2004) followed by disintegration and genome reduction (Andersson and Andersson, 1999; Gil et al., 2002; Silva et al., 2001). As a result, intra-cellular bacteria are expected to form unstable biological systems (Kondrashov, 1988; Lynch et al., 1993). Despite this, the symbiotic relationship between bacteria, such as Buchnera aphidicola, the endosymbiotic bacterium of aphids, and their hosts has been successfully maintained for 100–150 MY. Mechanisms that compensate for the effects of slightly deleterious mutations have thus been proposed (Moran, 1996) and demonstrated (Fares et al., 2002a, b). Effects attributable to intracellular life are the reduced genome sizes and the high level of genome rearrangements (Belda et al., 2005; Mira et al., 2001). Understanding the underlying mechanisms responsible for such genome dynamics is instrumental in uncovering the genome rearrangement patterns and genes responsible for the establishment of the intra-cellular lifestyle. These mechanisms may also be crucial in defining the final outcome of the interaction between the biological system of the host and that of the bacteria.
An increasing number of computational tools have been developed to visualize genomes, locate genes, determine their function and identify their replication direction (Ciria et al., 2004; Ghai et al., 2004; Gibson and Smith, 2003; Stothard and Wishart, 2005; Vernikos et al., 2003). Other software identifies conserved regions and rearrangements throughout the organism’s evolution by linear genome comparison (Carver et al., 2005; Chen et al., 2005; Leader, 2004; Xie and Hood, 2003; Yang et al., 2003). Alternatives to these approaches have permitted the comparison of genome lengths, the identification of common genes and the detection of the presence and absence of genes or regions when comparing circular genomes (‘Genome versus Genome Protein Hits’ from TIGR CMR, http://www.tigr.org) (Romualdi et al., 2005). Tools for gene order comparison between two genomes have recently become available. The two genomes are BLASTed against each other and the most significant hits are plotted onto a graph where each of the axes represents one of the genomes (e.g. see Silva et al., 2001). These tools differ in that some plot all BLAST hits that satisfy a specific threshold, normally set by the user (Celamkoti et al., 2004; Choi et al., 2005), whereas others plot only the hits that are mutual top BLAST hits (GenePlot from NCBI, http://www.ncbi.nlm.nih.gov/sutils/geneplot.cgi). Websites such as NCBI (http://www.ncbi.nlm.nih.gov), Microbes Online (Alm et al., 2005), STRING (von Mering et al., 2005), KEGG (Arakawa et al., 2005), BuchneraBASE (Prickett et al., 2006) and PLARCOM (Choi et al., 2005) perform a variety of

© The Author 2006. Published by Oxford University Press. All rights reserved.
Received on February 3, 2006; revised and accepted on April 4, 2006. Advance Access publication April 6, 2006. Associate Editor: Dmitrij Frishman

SYSTEMS AND METHODS
Orthologous pairs of genes between the reduced genome and the reference genomes are identified by mutual BLASTP (Altschul et al., 1997) searches of the genes of both genomes. Orthologous gene pairs are those that find each other as top BLAST hits with an E-value lower than a chosen cut-off value. In this analysis only orthologous functional genes are compared between the two genomes.

Genome sequences
In this study we have compared the genomes of the endosymbiotic bacterium B.aphidicola from the aphids Acyrthosiphon pisum (BAp; Accession number: NC_002528), Schizaphis graminum (BSg; Accession number: NC_004061) and Baizongia pistaciae (BBp; Accession number: NC_004545) to those of their closest free-living relatives Escherichia coli K12 (E.coli; Accession number: NC_000913) and Salmonella typhimurium LT2 (S.typhimurium; Accession number: NC_003197). The two free-living bacteria diverged 100–160 MY ago, a time span similar to that since the establishment of endosymbiosis in aphids.

Genome rearrangements
GRAST examines three ways in which genes can undergo rearrangements (Fig. 1). First, two adjacent genes in the reference genome can be separated in the rearranged genome by translocation ( (...truncated)
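
The mutual top-hit criterion for orthologous gene pairs described under SYSTEMS AND METHODS can be illustrated with a minimal sketch. It is not the GRAST implementation: the `Hit` record, the function names, the gene identifiers and the default cut-off of 1e-5 are all assumptions made for illustration, and the hits are assumed to have already been parsed from BLASTP output into (query, subject, E-value) records.

```python
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Hit(NamedTuple):
    """One BLASTP hit (hypothetical record, not a GRAST data structure)."""
    query: str     # gene identifier in the genome used as query
    subject: str   # gene identifier in the genome that was searched
    evalue: float  # E-value reported for the hit


def top_hits(hits: Iterable[Hit], max_evalue: float) -> Dict[str, str]:
    """Map each query gene to its lowest-E-value subject below the cut-off."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= max_evalue:
            continue  # keep only hits with an E-value lower than the cut-off
        current = best.get(hit.query)
        # On equal E-values the hit seen first is kept (an arbitrary choice).
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {query: hit.subject for query, hit in best.items()}


def mutual_top_hits(
    a_vs_b: Iterable[Hit],
    b_vs_a: Iterable[Hit],
    max_evalue: float = 1e-5,  # assumed value; the text only says "a cut-off"
) -> Set[Tuple[str, str]]:
    """Return the (gene_a, gene_b) pairs that are each other's top hit."""
    forward = top_hits(a_vs_b, max_evalue)
    reverse = top_hits(b_vs_a, max_evalue)
    return {(a, b) for a, b in forward.items() if reverse.get(b) == a}


# Toy data with made-up gene names: reduced genome versus reference genome.
reduced_vs_reference = [
    Hit("bu001", "ec_a", 1e-50),
    Hit("bu001", "ec_b", 1e-20),
    Hit("bu002", "ec_b", 1e-30),
    Hit("bu003", "ec_c", 0.5),    # fails the E-value cut-off
]
reference_vs_reduced = [
    Hit("ec_a", "bu001", 1e-48),
    Hit("ec_b", "bu001", 1e-25),  # ec_b prefers bu001, so bu002 is unpaired
    Hit("ec_b", "bu002", 1e-10),
    Hit("ec_c", "bu003", 1e-40),
]

print(sorted(mutual_top_hits(reduced_vs_reference, reference_vs_reduced)))
# prints [('bu001', 'ec_a')]
```

In this toy example only one pair survives: `bu002` is dropped because its top hit `ec_b` points back to a different gene, and `bu003` is dropped because its forward hit does not pass the cut-off. A stricter or looser cut-off changes which pairs are accepted, which is why the value is left as a parameter.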



Toft, Christina and Fares, Mario A. GRAST: a new way of genome reduction analysis using comparative genomics. Bioinformatics, 2006, Volume 22, Issue 13, pp. 1551-1561. DOI: 10.1093/bioinformatics/btl139