Genome-level homology and phylogeny of Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae)


http://www.biomedcentral.com/content/pdf/1471-2164-12-237.pdf

Rebecca B Dikow
Committee on Evolutionary Biology, The University of Chicago, Chicago, IL, USA
Division of Fishes, The Field Museum of Natural History, Chicago, IL, USA

Background: The explosion in the availability of whole genome data provides the opportunity to build phylogenetic hypotheses from these data and to learn more about the genomes themselves. The biological history of genes and genomes can be investigated based on the taxonomic history provided by the phylogeny. A phylogenetic hypothesis based on complete genome data is presented for the genus Shewanella (Gammaproteobacteria: Alteromonadales: Shewanellaceae). Nineteen taxa from Shewanella (16 species and 3 additional strains of one species) and three outgroup species representing the genera Aeromonas (Gammaproteobacteria: Aeromonadales: Aeromonadaceae), Alteromonas (Gammaproteobacteria: Alteromonadales: Alteromonadaceae) and Colwellia (Gammaproteobacteria: Alteromonadales: Colwelliaceae) are included, for a total of 22 taxa.
Results: Putatively homologous regions were found across unannotated genomes and tested with a phylogenetic analysis. Two genome-wide data-sets are considered: the first includes only those genomic regions in which all taxa are represented and comprises 3,361,015 aligned nucleotide base-pairs (bp); the second additionally includes regions present in only subsets of taxa and totals 12,456,624 aligned bp. Alignment columns in these large data-sets were then randomly sampled to create smaller data-sets. After the phylogenetic hypothesis was generated, genome annotations were projected onto the DNA sequence alignment to compare the historical hypothesis generated by the phylogeny with the functional hypothesis posited by annotation.
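The construction of the two data-sets and of the smaller randomly sampled data-sets can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each locally co-linear block (LCB) is held as a dictionary from taxon name to aligned sequence, that taxa absent from a block are coded as missing data ('?') in the second data-set, and that columns are drawn without replacement. None of these details are specified in the text above, and the function names concatenate and sample_columns are illustrative only, not the author's pipeline.

```python
import random

# Hypothetical layout: each locally co-linear block (LCB) is a dict mapping
# taxon name -> aligned sequence; all rows within one block share a length.


def concatenate(blocks: list[dict[str, str]], taxa: list[str],
                complete_only: bool) -> dict[str, str]:
    """Concatenate LCBs into one taxon-by-column matrix.

    complete_only=True keeps only blocks that contain every taxon (first
    data-set); otherwise taxa absent from a block are filled with '?'
    (an assumed missing-data coding) so every block is kept (second data-set).
    """
    matrix: dict[str, list[str]] = {taxon: [] for taxon in taxa}
    for block in blocks:
        if complete_only and not all(taxon in block for taxon in taxa):
            continue  # skip blocks that lack any taxon
        width = len(next(iter(block.values())))
        for taxon in taxa:
            matrix[taxon].append(block.get(taxon, "?" * width))
    return {taxon: "".join(parts) for taxon, parts in matrix.items()}


def sample_columns(matrix: dict[str, str], n_columns: int,
                   seed: int = 0) -> dict[str, str]:
    """Draw n_columns alignment columns at random, without replacement.

    The same column indices are used for every taxon, and the original
    column order is preserved.
    """
    length = len(next(iter(matrix.values())))
    rng = random.Random(seed)
    chosen = sorted(rng.sample(range(length), n_columns))
    return {taxon: "".join(seq[i] for i in chosen)
            for taxon, seq in matrix.items()}


if __name__ == "__main__":
    blocks = [
        {"A": "ACGT", "B": "ACGA", "C": "ACTT"},
        {"A": "GG", "B": "GC"},  # taxon C is absent from this block
    ]
    taxa = ["A", "B", "C"]
    print(concatenate(blocks, taxa, complete_only=True))
    print(concatenate(blocks, taxa, complete_only=False))
    print(sample_columns(concatenate(blocks, taxa, False), 3, seed=1))
```

In the toy example only the first block survives when complete_only=True, mirroring the smaller all-taxa data-set. Sampling without replacement uses each column at most once; a bootstrap-style draw would sample with replacement instead.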
Conclusions: Individual phylogenetic analyses of the 243 locally co-linear genome regions all failed to recover the genome topology, but the smaller data-sets created by randomly sampling the large concatenated alignments all recovered it. It is shown that there is no single orthologous copy of 16S rRNA across the taxon sampling included in this study and that the relationships among the multiple copies are consistent with 16S rRNA undergoing concerted evolution. Unannotated whole genome data can provide excellent raw material for generating hypotheses of historical homology, which can be tested with phylogenetic analysis and compared with hypotheses of gene function.

Background

Shewanella is a genus of marine and freshwater Gram-negative Gammaproteobacteria within the monogeneric family Shewanellaceae Ivanova et al., 2004. While members of Shewanella have been recognized since 1931 (e.g. Achromobacter putrefaciens Derby and Hammer 1931, now Shewanella putrefaciens), the genus has only been recognized under its present name since 1985 [1], and 39 of the 52 currently recognized species have been described since 2000 [2]. There are also multiple strains that are commonly studied but have not been given a proper name (some of these are included below and are referred to by their strain number). Members of Shewanella have been described from diverse habitats, ranging from deep cold-water marine environments and shallow Antarctic Ocean habitats to hydrothermal vents and freshwater lakes (see Table 1 [1,3-21]).

Table 1 Taxon table and Mauve results. Rows include Shewanella baltica OS223, Shewanella baltica OS155, Shewanella baltica OS185, Shewanella baltica OS195, Shewanella pealeana ATCC 700345, Shewanella piezotolerans WP3, Shewanella putrefaciens CN-32, Shewanella sediminis HAW-EB3 and Shewanella sp. ANA-3; columns include "LCBs present in all taxa of its species".

Shewanella has been of great interest due to the ability of its members to convert heavy metals and toxic substances (e.g.
iron, sulfur, uranium) into less toxic products by using them as electron acceptors in certain respiratory situations, making them of interest for environmental clean-up (e.g. iron, sulfur: [22]; uranium: [23]). To this end, 19 genomes have been fully sequenced and deposited in GenBank as of 2009. Annotations suggest that the species possess approximately 5,000 genes and have genomes of approximately 5 Mbp (details in Table 1). The goal of the study presented here is to investigate how whole genome data can be used not only to build a tree but also to inform us about gene and genome history, by comparing the hypothesis of historical homology supported by the phylogenetic hypothesis with what is known about gene function. There is computational interest in the ability to build large trees, both in number of taxa and in number of characters, e.g. [24,25]. The biological history of genes and genomes can be investigated based on the taxonomic history of the bearers of these characters. This goes beyond predicting the function of uncharacterized genes: it also includes the potential to track changes in function over gene history and to find upstream or downstream segments of co-evolving DNA. Eisen and Fraser highlighted many of these goals when they introduced the term phylogenomics [26]. While these goals are broad and ambitious, it is hoped that the present study represents a step in this direction. The presented approach also represents a shift for phylogenetic systematics, in which one has historically known all the characters of interest very well and perhaps had a well-formed opinion about their history based on a lifetime of knowledge about their distribution and subtle variations. Even with molecular characters in the form of one or a few genes, and even with many taxa, one gets to know the reliable parts of an alignment and often memorizes the DNA sequence after having sequenced and edited the same marker for several years.
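Comparing the phylogenetic hypothesis with what is known about gene function requires placing annotated features in alignment coordinates; the Results describe genome annotations being projected onto the DNA sequence alignment. The following is a minimal Python sketch of such a projection, not the author's implementation. It assumes, hypothetically, that a feature is given as a 0-based, half-open range of ungapped positions on the forward strand, and that one taxon's aligned row begins at genome position block_start. Reverse-strand blocks would need their coordinates flipped first. The function names residue_columns, project_feature and features_at_column are illustrative.

```python
def residue_columns(aligned: str, gap: str = "-") -> list[int]:
    """Alignment column index of each ungapped residue, in sequence order."""
    return [col for col, ch in enumerate(aligned) if ch != gap]


def project_feature(aligned: str, start: int, end: int,
                    block_start: int = 0) -> tuple[int, int]:
    """Map a feature on ungapped positions [start, end) to columns [first, last).

    Assumes (hypothetically) 0-based, half-open, forward-strand coordinates
    and that the aligned row begins at genome position block_start.
    """
    cols = residue_columns(aligned)
    lo, hi = start - block_start, end - block_start
    if not 0 <= lo < hi <= len(cols):
        raise ValueError("feature lies outside this aligned block")
    return cols[lo], cols[hi - 1] + 1


def features_at_column(projected: dict[str, tuple[int, int]],
                       column: int) -> list[str]:
    """Names of projected features whose column range covers `column`."""
    return sorted(name for name, (first, last) in projected.items()
                  if first <= column < last)


if __name__ == "__main__":
    row = "AC--GTA-C"  # one taxon's aligned row; '-' marks gaps
    # hypothetical genes annotated at genome positions [101, 104) and [103, 106)
    genes = {"geneX": (101, 104), "geneY": (103, 106)}
    projected = {name: project_feature(row, s, e, block_start=100)
                 for name, (s, e) in genes.items()}
    print(projected)
    print(features_at_column(projected, 5))
```

Because gap columns that fall inside a feature are included in its projected range, the same column range can be read across every taxon in the block, which is what lets an annotation from one genome be compared with the homologous columns of the others.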
The approach presented here proposes a new perspective, necessitated by the new kinds of data being gathered, particularly those from next-generation and shotgun sequencing, which generate millions of nucleotide base-pairs (bp) rather than thousands. Primary homology (sensu de Pinna [27]) must be determined in an automated fashion, given the vast amount of data and the few character states of nucleotide data. The phylogenetic tree becomes an intermediate point: it is built from hypotheses of primary homology, which it tests, and is then used as a framework for optimizing the character states and for looking back to functional gene annotations to begin to answer questions about gene and genome history. Polymerase chain reaction (PCR) primers can provide hypotheses of primary homology because they target conserved flanking regions, which provide a sufficient level of confidence that the same regions are being sequenced. With next-generation sequencing, we have no such sense of location (particularly with bacteria), because we expect genes and other genomic segments to have been rearranged over evolutionary history [28-31]. Annotations can provide information about the function of genes and the location of op (...truncated)