The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes

PLoS Computational Biology, Dec 2009

Metagenomic studies characterize both the composition and diversity of uncultured viral and microbial communities. BLAST-based comparisons have typically been used for such analyses; however, sampling biases, high percentages of unknown sequences, and the use of arbitrary thresholds to find significant similarities can decrease the accuracy and validity of estimates. Here, we present Genome relative Abundance and Average Size (GAAS), a complete software package that provides improved estimates of community composition and average genome length for metagenomes in both textual and graphical formats. GAAS implements a novel methodology to control for sampling bias via length normalization, to adjust for multiple BLAST similarities by similarity weighting, and to select significant similarities using relative alignment lengths. In benchmark tests, the GAAS method was robust to both high percentages of unknown sequences and to variations in metagenomic sequence read lengths. Re-analysis of the Sargasso Sea virome using GAAS indicated that standard methodologies for metagenomic analysis may dramatically underestimate the abundance and importance of organisms with small genomes in environmental systems. Using GAAS, we conducted a meta-analysis of microbial and viral average genome lengths in over 150 metagenomes from four biomes to determine whether genome lengths vary consistently between and within biomes, and between microbial and viral communities from the same environment. Significant differences between biomes and within aquatic sub-biomes (oceans, hypersaline systems, freshwater, and microbialites) suggested that average genome length is a fundamental property of environments driven by factors at the sub-biome level. The behavior of paired viral and microbial metagenomes from the same environment indicated that microbial and viral average genome sizes are independent of each other, but indicative of community responses to stressors and environmental conditions.
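The three corrections named in the abstract (selecting similarities by relative alignment length, similarity weighting for reads with multiple BLAST hits, and length normalization) can be illustrated with a short sketch. This is not the GAAS implementation: the function name `estimate_composition`, the tuple layout of `hits`, the score-proportional weighting, and the 0.6 relative-alignment threshold are all assumptions chosen for illustration.

```python
from collections import defaultdict

def estimate_composition(hits, genome_lengths, min_rel_aln=0.6):
    """Illustrative sketch only; not the GAAS source code.

    hits: iterable of (read_id, genome_id, aln_len, read_len, score),
          with score > 0 (hypothetical layout).
    genome_lengths: dict mapping genome_id to genome length in bp.
    min_rel_aln: assumed cutoff on alignment length / read length.
    Returns (relative_abundance_by_genome, average_genome_length).
    """
    # 1. Select significant similarities by relative alignment length.
    by_read = defaultdict(list)
    for read_id, genome_id, aln_len, read_len, score in hits:
        if aln_len / read_len >= min_rel_aln:
            by_read[read_id].append((genome_id, score))

    # 2. Similarity weighting: each read carries a total weight of 1,
    #    split among its hits in proportion to score (an assumption).
    weight = defaultdict(float)
    for read_hits in by_read.values():
        total = sum(score for _, score in read_hits)
        for genome_id, score in read_hits:
            weight[genome_id] += score / total

    if not weight:
        return {}, 0.0

    # 3. Length normalization: a longer genome yields more reads per
    #    copy, so divide by genome length before renormalizing.
    per_length = {g: w / genome_lengths[g] for g, w in weight.items()}
    norm = sum(per_length.values())
    abundance = {g: v / norm for g, v in per_length.items()}

    # 4. Average genome length, weighted by relative abundance.
    avg_len = sum(abundance[g] * genome_lengths[g] for g in abundance)
    return abundance, avg_len


# Toy data: ten reads hit a 50 kb genome, one read hits a 5 kb genome.
lengths = {"large": 50_000, "small": 5_000}
demo_hits = [(f"r{i}", "large", 100, 100, 200.0) for i in range(10)]
demo_hits.append(("r10", "small", 100, 100, 200.0))
print(estimate_composition(demo_hits, lengths))
```

In this toy case the raw hit counts favor the large genome 10 to 1, yet after dividing by genome length both genomes come out at equal relative abundance, with an average genome length of 27,500 bp. That is the sense in which uncorrected counts can underestimate organisms with small genomes.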


Editor: Gary D. Stormo, Washington University School of Medicine, United States of America

Funding: The Massachusetts Institute of Technology and the Agouron Institute for sequencing funded the Oxygen Minimum Zone project. The National High Technology Research and Development Program of China (2007AA09Z443 and 2007AA021301) and Knowledge Innovation Project of The Chinese Academy of Sciences (KSCX2-YW-G-022) supported the South China sediments microbiome project. The Antarctica Lakes research was supported by the Gordon and Betty Moore Foundation. NSF OPP 0124733 funded the Arctic microbiome sampling. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: The authors have declared that no competing interests exist.

Metagenomic approaches to the study of microbial and viral communities have revealed previously undiscovered diversity on a tremendous scale [1,2]. Metagenomic sequences are typically compared to sequences from known genomes using BLAST to estimate the taxonomic and functional composition of the original environmental community [3]. Many software tools (...truncated)

Metagenomics uses DNA or RNA sequences isolated directly from the environment to determine what viruses or microorganisms exist in natural communities and what metabolic activities they encode. Typically, metagenomic sequences are compared to annotated sequences in public databases using the BLAST search tool. Our methods, implemented in the Genome relative Abundance and Average Size (GAAS) software, improve the way BLAST searches are processed to estimate the taxonomic composition of communities and their average genome length. GAAS provides a more accurate picture of community composition by correcting for a systematic sampling bias towards larger genomes, and is useful in situations where organisms with small genomes are abundant, such as disease outbreaks caused by small RNA viruses. Microbial average genome length relates to environmental complexity and the distribution of genome lengths describes community diversity. A study of the average genome length of viruses and microorganisms in four different biomes using GAAS on 169 metagenomes showed si (...truncated)



Florent E. Angly, Dana Willner, Alejandra Prieto-Davó, Robert A. Edwards, Robert Schmieder, Rebecca Vega-Thurber, Dionysios A. Antonopoulos, Katie Barott, Matthew T. Cottrell, Christelle Desnues, Elizabeth A. Dinsdale, Mike Furlan, Matthew Haynes, Matthew R. Henn, Yongfei Hu, David L. Kirchman, Tracey McDole, John D. McPherson, Folker Meyer, R. Michael Miller, Egbert Mundt, Robert K. Naviaux, Beltran Rodriguez-Mueller, Rick Stevens, Linda Wegley, Lixin Zhang, Baoli Zhu, Forest Rohwer. The GAAS Metagenomic Tool and Its Estimations of Viral and Microbial Average Genome Size in Four Major Biomes, PLoS Computational Biology, 2009, Volume 5, Issue 12, DOI: 10.1371/journal.pcbi.1000593