The Bio-Community Perl toolkit for microbial ecology


BIOINFORMATICS APPLICATIONS NOTE
Sequence analysis
Vol. 30 no. 13 2014, pages 1926–1927
doi:10.1093/bioinformatics/btu130
Advance Access publication March 10, 2014

Florent E. Angly1,*, Christopher J. Fields2 and Gene W. Tyson1

1Australian Centre for Ecogenomics, School of Chemistry and Molecular Biosciences, Level 5, Molecular Biosciences Building (76), The University of Queensland, Brisbane St Lucia, QLD 4072, Australia and 2HPCBio, Carver Biotechnology Center, Institute for Genomic Biology, 1206 West Gregory Drive | MC-195, Urbana, IL 61801, USA

Associate Editor: John Hancock

ABSTRACT

Summary: The development of bioinformatic solutions for microbial ecology in Perl is limited by the lack of modules to represent and manipulate microbial community profiles from amplicon and meta-omics studies. Here we introduce Bio-Community, an open-source, collaborative toolkit that extends BioPerl. Bio-Community interfaces with commonly used programs using various file formats, including BIOM, and provides operations such as rarefaction and taxonomic summaries. Bio-Community will help bioinformaticians to quickly piece together custom analysis pipelines and develop novel software.

Availability and implementation: Bio-Community is cross-platform Perl code available from http://search.cpan.org/dist/Bio-Community under the Perl license. A readme file describes software installation and how to contribute.

Contact:

Supplementary information: Supplementary data are available at Bioinformatics online.

Received on November 28, 2013; revised on January 29, 2014; accepted on March 2, 2014

1 INTRODUCTION

Sequencing is common in most fields of biological research, and the throughput of modern platforms is orders of magnitude higher than that of traditional Sanger sequencing (Metzker, 2010).
The BioPerl bioinformatic toolkit (Stajich et al., 2002) has attracted a large community of users and developers and has become critical in many sequencing projects by allowing quick code development and interaction between programs using incompatible file formats. In microbial ecology, sequencing is used routinely for 16S rRNA gene amplicon surveys (Tringe and Hugenholtz, 2008), metagenomics (Handelsman, 2004) and metatranscriptomics (Frias-Lopez et al., 2008). Because most microorganisms remain uncultivated (Rappé and Giovannoni, 2003), culture-independent molecular surveys are essential for the characterization of environmental microbial communities. However, they require large computational resources, novel bioinformatic tools and elaborate pipelines.

Many tools have been developed to analyze the resulting sequence data. For example, libraries written in Python (Knight et al., 2007) and R (Dixon, 2003; Kembel et al., 2010) provide blocks for building bioinformatic software. QIIME (Caporaso et al., 2010) and mothur (Schloss et al., 2009) are dedicated packages with scripts to build complete analysis pipelines, but they use incompatible file formats. Here, we introduce Bio-Community, a set of format-agnostic modules and scripts to parse and manipulate taxonomic or functional microbial community profiles.

*To whom correspondence should be addressed.

2 FEATURES

2.1 Object model

Bio-Community is a Perl object-oriented toolkit that extends BioPerl. It is centered around the Community object, which contains a group of entities from the same geographic area (Fig. 1). These entities are Member objects, representing individual genomes, genes, taxa or operational taxonomic units from amplicon and meta-omic surveys. Member objects store attributes such as an identifier, a taxon or a sequence and can be given weights to account for the fact that there is no one-to-one relationship between a sequencing read and a microbial cell.
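The object model described above can be sketched in a few lines. The following is a language-neutral illustration written in Python, not Bio-Community's Perl API: the class and method names, the example identifiers and counts, and in particular the rule that a count is corrected by dividing it by the product of the member's weights are all assumptions made for the sake of the example.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict, List


@dataclass
class Member:
    """One entity in a community, e.g. a taxon or an OTU (hypothetical class)."""
    id: str
    taxon: str = ""
    # Correction factors such as gene copy number or genome length.
    weights: List[float] = field(default_factory=list)


class Community:
    """A named group of members with their raw counts (hypothetical class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._members: Dict[str, Member] = {}
        self._counts: Dict[str, float] = {}

    def add_member(self, member: Member, count: float = 1) -> None:
        if any(w <= 0 for w in member.weights):
            raise ValueError("weights must be positive")
        self._members[member.id] = member
        self._counts[member.id] = self._counts.get(member.id, 0) + count

    def member_ids(self) -> List[str]:
        return list(self._members)

    def weighted_count(self, member_id: str) -> float:
        # Assumption: a weight corrects a count by division, so a taxon
        # carrying two gene copies per cell has its read count halved.
        # prod() of an empty list is 1, so unweighted members are unchanged.
        return self._counts[member_id] / prod(self._members[member_id].weights)

    def relative_abundance(self, member_id: str) -> float:
        # Share of the member's weighted count in the community total.
        total = sum(self.weighted_count(i) for i in self._members)
        return self.weighted_count(member_id) / total if total else 0.0


# Made-up example data: two OTUs with different gene copy numbers.
soil = Community("soil")
soil.add_member(Member("OTU_A", taxon="Bacteria", weights=[2.0]), count=10)
soil.add_member(Member("OTU_B", taxon="Archaea", weights=[1.0]), count=5)

for member_id in soil.member_ids():
    print(member_id, soil.weighted_count(member_id),
          soil.relative_abundance(member_id))
```

Keeping the weights on the member and the counts on the community means the same member can appear in several communities with different counts while its correction factors are stored once.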
The relative abundance or abundance rank of a Member can be calculated based on this Member’s count, weight and the total count in the Community (Fig. 2). Similarly, absolute abundance is based on total microbial abundance in the community, quantifiable by epifluorescence microscopy, qPCR or flow cytometry (Rinsoz et al., 2008).

2.2 Diversity metrics

Bio-Community quantifies community α, β and γ diversity (Whittaker, 1972) using a range of metrics [reviewed by Magurran (2004)]. The diversity of a single Community object, α diversity, is represented by metrics of richness, evenness, dominance and indices (Supplementary Table S1). Several Community objects can be grouped into a Meta object, representing a metacommunity (Leibold et al., 2004). This object provides methods to measure γ diversity, i.e. the collective diversity of its communities, and β diversity, i.e. their dissimilarity. The γ diversity metrics are the same as those available for α diversity, whereas those for β diversity include qualitative and quantitative forms (Supplementary Table S1).

2.3 Data input and output

Community profiles (e.g. a site-by-species table) describe the distribution of members in biological samples. Operations to read and write these files are handled by the IO module and are important for exchanging data between programs using different formats. We have implemented parsers for five common file types (Supplementary Table S2), including the BIOM standard (McDonald et al., 2012). Examples of these file types are given in the t/data folder of the Bio-Community package. The parsers automatically detect file format based on its content using the FormatGuesser module, and iteratively record member identifier, taxonomy and abundance.

2.4 Tools

Tool modules can perform operations such as community transformation, rarefaction and taxonomic summar (...truncated)

[...] construct custom analysis pipelines or novel software for microbial ecology. The integration of relative and absolute abundance with diversity metrics permits holistic microbial studies (Dinsdale et al., 2008; Dove et al., 2013; Nathani et al., 2013), while weights can be added to account for gene copy number (Kembel et al., 2012) or genome length (Angly et al., 2009; Beszteri et al., 2010) bias. We encourage programmers to join the development of Bio-Community at https://github.com/bioperl/Bio-Community and to add support for new file formats, diversity metrics or tools.

Conflict of interest: none declared.

© The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Fig. 1. Main objects, their attributes and operation modules

Fig. 2. Relation between abundance types. Relative abundance depends on member counts and weights, whereas absolute abundance is further derived from a total abundance measure

Fig. 3. Vignette illustrating the use of Bio-Community to read a BIOM community profile and report member information