Orione, a web-based framework for NGS analysis in microbiology

Bioinformatics, Jul 2014

Summary: End-to-end next-generation sequencing microbiology data analysis requires a diversity of tools covering bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation and metagenomics. However, the construction of computational pipelines that use different software packages is difficult owing to a lack of interoperability, reproducibility and transparency. To overcome these limitations we present Orione, a Galaxy-based framework consisting of publicly available research software and specifically designed pipelines to build complex, reproducible workflows for next-generation sequencing microbiology data analysis. Enabling microbiology researchers to conduct their own custom analysis and data manipulation without software installation or programming, Orione provides new opportunities for data-intensive computational analyses in microbiology and metagenomics.

Availability and implementation: Orione is available online at http://orione.crs4.it.

Contact: gianmauro.cuccuru{at}crs4.it

Supplementary information: Supplementary data are available at Bioinformatics online.

Full text (PDF): http://bioinformatics.oxfordjournals.org/content/30/13/1928.full.pdf


Gianmauro Cuccuru, Massimiliano Orsini, Andrea Pinna, Andrea Sbardellati, Nicola Soranzo, Antonella Travaglione, Paolo Uva, Gianluigi Zanetti and Giorgio Fotia

CRS4, Science and Technology Park Polaris, Piscina Manna, 09010 Pula (CA), Italy

Associate Editor: Michael Brudno

1 INTRODUCTION

Application of next-generation sequencing (NGS) in microbiology is becoming common practice, with a profound impact on research, diagnostic and clinical microbiology (Loman et al., 2012). Recent applications include genomic sequencing, differential transcription analysis, variant investigation and metagenomics studies. Major challenges include the finishing of draft assemblies followed by reliable genome annotation, and the robust dissection of microbial communities, including those associated with human health and disease.
Furthermore, there is an increasing need to process and present data in a fashion that is transparent and reproducible, and to provide analysis frameworks that are usable and cost-effective for biomedical researchers.

To address these challenges, we developed Orione, an online framework for integrative analysis of NGS microbiology data. Orione is based on Galaxy (Goecks et al., 2010), an open platform for reproducible data-intensive computational analysis used in many diverse biomedical research environments. Orione is the first freely available platform that supports the whole life cycle of microbiology research data, from production and annotation to publication and sharing. Commercial alternatives exist (e.g. CLC Genomics Workbench by CLC Bio), but Orione is unique in transparently combining the most used open source bioinformatics tools for microbiology.

Orione is currently applied to a variety of microbiological projects including bacterial resequencing, de novo assembly and microbiome investigations; see http://goo.gl/DwbgPD for a list. Furthermore, Orione is part of an ongoing project to integrate Galaxy with Hadoop-based tools to provide scalable computing (Leo et al., 2012); a specialized version of OMERO (Allan et al., 2012) to model biomedical data and the chain of actions that connect them; and iRODS (Rajasekar et al., 2010) to efficiently support inter-institutional data sharing. This infrastructure is already used in production at the Center for Advanced Studies, Research and Development in Sardinia (CRS4) for the automated processing of sequencing data (Pireddu et al., 2013) and for quality control in gene therapy applications (Biffi et al., 2013).

FEATURES AND METHODS

Orione consists of best-of-breed NGS bioinformatics tools covering end-to-end data analysis for bacterial resequencing, de novo assembly, scaffolding, bacterial RNA-Seq, gene annotation, metagenomics and metatranscriptomics.
Publicly available research tools were integrated under the open source Galaxy framework with pipelines and workflows newly developed by our group for ready-to-go microbiological analysis. Although several of the tools for NGS microbiology data analysis were already available in Galaxy, a significant effort was required to expand the Galaxy functionalities with new tools such as SSPACE (Boetzer et al., 2011), SSAKE (Warren et al., 2007), SOPRA (Dayarian et al., 2010), SEQuel (Ronen et al., 2012), EDGE-pro (Magoc et al., 2013), Glimmer (Gene Locator and Interpolated Markov ModelER; Delcher et al., 2007) and Prokka (http://goo.gl/aSuHb). We refer to the Supplementary information for a description of the complete set of Orione tools and workflows.

FUNCTIONALITIES

Orione complements the flexible Galaxy workflow environment, allowing microbiologists without specific hardware or informatics skills to consistently access a set of NGS data analysis tools and conduct reproducible data-intensive computational analyses, from quality control to microbial gene annotation. In the following paragraphs, we describe the main Orione functionalities.

Preprocessing, quality control and trimming. The fundamental step before any NGS analysis is the quality control of reads and their trimming. To cope with long reads and paired-end technology, FastX (http://goo.gl/GxqyV) and FastQC (http://goo.gl/6TUqD) were complemented with specifically developed tools (see also workflow #1 in the Supplementary information).

Reads mapping. Mapping is a key step in many NGS applications, from bacterial resequencing to variant calling. The most widely used aligners are integrated in Orione, including BWA (Li and Durbin, 2009), Bowtie1 (Langmead et al., 2009), Bowtie2 (Langmead and Salzberg, 2012), SOAP (Li et al., 2008) and MOSAIK (http://git.io/QrYWXg). We further added BLAT (Kent, 2002), SHRiMP (David et al., 2011), LASTZ (Harris, 2007) and BFAST (Homer et al., 2009) for use with long reads from Roche 454.
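
To illustrate the kind of operation performed in the preprocessing step described above, the following is a minimal Python sketch of 3'-end quality trimming on a single FASTQ record. It is not the implementation used by FastX, FastQC or the Orione tools; the function names, the Phred+33 quality encoding and the threshold of 20 are assumptions chosen only for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    The offset of 33 (Sanger / Illumina 1.8+ encoding) is an assumption;
    older Illumina data may use an offset of 64.
    """
    return [ord(ch) - offset for ch in quality_line]


def trim_3prime(sequence, quality_line, min_quality=20, offset=33):
    """Drop trailing bases whose Phred score is below min_quality.

    Returns the trimmed (sequence, quality_line) pair. Trimming stops at
    the first base, scanning from the 3' end, that meets the threshold.
    """
    scores = phred_scores(quality_line, offset)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return sequence[:end], quality_line[:end]


def mean_quality(quality_line, offset=33):
    """Mean Phred score of a read, or 0.0 for an empty read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # 'I' encodes Phred 40 and '#' encodes Phred 2 under the +33 offset.
    seq, qual = trim_3prime("ACGTACGT", "IIIII###")
    print(seq, qual, mean_quality(qual))
```

A real trimming tool would typically also handle paired reads together and discard reads that become too short after trimming, which this sketch does not attempt.
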
De novo assembly. De novo assembly produces contigs without the aid of a reference genome. Different methods, either based on a de Bruijn graph [Velvet (Zerbino and Birney, 2008), ABySS (Simpson et al., 2009) and SPAdes (Bankevich et al., 2012)] or on a greedy approach [SSAKE, Edena (Hernandez et al., 2008)], are available in Orione.

Scaffolding. After mapping, contigs are ordered and oriented to produce even longer sequences called scaffolds, exploiting the mate-pair/paired-end information. Orione includes the most established scaffolders, such as SSAKE, SSPACE, SEQuel and SOPRA.

Post assembly, contigs statistics, (multi) aligning and variant calling. This section of Orione includes tools we have developed covering tasks such as genome-scale alignment, extraction of high-quality contigs, and statistics over contigs or draft genomes (N50/NG50 values, contig length distribution, high/low quality regions and gaps in draft genomes).

Annotation. Annotation is the process of identifying meaningful biological information from sequences. Glimmer and tRNAscan-SE (Lowe and Eddy, 1997) were wrapped into Orione together with the Prokka pipeline, enabling easy GenBank/DDBJ/ENA submission.

RNA-Seq. We integrated the EDGE-pro tool for bacterial RNA-Seq analysis. As EDGE-pro requires genome annotation files, we developed an accessory tool (Get EDGE-pro files) that retrieves them directly from the NCBI RefSeq repository.

Metagenomics and other tools. We added MetaPhlAn (Segata et al., 2012) and MetaVelvet (Namiki et al., 2012) to the standard Galaxy metagenomics pipeline. The MetaGeneMark (Zhu et al., 2010) annotation tool has been added for gene prediction in metagenomic sequences, and a workflow has been developed for (bacterial) metatranscriptome analysis. We complete this section with tools for data filtering, conversion and the display of taxonomic abundance in the Krona visualizer (Ondov et al., 2011).
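
The N50 and NG50 contig statistics mentioned in the post-assembly paragraph can be made concrete with a short Python sketch. It uses the commonly used definitions (N50 is the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length; NG50 uses half of an estimated genome size instead). It is not the Orione implementation, and the function name and parameters are hypothetical.

```python
def nx(contig_lengths, fraction=0.5, reference_size=None):
    """Return the Nx (or NGx) value for a list of contig lengths.

    With reference_size=None the threshold is a fraction of the total
    assembly length (N50 for fraction=0.5). With reference_size set to an
    estimated genome size, the threshold is a fraction of that size (NG50).
    Returns None when the contigs never reach the threshold, which can
    happen for NG50 if the assembly is much shorter than the genome.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = reference_size if reference_size is not None else sum(lengths)
    threshold = total * fraction
    running = 0
    for length in lengths:
        running += length
        if running >= threshold:
            return length
    return None


if __name__ == "__main__":
    contigs = [80, 70, 50, 40, 30, 20]      # total assembly length: 290
    print(nx(contigs))                      # N50
    print(nx(contigs, reference_size=400))  # NG50 for an assumed 400 bp genome
```

Because NG50 depends on the genome size rather than on the assembly itself, it allows assemblies of the same organism with different total lengths to be compared on the same footing.
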
ACKNOWLEDGEMENTS

The authors would like to thank Dr Cesare Cammà (Istituto Zooprofilattico Sperimentale dell'Abruzzo e del Molise) and Prof. Sergio Uzzau (Università di Sassari and Porto Conte Ricerche) for providing us with the data we used for the set-up of Orione.

Funding: This work was partially supported by the Sardinian Regional Authorities and the Wellcome Trust (095931).

Conflict of Interest: none declared.



Gianmauro Cuccuru, Massimiliano Orsini, Andrea Pinna, Andrea Sbardellati, Nicola Soranzo, Antonella Travaglione, Paolo Uva, Gianluigi Zanetti, Giorgio Fotia. Orione, a web-based framework for NGS analysis in microbiology, Bioinformatics, 2014, 1928-1929, DOI: 10.1093/bioinformatics/btu135