ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis [version 1; referees: 2 approved]

F1000Research, Sep 2018

The internally transcribed spacer (ITS) region between the small subunit ribosomal RNA gene and large subunit ribosomal RNA gene is a widely used phylogenetic marker for fungi and other taxa. The eukaryotic ITS contains the conserved 5.8S rRNA and is divided into the ITS1 and ITS2 hypervariable regions. These regions are variable in length and are amplified using primers complementary to the conserved regions of their flanking genes. Previous work has shown that removing the conserved regions results in more accurate taxonomic classification. An existing software program, ITSx, is capable of trimming FASTA sequences by matching hidden Markov model profiles to the ends of the conserved genes using the software suite HMMER. ITSxpress was developed to extend this technique from marker gene studies using operational taxonomic units (OTUs) to studies using exact sequence variants, an approach used by the software packages Dada2, Deblur, QIIME 2, and Unoise. The sequence variant approach uses the quality scores of each read to identify sequences that are statistically likely to represent real sequences. ITSxpress enables this by processing FASTQ rather than FASTA files. The software also speeds up the trimming of reads 14- to 23-fold on a 4-core computer by temporarily clustering highly similar sequences, which are common in amplicon data, and by using optimized parameters for Hmmsearch. ITSxpress is available as a QIIME 2 plugin and as a stand-alone application installable from the Python Package Index, Bioconda, and GitHub.
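The strategy summarized in the abstract (search each distinct sequence once, then apply the resulting trim positions to every read while keeping its quality scores) can be illustrated with a minimal Python sketch. This is not the ITSxpress implementation. Two simplifications are assumed: exact-duplicate dereplication stands in for the clustering of highly similar sequences, and a plain motif search stands in for the HMMER profile search. The function names `find_bounds` and `trim_fastq_records` and the flanking motifs are hypothetical.

```python
from collections import defaultdict


def find_bounds(seq, left_flank, right_flank):
    """Toy stand-in for the HMM search (assumption): return the (start, end)
    of the region lying between two flanking motifs, or None if either
    motif is missing."""
    left = seq.find(left_flank)
    if left == -1:
        return None
    start = left + len(left_flank)
    end = seq.find(right_flank, start)
    if end == -1:
        return None
    return start, end


def trim_fastq_records(records, left_flank, right_flank):
    """Trim (read_id, sequence, quality) records.

    Reads are grouped by identical sequence so the (expensive) boundary
    search runs once per unique sequence; the boundaries are then applied
    to every member read, slicing sequence and quality string together.
    """
    groups = defaultdict(list)
    for record in records:
        groups[record[1]].append(record)

    trimmed = []
    for seq, members in groups.items():
        bounds = find_bounds(seq, left_flank, right_flank)
        if bounds is None:
            continue  # no recognizable region: drop these reads
        start, end = bounds
        for read_id, read_seq, qual in members:
            trimmed.append((read_id, read_seq[start:end], qual[start:end]))
    return trimmed


if __name__ == "__main__":
    reads = [
        ("r1", "AAACGTTTGGG", "IIIIHHHHGGG"),
        ("r2", "AAACGTTTGGG", "FFFFEEEEDDD"),
        ("r3", "CCCC", "IIII"),
    ]
    for rec in trim_fastq_records(reads, "AAA", "GGG"):
        print(rec)
```

Because amplicon data contain many repeated sequences, the number of boundary searches in this sketch scales with the number of distinct sequences rather than the number of reads, while each read still keeps its own quality string.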



F1000Research 2018, 7:1418. Last updated: 31 MAR 2022

SOFTWARE TOOL ARTICLE

Adam R. Rivers¹, Kyle C. Weber¹, Terrence G. Gardner², Shuang Liu², Shalamar D. Armstrong³

¹Genomics and Bioinformatics Research Unit, USDA Agricultural Research Service, Gainesville, FL, 32608, USA
²Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
³Department of Agronomy, Purdue University, Purdue, IN, 47907, USA

First published: 06 Sep 2018, 7:1418 https://doi.org/10.12688/f1000research.15704.1
Latest published: 06 Sep 2018, 7:1418 https://doi.org/10.12688/f1000research.15704.1

Open Peer Review (version 1, 06 Sep 2018; peer review: 2 approved)
1. J. Gregory Caporaso, Northern Arizona University, Flagstaff, USA
2. Johanna B. Holm, University of Maryland School of Medicine, Baltimore, USA
Any reports and responses or comments on the article can be found at the end of the article.

Keywords: Amplicon sequencing, marker gene sequencing, internally transcribed spacer, ITS, trimming, QIIME

Corresponding author: Adam R. Rivers

Author roles: Rivers AR: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project Administration, Software, Supervision, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing; Weber KC: Software, Writing – Review & Editing; Gardner TG: Resources, Writing – Review & Editing; Liu S: Resources; Armstrong SD: Resources

Competing interests: No competing interests were disclosed.

Grant information: This research was funded by the United States Department of Agriculture (USDA), Agricultural Research Service (ARS) research project number 6066-21310-005-00-D and computational analysis using SCINet under project 0500-00093-001-00-D. Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the USDA. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Copyright: © 2018 Rivers AR et al.
This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The author(s) is/are employees of the US Government and therefore domestic copyright protection in USA does not apply to this work. The work may be protected under the copyright laws of other jurisdictions when used in those jurisdictions.

How to cite this article: Rivers AR, Weber KC, Gardner TG et al. ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis [version 1; peer review: 2 approved] F1000Research 2018, 7:1418 https://doi.org/10.12688/f1000research.15704.1

Introduction

The internally transcribed spacer (ITS) between the small subunit (SSU/18S) ribosomal RNA gene and the large subunit (LSU/28S) ribosomal RNA gene is a commonly used phylogenetic marker. The Fungal Barcoding Consortium standardized the practice of ITS sequencing by adopting the region for its efforts (Schoch et al., 2012), and the major fungal database UNITE uses the region as well (Kõljalg et al., 2013). It is common practice to amplify the ITS1 or ITS2 region using primers located in the more conserved 18S/5.8S genes or the 5.8S/28S genes. Previous work has shown that leaving these more conserved regions on the ITS sequence causes taxonomic misassignments. In one study of full-length ITS sequences, in 11% of cases the ITS1 and ITS2 regions matched one reference sequence but the full sequence, including ITS1, ITS2 and the 5.8S, did not (Nilsson et al., 2009). The software package ITSx was developed and subsequently improved (Bengtsson-Palme et al., 2013; Nilsson et al., 2010) to accurately trim ITS sequences from longer reads.
ITSx uses hidden Markov models (HMMs) created for fungi and 17 other groups of eukaryotes to identify the start and stop sites for the ITS region. ITSx used the HMMER program Hmmscan until version 1.1b, when Hmmsearch was substituted for increased speed (Eddy, 2011).

ITSxpress was created to extend the capabilities of ITSx from marker gene studies using operational taxonomic units (OTUs) to studies using exact sequence variants. Amplicon sequencing creates sequences with errors. To distinguish true sequences from sequencing errors, sequences have been clustered into OTUs by sorting reads by abundance and then clustering them in a greedy fashion at a specified percent identity (often 97%). Recently, new methods (e.g. Dada2, Deblur and Unoise) have been published that use statistical models or information-theoretic models to identify exact sequence variants that represent true biological sequences (Amir et al., 2017; Callahan et al., 2016; Caporaso et al., 2010; Edgar, 2016). These methods require the error profiles of individual sequences, which requires trimming each FA (...truncated)
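The abundance-sorted greedy OTU clustering mentioned in the Introduction can be sketched in a few lines of Python. This is an illustrative toy under stated assumptions, not the algorithm of any particular clustering tool: identity is computed position by position on equal-length sequences (real tools align sequences first), and the names `identity` and `greedy_otu_cluster` are hypothetical.

```python
from collections import Counter


def identity(a, b):
    """Fraction of matching positions. Simplifying assumption: sequences of
    different lengths are treated as non-matching, with no alignment."""
    if len(a) != len(b):
        return 0.0
    if not a:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_cluster(reads, threshold=0.97):
    """Cluster reads into OTUs.

    Unique sequences are visited from most to least abundant. Each one
    joins the first existing centroid it matches at >= threshold identity;
    otherwise it becomes the centroid of a new OTU.
    Returns a dict mapping centroid sequence -> total read count.
    """
    counts = Counter(reads)
    # Most abundant first; ties broken alphabetically for reproducibility.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))

    otus = {}
    for seq, n in ordered:
        for centroid in otus:
            if identity(seq, centroid) >= threshold:
                otus[centroid] += n
                break
        else:
            otus[seq] = n  # no centroid was close enough
    return otus


if __name__ == "__main__":
    reads = ["ACGTACGTAC"] * 5 + ["ACGTACGTAA"] * 2 + ["TTTTTTTTTT"] * 3
    print(greedy_otu_cluster(reads, threshold=0.85))
```

In this sketch a rare sequence that differs from an abundant centroid by one base is absorbed into that centroid's OTU. Exact sequence variant methods instead use per-read quality information to decide whether such a variant is an error or a real sequence, which is why they need FASTQ input.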


This is a preview of a remote PDF: https://f1000research.com/articles/7-1418/v1/pdf
Article home page: https://doaj.org/article/c47bca58cf684f4bb032cc1e34be240e

Adam R. Rivers, Kyle C. Weber, Terrence G. Gardner, Shuang Liu, Shalamar D. Armstrong. ITSxpress: Software to rapidly trim internally transcribed spacer sequences with quality scores for marker gene analysis [version 1; referees: 2 approved], F1000Research, 2018, Issue 7, DOI: 10.12688/f1000research.15704.1