F1000Research 2018, 7:1418 Last updated: 31 MAR 2022
SOFTWARE TOOL ARTICLE
ITSxpress: Software to rapidly trim internally transcribed
spacer sequences with quality scores for marker gene
analysis [version 1; peer review: 2 approved]
Adam R. Rivers1, Kyle C. Weber1, Terrence G. Gardner2, Shuang Liu2,
Shalamar D. Armstrong3
1Genomics and Bioinformatics Research Unit, USDA Agricultural Research Service, Gainesville, FL, 32608, USA
2Department of Crop and Soil Sciences, North Carolina State University, Raleigh, NC, 27695, USA
3Department of Agronomy, Purdue University, Purdue, IN, 47907, USA
First published: 06 Sep 2018, 7:1418
https://doi.org/10.12688/f1000research.15704.1
Latest published: 06 Sep 2018, 7:1418
https://doi.org/10.12688/f1000research.15704.1
Abstract
The internally transcribed spacer (ITS) region between the small
subunit ribosomal RNA gene and large subunit ribosomal RNA gene is
a widely used phylogenetic marker for fungi and other taxa. The
eukaryotic ITS contains the conserved 5.8S rRNA and is divided into
the ITS1 and ITS2 hypervariable regions. These regions are variable in
length and are amplified using primers complementary to the
conserved regions of their flanking genes. Previous work has shown
that removing the conserved regions results in more accurate
taxonomic classification. An existing software program, ITSx, is
capable of trimming FASTA sequences by matching hidden Markov
model profiles to the ends of the conserved genes using the software
suite HMMER. ITSxpress was developed to extend this technique from
marker gene studies using operational taxonomic units (OTUs) to
studies using exact sequence variants, a method used by the software
packages Dada2, Deblur, QIIME 2, and Unoise. The sequence variant
approach uses the quality scores of each read to identify sequences
that are statistically likely to represent real sequences. ITSxpress
enables this by processing FASTQ rather than FASTA files. The
software also speeds up the trimming of reads by a factor of 14-23
on a 4-core computer by temporarily clustering highly similar
sequences that are common in amplicon data and utilizing optimized
parameters for Hmmsearch. ITSxpress is available as a QIIME 2 plugin
and a stand-alone application installable from the Python Package
Index, Bioconda, and GitHub.
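
The speed-up from clustering can be illustrated with a minimal Python sketch. It is not the ITSxpress implementation: it groups only exactly identical sequences, whereas the software clusters highly similar ones, and `find_its` is a hypothetical stand-in for the HMM search that returns start and stop positions. The sketch only shows that the costly search can run once per group rather than once per read.

```python
from collections import defaultdict


def dereplicate(reads):
    """Group read IDs by identical sequence.

    `reads` is an iterable of (read_id, sequence) pairs.
    Returns a dict mapping each unique sequence to its list of read IDs.
    """
    groups = defaultdict(list)
    for read_id, sequence in reads:
        groups[sequence].append(read_id)
    return groups


def positions_for_all_reads(reads, find_its):
    """Locate the ITS span once per unique sequence and share the result.

    `find_its` is a hypothetical callable (an assumption for this sketch)
    that takes a sequence and returns a (start, stop) tuple.
    Returns a dict mapping every read ID to the span of its group.
    """
    spans = {}
    for sequence, read_ids in dereplicate(reads).items():
        span = find_its(sequence)  # expensive step, run once per group
        for read_id in read_ids:
            spans[read_id] = span
    return spans
```

Because amplicon data contain many repeated sequences, the number of groups is typically much smaller than the number of reads, which is where the saving comes from.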
Reviewers (version 1, 06 Sep 2018):
1. J. Gregory Caporaso, Northern Arizona University, Flagstaff, USA
2. Johanna B. Holm, University of Maryland School of Medicine, Baltimore, USA
Any reports and responses or comments on the
article can be found at the end of the article.
Keywords
Amplicon sequencing, marker gene sequencing, internally transcribed
spacer, ITS, trimming, QIIME
Corresponding author: Adam R. Rivers
Author roles: Rivers AR: Conceptualization, Data Curation, Formal Analysis, Funding Acquisition, Investigation, Methodology, Project
Administration, Software, Supervision, Visualization, Writing – Original Draft Preparation, Writing – Review & Editing; Weber KC:
Software, Writing – Review & Editing; Gardner TG: Resources, Writing – Review & Editing; Liu S: Resources; Armstrong SD: Resources
Competing interests: No competing interests were disclosed.
Grant information: This research was funded by the United States Department of Agriculture (USDA), Agricultural Research Service
(ARS) research project number 6066-21310-005-00-D and computational analysis using SCINet under project 0500-00093-001-00-D.
Mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does
not imply recommendation or endorsement by the USDA.
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Copyright: © 2018 Rivers AR et al. This is an open access article distributed under the terms of the Creative Commons Attribution
License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The
author(s) is/are employees of the US Government and therefore domestic copyright protection in USA does not apply to this work. The
work may be protected under the copyright laws of other jurisdictions when used in those jurisdictions.
How to cite this article: Rivers AR, Weber KC, Gardner TG et al. ITSxpress: Software to rapidly trim internally transcribed spacer
sequences with quality scores for marker gene analysis [version 1; peer review: 2 approved] F1000Research 2018, 7:1418
https://doi.org/10.12688/f1000research.15704.1
First published: 06 Sep 2018, 7:1418 https://doi.org/10.12688/f1000research.15704.1
Introduction
The internally transcribed spacer (ITS) between the small subunit (SSU/18S) ribosomal RNA gene and the large subunit
(LSU/28S) ribosomal RNA gene is a commonly used phylogenetic marker. The Fungal Barcoding Consortium standardized the
practice of ITS sequencing by adopting the region for its efforts
(Schoch et al., 2012), and the major fungal database UNITE
uses the region as well (Kõljalg et al., 2013). It is a common
practice to amplify the ITS1 or ITS2 region using primers located
in the more conserved 18S/5.8S genes or the 5.8S/28S genes.
Previous work has shown that leaving these more conserved
regions on the ITS sequence causes taxonomic misassignments. In
one study of full length ITS sequences, 11% of the time the
ITS1 and ITS2 regions matched one reference sequence but
the full sequence including ITS1, ITS2 and the 5.8S did not
(Nilsson et al., 2009). The software package ITSx was developed and subsequently improved (Bengtsson-Palme et al., 2013;
Nilsson et al., 2010) to accurately trim ITS sequences from
longer reads. ITSx uses hidden Markov models (HMMs) created for fungi and 17 other groups of eukaryotes to identify the start and stop sites for the ITS region. The software
used the HMMER program Hmmscan until version 1.1b,
when Hmmsearch was substituted for increased speed (Eddy,
2011).
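
Once start and stop sites are known, trimming is a matter of slicing. The Python sketch below assumes the coordinates have already been found by some search step (none is implemented here) and that they are 0-based and half-open, which is a convention chosen for this example rather than one taken from ITSx. It cuts the sequence and its quality string together so that each remaining base keeps its own quality score. `FastqRecord`, `trim_record` and `format_fastq` are illustrative names, not part of any published tool.

```python
from typing import NamedTuple


class FastqRecord(NamedTuple):
    """One FASTQ entry: read name, bases, and per-base quality characters."""
    name: str
    sequence: str
    quality: str


def trim_record(record, start, stop):
    """Return a copy of `record` cut to the interval [start, stop).

    Coordinates are 0-based and half-open (an assumption of this sketch).
    Sequence and quality are sliced identically so they stay aligned.
    """
    if len(record.sequence) != len(record.quality):
        raise ValueError("sequence and quality lengths differ")
    if not 0 <= start <= stop <= len(record.sequence):
        raise ValueError("coordinates fall outside the read")
    return FastqRecord(
        record.name,
        record.sequence[start:stop],
        record.quality[start:stop],
    )


def format_fastq(record):
    """Serialize a record in the usual four-line FASTQ layout."""
    return f"@{record.name}\n{record.sequence}\n+\n{record.quality}\n"
```

For example, `trim_record(FastqRecord("read1", "AAACCCGGGTTT", "IIIHHHGGGFFF"), 3, 9)` keeps the six central bases together with their six quality characters.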
ITSxpress was created to extend the capabilities of
ITSx from marker gene studies using operational taxonomic
units (OTUs) to studies using exact sequence variants. Amplicon
sequencing creates sequences with errors. To distinguish
true sequences from sequencing errors, sequences have traditionally been
clustered into OTUs by sorting reads by abundance and then clustering them in a greedy fashion at a specified percent identity (often
97%). Recently, new methods (e.g. Dada2, Deblur and Unoise)
have been published that use statistical models or
information-theoretic models to identify exact sequence variants that represent true biological sequences (Amir et al., 2017; Callahan
et al., 2016; Caporaso et al., 2010; Edgar, 2016). These methods require the error profiles of individual sequences, which
requires trimming each FA (...truncated)