Reliability of species detection in 16S microbiome analysis: Comparison of five widely used pipelines and recommendations for a more standardized approach
PLOS ONE
RESEARCH ARTICLE
Andreas Hiergeist1*, Jean Ruelle2, Stefan Emler2, André Gessner1
OPEN ACCESS
Citation: Hiergeist A, Ruelle J, Emler S, Gessner A
(2023) Reliability of species detection in 16S
microbiome analysis: Comparison of five widely
used pipelines and recommendations for a more
standardized approach. PLoS ONE 18(2):
e0280870. https://doi.org/10.1371/journal.pone.0280870
Editor: Yanbin Yin, University of Nebraska-Lincoln,
UNITED STATES
1 Institute of Clinical Microbiology and Hygiene, University Hospital Regensburg, Regensburg, Germany,
2 SmartGene Services SARL, Lausanne, Switzerland
Abstract
The use of NGS-based testing of the bacterial microbiota is often impeded by inconsistent
or non-reproducible results, especially when applying different analysis pipelines and reference databases. We investigated five frequently used software packages by submitting the
same monobacterial datasets to them, representing the V1-2 and the V3-4 regions of the
16S-rRNA gene of 26 well characterized strains, which were sequenced by the Ion Torrent™ GeneStudio S5 system. The results obtained were divergent and calculations of relative abundance did not yield the expected 100%. We investigated these inconsistencies and
were able to attribute them to failures either of the pipelines themselves or of the reference
databases they rely on. On the basis of these findings, we recommend certain standards
which should help to render microbiome testing more consistent and reproducible, and thus
useful in clinical practice.
Received: July 13, 2022
Accepted: January 10, 2023
Published: February 16, 2023
Copyright: © 2023 Hiergeist et al. This is an open
access article distributed under the terms of the
Creative Commons Attribution License, which
permits unrestricted use, distribution, and
reproduction in any medium, provided the original
author and source are credited.
Data Availability Statement: Raw sequencing data
is available at the European Nucleotide Archive
under accession number PRJEB52644.
Funding: This work was funded by the Deutsche
Forschungsgemeinschaft (DFG, German Research
Foundation) – Projektnummer 395357507 – SFB
1371. It was also funded by the Bavarian Ministry of
Science and the Arts in the framework of the
Bavarian Research Network "New Strategies
Against Multi-Resistant Pathogens by Means of
Introduction
Microbiome sequencing enables new insights into the role of microorganisms in various
pathologies, as well as into their roles when interacting with the host immune system [1].
High-throughput sequencing techniques could enable broad-range molecular diagnostics, not
only for primarily sterile material such as cerebrospinal fluid, organ tissue or vitreous aspirates,
but also for the detection of pathogens within complex communities of commensal microorganisms. The transition of microbiome analysis into routine clinical diagnostics is still hampered by a lack of standardization, which makes such results difficult to reproduce and
compare [2]. Methodological variations at every step, from sampling through wet-lab processes (cell lysis, PCR amplification, library preparation)
to the high-throughput sequencing platform, have been extensively analyzed in various studies
[3–5]. The choice of target region to be sequenced as well as the analysis software and databases used also have an impact on the results and thus need to be evaluated and understood.
Our study therefore focuses on this aspect of the workflow.
PLOS ONE | https://doi.org/10.1371/journal.pone.0280870 February 16, 2023
Digital Networking-bayresq.net"
Förderkennzeichen: Kap. 1528 TG 83. The work
also received financial support from
the Society for Promotion of Quality Assurance in
Medical Laboratories e. V. (INSTAND e.V.,
Düsseldorf, Germany). The funders had no role in
study design, data collection and analysis, decision
to publish, or preparation of the manuscript.
Competing interests: Jean Ruelle and Stefan Emler
are employees of SmartGene, a company
marketing cloud-based solutions for microbiome
analysis. This does not alter our adherence to
PLOS ONE policies on sharing data and materials.
A prerequisite for implementation of microbiome sequencing in clinical diagnostics is the
ability to accurately determine the presence or absence of pathogenic and beneficial species,
and their abundance. Reliable species-level identification and quantification is necessary to
identify compositional shifts over time within complex sample matrices. These requirements
also apply to pre-clinical and clinical studies, which form the basis for valid scientific conclusions about, and interpretation of, specific pathologies.
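The abundance requirement can be made concrete with a small sketch. The function below is illustrative only and is not taken from any of the pipelines compared in this study; the function name and the placeholder labels "species A" and "species B" are hypothetical. It computes relative abundance as the percentage of reads assigned to each taxon, so that for a monobacterial sample a correct assignment should give 100% for the sequenced strain.

```python
from collections import Counter
from typing import Dict, Iterable


def relative_abundance(assignments: Iterable[str]) -> Dict[str, float]:
    """Return the percentage of reads assigned to each taxon label.

    `assignments` holds one taxon label per read (hypothetical input format).
    """
    counts = Counter(assignments)
    total = sum(counts.values())
    if total == 0:
        return {}  # no reads, nothing to report
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# Toy input: 90 reads labelled "species A", 10 reads labelled "species B".
toy_reads = ["species A"] * 90 + ["species B"] * 10
print(relative_abundance(toy_reads))
```

In this toy input, any read assigned to a second label lowers the share of the first below 100%, which is the kind of deviation a monobacterial dataset makes visible.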
The gene encoding the RNA of the small ribosomal subunit, 16S rRNA (referred to here as 16S), is
the most widely used phylogenetic marker, and sequences are available for all recognized species. The 16S gene has nine variable regions, but not all have the same potential to differentiate
species [6]. Since NGS often relies on the generation of rather short reads, many
authors focus on the V3-V4 region, which often restricts their analysis to the genus level
because species differentiation cannot be achieved. Other variable regions such as the V1-V2
region hold the promise of better species differentiation [7], but would need to be evaluated
for the purpose of microbiome analysis. Recently, sequencing of full-length 16S became available, and Johnson et al. demonstrated that intra-genomic 16S polymorphisms
can be resolved for strain-level discrimination [6]. However, this technology is still rather
complex and expensive, and sensitivity is an issue; routine applications therefore focus on
partial 16S analyses. In our study we compare results obtained with the V1-V2 and V3-V4
regions, which are easily covered by widely used sequencing technologies.
After sequencing, reads need to be matched against a reference database to
assign a species. Frequently used clustering or binning algorithms group reads or contigs into
operational taxonomic units (OTUs) based on a predefined similarity cutoff, typically 97%;
consensus sequences are then used to find the most similar reference sequence. Given
the high similarity of the 16S sequences of some closely related species, other approaches favor comparing all divergent contigs against the references, without prior clustering. In our study we have
evaluated both types of approaches.
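The effect of a similarity cutoff can be illustrated with a minimal sketch of greedy, seed-based clustering. This is not the algorithm of any particular pipeline evaluated here. It assumes equal-length reads and uses a simple position-by-position identity instead of a real alignment, and the function names and toy sequences are hypothetical.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length sequences.

    Assumption: a toy stand-in for alignment-based percent identity.
    """
    if len(a) != len(b):
        raise ValueError("this toy identity measure needs equal-length sequences")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def greedy_otu_clustering(reads: List[str], cutoff: float = 0.97) -> List[List[str]]:
    """Assign each read to the first OTU whose seed it matches at >= cutoff."""
    otus: List[List[str]] = []  # each OTU is a list; element 0 is its seed
    for read in reads:
        for otu in otus:
            if identity(read, otu[0]) >= cutoff:
                otu.append(read)
                break
        else:
            otus.append([read])  # no seed was similar enough: start a new OTU
    return otus


# Toy 100-nt sequences (hypothetical, not real 16S data).
seed = "ACGT" * 25
near = "TT" + seed[2:]     # 2 mismatches to the seed, 98% identity
far = "TTTTT" + seed[5:]   # 4 mismatches to the seed, 96% identity
clusters = greedy_otu_clustering([seed, near, far])
print(len(clusters))
```

With the 97% cutoff, the read with two mismatches joins the seed's OTU while the read with four mismatches starts a new one. Two species whose amplicons differ by less than 3% would therefore collapse into a single OTU, which is the motivation given above for approaches that skip prior clustering.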
Currently, multiple 16S reference databases exist, including publicly available ones such as
SILVA [8], Greengenes [9], All Species Living Tree Project (LTP, [10]), Genome Taxonomy
Database (GTDB, [11]), as well a (...truncated)