Reliability of species detection in 16S microbiome analysis: Comparison of five widely used pipelines and recommendations for a more standardized approach

PLOS ONE, Feb 2023

The use of NGS-based testing of the bacterial microbiota is often impeded by inconsistent or non-reproducible results, especially when applying different analysis pipelines and reference databases. We investigated five frequently used software packages by submitting the same monobacterial datasets to them, representing the V1-2 and the V3-4 regions of the 16S-rRNA gene of 26 well characterized strains, which were sequenced by the Ion Torrent™ GeneStudio S5 system. The results obtained were divergent and calculations of relative abundance did not yield the expected 100%. We investigated these inconsistencies and were able to attribute them to failures either of the pipelines themselves or of the reference databases they rely on. On the basis of these findings, we recommend certain standards which should help to render microbiome testing more consistent and reproducible, and thus useful in clinical practice.

PLOS ONE | RESEARCH ARTICLE

Andreas Hiergeist 1*, Jean Ruelle 2, Stefan Emler 2, André Gessner 1

1 Institute of Clinical Microbiology and Hygiene, University Hospital Regensburg, Regensburg, Germany
2 SmartGene Services SARL, Lausanne, Switzerland

Citation: Hiergeist A, Ruelle J, Emler S, Gessner A (2023) Reliability of species detection in 16S microbiome analysis: Comparison of five widely used pipelines and recommendations for a more standardized approach. PLoS ONE 18(2): e0280870. https://doi.org/10.1371/journal.pone.0280870

Editor: Yanbin Yin, University of Nebraska-Lincoln, UNITED STATES

Received: July 13, 2022
Accepted: January 10, 2023
Published: February 16, 2023

Copyright: © 2023 Hiergeist et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: Raw sequencing data is available at the European Nucleotide Archive under accession number PRJEB52644.

Funding: This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) – project number 395357507 – SFB 1371. It was also funded by the Bavarian Ministry of Science and the Arts in the framework of the Bavarian Research Network "New Strategies Against Multi-Resistant Pathogens by Means of Digital Networking-bayresq.net", grant reference (Förderkennzeichen): Kap. 1528 TG 83. The work was also supported by the Society for Promotion of Quality Assurance in Medical Laboratories e. V. (INSTAND e.V., Düsseldorf, Germany). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: Jean Ruelle and Stefan Emler are employees of SmartGene, a company marketing cloud-based solutions for microbiome analysis. This does not alter our adherence to PLOS ONE policies on sharing data and materials.

Introduction

Microbiome sequencing enables new insights into the role of microorganisms in various pathologies, as well as into their interactions with the host immune system [1]. High-throughput sequencing techniques could enable broad-range molecular diagnostics, not only from primarily sterile material such as cerebrospinal fluid, organ tissue or vitreous aspirates, but also for the detection of pathogens within complex communities of commensal microorganisms. The transition of microbiome analysis into routine diagnostics with clinical application is still hampered by the lack of standardization, which makes such results difficult to reproduce and compare [2]. Methodological variations at every step, from sampling through wet-lab processes, including cell lysis, PCR amplification, library preparation and high-throughput sequencing platforms, have been extensively analyzed in various studies [3–5]. The choice of target region to be sequenced, as well as the analysis software and databases used, also has an impact on the results and thus needs to be evaluated and understood. Our study therefore focuses on this aspect of the workflow.

A prerequisite for implementation of microbiome sequencing in clinical diagnostics is the ability to accurately determine the presence or absence of pathogenic and beneficial species, and their abundance. Reliable species-level identification and quantification is necessary to identify compositional shifts over time within complex sample matrices. These requirements also hold for pre-clinical and clinical studies, which form the basis for valid scientific conclusions and interpretation in specific pathologies.

The gene encoding the 16S rRNA of the small ribosomal subunit (referred to here as 16S) is the most widely used phylogenetic marker, and sequences are available for all recognized species. The 16S gene has nine variable regions, but not all have the same potential to differentiate species [6]. Because NGS often relies on rather short reads, many authors focus on the V3-V4 region, which often restricts their analysis to the genus level because species cannot be differentiated. Other variable regions, such as the V1-V2 region, hold promise for better species differentiation [7], but need to be evaluated for the purpose of microbiome analysis.
Recently, full-length 16S sequencing became available, and Johnson et al. demonstrated that intra-genomic 16S polymorphisms within one genome can be resolved for strain-level discrimination [6]. However, this technology is still rather complex and expensive, and sensitivity remains an issue; routine applications therefore focus on partial 16S analyses. In our study we compare results obtained with the V1-V2 and V3-V4 regions, which are easily covered by widely used sequencing technologies.

After sequencing, reads need to be matched against a reference database to assign a species. Frequently used clustering or binning algorithms group reads or contigs into operational taxonomic units (OTUs) based on a predefined similarity cutoff, typically 97%; the consensus sequences are then mapped to the most similar reference sequence. Given the high similarity of the 16S sequences of some closely related species, other approaches instead compare all divergent contigs against the references, without prior clustering. In our study we have evaluated both types of approaches.

Currently, multiple 16S reference databases exist, including publicly available ones such as SILVA [8], Greengenes [9], All Species Living Tree Project (LTP, [10]), Genome Taxonomy Database (GTDB, [11]), as well a (...truncated)
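
The OTU clustering step described above, in which reads are grouped at a predefined similarity cutoff such as 97%, can be illustrated with a minimal Python sketch. It is not the method of any pipeline compared in this study. The function names (`identity`, `greedy_otu_clustering`, `relative_abundance`) are hypothetical, and the identity measure is a deliberate simplification that assumes reads are already aligned to equal length, whereas real tools compute identity from pairwise alignments. The sketch also reports the relative abundance of each OTU, which for a monobacterial dataset would be expected to be 100% for a single OTU.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two pre-aligned, equal-length reads.

    Assumption: real pipelines derive identity from pairwise alignment;
    position-wise comparison is used here only to keep the sketch short.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_otu_clustering(reads, cutoff=0.97):
    """Greedy seed-based clustering of reads into OTUs.

    Unique sequences are processed from most to least abundant. Each one
    joins the first OTU whose seed it matches at >= cutoff identity;
    otherwise it becomes the seed of a new OTU.
    """
    otus = []  # each OTU is a dict: {"seed": sequence, "size": read count}
    for seq, count in Counter(reads).most_common():
        for otu in otus:
            if identity(seq, otu["seed"]) >= cutoff:
                otu["size"] += count
                break
        else:
            otus.append({"seed": seq, "size": count})
    return otus


def relative_abundance(otus):
    """Relative abundance of each OTU in percent of all clustered reads."""
    total = sum(otu["size"] for otu in otus)
    return [100.0 * otu["size"] / total for otu in otus]


if __name__ == "__main__":
    # Toy monobacterial dataset: one true sequence plus two error variants.
    base = "ACGT" * 25             # 100 bp reference-like read
    near = "T" + base[1:]          # 1 mismatch  -> 99% identity
    far = "TTTTT" + base[5:]       # 4 mismatches -> 96% identity
    reads = [base] * 90 + [near] * 8 + [far] * 2

    otus = greedy_otu_clustering(reads, cutoff=0.97)
    for otu, pct in zip(otus, relative_abundance(otus)):
        print(f"OTU of {otu['size']} reads: {pct:.1f}%")
```

In this toy input, the variant at 99% identity is absorbed into the main OTU, while the variant at 96% identity falls below the cutoff and forms a second OTU. The main OTU therefore accounts for 98% of reads rather than 100%. This shows only how the cutoff shapes the abundance estimate; it does not reproduce the specific failures analyzed in the study.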


Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0280870

Andreas Hiergeist, Jean Ruelle, Stefan Emler, André Gessner. Reliability of species detection in 16S microbiome analysis: Comparison of five widely used pipelines and recommendations for a more standardized approach, PLOS ONE, 2023, Volume 18, Issue 2, DOI: 10.1371/journal.pone.0280870