Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms

PeerJ, Sep 2021

With the developments in DNA nanoball sequencing technologies and the emergence of new platforms, there is an increasing interest in their performance in comparison with the widely used sequencing-by-synthesis methods. Here, we test the consistency of metabarcoding results from the DNBSEQ-G400RS (DNA nanoball sequencing platform by MGI-Tech) and NovaSeq 6000 (sequencing-by-synthesis platform by Illumina) platforms using technical replicates of DNA libraries that consist of COI gene amplicons from 120 soil DNA samples. By subjecting raw sequencing data from both platforms to uniform bioinformatics processing, we found that the proportion of high-quality reads passing through the filtering steps was similar in both datasets. Per-sample operational taxonomic unit (OTU) and amplicon sequence variant (ASV) richness patterns were highly correlated, but sequencing data from DNBSEQ-G400RS harbored a higher number of OTUs. This may be related to the lower dominance of the most common OTUs in the DNBSEQ data set (thus revealing higher richness by detecting rare taxa) and/or to a lower effective read quality leading to the generation of spurious OTUs. However, there was no statistical difference in the ASV and post-clustered ASV richness between platforms, suggesting that the additional denoising step in the ASV workflow had effectively removed the ‘noisy’ reads. Both OTU-based and ASV-based composition were strongly correlated between the sequencing platforms, with essentially interchangeable results. Therefore, we conclude that DNBSEQ-G400RS and NovaSeq 6000 are equally efficient high-throughput sequencing platforms for studies applying the metabarcoding approach, but the main benefit of the former is its lower sequencing cost.
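The per-sample richness comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy read counts, the sample names (`s1` to `s4`), and the helper functions are all hypothetical, and Spearman rank correlation and Berger-Parker dominance are assumed here as simple stand-ins for the statistics used in the study.

```python
from statistics import mean


def richness(counts):
    """Number of OTUs with at least one read in a sample."""
    return sum(1 for c in counts if c > 0)


def dominance(counts):
    """Berger-Parker dominance: share of reads held by the most abundant OTU."""
    total = sum(counts)
    return max(counts) / total if total else 0.0


def average_ranks(values):
    """Rank values from 1..n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical OTU read counts per sample (one list entry per OTU),
# for the same libraries sequenced on two platforms.
platform_a = {
    "s1": [50, 30, 20, 0, 0],
    "s2": [40, 40, 10, 10, 0],
    "s3": [90, 10, 0, 0, 0],
    "s4": [25, 25, 25, 15, 10],
}
platform_b = {
    "s1": [45, 30, 20, 5, 0],
    "s2": [35, 40, 10, 10, 5],
    "s3": [85, 12, 3, 0, 0],
    "s4": [25, 25, 20, 15, 15],
}

samples = sorted(platform_a)
rich_a = [richness(platform_a[s]) for s in samples]
rich_b = [richness(platform_b[s]) for s in samples]
rho = spearman(rich_a, rich_b)

print("richness A:", rich_a)
print("richness B:", rich_b)
print("Spearman rho:", round(rho, 3))
for s in samples:
    print(s, "dominance A/B:",
          round(dominance(platform_a[s]), 2),
          round(dominance(platform_b[s]), 2))
```

In this toy data, platform B detects a few extra low-abundance OTUs and shows slightly lower dominance in each sample, yet the per-sample richness ranking stays nearly the same. That is the kind of pattern the abstract reports for OTUs: a higher OTU count on one platform alongside a high between-platform correlation.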

Sten Anslan (1,2), Vladimir Mikryukov (1,2), Kęstutis Armolaitis (3), Jelena Ankuda (3), Dagnija Lazdina (4), Kristaps Makovskis (4), Lars Vesterdal (5), Inger Kappel Schmidt (5) and Leho Tedersoo (1,2)

1 Institute of Ecology and Earth Sciences, University of Tartu, Tartu, Tartumaa, Estonia
2 Mycology and Microbiology Center, University of Tartu, Tartu, Tartumaa, Estonia
3 Department of Ecology, Institute of Forestry of Lithuanian Research Centre for Agriculture and Forestry (LAMMC), Kaunas, Lithuania
4 Latvian State Forest Research Institute SILAVA, Riga, Latvia
5 Department of Geosciences and Natural Resource Management, University of Copenhagen, Copenhagen, Denmark

Submitted 2 July 2021
Accepted 14 September 2021
Published 30 September 2021
Corresponding author: Sten Anslan
Academic editor: Vladimir Uversky
DOI 10.7717/peerj.12254
Copyright 2021 Anslan et al.
Distributed under Creative Commons CC-BY 4.0

Subjects: Bioinformatics, Entomology, Genomics, Molecular Biology, Zoology
Keywords: Metabarcoding, COI, Illumina, NovaSeq, DNBSEQ, MGI-Tech

How to cite this article: Anslan S, Mikryukov V, Armolaitis K, Ankuda J, Lazdina D, Makovskis K, Vesterdal L, Schmidt IK, Tedersoo L. 2021. Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ 9:e12254 DOI 10.7717/peerj.12254

INTRODUCTION

Metabarcoding, the identification of organisms via DNA marker genes from environmental samples or a mixture of heterospecific specimens (Taberlet et al., 2018), is a powerful tool in biodiversity analysis (Kelly et al., 2018; Pont et al., 2021; Valentin et al., 2019; Watts et al., 2019).
This approach has been efficiently used to characterize the community composition of microbial and animal taxa from various types of environmental samples such as soil (Bahram et al., 2018; Nilsson et al., 2019), water (Djurhuus et al., 2018; Liu et al., 2020), sediments (Kang et al., 2021; Wurzbacher et al., 2017), dust (de Groot et al., 2021; Rocchi et al., 2017) and feces (Ando et al., 2020; Anslan et al., 2021). In animals, metabarcoding has also been widely used to identify host-associated microbiomes, determine the structure of entire holobionts and dietary differences in various species (Alberdi et al., 2019; Kueneman et al., 2019). The information acquired through DNA marker gene sequencing has greatly boosted our knowledge about the ecology and distribution patterns of various aquatic and terrestrial animal groups such as nematodes, arthropods and annelids (Arribas et al., 2016; Beng & Corlett, 2020; Compson et al., 2020; Deiner et al., 2017; Zawierucha et al., 2021).

Since the mid-2000s, the metabarcoding technique has greatly benefited from technological advances in library preparation, primer and sample-specific index design, novel sequencing platforms as well as from optimized bioinformatics workflows and accumulating reference data (Taberlet et al., 2018; Nilsson et al., 2019). Short-read, second-generation high-throughput sequencing (HTS) technologies are currently the most widely used means for metabarcoding due to a relatively low cost per sample, high sequencing depth and accuracy. Sequencing instruments produced by Illumina, Inc. (e.g., MiSeq and NovaSeq) using sequencing-by-synthesis technology are dominating the market as they offer viable solutions for both ultra-high sequencing depth and paired-end sequencing of short- and mid-sized amplicons (up to 500–600 bases; Kumar, Cowley & Davis, 2019). By utilizing recent advances in DNA nanoball sequencing technology (Drmanac et al., 2010; Li et al., 2019), MGI-Tech, Inc. has produced several DNBSEQ (MGISEQ) platforms with similar throughput and quality profiles compared with Illumina sequencing (Jeon et al., 2021; Kumar, Cowley & Davis, 2019). The results from Illumina and MGI-Tech sequencing platforms are highly comparable and may be used interchangeably for RNA sequencing and whole genome sequencing (Jeon et al., 2019; Kim et al., 2021; Korostin et al., 2020). However, the error rate of DNBSEQ technology (MGISEQ-2000 instrument) was marginally higher than for Illumina (HiSeq 2500 instrument) when using 2 × 150 paired-end sequencing mode on both platforms (quality scores >30: 95.03% and 97.18% for MGISEQ-2000 and HiSeq 2500, respectively; Korostin et al., 2020). The results of these early genome sequencing-oriented studies suggest that MGI-Tech platforms may also be used efficiently in metabarcoding studies. In early 2021, sequencing costs for MGI-Tech DNBSEQ-T7 were about 50% lower compared with the Illumina NovaSeq platform (cost per read) for the greatest throughput analyses (Tedersoo et al., 2021).

So far, only a single metabarcoding study has been conducted to compare these sequencing platforms (DNBSEQ-G400 and Illumina MiSeq) for recovering 16S rRNA gene and ITS amplicons of bacterial and fungal mock communities (Sun et al., 2021). For the ITS2 amplicon, Sun et al. (2021) reported small but significant differences between DNBSEQ-G400 and MiSeq platforms, but this difference can be attributed t (...truncated)


This is a preview of a remote PDF: https://peerj.com/articles/12254.pdf
Article home page: https://doaj.org/article/3d4ec68d2e2141a6a162ae66de40cc45

Sten Anslan, Vladimir Mikryukov, Kęstutis Armolaitis, Jelena Ankuda, Dagnija Lazdina, Kristaps Makovskis, Lars Vesterdal, Inger Kappel Schmidt, Leho Tedersoo. Highly comparable metabarcoding results from MGI-Tech and Illumina sequencing platforms. PeerJ 9:e12254, 2021. DOI: 10.7717/peerj.12254