Estimating intraspecific genetic diversity from community DNA metabarcoding data

Vasco Elbrecht¹,², Ecaterina Edith Vamos¹, Dirk Steinke² and Florian Leese¹,³

¹ Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany
² Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
³ Centre for Water and Environmental Research (ZWU) Essen, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany

Submitted 10 February 2018
Accepted 28 March 2018
Published 9 April 2018
Corresponding author: Vasco Elbrecht
Academic editor: Donald Baird
Additional Information and Declarations can be found on page 9
DOI 10.7717/peerj.4644
Copyright 2018 Elbrecht et al. Distributed under Creative Commons CC-BY 4.0

ABSTRACT

Background: DNA metabarcoding is used to generate species composition data for entire communities. However, sequencing errors are fairly common on high-throughput sequencing instruments, so reads usually have to be clustered into operational taxonomic units (OTUs), and information on intraspecific diversity is lost in the process. Although Cytochrome c oxidase subunit I (COI) haplotype information is limited in resolving intraspecific diversity, it is nevertheless often useful, e.g. in a phylogeographic context, helping to formulate hypotheses on taxon distribution and dispersal.

Methods: This study combines sequence denoising strategies, normally applied in microbial research, with additional abundance-based filtering to extract haplotype information from freshwater macroinvertebrate metabarcoding datasets. This novel approach was added to the R package “JAMP” and can be applied to COI amplicon datasets. We tested our haplotyping method by sequencing (i) a single-species mock community composed of 31 individuals with 15 different haplotypes spanning three orders of magnitude in biomass and (ii) 18 monitoring samples, each amplified with four different primer sets and two PCR replicates.
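The two steps named in Methods, denoising followed by abundance-based filtering, can be sketched in Python. This is a minimal illustration under stated assumptions, not the JAMP code: denoising uses a UNOISE-style abundance-skew rule (cf. Edgar, 2016), in which a less abundant read at Hamming distance d from a more abundant accepted sequence is treated as its error copy when their abundance ratio is at most 1/2^(αd+1); filtering then keeps only haplotypes that reach a minimum relative abundance in every PCR replicate. The function names, the values of `alpha`, `min_size` and `min_rel`, and the assumption of aligned, equal-length amplicons are all hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions (assumes aligned, equal-length amplicons)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def unoise_like_denoise(counts: dict[str, int], alpha: float = 2.0,
                        min_size: int = 8) -> dict[str, int]:
    """Greedy UNOISE-style denoising (illustrative sketch, hypothetical defaults).

    Sequences are visited from most to least abundant. A sequence is merged into
    the first accepted centroid c for which
        abundance(seq) / abundance(c) <= 1 / 2 ** (alpha * d + 1),
    where d is the Hamming distance; otherwise it becomes a new centroid.
    Sequences with fewer than min_size reads are discarded as noise.
    """
    centroids: dict[str, int] = {}
    for seq, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        if n < min_size:
            continue  # too rare to be trusted as a true variant
        for c in centroids:
            d = hamming(seq, c)
            if n / counts[c] <= 1 / 2 ** (alpha * d + 1):
                centroids[c] += n  # treat seq as an error copy of centroid c
                break
        else:
            centroids[seq] = n  # accepted as a new haplotype
    return centroids


def filter_haplotypes(replicates: list[dict[str, int]],
                      min_rel: float = 0.0003) -> set[str]:
    """Keep haplotypes whose relative abundance is >= min_rel in every replicate
    (hypothetical filtering rule and threshold)."""
    kept = None
    for rep in replicates:
        total = sum(rep.values())
        passing = {s for s, n in rep.items() if total and n / total >= min_rel}
        kept = passing if kept is None else kept & passing
    return kept or set()
```

Raising `min_rel` corresponds to the rigorous filtering described in Results: it removes more unexpected haplotypes but can also drop genuine low-abundance haplotypes, such as those from small specimens.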
Results: We detected all 15 haplotypes of the single specimens in the mock community with relaxed filtering and denoising settings. However, up to 480 additional unexpected haplotypes remained in both replicates. Rigorous filtering removes most unexpected haplotypes but can also discard expected haplotypes, mainly from the small specimens. In the monitoring samples, the different primer sets detected 177–200 OTUs, with an average of 2.40–3.30 haplotypes per OTU. The derived intraspecific diversity data showed population structures that were consistent between replicates and similar between primer pairs, but resolution depended on primer length. A closer look at abundant taxa in the dataset revealed various population genetic patterns: e.g. the stonefly Taeniopteryx nebulosa and the caddisfly Hydropsyche pellucidula showed a distinct north–south cline in haplotype distribution, while the beetle Oulimnius tuberculatus and the isopod Asellus aquaticus displayed no clear population pattern but differed in genetic diversity.

Discussion: We developed a strategy to infer intraspecific genetic diversity from bulk invertebrate metabarcoding data. It needs to be stressed that at this point metabarcoding-informed haplotyping is not capable of capturing the full diversity present in such samples, owing to variation in specimen size, primer bias and the loss of low-abundance sequence variants. Nevertheless, intraspecific diversity was recovered for a high number of species, identifying potentially isolated populations and taxa for more detailed phylogeographic investigation. While we currently lack large-scale metabarcoding datasets to take full advantage of our new approach, metabarcoding-informed haplotyping holds great promise for biomonitoring efforts that seek information not only about species diversity but also about the underlying genetic diversity.

Subjects: Biogeography, Bioinformatics, Molecular Biology, Freshwater Biology
Keywords: Metabarcoding, High-throughput sequencing, Population genetics, Haplotyping, Ecosystem assessment, Exact sequence variant, CO1

How to cite this article: Elbrecht et al. (2018), Estimating intraspecific genetic diversity from community DNA metabarcoding data. PeerJ 6:e4644; DOI 10.7717/peerj.4644

INTRODUCTION

High-throughput analysis of DNA barcodes retrieved from environmental samples, i.e. DNA metabarcoding, allows for the rapid and standardized assessment of community composition without the need for morpho-taxonomy (Taberlet et al., 2012a; Creer et al., 2016). This new surge of data enables biodiversity surveys at speeds and scales that were previously inconceivable in ecological and evolutionary studies. While the approach has major strengths and is generally regarded as a game changer for ecological research (Creer et al., 2016), it still has limitations, such as the fact that sequences are typically clustered into operational taxonomic units (OTUs, Fig. S1), thereby ignoring any intraspecific sequence variation (Callahan, McMurdie & Holmes, 2017). Clustering is nevertheless often used because it reduces the influence of PCR and sequencing errors that can otherwise generate false OTUs (Edgar, 2013). The inability to detect sequence variation within OTUs hampers our ability to detect impacts at the population level. Simultaneous assessment of inter- and intraspecific diversity, however, represents a leap forward in ecological research and management, because haplotype data are direct proxies for the spatio-temporal dynamics of populations, and the two levels of diversity can differ substantially (Taberlet et al., 2012b).
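The loss of intraspecific information through clustering can be illustrated with a short Python sketch. It assumes aligned, equal-length sequences and a 3% p-distance threshold (a commonly used cutoff, assumed here); haplotypes are processed from most to least abundant, and each one joins the first existing OTU centroid within the threshold or founds a new OTU. This greedy scheme only approximates centroid-based OTU clustering in general and is not the pipeline used in this study; all function names are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(haplotypes: dict[str, int],
                 threshold: float = 0.03) -> dict[str, list[str]]:
    """Greedy centroid clustering (illustrative sketch).

    The most abundant haplotype founds the first OTU; every later haplotype joins
    the first centroid within `threshold`, otherwise it founds a new OTU.
    Returns {centroid: [member haplotypes, centroid first]}.
    """
    otus: dict[str, list[str]] = {}
    for seq, _ in sorted(haplotypes.items(), key=lambda kv: -kv[1]):
        for centroid, members in otus.items():
            if p_distance(seq, centroid) <= threshold:
                members.append(seq)  # intraspecific variant absorbed into the OTU
                break
        else:
            otus[seq] = [seq]  # founds a new OTU
    return otus


def mean_haplotypes_per_otu(otus: dict[str, list[str]]) -> float:
    """Average number of distinct haplotypes retained within each OTU."""
    if not otus:
        return 0.0
    return sum(len(members) for members in otus.values()) / len(otus)
```

Reporting only the OTU centroids would collapse each member list to a single sequence, whereas keeping the member lists preserves the haplotypes; averaging the list lengths yields a haplotypes-per-OTU figure of the kind reported in the abstract (2.40–3.30).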
In particular, the assessment of fragmentation (Weiss & Leese, 2016) and of changes in population size in response to environmental impacts are key areas of basic and applied ecological research (Sutherland et al., 2012). For management, intraspecific genetic diversity is also important because genetic variation is typically lost long before species or OTUs disappear (Bálint et al., 2011). Unfortunately, methods to extract haplotype information from metabarcoding datasets are not widely available, and thus most studies are based on single-specimen analyses. Some existing methods rely on denoising algorithms capable of distinguishing true haplotypes from sequencing noise (Tikhonov, Leach & Wingreen, 2015; Eren et al., 2015; Edgar, 2016; Callahan et al., 2016; Amir et al., 2017), and these have been tested on microbial samples (Eren et al., 2015; Callahan et al., 2016; Needham, Sachdeva & Fuhrman, 2017). Wares & Pappalardo (2016) suggested that haplotype information in metazoan datasets can be used, for instance, to improve taxon abundance estimates, which was successfully demonstrated with freshwater fish fecal samples (Corse et al., 2017). Recent studies were also able to infer haplotypes with metabarcoding for single specimens (Shokralla et al., 2014), arthropod bulk samples (Elbrecht & Leese, 2015; Pedro et al., 2017) and environmen (...truncated)