Estimating intraspecific genetic diversity from community DNA metabarcoding data
Vasco Elbrecht1,2, Ecaterina Edith Vamos1, Dirk Steinke2 and Florian Leese1,3
1 Aquatic Ecosystem Research, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany
2 Centre for Biodiversity Genomics, University of Guelph, Guelph, ON, Canada
3 Centre for Water and Environmental Research (ZWU) Essen, University of Duisburg-Essen, Essen, North Rhine-Westphalia, Germany
Submitted 10 February 2018
Accepted 28 March 2018
Published 9 April 2018
Corresponding author: Vasco Elbrecht
Academic editor: Donald Baird
Additional Information and Declarations can be found on page 9
DOI 10.7717/peerj.4644
Copyright 2018 Elbrecht et al.
Distributed under Creative Commons CC-BY 4.0
ABSTRACT
Background: DNA metabarcoding is used to generate species composition data for
entire communities. However, sequencing errors in high-throughput sequencing
instruments are fairly common, usually requiring reads to be clustered into operational
taxonomic units (OTUs), losing information on intraspecific diversity in the process.
Although cytochrome c oxidase subunit I (COI) haplotype information has limited power to
resolve intraspecific diversity, it is nevertheless often useful, e.g., in a phylogeographic
context, where it helps to formulate hypotheses on taxon distribution and dispersal.
Methods: This study combines sequence denoising strategies, normally applied in
microbial research, with additional abundance-based filtering to extract haplotype
information from freshwater macroinvertebrate metabarcoding datasets. This novel
approach was added to the R package “JAMP” and can be applied to COI amplicon
datasets. We tested our haplotyping method by sequencing (i) a single-species mock
community composed of 31 individuals with 15 different haplotypes spanning three
orders of magnitude in biomass and (ii) 18 monitoring samples each amplified
with four different primer sets and two PCR replicates.
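The abundance-based filtering step named above can be illustrated with a minimal Python sketch. It is not the JAMP implementation: the function name, the input layout and both threshold values are hypothetical and chosen only for illustration. It assumes the reads have already been denoised into exact sequence variants and assigned to OTUs. A haplotype is kept only if it reaches a minimum share of all reads in the sample and a minimum share of the reads of its own OTU.

```python
from collections import defaultdict


def filter_haplotypes(counts, otu_of, min_sample_frac=0.003, min_otu_frac=0.05):
    """Keep haplotypes that pass two relative-abundance thresholds.

    counts: dict mapping haplotype id -> read count in one sample
    otu_of: dict mapping haplotype id -> OTU id
    Both thresholds are illustrative assumptions, not published defaults.
    """
    total = sum(counts.values())
    if total == 0:
        return {}

    # Sum the reads of all haplotypes belonging to the same OTU.
    otu_totals = defaultdict(int)
    for hap, n in counts.items():
        otu_totals[otu_of[hap]] += n

    kept = {}
    for hap, n in counts.items():
        # Discard haplotypes that are rare relative to the whole sample.
        if n / total < min_sample_frac:
            continue
        # Discard haplotypes that are rare relative to their own OTU.
        if n / otu_totals[otu_of[hap]] < min_otu_frac:
            continue
        kept[hap] = n
    return kept


# Toy sample: three haplotypes in OTU1 and one in OTU2 (made-up counts).
sample = {"hapA": 900, "hapB": 80, "hapC": 2, "hapD": 18}
otus = {"hapA": "OTU1", "hapB": "OTU1", "hapC": "OTU1", "hapD": "OTU2"}
kept = filter_haplotypes(sample, otus)
print(kept)
```

In this toy sample, hapC is discarded because it holds only 2 of 1,000 reads, while the other three haplotypes pass both thresholds. Stricter thresholds remove more noise but, as the results below show for small specimens, also risk discarding genuine low-abundance haplotypes.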
Results: We detected all 15 haplotypes of the single specimens in the mock
community with relaxed filtering and denoising settings. However, up to 480
additional unexpected haplotypes remained in both replicates. Rigorous filtering
removed most unexpected haplotypes but could also discard expected haplotypes,
mainly those from the small specimens. In the monitoring samples, the different primer
sets detected 177–200 OTUs, with an average of 2.40–3.30 haplotypes
per OTU. The derived intraspecific diversity data showed population structures that
were consistent between replicates and similar between primer pairs, but resolution
depended on the primer length. A closer look at abundant taxa in the dataset
revealed various population genetic patterns: for example, the stonefly Taeniopteryx nebulosa
and the caddisfly Hydropsyche pellucidula showed a distinct north–south cline with
respect to haplotype distribution, while the beetle Oulimnius tuberculatus and the
isopod Asellus aquaticus displayed no clear population pattern but differed in genetic
diversity.
Discussion: We developed a strategy to infer intraspecific genetic diversity from
bulk invertebrate metabarcoding data. It needs to be stressed that, at this point,
metabarcoding-informed haplotyping is not capable of capturing the full diversity
present in such samples, due to variation in specimen size, primer bias and loss of
sequence variants with low abundance. Nevertheless, intraspecific diversity was
recovered for a high number of species, identifying potentially isolated populations and
taxa for further, more detailed phylogeographic investigation. While we currently
lack large-scale metabarcoding datasets to take full advantage of our new
approach, metabarcoding-informed haplotyping holds great promise for
biomonitoring efforts that seek information not only about species diversity but also
about the underlying genetic diversity.
How to cite this article: Elbrecht et al. (2018), Estimating intraspecific genetic diversity from community DNA metabarcoding data. PeerJ 6:e4644; DOI 10.7717/peerj.4644
Subjects Biogeography, Bioinformatics, Molecular Biology, Freshwater Biology
Keywords Metabarcoding, High-throughput sequencing, Population genetics, Haplotyping,
Ecosystem assessment, Exact sequence variant, CO1
INTRODUCTION
High-throughput analysis of DNA barcodes retrieved from environmental samples,
i.e. DNA metabarcoding, allows for the rapid and standardized assessment of community
composition without the need for morpho-taxonomy (Taberlet et al., 2012a; Creer et al.,
2016). This new surge of data enables biodiversity surveys at speeds and scales that
were previously inconceivable in ecological and evolutionary studies. While the approach
has major strengths and is generally regarded as a game changer for ecological research
(Creer et al., 2016), it still has limitations: sequences are typically
clustered into operational taxonomic units (OTUs, Fig. S1), thereby ignoring any
intraspecific sequence variation (Callahan, McMurdie & Holmes, 2017). However,
clustering is often used to reduce the influence of PCR and sequencing errors that can
otherwise generate false OTUs (Edgar, 2013). The inability to detect sequence variation
within OTUs hampers our ability to detect impacts at the population level. Simultaneous
assessment of inter- and intraspecific diversity, however, represents a leap forward in
ecological research and management because haplotype data are direct proxies for the
spatio-temporal dynamics of populations, and the two levels of diversity can differ substantially
(Taberlet et al., 2012b). In particular, the assessment of fragmentation (Weiss & Leese, 2016)
and of changes in population size in response to environmental impacts are key areas of
basic and applied ecological research (Sutherland et al., 2012). Intraspecific genetic
diversity is also important for management because genetic variation is typically lost long before species
or OTUs disappear (Bálint et al., 2011). Unfortunately, methods to extract haplotype
information from metabarcoding datasets are not widely available, and thus
most studies are based on single-specimen analyses. Some of the available methods rely on denoising
algorithms capable of distinguishing between true haplotypes and sequencing noise
(Tikhonov, Leach & Wingreen, 2015; Eren et al., 2015; Edgar, 2016; Callahan et al., 2016;
Amir et al., 2017) and have been tested for microbial samples (Eren et al., 2015; Callahan
et al., 2016; Needham, Sachdeva & Fuhrman, 2017). Wares & Pappalardo (2016) suggested
that haplotype information in metazoan datasets can be used to, for instance, improve
taxa abundance estimates, which was successfully demonstrated with freshwater fish
fecal samples (Corse et al., 2017). Recent studies were also able to infer haplotypes with
metabarcoding for single specimens (Shokralla et al., 2014), arthropod bulk samples
(Elbrecht & Leese, 2015; Pedro et al., 2017) and environmen (...truncated)