Delineating Species with DNA Barcodes: A Case of Taxon Dependent Method Performance in Moths
April
Mari Kekkonen 0 1
Marko Mutanen 0 1
Lauri Kaila 0 1
Marko Nieminen 0 1
Paul D. N. Hebert 0 1
1 Finnish Museum of Natural History, University of Helsinki, Zoology Unit, University of Helsinki, Helsinki, Finland, 2 Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada, 3 Department of Genetics and Physiology, University of Oulu, Oulu, Finland, 4 Metapopulation Research Centre, Department of Biosciences, University of Helsinki, Helsinki, Finland
Academic Editor: Bernd Schierwater, University of Veterinary Medicine Hanover, GERMANY
The accelerating loss of biodiversity has created a need for more effective ways to discover species. Novel algorithmic approaches for analyzing sequence data combined with rapidly expanding DNA barcode libraries provide a potential solution. While several analytical methods are available for the delineation of operational taxonomic units (OTUs), few studies have compared their performance. This study compares the performance of one morphology-based and four DNA-based (BIN, parsimony networks, ABGD, GMYC) methods on two groups of gelechioid moths. It examines 92 species of Finnish Gelechiinae and 103 species of Australian Elachistinae which were delineated by traditional taxonomy. The results reveal a striking difference in performance between the two taxa with all four DNA-based methods. OTU counts in the Elachistinae showed a wider range and a relatively low (ca. 65%) OTU match with reference species while OTU counts were more congruent and performance was higher (ca. 90%) in the Gelechiinae. Performance rose when only monophyletic species were compared, but the taxon-dependence remained. None of the DNA-based methods produced a correct match with non-monophyletic species, but singletons were handled well. A simulated test of morphospecies-grouping performed very poorly in revealing taxon diversity in these small, dull-colored moths. Despite the strong performance of analyses based on DNA barcodes, species delineated using single-locus mtDNA data are best viewed as OTUs that require validation by subsequent integrative taxonomic work.
www.hok-elanto.fi/in-brief/), and the government of
Canada through Genome Canada and the Ontario
Genomics Institute in support of the International
Barcode of Life project PDNH (http://www.genomecanada.ca/en/).
The funders had no role in
study design, data collection and analysis, decision to
publish, or preparation of the manuscript.
Competing Interests: HOK-Elanto has been a
sponsor for Finnish Museum of Natural History,
sponsoring without specifications activities of the
institute. It was an internal decision in the Finnish
Museum of Natural History to support the publications
in the form of financing a collecting trip of Dr. Lauri
Kaila to Australia, with no communication with the
sponsor regarding this decision. Therefore, the
authors declare impartiality and no influence by
HOK-Elanto regarding the support for the activity relating to
this publication. This does not alter the authors'
adherence to PLOS ONE policies on sharing data
and materials.
Introduction
The coupling of novel analytical methods with the rapid increase in data provided by DNA
barcoding is creating a tremendous opportunity for taxonomists and biodiversity scientists. Large
barcode datasets enable the delineation of hundreds or even thousands of putative species (i.e.,
operational taxonomic units, OTUs) simultaneously, allowing species recognition to proceed
far more rapidly than through morphological approaches. Faced with accelerating losses of
biodiversity, this increase in the efficiency of taxonomic workflows is acutely needed. Initial OTU
delineation generates a good estimate of species diversity and provides a framework for
subsequent taxonomic revisions (e.g., [17]).
Some methods available for species delineation are inappropriate for use with single-locus
data (e.g., bpp [5]). Other methods, those requiring a priori defined groups (e.g., Population
Aggregation Analysis [18]), cannot be employed for species discovery. However, a number of
analytical approaches can be used for species delineation with single-locus data and they can
be divided into three primary categories: clustering, tree-based and character-based. Clustering
methods, the dominant category, employ diverse algorithms to recognize boundaries in
distance matrices. This category includes, for instance, statistical parsimony networks (referred to
here as TCS [19,20]), jMOTU [7], Clustering 16S rRNA for OTU Prediction (CROP [6]),
Automatic Barcode Gap Discovery (ABGD [8]), and Barcode Index Number (BIN [9]). By
comparison, tree-based methods, such as the Generalized Mixed Yule Coalescent (GMYC [3,4,21]),
and Poisson Tree Processes (PTP [10]), employ a gene tree as input for the analysis. The third
category, character-based methods, employs diagnostic base substitutions as a basis for
decisions. To our knowledge, the Character Attribute Organization System (CAOS [22–24]) is the only
available character-based method for testing species boundaries, although it also requires a
priori defined groups, so it cannot be used for their discovery. Clustering and tree-based approaches
have become the dominant approaches used in studies of species delineation (bacteria [25],
corals [26], molluscs [27–33], millipedes [34], spiders [35], insects [36–43], amphibians [44],
bats [45], orchids [46]).
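
To make the clustering category concrete, the following is a minimal sketch of the general idea of recognizing boundaries in a distance matrix. It is not an implementation of TCS, jMOTU, CROP, ABGD, or BIN, each of which uses a more elaborate algorithm. It assumes aligned sequences of equal length, uses uncorrected p-distances, and joins sequences by single linkage under a fixed threshold. The function names, the toy sequences, and the threshold value of 0.2 are all hypothetical and chosen only for illustration.

```python
from itertools import combinations
from typing import Dict, List, Set


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs: Dict[str, str], threshold: float) -> List[Set[str]]:
    """Single-linkage clustering: any pair at or below the threshold
    ends up in the same putative OTU (illustrative assumption only)."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, Set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy data (hypothetical): two pairs of near-identical sequences.
toy = {
    "s1": "ACGTACGTAC",
    "s2": "ACGTACGTAT",
    "s3": "TTGTACCTAA",
    "s4": "TTGTACCTAC",
}
otus = cluster_otus(toy, threshold=0.2)
print(sorted(sorted(group) for group in otus))
```

In this toy case the within-pair distances (0.1) fall well below the between-pair distances (0.3 to 0.4), so any threshold between those values recovers the same two groups. Real datasets rarely separate this cleanly, which is why the choice of boundary is the central problem these methods address.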
The relative performance of differing algorithmic approaches to species delineation has
been examined in a few past studies. For example, it has been noted that GMYC produces
more OTUs than TCS or ABGD [3,8,47,28,36,37] (but see [39]). When clusters recognized by
GMYC have been compared with morphospecies, the conclusions have been variable. Early
results showed high congruence between morphology and GMYC [3,4,36], but subsequent
studies have indicated that GMYC often delivers a higher species count than morphology
[29,44,45,48]. The largest comparison to date [9] examined one tree-based (GMYC) and four
clustering (BIN, ABGD, jMOTU, CROP) methods with eight datasets comprising over 3000
species and close to 19000 DNA barcode sequences. This study revealed high performance for
all methods with BIN slightly outperforming the other clustering methods, but similar to
GMYC. In accordance with other studies, GMYC produced more splits than alternate
methods. Zhang et al. [10] proposed that tree-based methods should outperform clustering methods
in species assemblages lacking a barcode gap, the break between intra- and interspecific
pairwise distances that underpins the success of DNA barcoding [12]. The lack of a gap is generally
linked to recently diverged species with little genetic diversification, often coupled with
incomplete lineage sorting and introgression [49,50]. In addition, it should be noted that incomplete
lineage sorting and/or introgressi (...truncated)