Delineating Species with DNA Barcodes: A Case of Taxon Dependent Method Performance in Moths

PLOS ONE, April 2015

The accelerating loss of biodiversity has created a need for more effective ways to discover species. Novel algorithmic approaches for analyzing sequence data combined with rapidly expanding DNA barcode libraries provide a potential solution. While several analytical methods are available for the delineation of operational taxonomic units (OTUs), few studies have compared their performance. This study compares the performance of one morphology-based and four DNA-based (BIN, parsimony networks, ABGD, GMYC) methods on two groups of gelechioid moths. It examines 92 species of Finnish Gelechiinae and 103 species of Australian Elachistinae which were delineated by traditional taxonomy. The results reveal a striking difference in performance between the two taxa with all four DNA-based methods. OTU counts in the Elachistinae showed a wider range and a relatively low (ca. 65%) OTU match with reference species while OTU counts were more congruent and performance was higher (ca. 90%) in the Gelechiinae. Performance rose when only monophyletic species were compared, but the taxon-dependence remained. None of the DNA-based methods produced a correct match with non-monophyletic species, but singletons were handled well. A simulated test of morphospecies-grouping performed very poorly in revealing taxon diversity in these small, dull-colored moths. Despite the strong performance of analyses based on DNA barcodes, species delineated using single-locus mtDNA data are best viewed as OTUs that require validation by subsequent integrative taxonomic work.

Mari Kekkonen, Marko Mutanen, Lauri Kaila, Marko Nieminen, Paul D. N. Hebert

1 Finnish Museum of Natural History, Zoology Unit, University of Helsinki, Helsinki, Finland, 2 Biodiversity Institute of Ontario, University of Guelph, Guelph, Ontario, Canada, 3 Department of Genetics and Physiology, University of Oulu, Oulu, Finland, 4 Metapopulation Research Centre, Department of Biosciences, University of Helsinki, Helsinki, Finland

Academic Editor: Bernd Schierwater, University of Veterinary Medicine Hanover, Germany

Funding: (...) www.hok-elanto.fi/in-brief/), and the government of Canada through Genome Canada and the Ontario Genomics Institute in support of the International Barcode of Life project PDNH (http://www.genomecanada.ca/en/). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing Interests: HOK-Elanto has been a sponsor for the Finnish Museum of Natural History, sponsoring activities of the institute without specifications. It was an internal decision of the Finnish Museum of Natural History to support the publications by financing a collecting trip of Dr. Lauri Kaila to Australia, with no communication with the sponsor regarding this decision. Therefore, the authors declare impartiality and no influence by HOK-Elanto regarding the support for the activity relating to this publication. This does not alter the authors' adherence to PLOS ONE policies on sharing data and materials.

Introduction

The coupling of novel analytical methods with the rapid increase in data provided by DNA barcoding is creating a tremendous opportunity for taxonomists and biodiversity scientists. Large barcode datasets enable the delineation of hundreds or even thousands of putative species (i.e., operational taxonomic units, OTUs) simultaneously, allowing species recognition to proceed far more rapidly than through morphological approaches. Faced with accelerating losses of biodiversity, this increase in the efficiency of taxonomic workflows is acutely needed.
Initial OTU delineation generates a good estimate of species diversity and provides a framework for subsequent taxonomic revisions (e.g., [17]). Some methods available for species delineation are inappropriate for use with single-locus data (e.g., bpp [5]). Others, which require a priori defined groups (e.g., Population Aggregation Analysis [18]), cannot be employed for species discovery. However, a number of analytical approaches can be used for species delineation with single-locus data, and they can be divided into three primary categories: clustering, tree-based and character-based.

Clustering methods, the dominant category, employ diverse algorithms to recognize boundaries in distance matrices. This category includes, for instance, statistical parsimony networks (referred to here as TCS [19,20]), jMOTU [7], Clustering 16S rRNA for OTU Prediction (CROP [6]), Automatic Barcode Gap Discovery (ABGD [8]), and Barcode Index Number (BIN [9]). By comparison, tree-based methods, such as the Generalized Mixed Yule Coalescent (GMYC [3,4,21]) and Poisson Tree Processes (PTP [10]), employ a gene tree as input for the analysis. The third category, character-based methods, employs diagnostic base substitutions as a basis for decisions. To our knowledge, the Character Attribute Organization System (CAOS [22-24]) is the only available character-based method for testing species boundaries, although it also requires a priori defined groups, so it cannot be used for their discovery. Clustering and tree-based approaches have become the dominant approaches used in studies of species delineation (bacteria [25], corals [26], molluscs [27-33], millipedes [34], spiders [35], insects [36-43], amphibians [44], bats [45], orchids [46]).

The relative performance of differing algorithmic approaches to species delineation has been examined in a few past studies. For example, it has been noted that GMYC produces more OTUs than TCS or ABGD [3,8,47,28,36,37] (but see [39]).
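To make the idea of recognizing boundaries in a distance matrix concrete, the following is a minimal Python sketch of threshold-based single-linkage clustering on uncorrected p-distances. It is not an implementation of TCS, jMOTU, CROP, ABGD or BIN, each of which uses its own algorithm. The function names, the toy sequences and the 0.1 threshold are all hypothetical choices made for illustration.

```python
from __future__ import annotations
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def cluster_otus(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Single-linkage clustering: link any pair whose distance is <= threshold."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Hypothetical aligned 20 bp fragments, for illustration only.
toy = {
    "s1": "ACGTACGTACGTACGTACGT",
    "s2": "ACGTACGTACGTACGTACGA",  # one site differs from s1
    "s3": "TTGTACGAACGTTCGTACCA",  # several sites differ from s1 and s2
}
otus = cluster_otus(toy, threshold=0.1)
```

With this toy input, s1 and s2 fall into one putative OTU and s3 forms a singleton. The result depends entirely on the chosen threshold, which is the weakness of a fixed cutoff.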
When clusters recognized by GMYC have been compared with morphospecies, the conclusions have been variable. Early results showed high congruence between morphology and GMYC [3,4,36], but subsequent studies have indicated that GMYC often delivers a higher species count than morphology [29,44,45,48]. The largest comparison to date [9] examined one tree-based (GMYC) and four clustering (BIN, ABGD, jMOTU, CROP) methods with eight datasets comprising over 3000 species and close to 19000 DNA barcode sequences. That study revealed high performance for all methods, with BIN slightly outperforming the other clustering methods but performing similarly to GMYC. In accordance with other studies, GMYC produced more splits than the alternative methods.

Zhang et al. [10] proposed that tree-based methods should outperform clustering methods in species assemblages lacking a barcode gap, the break between intra- and interspecific pairwise distances that underpins the success of DNA barcoding [12]. The lack of a gap is generally linked to recently diverged species with little genetic diversification, often coupled with incomplete lineage sorting and introgression [49,50]. In addition, it should be noted that incomplete lineage sorting and/or introgressi (...truncated)
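The barcode gap described above can be checked directly once specimens carry reference species labels. The sketch below is a simplified illustration in Python, assuming uncorrected p-distances and a dataset-wide comparison of the largest intraspecific distance with the smallest interspecific distance. The function names, specimen labels and sequences are hypothetical and are not taken from the study.

```python
from __future__ import annotations
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def barcode_gap(seqs: dict[str, str], species: dict[str, str]) -> tuple[float, float, bool]:
    """Return (max intraspecific distance, min interspecific distance, gap present?)."""
    intra: list[float] = []
    inter: list[float] = []
    for a, b in combinations(seqs, 2):
        d = p_distance(seqs[a], seqs[b])
        # Sort each pairwise distance by whether the two specimens share a label.
        (intra if species[a] == species[b] else inter).append(d)
    max_intra = max(intra, default=0.0)
    min_inter = min(inter, default=float("inf"))
    # A gap exists only if every interspecific distance exceeds every intraspecific one.
    return max_intra, min_inter, min_inter > max_intra


# Hypothetical aligned 10 bp fragments and species labels, for illustration only.
seqs = {
    "a1": "AAAAAAAAAA",
    "a2": "AAAAAAAAAT",
    "b1": "AAAAATTTTT",
    "b2": "AAAATTTTTT",
}
labels = {"a1": "sp_A", "a2": "sp_A", "b1": "sp_B", "b2": "sp_B"}
max_intra, min_inter, has_gap = barcode_gap(seqs, labels)
```

In this toy case the two labeled species are separated by a clear gap. Reassigning one specimen to the other species makes the intra- and interspecific ranges overlap, which is the situation in which distance-based clustering is expected to struggle.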


Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0122481

Mari Kekkonen, Marko Mutanen, Lauri Kaila, Marko Nieminen, Paul D. N. Hebert. Delineating Species with DNA Barcodes: A Case of Taxon Dependent Method Performance in Moths, PLOS ONE, 2015, 4, DOI: 10.1371/journal.pone.0122481