Does the DNA barcoding gap exist? – a case study in blue butterflies (Lepidoptera: Lycaenidae)
Frontiers in Zoology
Martin Wiemers* and Konrad Fiedler
Address: Department of Population Ecology, Faculty of Life Sciences, University of Vienna, Althanstrasse 14, 1090 Vienna, Austria
Background: DNA barcoding, i.e. the use of a 648 bp section of the mitochondrial gene cytochrome c oxidase I, has recently been promoted as useful for the rapid identification and discovery of species. Its success is dependent either on the strength of the claim that interspecific variation exceeds intraspecific variation by one order of magnitude, thus establishing a "barcoding gap", or on the reciprocal monophyly of species.
Results: We present an analysis of intra- and interspecific variation in the butterfly family Lycaenidae which includes a well-sampled clade (genus Agrodiaetus) with a peculiar characteristic: most of its members are karyologically differentiated from each other, which facilitates the recognition of species as reproductively isolated units even in allopatric populations. The analysis shows that there is an 18% overlap in the range of intra- and interspecific COI sequence divergence due to low interspecific divergence between many closely related species. In a Neighbour-Joining tree profile approach which does not depend on a barcoding gap, but on comprehensive sampling of taxa and the reciprocal monophyly of species, at least 16% of specimens with conspecific sequences in the profile were misidentified. This is due to paraphyly or polyphyly of conspecific DNA sequences probably caused by incomplete lineage sorting.
Conclusion: Our results indicate that the "barcoding gap" is an artifact of insufficient sampling across taxa. Although DNA barcodes can help to identify and distinguish species, we advocate using them in combination with other data, since otherwise there would be a high probability that sequences are misidentified. Although high differences in DNA sequences can help to identify cryptic species, a high percentage of well-differentiated species has similar or even identical COI sequences and would be overlooked in an isolated DNA barcoding approach.
Background
Molecular tools have provided a plethora of new
opportunities to study questions in evolutionary biology (e.g.
speciation processes) and in phylogenetic systematics. Only
recently, however, have claims been made that the
sequencing of a small (648 bp) fragment at the 5' end of
the gene cytochrome c oxidase subunit 1 (COI or cox1)
from the mitochondrial genome would be sufficient in
most Metazoa to identify them to the species level [1,2].
This approach, called "DNA barcoding", has gained
momentum, and the "Consortium for the Bar Code of Life
(CBOL)", founded in September 2004, intends to create a
global biodiversity barcode database in order to facilitate
automated species identifications. Right from the start,
however, this approach received opposition, especially
from the taxonomists' community [3-8]. Some arguments
in this debate are political in nature, others have a
scientific basis. Concerning the latter, one of the most essential
arguments focuses on the so-called "barcoding gap".
Advocates of barcoding claim that interspecific genetic
variation exceeds intraspecific variation to such an extent
that a clear gap exists which enables the assignment of
unidentified individuals to their species with a negligible
error rate [1,9,10]. The errors are attributed to a small
number of incipient species pairs with incomplete lineage
sorting (e.g. [11]). As a consequence, a degree of
sequence divergence between two samples above a given
threshold (proposed to be at least 10 times greater than
the divergence within species [10]) would indicate specific
distinctness, whereas divergence below such a threshold
would indicate taxonomic identity among the samples.
Furthermore, the existence of a barcoding gap would even
enable the identification of previously undescribed
species ([11-13] but see [14]). Possible errors of this
approach include false positives and false negatives. False
positives occur if populations within one species are
genetically quite distinct, e.g. in distant populations with
limited gene flow or in allopatric populations with
interrupted gene flow. In the latter case it must be noted that,
depending on the amount of morphological
differentiation and the species concept to be applied, such
populations may also qualify as 'cryptic species' in the view of
some scientists. False negatives, in contrast, occur when
little or no sequence variation in the barcoding fragment
is found between different biospecies (= reproductively
isolated population groups sensu Mayr [15]). Hence, false
negatives are more critical for the barcoding approach,
because they reveal cases in which barcoding is less
powerful than other, more holistic approaches to
delimiting species boundaries.
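The threshold logic described above can be made concrete with a minimal sketch. It assumes uncorrected p-distances on pre-aligned sequences (a simplification; barcoding studies often use model-corrected distances instead), and the function names, the 20 bp toy sequences and the 1% mean intraspecific divergence are hypothetical values chosen for illustration only, not data from this study.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in compared) / len(compared)


def ten_times_threshold(mean_intraspecific: float) -> float:
    """Threshold set at 10 times the mean intraspecific divergence."""
    return 10.0 * mean_intraspecific


def called_conspecific(seq_a: str, seq_b: str, threshold: float) -> bool:
    """True if the divergence lies below the threshold (same species call)."""
    return p_distance(seq_a, seq_b) < threshold


# Hypothetical toy data: 20 bp fragments instead of the 648 bp barcode region.
REFERENCE = "ATGGCAACATTATACTTTAT"
ONE_SITE_DIFFERENT = "ATGGTAACATTATACTTTAT"     # differs at 1 of 20 sites
THREE_SITES_DIFFERENT = "ATGGTAACACTATATTTTAT"  # differs at 3 of 20 sites

# Assumed mean intraspecific divergence of 1%, giving a threshold near 10%.
THRESHOLD = ten_times_threshold(0.01)
```

In this toy case the pair differing at a single site falls below the threshold and would be treated as one species. If the two sequences actually came from distinct biospecies, the call would be a false negative in the sense used above.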
Initial studies on birds [10] and arthropods [9,16]
appeared to corroborate the existence of a distinct
barcoding gap, but two recent studies on gastropods [17] and
flies [18] challenge its existence. The reasons for these
discrepancies are not entirely clear. Although levels of COI
sequence divergence differ between higher taxa (e.g. an
exceptionally low mean COI sequence divergence of only
1.0% was found in congeneric species pairs of Cnidaria
compared to 9.6–15.7% in other animal phyla [2]),
Mollusca (with 11.1% mean sequence divergence between
species) and Diptera (9.3%) are not peculiar in this
respect. Meyer & Paulay [17] assume that insufficient
sampling on both the interspecific and intraspecific level
creates the artifact of a barcode gap. Proponents of barcoding
might argue, however, that the main reason for this
overlap is the poor taxonomy of these groups, e.g. cryptic
species that are genetically differentiated but very similar
or even identical in morphology may have been
overlooked.
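The sampling argument can be illustrated with a short sketch that tests for a gap between two sets of pairwise divergences. The gap is taken here as the smallest interspecific divergence minus the largest intraspecific divergence, and the overlap fraction as the share of all comparisons falling inside the overlapping range. Both definitions are assumptions made for this example and are not necessarily the measures used in the cited studies, and all divergence values are invented.

```python
from typing import Sequence


def barcoding_gap(intra: Sequence[float], inter: Sequence[float]) -> float:
    """Smallest interspecific minus largest intraspecific divergence.

    A positive value means the two ranges are separated by a gap;
    zero or a negative value means they touch or overlap.
    """
    return min(inter) - max(intra)


def overlap_fraction(intra: Sequence[float], inter: Sequence[float]) -> float:
    """Share of all pairwise comparisons lying inside the overlap range."""
    low, high = min(inter), max(intra)
    if low > high:
        return 0.0  # the ranges do not overlap
    values = list(intra) + list(inter)
    return sum(low <= v <= high for v in values) / len(values)


# Hypothetical sparse sampling: few individuals per species, distant species.
SPARSE_INTRA = [0.000, 0.002, 0.004]
SPARSE_INTER = [0.060, 0.080, 0.095]

# Hypothetical denser sampling: one divergent conspecific pair and two
# closely related species pairs are added to the same data.
DENSE_INTRA = [0.000, 0.002, 0.004, 0.021]
DENSE_INTER = [0.000, 0.010, 0.060, 0.080, 0.095]
```

With the sparse toy data the gap is positive, whereas adding closely related species pairs and a divergent conspecific pair makes it negative, which mirrors the argument that an apparent gap can result from incomplete sampling.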
If the barcode gap does not exist, then the threshold
approach in barcoding becomes inapplicable. Although
more sophisticated techniques (e.g. using coalescence
theory and statistical population genetic methods [19-21])
can sometimes help to delimit species with overlapping
genetic divergences, these approaches require additional
assumptions (e.g. about the choice of population genetic
models or clustering algorithms) and are only feasible in
well-sampled clades.
Barcoding nonetheless holds promise, especially in the
identification of arthropods, the most species-rich animal
phylum in terrestrial ecosystems. Identification of
arthropods is often extremely time-consuming and generally
requires taxonomic specialists for any given group.
Moreover, the fraction of undescribed species is particularly
high, as opposed to vertebrates. Hence, there is substantial
demand for improved (and rapid) identification tools by
scientists who seek identification of large arthropod
samples from complex (...truncated)