Tracking crop varieties using genotyping-by-sequencing markers: a case study using cassava (Manihot esculenta Crantz)
Rabbi et al. BMC Genetics (2015) 16:115
DOI 10.1186/s12863-015-0273-1
RESEARCH ARTICLE
Open Access
Ismail Y. Rabbi1*, Peter A. Kulakow1, Joseph A. Manu-Aduening2, Ansong A. Dankyi3, James Y. Asibuo2,
Elizabeth Y. Parkes1, Tahirou Abdoulaye1, Gezahegn Girma1, Melaku A. Gedil1, Punna Ramu4, Byron Reyes5
and Mywish K. Maredia6
Abstract
Background: Accurate identification of crop cultivars is crucial in assessing the impact of crop improvement research
outputs. Two commonly used identification approaches, elicitation of variety names from farmer interviews and
morphological plant descriptors, have inherent uncertainty levels. Genotyping-by-sequencing (GBS) was used in a case
study as an alternative method to track released varieties in farmers’ fields, using cassava, a clonally propagated root
crop widely grown in the tropics, and often disseminated through extension services and informal seed systems. A
total of 917 accessions collected from 495 farming households across Ghana were genotyped at 56,489 SNP loci along
with a “reference library” of 64 accessions of released varieties and popular landraces.
Results: Accurate cultivar identification and ancestry estimation were accomplished through two complementary
clustering methods: (i) distance-based hierarchical clustering; and (ii) model-based maximum likelihood admixture
analysis. Subsequently, 30 % of the identified accessions from farmers’ fields were matched to specific released varieties
represented in the reference library. ADMIXTURE analysis revealed that the optimum number of major varieties was 11,
which matched the hierarchical clustering results. The majority of the accessions (69 %) belonged purely to one of the 11
groups, while the remaining accessions showed two or more ancestries. Further analysis using subsets of SNP markers
reproduced the results obtained from the full set of markers, suggesting that GBS can be done at higher DNA multiplexing,
thereby reducing the costs of variety fingerprinting. A large discrepancy was observed between genetically unique
cultivars as identified by markers and variety names as elicited from farmers. Clustering results from
ADMIXTURE analysis were validated using the assumption-free Discriminant Analysis of Principal Components (DAPC)
method.
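As a rough illustration of the distance-based matching and marker-subsetting steps summarized above, the Python sketch below compares a farmer's accession with a reference library using a simple identity-by-state (IBS) distance on allele dosages (0, 1, 2) and reports the nearest reference variety when it lies within a distance cut-off. The function names, the 0.05 cut-off and the simulated genotypes are hypothetical choices for illustration only; the study itself identified cultivars by hierarchical clustering and ADMIXTURE analysis of all accessions together.

```python
import random
from typing import Dict, List, Optional, Sequence

# Allele dosage per SNP locus (0, 1 or 2); None marks a missing call.
Genotype = Sequence[Optional[int]]


def ibs_distance(a: Genotype, b: Genotype) -> float:
    """Mean allele mismatch over loci called in both samples (0 = identical, 1 = opposite homozygotes)."""
    diff = 0
    called = 0
    for ga, gb in zip(a, b):
        if ga is None or gb is None:
            continue  # skip loci missing in either sample
        diff += abs(ga - gb)
        called += 1
    if called == 0:
        raise ValueError("no loci called in both samples")
    return diff / (2 * called)


def match_to_reference(
    accession: Genotype,
    reference: Dict[str, Genotype],
    max_distance: float = 0.05,  # hypothetical cut-off, not a value from the study
) -> Optional[str]:
    """Name of the closest reference variety, or None if none lies within max_distance."""
    best_name, best_dist = None, float("inf")
    for name, ref in reference.items():
        d = ibs_distance(accession, ref)
        if d < best_dist:
            best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None


def subset_loci(genotype: Genotype, keep: List[int]) -> List[Optional[int]]:
    """Keep only the marker columns listed in keep."""
    return [genotype[i] for i in keep]


# Simulated (hypothetical) data: two reference varieties and one farmer sample
rng = random.Random(42)
n_loci = 1000
reference = {
    "variety_A": [rng.choice((0, 1, 2)) for _ in range(n_loci)],
    "variety_B": [rng.choice((0, 1, 2)) for _ in range(n_loci)],
}
# The farmer sample is variety_A with roughly 1 % of calls re-drawn at random (genotyping error)
sample = [rng.choice((0, 1, 2)) if rng.random() < 0.01 else g for g in reference["variety_A"]]

print(match_to_reference(sample, reference))  # variety_A

# Repeat the match using a random 20 % subset of the markers
keep = sorted(rng.sample(range(n_loci), 200))
reduced_ref = {name: subset_loci(g, keep) for name, g in reference.items()}
print(match_to_reference(subset_loci(sample, keep), reduced_ref))  # variety_A
```

A fixed nearest-neighbour cut-off is only one simple way to call a match; in practice any threshold would need to be calibrated, for example against replicate samples of known varieties.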
Conclusion: We show that genome-wide SNP markers from increasingly affordable GBS methods, combined with
complementary cluster analyses, constitute a powerful tool for fine-scale population structure analysis and variety identification.
Moreover, the ancestry estimation provides a framework for quantifying the contribution of exotic germplasm or older
improved varieties to the genetic background of contemporary improved cultivars.
Keywords: Cassava, Variety identification, Impact assessment, Genotyping-by-sequencing, Ancestry estimations
* Correspondence:
1 International Institute of Tropical Agriculture (IITA), PMB 5320 Ibadan, Nigeria
Full list of author information is available at the end of the article
© 2015 Rabbi et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
Agricultural productivity in developing countries is constrained by limited access to improved varieties, in
addition to biotic and abiotic constraints and sub-optimal agronomic practices [1, 2]. Successful dissemination and
adoption of improved varieties from both private and public breeding programs is expected to contribute positively
to farm-level productivity and income generation. Household-level impact assessment studies, particularly the
collection of variety-specific adoption data, are needed to determine whether this is happening [3, 4].
Traditionally, estimation of improved variety adoption in socio-economic impact studies relies mostly on the expert
opinion of breeders, extension services and other experts; elicited responses from farmers in farmer-level surveys;
and morphological descriptors. However, each of these methods carries inherent uncertainty. For example, in the
absence of formal seed systems, variety naming can vary considerably over time and space, leading to inconsistencies
in the names given to a particular variety. Also, environmental conditions and plant developmental stage influence
morphological descriptors [5, 6]. Finally, the number of informative descriptors can be quite limited because
varieties are bred to conform to similar desired ideotypes, which greatly reduces the power to distinguish
consanguineous (closely related) varieties [7].
These challenges can be overcome by using molecular markers, which are unaffected by environmental factors and crop
developmental stage and are ubiquitous throughout plant genomes. Genome-wide markers such as single nucleotide
polymorphisms (SNPs) not only facilitate germplasm classification using genetic distance estimates but can also be
used to quantify the relative proportions of the ancestries that currently grown cultivars derive from various
founder genotypes [8]. Such ancestry inferences are useful for understanding and/or reconstructing the evolution of
successful varieties, whether landraces or products of formal breeding programs, that lack pedigree records or were
derived from open-pollinated breeding methods [9]. In the context of impact assessment of a specific breeding
program, ancestry inferences can help estimate the benefits resulting from the use of its improved germplasm by
other programs [10]. This is because improved germplasm often moves easily through the network of plant breeding
systems, resulting in research spill-over benefits.
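To make the idea of quantifying ancestry proportions concrete, the Python sketch below estimates one cultivar's ancestry proportions q by expectation-maximization (EM), assuming the allele frequencies of K founder groups are already known (a simplified, supervised setting). It uses the binomial admixture likelihood in which each allele copy at locus j carries the counted allele with probability sum over k of q_k * f_kj. It is not the ADMIXTURE program, which estimates ancestry proportions and allele frequencies jointly with a faster optimizer. The function name, parameters and simulated frequencies are hypothetical.

```python
import random
from typing import List, Sequence


def estimate_ancestry(
    genotypes: Sequence[int],
    freqs: Sequence[Sequence[float]],
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """EM estimate of one individual's ancestry proportions, one per founder group.

    genotypes[j] counts copies (0, 1 or 2) of the counted allele at locus j.
    freqs[k][j] is that allele's frequency in founder group k; it is assumed
    known and strictly between 0 and 1.
    """
    n_groups = len(freqs)
    n_loci = len(genotypes)
    q = [1.0 / n_groups] * n_groups  # start from equal ancestry
    for _ in range(max_iter):
        copies = [0.0] * n_groups
        for j, g in enumerate(genotypes):
            # Probability of the counted / other allele under the current q
            p_counted = sum(q[k] * freqs[k][j] for k in range(n_groups))
            p_other = 1.0 - p_counted
            for k in range(n_groups):
                # E-step: expected allele copies at locus j inherited from group k
                copies[k] += g * q[k] * freqs[k][j] / p_counted
                copies[k] += (2 - g) * q[k] * (1.0 - freqs[k][j]) / p_other
        # M-step: every locus contributes two allele copies in total
        new_q = [c / (2 * n_loci) for c in copies]
        if max(abs(a - b) for a, b in zip(new_q, q)) < tol:
            return new_q
        q = new_q
    return q


# Simulated (hypothetical) cultivar with 70 % ancestry from group 1
rng = random.Random(7)
n_loci = 5000
f1 = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]
f2 = [1.0 - f for f in f1]  # strongly differentiated second group, for illustration
true_q = 0.7
geno = []
for a, b in zip(f1, f2):
    p = true_q * a + (1.0 - true_q) * b
    geno.append(int(rng.random() < p) + int(rng.random() < p))

q_hat = estimate_ancestry(geno, [f1, f2])
print([round(x, 2) for x in q_hat])  # roughly [0.7, 0.3]
```

Here q_hat[0] is the estimated share of the cultivar's genome traced to the first founder group; the same update applies unchanged for K > 2, although real analyses would estimate founder allele frequencies from the data rather than assume them.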
Simple sequence repeats and anonymous markers such as amplified fragment length polymorphisms and random amplified
polymorphic DNA have traditionally been used in DNA-based fingerprinting applications [11]. However, due to
inadequacies of these markers, including limited multiplexing ability, high genotyping costs and low frequency in
the genome, they
are increasingly being displaced by SNP markers generated from next-generation sequencing using reduced
representation library (RRL) methods. These recent
methods rely on restriction enzymes to target a specific
and reproducible subset of the genome for sequencing,
thus allowing for simultaneous discovery and scoring of
large numbers of markers. Genotyp (...truncated)