Tracking crop varieties using genotyping-by-sequencing markers: a case study using cassava (Manihot esculenta Crantz)

Rabbi et al. BMC Genetics (2015) 16:115
DOI 10.1186/s12863-015-0273-1
RESEARCH ARTICLE Open Access

Ismail Y. Rabbi¹*, Peter A. Kulakow¹, Joseph A. Manu-Aduening², Ansong A. Dankyi³, James Y. Asibuo², Elizabeth Y. Parkes¹, Tahirou Abdoulaye¹, Gezahegn Girma¹, Melaku A. Gedil¹, Punna Ramu⁴, Byron Reyes⁵ and Mywish K. Maredia⁶

Abstract

Background: Accurate identification of crop cultivars is crucial for assessing the impact of crop improvement research. Two commonly used identification approaches, eliciting variety names from farmers in interviews and using morphological plant descriptors, carry inherent uncertainty. In this case study, genotyping-by-sequencing (GBS) was used as an alternative method to track released varieties in farmers' fields, using cassava, a clonally propagated root crop that is widely grown in the tropics and often disseminated through extension services and informal seed systems. A total of 917 accessions collected from 495 farming households across Ghana were genotyped at 56,489 SNP loci, along with a "reference library" of 64 accessions of released varieties and popular landraces.

Results: Accurate cultivar identification and ancestry estimation were accomplished through two complementary clustering methods: (i) distance-based hierarchical clustering and (ii) model-based maximum-likelihood admixture analysis. Subsequently, 30 % of the identified accessions from farmers' fields were matched to specific released varieties represented in the reference library. ADMIXTURE analysis revealed that the optimum number of major varieties was 11, which matched the hierarchical clustering results. The majority of the accessions (69 %) belonged purely to one of the 11 groups, while the remaining accessions showed two or more ancestries. Further analysis using subsets of the SNP markers reproduced the results obtained from the full marker set, suggesting that GBS can be done at higher DNA multiplexing, thereby reducing the cost of variety fingerprinting. A large degree of discrepancy was observed between the genetically unique cultivars identified by markers and the variety names elicited from farmers. Clustering results from the ADMIXTURE analysis were validated using the assumption-free Discriminant Analysis of Principal Components (DAPC) method.

Conclusion: We show that genome-wide SNP markers from increasingly affordable GBS methods, coupled with complementary cluster analyses, are a powerful tool for fine-scale population structure analysis and variety identification. Moreover, ancestry estimation provides a framework for quantifying the contribution of exotic germplasm or older improved varieties to the genetic background of contemporary improved cultivars.

Keywords: Cassava, Variety identification, Impact assessment, Genotyping-by-sequencing, Ancestry estimations

* Correspondence
¹ International Institute of Tropical Agriculture (IITA), PMB 5320 Ibadan, Nigeria
Full list of author information is available at the end of the article

© 2015 Rabbi et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
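Because cassava is clonally propagated, plants of the same variety should share near-identical multilocus SNP genotypes, so matching a farmer's accession to the reference library amounts to finding a reference accession at very small genetic distance. The following is a minimal sketch of that idea using identity-by-state (IBS) distance and a nearest-reference rule. It is a simplification, not the authors' pipeline (the study used hierarchical clustering and ADMIXTURE); the 0/1/2 dosage coding, the function names and the 0.05 distance cut-off are illustrative assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Allele dosage per SNP (0, 1 or 2 copies of one allele); None marks a missing call.
# This coding is an assumption for illustration.
Genotype = List[Optional[int]]


def ibs_distance(a: Genotype, b: Genotype) -> float:
    """Mean IBS distance over SNPs called in both samples.

    0.0 means identical genotypes; 1.0 means opposite homozygotes at every shared SNP.
    """
    if len(a) != len(b):
        raise ValueError("genotypes must cover the same SNPs")
    diffs = [abs(x - y) / 2 for x, y in zip(a, b) if x is not None and y is not None]
    if not diffs:
        raise ValueError("no SNP is called in both samples")
    return sum(diffs) / len(diffs)


def match_to_reference(
    sample: Genotype,
    reference: Dict[str, Genotype],
    max_distance: float = 0.05,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Optional[str], float]:
    """Return (closest reference variety, distance); the name is None if none is within max_distance."""
    if not reference:
        raise ValueError("reference library is empty")
    best_name, best_dist = min(
        ((name, ibs_distance(sample, ref)) for name, ref in reference.items()),
        key=lambda item: item[1],
    )
    return (best_name if best_dist <= max_distance else None), best_dist


if __name__ == "__main__":
    # Toy reference library: two hypothetical varieties scored at 8 SNPs.
    library = {
        "VarA": [0, 1, 2, 2, 0, 1, 0, 2],
        "VarB": [2, 1, 0, 0, 2, 1, 2, 0],
    }
    farmer_sample = [0, 1, 2, 2, 0, None, 0, 2]  # one missing call
    print(match_to_reference(farmer_sample, library))  # ('VarA', 0.0)
```

Because the distance is averaged over the SNPs called in both samples, the same functions can be re-run on a random subset of loci to check whether a smaller marker panel yields the same assignments, which is the kind of check the abstract reports for marker subsets.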
Background

Agricultural productivity in developing countries is affected by limited access to improved varieties, in addition to biotic and abiotic constraints and sub-optimal agronomic practices [1, 2]. Successful dissemination and adoption of improved varieties from both private and public breeding programs is expected to contribute positively to farm-level productivity and income generation. Determining whether this is happening is the role of household-level impact assessment studies, particularly the collection of variety-specific adoption data [3, 4].

Traditionally, estimates of improved-variety adoption in socio-economic impact studies rely mostly on: the expert opinion of breeders, extension services and other experts; responses elicited from farmers in farm-level surveys; and morphological descriptors. However, each of these methods carries inherent uncertainty. For example, in the absence of formal seed systems, variety naming can vary considerably over time and space, leading to inconsistencies in the names given to a particular variety. Also, environmental conditions and the stage of plant development influence morphological descriptors [5, 6]. Finally, the number of informative descriptors can be quite limited because varieties are developed to conform to desired ideotypes, which greatly reduces the power to distinguish consanguineous varieties [7]. These challenges can be overcome by using molecular markers, which are unaffected by environmental factors and crop developmental stage and are also ubiquitous throughout plant genomes. Genome-wide markers, such as single nucleotide polymorphisms (SNPs), not only facilitate germplasm classification using genetic distance estimates but can also be used to quantify the relative proportions of ancestry that currently grown cultivars derive from various founder genotypes [8].
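Ancestry proportions of this kind are usually estimated under the admixture model: each allele copy a plant carries is assigned to founder group k with probability q_k and then carries the reference allele with that group's frequency f_kj at SNP j. The sketch below is a minimal expectation-maximization (EM) estimate of q for a single plant when the founder allele frequencies are already known and held fixed, a supervised simplification; ADMIXTURE, used in this study, instead estimates proportions and frequencies jointly by maximum likelihood with a faster optimizer. The function name, the 0/1/2 genotype coding and the clipping of frequencies away from 0 and 1 are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def estimate_ancestry(
    genotype: Sequence[Optional[int]],
    founder_freqs: Sequence[Sequence[float]],
    n_iter: int = 500,
    tol: float = 1e-10,
    eps: float = 1e-3,
) -> List[float]:
    """EM estimate of one plant's ancestry proportions q (non-negative, summing to 1).

    genotype[j]: copies (0, 1, 2) of the reference allele at SNP j, or None if missing.
    founder_freqs[k][j]: reference-allele frequency at SNP j in founder group k (held fixed).
    """
    n_groups = len(founder_freqs)
    # Clip frequencies so no allele is impossible under any group (avoids division by zero).
    f = [[min(max(x, eps), 1.0 - eps) for x in row] for row in founder_freqs]
    called = [j for j, g in enumerate(genotype) if g is not None]
    if n_groups == 0 or not called:
        raise ValueError("need at least one founder group and one called SNP")

    q = [1.0 / n_groups] * n_groups  # start from equal ancestry
    for _ in range(n_iter):
        # E-step: expected number of allele copies attributable to each group.
        expected = [0.0] * n_groups
        for j in called:
            g = genotype[j]
            p_ref = sum(q[k] * f[k][j] for k in range(n_groups))
            p_alt = sum(q[k] * (1.0 - f[k][j]) for k in range(n_groups))
            for k in range(n_groups):
                expected[k] += g * q[k] * f[k][j] / p_ref
                expected[k] += (2 - g) * q[k] * (1.0 - f[k][j]) / p_alt
        # M-step: each called SNP contributes two allele copies in total.
        new_q = [e / (2 * len(called)) for e in expected]
        shift = max(abs(a - b) for a, b in zip(new_q, q))
        q = new_q
        if shift < tol:
            break
    return q


if __name__ == "__main__":
    # Two hypothetical founder groups scored at four SNPs.
    founders = [[0.9, 0.9, 0.1, 0.1], [0.1, 0.1, 0.9, 0.9]]
    print(estimate_ancestry([1, 1, 1, 1], founders))  # roughly equal ancestry
    print(estimate_ancestry([2, 2, 0, 0], founders))  # first proportion close to 1
```

In this toy setting, a plant whose genotype fits one founder group converges to q near 1 for that group, while intermediate values indicate mixed ancestry.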
Such inferences of ancestry are useful for understanding and/or reconstructing the evolution of successful varieties, whether landraces or products of formal breeding programs, that lack pedigree records or that were derived through open-pollinated breeding methods [9]. In the context of assessing the impact of a specific breeding program, ancestry inferences can be useful for estimating the benefits resulting from the use of its improved germplasm by other programs [10]. This is because improved germplasm often moves easily through the network of plant breeding systems, resulting in research spill-over benefits. In the past, simple sequence repeats and anonymous markers such as amplified fragment length polymorphisms and random amplified polymorphic DNA have been used in DNA-based fingerprinting applications [11]. However, owing to inadequacies of these markers, including limited multiplexing ability, high genotyping costs and low frequency in the genome, they are increasingly being displaced by SNP markers generated by next-generation sequencing using reduced representation library (RRL) methods. These recent methods rely on restriction enzymes to target a specific and reproducible subset of the genome for sequencing, thus allowing the simultaneous discovery and scoring of large numbers of markers. Genotyp (...truncated)