Construction of Core Collections Suitable for Association Mapping to Optimize Use of Mediterranean Olive (Olea europaea L.) Genetic Resources
et al. (2013) Construction of Core Collections Suitable for Association Mapping to Optimize
Use of Mediterranean Olive (Olea europaea L.) Genetic Resources. PLoS ONE 8(5): e61265. doi:10.1371/journal.pone.0061265
Ahmed El Bakkali
Hicham Haouane
Abdelmajid Moukhli
Evelyne Costes
Patrick Van Damme
Bouchaib Khadari
Randall P. Niedz, United States Department of Agriculture, United States of America
1 INRA, UMR Amélioration Génétique et Adaptation des Plantes (AGAP), Montpellier, France, 2 Montpellier SupAgro, UMR AGAP, Montpellier, France, 3 INRA Meknès, UR Amélioration des Plantes et Conservation des Ressources Phytogénétiques, Meknès, Morocco, 4 Department of Plant Production, Ghent University, Ghent, Belgium, 5 INRA Marrakech, UR Amélioration des Plantes, Marrakech, Morocco, 6 Institute of Tropics and Subtropics, Czech University of Life Sciences Prague, Prague, Czech Republic, 7 Conservatoire Botanique National Méditerranéen, UMR AGAP, Montpellier, France
Phenotypic characterisation of germplasm collections is a decisive step towards association mapping analyses, but it is particularly expensive and tedious for woody perennial plant species. Characterisation could be more efficient if focused on a reasonably sized subset of accessions, or so-called core collection (CC), reflecting the geographic origin and variability of the germplasm. The questions that arise concern the sample size to use and the genetic parameters that should be optimized in a core collection to make it suitable for association mapping. Here we investigated these questions in olive (Olea europaea L.), a perennial fruit species. By testing different sampling methods and sizes in a worldwide olive germplasm bank (OWGB Marrakech, Morocco) containing 502 unique genotypes characterized by nuclear and plastid loci, we propose a two-step sampling method. The Shannon-Weaver diversity index was found to be the best criterion to maximize in the first step using the CORE HUNTER program. A primary core collection of 50 entries (CC50) was defined that captured more than 80% of the diversity. This primary collection was subsequently used as a kernel with the MSTRAT program to capture the remaining diversity. Two hundred core collections of 94 entries (CC94) were thus built for flexibility in the choice of varieties to be studied. Most entries of both core collections (CC50 and CC94) were found to be unrelated, as indicated by low kinship coefficients, whereas a genetic structure spanning the eastern and western/central Mediterranean regions was noted. Linkage disequilibrium was observed in CC94 and was mainly explained by a genetic structure effect, as noted for OWGB Marrakech. Since they reflect the geographic origin and diversity of olive germplasm and are of reasonable size, both core collections will be of major interest for developing long-term association studies and thus enhancing genomic selection in olive.
Recent advances in genomic tools, including genome
sequencing [1] and high-density single nucleotide polymorphism (SNP)
genotyping [2], and statistical methods have enabled the
development of new approaches for mapping of complex traits.
The identification of causal genes underlying specific traits is a
major goal in plant breeding, subsequently offering opportunities
to develop genomic selection tools [3–4]. Association mapping
(also known as linkage disequilibrium (LD)-based association
mapping) [5] has been proposed to associate single DNA sequence
changes with traits of interest using collections of unrelated
individuals, as an alternative or complement to quantitative trait
locus (QTL)-mapping (also known as family-based linkage
mapping) [6]. Association mapping has been largely documented
and successfully used to identify the genetic basis of many complex
diseases in humans [7], and is now emerging in plants [8–9]. It is
rapid and cost-effective because many alleles can be assessed
simultaneously, and it achieves higher mapping resolution by
exploiting the recombination events that have accumulated over time,
while avoiding the expensive and tedious development of
crossing populations, particularly for perennial and forest tree
species [10]. The number of markers needed to map specific
associations depends on the extent and distribution of LD within
the species and among linkage groups [5]. Many studies have thus
proposed an estimate of LD in different plant species as a
preliminary step for association analysis [11–14]. Association
mapping results obtained in a number of annual species, e.g.
Arabidopsis thaliana [15–16], Oryza sativa [17–18], Triticum aestivum
[19] and Zea mays [20–21], indicate that the approach is promising
for identifying markers correlated with desirable traits such as
flowering time [15–16,20], seed morphology [19,22] and disease
resistance [15,23–24]. However, for woody and perennial species,
studies have been performed on a limited number of species, such
as Pinus taeda L. [25], Eucalyptus spp. [26] and Prunus persica [27].
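As an illustration of the LD concept invoked above, the following minimal sketch computes the common r² statistic between two loci. It is not the estimation procedure used in this study: it assumes phased haplotypes and biallelic loci coded 0/1, and the function name `ld_r2` is hypothetical.

```python
from collections import Counter


def ld_r2(haplotypes):
    """Return r^2 between two biallelic loci.

    haplotypes: list of (allele_at_locus_A, allele_at_locus_B) tuples,
    with alleles coded 0 or 1 (an assumption of this sketch).
    """
    n = len(haplotypes)
    counts = Counter(haplotypes)
    # Frequency of allele 1 at each locus.
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    # Frequency of the (1, 1) haplotype.
    p_ab = counts[(1, 1)] / n
    # D measures the departure from independent association of alleles.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a locus is monomorphic")
    return d * d / denom


# Alleles always co-occur: complete LD.
complete = ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)])
# All four haplotypes equally frequent: no LD.
none = ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)])
```

When r² decays quickly with distance, more markers are needed to tag a causal variant, which is why an LD estimate is often a preliminary step before association analysis.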
Beyond the importance of ex situ conservation of genetic
resources to avoid genetic erosion and provide plant breeders
with easy access to study ranges of variation in phenotypic traits,
germplasm collections could serve as a reservoir of outstanding
genes to enhance agronomic traits so as to meet the needs of
diverse agricultural systems. However, field evaluation and use of
large germplasm collections for association mapping purposes are
mostly constrained by problems of accession redundancy,
economic cost and time, especially for clonally propagated
perennial species where clones have to be maintained and
evaluated for several years at different sites. Genetic resource
assessments could thus be more rational if focused on a subset of
accessions, a so-called core collection (CC; also known as a core
subset), which captures as much of the variability present in the
whole collection as possible in a sample of minimal size [28].
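The idea of capturing maximal variability in a minimal sample can be sketched with a simple greedy selection on allelic richness. This is only an illustrative heuristic, not the algorithm implemented in MSTRAT or CORE HUNTER; the function name `greedy_core`, the accession names and the allele labels are all hypothetical.

```python
def greedy_core(accessions, size):
    """Pick up to `size` accessions that together cover the most alleles.

    accessions: dict mapping accession name -> set of allele labels
    (e.g. "locus1:a"). Ties are broken alphabetically by accession name.
    Returns (chosen accession names, set of alleles covered).
    """
    chosen, covered = [], set()
    remaining = dict(accessions)
    while remaining and len(chosen) < size:
        # Select the accession contributing the most alleles not yet covered.
        best = max(sorted(remaining), key=lambda name: len(remaining[name] - covered))
        chosen.append(best)
        covered |= remaining.pop(best)
    return chosen, covered


# Toy collection: D is redundant with A and is never needed.
collection = {
    "A": {"locus1:a", "locus2:x"},
    "B": {"locus1:a", "locus2:y"},
    "C": {"locus1:b", "locus2:x"},
    "D": {"locus1:a", "locus2:x"},
}
core, alleles = greedy_core(collection, 3)
```

In this toy case three of the four accessions capture all four alleles, which shows how redundant accessions can be left out of the core without losing allelic diversity.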
Determining the best sample size to use and genetic criteria to be
optimized for association mapping in one core collection is an
open issue requiring further investigation, especially for perennial
species. Over the last decade, several core subsets have been
proposed for both annual species, e.g. Arabidopsis thaliana [29],
Oryza sativa [30], Triticum aestivum [31] and Zea mays [32], and
perennial species, e.g. Annona cherimola [33], Malus domestica [34],
Prunus armeniaca [35] and Vitis vinifera [36], using different
ecogeographical, agro-morphological, biochemical or molecular data.
Despite the many approaches used to design core collections that
optimize the genetic distance between accessions and/or the allelic
diversity [37–44], most core collections have been constructed
based on the so-called maximizing method (M-method) [37]
through the MSTRAT program [40] by (...truncated)