In silico genotyping of the maize nested association mapping population
Baohong Guo · William D. Beavis
B. Guo · W. D. Beavis (corresponding author): Department of Agronomy, Iowa State University, 1208 Agronomy Hall, Ames, IA 50011, USA
Present address: B. Guo, Syngenta Seeds, Inc., Slater, IA 50244, USA
Nested Association Mapping (NAM) has been proposed as a means to combine the power of linkage mapping with the resolution of association mapping. It is enabled through sequencing or array genotyping of parental inbred lines while using low-cost, low-density genotyping technologies for their segregating progenies. For purposes of data analyses of NAM populations, parental genotypes at a large number of Single Nucleotide Polymorphism (SNP) loci need to be projected onto their segregating progeny. Herein we demonstrate how approximately 0.5 million SNPs that have been genotyped in the 26 parental lines of the publicly available maize NAM population can be projected onto their segregating progeny using only 1,106 SNP loci that have been genotyped in both the parents and their 5,000 progeny. The challenge is to estimate both the genotype and the genetic location of the parental SNPs in the segregating progeny. Both challenges were met by estimating the expected genotypic values of the SNPs conditional on observed flanking markers, using both physical and linkage maps. About 90% of the 500,000 genotyped SNPs from the maize HapMap project were assigned linkage map positions using linear interpolation between the maize Accessioned Gold Path (AGP) and NAM linkage maps. Of these, almost 70% provided high-probability estimates of genotypes in almost 5,000 recombinant inbred lines.
Forward genetic approaches for relating genomic
variability with phenotypic variability can be grouped
as either linkage or association mapping. Because it is
easy to create and maximize linkage disequilibrium
in plant species, the former set of methods was
initially referred to as Quantitative Trait Locus (QTL)
mapping, although it is now clear that association
mapping can also be applied to quantitative traits.
Linkage mapping is powerful but of low resolution:
it typically identifies genomic regions of
about 10 cM, which often span tens of millions
of bases in most plant species. With the advent of
high-throughput technologies for resequencing and
genotyping, association mapping has emerged for
species where it is not easy to create linkage
disequilibrium. This approach exploits historical
linkage and recombination accumulated over a large
number of generations (Andersson and Georges
2004). Thus, it can provide high-resolution
information that can be used to identify the causative
nucleotides underlying phenotypic variability.
Depending upon the amount of linkage disequilibrium
(LD) across the genome in the breeding population,
association mapping can require genotyping with very
high densities of molecular markers (Yu et al. 2008)
and extremely large samples to achieve reasonable
power (Hirschhorn and Daly 2005; Kingsmore et al.
2008).
A third approach is to combine the power of
linkage mapping with the resolution of association
mapping. This third approach can be thought of as an
extension of the multiple family QTL approach
(Jansen et al. 2003; Blanc et al. 2006), but is
distinctive in that parental inbred lines are
resequenced or array genotyped and this information is
coupled with low-cost genotyping of their
segregating progenies. The approach is conceptually
equivalent to the human quantitative transmission
disequilibrium test (QTDT) (Abecasis et al. 2000)
combined with imputation of genotypes of relatives
(Burdick et al. 2006). For the special case where the
mapping population consists of multiple families of
segregating progeny, usually Recombinant Inbred
Lines (RILs), derived from inbred lines crossed to a
single reference inbred line, the method has been
called Nested Association Mapping (NAM) (Yu et al.
2008; Nordborg and Weigel 2008).
For purposes of mapping functional markers in
NAM populations, parental genotypes at a large
number of SNP loci need to be projected to their
segregating progeny. For example, approximately
0.5 million SNPs have been genotyped in the 26
parental lines of the publicly available maize NAM
population whereas only 1,106 SNP loci have been
genotyped in both the parents and their 5,000
progeny. The challenge is to estimate both the
genotype and genetic location of the parental
genotypes in the segregating progeny. Three approaches
might be considered (Yi and Shriner 2007): (1)
estimate all missing genotypes by their expected
values conditional on observed flanking markers
(Haley and Knott 1992), (2) consider genotypes as
unknowns to be predicted using an MCMC update
procedure, and (3) sample genotypes multiple times
from a conditional probability distribution for each
unknown locus (Sen and Churchill 2001). Given the
large number of SNP loci and large number of
families and progeny in NAM populations, the latter
two approaches could be computationally
challenging, depending upon the quality of the physical map.
The first approach, however, may be accurate while
computationally feasible.
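The first approach can be illustrated with a minimal Python sketch of the expectation of an unobserved genotype conditional on its two flanking markers. This is an illustration under stated assumptions, not the procedure used in this study: the function names are hypothetical, the Haldane map function is assumed for converting map distance to recombination fraction, the RILs are assumed to be derived by selfing (so the Haldane-Waddington relation R = 2r/(1 + 2r) applies), and recombination in the two sub-intervals is treated as independent.

```python
import math


def haldane_r(d_cm):
    """Haldane map function (assumed): map distance in cM -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_cm / 100.0))


def ril_recomb_fraction(r):
    """Proportion of recombinant RILs after selfing (assumed), R = 2r / (1 + 2r)."""
    return 2.0 * r / (1.0 + 2.0 * r)


def prob_parent_a(left, right, d_left_cm, d_right_cm):
    """P(locus carries the parent-A allele | parental origin of the flanking markers).

    left, right : 'A' or 'B', parental origin observed at the flanking markers.
    d_left_cm   : map distance from the left marker to the locus (cM).
    d_right_cm  : map distance from the locus to the right marker (cM).
    """
    r1 = ril_recomb_fraction(haldane_r(d_left_cm))
    r2 = ril_recomb_fraction(haldane_r(d_right_cm))
    # Likelihood of the observed flanks if the locus is A (or B); the prior
    # of 1/2 for each parent cancels in the ratio below.
    p_a = ((1 - r1) if left == "A" else r1) * ((1 - r2) if right == "A" else r2)
    p_b = ((1 - r1) if left == "B" else r1) * ((1 - r2) if right == "B" else r2)
    if p_a + p_b == 0.0:
        raise ValueError("flanking genotypes are impossible at zero map distance")
    return p_a / (p_a + p_b)
```

When both flanking markers come from the same parent, the probability is close to 1 for short intervals; when they differ, it depends on where the locus sits between the two markers, and equals 0.5 at the midpoint.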
Herein, we report on: (1) the development of a method
for imputing genotypes using an expectation
approach, and (2) its application to
the maize NAM population. In human family-based
association mapping (Burdick et al. 2006), parental
SNPs are projected onto progeny in intervals with no
recombinants. Herein, the method is extended to
intervals with known recombination events.
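The simpler case of projection in non-recombinant intervals can be sketched as follows. This is a hypothetical illustration rather than the implementation used here: the function name, argument layout, and two-parent coding ('A', 'B') are assumptions, and undetected double crossovers within an interval are ignored. SNPs in intervals whose flanking markers disagree are left unresolved (None); those are the intervals to which the expectation approach is extended.

```python
from bisect import bisect_right


def project_snps(parent_alleles, snp_positions, marker_positions, marker_origins):
    """Project parental SNP alleles onto one progeny line.

    parent_alleles   : dict mapping parent label -> list of alleles, one per SNP.
    snp_positions    : map position of each SNP.
    marker_positions : sorted map positions of markers genotyped in the progeny.
    marker_origins   : parental origin ('A' or 'B') of the progeny at each marker.

    Returns one allele per SNP, or None where the SNP is not bracketed by two
    markers (including a SNP exactly at the last marker) or where the flanking
    markers have different parental origins.
    """
    projected = []
    for i, pos in enumerate(snp_positions):
        j = bisect_right(marker_positions, pos)  # index of first marker beyond the SNP
        if j == 0 or j == len(marker_positions):
            projected.append(None)  # not bracketed by two markers
            continue
        left, right = marker_origins[j - 1], marker_origins[j]
        if left == right:
            # Non-recombinant interval: copy the allele of that parent.
            projected.append(parent_alleles[left][i])
        else:
            # Recombinant interval: leave for a probabilistic treatment.
            projected.append(None)
    return projected
```

For example, with markers at 0, 10, and 20 cM of origins A, A, and B, a SNP at 5 cM receives the parent-A allele, whereas a SNP at 15 cM remains unresolved.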
Data and methods
The following data sets were obtained from public
information resources: (1) Genotypes of 5,000 RILs
representing 25 segregating families of the maize
NAM mapping population (McMullen et al. 2009).
These data are represented as NAM_SNP_genos_raw_20080703
at http://www.panzea.org/. (2) A
composite linkage map created by McMullen et al.
(2009) using the maize NAM genotypic data
(http://www.panzea.org/). (3) The maize Accessioned Gold
Path (AGP v1) (Wei et al. 2009), consisting of 10
chromosome pseudo-assemblies guided by the
physical map, obtained from the Arizona Genomics
Institute (http://www2.genome.arizona.edu/genomes/maize).
(4) The maize HapMap for the 26 founder
lines of the maize NAM population. These data
comprise nearly half a million SNP genotypes and
can be obtained from http://www.maizegenetics.net/maize-hap-map.
Note that the maize HapMap data
continue to be updated with new releases, so the
version utilized herein will likely be outdated before
publication of this manuscript.
Estimation of linkage map positions
In order to detect the associations between genotypes
and complex quantitative traits, it is necessary to
know the linkage map positions of the polymorphic
loci and to trace inheritance of these using flanking
markers. The linkage map positions are unknown for
the majority of the 0.5 million SNPs that were
genotyped in the parental lines of the maize NAM families.
Their linkage map positions were assigned throug (...truncated)