Bayesian shrinkage mapping of quantitative trait loci in variance component models
BMC Genetics
MBeathyodeosloigay narticslehrinkage mapping of quantitative trait loci in variance component models
Ming Fang 0 1
0 Life Science College, Heilongjiang August First Land Reclamation University , Daqing, 163319 , China
1 Department of Animal Genetics and breeding, College of Animal Science and Technology, China Agricultural University , Beijing 100193 , China
Background: In this article, I propose a model-selection-free method to map multiple quantitative trait loci (QTL) in variance component model, which is useful in outbred populations. The new method can estimate the variance of zero-effect QTL infinitely to zero, but nearly unbiased for non-zero-effect QTL. It is analogous to Xu's Bayesian shrinkage estimation method, but his method is based on allelic substitution model, while the new method is based on the variance component models. Results: Extensive simulation experiments were conducted to investigate the performance of the proposed method. The results showed that the proposed method was efficient in mapping multiple QTL simultaneously, and moreover it was more competitive than the reversible jump MCMC (RJMCMC) method and may even out-perform it. Conclusions: The newly developed Bayesian shrinkage method is very efficient and powerful for mapping multiple QTL in outbred populations.
-
Background
There are two kinds of models which can be used to map
QTL in outbred populations, the allelic substitution
model [1-3] and the variance component model [4-7]. In
the allelic substitution model, the number of QTL alleles
is assumed to be known, and the QTL allelic substitution
is estimated by the given linkage phases of parents, which
can be inferred from genotypes of family members. The
least square [2] and maximum likelihood [1,3,8] of
interval mapping are two popular statistical approaches for
such models. Compared with the allelic substitute model,
the variance component model is more robust because it
can handle an arbitrary number of alleles with arbitrary
modes of gene actions[9]. Moreover, the linkage phase of
parents is unnecessary, which is nice since it is hard to
accurately infer, particularly when family size is small,
such as with human populations. Therefore, the variance
component model is usually used to map QTL in outbred
populations [4-7,9-12]. In the variance component
model, the identity-based-decent (IBD) matrix may be
different for each locus and can provide information to
localize the QTL. The least square method [10,11] and
the maximum likelihood method [4,13] are also two
important statistical methods for handling this model.
Because of the polygenic nature of quantitative traits,
multiple QTL mapping is a problem of model selection.
The least square method and the maximum likelihood
method can nicely handle single QTL model, but is
difficult for them to handle multiple QTL model. Recently,
the Bayesian reversible jump MCMC (RJMCMC) method
has been used to map multiple QTL in the variance
component model [9,12]. However, it still has some
disadvantages. Because the model dimension is variable, it usually
has poor mixing character and is difficult to converge
[14-16]; moreover, it is also difficult to explore all the
model space, especially in genome-wide mapping where
thousands of possible locus are scanned [16].
Therefore, in this article I proposed a
model-dimension-fixed method, in which the estimate of variance is
very precise for nonzero-effect QTL, and gradually
converges to zero for zero-effect QTL. Therefore, special
model selection is needless. It is similar to the recent
Bayesian shrinkage estimation methods [15,17-19], which
are based on the allelic substitution model, whereas my
method is based on the variance component model. The
efficiency of the new method is demonstrated by a series
of simulation experiments.
Method
Genetic model
Suppose that one has a sample of n individuals from
outbred populations. Assuming that QTL dominant effect
and polygenic dominant effect are absent. Then the linear
model can be expressed as
where, y is the n 1 phenotypic vector; is the k 1
vector of covariate effects; k is the number of the
covariate; X is the n k design matrix related to the covariate
QTL effect, for j = 1,2, ..., q, where j is the IBD matrix
and can be inferred by the conditional expectation
approach [20]; and s 2j is the QTL variance; e ~ N (0,
Ins e2 ) is the vector of random error, where In, is the n n
identity matrix and s e2 is the residual variance; q is the
maximum QTL number, which is set beforehand; g ~
N(0, As A2 ) is the n 1 vector of random polygenic effect,
here A is the additive relationship matrix and s A2 is the
polygenic additive variance, the polygenic term g may be
excluded from equation (1) in genome-wide mapping.
The variance component model can be expressed as
Similarly, the term of polygenic variance As A2 should
also be excluded from equation (2) in genome-wide
mapping.
Prior specification and joint posterior distribution
Yi and Xu [9] assigned a uniform prior distribution for
Jeffreys' hyper prior p(s 2j) ~ 1 / s 2j is assumed. The
special prior is the key in the new method and will be
illustrated in detail later. The prior for polygenic variance and
residual variance is assumed to follow scaled inverted
chi-square distribution with degree of freedom and
scaled parameter s2 (see also [21] for detail); and the prior
for covariant effect and QTL position j are assumed to
follow normal distribution, ~ N (0, V0), and uniform
distribution, respectively. The joint posterior distribution
is given in Appendix.
Updating QTL variance by random walk
MetropolisHastings algorithm
Because there is no close form for the posterior
distribualgorithm [22,23] is used to simulate it. I firstly propose a
new QTL variance and then accept it according to its
acceptance probability.
Generating the new proposal QTL variance
I employ the Browne's method [24], a special random
walk Metropolis-Hastings algorithm (RWM-H) to update
posed and sampled from a scaled inverted chi-squared
distribution, conditional on the current value of QTL
equals the expectation of the current value s 2j(t) , i.e.
variance is accepted according to its accept probability.
Since the new generated value closely relies on the old
one, this approach is a special case of RWM-H, and the
degree of freedom is equivalent to the tuning parameter
[21,24].
Calculating the acceptance probability
The new proposal QTL variance is accepted with
probability equal to min (1, r), where,
r =
hr,
and s 2j represents all elements of except s 2j . In
equation (3), the first term is likelihood, the second is
prior and the third is called proposal ratio or Hastings
ratio [23]. Because the proposal distribution is not
symhr =
p(s 2j(t) s 2j())
p(s 2j() s 2j(t))
Inv 2 n ,(n 2)s 2j() /n
Inv 2 n ,(n 2)s 2j(t) /n
MCMC implementations
The implementations of the MCMC algorithm are
summarized as follows:
a. Initialize all parameters from lega (...truncated)