Qxpak: a versatile mixed model application for genetical genomics and QTL analyses
M. Pérez-Enciso (1, 2) and I. Misztal (3)
(1) Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
(2) Institut Català de Recerca i Estudis Avançats, Pg. Lluís Companys 23, 08010 Barcelona, Spain
(3) Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
Motivation: Current methodology and software for quantitative trait loci (QTL) analyses do not use all available information and are inadequate to deal with the huge number of QTL analyses that will be needed in forthcoming genetical genomics studies.
Results: We show that a mixed model statistical framework provides a very flexible tool for QTL modeling in a variety of populations, be it a cross between inbred lines, a within-population study, or experiments involving a mixture of populations or crosses. The software allows multitrait and multiQTL analyses, inclusion of an infinitesimal genetic value and a batch multitrait option suitable for genetical genomics studies. It also allows massive association studies between single nucleotide polymorphisms and the trait(s) of interest.
Availability: The software (Qxpak), together with a manual and example files, is freely available for research purposes. So far, the compiled program is available for Linux systems; a Windows version will follow soon. See http://www.icrea.es/pag.asp?id=Miguel.Perez
INTRODUCTION
The mapping of quantitative trait loci (QTL) is now feasible
due to the vast number of DNA polymorphisms being
uncovered in all species of interest. Traditionally,
quantitative trait loci analyses have been carried out in well-designed
experiments, like crosses between inbred lines or within-family
designs (Liu, 1998). A specific software program is used for
each design. For instance, QTL Cartographer by Z.B. Zeng and
coworkers (http://statgen.ncsu.edu/qtlcart/index.php) accepts
only data from crosses between inbred lines. Another
popular program, QTL Express (Seaton et al., 2002), has different
modules, each appropriate for a specific design, e.g. within-family
analysis or crosses between inbred lines, but not
both. Generally, the software available is limited in modeling
flexibility, e.g. multitrait models are not usually
implemented or it is not possible to include an infinitesimal genetic
effect. Furthermore, there is currently no public software
that allows analysis of crosses between outbred lines, i.e.
when there is genetic variation between as well as within
lines. Often, the number of QTL fitted or the number of
chromosomes analyzed is limited in available programs.
Similarly, the specificities of sex chromosomes are not dealt
with. For a recent review of QTL analysis challenges and
web links to available software, see Abiola et al.
(2003).
In addition, the recent advent of microarray technology
has spurred a massive search for polymorphisms affecting
mRNA levels in the cell (Brem et al., 2002;
Schadt et al., 2003) in what has been called 'genetical
genomics' (Jansen and Nap, 2001). This poses new challenges
both in terms of computing requirements and in modeling
strategies.
Using different approaches for different designs is not only
cumbersome but also inefficient and theoretically
unsatisfactory. It may result in less power and less insight into the
genetic architecture of the trait. Here, we present a
coherent methodology for QTL analyses that is also suitable for
genetical genomics studies. The method is based on
mixed model theory, which provides a flexible and elegant
modeling tool.
The Qxpak package presented implements multitrait,
multiQTL options, and can be applied to populations of any
complexity, using all marker and pedigree information jointly.
Different models per trait can be fitted and missing data
are handled automatically. QTL effects can be modeled
as fixed, random or mixed. Sex chromosome-linked QTL
can also be analyzed, as dosage compensation and different
chromosome lengths can be accommodated.
Suppose two breeds, A and B, with genetic effects (g)
normally distributed as $g_A \sim N(\mu_A, \sigma_A^2)$ and $g_B \sim N(\mu_B, \sigma_B^2)$,
respectively. Now, assume that a quantitative trait has been
recorded in a population with an arbitrary pedigree
complexity, where individuals can be purebred from either A or B
populations, F1, F2 or any other combination (e.g.
recombinant inbred lines, backcross, advanced intercross and so on). A
general explanatory model is
$$y = Xb + \sum_{k=0}^{N_q} Z g_k + e, \qquad (1)$$
where y is a vector containing the recorded performances,
b contains the fixed effects to be estimated, and $g_k$ contains the
genetic (QTL) effects for the k-th of the $N_q$ QTL affecting the
trait. By convention, we take $g_0$ to stand for the infinitesimal
genetic effects, i.e. the genetic effects not accounted for by
individual QTL. Finally, X and Z are incidence matrices that
relate observations to the parameters in the b and g vectors,
and e is the vector of residuals. Typically, Z is a diagonal matrix
with element 1 at position (i, i) if the i-th individual has a record, and
0 otherwise. If there are several traits or repeated measures for
the same individual and trait, Z is block diagonal.
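To make the roles of X, Z and the genetic effect vectors concrete, the following minimal Python sketch evaluates the right-hand side of the model in Equation (1) for three individuals. It is an illustration only and is not taken from Qxpak: the function names (`incidence_matrix`, `matvec`, `evaluate_model`) and all numeric values are hypothetical, and a single trait with at most one record per individual is assumed, so that Z is diagonal.

```python
def incidence_matrix(has_record):
    """Diagonal Z: 1 at (i, i) if individual i has a record, 0 otherwise."""
    n = len(has_record)
    return [[1.0 if (i == j and has_record[i]) else 0.0 for j in range(n)]
            for i in range(n)]


def matvec(matrix, vector):
    """Plain matrix-vector product on nested lists."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def evaluate_model(X, b, Z, g_list, e):
    """Return Xb + sum_k Z g_k + e; g_list[0] holds the infinitesimal effects g_0."""
    y = matvec(X, b)
    for g_k in g_list:
        y = [y_i + zg_i for y_i, zg_i in zip(y, matvec(Z, g_k))]
    return [y_i + e_i for y_i, e_i in zip(y, e)]


# Hypothetical data: columns of X are an overall mean and a sex effect.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0]]
b = [10.0, 2.0]
Z = incidence_matrix([True, True, False])  # third individual has no record
g0 = [0.5, -0.5, 1.0]   # infinitesimal genetic effects (assumed values)
g1 = [1.0, 0.0, -1.0]   # effects at a single QTL (assumed values)
e = [0.0, 0.0, 0.0]     # residuals set to zero to keep the example deterministic

y = evaluate_model(X, b, Z, [g0, g1], e)
print(y)  # [11.5, 11.5, 10.0]
```

The zero row of Z for the third individual means its genetic effects do not contribute to any record, although the individual can still appear in the g vectors through the pedigree.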
The model in Equation (1) is termed mixed because it
contains fixed effects, such as sex or age, and random effects,
such as the genetic effects, g. Statistical theory for mixed
models is well developed (McCulloch and Searle, 2000) and
theory dictates that we also have to specify the distribution
of the random variables, i.e. their means and variances (see
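Appendix). In the case of the QTL effects, $g_k$, the expected
value of the i-th individual at the k-th locus is
$$E(g_{ik}) = P(g_{ik}^1 \in A, g_{ik}^2 \in A)\,\mu_{AAk} + P(g_{ik}^1 \in B, g_{ik}^2 \in B)\,\mu_{BBk} + [P(g_{ik}^1 \in A, g_{ik}^2 \in B) + P(g_{ik}^1 \in B, g_{ik}^2 \in A)]\,\mu_{ABk}. \qquad (2)$$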
Here $P(g_{ik}^1 \in U, g_{ik}^2 \in W)$ is the probability that the alleles of the
k-th QTL on the paternal and maternal haplotypes are of breed U
and W origin, respectively, and $\mu_{UWk}$ is the mean genetic effect of individuals
having received alleles of U and W origin at locus k. The
variance of $g_k$ is a matrix, $G_k$, that contains the covariances
between the i-th and j-th genetic values at the k-th locus. The
covariance between the i-th and j-th genetic values is
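$$\mathrm{Cov}(g_{ik}, g_{jk}) = \frac{1}{2}\sum_{h=1}^{2}\sum_{h'=1}^{2} P(g_{ik}^{h} \equiv g_{jk}^{h'} \mid g_{ik}^{h} \in A)\,\sigma_{Ak}^2 + \frac{1}{2}\sum_{h=1}^{2}\sum_{h'=1}^{2} P(g_{ik}^{h} \equiv g_{jk}^{h'} \mid g_{ik}^{h} \in B)\,\sigma_{Bk}^2, \qquad (3)$$
where $P(g_{ik}^{h} \equiv g_{jk}^{h'} \mid g_{ik}^{h} \in U)$ is the probability of alleles $g_{ik}^{h}$
and $g_{jk}^{h'}$ being identical by descent (IBD) and from origin
U, superscript h stands for the paternal or maternal phase,
numbered 1 or 2, respectively, and $\sigma_{Uk}^2$ is the variance of
genetic effects of U origin at locus k.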
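As an illustration of how the quantities in Equations (2) and (3) combine once the origin and IBD probabilities are known, the sketch below computes both for a single locus. It is not the Qxpak implementation. The function names, the dictionary layout of the inputs and all numeric values are hypothetical, and the probabilities, which in practice would be derived from marker and pedigree information, are simply supplied by hand. The covariance function also assumes that each origin's double sum is weighted by one half.

```python
def expected_qtl_value(p_origin, mu):
    """Equation (2) at one locus.

    p_origin[(U, W)]: probability that the paternal allele is of origin U
    and the maternal allele of origin W.
    mu["AA"], mu["BB"], mu["AB"]: mean genetic effects per origin combination.
    """
    return (p_origin[("A", "A")] * mu["AA"]
            + p_origin[("B", "B")] * mu["BB"]
            + (p_origin[("A", "B")] + p_origin[("B", "A")]) * mu["AB"])


def qtl_covariance(p_ibd, var):
    """Equation (3) at one locus, assuming a weight of 1/2 per origin.

    p_ibd[U][h][h2]: probability that allele h of individual i and allele h2
    of individual j are IBD and of origin U (index 0 = paternal, 1 = maternal).
    var[U]: variance of genetic effects of origin U at this locus.
    """
    total = 0.0
    for origin in ("A", "B"):
        ibd_sum = sum(p_ibd[origin][h][h2] for h in range(2) for h2 in range(2))
        total += 0.5 * ibd_sum * var[origin]
    return total


# Hypothetical F2 individual with no marker information: each of the four
# origin combinations has probability 1/4.
p_f2 = {("A", "A"): 0.25, ("B", "B"): 0.25, ("A", "B"): 0.25, ("B", "A"): 0.25}
mu = {"AA": 2.0, "BB": -2.0, "AB": 0.5}
expected = expected_qtl_value(p_f2, mu)

# Covariance of a non-inbred individual with itself (i = j) whose paternal
# allele is of origin A and whose maternal allele is of origin B.
p_self = {"A": [[1.0, 0.0], [0.0, 0.0]], "B": [[0.0, 0.0], [0.0, 1.0]]}
var = {"A": 4.0, "B": 2.0}
variance = qtl_covariance(p_self, var)

print(expected, variance)  # 0.25 3.0
```

Under these assumptions the same two functions apply whatever the pedigree, since only the probabilities passed in change between designs.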
In order to compute the likelihood and carry out standard
statistical tests, it suffices to compute quantities (2) and (3)
at any desired genome positions for all individuals and plug
them into the likelihood function. It is important to notice that
exactly the same computing strategy is followed irrespective
of the pedigree complexity, number of QTL or traits. For
instance, a cross between inbred lines can be modeled setting
all element (...truncated)