Qxpak: a versatile mixed model application for genetical genomics and QTL analyses

https://bioinformatics.oxfordjournals.org/content/20/16/2792.full.pdf

M. Pérez-Enciso (1, 2) and I. Misztal (0)

(0) Department of Animal and Dairy Science, University of Georgia, Athens, GA 30602, USA
(1) Departament de Ciència Animal i dels Aliments, Facultat de Veterinària, Universitat Autònoma de Barcelona, 08193 Bellaterra, Spain
(2) Institut Català de Recerca i Estudis Avançats, Pg Lluís Companys 23, 08010 Barcelona, Spain

Motivation: Current methodology and software for quantitative trait loci (QTL) analyses do not use all available information and are inadequate to deal with the huge number of QTL analyses that will be needed in forthcoming 'genetical genomics' studies.

Results: We show that a mixed model statistical framework provides a very flexible tool for QTL modeling in a variety of populations, be it a cross between inbred lines, a within-population study, or experiments involving a mixture of populations or crosses. The software allows multitrait and multiQTL analyses, inclusion of an infinitesimal genetic value, and a batch multitrait option suitable for genetical genomics studies. It also allows massive association studies between single nucleotide polymorphisms and the trait(s) of interest.

Availability: The software (Qxpak), together with a manual and example files, is freely available for research purposes. So far, the compiled program is available for Linux systems; the Windows version will follow soon. See http://www.icrea.es/pag.asp?id=Miguel.Perez

Contact: -

INTRODUCTION

The mapping of quantitative trait loci (QTL) is now feasible owing to the vast number of DNA polymorphisms being uncovered in all species of interest. Traditionally, QTL analyses have been carried out in well-designed experiments, such as crosses between inbred lines or within-family designs (Liu, 1998). A specific software program is used for each design. For instance, QTL Cartographer by Z.B. Zeng and coworkers (http://statgen.ncsu.edu/qtlcart/index.php) accepts only data from crosses between inbred lines.
Another popular program, QTL Express (Seaton et al., 2002), has different modules, each appropriate for a specific design, e.g. within-family analysis or crosses between inbred lines, but not both. Generally, the available software is limited in modeling flexibility, e.g. multitrait models are not usually implemented, or it is not possible to include an infinitesimal genetic effect. Furthermore, there is currently no public software that allows analysis of crosses between outbred lines, i.e. when there is genetic variation between as well as within the lines. Often, the number of QTL fitted or the number of chromosomes analyzed is limited in available programs. Similarly, the specificities of sex chromosomes are not dealt with. For a recent review of QTL analysis challenges, with web links to available software, see Abiola et al. (2003).

In addition, the recent advent of microarray technology has spurred a massive search for polymorphisms affecting the mRNA level in the cell (Brem et al., 2002; Schadt et al., 2003), in what has been called genetical genomics (Jansen and Nap, 2001). This poses new challenges both in terms of computing requirements and in modeling strategies.

Using different approaches for different designs is not only cumbersome but also inefficient and theoretically unsatisfactory. It may result in less power and less insight into the genetic architecture of the trait. Here, we present a coherent methodology for QTL analyses that is also suitable for genetical genomics studies. The method is based on mixed model theory, which provides a flexible and elegant modeling tool. The Qxpak package presented here implements multitrait and multiQTL options, and can be applied to populations of any complexity, using all marker and pedigree information jointly. Different models can be fitted per trait, and missing data are allowed for automatically. QTL effects can be modeled as fixed, random or mixed.
Sex chromosome-linked QTL can also be analyzed, as dosage compensation and different chromosome lengths can be accommodated.

Suppose two breeds, A and B, with genetic effects (g) normally distributed as $g_A \sim N(\mu_A, \sigma_A^2)$ and $g_B \sim N(\mu_B, \sigma_B^2)$, respectively. Now, assume that a quantitative trait has been recorded in a population with an arbitrary pedigree complexity, where individuals can be purebred from either the A or B population, F1, F2 or any other combination (e.g. recombinant inbred lines, backcross, advanced intercross and so on). A general explanatory model is

$$y = Xb + \sum_{k=0}^{N_q} Z g_k + e, \qquad (1)$$

where y is a vector containing the recorded performances, b contains the fixed effects to be estimated, and $g_k$ contains the genetic (QTL) effects for any of the $N_q$ QTL affecting the trait. By convention, we take $g_0$ to stand for the infinitesimal genetic effects, i.e. the genetic effects not accounted for by individual QTL. Finally, X and Z are incidence matrices that relate observations to the parameters in the b and g vectors, and e is the vector of residuals. Typically, Z is a diagonal matrix with element 1 at position (i, i) if the i-th individual has a record, and 0 otherwise. If there are several traits or repeated measures for the same individual and trait, Z is block diagonal.

The model in Equation (1) is termed mixed because it contains fixed effects, such as sex or age, and random effects, such as the genetic effects, g. Statistical theory for mixed models is well developed (McCulloch and Searle, 2000), and theory dictates that we also have to specify the distribution of the random variables, i.e. their means and variances (see Appendix). In the case of the QTL effects, $g_k$, the expected value of the i-th individual at the k-th locus is

$$E(g_{ik}) = P(g_{ik}^{1} \in A, g_{ik}^{2} \in A)\,\mu_{AAk} + P(g_{ik}^{1} \in B, g_{ik}^{2} \in B)\,\mu_{BBk} + [P(g_{ik}^{1} \in A, g_{ik}^{2} \in B) + P(g_{ik}^{1} \in B, g_{ik}^{2} \in A)]\,\mu_{ABk}. \qquad (2)$$
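As a numerical illustration of Equation (2), the following minimal Python sketch computes the expected QTL value of one individual at one locus from the four breed-origin probabilities of its paternal and maternal alleles. It is not taken from Qxpak: the function name `expected_qtl_value`, its argument names and the numerical genotype means are hypothetical, chosen only to show how the probabilities weight the three means.

```python
def expected_qtl_value(p_aa, p_bb, p_ab, p_ba, mu_aa, mu_bb, mu_ab):
    """Expected genetic value at one locus, following Equation (2).

    p_xy is the probability that the paternal allele has breed origin x
    and the maternal allele has breed origin y (hypothetical naming).
    mu_aa, mu_bb and mu_ab are the means of the three origin classes.
    """
    total = p_aa + p_bb + p_ab + p_ba
    if abs(total - 1.0) > 1e-9:
        raise ValueError("breed-origin probabilities must sum to 1")
    # Both heterozygous-origin configurations share the same mean, mu_ab.
    return p_aa * mu_aa + p_bb * mu_bb + (p_ab + p_ba) * mu_ab


# Illustrative (made-up) genotype means for a single locus.
MU_AA, MU_BB, MU_AB = 2.0, -2.0, 0.5

# Purebred A individual: both alleles are of A origin with certainty.
purebred_a = expected_qtl_value(1.0, 0.0, 0.0, 0.0, MU_AA, MU_BB, MU_AB)

# F1 individual from an A sire and a B dam: always of AB origin.
f1 = expected_qtl_value(0.0, 0.0, 1.0, 0.0, MU_AA, MU_BB, MU_AB)

# F2 individual without marker information: all four configurations
# are equally likely.
f2 = expected_qtl_value(0.25, 0.25, 0.25, 0.25, MU_AA, MU_BB, MU_AB)

print(purebred_a, f1, f2)
```

In practice the probabilities would be computed from marker and pedigree data at each genome position; here they are simply supplied by hand, which is enough to show that purebred, F1 and F2 individuals are handled by the same expression.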
In Equation (2), $P(g_{ik}^{1} \in U, g_{ik}^{2} \in W)$ is the probability that the alleles of the k-th QTL on the paternal and maternal haplotypes are of breed U and W origin, respectively, and $\mu_{UWk}$ is the mean genetic effect of individuals that received alleles of U and W origin at locus k. The variance of $g_k$ is a matrix, $G_k$, that contains the covariances between the i-th and j-th genetic values at the k-th locus. The covariance between the i-th and j-th genetic values is

$$\mathrm{Cov}(g_{ik}, g_{jk}) = \frac{1}{2}\sum_{h=1}^{2}\sum_{h'=1}^{2} P(g_{ik}^{h} \equiv g_{jk}^{h'} \mid g_{ik}^{h} \in A)\,\sigma_{Ak}^{2} + \frac{1}{2}\sum_{h=1}^{2}\sum_{h'=1}^{2} P(g_{ik}^{h} \equiv g_{jk}^{h'} \mid g_{ik}^{h} \in B)\,\sigma_{Bk}^{2}, \qquad (3)$$

where $P(g_{ik}^{h} \equiv g_{jk}^{h'} \mid g_{ik}^{h} \in U)$ is the probability of alleles $g_{ik}^{h}$ and $g_{jk}^{h'}$ being identical by descent (IBD) and from origin U, the superscripts h and h' stand for the paternal or maternal phases, numbered 1 or 2, respectively, and $\sigma_{Uk}^{2}$ is the variance of genetic effects of U origin at locus k.

In order to compute the likelihood and carry out standard statistical tests, it suffices to compute quantities (2) and (3) at any desired genome positions for all individuals and plug them into the likelihood function. It is important to notice that exactly the same computing strategy is followed irrespective of the pedigree complexity, number of QTL or traits. For instance, a cross between inbred lines can be modeled setting all element (...truncated)