Modelling expectation and variance for genotype by environment data

https://www.nature.com/articles/hdy1997139.pdf

Heredity 79 (1997) 162-171
Received 5 August 1996

JEAN-BAPTISTE DENIS*, HANS-PETER PIEPHO† & FRED A. VAN EEUWIJK

Laboratoire de Biométrie, INRA, Route de Saint-Cyr, F-78026 Versailles, France, †Faculty of Agricultural and Environmental Sciences, University of Kassel, D-37213 Witzenhausen, Germany and DLO-Center for Plant Breeding and Reproduction Research, CPRO-DLO, PO Box 16, NL-6700 AA Wageningen, The Netherlands

An integration of two types of models for the analysis of genotype by environment interaction is presented. On the one hand, the expectation of G × E interaction is frequently modelled by regression models; on the other hand, for deviations from these regressions, either separate stability parameters are defined or extra components of variance are introduced. A class of mixed models is described that contains facilities for modelling expectation by regression and, in addition, has extensive possibilities for dealing with heteroscedasticity. Practical aspects of the use of these mixed models are illustrated on a data set involving sugar yield in beet.

Keywords: covariate, factorial regression, genotype × environment interaction, heteroscedasticity, interaction, mixed model.

Introduction

This paper presents a number of models that can account for interaction and heteroscedasticity in genotype by environment tables. These models can be viewed as generalizations of both the classical model by Shukla (1972) and the mixed factorial regression model by Denis & Dhorne (1989). The models can be used for the analysis of replicated and unreplicated tables alike, as no estimate for error is required.
Modelling heteroscedasticity is especially relevant for genotype by environment interaction (Kang & Gorman, 1989; Kang, 1993), but similar models may be used to analyse, for example, repeated measures data accruing in sociological and psychological research (Crowder & Hand, 1990; Longford, 1993).

For selecting genotypes, a plant breeder uses assessments of the phenotypic value under different environmental conditions. These assessments are collected in genotype by environment tables. Inferences follow from adequate statistical models for these tables, and decisions are made regarding the selection and rejection of varieties. We will consider environments to be either locations or years, i.e. there is no factorial structure in the environments. Of course, in some cases, the environments comprise location by year combinations, and it may be worthwhile exploiting this factorial structure (Piepho, 1994a). In this paper, we will take genotypes as fixed and environments as random. A partial justification for this choice is that we are studying a given set of genotypes and are not interested in testing the environments themselves; they are considered only to provide information about the genotypes.

Later, some classical models will be described, after which their common structural features will be discussed, leading to the delineation of a coherent family of models for the analysis of genotype by environment data; some of its more interesting members are presented. To illustrate the practical aspects of interpreting model parameters, a set of sugar beet data is analysed. GENSTAT and SAS source codes for running some of the presented models are given in the Appendix.

*Correspondence.
© 1997 The Genetical Society of Great Britain.

Review of current models

Additive model

The additive two-way mixed model provides a baseline against which other, more elaborate models can be compared. Let $Y_{ij}$ be a typical entry of a genotype by environment table, where $i \in \{1, \ldots, I\}$ corresponds to the $i$th genotype and $j \in \{1, \ldots, J\}$ corresponds to the $j$th environment. $Y_{ij}$ is taken as the sum of a (fixed) parameter depending on the genotype ($\alpha_i$), a random parameter depending on the environment ($B_j$) and an independent residual term ($E_{ij}$):

$$Y_{ij} = \alpha_i + B_j + E_{ij}.$$

This model has an obvious interpretation. Its first two moments are:

$$E(Y_{ij}) = \alpha_i, \qquad V(Y_{ij}) = \gamma_B + \gamma_E,$$

$$\operatorname{Cov}(Y_{ij}, Y_{i'j'}) = \gamma_B \ \text{for } j = j' \ (i \neq i'), \quad 0 \ \text{otherwise}.$$

The similarity in performance of different genotypes grown in the same environment is represented by a constant positive correlation, identical for every pair of genotypes:

$$\operatorname{Cor}(Y_{ij}, Y_{i'j}) = \frac{\gamma_B}{\gamma_B + \gamma_E}.$$

Between performances in different environments, this correlation is zero and this basic assumption will be true for all models presented in this paper. Thus, a convenient notation is to introduce $\mathbf{Y}_j$, the vector of the $I$ performances of the genotypes in the $j$th environment. Covariances between different $\mathbf{Y}_j$ are null and models can be defined by their expectations and variances. For the additive model, it turns out that

$$E(\mathbf{Y}_j) = \boldsymbol{\alpha}, \qquad V(\mathbf{Y}_j) = \gamma_B \mathbf{J} + \gamma_E \mathbf{I}, \qquad (1)$$

where $\mathbf{J}$ is the $I \times I$ matrix with all components equal to 1, $\mathbf{I}$ is the identity matrix of size $I$ and $\boldsymbol{\alpha}$ is the vector of the $\alpha_i$.

General heteroscedastic model

The additive model may be extended by attributing a different variance to each genotype. The model formulation is identical, but the variance structure is now different; each genotype is considered to have its own variance, $\gamma_i$. Shukla (1982) suggested the term stability variance for $\gamma_i$. Earlier, Wricke (1962) had proposed the term ecovalence for the contribution of a genotype to the interaction sum of squares, and this quantity is directly related to $\gamma_i$. Expectation and variance structures are given by

$$E(\mathbf{Y}_j) = \boldsymbol{\alpha}, \qquad V(\mathbf{Y}_j) = \gamma_B \mathbf{J} + \operatorname{dg}(\boldsymbol{\gamma}), \qquad (2)$$

where $\operatorname{dg}(\mathbf{v})$ is the diagonal matrix whose diagonal terms are $v_i$, the components of vector $\mathbf{v}$. The interpretation is straightforward: the variance depends on the genotype and the correlation differs among pairs of genotypes:

$$\operatorname{Cor}(Y_{ij}, Y_{i'j}) = \frac{\gamma_B}{\sqrt{(\gamma_B + \gamma_i)(\gamma_B + \gamma_{i'})}}.$$

The more variable a genotype is, the less correlated it will be with other genotypes. This model is much more flexible than the additive model (1), as the number of variance parameters increases from 2 to $I + 1$. The above type of model appears to have been used first by Grubbs (1948) for the analysis of measurement errors. Subsequently, it has been reconsidered by several authors, e.g. Russell & Bradley (1958) and Shukla (1972, 1982). Some extensions of Shukla's stability variance concept were given by Piepho (1994a,b,c, 1995). A recent review may be found in Piepho (1996a).

Scheffé model

The mixed model proposed by Scheffé (1959, p. 266) provides a further generalization by allowing any covariance structure between performances from the same environment. As a consequence, the $B_j$ term (environment main effect) becomes redundant and the model may be written as:

$$Y_{ij} = \alpha_i + E_{ij}.$$

In contrast to Scheffé, we cannot include a residual term, as we are addressing the non-replicated case. The $E_{ij}$ components are correlated within environments:

$$E(\mathbf{Y}_j) = \boldsymbol{\alpha}, \qquad V(\mathbf{Y}_j) = \boldsymbol{\Gamma}, \qquad (3)$$

where $\boldsymbol{\alpha}$ is a column vector of size $I$ and $\boldsymbol{\Gamma} = \{\gamma_{ii'}\}$ is any covariance matrix of size $I$. The model is very flex- (...truncated)