Modelling expectation and variance for genotype by environment data
Received 5 August 1996
Heredity 79 (1997) 162-171
JEAN-BAPTISTE DENIS*, HANS-PETER PIEPHO† & FRED A. VAN EEUWIJK
Laboratoire de Biométrie, INRA, Route de Saint-Cyr, F-78026 Versailles, France, †Faculty of Agricultural and
Environmental Sciences, University of Kassel, D-37213 Witzenhausen, Germany and DLO-Center for Plant Breeding
and Reproduction Research, CPRO-DLO, PO Box 16, NL-6700 AA Wageningen, The Netherlands
An integration of two types of models for the analysis of genotype by environment interaction
is presented. On the one hand, the expectation of G × E interaction is frequently modelled by
regression models; on the other hand, for deviations from these regressions, either separate
stability parameters are defined or extra components of variance are introduced. A class of
mixed models is described that contains facilities for modelling expectation by regression and,
in addition, has extensive possibilities for dealing with heteroscedasticity. Practical aspects of
the use of these mixed models are illustrated on a data set involving sugar yield in beet.
Keywords: covariate, factorial regression, genotype × environment interaction, heteroscedasticity, interaction, mixed model.
Introduction
This paper presents a number of models that can account for interaction and heteroscedasticity in genotype by environment tables. These models can be viewed as generalizations of both the classical model by Shukla (1972) and the mixed factorial regression model by Denis & Dhorne (1989). The models can be used for the analysis of replicated and unreplicated tables alike, as no estimate for error is required. Modelling heteroscedasticity is especially relevant for genotype by environment interaction (Kang & Gorman, 1989; Kang, 1993), but similar models may be used to analyse, for example, repeated measures data accruing in sociological and psychological research (Crowder & Hand, 1990; Longford, 1993).
For selecting genotypes, a plant breeder uses assessments of the phenotypic value under different environmental conditions. These assessments are collected in genotype by environment tables. Inferences follow from adequate statistical models for these tables, and decisions are made regarding the selection and rejection of varieties. We will consider environments to be either locations or years, i.e. there is no factorial structure in the environments. Of course, in some cases, the environments comprise location by year combinations, and it may be worthwhile exploiting this factorial structure (Piepho, 1994a). In this paper, we will take genotypes as fixed and environments as random. A partial justification for this choice is that we are studying a given set of genotypes and are not interested in testing the environments themselves; they are considered only to provide information about the genotypes.
Below, some classical models will be described, after which their common structural features will be discussed, leading to the delineation of a coherent family of models for the analysis of genotype by environment data; some of its more interesting members are presented. To illustrate the practical aspects of interpreting model parameters, a set of sugar beet data is analysed. GENSTAT and SAS source codes for running some of the presented models are given in the Appendix.
*Correspondence.
© 1997 The Genetical Society of Great Britain.
Review of current models
Additive model
The additive two-way mixed model provides a baseline against which other more elaborate models can be compared. Let $Y_{ij}$ be a typical entry for a genotype by environment table, where $i \in \{1, \ldots, I\}$ corresponds to the $i$th genotype and $j \in \{1, \ldots, J\}$ corresponds to the $j$th environment. $Y_{ij}$ is taken as the sum of a (fixed) parameter depending on the genotype ($\alpha_i$), a random parameter depending on the environment ($B_j$) and an independent residual term ($E_{ij}$):
$$Y_{ij} = \alpha_i + B_j + E_{ij}.$$
This model has an obvious interpretation. Its first two moments are:
$$E(Y_{ij}) = \alpha_i,$$
$$V(Y_{ij}) = \sigma_B^2 + \gamma_E,$$
$$\mathrm{Cov}(Y_{ij}, Y_{i'j'}) = \sigma_B^2 \ \text{for } i \neq i',\ j = j'; \quad 0 \ \text{for } j \neq j'.$$
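As a numerical check of these moments, the following Python sketch simulates a table from the additive model and compares sample moments with their theoretical values. The helper names, the parameter values and the use of normal distributions are assumptions made purely for illustration; the moments themselves only require $B_j$ and $E_{ij}$ to be independent with variances $\sigma_B^2$ and $\gamma_E$.

```python
import random
import statistics


def simulate_additive(alpha, sigma2_b, gamma_e, n_env, seed=0):
    """Simulate Y[i][j] = alpha[i] + B[j] + E[i][j] (hypothetical helper).

    Normality is an assumption of this sketch, not of the model itself.
    """
    rng = random.Random(seed)
    sd_b, sd_e = sigma2_b ** 0.5, gamma_e ** 0.5
    # One random environment effect per column, shared by all genotypes.
    b = [rng.gauss(0.0, sd_b) for _ in range(n_env)]
    return [[a + b[j] + rng.gauss(0.0, sd_e) for j in range(n_env)]
            for a in alpha]


def covariance(x, y):
    """Sample covariance of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    return sum((u - mx) * (v - my) for u, v in zip(x, y)) / (len(x) - 1)


# Illustrative values only: three genotypes, many environments.
alpha = [10.0, 12.0, 9.0]
y = simulate_additive(alpha, sigma2_b=2.0, gamma_e=1.0, n_env=20000)

mean_0 = statistics.fmean(y[0])    # expected to be close to alpha[0]
var_0 = statistics.variance(y[0])  # expected to be close to sigma2_b + gamma_e
cov_01 = covariance(y[0], y[1])    # expected to be close to sigma2_b
```

With many environments the three sample quantities should settle near $\alpha_1$, $\sigma_B^2 + \gamma_E$ and $\sigma_B^2$ respectively; a real trial has far fewer environments, so such estimates would be much noisier.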
The similarity in performance of different genotypes grown in the same environment is represented by a constant positive correlation, identical for every pair of genotypes:
$$\mathrm{Cor}(Y_{ij}, Y_{i'j}) = \frac{\sigma_B^2}{\sigma_B^2 + \gamma_E}.$$
Between performances in different environments, this correlation is zero and this basic assumption will be true for all models presented in this paper. Thus, a convenient notation is to introduce $\mathbf{Y}_j$, the vector of the $I$ performances of the genotypes in the $j$th environment. Covariances between different $\mathbf{Y}_j$ are null and models can be defined by their expectations and variances. For the additive model, it turns out that
$$E(\mathbf{Y}_j) = \boldsymbol{\alpha}, \qquad V(\mathbf{Y}_j) = \sigma_B^2 \mathbf{J} + \gamma_E \mathbf{I}, \tag{1}$$
where $\mathbf{J}$ is the $I \times I$ matrix with all components equal to 1, $\mathbf{I}$ is the identity matrix of size $I$ and $\boldsymbol{\alpha}$ is the vector of the genotype parameters $\alpha_i$.
General heteroscedastic model
The additive model may be extended by attributing a different variance to each genotype. The model formulation is identical, but the variance structure is now different; each genotype is considered to have its own variance, $\gamma_i$. Shukla (1982) suggested the term stability variance for $\gamma_i$. Earlier, Wricke (1962) had proposed the term ecovalence for the contribution of a genotype to the interaction sum of squares, and this quantity is directly related to $\gamma_i$. Expectation and variance structures are given by
$$E(\mathbf{Y}_j) = \boldsymbol{\alpha}, \qquad V(\mathbf{Y}_j) = \sigma_B^2 \mathbf{J} + \mathrm{dg}(\boldsymbol{\gamma}), \tag{2}$$
where $\mathrm{dg}(\boldsymbol{\gamma})$ is the diagonal matrix whose terms are $\gamma_i$, the components of vector $\boldsymbol{\gamma}$. The interpretation is straightforward: the variance depends on the genotype and the correlation differs among pairs of genotypes:
$$\mathrm{Cor}(Y_{ij}, Y_{i'j}) = \frac{\sigma_B^2}{\sqrt{(\sigma_B^2 + \gamma_i)(\sigma_B^2 + \gamma_{i'})}}.$$
The more variable a genotype is, the less correlated it will be with other genotypes. This model is much more flexible than the additive model (1), as the number of variance parameters increases from 2 to $I + 1$.
The above type of model appears to have been used first by Grubbs (1948) for the analysis of measurement errors. Subsequently, it has been reconsidered by several authors, e.g. Russell & Bradley (1958) and Shukla (1972, 1982). Some extensions of Shukla's stability variance concept were given by Piepho (1994a,b,c, 1995). A recent review may be found in Piepho (1996a).
Scheffé model
The mixed model proposed by Scheffé (1959, p. 266) provides a further generalization by allowing any
covariance structure between performances from the same environment. As a consequence, the $B_j$ term (environment main effect) becomes redundant and the model may be written as:
$$Y_{ij} = \alpha_i + E_{ij}.$$
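For comparison with the unrestricted covariance structure of this model, the following Python sketch builds the within-environment covariance matrices implied by the additive model and by the general heteroscedastic model, and rescales them to correlations. The function names and numerical values are hypothetical and chosen only so that the results are easy to verify by hand; this is an illustration, not code from the Appendix.

```python
import math


def additive_cov(n_geno, sigma2_b, gamma_e):
    """sigma2_b * J + gamma_e * I: equal variances, equal covariances."""
    return [[sigma2_b + (gamma_e if i == k else 0.0) for k in range(n_geno)]
            for i in range(n_geno)]


def heteroscedastic_cov(sigma2_b, gamma):
    """sigma2_b * J + dg(gamma): one stability variance per genotype."""
    n = len(gamma)
    return [[sigma2_b + (gamma[i] if i == k else 0.0) for k in range(n)]
            for i in range(n)]


def to_correlation(cov):
    """Rescale a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][k] / (sd[i] * sd[k]) for k in range(len(cov))]
            for i in range(len(cov))]


# Illustrative values only (not estimates from any data set).
cor_add = to_correlation(additive_cov(3, sigma2_b=2.0, gamma_e=2.0))
cor_het = to_correlation(heteroscedastic_cov(1.0, [3.0, 8.0, 15.0]))
```

In the additive case every off-diagonal correlation is the same, whereas in the heteroscedastic case the correlation of a pair falls as either stability variance grows. Any symmetric positive semi-definite matrix could be passed to `to_correlation` to represent an unrestricted structure of the kind allowed by the Scheffé model.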
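In contrast to Scheffé, we cannot include a residual term, as we are addressing the non-replicated case. The $E_{ij}$ components are correlated within environments:
$$E(\mathbf{Y}_j) = \boldsymbol{\alpha}, \qquad V(\mathbf{Y}_j) = \boldsymbol{\Gamma}, \tag{3}$$
where $\boldsymbol{\alpha}$ is a column vector of size $I$ and $\boldsymbol{\Gamma} = \{\gamma_{ii'}\}$ is any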
covariance matrix of size I. The model is very flex-
E( (...truncated)