Missing observations in the analysis of stability
Received 3 June 1993
Heredity 72 (1994) 141-145
Genetical Society of Great Britain
HANS-PETER PIEPHO
University of Kassel, Faculty of Agriculture, Steinstrasse 19, 37213 Witzenhausen, Germany
In crop variety testing it is frequently of interest to estimate measures of yielding stability or phenotypic stability. The common procedures for stability analysis require a balanced two-way table of
genotypes and environments, in which all cells are filled. Frequently, however, empirical data sets
are unbalanced due to missing observations. This paper explores methods to estimate stability when
some cells are empty. A set of wheat data is used to exemplify these methods.
Keywords: genotype-environment interaction, method of moments, MINQUE, reliability, stability variance.
Introduction
Yielding stability as a selection trait in plant breeding programmes and evaluation trials is constantly gaining importance over yielding ability. Various statistical concepts and measures of stability have been proposed (see reviews by Lin et al., 1986; Westcott, 1986; Becker & Leon, 1988). Most of them are based on a two-way table of genotypes and environments, where 'environments' may be different locations or different years or both, depending on the scope of the analysis.
Usually such data exhibit genotype-environment interaction, which makes selection of high yielding genotypes a difficult task. A genotype interacting strongly with environments may outperform most genotypes in some environments while being at a disadvantage in other environments. The larger the genotype-environment interaction of a genotype, the less stable, i.e. the less predictable, is its performance in different environments. If one adheres to this concept of stability it is desirable to minimize genotype-environment interaction. Common measures for this type of stability are the ecovalence (Wricke, 1965), the stability variance (Shukla, 1972) and the non-parametric measures suggested by Nassar & Hühn (1987) and Hühn & Nassar (1989).
For these measures to be computable it is necessary that all cells in the data set be filled, i.e. that each genotype be grown in the same set of environments. Rather frequently, however, one is faced with unbalanced data. If data sets from several locations are combined, some genotypes may not have been tested in all sites. Similarly, if yield tests of different years are accumulated, genotypes are typically tested in many but not all of the years. Genotypes change from year to year as new genotypes become available and older ones become obsolete. The purpose of this paper is to explore ways to estimate stability in cases where the data are unbalanced. We will confine our attention to Shukla's stability variance.
Estimation methods
For the statistical analysis it is common to assume the following model:
$$y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ij} \quad (i = 1, \ldots, K;\ j = 1, \ldots, N),$$
where $y_{ij}$ = mean yield of genotype i in environment j, $\mu$ = grand mean (fixed), $\alpha_i$ = effect of genotype i (fixed), $\beta_j$ = effect of environment j (random), $(\alpha\beta)_{ij}$ = interaction of genotype i with environment j (random) and $e_{ij}$ = mean error of genotype i in environment j (random).
It is assumed that the random effects $\beta_j$, $(\alpha\beta)_{ij}$ and $e_{ij}$ are independently normally distributed with variances $\mathrm{var}[\beta_j] = \sigma_\beta^2$, $\mathrm{var}[(\alpha\beta)_{ij}] = \sigma_i^{\prime 2}$ and $\mathrm{var}[e_{ij}] = \sigma_e^2$, respectively (Shukla, 1972). The assumption of homogeneous error variances is reasonable if the test design is the same for all environments. In accordance with the concept of stability used in this paper, the genotype-environment interaction variance is allowed to differ among genotypes. Maximum stability of a genotype is attained if the interaction variance $\sigma_i^{\prime 2} = 0$. The larger $\sigma_i^{\prime 2}$, the less stable is the corresponding genotype.
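To make the model assumptions concrete, the following minimal sketch draws a table of means with genotype-specific interaction variances. It is an illustration only: the function name `simulate_yields` and all of its parameters are hypothetical, and the arguments are standard deviations (square roots of the variances named above), not variances.

```python
import random

def simulate_yields(mu, alpha, sigma_beta, sigma_int, sigma_e, n_env, seed=0):
    """Draw a K x N table of genotype-by-environment means (illustrative sketch).

    alpha      : list of K fixed genotype effects
    sigma_beta : standard deviation of the random environment effects
    sigma_int  : list of K genotype-specific interaction standard deviations
    sigma_e    : common standard deviation of the mean errors
    """
    rng = random.Random(seed)
    # One random environment effect per environment, shared by all genotypes
    beta = [rng.gauss(0.0, sigma_beta) for _ in range(n_env)]
    table = []
    for a_i, s_i in zip(alpha, sigma_int):
        row = []
        for b_j in beta:
            interaction = rng.gauss(0.0, s_i)   # variance differs among genotypes
            error = rng.gauss(0.0, sigma_e)     # homogeneous error variance
            row.append(mu + a_i + b_j + interaction + error)
        table.append(row)
    return table
```

A genotype given a larger entry in `sigma_int` will show more erratic differences from the other genotypes across environments, which is what the stability variance is meant to capture.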
In the model for means $y_{ij}$ we cannot distinguish interaction from error. It is only possible to estimate $r_{ij} = (\alpha\beta)_{ij} + e_{ij}$, and hence to estimate the variance $\sigma_i^2 = \mathrm{var}(r_{ij})$ of genotype i. $\sigma_i^2$ is the stability variance
where $\boldsymbol{\sigma}$ is a K-dimensional vector of the $\sigma_i^2$, $\mathbf{V}$ is a
introduced by Shukla (1972). We see that
$$\sigma_i^2 = \sigma_i^{\prime 2} + \sigma_e^2.$$
So if the assumption of homogeneous error variance is
correct, the genotype rank order given by the stability
$K(K-1)/2$-dimensional vector of the $V_{s-r}$, and $\mathbf{Q}$ is a
$K(K-1)/2 \times K$ matrix with elements 0 and 1 that
picks the appropriate $\sigma_i^2$. $\mathbf{Q}'\mathbf{Q}$ has full rank and can
thus be inverted. The solution of eqn 2 is
variance $\sigma_i^2$ will exactly equal that given by the interaction variance $\sigma_i^{\prime 2}$. The most stable genotype, i.e. the
genotype with the smallest interaction variance $\sigma_i^{\prime 2}$, will
$$\hat{\boldsymbol{\sigma}} = (\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'\mathbf{V}. \quad (3)$$
then also have the smallest stability variance $\sigma_i^2$, the
genotype with the second smallest $\sigma_i^{\prime 2}$ will have the
Grubbs' estimates are unbiased, which is seen by taking
expectations on both sides of eqn 3 and inserting eqn 2
on the right hand side:
second smallest $\sigma_i^2$, and so on.
Shukla (1972) gave the following estimate of the
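$$E[\hat{\boldsymbol{\sigma}}] = E[(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'\mathbf{V}] = E[(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'\mathbf{Q}\boldsymbol{\sigma}] = E[\mathbf{I}\boldsymbol{\sigma}] = \boldsymbol{\sigma}.$$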
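stability variance:
$$\hat{\sigma}_i^2 = \frac{K W_i}{(K-2)(N-1)} - \frac{\sum_{s=1}^{K} W_s}{(K-1)(K-2)(N-1)},$$
where
$W_i = \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$ with
$\bar{y}_{i\cdot} = \sum_j y_{ij}/N$, $\bar{y}_{\cdot j} = \sum_i y_{ij}/K$, and $\bar{y}_{\cdot\cdot} = \sum_i \sum_j y_{ij}/KN$.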
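As an illustration of the estimate just given, the following sketch computes it for a complete table stored as a list of K rows (genotypes) with N entries each (environments). The function name `shukla_stability_variance` is hypothetical, and the code covers only the balanced case.

```python
def shukla_stability_variance(y):
    """Shukla's (1972) stability variance estimates for a complete K x N table y[i][j]."""
    K, N = len(y), len(y[0])
    if K < 3 or N < 2 or any(len(row) != N for row in y):
        raise ValueError("need a complete table with K >= 3 and N >= 2")
    row_mean = [sum(row) / N for row in y]
    col_mean = [sum(y[i][j] for i in range(K)) / K for j in range(N)]
    grand = sum(row_mean) / K
    # W_i: sum over environments of the squared interaction residuals of genotype i
    W = [sum((y[i][j] - row_mean[i] - col_mean[j] + grand) ** 2 for j in range(N))
         for i in range(K)]
    total = sum(W)
    return [K * W_i / ((K - 2) * (N - 1)) - total / ((K - 1) * (K - 2) * (N - 1))
            for W_i in W]
```

For the small table `[[1, 2], [2, 1], [3, 3]]` this yields 1.0, 1.0 and -0.5. The negative value for the third genotype shows that the estimate, although unbiased, is not restricted to be non-negative.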
This estimate can be shown to be identical to Grubbs'
estimate of the variance in errors of measurement
(Piepho, 1992). For the first genotype Grubbs' estimate
is given by:
(Rao, 1970). MINQUE stands for MInimum Norm
Quadratic Unbiased Estimation or Estimator,
depending on context. It is noted that for balanced data
Shukla's estimator is a MINQUE of $\sigma_i^2$ (Shukla, 1972).
$$\hat{\sigma}_1^2 = (K-1)^{-1}\left[\sum_{r=2}^{K} V_{1-r} - (K-2)^{-1}\sum_{1<s<r} V_{s-r}\right],$$
Rao (1970) provides a computational procedure for
MINQUE in the general case, which can be used in
where
$V_{s-r} = (N-1)^{-1}\sum_{j=1}^{N} (x_{srj} - \bar{x}_{sr\cdot})^2$
The method of moments may also be employed when some data are missing. For two genotypes s and r we can compute $V_{s-r}$ as long as they are grown together in at least two environments. If this is the case, we will say that the two genotypes s and r are connected. To obtain a unique solution of eqn 2, we require that there be at least K connected pairs of genotypes, as we need at least as many equations as there are unknowns. Also, each genotype must be connected to at least one other genotype. Thus, the method may break down in some instances when very many data are missing.
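The pairwise method of moments with empty cells can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's program: empty cells are coded as `None`, all function names are hypothetical, and the normal equations are solved by a plain Gaussian elimination to keep the example free of external libraries. Besides the two counting conditions named above, the sketch also stops if Q'Q turns out to be singular.

```python
from itertools import combinations

def pairwise_variance(row_s, row_r):
    """V_{s-r}: sample variance of y_sj - y_rj over shared environments, or None."""
    diffs = [a - b for a, b in zip(row_s, row_r) if a is not None and b is not None]
    if len(diffs) < 2:
        return None  # genotypes s and r are not connected
    mean = sum(diffs) / len(diffs)
    return sum((d - mean) ** 2 for d in diffs) / (len(diffs) - 1)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("Q'Q is singular: no unique solution")
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for c in reversed(range(n)):
        x[c] = (M[c][n] - sum(M[c][k] * x[k] for k in range(c + 1, n))) / M[c][c]
    return x

def moment_stability_variance(y):
    """Least-squares solution of the pairwise moment equations; None marks an empty cell."""
    K = len(y)
    pairs, V = [], []
    for s, r in combinations(range(K), 2):
        v = pairwise_variance(y[s], y[r])
        if v is not None:
            pairs.append((s, r))
            V.append(v)
    connected = {g for pair in pairs for g in pair}
    if len(pairs) < K or len(connected) < K:
        raise ValueError("too few connected pairs of genotypes")
    # Normal equations Q'Q sigma = Q'V; row (s, r) of Q has ones in columns s and r
    QtQ = [[0.0] * K for _ in range(K)]
    QtV = [0.0] * K
    for (s, r), v in zip(pairs, V):
        for a in (s, r):
            QtV[a] += v
            for b in (s, r):
                QtQ[a][b] += 1.0
    return solve(QtQ, QtV)
```

With a complete table, every pair of genotypes is connected and the result coincides with the balanced-data estimate; for `[[1, 2], [2, 1], [3, 3]]` it again gives 1.0, 1.0 and -0.5.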
In the unbalanced case, another estimate of $\sigma_i^2$ can
be obtained by the MINQUE principle of estimation
data sets with empty cells.
We write the linear model for the two-way classification in the form
with $x_{srj} = y_{sj} - y_{rj}$ (Grubbs, 1948).
Estimates for the other genotypes are obtained by an
obvious rotation of subscripts. Grubbs' estimate is
based on the method of moments, in which sample
moments are equated to population moments (Jaech,
1985). We have
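$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{r}, \quad (4)$$
where $\mathbf{Y}$ is the vector of observations $y_{ij}$, $\mathbf{b}$ is the parameter vector of main effects, $\mathbf{r}$ is a vector of r-effects
and $\mathbf{X}$ is the design matrix. Denote by n the number of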
There are $K(K-1)/2$ different equations in K
filled cells, by $\mathbf{M} = \{m_{pq}\}$ $(p, q = 1, \ldots, n)$ the projection
matrix I— X(X'X)X', by (the vector of squares of (...truncated)