Missing observations in the analysis of stability

https://www.nature.com/articles/hdy199420.pdf

Received 3 June 1993
Heredity 72 (1994) 141-145
Genetical Society of Great Britain

HANS-PETER PIEPHO
University of Kassel, Faculty of Agriculture, Steinstrasse 19, 37213 Witzenhausen, Germany

In crop variety testing it is frequently of interest to estimate measures of yielding stability or phenotypic stability. The common procedures for stability analysis require a balanced two-way table of genotypes and environments, in which all cells are filled. Frequently, however, empirical data sets are unbalanced due to missing observations. This paper explores methods to estimate stability when some cells are empty. A set of wheat data is used to exemplify these methods.

Keywords: genotype-environment interaction, method of moments, MINQUE, reliability, stability variance.

Introduction

Yielding stability as a selection trait in plant breeding programmes and evaluation trials is constantly gaining importance over yielding ability. Various statistical concepts and measures of stability have been proposed (see reviews by Lin et al., 1986; Westcott, 1986; Becker & Leon, 1988). Most of them are based on a two-way table of genotypes and environments, where 'environments' may be different locations or different years or both, depending on the scope of the analysis.

Usually such data exhibit genotype-environment interaction, which makes selection of high yielding genotypes a difficult task. A genotype interacting strongly with environments may outperform most genotypes in some environments while being at a disadvantage in other environments. The larger the genotype-environment interaction of a genotype, the less stable, i.e. the less predictable, is its performance in different environments. If one adheres to this concept of stability it is desirable to minimize genotype-environment interaction. Common measures for this type of stability are the ecovalence (Wricke, 1965), the stability variance (Shukla, 1972) and the non-parametric measures suggested by Nassar & Hühn (1987) and Hühn & Nassar (1989).

For these measures to be computable it is necessary that all cells in the data set be filled, i.e. that each genotype be grown in the same set of environments. Rather frequently, however, one is faced with unbalanced data. If data sets from several locations are combined, some genotypes may not have been tested in all sites. Similarly, if yield tests of different years are accumulated, genotypes are typically tested in many but not all of the years. Genotypes change from year to year as new genotypes become available and older ones become obsolete.

The purpose of this paper is to explore ways to estimate stability in cases where the data are unbalanced. We will confine our attention to Shukla's stability variance.

Estimation methods

For the statistical analysis it is common to assume the following model:

$$y_{ij} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + e_{ij} \quad (i = 1, \dots, K;\ j = 1, \dots, N),$$

where $y_{ij}$ = mean yield of genotype $i$ in environment $j$, $\mu$ = grand mean (fixed), $\alpha_i$ = effect of genotype $i$ (fixed), $\beta_j$ = effect of environment $j$ (random), $(\alpha\beta)_{ij}$ = interaction of genotype $i$ with environment $j$ (random) and $e_{ij}$ = mean error of genotype $i$ in environment $j$ (random). It is assumed that the random effects $\beta_j$, $(\alpha\beta)_{ij}$ and $e_{ij}$ are independently normally distributed with variances $\operatorname{var}[\beta_j] = \sigma_\beta^2$, $\operatorname{var}[(\alpha\beta)_{ij}] = \sigma_i'^{2}$ and $\operatorname{var}[e_{ij}] = \sigma_e^2$, respectively (Shukla, 1972). The assumption of homogeneous error variances is reasonable if the test design is the same for all environments. In accordance with the concept of stability used in this paper, genotype-environment interaction variance is allowed to differ among genotypes. Maximum stability of a genotype is attained if the interaction variance $\sigma_i'^{2} = 0$. The larger $\sigma_i'^{2}$, the less stable is the corresponding genotype. In the model for means $y_{ij}$ we cannot distinguish interaction from error. It is only possible to estimate $r_{ij} = (\alpha\beta)_{ij} + e_{ij}$, and hence to estimate the variance $\sigma_i^2 = \operatorname{var}(r_{ij})$ of genotype $i$.
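The point that interaction and error cannot be separated in a table of means can be illustrated with a small simulation. The following is a minimal sketch and not part of the original paper: the function name `simulate_yields` and all parameter values are hypothetical, and only the Python standard library is used. It draws a two-way table from the model above and compares, for each genotype, the empirical variance of the combined term (interaction plus error) with the sum of the interaction variance and the error variance used to generate the data.

```python
import random
import statistics


def simulate_yields(mu, alpha, sd_env, sd_inter, sd_err, n_env, seed=0):
    """Draw a K x N table of mean yields from the two-way model.

    y[i][j] = mu + alpha[i] + beta[j] + interaction[i][j] + error[i][j],
    where genotype i has its own interaction standard deviation sd_inter[i].
    Returns the yield table y and the table r of interaction + error terms.
    """
    rng = random.Random(seed)
    # One random environment effect per environment, shared by all genotypes.
    beta = [rng.gauss(0.0, sd_env) for _ in range(n_env)]
    y, r = [], []
    for i, a in enumerate(alpha):
        # Interaction and error enter the cell mean only through their sum.
        r_row = [rng.gauss(0.0, sd_inter[i]) + rng.gauss(0.0, sd_err)
                 for _ in range(n_env)]
        r.append(r_row)
        y.append([mu + a + beta[j] + r_row[j] for j in range(n_env)])
    return y, r


# Hypothetical parameter values, chosen only for illustration.
alpha = [0.5, 0.0, -0.5]      # fixed genotype effects
sd_inter = [0.0, 1.0, 2.0]    # genotype-specific interaction std. deviations
sd_err = 1.0                  # common error std. deviation

y, r = simulate_yields(mu=60.0, alpha=alpha, sd_env=3.0,
                       sd_inter=sd_inter, sd_err=sd_err, n_env=20000)

for i, row in enumerate(r):
    expected = sd_inter[i] ** 2 + sd_err ** 2
    observed = statistics.pvariance(row, mu=0.0)
    print(f"genotype {i + 1}: var(r) = {observed:.2f}, "
          f"interaction + error variance = {expected:.2f}")
```

The very large number of environments is used only so that the empirical variances settle close to their expected values; real trials have far fewer environments. Because the error variance is the same for every genotype in this sketch, ordering the genotypes by the variance of the combined term gives the same order as ordering them by interaction variance.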
$\sigma_i^2$ is the stability variance introduced by Shukla (1972). We see that $\sigma_i^2 = \sigma_i'^{2} + \sigma_e^2$. So if the assumption of homogeneous error variance is correct, the genotype rank order given by the stability variance $\sigma_i^2$ will exactly equal that given by the interaction variance $\sigma_i'^{2}$. The most stable genotype, i.e. the genotype with the smallest interaction variance $\sigma_i'^{2}$, will then also have the smallest stability variance $\sigma_i^2$, the genotype with the second smallest $\sigma_i'^{2}$ will have the second smallest $\sigma_i^2$, and so on.

Shukla (1972) gave the following estimate of the stability variance:

$$\hat{\sigma}_i^2 = \frac{K}{(K-2)(N-1)}\, W_i - \frac{\sum_{s=1}^{K} W_s}{(K-1)(K-2)(N-1)},$$

where $W_i = \sum_j (y_{ij} - \bar{y}_{i\cdot} - \bar{y}_{\cdot j} + \bar{y}_{\cdot\cdot})^2$, with $\bar{y}_{i\cdot} = \sum_j y_{ij}/N$, $\bar{y}_{\cdot j} = \sum_i y_{ij}/K$ and $\bar{y}_{\cdot\cdot} = \sum_i \sum_j y_{ij}/KN$.

This estimate can be shown to be identical to Grubbs' estimate of the variance in errors of measurement (Piepho, 1992). For the first genotype Grubbs' estimate is given by:

$$\hat{\sigma}_1^2 = (K-1)^{-1}\left[\sum_{r=2}^{K} V_{1-r} - (K-2)^{-1} \sum_{1<s<r} V_{s-r}\right],$$

where $V_{s-r} = \sum_j (x_{srj} - \bar{x}_{sr\cdot})^2/(N-1)$ with $x_{srj} = y_{sj} - y_{rj}$ (Grubbs, 1948). Estimates for the other genotypes are obtained by an obvious rotation of subscripts.

Grubbs' estimate is based on the method of moments, in which sample moments are equated to population moments (Jaech, 1985). We have

$$E[V_{s-r}] = \sigma_s^2 + \sigma_r^2.$$

There are $K(K-1)/2$ different equations in $K$ unknowns. In matrix notation,

$$\mathbf{V} = \mathbf{Q}\boldsymbol{\sigma}, \tag{2}$$

where $\boldsymbol{\sigma}$ is a $K$ dimensional vector of $\sigma_i^2$, $\mathbf{V}$ is a $K(K-1)/2$ dimensional vector of $V_{s-r}$'s, and $\mathbf{Q}$ is a $K(K-1)/2 \times K$ matrix with elements 0 and 1 that picks the appropriate $\sigma^2$'s. $\mathbf{Q}'\mathbf{Q}$ has full rank and can thus be inverted. The solution of eqn 2 is

$$\hat{\boldsymbol{\sigma}} = (\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'\mathbf{V}. \tag{3}$$

Grubbs' estimates are unbiased, which is seen by taking expectations on both sides of eqn 3 and inserting eqn 2 on the right hand side:

$$E[\hat{\boldsymbol{\sigma}}] = E[(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'\mathbf{V}] = E[(\mathbf{Q}'\mathbf{Q})^{-1}\mathbf{Q}'\mathbf{Q}\boldsymbol{\sigma}] = E[\mathbf{I}\boldsymbol{\sigma}] = \boldsymbol{\sigma}.$$

The method of moments may also be employed when some data are missing. For two genotypes $s$ and $r$ we can compute $V_{s-r}$ as long as they are grown together in at least two environments. If this is the case we will say that the two genotypes $s$ and $r$ are connected. To obtain a unique solution of eqn 2, we require that there be at least $K$ connected pairs of genotypes, as we need at least as many equations as there are unknowns. Also, each genotype must be connected to at least one other genotype. Thus, the method may break down in some instances when very many data are missing.

In the unbalanced case, another estimate of $\boldsymbol{\sigma}$ can be obtained by the MINQUE principle of estimation (Rao, 1970). MINQUE stands for MInimum Norm Quadratic Unbiased Estimation or Estimator, depending on context. It is noted that for balanced data Shukla's estimator is a MINQUE of $\boldsymbol{\sigma}$ (Shukla, 1972). Rao (1970) provides a computational procedure for MINQUE in the general case, which can be used in data sets with empty cells. We write the linear model for the two-way classification in the form

$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{r}, \tag{4}$$

where $\mathbf{Y}$ is the vector of observations $y_{ij}$, $\mathbf{b}$ is the parameter vector of main effects, $\mathbf{r}$ is a vector of $r$-effects and $\mathbf{X}$ is the design matrix. Denote by $n$ the number of filled cells, by $\mathbf{M} = \{m_{pq}\}$ $(p, q = 1, \dots, n)$ the projection matrix $\mathbf{I} - \mathbf{X}(\mathbf{X}'\mathbf{X})^{-}\mathbf{X}'$, by (the vector of squares of (...truncated)