Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an application to beef cattle breeding
Original article
M San Cristobal
JL Foulley
E Manfredi
1 INRA, Station de Génétique Quantitative et Appliquée, 78352 Jouy-en-Josas Cedex;
2 INRA, Station d'Amélioration Génétique des Animaux, BP 27, 31326 Castanet-Tolosan Cedex, France
(Received 28 April 1992; accepted 23 September 1992)
Summary - A statistical method for identifying meaningful sources of heterogeneity of
residual and genetic variances in mixed linear Gaussian models is presented. The method is
based on a structural linear model for log variances. Inference about dispersion parameters
is based on the marginal likelihood after integrating out location parameters. A likelihood
ratio test using the marginal likelihood is also proposed to test for hypotheses about sources
of variation involved. A Bayesian extension of the estimation procedure of the dispersion
parameters is presented which consists of determining the mode of their marginal posterior
distribution using log inverted chi-square or Gaussian distributions as priors. Procedures
presented in the paper are illustrated with the analysis of muscle development scores
at weaning of 8575 progeny of 142 sires in the Maine-Anjou breed. In this analysis,
heteroskedasticity is found, both for the sire and residual components of variance.
heteroskedasticity / mixed linear model / Bayesian technique
Résumé - Inférence sur une hétérogénéité multiplicative des composantes de la
variance dans un modèle linéaire mixte gaussien: application à la sélection des
bovins à viande. Une méthode statistique est présentée, capable d’identifier les sources
significatives d’hétérogénéité de variances résiduelles et génétiques dans un modèle linéaire
mixte gaussien. La méthode est fondée sur un modèle structurel de décomposition du
logarithme des variances. L’inférence concernant les paramètres de dispersion est basée
sur la vraisemblance marginale obtenue après intégration des paramètres de position. Un
test du rapport des vraisemblances utilisant la vraisemblance marginale est aussi proposé
afin de tester des hypothèses sur différentes sources de variation. Une extension bayésienne
de la procédure d’estimation des paramètres de dispersion est présentée; elle consiste en
la maximisation de leur distribution marginale a posteriori, pour des distributions a priori
log χ² inverse ou gaussienne. Les procédures présentées dans ce papier sont illustrées par
l’analyse de notes de pointages sur le développement musculaire au sevrage de 8 575 jeunes
veaux de race Maine-Anjou, issus de 142 pères. Dans cette analyse, une hétéroscédasticité
a été trouvée sur les composantes père et résiduelle de la variance.
hétéroscédasticité / modèles linéaires mixtes / techniques bayésiennes
* Correspondence and reprints
** Present address: Laboratoire de génétique cellulaire, BP 27, 31326 Castanet Tolosan Cedex
INTRODUCTION
One of the main concerns of quantitative geneticists lies in the evaluation of individuals
for selection. The statistical framework for this purpose is nowadays the mixed linear
model (Searle, 1971), usually under the assumptions of normality and homogeneity
of variances. The location parameters are estimated by BLUE-BLUP (Best Linear Unbiased Estimation-Prediction), leading to the well-known
Mixed Model Equations (MME) of Henderson (1973), and REML (REstricted, or REsidual, Maximum Likelihood) has become the method of
choice for estimating variance components (Patterson and Thompson, 1971).
However, heterogeneous variances are often encountered in practice, eg for milk
yield in cattle (Hill et al, 1983; Meinert et al, 1988; Dong and Mao, 1990; Visscher
et al, 1991; Weigel, 1992), for meat traits in swine (Tholen, 1990) and for growth
performance in beef cattle (Garrick et al, 1989). This heterogeneity of variances,
also called heteroskedasticity (McCullogh, 1985), can be due to many factors, eg
management level, genotype × environment interactions, segregating major genes and
preferential treatments (Visscher et al, 1991).
Ignoring heterogeneity of variance may reduce the reliability of ranking and
selection procedures although, in cattle for instance, dam evaluation is likely to
be more affected than sire evaluation (Hill, 1984; Vinson, 1987; Winkelman and
Schaeffer, 1988).
To overcome this problem, 3 main alternatives are possible. First, a transformation of data can be performed in order to match the usual assumption of homogeneity of variance. A log transformation was proposed by several authors in
quantitative genetics (see eg Everett and Keown, 1984; De Veer and Van Vleck,
1987; Short et al, 1990, for milk production traits in cattle). However, while genetic variances tend to stabilize, residual variances of log-transformed records are
larger in herds with the lowest production level (De Veer and Van Vleck, 1987;
Boldman and Freeman, 1990; Visscher et al, 1991).
The second alternative is to develop robust methods which are insensitive to
moderate heteroskedasticity (Brown, 1982).
The third alternative is to take heteroskedasticity into account. Factors (eg region,
herd, year, parity, sex) to adjust for heterogeneous variances can be identified. But
such a stratification generates a very large number of cells (800 000 levels of herd
× year in the French Holstein file) with obvious problems of estimability. Hence,
it is logical to handle unequal variances in the same way as unequal means, ie
via a modelling (or structural) approach, so as to reduce the parameter space by
appropriate identification and testing of meaningful sources of variation of such
variances.
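To make the parameter-space argument concrete, the following is a minimal sketch, assuming a purely additive herd + year structure on the log variances. The factor names, level counts and coefficient values are hypothetical and chosen only for illustration; they are not taken from the data analysed in this paper. A fully stratified model needs one variance per herd × year cell, whereas the log-linear structure needs only an intercept plus one contrast per non-reference level, and the exponential keeps every variance positive.

```python
import math

def n_cells(n_herds, n_years):
    """Number of variances in a fully stratified herd x year model."""
    return n_herds * n_years

def n_loglinear_params(n_herds, n_years):
    """Intercept plus one contrast per non-reference herd and year level."""
    return 1 + (n_herds - 1) + (n_years - 1)

def cell_variance(intercept, herd_effect, year_effect):
    """Variance of one cell under an additive model for the log variance."""
    return math.exp(intercept + herd_effect + year_effect)

# Hypothetical dispersion parameters (reference levels have effect 0.0).
intercept = math.log(4.0)
herd_effects = {"herd_1": 0.0, "herd_2": 0.3, "herd_3": -0.2}
year_effects = {"year_1": 0.0, "year_2": 0.1}

# One variance per cell, all generated from only four parameters.
variances = {
    (h, y): cell_variance(intercept, g_h, g_y)
    for h, g_h in herd_effects.items()
    for y, g_y in year_effects.items()
}

print(n_cells(3, 2), n_loglinear_params(3, 2))
```

Because the model is additive on the log scale, each factor acts multiplicatively on the variance: moving from year_1 to year_2 scales the variance by exp(0.1) in every herd. As a purely made-up order of magnitude, 40 000 herds and 20 years would give 800 000 cells but only 40 019 parameters under this additive structure.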
The model for the variance components is described in the Model section. Model
fitting and estimation of parameters based on marginal likelihood procedures are
presented in Estimation of Parameters, followed by a test statistic in Hypothesis
Testing. A Bayesian alternative to maximum marginal likelihood estimation is
presented in A Bayesian Approach to a Mixed Model Structure. In the Numerical
Application section, data on French beef cattle are analyzed to illustrate the
procedures given in the paper. Finally, some comments on the methodology are
made in the Discussion and Conclusion.
MODEL
Following Foulley et al (1990, 1992) and Gianola et al (1992), the population is
assumed to be stratified into $I$ subpopulations, or strata (indexed by $i = 1, 2, \ldots, I$),
with an $(n_i \times 1)$ data vector $y_i$ sampled from a normal distribution having mean
$\eta_i$ and variance $R_i = \sigma^2_{e_i} I_{n_i}$. Given $\eta_i$ and $R_i$,
$$y_i \mid \eta_i, R_i \sim N(\eta_i, R_i).$$
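The sampling model for the strata can be illustrated with a small simulation. This is only a sketch under stated assumptions: the number of strata, the stratum sizes, the constant within-stratum means and the residual variances are all hypothetical, and the names `eta` and `sigma2_e` simply stand for the stratum mean vector and residual variance.

```python
import math
import random
import statistics

def simulate_stratum(eta, sigma2_e, rng):
    """Draw one data vector whose elements are independent normal variables
    with means given by eta and common variance sigma2_e."""
    sd = math.sqrt(sigma2_e)
    return [rng.gauss(mean, sd) for mean in eta]

# Hypothetical setup: three strata with different residual variances.
rng = random.Random(0)
strata = [
    {"n": 5000, "mean": 10.0, "sigma2_e": 1.0},
    {"n": 5000, "mean": 10.0, "sigma2_e": 4.0},
    {"n": 5000, "mean": 12.0, "sigma2_e": 9.0},
]

# One data vector per stratum; the mean vector is constant within a stratum
# here only to keep the example short.
data = [
    simulate_stratum([s["mean"]] * s["n"], s["sigma2_e"], rng)
    for s in strata
]

# Within-stratum sample variances should be close to the values used above.
sample_variances = [statistics.variance(y) for y in data]
for s, v in zip(strata, sample_variances):
    print(f"true sigma2_e = {s['sigma2_e']:.1f}, sample variance = {v:.2f}")
```

Comparing the printed sample variances across strata shows the kind of heteroskedasticity the model is meant to capture: the records share a distributional form but not a common residual variance.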
Following Henderson (1973), the vector $\eta_i$ is decomposed according to a linear
mixed model structure:
$$\eta_i = X_i \beta + Z_i u_i$$
where $X_i$ and $Z_i$ are $(n_i \times p)$ and $(n_i \times q)$ incidence matrices, corresponding to fixed
$\beta$ $(p \times 1)$ and random $u_i$ $(q \times 1)$ effects respectively. Fixed effects can be factors or
covariates, but it is assumed in the fol (...truncated)