Inference about multiplicative heteroskedastic components of variance in a mixed linear Gaussian model with an application to beef cattle breeding

http://www.gsejournal.org/content/pdf/1297-9686-25-1-3.pdf

Original article

M San Cristobal, JL Foulley, E Manfredi

1 INRA, Station de Génétique Quantitative et Appliquée, 78352 Jouy-en-Josas Cedex;
2 INRA, Station d'Amélioration Génétique des Animaux, BP 27, 31326 Castanet-Tolosan Cedex, France

* Correspondence and reprints
** Present address: Laboratoire de génétique cellulaire, BP 27, 31326 Castanet Tolosan Cedex

(Received 28 April 1992; accepted 23 September 1992)

Summary - A statistical method for identifying meaningful sources of heterogeneity of residual and genetic variances in mixed linear Gaussian models is presented. The method is based on a structural linear model for log variances. Inference about dispersion parameters is based on the marginal likelihood after integrating out location parameters. A likelihood ratio test using the marginal likelihood is also proposed to test hypotheses about the sources of variation involved. A Bayesian extension of the estimation procedure for the dispersion parameters is presented, which consists of determining the mode of their marginal posterior distribution using log inverted chi-square or Gaussian distributions as priors. The procedures presented in the paper are illustrated with the analysis of muscle development scores at weaning of 8575 progeny of 142 sires in the Maine-Anjou breed. In this analysis, heteroskedasticity is found for both the sire and residual components of variance.

heteroskedasticity / mixed linear model / Bayesian technique

INTRODUCTION

One of the main concerns of quantitative geneticists lies in the evaluation of individuals for selection. The statistical framework for this is nowadays the mixed linear model (Searle, 1971), usually under the assumptions of normality and homogeneity of variances. The estimation of the location parameters is performed with BLUE-BLUP (Best Linear Unbiased Estimation-Prediction), leading to the well-known Mixed Model Equations (MME) of Henderson (1973), and REML (REstricted, or REsidual, Maximum Likelihood) turns out to be the method of choice for estimating variance components (Patterson and Thompson, 1971).

However, heterogeneous variances are often encountered in practice, eg for milk yield in cattle (Hill et al, 1983; Meinert et al, 1988; Dong and Mao, 1990; Visscher et al, 1991; Weigel, 1992), for meat traits in swine (Tholen, 1990) and for growth performance in beef cattle (Garrick et al, 1989).
This heterogeneity of variances, also called heteroskedasticity (McCullogh, 1985), can be due to many factors, eg management level, genotype × environment interactions, segregating major genes, preferential treatments (Visscher et al, 1991). Ignoring heterogeneity of variance may reduce the reliability of ranking and selection procedures although, in cattle for instance, dam evaluation is likely to be more affected than sire evaluation (Hill, 1984; Vinson, 1987; Winkelman and Schaeffer, 1988).

To overcome this problem, 3 main alternatives are possible. First, a transformation of data can be performed in order to match the usual assumption of homogeneity of variance. A log transformation was proposed by several authors in quantitative genetics (see eg Everett and Keown, 1984; De Veer and Van Vleck, 1987; Short et al, 1990, for milk production traits in cattle). However, while genetic variances tend to stabilize, residual variances of log-transformed records are larger in herds with the lowest production level (De Veer and Van Vleck, 1987; Boldman and Freeman, 1990; Visscher et al, 1991).

The second alternative is to develop robust methods which are insensitive to moderate heteroskedasticity (Brown, 1982).

The last choice is to take heteroskedasticity into account. Factors (eg region, herd, year, parity, sex) to adjust for heterogeneous variances can be identified. But such a stratification generates a very large number of cells (800 000 levels of herd × year in the French Holstein file) with obvious problems of estimability. Hence, it is logical to handle unequal variances in the same way as unequal means, ie via a modelling (or structural) approach, so as to reduce the parameter space by appropriate identification and testing of meaningful sources of variation of such variances.

The model for the variance components is described in the Model section.
Model fitting and estimation of parameters based on marginal likelihood procedures are presented in the Estimation of Parameters section, followed by a test statistic in Hypothesis Testing. A Bayesian alternative to maximum marginal likelihood estimation is presented in A Bayesian Approach to a Mixed Model Structure. In the Numerical Application section, data on French beef cattle are analyzed to illustrate the procedures given in the paper. Finally, some comments on the methodology are made in the Discussion and Conclusion.

MODEL

Following Foulley et al (1990, 1992) and Gianola et al (1992), the population is assumed to be stratified into $I$ subpopulations, or strata (indexed by $i = 1, 2, \ldots, I$), with an $(n_i \times 1)$ data vector $\mathbf{y}_i$ sampled from a normal distribution having mean $\boldsymbol{\mu}_i$ and variance $\mathbf{R}_i = \sigma^2_{e_i} \mathbf{I}_{n_i}$. Given $\boldsymbol{\mu}_i$ and $\mathbf{R}_i$,

$$\mathbf{y}_i \mid \boldsymbol{\mu}_i, \mathbf{R}_i \sim N(\boldsymbol{\mu}_i, \mathbf{R}_i).$$

Following Henderson (1973), the vector $\boldsymbol{\mu}_i$ is decomposed according to a linear mixed model structure:

$$\boldsymbol{\mu}_i = \mathbf{X}_i \boldsymbol{\beta} + \mathbf{Z}_i \mathbf{u}_i,$$

where $\mathbf{X}_i$ and $\mathbf{Z}_i$ are $(n_i \times p)$ and $(n_i \times q_i)$ incidence matrices, corresponding to fixed $\boldsymbol{\beta}$ $(p \times 1)$ and random $\mathbf{u}_i$ $(q_i \times 1)$ effects respectively. Fixed effects can be factors or covariates, but it is assumed in the fol (...truncated)
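
As a numerical illustration of the stratified mixed model described above, the following Python sketch simulates data in which each stratum has its own residual and sire variance. It is only a sketch under stated assumptions, not the authors' procedure: the function name `simulate_strata`, the design matrix `P`, the coefficients `gamma_e` and `gamma_u`, the use of a common standardized sire vector scaled per stratum, and all numerical values are hypothetical. The log-linear form of the variances follows the Summary's description of "a structural linear model for log variances".

```python
import numpy as np


def simulate_strata(n_per_stratum, beta, log_var_e, log_var_u, n_sires, seed=0):
    """Simulate y_i = X_i beta + Z_i u_i + e_i for each stratum i.

    Assumption (not taken from the text): the random effects of stratum i are
    a common standardized sire vector scaled by that stratum's sire std dev.
    """
    rng = np.random.default_rng(seed)
    u_star = rng.standard_normal(n_sires)  # standardized sire effects
    strata = []
    for n_i, lve, lvu in zip(n_per_stratum, log_var_e, log_var_u):
        # Fixed-effect design: intercept plus one covariate (illustrative).
        X_i = np.column_stack([np.ones(n_i), rng.standard_normal(n_i)])
        # Incidence matrix: each record is assigned to exactly one sire.
        sire = rng.integers(0, n_sires, size=n_i)
        Z_i = np.zeros((n_i, n_sires))
        Z_i[np.arange(n_i), sire] = 1.0
        u_i = np.exp(0.5 * lvu) * u_star                    # sd_u_i * u_star
        e_i = np.exp(0.5 * lve) * rng.standard_normal(n_i)  # R_i = var_e_i * I
        y_i = X_i @ beta + Z_i @ u_i + e_i
        strata.append({"y": y_i, "X": X_i, "Z": Z_i, "var_e": np.exp(lve)})
    return strata


# Hypothetical setting: I = 3 strata, log variances linear in a stratum covariate.
P = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # (I x 2) design for log variances
gamma_e = np.array([0.0, 0.5])    # made-up coefficients, residual log variances
gamma_u = np.array([-2.0, 0.25])  # made-up coefficients, sire log variances
strata = simulate_strata([200, 150, 100], np.array([10.0, 1.0]),
                         P @ gamma_e, P @ gamma_u, n_sires=20)
```

Working on the log scale keeps every variance positive for any real-valued coefficients, and a few coefficients can describe many strata, which is the reduction of the parameter space argued for in the Introduction.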