Likelihood inferences in animal breeding under selection: a missing-data theory view point

http://www.gsejournal.org/content/pdf/1297-9686-21-4-399.pdf

R.L. Fernando, D. Gianola

University of Illinois; Institut National de la Recherche Agronomique, laboratoire de biométrie, BP 27, 31326 Castanet-Tolosan, France

The Editorial Board here introduces a new kind of scientific report in the Journal, whereby a current field of research and debate is given emphasis, being the subject of an open discussion within these columns. As a first essay, we propose a discussion about a difficult and somewhat troublesome question in applied animal genetics: how should one properly account for the fact that the observed data are selected data? Several attempts have been made in the past 15 years, without any clear and unanimous solution. In the following, Im, Fernando and Gianola propose a general approach that should make it possible to deal with every problem. In addition to the interest of an original article, we hope that their own discussion and response to the comments given by Henderson and Thompson will provide the reader with a sound insight into this complex topic.

Summary - Likelihood-based methods of inference are developed, making explicit in their calculation the process, due to selection, that induces the missing data. The conditions under which selection can be ignored, so that only the likelihood of the data actually collected need be considered, are discussed.

animal genetics - selection - missing data - likelihood

Data available in animal breeding often come from populations undergoing selection. Several authors have considered methods for the proper treatment of data subject to selection in animal breeding. Examples are Henderson et al. (1959), Curnow (1961), Thompson (1973), Henderson (1975), Rothschild et al. (1979), Goffinet (1983), Meyer and Thompson (1984), Fernando and Gianola (1989), and Schaeffer (1987). Data subject to selection can be viewed as data with missing values, selection being the process that causes missing data.
The statistical literature discusses missing data that arise intentionally. Rubin (1976) has given a mathematically precise treatment which encompasses frequentist approaches that are not based on likelihoods as well as inferences from likelihoods (including maximum likelihood and Bayesian approaches). Whether it is appropriate to ignore the process that causes the missing data depends on the method of inference and on the process that causes the missing values. Rubin (1976) suggested that in many practical problems, inferences based on likelihoods are less sensitive than sampling distribution inferences to the process that causes missing data. Goffinet (1987) gave alternative conditions to those of Rubin (1976) for ignoring the process that causes missing data when making sampling distribution inferences, with an application to animal breeding.

The objective of this paper is to consider inferences based on likelihoods derived from statistical models for the data and the missing-data process, in the analysis of data from populations undergoing selection. As in Little and Rubin (1987), we consider inferences based on likelihoods, in the sense described above, because of their flexibility and avoidance of ad hoc methods. Assumptions underlying the resulting methods can be displayed and evaluated, and large-sample estimates of variances, based on second derivatives of the log-likelihood taking into account the missing-data process, can be obtained.

MODELING THE MISSING-DATA PROCESS

Ideas described by Little and Rubin (1987) are employed in subsequent developments. Let $y$, the realized value of a random vector $Y$, denote the data that would occur in the absence of missing values, or complete data. The vector $y$ is partitioned into observed values, $y_{obs}$, and missing values, $y_{mis}$. Let

$$f(y \mid \theta) = f(y_{obs}, y_{mis} \mid \theta) \qquad (1)$$

be the probability density function of the joint distribution of $Y = (Y_{obs}, Y_{mis})$, and let $\theta$ be an unknown parameter vector.
We define for each component of $Y$ an indicator variable, $R_i$ (with realized value $r_i$), taking the value 1 if the component is observed and 0 if it is missing. In order to illustrate the notation, 3 types of missing data are described in Table I. Consider 2 correlated traits measured on $n$ unrelated individuals; for example, first and second lactation yields of $n$ cows. The complete data are $y = (y_{ij})$, where $y_{ij}$ is the realized value of trait $j$ in individual $i$ ($j = 1, 2$; $i = 1, \ldots, n$). Suppose that selection acts on the first trait (case (a) in Table I). As a result, a subset of $y$, $y_{obs}$, becomes available for analysis. The pattern of the available data is a random variable. For example, if the better of two cows ($n = 2$) is selected to have a second lactation, the complete data would be $y = (y_{11}, y_{12}, y_{21}, y_{22})$, and the second-lactation record is observed only for the cow with the higher first-lactation yield, so the value of $r$ depends on the data.

Thus, in analysis of selected data, the pattern of records available for analysis, characterized by the value of $r$, should be considered as part of the data. If this is not done, there will be a loss of information. To treat $R = (R_i)$ as a random variable, we need to specify the conditional probability that $R = r$, $f(r \mid y, \phi)$, given the complete data $Y = y$; the vector $\phi$ is an unknown parameter of this conditional distribution. The joint density of $Y$ and $R$ is then

$$f(y, r \mid \theta, \phi) = f(y \mid \theta)\, f(r \mid y, \phi) \qquad (2)$$

The likelihood ignoring the missing-data process, or marginal density of $y_{obs}$ in the absence of selection, is obtained by integrating out the missing data $y_{mis}$ from equ.(1):

$$f(y_{obs} \mid \theta) = \int f(y_{obs}, y_{mis} \mid \theta)\, dy_{mis} \qquad (3)$$

The problem with using $f(y_{obs} \mid \theta)$ as a basis for inferences is that it does not take into account the selection process. The information about $R$, a random variable whose value $r$ is also observed, is ignored. The actual likelihood is

$$f(y_{obs}, r \mid \theta, \phi) = \int f(y_{obs}, y_{mis} \mid \theta)\, f(r \mid y_{obs}, y_{mis}, \phi)\, dy_{mis} \qquad (4)$$

The question now arises as to when inferences on $\theta$ should be based on the joint likelihood (equ.(4)), and when they can be based on equ.(3), which ignores the missing-data process. Rubin (1976) has studied conditions under which inferences from equ.(3) are equivalent to those obtained from equ.(4). If these hold, one can say that the missing-data process can be ignored.
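The indicator notation for case (a) can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: the function name `simulate_selection`, the standardized normal traits, the correlation `rho`, and the rule "keep the top `m` cows on trait 1" are all assumptions made for illustration. It builds the complete data $y$, the indicator $r$, and the split into observed and missing values.

```python
import random


def simulate_selection(n, m, rho=0.5, seed=0):
    """Simulate two correlated traits on n unrelated cows and let the
    top m cows on trait 1 have a second record (hypothetical setup)."""
    rng = random.Random(seed)
    y = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)
        # Trait 2 is built to have correlation rho with trait 1 (assumed model).
        y2 = rho * y1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        y.append((y1, y2))

    # Selection acts on trait 1 only: rank cows and keep the best m.
    ranked = sorted(range(n), key=lambda i: y[i][0], reverse=True)
    selected = set(ranked[:m])

    # r[i] = (indicator for trait 1, indicator for trait 2) of cow i.
    r = [(1, 1 if i in selected else 0) for i in range(n)]

    # Observed data: trait 1 for every cow, trait 2 only where r says so.
    y_obs = [(y[i][0], y[i][1] if r[i][1] else None) for i in range(n)]
    # Missing data: second records of the unselected cows.
    y_mis = [y[i][1] for i in range(n) if not r[i][1]]
    return y, r, y_obs, y_mis


if __name__ == "__main__":
    y, r, y_obs, y_mis = simulate_selection(n=6, m=2)
    for (y1, y2), ri in zip(y_obs, r):
        print(ri, round(y1, 3), None if y2 is None else round(y2, 3))
```

Rerunning with a different `seed` changes which cows carry a second record, which is the sense in which the pattern $r$ is itself a random variable determined by the data.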
The conditions given by Rubin (1976) are:

1) the missing data are missing at random, ie, $f(r \mid y_{obs}, y_{mis}, \phi) = f(r \mid y_{obs}, \phi)$ for all $\phi$ and $y_{mis}$, evaluated at the observed values $r$ and $y_{obs}$; and

2) the parameters $\theta$ and $\phi$ are distinct, in the sense that the joint parameter space of $(\theta, \phi)$ is the product of the parameter space of $\theta$ and the parameter space of $\phi$.

Within the context of Bayesian inference, the missing-data process is ignorable when 1) the missing data are missing at random, and 2) the prior density of $\theta$ and $\phi$ is the product of the marginal prior density of $\theta$ and the marginal prior density of $\phi$.

IGNORABLE OR NON-IGNORABLE SELECTION

Without loss of generality, we examine ignorability of selection when making likelihood inferences about $\theta$ for each of the three examples given in Table I. Suppose individuals $1, 2, \ldots, m$ ($m < n$) are selected.

(a) Selection is based on observations on the first trait, which are a part of the observed data, so all the data used to make selection decisions are available. The likelihood for the observed data, ignoring selection, is

$$f(y_{obs} \mid \theta) = \prod_{i=1}^{m} f(y_{i1}, y_{i2} \mid \theta) \prod_{i=m+1}^{n} f(y_{i1} \mid \theta)$$

Because selection is based on the observed data only, the conditional probability $f(r \mid y, \phi)$ = f (...truncated)