Causal analysis of case-control data

Epidemiologic Perspectives & Innovations (BioMed Central)
Methodology, Open Access

Stephen C Newman*

Address: Department of Psychiatry, Mackenzie Health Sciences Centre, University of Alberta, Edmonton, Alberta, T6G 2B7, Canada
* Corresponding author

Received: 20 July 2005
Accepted: 27 January 2006
Published: 27 January 2006
Epidemiologic Perspectives & Innovations 2006, 3:2 doi:10.1186/1742-5573-3-2

This article is available from: http://www.epi-perspectives.com/content/3/1/2

© 2006 Newman; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

In a series of papers, Robins and colleagues describe inverse probability of treatment weighted (IPTW) estimation in marginal structural models (MSMs), a method of causal analysis of longitudinal data based on counterfactual principles. This family of statistical techniques is similar in concept to weighting of survey data, except that the weights are estimated using study data rather than defined so as to reflect sampling design and post-stratification to an external population. Several decades ago Miettinen described an elementary method of causal analysis of case-control data based on indirect standardization. In this paper we extend the Miettinen approach using ideas closely related to IPTW estimation in MSMs. The technique is illustrated using data from a case-control study of oral contraceptives and myocardial infarction.

Introduction

In a series of papers, Robins and colleagues describe inverse probability of treatment weighted (IPTW) estimation in marginal structural models (MSMs) [1-7], a method of causal analysis of longitudinal data based on counterfactual principles.
This family of statistical techniques is similar in concept to weighting of survey data, except that the weights are estimated using study data rather than defined so as to reflect sampling design and post-stratification to an external population. Several decades ago Miettinen [8] described an elementary method of causal analysis of case-control data based on indirect standardization. In this paper we extend the Miettinen approach using ideas closely related to IPTW estimation in MSMs. For simplicity we ignore random error until the illustrative example.

Population-based incidence case-control study

Consider a population-based case-control study having an incidence design, that is, one in which only incident cases are eligible for recruitment. Let $E$ be a dichotomous variable (0: absent, 1: present) representing the exposure of interest, and let $F$ be a polychotomous variable ($i = 0, 1, \ldots, I$), which we later treat as a confounder. At any time point we may think of the population as being composed of exposed and unexposed (sub)populations. Suppose that recruitment of cases and controls takes place over a period of $T$ years. We assume that during the period of recruitment the exposed and unexposed populations are stationary (i.e., independent of time) with respect to population size and incidence rate (of disease) in each of the strata of $F$ [9]. Provided that $T$ is not too large, say no more than two or three years, this assumption is likely to be approximately satisfied in practice.

Let $N_{1i}$ be the number of people in the $i$th stratum of the exposed population who are free of disease (at any time during the period of recruitment), and let $N_{0i}$ be the corresponding number in the $i$th stratum of the unexposed population. Let $N_1 = \sum_i N_{1i}$ and $N_0 = \sum_i N_{0i}$. Therefore, at any time during the period of recruitment, there are $N_1$ exposed and $N_0$ unexposed people in the population "at risk" of disease, hence eligible to be controls. Since the population is stationary, we may assume that controls are selected at the end of the period of recruitment. This avoids the inconvenience of having a control selected early in the study become a case later on.

In practice, controls are usually sampled throughout the period of recruitment, with one or more controls enrolled as each case enters the study. The case triggering this activity and the associated controls can be thought of as a matched set, where the matching variable is "time." This method of subject recruitment is a type of risk set sampling and, in theory, should be followed by a conditional statistical analysis [10]. Generally, matching on time is ignored in the analysis of case-control data, which in practical terms is not that different from making the stationary population assumption.

Let $R_{1i}$ and $R_{0i}$ be the incidence rates (of disease) in the $i$th stratum of the exposed and unexposed populations, respectively. The crude incidence rates are

$$R_1 = \frac{\sum_i R_{1i} N_{1i}}{N_1} \qquad (1)$$

and

$$R_0 = \frac{\sum_i R_{0i} N_{0i}}{N_0}.$$

The impact of exposure can be measured using the standardized morbidity ratio, which has different forms depending on the choice of standard population (...truncated)

and

$$\mathrm{SMR}_T = \frac{\sum_i R_{1i} (N_{1i} + N_{0i})}{\sum_i R_{0i} (N_{1i} + N_{0i})}.$$

We now view the population as an open (dynamic) cohort that is followed over the period of recruitment, with onset of disease as the endpoint of interest [12]. Entry into the cohort occurs, for example, as a result of birth and in-migration, and censoring takes place when, for instance, there is out-migration or death from a cause other than the disease of interest.

Simple random sampling

Assume that cases and controls are sampled using simple random sampling. Let $\gamma$ and $\lambda$ be the sampling probabilities for cases and controls, respectively; that is, $\gamma$ is the proportion of eligible cases enrolled in the study during the period of recruitment, and $\lambda$ is the corresponding proportion of controls. We assume that these are also the sampling probabilities within each of the strata of $E \times F$, the cross-classification of $E$ and $F$.

It follows from the stationary population assumption that over the period of recruitment the number of person-years experienced by individuals in the $i$th stratum who are exposed and at risk of disease is $N_{1i} T$. The corresponding number of (incident) cases is $R_{1i} N_{1i} T$, with $a_{1i} = \gamma R_{1i} N_{1i} T$ of them recruited into the study. Likewise, the number of cases recruited into the study among individuals in the $i$th stratum who are unexposed and at risk of disease is $a_{0i} = \gamma R_{0i} N_{0i} T$. In view of remarks made above, $b_{1i} = \lambda N_{1i}$ exposed and $b_{0i} = \lambda N_{0i}$ unexposed controls will be recruited into the study from the $i$th stratum. Table 1 summarizes these observations.

Table 1: Number of cases and controls in the $i$th stratum of $F$ under simple random sampling

            E = 1                                E = 0
Case        $a_{1i} = \gamma R_{1i} N_{1i} T$    $a_{0i} = \gamma R_{0i} N_{0i} T$
Control     $b_{1i} = \lambda N_{1i}$            $b_{0i} = \lambda N_{0i}$

It follows from Table 1 that
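
As an aside, the sampling model summarized in Table 1 can be explored numerically. The following is a minimal sketch, not the author's code: the `Stratum` class, the function names, and every number (stratum sizes, incidence rates, sampling probabilities, and recruitment period) are hypothetical values chosen for illustration. Random error is ignored, as in the text, so the "counts" are expected values and need not be integers. The sketch computes the Table 1 entries, the crude incidence rates, and the total-population standardized morbidity ratio, and checks one identity that can be read directly off the table entries: within a stratum, the sampling probabilities, the stratum sizes, and the recruitment period cancel, so the odds ratio of the sampled counts equals the ratio of the incidence rates.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """One stratum i of the confounder F (all values hypothetical)."""
    n1: float  # N_1i: disease-free exposed people
    n0: float  # N_0i: disease-free unexposed people
    r1: float  # R_1i: incidence rate per person-year, exposed
    r0: float  # R_0i: incidence rate per person-year, unexposed


def expected_counts(s, gamma, lam, t):
    """Table 1 entries for one stratum under simple random sampling."""
    return {
        "a1": gamma * s.r1 * s.n1 * t,  # exposed cases
        "a0": gamma * s.r0 * s.n0 * t,  # unexposed cases
        "b1": lam * s.n1,               # exposed controls
        "b0": lam * s.n0,               # unexposed controls
    }


def crude_rates(strata):
    """Crude incidence rates R_1 and R_0 (stratum rates weighted by size)."""
    n1 = sum(s.n1 for s in strata)
    n0 = sum(s.n0 for s in strata)
    r1 = sum(s.r1 * s.n1 for s in strata) / n1
    r0 = sum(s.r0 * s.n0 for s in strata) / n0
    return r1, r0


def smr_total(strata):
    """SMR_T: standardized morbidity ratio with the total population as standard."""
    num = sum(s.r1 * (s.n1 + s.n0) for s in strata)
    den = sum(s.r0 * (s.n1 + s.n0) for s in strata)
    return num / den


# Hypothetical two-stratum population and sampling scheme.
strata = [
    Stratum(n1=2000, n0=8000, r1=0.004, r0=0.002),
    Stratum(n1=5000, n0=5000, r1=0.010, r0=0.004),
]
GAMMA, LAM, T = 0.9, 0.01, 2.0  # case fraction, control fraction, years

for i, s in enumerate(strata):
    c = expected_counts(s, GAMMA, LAM, T)
    odds_ratio = (c["a1"] * c["b0"]) / (c["a0"] * c["b1"])
    print(f"stratum {i}: odds ratio {odds_ratio:.3f}, rate ratio {s.r1 / s.r0:.3f}")

r1, r0 = crude_rates(strata)
print(f"crude rate ratio {r1 / r0:.3f}, SMR_T {smr_total(strata):.3f}")
```

With these made-up inputs the crude rate ratio (about 2.99) differs from SMR_T (about 2.33) because exposure is unevenly distributed across the two strata, which is the kind of confounding by $F$ that standardization is meant to address.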