Causal analysis of case-control data
Epidemiologic Perspectives & Innovations
BioMed Central
Methodology
Open Access
Stephen C Newman*
Address: Department of Psychiatry, Mackenzie Health Sciences Centre, University of Alberta, Edmonton, Alberta, T6G 2B7, Canada
Email: Stephen C Newman* -
* Corresponding author
Published: 27 January 2006
Epidemiologic Perspectives & Innovations 2006, 3:2
doi:10.1186/1742-5573-3-2
Received: 20 July 2005
Accepted: 27 January 2006
This article is available from: http://www.epi-perspectives.com/content/3/1/2
© 2006 Newman; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
In a series of papers, Robins and colleagues describe inverse probability of treatment weighted
(IPTW) estimation in marginal structural models (MSMs), a method of causal analysis of longitudinal
data based on counterfactual principles. This family of statistical techniques is similar in concept to
weighting of survey data, except that the weights are estimated using study data rather than defined
so as to reflect sampling design and post-stratification to an external population. Several decades
ago Miettinen described an elementary method of causal analysis of case-control data based on
indirect standardization. In this paper we extend the Miettinen approach using ideas closely related
to IPTW estimation in MSMs. The technique is illustrated using data from a case-control study of
oral contraceptives and myocardial infarction.
Introduction
In a series of papers, Robins and colleagues describe
inverse probability of treatment weighted (IPTW) estimation in marginal structural models (MSMs) [1-7], a
method of causal analysis of longitudinal data based on
counterfactual principles. This family of statistical techniques is similar in concept to weighting of survey data,
except that weights are estimated using study data rather
than defined so as to reflect sampling design and poststratification to an external population. Several decades
ago Miettinen [8] described an elementary method of
causal analysis of case-control data based on indirect
standardization. In this paper we extend the Miettinen
approach using ideas closely related to IPTW estimation
in MSMs. For simplicity we ignore random error until the
illustrative example.
Population-based incidence case-control study
Consider a population-based case-control study having
an incidence design, that is, one in which only incident
cases are eligible for recruitment. Let E be a dichotomous variable (0: absent, 1: present) representing the exposure of interest, and let F be a polychotomous variable with categories (strata) i = 0, 1, ..., I, which we later treat as a confounder. At any time point we may think of the population as being composed of exposed and unexposed (sub)populations. Suppose
that recruitment of cases and controls takes place over a
period of T years. We assume that during the period of
recruitment the exposed and unexposed populations are
stationary (i.e., independent of time) with respect to population size and incidence rate (of disease) in each of the
strata of F [9]. Provided that T is not too large, say no more
than two or three years, this assumption is likely to be
approximately satisfied in practice.
Let N1i be the number of people in the ith stratum of the exposed population who are free of disease (at any time during the period of recruitment), and let N0i be the corresponding number in the ith stratum of the unexposed population. Let $N_1 = \sum_i N_{1i}$ and $N_0 = \sum_i N_{0i}$. Therefore at any time during the period of recruitment, there are N1 exposed and N0 unexposed people in the population "at risk" of disease, hence eligible to be controls. Since the population is stationary, we may assume that controls are selected at the end of the period of recruitment. This avoids the inconvenience of having a control selected early in the study become a case later on. In practice, controls are usually sampled throughout the period of recruitment, with one or more controls enrolled as each case enters the study. The case triggering this activity and the associated controls can be thought of as a matched set, where the matching variable is "time." This method of subject recruitment is a type of risk set sampling and, in theory, should be followed by a conditional statistical analysis [10]. Generally, matching on time is ignored in the analysis of case-control data, which in practical terms is not that different from making the stationary population assumption.

Table 1: Number of cases and controls in ith stratum of F under simple random sampling

|         | E = 1          | E = 0          |
|---------|----------------|----------------|
| Case    | a1i = γR1iN1iT | a0i = γR0iN0iT |
| Control | b1i = λN1i     | b0i = λN0i     |
Let R1i and R0i be the incidence rates (of disease) in the ith
stratum of the exposed and unexposed populations,
respectively. The crude incidence rates are

$$R_1 = \frac{\sum_i R_{1i} N_{1i}}{N_1} \tag{1}$$

and

$$R_0 = \frac{\sum_i R_{0i} N_{0i}}{N_0}.$$

and

$$\mathrm{SMR}_T = \frac{\sum_i R_{1i}\,(N_{1i} + N_{0i})}{\sum_i R_{0i}\,(N_{1i} + N_{0i})}.$$
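The crude rates and SMR_T are weighted sums over the strata of F, so they are straightforward to compute once stratum sizes and rates are known. The sketch below is a minimal Python illustration under that assumption; the function names `crude_rate` and `smr_total` are hypothetical, and all numbers are made up (not data from the study), chosen so that the stratum-specific rate ratio R1i/R0i equals 2 in both strata.

```python
from typing import Sequence


def crude_rate(rates: Sequence[float], sizes: Sequence[float]) -> float:
    """Population-weighted average of stratum rates, e.g. R1 = sum_i R1i*N1i / N1."""
    return sum(r * n for r, n in zip(rates, sizes)) / sum(sizes)


def smr_total(r1: Sequence[float], r0: Sequence[float],
              n1: Sequence[float], n0: Sequence[float]) -> float:
    """SMR_T: both sets of stratum rates applied to the total population N1i + N0i."""
    totals = [a + b for a, b in zip(n1, n0)]
    numerator = sum(r * t for r, t in zip(r1, totals))
    denominator = sum(r * t for r, t in zip(r0, totals))
    return numerator / denominator


# Hypothetical two-stratum population (strata i = 0, 1); not data from the study.
N1 = [2000.0, 8000.0]    # exposed people at risk
N0 = [18000.0, 12000.0]  # unexposed people at risk
R1 = [0.004, 0.012]      # exposed incidence rates per person-year
R0 = [0.002, 0.006]      # unexposed incidence rates per person-year

crude_1 = crude_rate(R1, N1)
crude_0 = crude_rate(R0, N0)
print(round(crude_1, 6), round(crude_0, 6))   # 0.0104 0.0036
print(round(crude_1 / crude_0, 4))            # 2.8889 (crude rate ratio)
print(round(smr_total(R1, R0, N1, N0), 6))    # 2.0
```

With these made-up values the crude rate ratio is about 2.89, while SMR_T is 2.0. The exposed are concentrated in the higher-risk stratum, so the crude comparison is confounded by F; applying both sets of stratum rates to the same (total) population removes that imbalance.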
We now view the population as an open (dynamic)
cohort that is followed over the period of recruitment,
with onset of disease as the endpoint of interest [12].
Entry into the cohort occurs, for example, as a result of
birth and in-migration, and censoring takes place when,
for instance, there is out-migration or death from a cause
other than the disease of interest.
Simple random sampling
Assume that cases and controls are sampled using simple
random sampling. Let γ and λ be the sampling probabilities for cases and controls, respectively; that is, γ is the proportion of eligible cases enrolled in the study during the
period of recruitment, and λ is the corresponding proportion of controls. We assume that these are also the sampling probabilities within each of the strata of E × F, the
cross-classification of E and F. It follows from the stationary population assumption that over the period of recruitment the number of person-years experienced by
individuals in the ith stratum who are exposed and at risk
of disease is N1iT. The corresponding number of (incident) cases is R1iN1iT, with a1i = γR1iN1iT of them recruited
into the study. Likewise, the number of cases recruited
into the study among individuals in the ith stratum who
are unexposed and at risk of disease is a0i = γR0iN0iT. In view of the remarks made above, b1i = λN1i exposed and b0i = λN0i unexposed controls will be recruited into the study from the ith stratum. Table 1 summarizes these observations.
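The entries of Table 1 can be turned directly into expected cell counts for a stratum. The following Python sketch is illustrative only: `StratumCells`, `expected_cells`, and `odds_ratio` are hypothetical names, and the sampling fractions γ and λ, the period T, and the stratum values are made-up numbers rather than study data. It also checks one property that can be read off the table entries: in the stratum odds ratio a1i·b0i / (a0i·b1i), the factors γ, λ, and T cancel, leaving R1i/R0i.

```python
from dataclasses import dataclass


@dataclass
class StratumCells:
    """Expected counts in stratum i of F (hypothetical container mirroring Table 1)."""
    a1: float  # exposed cases:      gamma * R1i * N1i * T
    a0: float  # unexposed cases:    gamma * R0i * N0i * T
    b1: float  # exposed controls:   lam * N1i
    b0: float  # unexposed controls: lam * N0i


def expected_cells(gamma: float, lam: float, T: float,
                   r1: float, r0: float, n1: float, n0: float) -> StratumCells:
    """Apply simple random sampling fractions to stratum person-time and size.

    `lam` stands for lambda, which is a reserved word in Python.
    """
    return StratumCells(
        a1=gamma * r1 * n1 * T,  # N1i*T person-years yield R1i*N1i*T incident cases
        a0=gamma * r0 * n0 * T,
        b1=lam * n1,             # controls are drawn from the N1i people at risk
        b0=lam * n0,
    )


def odds_ratio(c: StratumCells) -> float:
    """Stratum-specific odds ratio a1i*b0i / (a0i*b1i)."""
    return (c.a1 * c.b0) / (c.a0 * c.b1)


# Made-up values: 2-year recruitment, 90% of cases and 1% of non-cases sampled.
cells = expected_cells(gamma=0.9, lam=0.01, T=2.0,
                       r1=0.012, r0=0.006, n1=8000.0, n0=12000.0)
print(round(odds_ratio(cells), 6))  # 2.0, i.e. r1 / r0

# Different sampling fractions and period: the odds ratio does not change.
other = expected_cells(gamma=0.5, lam=0.2, T=3.0,
                       r1=0.012, r0=0.006, n1=8000.0, n0=12000.0)
print(round(odds_ratio(other), 6))  # 2.0
```

Varying `gamma`, `lam`, or `T` changes the cell counts but not the odds ratio, which is exactly the cancellation visible in the table entries.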
It follows from Table 1 that
The impact of exposure can be measured using the standardized morbidity ratio, which has different forms
depending on the choice of standard population (...truncated)