Testing the proportional hazards assumption in case-cohort analysis
Xue et al. BMC Medical Research Methodology 2013, 13:88
http://www.biomedcentral.com/1471-2288/13/88
RESEARCH ARTICLE
Open Access
Xiaonan Xue1*, Xianhong Xie1, Marc Gunter2, Thomas E Rohan1, Sylvia Wassertheil-Smoller1, Gloria YF Ho1,
Dominic Cirillo3, Herbert Yu4 and Howard D Strickler1
Abstract
Background: Case-cohort studies have become common in epidemiological studies of rare disease, with Cox
regression models the principal method used in their analysis. However, no appropriate procedures to assess the
assumption of proportional hazards of case-cohort Cox models have been proposed.
Methods: We extended the correlation test based on Schoenfeld residuals, an approach used to evaluate the
proportionality of hazards in standard Cox models. Specifically, pseudolikelihood functions were used to define
“case-cohort Schoenfeld residuals”, and then the correlation of these residuals with each of three functions of event
time (i.e., the event time itself, rank order, Kaplan-Meier estimates) was determined. The performance of the
proposed tests was examined using simulation studies. We then applied these methods to data from a previously
published case-cohort investigation of the insulin/IGF-axis and colorectal cancer.
Results: Simulation studies showed that each of the three correlation tests accurately detected non-proportionality.
Application of the proposed tests to the example case-cohort investigation dataset showed that the Cox
proportional hazards assumption was not satisfied for certain exposure variables in that study, an issue we
addressed through use of available, alternative analytical approaches.
Conclusions: The proposed correlation tests provide a simple and accurate approach for testing the proportional
hazards assumption of Cox models in case-cohort analysis. Evaluation of the proportional hazards assumption is
essential since its violation raises questions regarding the validity of Cox model results which, if unrecognized,
could result in the publication of erroneous scientific findings.
Keywords: Proportional hazards, Schoenfeld residuals, Case-cohort studies, Cox models
* Correspondence:
1 Department of Epidemiology and Population Health, Albert Einstein College of Medicine, New York, New York, USA
Full list of author information is available at the end of the article
Background
The case-cohort design is an efficient and increasingly popular
method for conducting prospective epidemiological studies
of rare outcomes. Compared with standard longitudinal
cohort studies, case-cohort investigations are typically less
costly, use fewer resources, and require less time to conduct,
yet they entail little loss in statistical power [1-3]. In
case-cohort studies, relevant but costly or difficult-to-obtain
information is collected for only a subset of subjects rather
than the entire cohort. Specifically, there are two subject
groups: (i) the subcohort - a random sample of all subjects
in the cohort with no history of the outcome of interest at
baseline, selected without regard to future outcomes, so that
the subcohort may include some individuals who later become cases; and (ii) the case group - all, or a random sample,
of the incident cases of disease, the vast majority of whom
will be from outside the subcohort. Furthermore, because
the subcohort is a representative sample of the entire
cohort without disease at enrollment, it is possible to adopt
case-cohort design to study multiple different types of
disease outcomes (e.g. multiple types of cancer) using
the same subcohort. For example, we present below a
recent prospective study of fasting serum insulin levels
and the risk of three cancer case groups which involved
a single subcohort [4-6].
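The two subject groups described above can be illustrated with a minimal Python sketch. It is not taken from the study itself: the function name `draw_case_cohort`, the `is_case` flag, and the 10% sampling fraction are hypothetical choices made only for illustration, and the sketch assumes that all incident cases are retained rather than a random sample of them.

```python
import random

def draw_case_cohort(cohort, subcohort_fraction, seed=0):
    """Pick the subjects who need the costly measurement.

    cohort: list of dicts with keys 'id' and 'is_case', where 'is_case'
    marks subjects who become incident cases during follow-up.
    Returns (subcohort_ids, case_ids, analysis_ids) as sets.
    """
    rng = random.Random(seed)
    ids = [subject["id"] for subject in cohort]
    n_sub = round(subcohort_fraction * len(ids))
    # Subcohort: simple random sample, drawn without regard to future outcome.
    subcohort_ids = set(rng.sample(ids, n_sub))
    # Case group: here, every incident case (assumption for this sketch).
    case_ids = {subject["id"] for subject in cohort if subject["is_case"]}
    # Measurements are needed for the union of the two groups.
    analysis_ids = subcohort_ids | case_ids
    return subcohort_ids, case_ids, analysis_ids

# Toy cohort: 1000 subjects, of whom 50 become cases.
toy_cohort = [{"id": i, "is_case": i % 20 == 0} for i in range(1000)]
sub, cases, analysis = draw_case_cohort(toy_cohort, 0.10)
overlap = sub & cases  # subcohort members who later become cases
```

Because the subcohort is drawn independently of outcome, the same `sub` set could be reused with a different `is_case` definition, which mirrors the use of one subcohort for several disease outcomes.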
© 2013 Xue et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Case-cohort studies are typically analyzed using Cox
proportional hazards (PH) models [7]. Specifically, estimation of the Cox proportional hazards model in a
case-cohort analysis is obtained by approximating each
instantaneous risk set of the entire cohort included in
the partial likelihood function of a standard Cox model
by a case-cohort risk set. Several approaches to define
a case-cohort risk set have been proposed [1,2,8]. Prentice
[1] defined the case-cohort risk set as the following: at the
instantaneous moment of an event the case-cohort risk
set includes the subject who had the event plus all the
subjects in the subcohort who remained in the study but
did not have the event at least until that exact time. The
Cox-type likelihood function that is conditioned on the
case-cohort risk sets is referred to as the pseudolikelihood
function [1]. Statistical inferences in case-cohort analyses
are then determined based on maximization of this
pseudolikelihood function. Because the Prentice approach
involves an exact pseudolikelihood function and, in large
samples, the two other well-established approaches [2,8]
provide results similar to those of the Prentice method, this paper
focuses exclusively on the Prentice method. Appropriate methods to
conduct these analyses are now available in standard
software such as SAS and R, which has helped to reduce
computational obstacles to the adoption of the case-cohort
design and has been a major factor in the growing use
of this cost-effective design.
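The Prentice risk set and the resulting pseudolikelihood can be sketched in a few lines of Python. This is an illustrative reading of the definition given above, not the authors' code: the function names are hypothetical, and the sketch assumes a single covariate, no tied event times, and follow-up that starts at time zero for everyone.

```python
import math

def prentice_risk_set(event_index, times, in_subcohort):
    """Indices in the case-cohort risk set at the event time of one subject."""
    t = times[event_index]
    # Subcohort members still under follow-up at time t.
    risk = {j for j in range(len(times)) if in_subcohort[j] and times[j] >= t}
    # The subject who has the event at t is always included,
    # even when that subject lies outside the subcohort.
    risk.add(event_index)
    return risk

def log_pseudolikelihood(beta, times, events, z, in_subcohort):
    """Cox-type log-likelihood evaluated over case-cohort risk sets."""
    total = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored subjects contribute only through risk sets
        risk = prentice_risk_set(i, times, in_subcohort)
        denom = sum(math.exp(beta * z[j]) for j in risk)
        total += beta * z[i] - math.log(denom)
    return total

# Toy data: subjects 0 and 4 are cases from outside the subcohort.
times = [2, 3, 5, 7, 8]
events = [1, 0, 1, 0, 1]
z = [1, 0, 1, 0, 1]
in_subcohort = [False, True, True, True, False]

rs_first = prentice_risk_set(0, times, in_subcohort)
ll_at_zero = log_pseudolikelihood(0.0, times, events, z, in_subcohort)
```

Note that a case from outside the subcohort enters a risk set only at its own event time, which is the feature that distinguishes this pseudolikelihood from the full-cohort partial likelihood.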
One of the key assumptions of the Cox model is the
proportional hazards assumption. Specifically,
the model assumes that each covariate has a multiplicative effect on the hazard function that is constant over
time. Whether the PH assumption holds is often of substantial importance. For example, in a randomized controlled trial, we
may wish to know whether one treatment is superior to
another uniformly over time or only in the short term.
Similarly, in observational studies, it is often important
to determine whether a factor is associated with a consistently higher or lower risk of the outcome over time.
For example, Bellera et al. [9] showed that the prognostic
relevance of tumor grade for breast cancer metastases
diminished over time, and that negative hormone receptor status
was associated with an increased risk of metastases early
but became protective thereafter.
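The assumption described above can be written out in standard textbook notation. The symbols used here (h_0 for the baseline hazard, Z for the covariate vector, β for the coefficients) are assumptions made for this sketch and need not match the notation used elsewhere in the paper.

```latex
% Cox proportional hazards model for covariate vector Z
h(t \mid Z) = h_0(t)\,\exp(\beta^{\top} Z)

% Hazard ratio for two covariate profiles Z_1 and Z_2: h_0(t) cancels,
% so the ratio does not depend on t
\frac{h(t \mid Z_1)}{h(t \mid Z_2)} = \exp\{\beta^{\top}(Z_1 - Z_2)\}

% Non-proportionality corresponds to a coefficient that varies with time
h(t \mid Z) = h_0(t)\,\exp\{\beta(t)^{\top} Z\}
```

Under the third form, an effect that diminishes over time corresponds to a component of β(t) shrinking toward zero, and an effect that reverses direction corresponds to a component that changes sign.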
Many approaches for assessing the PH assumption are
available for standard cohort studies, including both
graphical methods and statistical tests [10-19]. Graphical
approaches are a visual form of screening for non-proportionality and can provide insight into the temporality and extent of non-proportionality that is
otherwise difficult to obtain using statistical methods.
Conversely, graphical methods involve a moderate d (...truncated)