Secondary Analysis under Cohort Sampling Designs Using Conditional Likelihood

Hindawi Publishing Corporation
Journal of Probability and Statistics
Volume 2012, Article ID 931416, 37 pages
doi:10.1155/2012/931416

Research Article

Olli Saarela,1 Sangita Kulathinal,2,3 and Juha Karvanen4,5

1 Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, QC, Canada H3A 1A2
2 Indic Society for Education and Development (INSEED), Nashik, Maharashtra 422 011, India
3 Department of Vaccines, National Institute for Health and Welfare, 00271 Helsinki, Finland
4 Department of Mathematics and Statistics, University of Tampere, 33014 Tampere, Finland
5 Department of Mathematics and Statistics, University of Helsinki, 00014 Helsinki, Finland

Correspondence should be addressed to Olli Saarela

Received 28 July 2011; Revised 29 December 2011; Accepted 24 January 2012

Academic Editor: Kari Auranen

Copyright © 2012 Olli Saarela et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Under cohort sampling designs, additional covariate data are collected on cases of a specific type and a randomly selected subset of noncases, primarily for the purpose of studying associations with a time-to-event response of interest. With such data available, an interest may arise to reuse them for studying associations between the additional covariate data and a secondary non-time-to-event response variable, usually collected for the whole study cohort at the outset of the study. Following earlier literature, we refer to such a situation as secondary analysis. We outline a general conditional likelihood approach for secondary analysis under cohort sampling designs and discuss the specific situations of case-cohort and nested case-control designs.
We also review alternative methods based on full likelihood and inverse probability weighting. We compare the alternative methods for secondary analysis in two simulated settings and apply them in a real-data example.

1. Introduction

Cohort sampling designs are two-phase epidemiological study designs. In the first phase, information on time-to-event outcomes of interest over a followup period and some basic covariate data are collected on the whole study group, referred to as a cohort. In the second phase, more expensive or difficult-to-obtain additional covariate data are collected only on a subset of the study cohort. This subset usually comprises the cases, that is, individuals with a disease event of interest during the followup, and a randomly selected subset of noncases. Examples are the case-cohort [1–3] and nested case-control [4, 5] designs. Primarily, such designs are applied for the purpose of studying associations between the time-to-event outcomes and the covariates collected in the second phase. However, once such data have been collected, interest frequently arises in reusing them for studying associations between the second-phase covariates and the other available covariate data. For instance, the covariates collected in the second phase could be genotypes, while the other covariates may be various phenotype measurements carried out at the outset of the followup period for the whole cohort. The interest would then be to explain a phenotypic response with the genetic covariates. Following Jiang et al. [6] and Lin and Zeng [7], we refer to such a situation as secondary analysis. Here, we concentrate specifically on non-time-to-event secondary outcomes. Analysis of secondary time-to-event outcomes under the nested case-control design has been considered previously by Saarela et al. [8] and Salim et al. [9].
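The second-phase selection described above can be illustrated with a small simulation. The following is only a minimal sketch, assuming a case-cohort design in which the subcohort is drawn by independent Bernoulli sampling with a fixed fraction; the function names, the toy cohort, and the sampling fraction are hypothetical and are not taken from the designs in the cited studies, which may use stratified or fixed-size sampling instead.

```python
import random


def case_cohort_selection(is_case, subcohort_fraction, seed=0):
    """Return (in_subcohort, selected) indicator lists for a cohort.

    Assumption: each cohort member enters the subcohort independently
    with probability `subcohort_fraction` (Bernoulli sampling).
    All cases are selected regardless of subcohort membership.
    """
    rng = random.Random(seed)
    in_subcohort = [rng.random() < subcohort_fraction for _ in is_case]
    selected = [case or sub for case, sub in zip(is_case, in_subcohort)]
    return in_subcohort, selected


def selection_probability(case, subcohort_fraction):
    """Second-phase selection probability under the assumed design.

    Cases are always selected; noncases are selected only through
    the random subcohort.
    """
    return 1.0 if case else subcohort_fraction


# Hypothetical toy cohort: 1000 individuals, roughly 10% of them cases.
toy_rng = random.Random(1)
is_case = [toy_rng.random() < 0.1 for _ in range(1000)]

in_subcohort, selected = case_cohort_selection(is_case, subcohort_fraction=0.2)

# Second-phase covariates (for example, genotypes) would be measured
# only for individuals with selected[i] == True.
n_second_phase = sum(selected)
```

Under this assumed design the selection probabilities depend on case status, which is why an analysis of a secondary outcome restricted to the second-phase group generally has to account for the selection mechanism when the secondary outcome is associated with the primary event.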
As our motivating example, we consider here a single cohort which was used in a larger meta-analysis of the association between the European lactase persistence genotype and body mass index (BMI) [10], the latter being a secondary outcome in the cohort study in question. The cohort consists of 5073 men aged 55–77 years from southern and western Finland, who originally formed the placebo group of the ATBC cancer prevention study [11]. Whole blood samples of the participants were taken between 1992 and 1993, which is here considered as the baseline of the cohort, with followup for cardiovascular disease events and all-cause mortality available until the end of year 1999. There is no loss to followup, so the only censoring present is of type I due to the end of the followup period. This cohort is a part of the MORGAM project, an international pooling of cardiovascular cohorts [12]. Genotype data including the lactase persistence SNP rs4988235 have been collected under this project using a case-cohort design described in detail in [13] and herein in Section 4.3.1. Given such data, our aim is to estimate the association between the lactase persistence genotype and BMI, making use of genotype data collected on both the random subcohort and cases of all-cause mortality.

Secondary analysis of case-control data has been studied previously, using profile likelihood [14], inverse selection probability weighting methods [15–17], or retrospective likelihood [6, 7]. However, to the best of our knowledge, a systematic discussion of secondary analysis under cohort sampling designs has been lacking, which we aim to rectify here by discussing alternative approaches for such an analysis under a generic two-phase study design. We will briefly review the full likelihood approach, which utilizes all observed data (Section 2), as well as pseudolikelihoods based on inverse selection probability weighting (Section 3).
For these approaches, we propose a conditional likelihood-based alternative (Section 4), restricted to the fully observed second-phase study group. Conditional likelihood inference under cohort sampling designs has been studied previously for the analysis of the primary time-to-event outcome by Langholz and Goldstein [18] and Saarela and Kulathinal [19]; here, we extend these methods to the secondary analysis setting. The main interest is in continuous secondary outcomes, though the approach would also be valid for categorical responses. As special cases of the general setting, we consider case-cohort and nested case-control designs. As extensions to the basic setting, we consider treatment of missing second-phase covariate data and adjustment for left truncation in the case of incident time-to-event outcomes (Section 5). In Section 6, we present two simulation studies, first comparing the efficiencies of the alternative approaches and then demonstrating the potential adverse effects of small sampling fraction in full likelihood inference. We also carry out the analysis in the real-data example using all three alternative methods. As the model for the continuous secondary response variable, in addition to the customary normal distr (...truncated)