The Limitations due to Exposure Detection Limits for Regression Models

https://aje.oxfordjournals.org/content/163/4/374.full.pdf

Enrique F. Schisterman, Albert Vexler, Brian W. Whitcomb, and Aiyi Liu

From the Division of Epidemiology, Statistics, and Prevention Research, National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD.

National Institute of Child Health and Human Development, 6100 Executive Boulevard, Room 7B03, Rockville, MD 20852

Abbreviations: dl, detection limit; ND, "nondetects."

Biomarker use in exposure assessment is increasingly common, and consideration of related issues is of growing importance. Exposure quantification may be compromised when measurement is subject to a lower threshold. Statistical modeling of such data requires a decision regarding the handling of such readings. Various authors have considered this problem. In the context of linear regression analysis, Richardson and Ciampi (Am J Epidemiol 2003;157:355-63) proposed replacement of data below a threshold by a constant equal to the expectation for such data to yield unbiased estimates. Use of such an imputation has some limitations: distributional assumptions are required, and bias reduction in estimation of regression parameters is asymptotic, thereby presenting concerns about small studies. In this paper, the authors propose distribution-free methods for managing values below detection limits and evaluate the biases that may result when exposure measurement is constrained by a lower threshold. The authors utilize an analytical approach and a simulation study to assess the effects of the proposed replacement method on estimates. These results may inform decisions regarding analytical plans for future studies and provide a possible explanation for some of the discordance seen in extant literature.

bias (epidemiology); censored data; epidemiology, molecular; limit of detection; regression analysis

The growing use of biomarkers in exposure assessment suggests the need to address issues related to their measurement.
Even when levels are sufficient for measurement, some random exposure measurement error is expected, in part related to instrument precision. However, in many cases a proportion of study participants have levels at or below some experimentally determined detection limit (dl). Investigators are often interested in the risk of negative health outcomes associated with such levels. For example, studies of serum organochlorine levels, lipophilic xenobiotics, and breast cancer have determined that up to 99 percent of study participants have levels below the dl for some toxicants under study (1).

Biomarker quantification may be compromised if instrumentation cannot detect low levels. This may occur, for example, in quantitation by immunoassays (e.g., enzyme-linked immunosorbent assays) that require antigen concentrations sufficient for binding by antibodies. Highly specific binding conditions may impair antibody sensitivity and thereby challenge quantitation of low levels (2). Alternatively, assays may detect low biomarker levels but suffer from insufficient specificity, so that measurement of exposure is hampered by background.

The detection limit is often determined as a function of the observed variance for a series of blanks; the terms limit of detection and limit of quantification generally correspond to 3 and 10 standard deviations, respectively, from serial measurement of blanks (3). As such, numerical data are observable above and below the dl; even among values above the threshold, it may not be possible to clearly distinguish those that are real from those that are not. Data below the threshold are often reported by laboratories as nondetects, and the data analyst or epidemiologist is limited to this qualitative assessment. Statistical modeling of these data requires decisions regarding their handling (4, 5).
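The rule of thumb above for the limits of detection and quantification can be illustrated with a minimal sketch. The function name `detection_thresholds` and the blank readings are hypothetical and chosen only for illustration; the sketch also assumes that each threshold is simply a multiple of the sample standard deviation of the blanks, whereas some laboratories additionally add the mean blank signal.

```python
import statistics


def detection_thresholds(blanks, k_lod=3.0, k_loq=10.0):
    """Return (lod, loq) as multiples of the SD of serial blank measurements.

    Assumption: thresholds are k * SD only; no blank mean is added.
    """
    sd = statistics.stdev(blanks)  # sample standard deviation of the blanks
    return k_lod * sd, k_loq * sd


# Hypothetical blank readings (illustrative values, not taken from the paper).
blanks = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.02, 0.01]
lod, loq = detection_thresholds(blanks)
print(f"limit of detection: {lod:.4f}")
print(f"limit of quantification: {loq:.4f}")
```

Because both thresholds are driven by the variability of the blanks, a noisier assay raises the dl and increases the proportion of samples reported as nondetects.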
Conventional approaches include omission, resulting in a truncated data set, and imputation with a constant, such as the dl or a fraction thereof (e.g., dl/2, dl/√2); alternatively, the observed values may be used directly or indirectly (4–7). Many of these imputations have their origins in well-behaved distributions, such as normal (in the case of dl/2) and lognormal (in the case of dl/√2), and will yield correct inferences if these distributional assumptions are not grossly violated. Lubin et al. (5) propose a multiple imputation approach to handling nondetects when the exposure distribution can be assumed. Richardson and Ciampi (7) developed a coefficient of bias for linear regression coefficient estimates when exposure is measured with a detection threshold and random error, and they proposed replacement of below-threshold data by the expectation for such data (i.e., E[x | x < dl]) to yield unbiased estimates. Application of this theory to practice also requires investigators to assume an exposure distribution function.

In contrast to these approaches, comparatively little attention has been paid to implicitly and explicitly nonparametric approaches to measurement with a threshold. In this paper, the authors propose distribution-free methods for managing values below the dl and evaluate biases that result when exposure measurement is constrained by a lower threshold. Results from an analytical approach and those of a simulation study assessing the proposed replacement method are described. The proposed method allows investigators to relax the assumptions (e.g., distributional, asymptotic) necessary for use of other approaches. These results may inform decisions by investigators regarding appropriate analytical plans for future studies and provide a possible explanation for the discordance seen in current literature.
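The effect of the constant-substitution rules discussed above (dl, dl/2, dl divided by the square root of 2, and the expectation of the below-threshold data) can be explored with a small simulation. This is a sketch under stated assumptions, not the authors' simulation study: the lognormal exposure, the parameter values, the seed, and the helper names `ols_slope` and `substitute` are all hypothetical, and the exposure is taken to be free of measurement error. The mean of the below-threshold values is computed from the simulated true exposure, which would not be available in a real study.

```python
import math
import random


def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def substitute(x, dl, c):
    """Replace every value below the detection limit dl with the constant c."""
    return [v if v >= dl else c for v in x]


# Hypothetical simulation settings (assumptions made for illustration only).
rng = random.Random(0)
alpha, beta, dl, n = 1.0, 2.0, 0.5, 20000

x = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]     # true exposure
y = [alpha + beta * v + rng.gauss(0.0, 1.0) for v in x]  # outcome

# Empirical stand-in for E[x | x < dl]; known here only because x is simulated.
below = [v for v in x if v < dl]
cond_mean = sum(below) / len(below)

candidates = {
    "dl": dl,
    "dl/2": dl / 2,
    "dl/sqrt(2)": dl / math.sqrt(2),
    "mean of x below dl": cond_mean,
}
slopes = {name: ols_slope(substitute(x, dl, c), y) for name, c in candidates.items()}
for name, slope in slopes.items():
    print(f"{name:>20}: estimated slope = {slope:.4f} (true beta = {beta})")
```

In this setup, substituting the below-threshold mean makes the sample covariance between the true and substituted exposures equal to the sample variance of the substituted exposure, so the slope estimate differs from beta only through the regression noise. The other constants carry no such guarantee, and how far they drift depends on the exposure distribution and on the fraction of values below the dl.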
STATEMENT OF THE PROBLEM AND ANALYTICAL SOLUTION

Let the observed continuous outcome, Y, satisfy the following linear regression model:

$$Y_i = \alpha + \beta x_i + \varepsilon_i,$$

with exposure variable $x_i$, random noise $\varepsilon_i$, and regression parameters $\alpha$ and $\beta$. However, x is not observed. A lower threshold, dl, interferes with measurement of low exposure levels. In a simple case, we observe z, which equals either x or nondetects (ND), according to the following:

$$z = \begin{cases} x & \text{for all } x > dl, \\ \text{ND} & \text{otherwise.} \end{cases}$$

Alternately, when the explanatory variable is less than dl, there is quantitative random noise, $\xi$, rather than the qualitative response, ND. In this setting, the observations are $\{Y_1, \ldots, Y_n, z_1, \ldots, z_n\}$. Without loss of generality, $Y_i$ and $z_i$ can be assumed to be scalars.

This model can be considered in a more general context, where the exposure is measured with error, $\eta$. Thus, the linear regression model is

$$Y_i = \alpha + \beta z_i + \varepsilon_i, \qquad z_i = (x_i + \eta_i)\, I\{x_i + \eta_i \ge dl\} + \xi_i\, I\{x_i + \eta_i < dl\},$$

where $x_i + \eta_i$ is the exposure with measurement error, $I\{\cdot\}$ is an indicator function (1 if $\{\cdot\}$ is true and 0 otherwise), and $\varepsilon_i$, $\eta_i$, and $\xi_i$ are independent random disturbance terms related to regression error, measurement error, and detection limit error with densities $f_\varepsilon(u)$, $f_\eta(u)$, and $f_\xi(u)$, respectively, and

$$E\,\varepsilon_i = 0, \qquad \operatorname{var}\,\varepsilon_i = \sigma_\varepsilon^2.$$

The accuracy of regression parameter estimates depe (...truncated)