The Limitations due to Exposure Detection Limits for Regression Models
Enrique F. Schisterman, Albert Vexler, Brian W. Whitcomb, and Aiyi Liu
From the Division of Epidemiology, Statistics, and Prevention Research, National Institute of Child Health and Human Development, National Institutes of Health, Rockville, MD.
National Institute of Child Health and Human Development, 6100 Executive Boulevard, Room 7B03, Rockville, MD 20852 (
Abbreviations: dl, detection limit; ND, "nondetects."
Biomarker use in exposure assessment is increasingly common, and consideration of related issues is of growing importance. Exposure quantification may be compromised when measurement is subject to a lower threshold. Statistical modeling of such data requires a decision regarding the handling of such readings. Various authors have considered this problem. In the context of linear regression analysis, Richardson and Ciampi (Am J Epidemiol 2003;157:355-63) proposed replacement of data below a threshold by a constant equal to the expectation for such data to yield unbiased estimates. Use of such an imputation has some limitations; distributional assumptions are required, and bias reduction in estimation of regression parameters is asymptotic, thereby presenting concerns about small studies. In this paper, the authors propose distribution-free methods for managing values below detection limits and evaluate the biases that may result when exposure measurement is constrained by a lower threshold. The authors utilize an analytical approach and a simulation study to assess the effects of the proposed replacement method on estimates. These results may inform decisions regarding analytical plans for future studies and provide a possible explanation for some amount of the discordance seen in extant literature.

bias (epidemiology); censored data; epidemiology, molecular; limit of detection; regression analysis
The growing use of biomarkers in exposure assessment
suggests the need to address issues related to their
measurement. Even when levels are sufficient for measurement,
some random exposure measurement error is expected, in
part related to instrument precision. However, in many cases
a proportion of study participants have levels at or below
some experimentally determined detection limit (dl).
Investigators are often interested in the risk of negative health
outcomes associated with such levels. For example, studies
of serum organochlorine levels, lipophilic xenobiotics, and
breast cancer have determined that up to 99 percent of study
participants have levels below the dl for some toxicants
under study (1).
Biomarker quantification may be compromised if
instrumentation cannot detect low levels. This may occur, for
example, in quantitation of immunoassays (e.g.,
enzyme-linked immunosorbent assays) that require antigen
concentrations sufficient for binding by antibodies. Highly specific
binding conditions may impair antibody sensitivity and
thereby challenge quantitation of low levels (2).
Alternatively, assays may detect low biomarker levels but suffer
from insufficient specificity, and measurement of exposure
is hampered by background. The detection limit is often
determined as a function of observed variance for a series
of blanks; the terms limit of detection and limit of
quantification generally correspond to 3 and 10 standard
deviations, respectively, of serial measurements of
blanks (3). As such, numerical data are observable above
and below the dl; even among values above the threshold, it
may not be possible to clearly delineate between those that
are real and those that are not. Data below the threshold
are often reported by laboratories as nondetects, and the
data analyst or epidemiologist is limited to this qualitative
assessment.
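
The blank-based rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the blank readings, and the choice to measure the thresholds from zero are hypothetical, and laboratories may instead add the mean blank signal or use other multipliers.

```python
import statistics


def detection_thresholds(blanks, lod_mult=3.0, loq_mult=10.0):
    """Return (lod, loq) as multiples of the sample SD of blank readings.

    Assumption: thresholds are measured from zero; some laboratories
    add the mean blank signal instead.
    """
    if len(blanks) < 2:
        raise ValueError("need at least two blank readings")
    sd = statistics.stdev(blanks)  # sample standard deviation (n - 1)
    return lod_mult * sd, loq_mult * sd


# Hypothetical serial blank readings, in arbitrary instrument units.
blanks = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02, 0.02, 0.01]
lod, loq = detection_thresholds(blanks)
print(f"LOD = {lod:.4f}, LOQ = {loq:.4f}")
```

Because both thresholds scale with the variability of the blanks, a noisier assay yields a higher dl and, other things equal, a larger share of nondetects.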
Statistical modeling of these data requires decisions
regarding their handling (4, 5). Conventional approaches
include omission, resulting in a truncated data set, and
imputation with a constant, such as the dl or a fraction
thereof (e.g., dl/2, dl/√2); or the observed values may be
used directly or indirectly (4-7). Many of these imputations
have their origins in well-behaved distributions, such as
normal (in the case of dl/2) and lognormal (in the case of
dl/√2), and will yield correct inferences if these
distributional assumptions are not grossly violated. Lubin et al. (5)
propose a multiple imputation approach to handling
nondetects when the exposure distribution can be assumed.
Richardson and Ciampi (7) developed a coefficient of bias
for linear regression coefficient estimates when exposure is
measured with a detection threshold and random error, and
they proposed replacement of below-threshold data by the
expectation for such data (i.e., E[x | x < dl]) to yield unbiased
estimates. Application of this theory to practice also
requires investigators to assume an exposure distribution
function. In contrast to these approaches, there has been
comparatively little attention toward implicitly and
explicitly nonparametric approaches to measurement with
a threshold.
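
The constant-substitution rules discussed above can be illustrated with a minimal sketch. The function names, the encoding of a nondetect as None, and the normal parameters are assumptions made for this example only. The conditional mean uses the standard truncated-normal result, so it applies only if the exposure really is normal, which is exactly the kind of distributional assumption these approaches require.

```python
import math
from statistics import NormalDist


def substitute_nondetects(z, fill):
    """Replace each nondetect (encoded here as None) with the constant fill."""
    return [fill if v is None else v for v in z]


def normal_mean_below(mu, sigma, dl):
    """E[x | x < dl] for x ~ Normal(mu, sigma^2) (truncated-normal mean)."""
    std = NormalDist()
    a = (dl - mu) / sigma
    return mu - sigma * std.pdf(a) / std.cdf(a)


dl = 1.0
z = [None, 1.8, 2.4, None, 1.2]  # hypothetical readings with two nondetects

half = substitute_nondetects(z, dl / 2)
root2 = substitute_nondetects(z, dl / math.sqrt(2))
# Expectation-based replacement, assuming x ~ Normal(1.5, 1.0^2).
expected = substitute_nondetects(z, normal_mean_below(1.5, 1.0, dl))
```

The three lists differ only in the value assigned to the nondetects; the detected values are left untouched in every case.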
In this paper, the authors propose distribution-free
methods for managing values below the dl and evaluate biases
that result when exposure measurement is constrained by a
lower threshold. Results from an analytical approach and
those of a simulation study assessing the proposed
replacement method are described. The proposed method allows
investigators to relax the assumptions (e.g., distributional,
asymptotic) necessary for use of other approaches. These
results may inform decisions by investigators regarding
appropriate analytical plans for future studies and provide
a possible explanation for the discordance seen in current
literature.
STATEMENT OF THE PROBLEM AND ANALYTICAL
SOLUTION
Let the observed continuous outcome, Y, satisfy the
following linear regression model:
$$Y_i = \alpha + \beta x_i + \varepsilon_i,$$
with exposure variable $x_i$, random noise $\varepsilon_i$, and regression
parameters $\alpha$ and $\beta$. However, x is not observed. A lower
threshold, dl, interferes with measurement of low exposure
levels. In a simple case, we observe z, which equals either x
or nondetects (ND), according to the following:
$$z = x \ \text{for all } x > \mathrm{dl}; \qquad z = \mathrm{ND} \ \text{for all } x \le \mathrm{dl}.$$
Alternatively, when the explanatory variable is less than dl,
there is quantitative random noise, $\nu$, rather than the
qualitative response, ND.
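
The two observation mechanisms just described can be sketched as follows. The function name, the lognormal exposure distribution, the threshold value, and the zero-mean normal form of the below-threshold noise are all assumptions chosen for illustration; the text does not specify them.

```python
import random


def observe(x, dl, noise_sd=None, rng=random):
    """Return what is recorded for true exposure x under a lower threshold dl.

    noise_sd=None  -> qualitative nondetect, encoded as None
    noise_sd=float -> quantitative noise drawn from Normal(0, noise_sd^2)
                      (an assumed form for the below-threshold noise)
    """
    if x > dl:
        return x
    if noise_sd is None:
        return None
    return rng.gauss(0.0, noise_sd)


rng = random.Random(0)
# Hypothetical exposures: lognormal with log-mean 0 and log-SD 1.
x = [rng.lognormvariate(0.0, 1.0) for _ in range(1000)]
dl = 0.5

z_nd = [observe(v, dl) for v in x]                            # ND responses
z_noise = [observe(v, dl, noise_sd=0.1, rng=rng) for v in x]  # noisy readings
share_nd = sum(v is None for v in z_nd) / len(z_nd)
```

In both variants the readings above the threshold are identical to the true exposure; only the treatment of values at or below dl differs.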
In this setting, the observations are $\{Y_1, \ldots, Y_n, z_1, \ldots, z_n\}$.
Without loss of generality, $Y_i$ and $z_i$ can be assumed to
be scalars. This model can be considered in a more general
context, where the exposure is measured with error, $\eta$. Thus,
the linear regression model is
$$Y_i = \alpha + \beta z_i + \varepsilon_i,$$
$$z_i = (x_i + \eta_i)\, I\{x_i + \eta_i \ge \mathrm{dl}\} + \nu_i\, I\{x_i + \eta_i < \mathrm{dl}\},$$
where $x_i + \eta_i$ is the exposure with measurement error, $I\{\cdot\}$ is an indicator
function (1 if $\{\cdot\}$ is true and 0 otherwise), and $\varepsilon_i$, $\eta_i$, and
$\nu_i$ are independent random disturbance terms related to
regression error, measurement error, and detection limit error
with densities $f_\varepsilon(u)$, $f_\eta(u)$, and $f_\nu(u)$, respectively, and
$$E\varepsilon_i = 0, \qquad \operatorname{var}\varepsilon_i = \sigma_\varepsilon^2.$$
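
A small simulation can make the general setting concrete. The sketch below rests on several assumptions that are not taken from the text: the outcome is generated from the true exposure, as in the first model of this section, while the analyst regresses it on the observed reading; all three disturbance terms are taken to be zero-mean normal; and the exposure distribution, parameter values, and function names are hypothetical.

```python
import random


def ols_slope(u, y):
    """Least-squares slope of y on u."""
    n = len(u)
    mean_u = sum(u) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_u) * (b - mean_y) for a, b in zip(u, y))
    sxx = sum((a - mean_u) ** 2 for a in u)
    return sxy / sxx


def simulate(n, alpha, beta, dl, sd_eps, sd_eta, sd_nu, seed=0):
    """Draw (x, z, y): true exposure, observed reading, and outcome."""
    rng = random.Random(seed)
    x, z, y = [], [], []
    for _ in range(n):
        xi = rng.gauss(2.0, 1.0)         # assumed exposure distribution
        w = xi + rng.gauss(0.0, sd_eta)  # exposure with measurement error
        # Reading is w at or above dl, otherwise detection limit noise.
        zi = w if w >= dl else rng.gauss(0.0, sd_nu)
        x.append(xi)
        z.append(zi)
        y.append(alpha + beta * xi + rng.gauss(0.0, sd_eps))
    return x, z, y


x, z, y = simulate(n=5000, alpha=1.0, beta=2.0, dl=1.5,
                   sd_eps=1.0, sd_eta=0.0, sd_nu=0.1)
slope_x = ols_slope(x, y)  # slope using the unobservable true exposure
slope_z = ols_slope(z, y)  # slope using the thresholded reading
```

Comparing slope_x with slope_z for different values of dl, sd_eta, and sd_nu gives a rough sense of how the threshold and the two error sources affect the estimated regression coefficient under these particular assumptions.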
The accuracy of regression parameter estimates depe (...truncated)