A Bayesian framework for estimating the incremental value of a diagnostic test in the absence of a gold standard
Ling et al. BMC Medical Research Methodology 2014, 14:67
http://www.biomedcentral.com/1471-2288/14/67
TECHNICAL ADVANCE
Open Access
Daphne I Ling1, Madhukar Pai1, Ian Schiller2 and Nandini Dendukuri1,2*
Abstract
Background: The absence of a gold standard, i.e., a diagnostic reference standard having perfect sensitivity and
specificity, is a common problem in clinical practice and in diagnostic research studies. There is a need for methods
to estimate the incremental value of a new, imperfect test in this context.
Methods: We use a Bayesian approach to estimate the probability of the unknown disease status via a latent class
model and extend two commonly used measures of incremental value based on predictive values [difference in
the area under the ROC curve (AUC) and integrated discrimination improvement (IDI)] to the context where no
gold standard exists. The methods are illustrated using simulated data and applied to the problem of estimating
the incremental value of a novel interferon-gamma release assay (IGRA) over the tuberculin skin test (TST) for latent
tuberculosis (TB) screening. We also show how to estimate the incremental value of IGRAs when decisions are
based on observed test results rather than predictive values.
Results: We showed that the incremental value is greatest when both sensitivity and specificity of the new test are
better and that conditional dependence between the tests reduces the incremental value. The incremental value of
the IGRA depends on the sensitivity and specificity of the TST, as well as the prevalence of latent TB, and may thus
vary in different populations.
Conclusions: Even in the absence of a gold standard, incremental value statistics may be estimated and can aid
decisions about the practical value of a new diagnostic test.
Keywords: Area under the curve, Bayesian estimation, Incremental value, Informative priors, Integrated
discrimination improvement, Imperfect diagnostic tests, Latent class models, Tuberculosis
Background
Incremental value of a diagnostic test
The literature on diagnostic test evaluation has centered
on estimation of sensitivity and specificity, measures that
do not directly convey the clinical impact of a given test
[1-3]. The added value of a test will depend on how much
information is already available from the diagnostic workup and whether the test result actually changes clinical
decisions. The development of methods for evaluation
of the incremental value of new tests or biomarkers is
thus an active area of biostatistical research [4].
* Correspondence:
1 Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Ave West, Montreal H3A 1A2, QC, Canada
2 Division of Clinical Epidemiology, McGill University Health Centre–Research Institute, 687 Pine Avenue West, Room R4.09, Montreal H3A 1A1, QC, Canada
Evaluation of the incremental value of a new test typically involves comparing prediction models of the outcome of interest (measured by a gold standard), with
and without the new test as a covariate. The difference
between the area under the receiver operating characteristic curve (AUC), or the C-statistic, for the 2 models is
the most familiar statistic for estimating incremental
value [5]. The AUC measures the discrimination of a
model, or its ability to distinguish between individuals
with and without the outcome. One criticism of the
AUC has been that it changes only slightly, even when
effect measures such as the odds ratio suggest that a
predictor is strongly associated with the outcome [6].
Another criticism is that the AUC has no direct clinical
interpretation for individual patients. This has led to
work on comparing predictive models in terms of the
number of patients who are reclassified by adding a new
test to an existing model.
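The difference in AUC described above can be illustrated with a minimal sketch. It assumes, unlike the setting this article targets, that the true outcome is observed for every patient. The function names `auc` and `delta_auc` and the toy predicted probabilities are hypothetical and serve only as an illustration; the AUC is computed as the proportion of case/non-case pairs in which the case receives the higher predicted probability, with ties counted as one half.

```python
def auc(scores, outcomes):
    """Proportion of (case, non-case) pairs in which the case has the
    higher score; tied pairs count as one half."""
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    noncases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not noncases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


def delta_auc(p_old, p_new, outcomes):
    """AUC of the model with the new test minus AUC of the model without it."""
    return auc(p_new, outcomes) - auc(p_old, outcomes)


# Hypothetical toy data (not from the article): 1 = outcome present, 0 = absent.
outcomes = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.5, 0.3, 0.2]  # predicted probabilities, existing model
p_new = [0.8, 0.6, 0.7, 0.4, 0.3, 0.1]  # predicted probabilities, model with new test

print(f"AUC (old model): {auc(p_old, outcomes):.3f}")
print(f"AUC (new model): {auc(p_new, outcomes):.3f}")
print(f"Difference in AUC: {delta_auc(p_old, p_new, outcomes):.3f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for a small illustration; a rank-based calculation would be preferable for large data sets.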
Pencina and colleagues proposed 2 measures for the
net increase in patients who are appropriately classified,
i.e. higher predicted probabilities for patients with the
outcome and lower probabilities for those without the
outcome [7]. They defined the net reclassification improvement (NRI) as the increase in the proportion of
patients who are accurately reclassified by the new
versus the old model into pre-defined risk categories.
They also proposed the integrated discrimination improvement (IDI) as a continuous version of the NRI
across all possible risk thresholds from 0 to 1. The IDI
is defined as the sum of the average increase in predicted probability among patients with the outcome
and the average decrease in probability among patients
without the outcome. Pepe and colleagues have also
shown that the IDI is equivalent to the change in R²
for logistic regression [8].
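The two reclassification measures defined above can be sketched as follows, again under the assumption that the true outcome is known for every patient. The function names, the single risk cut-off of 0.5, and the toy data are hypothetical choices made for illustration only. The IDI follows the definition in the text (mean increase in predicted probability among patients with the outcome plus mean decrease among those without); the NRI is computed in its usual category-based form, as the net proportion of patients with the outcome who move to a higher risk category plus the net proportion of patients without the outcome who move to a lower one.

```python
def mean(values):
    return sum(values) / len(values)


def idi(p_old, p_new, outcomes):
    """Mean increase in predicted probability among patients with the outcome
    plus mean decrease among patients without the outcome."""
    rows = list(zip(p_old, p_new, outcomes))
    increase_cases = [new - old for old, new, y in rows if y == 1]
    decrease_noncases = [old - new for old, new, y in rows if y == 0]
    return mean(increase_cases) + mean(decrease_noncases)


def risk_category(p, cutoffs):
    """Index of the risk category containing p (number of cut-offs reached)."""
    return sum(p >= c for c in cutoffs)


def nri(p_old, p_new, outcomes, cutoffs):
    """Net proportion of correct category moves, summed over patients
    with the outcome (moving up) and without it (moving down)."""
    net = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for old, new, y in zip(p_old, p_new, outcomes):
        move = risk_category(new, cutoffs) - risk_category(old, cutoffs)
        direction = (move > 0) - (move < 0)  # +1 up, -1 down, 0 unchanged
        count[y] += 1
        # Upward moves are correct for cases, downward moves for non-cases.
        net[y] += direction if y == 1 else -direction
    return net[1] / count[1] + net[0] / count[0]


# Hypothetical toy data (not from the article): 1 = outcome present, 0 = absent.
outcomes = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.4, 0.5, 0.5, 0.3, 0.2]  # existing model
p_new = [0.8, 0.6, 0.7, 0.4, 0.3, 0.1]  # model with the new test

print(f"IDI: {idi(p_old, p_new, outcomes):.3f}")
print(f"NRI (cut-off 0.5): {nri(p_old, p_new, outcomes, cutoffs=(0.5,)):.3f}")
```

Because the NRI depends on the chosen risk categories while the IDI does not, the two statistics can differ noticeably on the same data.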
Unlike active TB (which can be detected with high accuracy using culture), latent TB infection (LTBI) has no gold standard. Until recently, the tuberculin skin test (TST) was the only
screening test for LTBI. However, the TST suffers from imperfect sensitivity and specificity [14,15]. Interferon-gamma
release assays (IGRAs), such as the QuantiFERON-TB
Gold In-Tube (QFT), are now available and use antigens
that are more specific to M. tuberculosis than the TST.
Several meta-analyses show that the sensitivity of IGRAs
is at least as high as that of the TST [16-18]. While the specificity
of the TST varies with the timing and number of Bacillus Calmette-Guérin (BCG)
vaccinations, the specificity of IGRAs is consistently
high regardless of BCG vaccination [16-18]. Thus, a relevant question is whether IGRAs have any incremental
value over the TST at the time of diagnosis in order to
initiate preventive therapy, while using an approach that
adjusts for the lack of a gold standard for LTBI.
Evaluation of diagnostic tests in the absence of a gold standard
The observed data may be described by a latent class
model which assumes that the standard test (T1) and new
test (T2) are imperfect measures of an underlying latent
variable D, or true disease status. Both tests and the disease status are assumed to be dichotomous, positive (+) or
negative (−) based on standard cut-offs. The observed data
follow a multinomial distribution where each probability
of the 4 combinations of 2 tests can be expressed in terms
of the sensitivity and specificity of both tests and the
prevalence. Furthermore, each probability is a mix (...truncated)