A Bayesian framework for estimating the incremental value of a diagnostic test in the absence of a gold standard

https://bmcmedresmethodol.biomedcentral.com/track/pdf/10.1186/1471-2288-14-67

Ling et al. BMC Medical Research Methodology 2014, 14:67
http://www.biomedcentral.com/1471-2288/14/67

TECHNICAL ADVANCE, Open Access

Daphne I Ling¹, Madhukar Pai¹, Ian Schiller² and Nandini Dendukuri¹,²*

Abstract

Background: The absence of a gold standard, i.e., a diagnostic reference standard having perfect sensitivity and specificity, is a common problem in clinical practice and in diagnostic research studies. There is a need for methods to estimate the incremental value of a new, imperfect test in this context.

Methods: We use a Bayesian approach to estimate the probability of the unknown disease status via a latent class model and extend two commonly used measures of incremental value based on predictive values [difference in the area under the ROC curve (AUC) and integrated discrimination improvement (IDI)] to the context where no gold standard exists. The methods are illustrated using simulated data and applied to the problem of estimating the incremental value of a novel interferon-gamma release assay (IGRA) over the tuberculin skin test (TST) for latent tuberculosis (TB) screening. We also show how to estimate the incremental value of IGRAs when decisions are based on observed test results rather than predictive values.

Results: We showed that the incremental value is greatest when both sensitivity and specificity of the new test are better and that conditional dependence between the tests reduces the incremental value. The incremental value of the IGRA depends on the sensitivity and specificity of the TST, as well as the prevalence of latent TB, and may thus vary in different populations.

Conclusions: Even in the absence of a gold standard, incremental value statistics may be estimated and can aid decisions about the practical value of a new diagnostic test.

Keywords: Area under the curve, Bayesian estimation, Incremental value, Informative priors, Integrated discrimination improvement, Imperfect diagnostic tests, Latent class models, Tuberculosis

* Correspondence:
1 Department of Epidemiology and Biostatistics, McGill University, 1020 Pine Ave West, Montreal H3A 1A2, QC, Canada
2 Division of Clinical Epidemiology, McGill University Health Centre–Research Institute, 687 Pine Avenue West, Room R4.09, Montreal H3A 1A1, QC, Canada

© 2014 Ling et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

Background

Incremental value of a diagnostic test

The literature on diagnostic test evaluation has centered on estimation of sensitivity and specificity, measures that do not directly convey the clinical impact of a given test [1-3]. The added value of a test will depend on how much information is already available from the diagnostic workup and whether the test result actually changes clinical decisions. The development of methods for evaluation of the incremental value of new tests or biomarkers is thus an active area of biostatistical research [4].

Evaluation of the incremental value of a new test typically involves comparing prediction models of the outcome of interest (measured by a gold standard), with and without the new test as a covariate. The difference between the area under the receiver operating characteristic curve (AUC), or the C-statistic, for the two models is the most familiar statistic for estimating incremental value [5]. The AUC measures the discrimination of a model, or its ability to distinguish between individuals with and without the outcome. One criticism of the AUC has been that it changes only slightly, even when effect measures such as the odds ratio suggest that a predictor is strongly associated with the outcome [6]. Another criticism is that the AUC has no direct clinical interpretation for individual patients. This has led to work on comparing predictive models in terms of the number of patients who are reclassified by adding a new test to an existing model.

Pencina and colleagues proposed two measures for the net increase in patients who are appropriately classified, i.e. higher predicted probabilities for patients with the outcome and lower probabilities for those without the outcome [7]. They defined the net reclassification improvement (NRI) as the increase in the proportion of patients who are accurately reclassified by the new versus the old model into pre-defined risk categories. They also proposed the integrated discrimination improvement (IDI) as a continuous version of the NRI across all possible risk thresholds from 0 to 1. The IDI is defined as the sum of the average increase in predicted probability among patients with the outcome and the average decrease in probability among patients without the outcome. Pepe and colleagues have also shown that the IDI is equivalent to the change in R² for logistic regression [8].

Unlike active TB (which can be detected with high accuracy using culture), latent TB infection (LTBI) has no gold standard. Until recently, the tuberculin skin test (TST) was the only screening test for LTBI. However, the TST suffers from imperfect sensitivity and specificity [14,15].
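
For the gold-standard setting described earlier, the difference in AUC and the IDI can both be computed directly from the predicted probabilities of the models with and without the new test. The following is a minimal sketch of those two definitions only, not the article's Bayesian latent class method; the function names `auc` and `idi` and the toy data are assumptions made for illustration and do not come from the article.

```python
import numpy as np

def auc(y, p):
    """AUC as the chance that a random patient with the outcome has a higher
    predicted probability than a random patient without it (ties count 1/2)."""
    y = np.asarray(y, dtype=bool)
    p = np.asarray(p, dtype=float)
    pos, neg = p[y], p[~y]
    # Compare every (with outcome, without outcome) pair via broadcasting.
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def idi(y, p_old, p_new):
    """IDI: mean increase in predicted probability among patients with the
    outcome plus mean decrease among patients without the outcome."""
    y = np.asarray(y, dtype=bool)
    delta = np.asarray(p_new, dtype=float) - np.asarray(p_old, dtype=float)
    return delta[y].mean() - delta[~y].mean()

# Hypothetical toy data: gold-standard outcome and predicted probabilities
# from a model without (p_old) and with (p_new) the new test.
y = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]
p_new = [0.8, 0.7, 0.5, 0.4, 0.3, 0.2]

print("AUC difference:", auc(y, p_new) - auc(y, p_old))
print("IDI:", idi(y, p_old, p_new))
```

Both functions require the true outcome `y`, which is exactly what is unavailable when there is no gold standard; that is the gap the latent class approach is meant to address.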
Interferon-gamma release assays (IGRAs), such as the QuantiFERON-TB Gold In-Tube (QFT), are now available and use antigens that are more specific to M. tuberculosis than the TST. Several meta-analyses show that the sensitivity of IGRAs is at least as good as that of the TST [16-18]. While the specificity of the TST varies depending on when and how many BCG vaccinations are given, the specificity of IGRAs is consistently high regardless of BCG vaccination [16-18]. Thus, a relevant question is whether IGRAs have any incremental value over the TST at the time of diagnosis, when the decision to initiate preventive therapy is made, using an approach that adjusts for the lack of a gold standard for LTBI.

Evaluation of diagnostic tests in the absence of a gold standard

The observed data may be described by a latent class model, which assumes that the standard test (T1) and the new test (T2) are imperfect measures of an underlying latent variable D, the true disease status. Both tests and the disease status are assumed to be dichotomous, positive (+) or negative (−), based on standard cut-offs. The observed data follow a multinomial distribution in which the probability of each of the four combinations of the two test results can be expressed in terms of the sensitivity and specificity of both tests and the prevalence. Furthermore, each probability is a mix (...truncated)