Commentary: Reporting standards are needed for evaluations of risk reclassification (pdf)

Published by Oxford University Press on behalf of the International Epidemiological Association. © The Author 2011; all rights reserved. Advance Access publication 13 May 2011.
International Journal of Epidemiology 2011;40:1106–1108. doi:10.1093/ije/dyr083

Margaret S Pepe1,2* and Holly Janes1

1 Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, Seattle, WA, USA and 2 Biostatistics Department, University of Washington, Seattle, WA, USA

*Corresponding author. Biostatistics and Biomathematics Program, Fred Hutchinson Cancer Research Center, Seattle, WA 98109, USA.

Accepted 14 April 2011

New approaches have been developed in recent years to quantify the improvement in prediction performance gained by adding a novel marker to a set of baseline predictors of risk. The paper by Tzoulaki et al.1 concerns risk reclassification techniques and focuses specifically on the net reclassification improvement (NRI) index. Their review shows that risk reclassification analysis is extremely common in practice, with 51 papers using the technique published in only 3 years since its introduction. Unfortunately and alarmingly, the review shows that the quality of reporting is dismal. Investigators seem confused about the roles and interpretations of risk reclassification metrics. Guidance on how to report results of risk reclassification analysis would be helpful to authors, reviewers and the field in general.

The risk reclassification table was first introduced by Cook.2 The table is constructed by choosing clinically meaningful risk categories and cross-classifying individuals according to their risks calculated with the baseline risk model and with the expanded risk model. The top panel of Table 1 provides an illustration.
Cook and Ridker3 developed a whole analysis strategy around the risk reclassification table including new hypothesis tests and a new metric called ‘percent correct reclassification’. However, the value of these analysis techniques is doubtful and results can be misleading.4 Pencina et al.5 argued that the reclassification table itself was problematic, at least as proposed by Cook, because it did not distinguish between subjects with events (cases) and subjects without events (controls). They suggested constructing separate event and non-event reclassification tables as shown in the middle and bottom panels of Table 1. Entries above the diagonal correspond to risks that are higher with the expanded vs baseline model, representing improved prediction for subjects with events. Correspondingly, entries below the diagonal represent worse prediction for them. The event-NRI is the difference between the proportions of subjects above vs below the diagonal in the event reclassification table. Using a similar logic, the non-event-NRI is calculated from the non-event reclassification table by taking the difference between the proportions of subjects below vs above the diagonal. The NRI summary index that gained immediate popularity in the literature following Pencina’s paper is the sum,

NRI = event-NRI + non-event-NRI.

A prerequisite for considering risk reclassification is that the risk models are well calibrated, in the sense that the observed event rates for subgroups defined by the predictors in the models are close to the values calculated from the models. A poorly calibrated risk model is considered invalid for calculating risk as a function of the modelled predictors. It is of great concern therefore that almost half of the papers reporting risk reclassification results do not report assessment of model calibration.
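As a rough illustration of the event-NRI and non-event-NRI definitions given above, the following Python sketch computes both components and their sum from the event and non-event counts in Table 1. The function and variable names are illustrative assumptions and do not come from the commentary. With the counts as tabulated, the event component comes to about 10.0% and the sum to about 17.4%; the non-event component evaluates to roughly 7.3%, marginally below the 7.4% quoted with the table, presumably a rounding difference.

```python
# Counts transcribed from Table 1. Rows are baseline-model risk categories,
# columns are expanded-model risk categories, both ordered 0-5%, 5-20%, >20%.
EVENTS = [
    [72, 38, 4],
    [21, 105, 114],
    [0, 33, 630],
]
NON_EVENTS = [
    [5486, 399, 21],
    [1015, 990, 272],
    [40, 296, 464],
]


def moved_up_and_down(table):
    """Return (up, down, total) for a square reclassification table.

    'up' counts subjects above the diagonal (higher category under the
    expanded model), 'down' counts subjects below it.
    """
    k = len(table)
    up = sum(table[i][j] for i in range(k) for j in range(k) if j > i)
    down = sum(table[i][j] for i in range(k) for j in range(k) if j < i)
    total = sum(sum(row) for row in table)
    return up, down, total


def nri_components(events, non_events):
    """Return (event-NRI, non-event-NRI, NRI) as proportions."""
    up_e, down_e, n_e = moved_up_and_down(events)
    up_n, down_n, n_n = moved_up_and_down(non_events)
    event_nri = (up_e - down_e) / n_e          # moving up is good for events
    non_event_nri = (down_n - up_n) / n_n      # moving down is good for non-events
    return event_nri, non_event_nri, event_nri + non_event_nri


event_nri, non_event_nri, nri = nri_components(EVENTS, NON_EVENTS)
print(f"event-NRI     = {event_nri:.1%}")
print(f"non-event-NRI = {non_event_nri:.1%}")
print(f"NRI           = {nri:.1%}")
```

Note that the sketch weights every upward or downward move equally, whatever the number of categories crossed, which is exactly how the index is defined.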
A second basic premise for considering risk reclassification is that the chosen categories of risk are clinically meaningful in the sense that changing risk categories has clinical consequences. The review indicates that only 27% of papers provided justification for the particular risk categories used. This is a very poor state of affairs.

Even if the risk models are valid and the risk categories chosen are clinically relevant, is NRI a good way of summarizing improvement in risk reclassification performance? We do not find this single numeric summary very enlightening. Calculated as 17.4% in our example, the NRI seems to fall short of the task of gauging whether or not a substantial improvement has been obtained. Somewhat more revealing are its components, event-NRI and non-event-NRI. If only two risk categories were involved, the event-NRI would be the increase in the proportion of subjects with events who are classified as high risk by the predictors and, correspondingly, the non-event-NRI would be the increase in the proportion of subjects without events who are deemed at low risk. These are simple useful summaries.

Table 1 Illustration of risk reclassification tables

                            Expanded model
Baseline model        0–5%    5–20%     >20%    Total

All subjects (n = 10 000)
0–5%                  5558      437       25     6020
5–20%                 1036     1095      386     2517
>20%                    40      329     1094     1463
Total                 6634     1861     1505    10000

Events only (n = 1017)
0–5%                    72       38        4      114
5–20%                   21      105      114      240
>20%                     0       33      630      663
Total                   93      176      748     1017

Non-events only (n = 8983)
0–5%                  5486      399       21     5906
5–20%                 1015      990      272     2277
>20%                    40      296      464      800
Total                 6541     1685      757     8983

The original table proposed by Cook2 included all subjects (top panel). Pencina et al.5 proposed separate tables for events and non-events (middle and bottom panels). Event-NRI = 10.0%, non-event-NRI = 7.4% and NRI = 17.4%.
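The two-category interpretation described above can be checked numerically. The Python sketch below collapses the three categories of Table 1 into two by treating risk above 20% as 'high risk'. That cut-off is an assumption made purely for illustration, as are the helper names. It then compares each NRI component with the corresponding change in the proportion of subjects classified as high or low risk.

```python
# Counts transcribed from Table 1. Rows are baseline-model risk categories,
# columns are expanded-model risk categories, both ordered 0-5%, 5-20%, >20%.
EVENTS = [
    [72, 38, 4],
    [21, 105, 114],
    [0, 33, 630],
]
NON_EVENTS = [
    [5486, 399, 21],
    [1015, 990, 272],
    [40, 296, 464],
]

CUT = 2  # assumption for illustration: categories 0 and 1 are 'low', category 2 (>20%) is 'high'


def collapse(table, cut):
    """Collapse a square table to 2x2: index 0 = low risk, index 1 = high risk."""
    out = [[0, 0], [0, 0]]
    for i, row in enumerate(table):
        for j, count in enumerate(row):
            out[int(i >= cut)][int(j >= cut)] += count
    return out


ev = collapse(EVENTS, CUT)
ne = collapse(NON_EVENTS, CUT)
n_ev = sum(map(sum, ev))
n_ne = sum(map(sum, ne))

# Two-category NRI components: net movement in the 'right' direction.
event_nri_2 = (ev[0][1] - ev[1][0]) / n_ev        # low->high minus high->low
non_event_nri_2 = (ne[1][0] - ne[0][1]) / n_ne    # high->low minus low->high

# Changes in classification proportions (expanded column total minus baseline row total).
gain_high_events = (sum(r[1] for r in ev) - sum(ev[1])) / n_ev
gain_low_non_events = (sum(r[0] for r in ne) - sum(ne[0])) / n_ne

print(f"event-NRI (2 categories)     = {event_nri_2:.2%}")
print(f"increase in events high risk = {gain_high_events:.2%}")
print(f"non-event-NRI (2 categories)    = {non_event_nri_2:.2%}")
print(f"increase in non-events low risk = {gain_low_non_events:.2%}")
```

Under this assumed cut-off each component coincides with the matching change in proportion, because subjects who stay in the same category cancel out of the difference. With three or more categories this equivalence no longer holds.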
However, with more than two categories the interpretations are far less appealing because all upward movements of risk category are counted equally and all downward movements are counted equally. Yet, the clinical implications are usually not equal. For example, moving from the lowest to the highest category and moving from the lowest to the intermediate category typically have very different consequences.

Perhaps a single numeric summary is not needed or at least should not be the main focus of analysis. One alternative suggestion is to report the net changes in proportions of subjects classified in each of the risk categories.6 These are (−2.1%, −6.3%, 8.4%) for subjects with events and (7.1%, −6.6%, −0.5%) for subjects without events in Table 1. In other words, of subjects with events, 8.4% more are in the high-risk category and 2.1% fewer are in the low-risk category, whereas of subjects without events, 7.1% more are in the low-risk category and 0.5% fewer are in the high-risk category. These are simple summaries of reclassification performance that seem more clinically relevant than the NRI index of 17.4%.

Although risk reclassification analysis with the NRI has taken off like wildfire in applications, it is not yet a highly developed rigorous statistical technique. Unfortunately, this point is not widely appreciated and it is not acknowledged in the review. In particular, statis (...truncated)