Machine learning to predict mortality after rehabilitation among patients with severe stroke

www.nature.com/scientificreports

Domenico Scrutinio1, Carlo Ricciardi1,2*, Leandro Donisi1,2, Ernesto Losavio1, Petronilla Battista1, Pietro Guida1, Mario Cesarelli1,3, Gaetano Pagano1 & Giovanni D’Addio1

Stroke is among the leading causes of death and disability worldwide. Approximately 20–25% of stroke survivors present severe disability, which is associated with increased mortality risk. Prognostication is inherent in the process of clinical decision-making. Machine learning (ML) methods have gained increasing popularity in the setting of biomedical research. The aim of this study was twofold: to assess the performance of tree-based ML algorithms for predicting three-year mortality in 1207 stroke patients with severe disability who completed rehabilitation, and to compare the performance of ML algorithms with that of a standard logistic regression. The logistic regression model achieved an area under the Receiver Operating Characteristics curve (AUC) of 0.745 and was well calibrated. At the optimal risk threshold, the model had an accuracy of 75.7%, a positive predictive value (PPV) of 33.9%, and a negative predictive value (NPV) of 91.0%. The ML approach, which combined the synthetic minority oversampling technique with Random Forests, outperformed the logistic regression model, achieving an AUC of 0.928 and an accuracy of 86.3%. The PPV was 84.6% and the NPV 87.5%. This study represents a step forward in the creation of standardisable tools for predicting health outcomes in individuals affected by stroke.

Stroke is among the leading causes of death and disability worldwide [1–4]. Approximately 20–25% of stroke survivors present severe disability [5].
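The accuracy, PPV, and NPV quoted in the abstract are all ratios taken from a 2×2 table of predicted versus observed outcomes. As a minimal sketch in Python, the definitions can be written as below. The function name `confusion_metrics` and the counts are hypothetical, chosen only for illustration; they are not taken from the study.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return accuracy, PPV, and NPV from the four cells of a 2x2 confusion matrix.

    tp: predicted dead, actually dead      fp: predicted dead, actually alive
    fn: predicted alive, actually dead     tn: predicted alive, actually alive
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all patients classified correctly
        "ppv": tp / (tp + fp),          # share of predicted deaths that were deaths
        "npv": tn / (tn + fn),          # share of predicted survivors who survived
    }


# Hypothetical counts (not from the study): 1000 patients, 150 of whom died.
metrics = confusion_metrics(tp=100, fp=200, fn=50, tn=650)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, accuracy is 0.75, PPV is about 0.33, and NPV is about 0.93. When deaths are a minority of cases, a model can show a high NPV and a modest PPV at the same threshold.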
Severe disability after stroke is associated with increased risk of mortality and readmission, wider inter-individual variation in responsiveness to rehabilitation, and higher healthcare and social costs compared with less severe strokes [6,7]. Moreover, there is evidence that patients with severe post-stroke disability are less likely to be admitted to specialized inpatient rehabilitation facilities (IRF) and to receive appropriate secondary prevention than those with mild-to-moderate disability [8–12], with a possible negative impact on prognosis.

Prognostication is inherent in the process of clinical decision-making [13]. The assessment of risk in stroke patients with severe disability might improve clinical decision-making, prompt clinicians to consider closer surveillance and more aggressive treatment to achieve goals in secondary prevention, and influence patient management. While not routinely used in clinical practice, multivariable models are well-accepted tools to predict prognosis. Three well-known prognostic models were developed to predict 90-day or 1-year mortality in patients with acute stroke [14–16]. These models had good discriminatory properties (C statistic ranging from 0.706 to 0.840). However, these models were developed from patients with heterogeneous neurological deficits, using variables recorded at acute care admission. Applying them to the subset of patients with severe stroke after discharge from the acute care setting can result in miscalibrated estimates of life expectancy and decreased discriminatory value. In addition, the beneficial effect of inpatient rehabilitation on mortality might confound the association between predictors recorded at admission to acute care and mortality [17–19].

The standard approach to develop prognostic models involves the use of statistical regression models.
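For a binary outcome such as death, the C statistic mentioned above equals the AUC: the probability that a randomly chosen patient who died received a higher predicted risk than a randomly chosen survivor. The following is a minimal sketch of that pairwise definition. The function name `c_statistic` and the risks and outcomes are hypothetical and do not come from any of the cited models.

```python
def c_statistic(risks, died):
    """Concordance between predicted risks and a binary outcome.

    Compares every (died, survived) pair of patients. A pair is concordant
    when the patient who died has the higher predicted risk; tied risks
    count as half a concordant pair.
    """
    dead = [r for r, d in zip(risks, died) if d]
    alive = [r for r, d in zip(risks, died) if not d]
    if not dead or not alive:
        raise ValueError("need at least one patient in each outcome group")

    concordant = 0.0
    for r_dead in dead:
        for r_alive in alive:
            if r_dead > r_alive:
                concordant += 1.0
            elif r_dead == r_alive:
                concordant += 0.5
    return concordant / (len(dead) * len(alive))


# Hypothetical predicted risks and outcomes (1 = died, 0 = survived).
risks = [0.9, 0.8, 0.3, 0.2]
died = [1, 0, 1, 0]
print(c_statistic(risks, died))
```

In this toy example three of the four (died, survived) pairs are ordered correctly, so the C statistic is 0.75. A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation. The double loop is quadratic in cohort size and is meant for exposition; a rank-based formulation is the usual choice for large data sets.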
Correlation between covariates, nonlinearity of the association between continuous covariates and risk for the outcome of interest, and potential complex interactions among covariates represent common analytic challenges in regression modelling [20,21]. In comparison with statistical models, machine-learning (ML) methods have the advantages of using a larger number of predictors, requiring fewer assumptions, using an agnostic approach instead of a priori hypotheses, incorporating “multi-dimensional correlations that contain prognostic information”, and producing a “more flexible relationship among the predictor variables (alone or in combination) and the outcome” [20,22–24]. As observed by Deo [24], “there may be features that are useful in combinations but not on their own”. Theoretically, these properties might allow improved model performance for prognostication of the outcome of interest.

1 Istituti Clinici Scientifici Maugeri IRCCS, Pavia, Italy. 2 Department of Advanced Biomedical Sciences, University Hospital of Naples “Federico II”, Naples, Italy. 3 Department of Electrical Engineering and Information Technology, University of Naples “Federico II”, Naples, Italy. *email:

Scientific Reports | (2020) 10:20127 | https://doi.org/10.1038/s41598-020-77243-3

Figure 1. Workflow of the study: data from 1207 patients at three facilities of the Maugeri Institute in the South and North of Italy were collected and used to build models, through multivariate logistic regression and tree-based ML algorithms, to predict three-year mortality in stroke patients after rehabilitation.

The workflow of the study is shown in Fig. 1. Its aim was two-fold: (1) assessing the performance of ML-based algorithms for predicting long-term mortality in stroke patients with severe disability; and (2) comparing the performance of ML algorithms to that of a standard regression model.
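Deo's remark that some features are useful in combination but not on their own can be made concrete with a toy example. The sketch below assumes two hypothetical binary features and an outcome that occurs only when exactly one of them is present. It is not derived from the study data, and all names are illustrative.

```python
from itertools import product

# Hypothetical toy data, not from the study: two binary features a and b,
# with the outcome equal to 1 only when exactly one of them is present.
ROWS = [(a, b, a ^ b) for a, b in product([0, 1], repeat=2)]


def outcome_rate(rows, keep):
    """Mean outcome among the rows for which keep(a, b) is true."""
    outcomes = [y for a, b, y in rows if keep(a, b)]
    return sum(outcomes) / len(outcomes)


def single_feature_rates(rows):
    """Outcome rate at each level of each feature, ignoring the other feature."""
    rates = {}
    for value in (0, 1):
        rates[f"a={value}"] = outcome_rate(rows, lambda a, b, v=value: a == v)
        rates[f"b={value}"] = outcome_rate(rows, lambda a, b, v=value: b == v)
    return rates


def combined_rates(rows):
    """Outcome rate for each joint combination of the two features."""
    return {
        (a0, b0): outcome_rate(rows, lambda a, b, a0=a0, b0=b0: (a, b) == (a0, b0))
        for a0, b0 in product([0, 1], repeat=2)
    }


print(single_feature_rates(ROWS))
print(combined_rates(ROWS))
```

Each feature taken alone leaves the outcome rate at 0.5 whatever its value, so neither shows any marginal association with the outcome, yet the pair determines the outcome exactly. A main-effects regression has no term for this pattern unless an interaction is specified in advance, whereas a tree can split on one feature and then on the other.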
To address these issues, we studied 1207 patients admitted to inpatient rehabilitation and classified as Case-Mix Groups (CMGs) 0108, 0109, and 0110 of the Medicare case-mix classification system [25], which was specifically developed to account for “the level of severity of a given case” [26]. CMGs 0108, 0109, and 0110 encompass the most severe strokes. Since our primary outcome was dichotomous (dead/alive) rather than time-to-event, and nearly all survivors had complete follow-up to three years, we chose a logistic regression analysis instead of a Cox regression analysis. We found that ML algorithms outperformed a standard regression model.

Results

Table 1 shows baseline patient characteristics. Of the 1241 patients who fulfilled the selection criteria, 34 were lost to follow-up after discharge, leaving 1207 patients available for analysis. A total of 3267 person-years of follow-up were examined, during which 189 deaths occurred (5.8 deaths/100 person-years). The mean follow-up was 988 ± 273 days. The actual mortality rates were 8.3% at 1 year, 13.0% at 2 years, and 15.7% at 3 years.

Logistic regression. At multivariate analysis, age, diabetes, coronary artery disease (CAD), atrial fibrillation (AF), anemia, renal dysfunction, neglect, and cognitive Functional Independence Measure (FIM) score were significantly associated with 3-year mortality (Table 2). Age was the most important variable (Table 3). The logistic mo (...truncated)