Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations

PLOS ONE, Feb 2023

Machine learning methods are widely used within the medical field. However, the reliability and efficacy of these models are difficult to assess, which makes it hard for researchers to identify which machine-learning model to apply to their dataset. We assessed whether variance calculations of model metrics (e.g., AUROC, sensitivity, specificity) through bootstrap simulation, together with SHapley Additive exPlanations (SHAP), could increase model transparency and improve model selection. Data from the England National Health Services Heart Disease Prediction Cohort were used. After comparison of model metrics for XGBoost, Random Forest, Artificial Neural Network, and Adaptive Boosting, XGBoost was selected as the machine-learning model for this study. Bootstrap simulation (N = 10,000) was used to empirically derive the distribution of model metrics and covariate Gain statistics. SHAP was used to provide explanations of machine-learning output, and simulation was used to evaluate the variance of model accuracy metrics. For the XGBoost modeling method, across the 10,000 completed simulations, the AUROC ranged from 0.771 to 0.947, a difference of 0.176; the balanced accuracy ranged from 0.688 to 0.894, a difference of 0.205; the sensitivity ranged from 0.632 to 0.939, a difference of 0.307; and the specificity ranged from 0.595 to 0.944, a difference of 0.394. Across the same 10,000 simulations, the gain for Angina ranged from 0.225 to 0.456, a difference of 0.231; for Cholesterol from 0.148 to 0.326, a difference of 0.178; for maximum heart rate (MaxHR) from 0.081 to 0.200, a difference of 0.119; and for Age from 0.059 to 0.157, a difference of 0.098. The use of simulations to empirically evaluate the variability of model metrics, and of explanatory algorithms to check whether influential covariates match the literature, is necessary for increased transparency, reliability, and utility of machine learning methods. These variance statistics, combined with model accuracy statistics, can help researchers identify the best model for a given dataset.
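
The bootstrap idea described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it resamples a fixed set of labels and predicted scores with replacement instead of refitting XGBoost on each draw, it uses synthetic data in place of the Heart Disease Prediction Cohort, and the function names (`auroc`, `sens_spec`, `bootstrap_metrics`, `summarize`), the 0.5 threshold, and the 300 resamples are assumptions chosen for illustration.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sens_spec(labels, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed (assumed) decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def bootstrap_metrics(labels, scores, n_boot=300, seed=0):
    """Resample (label, score) pairs with replacement and collect each metric per resample."""
    rng = random.Random(seed)
    n = len(labels)
    out = {"auroc": [], "sensitivity": [], "specificity": [], "balanced_accuracy": []}
    while len(out["auroc"]) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        s = [scores[i] for i in idx]
        if len(set(y)) < 2:
            continue  # a resample with a single class has no defined AUROC; draw again
        sens, spec = sens_spec(y, s)
        out["auroc"].append(auroc(y, s))
        out["sensitivity"].append(sens)
        out["specificity"].append(spec)
        out["balanced_accuracy"].append((sens + spec) / 2)
    return out


def summarize(values):
    """Minimum, maximum, and their difference, the summary style used in the abstract."""
    return min(values), max(values), max(values) - min(values)


# Synthetic stand-in data (hypothetical): 150 patients with noisy predicted risks.
data_rng = random.Random(42)
labels = [data_rng.randint(0, 1) for _ in range(150)]
scores = [min(1.0, max(0.0, data_rng.gauss(0.65 if y else 0.35, 0.2))) for y in labels]

dist = bootstrap_metrics(labels, scores, n_boot=300)
for name, values in dist.items():
    lo, hi, width = summarize(values)
    print(f"{name}: min={lo:.3f} max={hi:.3f} range={width:.3f}")
```

In a fuller version of this sketch, each iteration would also redraw the training data and refit the model, so that the spread reflects model instability as well as test-set sampling noise.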

RESEARCH ARTICLE

Alexander A. Huang 1,2☯, Samuel Y. Huang 1,3☯*

1 Department of Statistics and Data Science, Cornell University, Ithaca, New York, United States of America
2 Department of MD Education, Northwestern University Feinberg School of Medicine, Chicago, Illinois, United States of America
3 Department of Internal Medicine, Virginia Commonwealth University School of Medicine, Richmond, Virginia, United States of America

☯ These authors contributed equally to this work.

OPEN ACCESS

Citation: Huang AA, Huang SY (2023) Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations. PLoS ONE 18(2): e0281922. https://doi.org/10.1371/journal.pone.0281922

Editor: Loredana Bellantuono, Università degli Studi di Bari Aldo Moro, ITALY

Received: November 23, 2022
Accepted: February 5, 2023
Published: February 23, 2023

Peer Review History: PLOS recognizes the benefits of transparency in the peer review process; therefore, we enable the publication of all of the content of peer review and author responses alongside final, published articles. The editorial history of this article is available here: https://doi.org/10.1371/journal.pone.0281922

Copyright: © 2023 Huang, Huang. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability Statement: All relevant data are within the manuscript and its Supporting information files.
Funding: The authors received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

Introduction

Machine learning (ML) algorithms generate predictions from sample data without explicit directions from the user [1–4]. Common ML algorithms (e.g., XGBoost, Random Forest, Neural Networks) have been found to be more accurate than traditional parametric methods (linear regression, logistic regression) [5–8]. It has been hypothesized that this increase in accuracy can be attributed to potential non-linear relationships between the independent and dependent variables and to interactions between multiple covariates [9, 10]. However, the increase in accuracy of ML algorithms over traditional parametric methods comes at a significant cost: interpretability [11–15]. Linear regression and logistic regression have clear, interpretable output that has been widely studied [16–18]. Machine-learning algorithms are often noninterpretable, leading to their reputation as “black box” algorithms [10, 19–21]. As a result, the interpretability, reliability, and efficacy of machine-learning models are often difficult to assess [14, 20, 22–24]. Without methods that explain how machine learning algorithms reach their predictions, clinicians cannot tell whether models are reliable and generalizable or are merely replicating the biases within the training datasets [11, 13, 25]. Providing explanations of how model predictions are reached, along with accurate summary statistics for model accuracy metrics (e.g., AUROC, Sensitivity, Specificity, F1, Balanced Accuracy), will increase the transparency of machine learning methods and increase confidence when using their predictions [8, 9, 26, 27].
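
To make "explanations of how model predictions are reached" concrete, the sketch below computes exact Shapley values, the attribution idea underlying SHAP, for one prediction of a toy risk model. Everything in it is an assumption for illustration: `toy_model`, its coefficients, the patient and baseline values, and the single-baseline value function are hypothetical, and the brute-force enumeration of feature subsets is not the algorithm the `shap` library applies to tree models such as XGBoost.

```python
from itertools import combinations
from math import factorial


def toy_model(x):
    """Hypothetical risk score with an angina-by-age interaction (not a fitted model)."""
    return (
        0.1
        + 0.3 * x["angina"]
        + 0.001 * (x["cholesterol"] - 200)
        + 0.2 * x["angina"] * (x["age"] > 60)
    )


def shapley_values(model, instance, baseline):
    """Exact Shapley attribution of model(instance) - model(baseline) to each feature.

    Features outside a coalition are set to their baseline value (an assumed,
    single-reference value function); all coalitions are enumerated, so the
    cost grows exponentially with the number of features.
    """
    names = list(instance)
    n = len(names)

    def value(coalition):
        mixed = {k: (instance[k] if k in coalition else baseline[k]) for k in names}
        return model(mixed)

    phi = {}
    for feature in names:
        others = [k for k in names if k != feature]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_feature = value(set(subset) | {feature})
                without_feature = value(set(subset))
                total += weight * (with_feature - without_feature)
        phi[feature] = total
    return phi


# Hypothetical patient and reference patient.
patient = {"angina": 1, "cholesterol": 260, "age": 65}
baseline = {"angina": 0, "cholesterol": 200, "age": 50}

phi = shapley_values(toy_model, patient, baseline)
for name, contribution in phi.items():
    print(f"{name}: {contribution:+.3f}")

# Additivity: the contributions sum to the gap between the two predictions.
print(f"sum of contributions: {sum(phi.values()):.3f}")
print(f"prediction gap:       {toy_model(patient) - toy_model(baseline):.3f}")
```

The additivity check at the end is the property that gives the method its name: the per-feature contributions add up to the difference between the explained prediction and the reference prediction, and the interaction term is split evenly between the two features that jointly produce it.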
Potential solutions to these weaknesses that have been applied within the field of computer science are SHapley Additive exPlanations (SHAP) for model interpretability and bootstrap simulation for quantifying the statistical distribution of model accuracy metrics [28–30]. However, little is known about the efficacy of SHAP and bootstrap simulation in evaluating machine-learning methods for medical outcomes such as heart disease. Given this gap in the literature, we used data from the England National Health Services Heart Disease Prediction Cohort, leveraging SHAP to provide explanations of machine-learning output and bootstrap simulation to evaluate the variance of model accuracy metrics.

Methods

A retrospective cohort study was conducted using the publicly available Heart Disease Prediction cohort (from the England National Health Services database). All methods in this research were carried out in accordance with ethical guidelines detailed by the Data Alliance Partnership Board (DAPB) approved national information standards and data collections for use in health and adult (...truncated)


This is a preview of a remote PDF: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0281922&type=printable
Article home page: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0281922

Alexander A. Huang, Samuel Y. Huang. Increasing transparency in machine learning through bootstrap simulation and shapely additive explanations, PLOS ONE, 2023, Volume 18, Issue 2, DOI: 10.1371/journal.pone.0281922