Ethicara for Responsible AI in Healthcare: A System for Bias Detection and AI Risk Management.

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11492113/pdf/

Maria Kritharidou, MS¹, Georgios Chrysogonidis, MS¹, Tasos Ventouris, MS¹, Vaios Tsarapastsanis, MS¹, Danai Aristeridou, MS¹, Anastasia Karatzia, MS¹, Veena Calambur, BA², Ahsan Huda, PhD¹, Sabrina Hsueh, PhD, FAMIA¹

¹Pfizer Inc., New York, NY, USA; ²Drexel University, Philadelphia, PA, USA

Abstract

The increasing torrents of health AI innovations hold promise for facilitating the delivery of patient-centered care. Yet enabling and adopting AI innovations in the healthcare and life science industries can be challenging, given rising concerns about AI risks and potential harms to health equity. This paper describes Ethicara, a system that enables health AI risk assessment for responsible AI model development. Ethicara works by orchestrating a collection of self-analytics services that detect and mitigate bias and increase model transparency from harmonized data models. Given the current lack of risk controls in the health AI development and deployment process, the self-analytics tools enhanced by Ethicara are expected to provide repeatable and measurable controls to operationalize voluntary risk management frameworks and guidelines (e.g., NIST RMF, FDA GMLP) and regulatory requirements emerging from upcoming AI regulations (e.g., EU AI Act, US Blueprint for an AI Bill of Rights). In addition, Ethicara provides plug-ins through which analytics results are incorporated into healthcare applications. This paper provides an overview of Ethicara’s architecture, pipeline, and technical components, showcases the system’s capability to facilitate responsible AI use, and exemplifies the types of AI risk controls it enables in the healthcare and life science industry.

1. Introduction

Health AI innovations in real-world evidence generation and validation hold promise for facilitating the delivery of patient-centered care [1].
However, health systems, providers, and patients face challenges when integrating additional insights from AI into clinical workflows and wellness decisions. Multisite studies have demonstrated the varying performance of AI/ML models in real-world settings [2, 3]. In addition, controversies over racial and gender bias in health AI have sparked an ongoing debate about the ethics and responsibility of applying such tools to patient care, where they affect outcomes, care quality, and health equity [4, 5]. Following the 21st Century Cures Act, the FDA released guidelines for Good Machine Learning Practice (GMLP), Software as a Medical Device (SaMD), Real-World Evidence (RWE), and Clinical Decision Support Software. So far, this has led to more than 500 SaMD approvals and the incorporation of RWE in more than 100 regulatory decisions for new drugs and biologics [6]. However, according to a recent survey of health AI innovation, the adoption of health AI in high-stakes decision-making scenarios is still in its infancy, given the lack of risk controls for enabling the responsible use of AI in healthcare [7, 8].

Meanwhile, with worsening staff shortages and rising clinician burnout, the healthcare industry is going through significant consolidations and transitions, putting AI adoption at the center of business priorities. Despite the emerging evidence on how health AI could help improve patient outcomes, care quality, and health equity, the lack of transparency about how AI insights are produced has impeded their interpretation by clinicians in the workflow. Moreover, there are growing concerns about data and algorithmic bias introduced across the AI lifecycle, from model development and deployment to responsible use. A Gartner report predicted that 85% of AI projects would deliver erroneous outcomes due to bias [9].
The 2022 AI Index Report documented a further increase in bias introduced by generative AI models, showing a 29% increase in elicited toxicity over the state of the art as of 2018 [10]. Despite these challenges, stakeholders in the healthcare ecosystem are becoming increasingly active and engaged. These concerns have been the driving force behind efforts to systematically assess health AI risks in real-world settings. A number of AI regulations and standards have been proposed to formally include bias and model transparency in risk management frameworks. For example, the US White House Office of Science and Technology Policy has released the Blueprint for an AI Bill of Rights [45]. The FDA has released an action plan for AI/ML as a medical device and good machine learning practice. The National Institute of Standards and Technology (NIST) of the U.S. Department of Commerce released the Artificial Intelligence Risk Management Framework (AI RMF 1.0) and a bias standard [47]. These developments warrant evaluating AI bias and maintaining responsible AI use with better transparency and bias mitigation schemes. This paper thus sets out to introduce Ethicara, an enablement tool for bias detection and AI risk management. We survey the landscape of related work, describe our implementation and major technical components, discuss our system architecture, and summarize lessons learned and future work.

2. Related Work

2.1 Types of Potential Health AI Bias

Biases in health AI systems typically arise either from the data or from the algorithm itself. Data bias refers to biases in the data used to train an ML model; these biases persist through algorithm training into the final predictions. Algorithmic bias refers to biases introduced by algorithmic design choices; it can exist even when the underlying data bias has been mitigated. We focus on the following types of AI bias as studied in the literature [11, 12].
Representation or sampling bias arises during data collection, when non-representative samples are drawn from a population or when a non-random sampling approach is used. For example, suppose the effectiveness of a drug is determined by a clinical trial with predominantly male participants. In that case, the drug deemed effective can have unintended consequences when prescribed to female patients.

Confounding bias arises when an unmeasured variable correlates with both the dependent and independent variables. Failing to control for confounding variables can induce a false relationship between the variables of interest. For example, suppose the effect of a medication on a particular health outcome is studied without accounting for factors such as age and gender. This could lead to an overestimation or underestimation of the drug’s true effect on the outcome.

Algorithmic bias occurs when bias is introduced by algorithmic design choices such as optimization functions, regularization, and model selection methods. This can lead to biased algorithmic decisions even if bias is minimized in the input data. For example, a common design flaw is to use a linear model to describe the relationship between input and output data w (...truncated)