Ethicara for Responsible AI in Healthcare: A System for Bias Detection and AI Risk Management.
Maria Kritharidou, MS1, Georgios Chrysogonidis, MS1, Tasos Ventouris, MS1, Vaios
Tsarapastsanis, MS1, Danai Aristeridou, MS1, Anastasia Karatzia, MS1, Veena Calambur,
BA2, Ahsan Huda, PhD1, Sabrina Hsueh, PhD, FAMIA1
1Pfizer Inc., New York, NY, USA; 2Drexel University, Philadelphia, PA, USA
Abstract
The growing torrent of health AI innovations holds promise for facilitating the delivery of patient-centered care.
Yet enabling and adopting AI innovations in the healthcare and life science industries can be challenging
given rising concerns about AI risks and potential harms to health equity. This paper describes Ethicara, a
system that enables health AI risk assessment for responsible AI model development. Ethicara works by
orchestrating a collection of self-analytics services that detect and mitigate bias and increase model transparency
from harmonized data models. Given the current lack of risk controls in the health AI development and deployment
process, the self-analytics tools enhanced by Ethicara are expected to provide repeatable and measurable controls
that operationalize voluntary risk management frameworks and guidelines (e.g., NIST RMF, FDA GMLP) and
regulatory requirements emerging from upcoming AI regulations (e.g., EU AI Act, US Blueprint for an AI Bill of
Rights). In addition, Ethicara provides plug-ins through which analytics results are incorporated into healthcare
applications. This paper gives an overview of Ethicara’s architecture, pipeline, and technical components,
showcases the system’s capability to facilitate responsible AI use, and exemplifies the types of AI risk controls it
enables in the healthcare and life science industries.
1. Introduction
Health AI innovations in real-world evidence generation and validation hold promise for facilitating the delivery of
patient-centered care1. However, health systems, providers, and patients face challenges when integrating additional
insights from AI into clinical workflow and wellness decisions. Multisite studies have demonstrated the varying
performance of AI/ML models in real-world settings2, 3. In addition, controversies over racial and gender bias in
health AI have sparked an ongoing debate about the ethics and responsibility of applying such tools to patient care,
where they can affect outcomes, care quality, and health equity4, 5. Following the 21st Century Cures Act, FDA released
guidelines for Good Machine Learning Practice (GMLP), Software as a Medical Device (SaMD), Real-World Evidence (RWE),
and Clinical Decision Support Software. So far, this has led to more than 500 SaMD approvals and the incorporation of
RWE in more than 100 regulatory decisions for new drugs and biologics6. However, a recent survey of health AI
innovation indicates that adoption in high-stakes decision-making scenarios is still in its infancy, given the
lack of risk controls for enabling the responsible use of AI in healthcare7, 8.
Meanwhile, with worsening staff shortages and rising clinician burnout, the healthcare industry is going through
significant consolidations and transitions, putting AI adoption at the center of business priorities. Despite the emerging
evidence on how health AI could help improve patient outcomes, care quality, and health equity, the lack of
transparency about how AI insights are derived has impeded their interpretation by clinicians in the workflow. Moreover,
concerns are growing about data and algorithmic bias introduced across the AI lifecycle, from model development
and deployment to responsible use. A Gartner report predicted that 85% of AI projects would deliver
erroneous outcomes due to bias9. The 2022 AI Index Report10 further documented an increase in bias
introduced by generative AI models, showing a 29% increase in elicited toxicity over the state of the art as of 2018.
Despite these challenges, stakeholders across the healthcare ecosystem are becoming increasingly active and
engaged. These concerns have been the driving force behind efforts to assess health AI risks systematically in
real-world settings. A number of AI regulations and standards have been proposed to formally include bias and model
transparency in risk management frameworks. For example, the US White House Office of Science and
Technology Policy has released the Blueprint for an AI Bill of Rights45. FDA has released an action plan for AI/ML-based
software as a medical device and good machine learning practice. The National Institute of Standards and Technology
(NIST) of the U.S. Department of Commerce released the Artificial Intelligence Risk Management Framework (AI RMF 1.0)
and a bias standard47. Together, these call for evaluating AI bias and maintaining responsible AI use with better
transparency and bias mitigation schemes. This paper thus introduces Ethicara, an enablement tool for bias detection
and AI risk management. We survey related work, describe our implementation and its major technical
components, discuss the system architecture, and summarize lessons learned and future work.
2. Related Work
2.1 Types of Potential Health AI Bias
Biases in health AI systems typically arise either from the data or from the algorithm itself. Data bias refers to biases
in the data used to train an ML model; these biases persist through algorithm training into the final predictions.
Algorithmic bias refers to the biases introduced in algorithms due to the design choices; it can exist even when the
underlying data bias has been mitigated. We focus on the following types of AI biases as studied in the literature11, 12.
Representation or sampling bias arises during data collection, when non-representative samples are drawn from a
population, or when a non-random sampling approach is introduced. For example, suppose the effectiveness of a drug
is determined by a clinical trial where predominantly male participants are included. In that case, the deemed-effective
drug can have potentially unintended consequences when prescribed to female patients.
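As a minimal sketch of how representation bias might be screened for, the snippet below compares each group's share of a sample against a reference population share and flags large gaps. The function name `representation_gaps`, the 10-percentage-point tolerance, the enrolment counts, and the 50/50 reference split are all illustrative assumptions; none of them come from Ethicara or from the cited literature.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares, tolerance=0.10):
    """Return groups whose sample share deviates from the reference share.

    The result maps each flagged group to (observed share - reference share),
    rounded to three decimals. `tolerance` is an assumed threshold.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    gaps = {}
    for group, expected in reference_shares.items():
        observed = counts.get(group, 0) / total
        diff = observed - expected
        if abs(diff) > tolerance:
            gaps[group] = round(diff, 3)
    return gaps


# Hypothetical trial enrolment: predominantly male participants.
trial_sex = ["male"] * 80 + ["female"] * 20
# Assumed reference population split (illustrative only).
population = {"male": 0.5, "female": 0.5}

flagged = representation_gaps(trial_sex, population)
print(flagged)
```

A positive value means the group is over-represented relative to the reference and a negative value means it is under-represented. A real audit would also need a defensible reference population and a statistical test rather than a fixed threshold.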
Confounding bias arises when an unmeasured variable correlates with both the dependent and independent variables.
Not controlling for confounding variables can induce a false relationship between variables of interest. For example,
suppose the effect of a medication on a particular health outcome is being studied without accounting for factors like
age and gender. This could lead to an overestimation or underestimation of the drug’s true effect on the outcome.
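The effect of an uncontrolled confounder can be illustrated with a small synthetic example. In the sketch below, age group influences both who receives the drug and who recovers, while the drug itself has no effect within either age group. The counts are invented for illustration, and stratification is only one of several possible adjustment methods; it is used here as an assumption, not as the approach taken by Ethicara.

```python
from collections import defaultdict


def recovery_rate(records):
    """Fraction of records with recovered == True."""
    return sum(r["recovered"] for r in records) / len(records)


def naive_effect(records):
    """Difference in recovery rate, treated minus untreated, ignoring confounders."""
    treated = [r for r in records if r["treated"]]
    control = [r for r in records if not r["treated"]]
    return recovery_rate(treated) - recovery_rate(control)


def stratified_effect(records, confounder):
    """Average of per-stratum effects, weighted by stratum size."""
    strata = defaultdict(list)
    for r in records:
        strata[r[confounder]].append(r)
    total = len(records)
    return sum(len(rows) / total * naive_effect(rows) for rows in strata.values())


def make_rows(age, treated, n, n_recovered):
    """Build n synthetic patient records, of which n_recovered recovered."""
    return [
        {"age": age, "treated": treated, "recovered": i < n_recovered}
        for i in range(n)
    ]


# Synthetic cohort: older patients are treated more often and recover less often.
records = (
    make_rows("young", False, 100, 80)
    + make_rows("young", True, 20, 16)
    + make_rows("old", False, 20, 8)
    + make_rows("old", True, 100, 40)
)

naive = naive_effect(records)
adjusted = stratified_effect(records, "age")
print(f"naive effect:    {naive:+.3f}")
print(f"adjusted effect: {adjusted:+.3f}")
```

In this synthetic cohort the naive comparison suggests the drug lowers recovery by roughly 27 percentage points, whereas the age-stratified estimate is zero, which matches how the data were constructed.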
Algorithmic bias occurs when bias is introduced due to algorithmic design choices such as optimization functions,
regularization, and model selection methods. This can lead to biased algorithmic decisions, even if bias is minimized
in the input data. For example, a common design flaw is to use a linear model to describe the relationship between
input and output data w (...truncated)