The necessity of AI audit standards boards
AI & SOCIETY
https://doi.org/10.1007/s00146-025-02320-y
REVIEW
David Manheim1,2 · Sammy Martin3 · Mark Bailey4 · Mikhail Samin5 · Ross Greutzmacher3
Received: 9 February 2025 / Accepted: 13 March 2025
© The Author(s) 2025
Abstract
Auditing of AI systems is a promising way to understand and manage ethical problems and societal risks associated with
contemporary AI systems, as well as some anticipated future risks. Efforts to develop standards for auditing artificial intelligence (AI) systems have therefore understandably gained momentum. However, current approaches are not just insufficient,
but can be actively harmful. Transparency alone does not address concerns about risk. Internal auditing is insufficient, and
easily becomes safety-washing. External audit is better, but requires credible standards. Industry-led approaches to building
standards or performing audits lack credibility and undermine other efforts. Regulation is often ill-adapted and becomes a
static barrier. Lastly, all of these limited technical, governance, and even ethical assessments fail to ensure continued stakeholder input and engagement. Instead, the paper proposes the establishment of an AI Audit Standards Board, in line with best
practices in other fields, including safety-critical industries like aviation and nuclear energy, as well as more prosaic ones
such as financial accounting and pharmaceuticals. This would address the evolving nature of AI technologies, help maintain public trust in AI, and promote a culture of safety and ethical responsibility within the AI industry. By ensuring audits
remain relevant, robust, and responsive to rapid advances in AI, such a board would keep AI auditing from devolving into safety-washing
and help address the risks and ethical concerns that will continue to arise as AI becomes increasingly important in society and as
human interaction with these systems changes over time.
Keywords AI governance · AI audit · Technology audits · Ethical AI · AI policy · Organizational culture · Audit standards ·
Standards setting · Responsible AI
1 Introduction
* David Manheim
Sammy Martin
Mark Bailey
Mikhail Samin
Ross Greutzmacher
1 Technion Israel Institute of Technology, Haifa, Israel
2 Association for Long Term Existence and Resilience (ALTER), Rehovot, Israel
3 Transformative Futures Institute, Wichita, KS, USA
4 National Intelligence University, Washington, D.C., USA
5 AI Governance and Safety Institute, Berkeley, California, USA
Audits are used in different domains both for giving an
account of what is happening within a system and for verifying requirements (Courville et al. 2003). When considering how to perform audits in a new domain, drawing on
best practices from other domains is critical—and artificial
intelligence (AI) is a new domain with new risks (Hendrycks
2025). There is now significant focus on audits for ensuring
the safety and evaluating the risks and harms of AI systems
(Mökander et al. 2023; Shevlane et al. 2023; Sharkey et al.
2023), as well as significant earlier work on audit methods
for evaluation of societal implications (Raji et al. 2022a)
and on what a mature ethics process involves (Krijger et al.
2023). While it is encouraging to see action addressing the
critical role of evaluating and auditing risks from frontier
models and on evaluation of ethical standards, there are
many challenges for these types of evaluations and audits
(Courville et al. 2003).
Below, we argue that the current approach to standards
development for AI systems is harmful for several reasons, including the proliferation of inconsistent and rapidly
outdated static standards and a lack of clarity about what is
appropriate in any given domain, for any specific AI system,
and for specific applications. This fragments work on auditing, undermines attempts to make any specific auditing method standard, and reduces the usefulness of standards development.
To supplement important technical approaches being
developed for auditing AI systems, we also need broader
audits, and ongoing development of standards—an audit
standards body, not just audit standards such as those
recently proposed by Faveri et al. (2025). To explain what
is needed and why, we review past work and current audit
approaches. We then explain why current efforts fail at
addressing relevant challenges and risks. We also note that
auditing standards are not the same thing as standards for
audits, and neither necessarily implies regulation. In addition, different methods, audit approaches, and standards are
needed for different model types and applications (Frase
2023). This is especially true when specific standards are
unclear or disputed or when detailed standards would be
unwise, as we will explore. Therefore, the body of the paper
revisits some known ideas in AI audit, as well as some drawbacks of these approaches and methods, both to provide
a brief overview of the issues that evaluation and auditing
should ideally address, and to show how the suggestion of
audit boards differs from other approaches to standards.
1.1 Background
AI auditing refers to the process of evaluating AI systems
for safety, fairness, transparency, and compliance with ethical and technical benchmarks. This includes internal audits
conducted by developers, external audits by independent
entities, and red-teaming to identify risks. In contrast, AI
standards are guidelines and best practices that define what
AI systems must do, or what vendors must do with the models, and AI audit standards are standards for how AI audits
should be conducted, ensuring consistency or appropriately
adapted methods for auditing different models and applications. Either class of standard may be developed by industry
groups, academic researchers, or regulatory bodies. Regulatory oversight, meanwhile, is the enforcement of legal and
policy frameworks governing AI systems, ensuring compliance with broader societal and governmental requirements.
This type of oversight can involve audits, but while audits
and standards inform best practices, regulatory oversight
determines legal obligations and consequences.
Auditing is well established in the context of computing
generally (van Biene-Hershey 2007; Hall and Hazell 2015).
Standards, such as COBIT, date at least to the 1990s, well
before the current classes of artificial intelligence. Recently,
debate about safety, misuse, and bias has led to internal and
external checks on models, and there are now AI systems
audits, including both a growing ecosystem of AI ethics and
accountability audits (Birhane et al. 2024) and safety
audit efforts such as the red-teaming performed for GPT-4
(OpenAI 2023a, OpenAI 2023c, OpenAI 2024).
Self-reporting via “Model Cards” (Mitchell et al. 2019)
has become commonplace, and internal auditing and red-teaming have become more common prior to frontier model
release, with notable exceptions (Anil et al. (...truncated)