The necessity of AI audit standards boards

AI & SOCIETY
https://doi.org/10.1007/s00146-025-02320-y

REVIEW

David Manheim (1,2) · Sammy Martin (3) · Mark Bailey (4) · Mikhail Samin (5) · Ross Greutzmacher (3)

Received: 9 February 2025 / Accepted: 13 March 2025
© The Author(s) 2025

Abstract

Auditing of AI systems is a promising way to understand and manage ethical problems and societal risks associated with contemporary AI systems, as well as some anticipated future risks. Efforts to develop standards for auditing artificial intelligence (AI) systems have therefore understandably gained momentum. However, current approaches are not just insufficient, but can be actively harmful. Transparency alone does not address concerns about risk. Internal auditing is insufficient, and easily becomes safety-washing. External audit is better, but requires credible standards. Industry-led approaches to building standards or performing audits lack credibility and undermine other efforts. Regulation is often ill-adapted and becomes a static barrier. Lastly, all of these limited technical, governance, and even ethical assessments fail to ensure continued stakeholder input and engagement. Instead, the paper proposes the establishment of an AI Audit Standards Board, in line with best practices in other fields, including safety-critical industries like aviation and nuclear energy, as well as more prosaic ones such as financial accounting and pharmaceuticals. This would address the evolving nature of AI technologies, help maintain public trust in AI, and promote a culture of safety and ethical responsibility within the AI industry. By ensuring that audits remain relevant, robust, and responsive to rapid advances in AI, such a board would keep AI auditing from devolving into safety-washing and help it address the risks and ethical concerns that will continue to arise as AI becomes increasingly important in society, and as human interaction with these systems changes over time.
Keywords AI governance · AI audit · Technology audits · Ethical AI · AI policy · Organizational culture · Audit standards · Standards setting · Responsible AI

1 Technion Israel Institute of Technology, Haifa, Israel
2 Association for Long Term Existence and Resilience (ALTER), Rehovot, Israel
3 Transformative Futures Institute, Wichita, KS, USA
4 National Intelligence University, Washington, D.C., USA
5 AI Governance and Safety Institute, Berkeley, California, USA

1 Introduction

Audits are used in different domains both to give an account of what is happening within a system and to verify requirements (Courville et al. 2003). When considering how to perform audits in a new domain, drawing on best practices from other domains is critical, and artificial intelligence (AI) is a new domain with new risks (Hendrycks 2025). There is now significant focus on audits for ensuring the safety and evaluating the risks and harms of AI systems (Mökander et al. 2023; Shevlane et al. 2023; Sharkey et al. 2023), as well as significant earlier work on audit methods for evaluating societal implications (Raji et al. 2022a) and on what a mature ethics process involves (Krijger et al. 2023). While it is encouraging to see action on evaluating and auditing risks from frontier models and on evaluating ethical standards, these types of evaluations and audits face many challenges (Courville et al. 2003).

Below, we argue that the current approach to standards development for AI systems is harmful, among other reasons because of the proliferation of inconsistent and rapidly outdated static standards and a lack of clarity about what is appropriate in any given domain, for any specific AI system, and for specific applications.
This fragments efforts, undermines attempts to make any specific auditing method standard, and reduces the usefulness of standards development. To supplement the important technical approaches being developed for auditing AI systems, we also need broader audits and ongoing development of standards: an audit standards body, not just recently proposed audit standards such as those suggested by Faveri et al. (2025).

To explain what is needed and why, we review past work and current audit approaches. We then explain why current efforts fail to address relevant challenges and risks. We also note that auditing standards are not the same thing as standards for audits, and neither necessarily implies regulation. In addition, different methods, audit approaches, and standards are needed for different model types and applications (Frase 2023). This is especially true when specific standards are unclear or disputed or when detailed standards would be unwise, as we will explore. Therefore, the body of the paper revisits some known ideas in AI audit, as well as some drawbacks of these approaches and methods, both to provide a brief overview of the issues that evaluation and auditing should ideally address, and to show how the suggestion of audit boards differs from other approaches to standards.

1.1 Background

AI auditing refers to the process of evaluating AI systems for safety, fairness, transparency, and compliance with ethical and technical benchmarks. This includes internal audits conducted by developers, external audits by independent entities, and red-teaming to identify risks. In contrast, AI standards are guidelines and best practices that define what AI systems must do or what vendors must do with their models, while AI audit standards specify how AI audits should be conducted, ensuring consistent or appropriately adapted methods for auditing different models and applications.
Either class of standard may be developed by industry groups, academic researchers, or regulatory bodies. Regulatory oversight, meanwhile, is the enforcement of legal and policy frameworks governing AI systems, ensuring compliance with broader societal and governmental requirements. This type of oversight can involve audits, but while audits and standards inform best practices, regulatory oversight determines legal obligations and consequences.

Auditing is well established in the context of computing generally (van Biene-Hershey 2007; Hall and Hazell 2015). Standards such as COBIT date at least to the 1990s, well before the current classes of artificial intelligence. Recently, debate about safety, misuse, and bias has led to internal and external checks on models, and there are now AI systems audits, including both a growing ecosystem of AI ethics and accountability audits (Birhane et al. 2024) and safety audit efforts such as the red-teaming performed for GPT-4 (OpenAI 2023a; OpenAI 2023c; OpenAI 2024). Self-reporting via “Model Cards” (Mitchell et al. 2019) has become commonplace, and internal auditing and red-teaming have become more common prior to frontier model release, with notable exceptions (Anil et al. (...truncated)