People underestimate the errors made by algorithms for credit scoring and recidivism prediction but accept even fewer errors

https://www.nature.com/articles/s41598-021-99802-y.pdf

www.nature.com/scientificreports
OPEN

Felix G. Rebitschek (1,2,*), Gerd Gigerenzer (1,2) & Gert G. Wagner (1,2,3)

This study provides the first representative analysis of error estimations and willingness to accept errors in a Western country (Germany) with regard to algorithmic decision-making (ADM) systems. We examine people’s expectations about the accuracy of algorithms that predict credit default, recidivism of an offender, suitability of a job applicant, and health behavior. We also ask whether expectations about algorithm errors vary between these domains and how they differ from expectations about errors made by human experts. In a nationwide representative study (N = 3086), we find that most respondents underestimated the actual errors made by algorithms and were willing to accept even fewer errors than they estimated. Error estimates and error acceptance did not differ consistently between predictions made by algorithms and those made by human experts, but people’s living conditions (e.g. unemployment, household income) affected domain-specific acceptance (job suitability, credit defaulting) of misses and false alarms. We conclude that people have unwarranted expectations about the performance of ADM systems and evaluate errors in terms of potential personal consequences. Given the general public’s low willingness to accept errors, we further conclude that acceptance of ADM appears to be conditional on strict accuracy requirements.
Algorithmic decision-making (ADM1) continues to spread into everyday life. At the same time, claims, risks2–4, and implementations related to ADM are under debate, e.g. in criminal risk assessment5–7 or allocation of public resources8. Despite these controversies, it is unclear how the general public perceives ADM quality. Representative studies focus on attitudes9–11 and on perceived understanding and application of ADM and artificial intelligence12–15, but rarely on perceptions of the systems’ reliability and validity (for an exception, see16). The public debate centers on laypeople’s trust in and fear of algorithms16, e.g. in media coverage of hopes and concerns regarding artificial intelligence17. Research on algorithm aversion and appreciation1,18 examines both the circumstances under which people trust algorithmic advice18,19 (for instance, because they perceive the decision problem in question to be objective or to require mechanical skills20, or because they lack confidence in their own expertise) and the circumstances under which they are mistrustful, for instance21,22, in response to slow algorithm responses or to observing algorithm errors. Laypeople’s knowledge about ADM systems is usually limited12, even when the algorithms themselves are not kept secret. This is where laypeople’s theories about ADM systems (their theory of machine) become crucial18. What do they think about input, processing, and output, and, moreover, about the quality and fairness of algorithmic compared to expert judgments? Given the limited possibilities to actually observe ADM errors, expected rather than observed performance deficits may underpin critical attitudes of the public toward ADM. Extremely high performance expectations, for instance, could underlie algorithm aversion when they provide a mental reference point that an algorithm fails to meet23. But what are people’s expectations with regard to ADM performance?
1Harding Center for Risk Literacy, Faculty of Health Sciences Brandenburg, University of Potsdam, Potsdam, Germany. 2Max Planck Institute for Human Development, Berlin, Germany. 3German Socio-Economic Panel Study (SOEP), Berlin, Germany. *email:
Scientific Reports | (2021) 11:20171 | https://doi.org/10.1038/s41598-021-99802-y

Surprisingly, the expected level of accuracy of algorithms and the perceived competence24 of algorithmic advice have been largely neglected in research on the general population. Fifty-eight percent of Americans expect some level of human bias in ADM systems, and 47% and 49%, respectively, believe that resume screening of job applicants and scoring for parole are “effective”16. In the present article, we aim to reduce the gap in knowledge about the general public’s concrete expectations by investigating what they expect regarding the accuracy of ADM in the financial, legal, occupational, and health domains. We ask people what they believe are the actual error rates of ADM and how many errors they consider to be acceptable, and we check whether their responses meet current ADM accuracy standards (in the financial and legal domains). To the best of our knowledge (after a literature search in Web of Science, PsycNet, and Google Scholar, which revealed eleven survey studies on algorithm perception9,10,12–17,25–27), this is the first representative study comparing error estimates and the willingness to accept errors for ADM.
ADM systems that perform classification balance two different types of errors, misses and false alarms, each associated with different consequences (costs). Thinking about types of decision errors and related costs can affect acceptance of errors (e.g. ‘bias’ in signal detection theory28; cost-sensitive error management29), and the degree to which errors are accepted may differ in the legal30 and in the medical domain31, e.g. if the consequences of medical errors are irreversible.
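The distinction between misses and false alarms, and the way their costs shift which errors are tolerable, can be made concrete with a small sketch. The Python example below is illustrative only: the outcome counts, the cost weights, and all function names are hypothetical assumptions and are not taken from the study or from any real ADM system. It computes the miss rate and the false alarm rate from classification counts and then weights the same errors with two different cost profiles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical outcome counts of a binary classifier (not study data)."""
    hits: int                # positive cases correctly flagged
    misses: int              # positive cases overlooked (false negatives)
    false_alarms: int        # negative cases wrongly flagged (false positives)
    correct_rejections: int  # negative cases correctly passed


def miss_rate(c: Counts) -> float:
    """Share of actual positives that the classifier overlooks."""
    return c.misses / (c.hits + c.misses)


def false_alarm_rate(c: Counts) -> float:
    """Share of actual negatives that the classifier wrongly flags."""
    return c.false_alarms / (c.false_alarms + c.correct_rejections)


def total_cost(c: Counts, cost_miss: float, cost_false_alarm: float) -> float:
    """Sum of both error types, each weighted by an assumed cost per error."""
    return c.misses * cost_miss + c.false_alarms * cost_false_alarm


# Made-up counts for 1000 cases, 100 of which are actual positives.
example = Counts(hits=80, misses=20, false_alarms=90, correct_rejections=810)

print(miss_rate(example))         # 0.2
print(false_alarm_rate(example))  # 0.1

# Same classifier, two assumed cost profiles: one stakeholder finds misses
# five times as costly as false alarms, the other the reverse.
print(total_cost(example, cost_miss=5.0, cost_false_alarm=1.0))  # 190.0
print(total_cost(example, cost_miss=1.0, cost_false_alarm=5.0))  # 470.0
```

Under these assumed weights, identical error counts yield very different total costs, which is the sense in which stakeholders with different stakes may accept different error profiles from the same system.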
From the perspective of an unemployed person, for instance, mistakenly being overlooked for a job by an ADM system (a false negative) is likely more costly than being hired in spite of being unsuitable (a false positive). In our analysis, we therefore relate error acceptance to critical factors such as unemployment phases. We also compare the willingness to accept errors with the typical error preference of exemplary stakeholders (e.g. non-recidivizing offenders want to avoid a false alarm). We additionally explore factors underlying attitudes toward technology that may influence both algorithm error estimation and acceptance, such as risk preference32, gender12,33, and age34. For instance, only one third of Americans above 50 years of age, compared with half of those 18 to 29 years of age, believe that algorithms can be free of human biases16. Because algorithm appreciation was shown to be lower among less numerate people18, we compare different educational groups. Best-selling authors and commercial companies have promoted “AI” as being superior to human experts35, and in some instances (...truncated)