People underestimate the errors made by algorithms for credit scoring and recidivism prediction but accept even fewer errors
www.nature.com/scientificreports
Felix G. Rebitschek1,2*, Gerd Gigerenzer1,2 & Gert G. Wagner1,2,3
This study provides the first representative analysis of error estimations and willingness to accept
errors in a Western country (Germany) with regard to algorithmic decision-making (ADM) systems.
We examine people’s expectations about the accuracy of algorithms that predict credit default,
recidivism of an offender, suitability of a job applicant, and health behavior. We also ask whether
expectations about algorithm errors vary between these domains and how they differ from
expectations about errors made by human experts. In a nationwide representative study (N = 3086),
we find that most respondents underestimated the actual errors made by algorithms and were willing
to accept even fewer errors than they estimated. Error estimates and error acceptance did not differ
consistently between predictions made by algorithms and those made by human experts, but people’s living conditions
(e.g. unemployment, household income) affected domain-specific acceptance (job suitability, credit
defaulting) of misses and false alarms. We conclude that people have unwarranted expectations about
the performance of ADM systems and evaluate errors in terms of potential personal consequences.
Given the general public’s low willingness to accept errors, we further conclude that acceptance of
ADM appears to be conditional on strict accuracy requirements.
This study provides the first representative analysis of error estimations and willingness to accept errors in a
Western population (in Germany) with regard to specific algorithmic decision-making (ADM) systems. We
examine how accurately algorithms are expected to perform in predicting credit defaults, recidivism of an
offender, suitability of a job applicant, and health behavior.
Algorithmic decision-making (ADM1) continues to spread into everyday life. At the same time, claims,
risks2–4, and implementations related to ADM are under debate, e.g. in criminal risk assessment5–7 or the allocation
of public resources8. Despite these controversies, it is unclear how the general public perceives the quality of ADM.
Representative studies focus on attitudes9–11 and on perceived understanding and application of ADM and artificial intelligence12–15, but rarely on perceptions of the systems’ reliability and validity (for an exception, see16).
The public debate centers on laypeople’s trust in and fear of algorithms16, e.g. in media coverage of hopes and
concerns regarding artificial intelligence17. Research on algorithm aversion and appreciation1,18 examines both
the circumstances under which people trust algorithmic advice18,19 (for instance, because they perceive the
decision problem in question to be objective or to require mechanical skills20, or because they lack confidence in
their own expertise) and the circumstances under which they are mistrustful, for instance21,22 in response to
slow algorithm responses or to observed algorithm errors.
Laypeople’s knowledge about ADM systems is usually limited12, provided the algorithms are not kept
secret in the first place. This is where laypeople’s theories about ADM systems (their theory of machine) become
crucial18. What do they think about input, processing, and output and, moreover, about the quality and fairness
of algorithmic compared with expert judgments? Given the limited opportunities to actually observe ADM errors,
expected rather than observed performance deficits may underpin the public’s critical attitudes toward ADM.
Extremely high performance expectations, for instance, could underlie algorithm aversion when they provide
a mental reference point that an algorithm fails to meet23. But what does the public actually expect with regard to
ADM performance?
1Harding Center for Risk Literacy, Faculty of Health Sciences Brandenburg, University of Potsdam, Potsdam,
Germany. 2Max Planck Institute for Human Development, Berlin, Germany. 3German Socio-Economic Panel Study
(SOEP), Berlin, Germany. *email:
Scientific Reports | (2021) 11:20171 | https://doi.org/10.1038/s41598-021-99802-y
Surprisingly, the expected level of accuracy of algorithms and the perceived competence24 of algorithmic
advice have been largely neglected in research on the general population. Fifty-eight percent of Americans expect
some level of human bias in ADM systems, and 47% and 49%, respectively, believe that resume screening of job
applicants and scoring for parole are “effective”16. In the present article, we aim to narrow the gap in knowledge
about the general public’s concrete expectations by investigating what people expect regarding the accuracy of
ADM in the financial, legal, occupational, and health domains. We ask people what they believe the actual
error rates of ADM are and how many errors they consider acceptable, and we check whether their responses
meet current ADM accuracy standards (in the financial and legal domains). To the best of our knowledge
(after a literature search in Web of Science, PsycNet, and Google Scholar, which revealed eleven survey studies
on algorithm perception9,10,12–17,25–27), this is the first representative study comparing error estimates and the
willingness to accept errors for ADM.
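Comparing estimates with actual error rates requires that both be expressed on the same scale. The sketch below assumes that errors are reported as a miss rate and a false-alarm rate. The miss rate is the share of actual events, such as defaults, that the system fails to flag. The false-alarm rate is the share of non-events that it wrongly flags. The code computes both rates from a confusion matrix and checks whether a respondent's estimate falls below the actual rate. The class `ConfusionCounts`, the function `underestimates`, and all counts are hypothetical illustrations. They are not the study's measures or data.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Outcome counts for a binary prediction, e.g. 'will default' vs. 'will repay'.

    Hypothetical structure for illustration; not taken from the study.
    """
    hits: int                # event predicted and event occurred
    misses: int              # no event predicted, but event occurred (false negative)
    false_alarms: int        # event predicted, but no event occurred (false positive)
    correct_rejections: int  # no event predicted and no event occurred

    def miss_rate(self) -> float:
        # Share of actual events that the system failed to flag.
        actual_events = self.hits + self.misses
        if actual_events == 0:
            raise ValueError("no actual events; miss rate undefined")
        return self.misses / actual_events

    def false_alarm_rate(self) -> float:
        # Share of actual non-events that the system wrongly flagged.
        actual_non_events = self.false_alarms + self.correct_rejections
        if actual_non_events == 0:
            raise ValueError("no actual non-events; false-alarm rate undefined")
        return self.false_alarms / actual_non_events


def underestimates(estimated_rate: float, actual_rate: float) -> bool:
    """Return True if a respondent's estimated error rate is below the actual rate."""
    return estimated_rate < actual_rate


# Hypothetical counts for 1,000 cases (100 events, 900 non-events); not study data.
counts = ConfusionCounts(hits=70, misses=30, false_alarms=90, correct_rejections=810)
print(counts.miss_rate())                                # 0.3
print(counts.false_alarm_rate())                         # 0.1
print(underestimates(0.05, counts.false_alarm_rate()))   # True
```

The two rates use different denominators (actual events vs. actual non-events). A single "error rate" can therefore hide very different miss and false-alarm rates.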
ADM systems that classify cases balance two different types of error, misses and false alarms, each associated with
different consequences (costs). Thinking about types of decision errors and related costs can affect the acceptance
of errors (e.g. ‘bias’ in signal detection theory28; cost-sensitive error management29), and the degree to which
errors are accepted30 may differ between the legal and the medical domain31, e.g. if the consequences of medical
errors are irreversible. From the perspective of an unemployed person, for instance, being mistakenly overlooked
for a job by an ADM system (a false negative) is likely more costly than being hired despite being unsuitable (a false
positive). In our analysis, we therefore relate error acceptance to critical factors such as periods of unemployment.
We also compare the willingness to accept errors with the typical error preference of exemplary stakeholders
(e.g. offenders who will not reoffend want to avoid a false alarm). We additionally explore factors underlying attitudes
toward technology that may influence both algorithm error estimation and acceptance, such as risk preference32,
gender12,33, and age34. For instance, only one third of US Americans over 50 years of age, compared with half of
those aged 18 to 29, believe that algorithms can be free of human biases16. Because algorithm appreciation
was shown to be lower among less numerate people18, we compare different educational groups.
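The link between error costs and error preferences can be illustrated with a standard expected-cost decision rule from decision theory. This rule is used here only as an illustration and is not presented as the study's method. Assume a system outputs a probability p that a case is positive (e.g. that a job applicant is suitable). Flagging the case then minimizes expected cost whenever p × cost_miss > (1 − p) × cost_false_alarm. The function names `decision_threshold` and `classify` and the cost values are hypothetical.

```python
def decision_threshold(cost_miss: float, cost_false_alarm: float) -> float:
    """Probability above which flagging a case minimizes expected cost.

    Flag when p * cost_miss > (1 - p) * cost_false_alarm,
    which rearranges to p > cost_false_alarm / (cost_false_alarm + cost_miss).
    Hypothetical helper for illustration only.
    """
    if cost_miss <= 0 or cost_false_alarm <= 0:
        raise ValueError("costs must be positive")
    return cost_false_alarm / (cost_false_alarm + cost_miss)


def classify(p_positive: float, cost_miss: float, cost_false_alarm: float) -> bool:
    """Flag the case as positive if its probability exceeds the cost-based threshold."""
    return p_positive > decision_threshold(cost_miss, cost_false_alarm)


# Hypothetical cost ratios, not values from the study.
print(decision_threshold(cost_miss=1, cost_false_alarm=1))  # 0.5
print(decision_threshold(cost_miss=3, cost_false_alarm=1))  # 0.25
print(classify(0.4, cost_miss=3, cost_false_alarm=1))       # True
print(classify(0.4, cost_miss=1, cost_false_alarm=1))       # False
```

Raising the cost of a miss lowers the threshold, as it would from an unemployed applicant's perspective. The system then flags more cases, accepting more false alarms in exchange for fewer misses.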
Best-selling authors and commercial companies have promoted “AI” as being superior to human experts35, and
in some instances (...truncated)