Algorithmic and Non-Algorithmic Fairness: Should We Revise our View of the Latter Given Our View of the Former?
Law and Philosophy
https://doi.org/10.1007/s10982-024-09505-4
The Author(s) 2024
KASPER LIPPERT-RASMUSSEN
(Accepted 19 April 2024)
ABSTRACT. In the US context, critics of courts’ use of risk prediction
algorithms have argued that COMPAS involves unfair machine bias because it
generates higher false positive rates of predicted recidivism for black offenders
than for white offenders. In response, some have argued that algorithmic fairness
concerns, either also or only, calibration across groups (roughly, that a score
assigned to different individuals by the algorithm involves the same probability of
the individual having the target property across different groups of individuals) and
that, for mathematical reasons, it is virtually impossible to equalize false positive
rates without impairing the calibration. I argue that in standard non-algorithmic
contexts, such as hirings, we do not think that lack of calibration entails unfair bias,
and that it is difficult to see why algorithmic contexts, as it were, should differ
fairness-wise from non-algorithmic ones in this respect. Hence, we should reject
the view that calibration is necessary for fairness in an algorithmic context.
I. INTRODUCTION
In a US context, critics of courts’ use of risk prediction algorithms
such as COMPAS (Correctional Offender Management Profiling for
Alternative Sanctions) have argued that black offenders are victims of
machine bias. This is because recidivism risk prediction algorithms
such as COMPAS burden black offenders with a higher rate of false
positives (essentially: inaccurate predictions that an offender will
reoffend) than white offenders face.¹ In response, some have argued
that algorithmic fairness only concerns calibration across groups.
Roughly, calibration across groups means that a score assigned to
different individuals by the algorithm involves the same probability
of the individual having the target property across different groups of
individuals. By way of illustration: It is not as if offenders from one
racial group assigned the risk score 8 (i.e., a high risk) have the same
probability of recidivating as offenders from another racial group
assigned a risk score of only 6, especially not when a higher risk score
translates into a harsher punishment. However, I argue that in
standard non-algorithmic contexts, such as hirings, we do not think
that lack of calibration entails unfair bias. Moreover, it is difficult to
see why algorithmic contexts, as it were, should differ fairness-wise
from non-algorithmic ones. Hence, despite appearances, we should
reject the view that calibration is necessary for fairness in an algorithmic context.

¹ False positive rates are defined as: False Positives (FP)/Actual Negatives = FP/(True Negatives (TN) + FP). False negative rates are: False Negatives (FN)/(True Positives (TP) + FN). See also Table 1 below.
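The calibration condition described above can be made concrete with a minimal sketch in Python. The group labels, scores and outcomes are hypothetical toy data, not COMPAS data, and the function name `calibration_by_group` is introduced here purely for illustration. The sketch computes, for each group and each score, the observed share of individuals who have the target property; on this data the scores count as calibrated across groups if those shares coincide for every score.

```python
from collections import defaultdict


def calibration_by_group(records):
    """Return {group: {score: observed share with the target property}}.

    `records` is an iterable of (group, score, outcome) tuples, where
    outcome is 1 if the individual has the target property and 0 otherwise.
    """
    counts = defaultdict(lambda: [0, 0])  # (group, score) -> [positives, total]
    for group, score, outcome in records:
        counts[(group, score)][0] += outcome
        counts[(group, score)][1] += 1

    table = {}
    for (group, score), (positives, total) in counts.items():
        table.setdefault(group, {})[score] = positives / total
    return table


# Hypothetical toy data (assumption: invented for illustration only).
records = (
    [("A", 8, 1)] * 6 + [("A", 8, 0)] * 4
    + [("B", 8, 1)] * 3 + [("B", 8, 0)] * 2
    + [("A", 6, 1)] * 2 + [("A", 6, 0)] * 3
    + [("B", 6, 1)] * 4 + [("B", 6, 0)] * 6
)

observed = calibration_by_group(records)
for group in sorted(observed):
    print(group, observed[group])
```

In this toy data the observed share is 0.6 for score 8 and 0.4 for score 6 in both groups, so calibration holds even though the two groups receive the scores in different proportions.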
I begin, in Section 2, by describing the well-known controversy
over COMPAS. Section 3 briefly explores the implications both of a
commonly held view about unfair bias on the job market considering audit studies and of the conceptual apparatus introduced in
Section 2 in relation to COMPAS. The section explains that in a job
market where, because of past sexist discrimination, men are more
likely to be qualified for certain jobs, deeming an applicant to be
qualified means different things across male and female applicants.
Specifically, for a given qualification score there is a greater chance of
a male applicant being deemed qualified. Many, this author included,
would see no fairness-based reason in this situation for a post hoc
intervention to secure a well-calibrated hiring process. Thus, Section 3 ends with a trilemma consisting of three claims: 1) Lack of
calibration does not amount to unfair bias in job markets; 2) Job
markets and sentencing do not differ as regards whether a lack of
calibration amounts to unfair bias; 3) Lack of calibration amounts to
unfair bias in sentencing. Plainly, we must reject at least one of these
claims, so the following sections (4–6) go through each of them in
turn, asking which should be abandoned. Section 7 concludes.
In a nutshell, I argue, first, that we should bring what we think of
algorithmic fairness into line with what we think about job market
discrimination in an ordinary non-algorithmic setting. That result is
one I am quite confident of. I also think it is significant, since much
discussion of algorithmic fairness fails to connect with discussions of
fairness in other and more well-explored contexts. How we should
resolve the trilemma, I am less clear about. However, I offer some
reasons suggesting, second, that in certain cases involving differential
base rates, we should allow for violating calibration.² This is not to
say that, e.g., equal false positive/negative rates (henceforth: parity)
is the correct criterion for algorithmic fairness. Perhaps neither calibration nor parity defines algorithmic fairness.
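Why differential base rates put calibration and parity in tension can be illustrated with a small worked sketch. It rests on simplifying assumptions that are not part of the text above: predictions are binary (recidivate or not), and calibration is represented by its binary analogue, namely an equal positive predictive value (PPV) across groups. Under those assumptions, the definitions of the rates imply FPR = p/(1 - p) × (1 - PPV)/PPV × (1 - FNR), where p is a group's base rate. The confusion-matrix counts and the function names are hypothetical.

```python
def summary_rates(tp, fp, tn, fn):
    """Base rate, PPV, FNR and FPR from the four confusion-matrix counts."""
    total = tp + fp + tn + fn
    return {
        "base_rate": (tp + fn) / total,  # share of actual positives
        "ppv": tp / (tp + fp),           # share of predicted positives that are correct
        "fnr": fn / (fn + tp),           # false negative rate
        "fpr": fp / (fp + tn),           # false positive rate
    }


def implied_fpr(base_rate, ppv, fnr):
    """False positive rate fixed by the base rate, PPV and false negative rate."""
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Two hypothetical groups of 100 people each (assumption: invented counts).
group_high = summary_rates(tp=30, fp=20, tn=30, fn=20)  # base rate 0.5
group_low = summary_rates(tp=18, fp=12, tn=58, fn=12)   # base rate 0.3

for name, g in (("high base rate", group_high), ("low base rate", group_low)):
    check = implied_fpr(g["base_rate"], g["ppv"], g["fnr"])
    print(name, g, "implied FPR:", round(check, 4))
```

Both hypothetical groups have a PPV of 0.6 and a false negative rate of 0.4, yet the false positive rate is 0.4 in the group with the higher base rate and roughly 0.17 in the other. With PPV and the false negative rate held equal, the false positive rates can coincide only if the base rates do.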
II. COMPAS AND CALIBRATION
I start, then, with a thumbnail sketch of the COMPAS debate.
COMPAS uses information about an offender’s employment and
housing status, personality traits, criminal record, etc. to arrive at a
risk of recidivism score. Basically, that score is a number from 1
(least likely) to 10 (most likely), indicating how likely it is that an
offender will recidivate relative to other offenders. COMPAS does
not use information about race. Presented with higher scores, the
court will generally be less inclined to grant bail or parole, and more
inclined to sentence an offender to longer periods of incarceration,
than it would be if the scores were lower.³ Hence, for the offender, a
false positive is a bad thing and a false negative is a good thing.⁴
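How false positives and false negatives arise from a score of this kind can be sketched as follows. The sketch assumes, hypothetically, that an offender is predicted to recidivate whenever the score is at or above some cutoff; the cutoff of 5, the scores and the outcomes are invented for illustration and are not taken from COMPAS. The rates follow the standard definitions FP/(FP + TN) and FN/(FN + TP).

```python
def error_rates(scores, outcomes, cutoff):
    """Return (false positive rate, false negative rate) for a score cutoff.

    `outcomes` holds 1 for offenders who did recidivate and 0 for those
    who did not. Assumes both kinds of offender are present in the data.
    """
    pairs = list(zip(scores, outcomes))
    fp = sum(1 for s, y in pairs if s >= cutoff and y == 0)  # wrongly flagged
    tn = sum(1 for s, y in pairs if s < cutoff and y == 0)   # rightly cleared
    fn = sum(1 for s, y in pairs if s < cutoff and y == 1)   # wrongly cleared
    tp = sum(1 for s, y in pairs if s >= cutoff and y == 1)  # rightly flagged
    return fp / (fp + tn), fn / (fn + tp)


# Hypothetical scores on the 1-10 scale and hypothetical outcomes.
scores = [2, 3, 7, 8, 9, 4, 6, 1]
outcomes = [0, 0, 0, 1, 1, 1, 0, 0]

fpr, fnr = error_rates(scores, outcomes, cutoff=5)
print("false positive rate:", fpr)
print("false negative rate:", fnr)
```

With these invented numbers, two of the five offenders who did not recidivate are flagged (a false positive rate of 0.4), and one of the three who did recidivate is not flagged (a false negative rate of one third).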
In a renowned article entitled ‘‘Machine Bias’’ in ProPublica,
Angwin and co-authors suggested that COMPAS is unfair because it
is racially biased.⁵ Like other ways of assessing the risk of recidivism,
² Thus, in analogy with theorists who deny that differential false positive rates constitute
algorithmic unfairness (e.g., Brian Hedden, ‘‘On Statistical Criteria of Algorithmic Fairness’’, Philosophy
& Public Affairs 49 (2021): pp. 209–231; Robert Long, ‘‘Fairness in Machine Learning’’ (2020), https://
arxiv.org/abs/2007.02890), I am not arguing that lack of calibration for reasons other than differential
base rates might not amount to algorithmic unfairness. Hence, my argument is consistent with lack of
calibration being a good indicator of algorithmic unfairness. For instance, in the context where, generally, male members of a racial minority group are stereotyped as dangerous, lack of calibration to the
effect that male racial minority offenders are less likely to recidivate than racial majority offenders with
the same risk score probabl (...truncated)