Algorithmic and Non-Algorithmic Fairness: Should We Revise our View of the Latter Given Our View of the Former?

Law and Philosophy
https://doi.org/10.1007/s10982-024-09505-4
The Author(s) 2024

KASPER LIPPERT-RASMUSSEN

ALGORITHMIC AND NON-ALGORITHMIC FAIRNESS: SHOULD WE REVISE OUR VIEW OF THE LATTER GIVEN OUR VIEW OF THE FORMER?

(Accepted 19 April 2024)

ABSTRACT. In the US context, critics of courts’ use of risk prediction algorithms have argued that COMPAS involves unfair machine bias because it generates higher false positive rates of predicted recidivism for black offenders than for white offenders. In response, some have argued that algorithmic fairness concerns, either also or only, calibration across groups (roughly, that a score assigned to different individuals by the algorithm involves the same probability of the individual having the target property across different groups of individuals). They have also argued that, for mathematical reasons, it is virtually impossible to equalize false positive rates without impairing calibration. I argue that in standard non-algorithmic contexts, such as hiring, we do not think that lack of calibration entails unfair bias. I further argue that it is difficult to see why algorithmic contexts, as it were, should differ fairness-wise from non-algorithmic ones in this respect. Hence, we should reject the view that calibration is necessary for fairness in an algorithmic context.

I. INTRODUCTION

In a US context, critics of courts’ use of risk prediction algorithms such as COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) have argued that black offenders are victims of machine bias. This is because recidivism risk prediction algorithms such as COMPAS burden black offenders with a higher rate of false positives (essentially: inaccurate predictions that an offender will reoffend) than white offenders face.1 In response, some have argued that algorithmic fairness only concerns calibration across groups.
Roughly, calibration across groups means that a score assigned to different individuals by the algorithm involves the same probability of the individual having the target property across different groups of individuals. By way of illustration: under calibration, it is not the case that offenders from one racial group assigned the risk score 8 (i.e., a high risk) have the same probability of recidivating as offenders from another racial group assigned a risk score of only 6. Such a mismatch would seem especially troubling when a higher risk score translates into a harsher punishment. However, I argue that in standard non-algorithmic contexts, such as hiring, we do not think that lack of calibration entails unfair bias. Moreover, it is difficult to see why algorithmic contexts, as it were, should differ fairness-wise from non-algorithmic ones. Hence, despite appearances, we should reject the view that calibration is necessary for fairness in an algorithmic context.

I begin, in Section 2, by describing the well-known controversy over COMPAS. Section 3 briefly explores the implications of two things. The first is a commonly held view, informed by audit studies, about unfair bias in the job market. The second is the conceptual apparatus introduced in Section 2 in relation to COMPAS. The section explains that in a job market where, because of past sexist discrimination, men are more likely to be qualified for certain jobs, deeming an applicant to be qualified means different things for male and female applicants. Specifically, for a given qualification score there is a greater chance of a male applicant being deemed qualified. Many, this author included, would see no fairness-based reason in this situation for a post hoc intervention to secure a well-calibrated hiring process.

1 False positive rates are defined as: False Positives (FP)/Actual Negatives = FP/(True Negatives (TN) + FP). False negative rates are: False Negatives (FN)/Actual Positives = FN/(True Positives (TP) + FN). See also Table 1 below.
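The error rates defined in note 1 and the notion of calibration across groups can both be computed from outcome data. The Python sketch below uses a small, entirely hypothetical set of records. The group labels "A" and "B", the scores, and the outcomes are invented for illustration and are not drawn from COMPAS. The sketch also assumes a hypothetical cut-off (`THRESHOLD = 7`) at which a score counts as a positive "will reoffend" prediction. The helper names `error_rates` and `calibration_table` are likewise illustrative. For each group, the sketch reports the false positive and false negative rates and the observed share of reoffenders at each score.

```python
from collections import defaultdict

# Hypothetical records (group, risk score, reoffended?).
# Invented for illustration only; not COMPAS data.
RECORDS = [
    ("A", 8, True), ("A", 8, True), ("A", 8, False),
    ("A", 6, False), ("A", 6, True), ("A", 6, False),
    ("B", 8, True), ("B", 8, False), ("B", 8, False),
    ("B", 6, False), ("B", 6, False), ("B", 6, False),
]


def error_rates(records, group, threshold):
    """Return (FPR, FNR) for one group.

    A score >= threshold counts as a positive ("will reoffend") prediction.
    FPR = FP / (FP + TN) and FNR = FN / (FN + TP), as in note 1.
    """
    tp = fp = tn = fn = 0
    for g, score, reoffended in records:
        if g != group:
            continue
        predicted = score >= threshold
        if predicted and reoffended:
            tp += 1
        elif predicted:
            fp += 1
        elif reoffended:
            fn += 1
        else:
            tn += 1
    fpr = fp / (fp + tn) if fp + tn else float("nan")
    fnr = fn / (fn + tp) if fn + tp else float("nan")
    return fpr, fnr


def calibration_table(records):
    """Observed share of reoffenders for each (group, score) pair.

    Calibration across groups would require the underlying probabilities,
    which these shares estimate, to match score by score across groups.
    """
    counts = defaultdict(lambda: [0, 0])  # (group, score) -> [reoffended, total]
    for g, score, reoffended in records:
        counts[(g, score)][0] += int(reoffended)
        counts[(g, score)][1] += 1
    return {key: pos / total for key, (pos, total) in sorted(counts.items())}


THRESHOLD = 7  # hypothetical cut-off, chosen only for this example
for group in ("A", "B"):
    fpr, fnr = error_rates(RECORDS, group, THRESHOLD)
    print(f"group {group}: FPR={fpr:.2f}, FNR={fnr:.2f}")
for (group, score), share in calibration_table(RECORDS).items():
    print(f"group {group}, score {score}: observed reoffense rate {share:.2f}")
```

In this toy data, a score of 8 in group B carries the same observed reoffense rate (1/3) as a score of 6 in group A. That is the kind of cross-group mismatch the illustration above treats as a failure of calibration. The two groups also differ in false positive rate (about 0.33 for A versus 0.40 for B).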
Thus, Section 3 ends with a trilemma consisting of three claims:

1) Lack of calibration does not amount to unfair bias in job markets;
2) Job markets and sentencing do not differ as regards whether a lack of calibration amounts to unfair bias;
3) Lack of calibration amounts to unfair bias in sentencing.

Plainly, we must reject at least one of these claims, so the following sections (4–6) go through each of them in turn, asking which should be abandoned. Section 7 concludes.

In a nutshell, I argue, first, that we should bring what we think about algorithmic fairness into line with what we think about job market discrimination in an ordinary non-algorithmic setting. That result is one I am quite confident of. I also think it is significant, since much discussion of algorithmic fairness fails to connect with discussions of fairness in other, better-explored contexts. How we should resolve the trilemma, I am less clear about. However, I offer some reasons suggesting, second, that in certain cases involving differential base rates, we should allow calibration to be violated.2 This is not to say that, e.g., equal false positive/negative rates (henceforth: parity) is the correct criterion for algorithmic fairness. Perhaps neither calibration nor parity defines algorithmic fairness.

II. COMPAS AND CALIBRATION

I start, then, with a thumbnail sketch of the COMPAS debate. COMPAS uses information about an offender’s employment and housing status, personality traits, criminal record, etc. to arrive at a risk of recidivism score. Basically, that score is a number from 1 (least likely) to 10 (most likely), indicating how likely it is that an offender will recidivate relative to other offenders. COMPAS does not use information about race.
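Leaving race out of the inputs does not by itself settle the error-rate question. When base rates of reoffending differ across groups, a score with equal positive predictive value (PPV) across groups cannot also satisfy parity in the sense just defined, except in degenerate cases such as perfect precision. Equal PPV is often called predictive parity, and we use it here as a simple thresholded stand-in for calibration. Write p for a group's base rate and N for its size. The definitions in note 1 then give TP = (1 - FNR)·N·p and FP = FPR·N·(1 - p). Rearranging PPV = TP/(TP + FP) yields the identity FPR = p/(1 - p) · (1 - PPV)/PPV · (1 - FNR). The Python sketch below is a minimal illustration under stated assumptions. The function name `implied_fpr` is hypothetical. The inputs are invented, not COMPAS figures: two groups share PPV = 0.6 and FNR = 0.3 but have base rates of 0.5 and 0.3.

```python
def implied_fpr(base_rate: float, ppv: float, fnr: float) -> float:
    """False positive rate forced by a group's base rate, PPV and FNR.

    Follows from FP = FPR * N * (1 - p), TP = (1 - FNR) * N * p,
    and PPV = TP / (TP + FP), with p the group's base rate.
    """
    if not (0 < base_rate < 1 and 0 < ppv <= 1 and 0 <= fnr <= 1):
        raise ValueError("rates must lie in the unit interval")
    return (base_rate / (1 - base_rate)) * ((1 - ppv) / ppv) * (1 - fnr)


# Hypothetical groups: identical PPV and FNR, different base rates.
shared_ppv = 0.6
shared_fnr = 0.3
for name, p in [("group A", 0.5), ("group B", 0.3)]:
    fpr = implied_fpr(p, shared_ppv, shared_fnr)
    print(f"{name}: base rate {p:.1f} -> FPR {fpr:.3f}")

# Cross-check group B with explicit (hypothetical) counts for 1,000 offenders.
n, p = 1000, 0.3
tp = (1 - shared_fnr) * n * p             # about 210 true positives
fp = tp * (1 - shared_ppv) / shared_ppv   # about 140 false positives, so PPV = 0.6
print(f"count check for group B: FPR = {fp / (n * (1 - p)):.3f}")
```

With these inputs, the identity forces group A's false positive rate to about 0.467 and group B's to 0.200. So holding PPV and FNR equal across groups leaves their false positive rates unequal. Equalizing the false positive rates instead would require giving up equal PPV or equal FNR.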
Presented with higher scores, the court will generally be less inclined to grant bail or parole, and more inclined to sentence an offender to longer periods of incarceration, than it would be if the scores were lower.3 Hence, for the offender, a false positive is a bad thing and a false negative is a good thing.4

In a renowned article entitled "Machine Bias" in ProPublica, Angwin and co-authors suggested that COMPAS is unfair because it is racially biased.5 Like other ways of assessing the risk of recidivism,

2 Thus, in analogy with theorists who deny that differential false positive rates constitute algorithmic unfairness (e.g., Brian Hedden, "On Statistical Criteria of Algorithmic Fairness", Philosophy & Public Affairs 49 (2021): pp. 209–231; Robert Long, "Fairness in Machine Learning" (2020), https://arxiv.org/abs/2007.02890), I am not arguing that lack of calibration for reasons other than differential base rates might not amount to algorithmic unfairness. Hence, my argument is consistent with lack of calibration being a good indicator of algorithmic unfairness. For instance, in the context where, generally, male members of a racial minority group are stereotyped as dangerous, lack of calibration to the effect that male racial minority offenders are less likely to recidivate than racial majority offenders with the same risk score probabl (...truncated)