[This is the third in our series of posts on all things algorithmic, AI and tech in administrative law. New content will be posted each week in September. You can view earlier posts here and here. – Eds]
Many legal and administrative institutions in the U.S. and around the world have adopted automated tools to enhance the efficiency and scalability of their decision-making processes. The widespread uptake of automated decision tools has generated substantial scholarship on the harms associated with their use, including the attenuation of accountability, the lack of transparency, and the exacerbation of existing biases. Citron and Pasquale, for example, have developed the notion of “technological due process,” or the right to inspect, correct, and dispute inaccurate data about oneself, as it has been used for decision-making purposes. My concern with this approach is its willingness to limit the scope of inquiry to data inputs, provided their accuracy and provenance can be verified. Instead, I argue that the exclusion of non-data inputs is a significant, and understudied, source of harm. By “non-data inputs” I refer to those aspects of an individual’s character or circumstances that are relevant to the decision at hand, but are excluded from algorithmic consideration as “non-data”. They may be excluded because they do not fit within the algorithm’s existing parameters, or because they inherently resist quantification (e.g. the individual experience of trauma). Limiting the scope of decisional inputs to “data” thus fails to consider all factors relevant to the decision-making process. For this reason, where automated decision systems are used, “technological due process” is insufficient. Individuals must also be given a meaningful opportunity to present non-data about themselves or their circumstances which is relevant to the decision at hand. 
Without this opportunity, at least three harms will occur: first, “de-individualization,” or the failure to treat people as individuals by failing to consider relevant aspects of their character or circumstances; secondly, the treatment of people as data, which violates the deontological maxim to treat people as people, not things; and thirdly, the perpetuation of categorical inequality, or the withholding of resources from categories of people based on statistical inferences about their future behavior.
Types of Automated Decision Systems
The automation of human decision-making can be broken down into three distinct types: first, the automated application of a legal rule that was previously manually applied; secondly, the automated application of a legal standard that was previously manually applied; and finally, the use of a predictive risk assessment tool to inform the manual application of a legal standard.
Type I: The Automation of Legal Rules
Legal rules have always evinced a certain unwillingness to consider individual circumstances, at least relative to legal standards, and in this sense they seem uniquely suited to automation, and its attendant benefits of efficiency, consistency, and scalability. Yet automation achieves a level of “ruleishness” that legal rules cannot. Even the most precise legal rule requires discretion in its application to particular facts; the decision-maker’s characterization of the facts will be influenced by their nonlegal sensibilities. Software, in contrast, is “automated all the way down.” The interpretive natural language shell of legal rules is replaced by a strict binary logic (a 0 or a 1), thereby closing the “hermeneutic gap” between the text of a norm and its translation into real-world action. The immediacy and immutability of code forecloses alternative conduct so that it actively constitutes, rather than guides, legal reality. Code collapses the normative into the descriptive such that “what was once requested becomes simply what is or shall be.”
In this sense, the automation of a rule may reveal the inconsistency of its previous (manual) enforcement. Speeding, for example, may be inconsistently policed by a traffic officer, depending on mood, weather, and driver profiling. A speed camera, however, would consistently enforce the speed limit. We might regard this development positively, if it is associated with a decrease in the number of traffic casualties. And we may tolerate the rule’s insensitivity to individual circumstances (e.g. a man speeding to the hospital with his wife in labor) because we perceive the consequences of such inflexibility (issuance of a speeding ticket) to be relatively minor.
There are, however, circumstances where the automation of an existing rule may reveal its overbroad or erroneous nature. In Australia, for example, welfare agency Centrelink had engaged in data-matching with the Australian Taxation Office (ATO) since 1990, in an effort to identify discrepancies between income reported by welfare beneficiaries to Centrelink, and income reported to the ATO. Annual income reported to the ATO was averaged into a fortnightly income figure, and compared to the fortnightly Centrelink figure. Discrepancies in reported income suggested potential welfare overpayment, and were subject to manual verification. Only 7% of discrepancies were investigated due to resource constraints. When Centrelink shifted to an automated system in 2016, all discrepancies were presumed to indicate a debt, and individuals who did not disprove the existence of the debt were issued a debt notice, with no human oversight. Approximately $721 million in debt was erroneously collected under this system. The automation of the income-averaging rule thus revealed its significant capacity for error.
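To see why income averaging is error-prone, consider a minimal sketch of the comparison described above. It is an illustration under stated assumptions, not a reconstruction of Centrelink's actual system: the function names, the 26-fortnight year, and the income figures are all hypothetical.

```python
# Illustrative sketch only: names and figures are hypothetical, not Centrelink's.
FORTNIGHTS_PER_YEAR = 26  # assumption: a year is treated as 26 fortnights


def averaged_fortnightly_income(annual_income: float) -> float:
    """Spread an annual income figure evenly across every fortnight."""
    return annual_income / FORTNIGHTS_PER_YEAR


def flag_discrepancies(annual_income: float, reported: list[float]) -> list[int]:
    """Return the fortnights in which the averaged figure exceeds the
    income the person actually reported for that fortnight."""
    average = averaged_fortnightly_income(annual_income)
    return [i for i, amount in enumerate(reported) if average > amount]


# Hypothetical person: worked 13 fortnights at 2,000 each, then earned
# nothing for 13 fortnights while receiving benefits. Every fortnight
# was reported accurately.
reported = [2000.0] * 13 + [0.0] * 13
annual_income = sum(reported)

flagged = flag_discrepancies(annual_income, reported)
print(len(flagged))  # 13
```

In this hypothetical, all thirteen fortnights without income are flagged, even though every report was accurate. Averaging assumes that income was earned evenly across the year, which is least likely to be true of people whose work is intermittent.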
Type II: The Automation of Legal Standards
Where previously a legal standard was applied using human evaluation, and that evaluation is replaced by an automated decision system, the legal standard effectively becomes a legal rule. For example, in Arkansas, low-income individuals with disabilities are eligible to receive at-home care from a caregiver under the state’s Medicaid program, ARChoices in Homecare. For seventeen years, the Arkansas Department of Human Services (DHS) allocated caregiving hours to each beneficiary based on an individual assessment of needs carried out by a registered DHS nurse. From January 1, 2016, the DHS replaced nurse-based assessments with an algorithm, the Resource Utilization Groups system (RUGs), which assigned beneficiaries to one of twenty-three “tiers” of attendant-care hours based on information collected from 286 questions. Once the system had assigned a beneficiary to a specific tier of care, DHS nurses could not re-assign them to a different tier. The evaluative standard previously applied by DHS nurses effectively became a decision tree composed of a series of rules. Program beneficiaries claimed that the use of the automated system reduced their caregiving hours by an average of 43 per cent, and as a result, many of them were forced, inter alia, to go without food, remain in soiled clothes, or suffer worsened medical conditions.
Similarly, in Allegheny County, Pennsylvania, referral calls made to the Office of Children, Youth, and Families (CYF) about potential child abuse or neglect are screened by caseworkers for further investigation. To help caseworkers evaluate each referral, the CYF deployed a predictive risk assessment tool, the Allegheny Family Screening Tool (AFST), which measures, on a scale from 1 to 20, the likelihood that a child will be removed from their home by child protection services within 2 years of investigation. The AFST calculates this likelihood on the basis of historical and cross-sector administrative data (child protective services, mental health services, drug and alcohol services, homeless services) related to individuals associated with the referral. If the AFST score is 18 or higher, the system automatically triggers an investigation, unless it is overridden by a supervisor. In this way, automation effectively transforms the decision-making process into a rule: if the AFST score is 18 or more, the family will be investigated for child abuse.
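The rule-like character of the threshold can be made concrete with a short sketch. It assumes only what is stated above (a score from 1 to 20, a threshold of 18, and a supervisor override); the function name and return labels are hypothetical and are not drawn from the AFST itself.

```python
# Illustrative sketch only: the function and labels are hypothetical.
AUTOMATIC_INVESTIGATION_THRESHOLD = 18  # threshold described in the text


def screening_decision(afst_score: int, supervisor_override: bool = False) -> str:
    """Apply the threshold rule: a score at or above the threshold triggers
    an investigation unless a supervisor overrides it."""
    if not 1 <= afst_score <= 20:
        raise ValueError("AFST scores range from 1 to 20")
    if afst_score >= AUTOMATIC_INVESTIGATION_THRESHOLD and not supervisor_override:
        return "investigate"
    # Below the threshold, or overridden: the outcome is left to human judgment.
    return "human_review"


print(screening_decision(18))  # investigate
print(screening_decision(17))  # human_review
```

Nothing about the family's circumstances enters the function except the score. That is the sense in which an evaluative standard is converted into a rule.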
Automated decision systems, like humans, can produce error. But there are additional harms associated with the automation of a legal standard. One such harm is “de-individualization”, or the failure to treat people as individuals. When a decision-making process relies exclusively on algorithmic inputs, it will not consider all factors relevant to the decision at hand. Many of these factors will exist outside the algorithm as “non-data” because no data label or category currently exists for this information within this particular system, and/or the information is inherently unquantifiable. Consequently, the automated decision tool will fail to provide a fully individualized assessment. The Allegheny Family Screening Tool, for example, can only evaluate “long arc risk” based on historical population data; it does not have access to the content of the specific referral call, which often reflects acute risk. It is also limited to public administrative data (e.g. time spent by parents on welfare), thereby excluding data related to parental use of private resources (e.g. substance abuse programs). The result, unsurprisingly, is the disproportionate surveillance of low-income (often non-White) households, which are denied the privacy afforded to their higher-income peers.
Failure to treat people as individuals is inconsistent both with constitutional guarantees of equal protection and with our moral obligation to respect human autonomy. As Benjamin Eidelson explains, to treat someone as an individual is to recognize their life not solely as the product of inherited characteristics, but as the product of their conscious choices. People are individuated not merely by their ontological separateness, but by their actions. Consequently, our judgments about them should acknowledge the ways in which they have actively shaped their own lives, and will continue to do so. Respect for autonomy demands dual recognition: first, as an individual with a certain autonomous capacity that all individuals possess (“agency condition”), and secondly, as the particular individual that she is, constituted from the specific choices she has made (“character condition”). Treating people as individuals means recognizing them as partial authors of their lives, “rather than treating them as determined by demographic categories or other matters of statistical fate.”
Failure to treat people as individuals does harm not just to their conceptual autonomy but to their actual condition of autonomy. When we judge individuals based on aggregate evidence about what people of their race, sex, or nationality have done – ignoring the ways in which they have deliberately shaped their own lives – we undermine the efficacy of their very efforts. An employer who declines to assign a long-term project to a female employee (on the basis that she is likely to take maternity leave) disrespects her autonomy by declining to consider evidence of her particular choices (for example, she may have decided not to have children, or she may have a partner who does most of the child-rearing). And in depriving this employee of the opportunity to work on this project, the employer not only fails to respect his employee as an autonomous individual, but actually denies her the opportunity to exercise autonomous control over her life. In this way, failure to treat people as individuals produces tangible, as well as conceptual, harm.
A second, related, harm of automated decision-making is the treatment of people as data, which violates the deontological maxim to refrain from treating people as things. To cast the algorithmic gaze upon an individual is to focus on their computable parts – those parts that can be quantified, and coded into an existing category, or data label. To facilitate their movement through various information systems, individuals are flattened into “data points subject to pattern recognition engines,” and decisions about their rights and interests are determined by their data proxies. From one information system to the next, these data proxies are processed and reproduced as “more tangible, authoritative, and demonstrable, more fixed and provable” than the individuals themselves. As the incomputable self is washed away, “we become constituted, and not merely mediated, by our data. There is no essential self beneath all our data from which those data alienate us.” Furthermore, the datafication of individuals (for algorithmic legibility) prioritizes fixed data labels over fluid or shifting identities, ultimately producing “data violence” or the exclusion of non-normative individuals from rigid information systems. Heteronormativity, for example, is encoded through binary categorical prescriptions: “male” or “female.” In this way, automated decision-making harms both individual autonomy and dignity.
Type III: Predictive Risk Assessments Inform Application of Legal Standard
The preceding paragraphs have described two types of automated decision system: the automation of a legal rule (Type I), and the automation of a legal standard (Type II). A third kind of automated decision system involves the use of a predictive risk assessment to inform the application of a legal standard. Predictive risk assessments draw inferences about an individual’s likely future behavior from historical data about the behavior of populations who share their demographic characteristics, such as race or gender. An algorithmic prediction of recidivism, for example, will incorporate various data points about a defendant (age, sex, race, offense) and “calculate” their probability of reoffending on the basis of historical data pertaining to people who share their statistical categories. What will not be considered are the ways in which this particular defendant has actively shaped their own life, and will continue to do so. Much of this evidence of the defendant’s autonomy will exist as “non-data” (incapable of algorithmic consideration) because (a) no data label or category exists for this kind of information within this particular algorithm; and/or (b) the information inherently resists quantification (for example, the strength of the defendant’s emotional connection with their family). Even if barrier (a) can be overcome by an adjustment of the algorithmic model, barrier (b) (the inherent nonquantifiability of certain variables) cannot. Consequently, although these non-data variables may influence the defendant’s recidivism risk, they will not find their way into the algorithm’s assessment.
The Correctional Offender Management Profiling for Alternative Sanctions (COMPAS) risk assessment tool developed by Northpointe has received significant academic attention for its racial biases. In 2016, the Supreme Court of Wisconsin affirmed the La Crosse County Circuit Court’s denial of Eric Loomis’ petition for post-conviction relief. Eric Loomis had been sentenced to four years and seven years, respectively, for attempting to flee a traffic officer, and operating a motor vehicle without the owner’s consent. In ruling out probation, the circuit court referenced, along with other sentencing factors, Loomis’ COMPAS risk score, which identified him as “high risk” in three categories of recidivism: pretrial recidivism risk, general recidivism risk, and violent recidivism risk. Loomis argued that the circuit court’s consideration of the COMPAS risk assessment at sentencing violated his due process rights in three ways: (1) by violating his right to be sentenced based on accurate information, in part because the proprietary nature of the algorithm prevented him from assessing its accuracy; (2) by violating his right to an individualized sentence; and (3) by improperly using gendered assessments in sentencing.
The Supreme Court of Wisconsin rejected all three arguments, holding first that the defendant’s ability to correct the input data (his responses to the COMPAS questionnaire) was a sufficient guarantor of the “accuracy” of the risk assessment. This narrow conception of “accuracy” evinces a disturbing level of deference to algorithmic analysis. As the court itself acknowledged, COMPAS cannot predict the “specific likelihood” that the individual offender will reoffend; instead, “it provides a prediction based on a comparison of information about the individual to a similar data group.” The court did not interrogate: the quality of the group data against which the individual was compared (was the data representative of the particular demographic profile of La Crosse County?); the legitimacy of the “categories” assigned to Loomis by the algorithm; the weights assigned to different variables; or the manner in which the risk scores were determined. Without access to such information, the court could not meaningfully assess the “accuracy” of the risk assessment, particularly in light of its demonstrated racial bias. Black defendants who did not reoffend were erroneously predicted by COMPAS to reoffend at a rate of 44.9%, nearly twice the rate for White defendants (23.5%), while White defendants who did reoffend were erroneously predicted not to reoffend at a much higher rate (47.7%) than their Black peers (28%). In other words, COMPAS overpredicted recidivism for Black defendants (false positives), and underpredicted recidivism for White defendants (false negatives). This asymmetric distribution of classification errors only exacerbates existing racial inequities.
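The two error rates at issue are computed over different groups of people, which a short sketch may help to clarify. The false positive rate is measured among those who did not reoffend; the false negative rate is measured among those who did. The counts below are invented round numbers chosen only to illustrate the calculation; they are not COMPAS data, and the class and function names are hypothetical.

```python
# Illustrative sketch only: counts are invented, not drawn from COMPAS.
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcomes:
    true_positives: int   # predicted to reoffend, did reoffend
    false_positives: int  # predicted to reoffend, did not reoffend
    true_negatives: int   # predicted not to reoffend, did not reoffend
    false_negatives: int  # predicted not to reoffend, did reoffend


def false_positive_rate(o: Outcomes) -> float:
    """Share of people who did NOT reoffend but were predicted to."""
    return o.false_positives / (o.false_positives + o.true_negatives)


def false_negative_rate(o: Outcomes) -> float:
    """Share of people who DID reoffend but were predicted not to."""
    return o.false_negatives / (o.false_negatives + o.true_positives)


# Two hypothetical groups of 200 defendants each.
group_a = Outcomes(true_positives=60, false_positives=40,
                   true_negatives=60, false_negatives=40)
group_b = Outcomes(true_positives=50, false_positives=20,
                   true_negatives=80, false_negatives=50)

for name, group in (("A", group_a), ("B", group_b)):
    print(name, false_positive_rate(group), false_negative_rate(group))
```

In this hypothetical, the tool is wrong about the same number of people in each group (80 and 70 respectively are misclassified), yet the errors fall differently: group A bears twice the false positive rate of group B, while group B benefits from a higher false negative rate. A single overall "accuracy" figure would conceal that asymmetry.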
Secondly, the Supreme Court rejected Loomis’ claim that the circuit court’s consideration of his COMPAS score denied him an individualized sentence, on the basis that the risk score had not been “determinative,” and had been only one of many factors considered and weighed at sentencing. The court expressed confidence that “circuit courts will exercise discretion when assessing a COMPAS risk score with respect to each individual defendant.” The court made no reference to “automation bias”, or the tendency of human decision-makers to accept, rather than challenge, quantitative assessments, and to assign greater weight, amongst a set of variables, to the variable that has been quantified. This “hyper-nudging” is especially likely when the algorithm’s recommendation is expressed in un-nuanced terms, and its calculations are opaque to the human decision-maker. Perceptions of the greater “accuracy” and “objectivity” of predictive risk assessments discourage human decision-makers from seeking information that contradicts algorithmic recommendations. Empirical evidence shows that human decision-makers struggle to evaluate the accuracy of predictive risk assessment tools, do not calibrate their reliance on the risk assessment based on the risk assessment’s performance, and exhibit racial bias in their interactions with the risk assessment. Judges in Broward County, Florida, for example, penalized Black defendants more harshly than White defendants for higher algorithmic risk categorization.
As discussed above, de-individualization and the treatment of people as data are two kinds of harm produced by automated decision systems. Another kind of harm, specific to Type III, involves the perpetuation of categorical inequality. Predictive risk assessment tools perpetuate stereotypes about categories of people based on correlations within historical population data. When judgments about individuals are rendered on the basis of statistical correlations, what is statistically average is reified, and made more real and authentic than the diversity of experience of which the average is merely a mathematical function. Statistically-evidenced propensities “treat the defendant as if his present conduct could be inferred from his past conduct; as if he were determined rather than free.” This cybernetic loop of datafied activity becomes a self-fulfilling prophecy: the “system watches what you do; it fits you into a pattern; the pattern is then fed back to you in the form of options set by the pattern; the options reinforce the patterns; the cycle begins again.” Charles Tilly explains that much of what we interpret as “innate individual differences in capacity actually results from categorically organized experience.” In other words, categorically differential access, from a young age, to nutrition, education, information, and socialization, manifests in categorical differences in human performance. Accordingly, we bear a moral obligation to avoid perpetuating inequality through the institutionalization of categorical sorting, of the kind that occurs with the widespread use of predictive risk assessment tools.
Conditioning access to opportunities on the basis of historical population data produces a “digital caste system,” structured by existing inequities. Society is stratified into two groups: on the one hand, those whose access to resources is exclusively determined by data-based inferences fed through automated systems, and on the other, the privileged few who are able to shed their data shadows, and access a future that looks different from their past. Disproportionate reliance on predictive risk assessment tools denies the inherent indeterminacy of individuals, and their capacity for change, and uncritically absorbs the dominant modes of seeing and understanding the world.
The increasing adoption of predictive risk assessment tools suggests a heightened unwillingness to tolerate uncertainty from specific categories of people. This diminishing tolerance for risk is evident in the ways that predictive tools engage in burden-shifting. The Allegheny Family Screening Tool, for example, requires a family to demonstrate that they are not harming their children, as the algorithm suspects. Similarly, algorithmic predictions of recidivism force a defendant to demonstrate that they won’t reoffend, as their statistical peers have, rather than benefiting from a presumption of non-recidivism. Likewise, algorithmic predictions of welfare fraud require a welfare beneficiary to disprove an automated allegation of fraud, rather than first requiring the relevant agency to prove the existence of fraud. Agency reliance on predictive risk assessments shifts the burden of proof from state to citizen, and lowers the evidence threshold for the regulation of future conduct. From the perspective of the state, the social benefit derived from preventing future harmful acts exceeds the cost of foreclosing future benign acts, which would generate only private benefits. In other words, false positives are tolerated by the state because it does not bear their harms. Systems of law effectively become systems of social control, for the benefit of a highly risk-averse state.
Katrina G Geddes is a JSD student at New York University. (Twitter: @kat_geddes)