The Algorithmic Racial Proxy

To comply with the colorblind impulses of American antidiscrimination law, computer programmers tend to exclude race as a data input when constructing a machine learning algorithm. Yet scholars and advocates consistently argue that even these formally race-blind algorithms can racially discriminate by relying on so-called “proxies for race,” or variables that have a strong correlation with race, such as zip code, income, or prior criminal arrest.

While a programmer wishing to respond to this argument might attempt to remove both race and all racial proxies from input data, their task is complicated by a key dilemma: The definition of a racial proxy is far from obvious. In response to this dilemma, scholars, computer programmers, and advocates have proffered various approaches to defining a racial proxy and solutions to the problem of racial proxy discrimination. Diverse as they are in their methodologies, these solutions share a common quality. Each relies on an underlying assumption about the relationship between race and the racial proxy, and these assumptions can have far-reaching implications for the development and regulation of machine learning algorithms.

This Article examines the myriad definitions of a racial proxy proffered by courts, scholars, and state and private actors to demonstrate how race and racial assumptions become embedded in the machine learning algorithms that increasingly structure human life. By examining the jurisprudence identifying a racial proxy, the Article explains how those who develop and deploy algorithms assume a powerful adjudicatory role that was once exclusively reserved for judges—that is, they can use their racial intuition to decide what gives a variable its racial quality. The Article shows that the answer to the question of what constitutes a racial proxy requires an explicitly normative and political solution and thus cannot be resolved with the purely technical solutions emerging within econometric literature on algorithmic antidiscrimination.

Ultimately, what is at stake in the ability to define a racial proxy is a novel form of algorithmically driven racial construction, which permits the production of new and meaningful classes of individuals that can later be exposed to differing resources, opportunities, subordination, and privilege. The Article identifies this process as a necessary site of legal thought and regulation.

    Introduction

    On October 18, 1920, an Oregon district court ruled that Bhagat Singh Thind, an immigrant from India who served in the United States military, was entitled to naturalize as a United States citizen.[1] At the time, federal law restricted naturalization to “free [W]hite persons” as well as “aliens of African nativity and to persons of African descent.”[2] Unsatisfied with the court’s ruling, the federal government appealed Thind’s case to the Ninth Circuit Court of Appeals, which in turn asked the Supreme Court for instruction on the question of whether Thind was, in fact, a White person and thus eligible for naturalization.[3] More precisely, the circuit court presented the following question to the Justices: “Is a high-caste Hindu, of full Indian blood, born at Amritsar, Punjab, India a [W]hite person within the meaning of [the Act]?”[4]

    The Supreme Court ultimately answered the question in the negative, effectively ruling that it was common knowledge that Asian Indians were distinguishable from Whites.[5] But the unwieldy composition of the circuit court’s question—a question that explicitly contemplated the role that features like caste, ancestry, religion, and birthplace played in determining racial identity—suggests that, for the judges, racial determinations could turn on the presence or absence of certain proxy variables.[6] The invocation of these racial proxies as part of an assessment of racial identity suggests that the judges relied on proxies for race to determine the very racial categories that were subsequently painted as natural, self-evident, or commonly understood. Importantly, each of these determinations was central to dispensing political, social, and economic rights and privileges, including Bhagat Singh Thind’s right to naturalize.

    It may come as a surprise to some that, in our digital age, antidiscrimination discourse is still troubled by the same questions that perplexed this twentieth-century circuit court. While the constitutional law of equality forbids expressly conditioning the distribution of certain government benefits and burdens on race, the question of how, when, or why certain features like blood, caste, religion, birthplace, residency, and language proxy for racial identity is as salient and vexing as ever in the information age.

    By surveying judicial, statistical, and critical race definitions of a racial proxy variable, this Article shows how equality law’s failure to acknowledge the co-constitutive relationship between race and its so-called proxies has opened the door to new and compounded forms of racial inequality in the digital age. Central to this analysis is the idea that race is the outcome produced when those holding immense powers of official decree embed the presence or absence of certain variables with political meaning in their attempt to hierarchically organize and order human life. That race itself has been a proxy for this extraordinary and continuous process of naming, instantiating, and ordering human inequality must be understood in the context of our digital age. Today, such powers of formal decree are undergoing monumental shifts and are, in many cases, driven by globalized technology conglomerates with vast political and economic power.

    Indeed, in a world where machine learning algorithms distribute important benefits and burdens, the answer to the question of what features are sufficiently linked to race is no longer exclusively driven by courts, litigators, and juries.[7] Increasingly, a novel set of actors that develop and deploy algorithms make these determinations, and the consequences of their determinations are poorly understood and contained by law and legal thought. Understanding the interaction between antidiscrimination law and technology not only reveals how the collision of the two can reinforce existing forms of racial inequality, but also how novel forms of racialized inequality are made possible by the machine learning algorithms that increasingly structure human life.

    I use the term “machine learning algorithm” to capture a diverse array of models and techniques. What unites these algorithms is the centrality of human decision-making to their development and deployment, especially decisions that encompass data selection and use.[8] At a high level, these algorithms are trained on large sets of historical data to search for patterns and complex rules, often to predict the probability of some future outcome.[9] Indeed, the term “machine learning” refers to automated processes of uncovering correlations, relationships, and patterns between variables in a dataset.[10] Determining the source and scope of this historical data is one of the most significant determinations a programmer will make, since a wide variety of data points can be sourced from equally numerous institutions, systems, and contexts. As with many decisions of great consequence to the construction of machine learning algorithms, decisions about data selection and use unfold with limited legal intervention.

    An important exception to this absence of legal direction, however, is American equality law’s contemporary legal commitment to formal colorblindness. In practice, this commitment has meant that most machine learning algorithms are programmed not to “see” race.[11] The extent to which this claim to so-called “race-blindness” can be made rests on the fact that an individual’s race is usually excluded as an input variable when developing or deploying an algorithm.[12] For example, an algorithm designed to predict the probability that an individual borrower will default on a loan may exclude the person’s race from the many input variables used to predict their risk of default. This approach to algorithmic fairness is often called “fairness through blindness.”[13] It assumes that excluding the race variable from data will produce an algorithmic decision free from racial discrimination.
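
    To make the “fairness through blindness” approach concrete, consider a minimal sketch in Python of the loan-default example above. Everything here is invented for illustration: the data is synthetic, and column names like income and prior_defaults are hypothetical stand-ins.

```python
# A minimal sketch of "fairness through blindness": the protected
# attribute is dropped from the inputs before training. All data and
# column names here are synthetic and purely illustrative.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

race = rng.integers(0, 2, n)  # toy binary indicator
income = rng.normal(50_000 + 10_000 * race, 15_000)  # race-correlated
df = pd.DataFrame({
    "race": race,
    "income": income,
    "prior_defaults": rng.integers(0, 3, n),
})
# Synthetic outcome: lower income, more likely to default.
df["default"] = (income + rng.normal(0, 20_000, n) < 45_000).astype(int)

# "Blindness": exclude race from the feature matrix.
X = df.drop(columns=["race", "default"])
model = LogisticRegression(max_iter=1000).fit(X, df["default"])

# The model never "sees" race -- but the race-correlated income
# column remains among its inputs.
print(model.predict_proba(X.head())[:, 1])
```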

    But the “fairness through blindness” approach is just one of many contested meanings, possibilities, and constructions of algorithmic fairness,[14] and it has drawn many critics.[15] These critics often argue that even when machine learning algorithms are built to exclude race as an input, these algorithms can still discriminate based on race by relying on so-called proxies for race: facially race-neutral variables that have a significant relationship with race, such as income, geography, or criminal arrest history.[16]

    Concerns about racial proxies impacting algorithmic decisions are buttressed by the technical nature of algorithms. Even when algorithms exclude race as an input, racial patterns and correlations remain visible in formally race-blind data and may recreate the effect of race on algorithmic outputs.[17] The tendency of algorithms to reconstruct an omitted input variable from other unrestricted input variables reinforces the concern about whether (or how) to allow an algorithm access to input data that can “proxy” for an individual’s race.[18] What should be done with these variables that carry the virtually certain risk of introducing racial information into formally race-neutral algorithms?
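
    One rough way to see this tendency is to ask how well the remaining, formally race-blind inputs can recover the omitted race variable. The sketch below performs that audit with a standard classifier on invented data whose inputs are deliberately constructed to correlate with race; a high AUC would indicate that the “blind” inputs jointly encode substantial racial information.

```python
# A rough proxy-reconstruction audit on synthetic data: if a classifier
# can recover the omitted race variable from the remaining inputs,
# those inputs jointly encode racial information.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)
# Inputs deliberately constructed to correlate with race, as
# segregation and policing patterns often make them in real data.
zip_income = rng.normal(45_000 + 15_000 * race, 10_000)
prior_arrests = rng.poisson(1.5 - 0.8 * race)
X = np.column_stack([zip_income, prior_arrests])

auditor = RandomForestClassifier(n_estimators=100, random_state=0)
auc = cross_val_score(auditor, X, race, cv=5, scoring="roc_auc").mean()
print(f"AUC for recovering the omitted race variable: {auc:.2f}")
# AUC near 0.5 would mean little racial signal survives; values near
# 1.0 would mean the "blind" inputs can reconstruct race almost
# perfectly.
```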

    The technical and legal implications of this question are of great importance in algorithmic antidiscrimination discourse.[19] Scholars have examined the inclusion of potentially problematic racial proxies, discussed the trouble these variables pose for delivering a truly race-neutral algorithm, and proposed potential workarounds.[20] While one obvious approach is to treat racial proxies as one would treat the race variable and simply remove them from an algorithm’s input data, the feasibility of this approach rests on a definition of the term “racial proxy” that remains elusive.[21]

    There is no static, universal, or precise conception of what makes a variable a proxy for race.[22] In the statistical literature, there is no consensus among computer programmers about the definition of a racial proxy.[23] Legal scholars have similarly established that even those variables we might intuitively understand to proxy for race, like zip code, income, or criminal history, may be less correlated with race than other, less intuitive variables, such as an individual’s online purchases, location data, or membership in a digital group or community.[24] If solving the problem of racial proxies first requires understanding their fundamental meaning and nature, then computer programmers and developers face the profound theoretical dilemma of deciding precisely what gives any given proxy variable its “racial” quality.

    In practice, this dilemma gives rise to what some scholars deem a “haphazard” approach among computer programmers to identifying and dealing with these racial proxy variables.[25] To be sure, the normative decisions of programmers and developers may lack a fine-grained explanation, but the capriciousness of their decisions ought not to be confused with randomness. Their decisions originate from a set of racial assumptions about the commonsense or objective reality of race and its relationship to a racial proxy: a racial intuition. This racial intuition forms the basis of technical decisions concerning input data that impact both the development of machine learning algorithms and, I will argue, the evolution of race in the digital age.

    This Article analyzes the racial theories undergirding definitions of a racial proxy variable in algorithmic and antidiscrimination discourse and discusses their far-reaching implications for legal regulation and racial production. I begin by analyzing the definitions of a racial proxy proffered by courts to show how a powerful adjudicatory role once exclusively reserved for judges is now in the hands of those who develop and deploy algorithms. That role is to use their poorly defined racial intuition to decide what gives a variable its racial quality.

    Throughout the Article, I draw connections between the racial common sense used by a programmer to identify a racial proxy and the confused and often contradictory understandings of race that have historically marked the racial assumptions of law and legal actors. The intention behind this examination is not simply to connect past and present but to reveal two significant features of our present-day legal order.

    First, the so-called “haphazard” technical decisions made by programmers are made possible by the law’s delegation of breathtaking authority to those who develop and deploy algorithms to select a definition of a racial proxy from a host of alternative meanings. Our legal regimes open the door to a programmer’s exercise of extreme epistemic authority by framing the definition of a racial proxy as somehow self-evident or intuitive instead of the product of political and normative reasoning.

    Second, the power delegated to those who develop and deploy algorithms goes beyond the power to embed contestable legal concepts like “fairness” and “discrimination” into technical systems.[26] It even extends beyond the power to project historical racial inequality into the future via predictive systems.[27] I argue that those who develop and deploy algorithms can also facilitate novel changes to the racial subject at the center of antidiscrimination law. My argument is that those who define racial proxies do not merely project the racialized past into the future: They construct new and salient classes of individuals who can, on the basis of their digitally observable attributes, be exposed to differentiated access to resources, privileges, opportunities, and hierarchies of status. This process represents a racialization by and for the information age, and the fact that this extraordinary power rests in the hands of modern technology companies with immense political and economic power is a regulatory failure with consequences that are both immediate and yet unknown.

    The Article proceeds in four Parts. Part I further explicates the dilemma at the core of the racial proxy debate. Racial proxy discrimination stands as a ubiquitous feature of formally race-blind machine learning algorithms, but addressing this proxy discrimination is seemingly impossible without a precise meaning of a racial proxy. This meaning is far from obvious since there is no inherent quality that turns a variable into a racial proxy.[28] This Part also contextualizes the racial proxy debate by placing it within a long history of critical scholarship concerned about discrimination based on “proxies” for legally protected categories like race and gender. Finally, I explain how the proliferation of machine learning algorithms has intensified and refashioned these earlier debates in important ways.

    Part II turns to the jurisprudential answer to the question of what precisely constitutes a racial proxy. It shows how computer programmers identify racial proxies in a jurisprudential environment fraught with theoretical contradictions and unanswered questions.[29] Understanding the historical development of America’s racial jurisprudence assumes a new salience in a digital age, when legal constructions of the racial proxy are supplemented by algorithmic constructions authored by a new set of actors facing similar normative dilemmas.

    Indeed, under American equality law’s reigning attachment to formal colorblindness, race-consciousness motivated by racial animus is treated as tantamount to race-consciousness aimed at racial repair.[30] That each race-conscious action furthers a distinct ideological project is of little consequence under the current colorblind framework. As Cheryl Harris writes, “Under colorblindness race is reduced to color, a biological attribute like height or eye color, and is therefore presumptively normatively irrelevant.”[31] Under this commitment to formal colorblindness, the search for a racial proxy is similarly understood as the hunt for a presumptively fixed attribute that gives a variable its racial quality.[32] The assumption is that judges who identify a racial proxy are merely recognizing a racial fact, rather than demonstrating their investment in one particular conception of the racial proxy among the many that exist.

    At a high level, the problem with racial proxies implicates two concerns in equality law. First, a concern about disparate racial treatment is implicated when a racial proxy is understood as functionally equivalent to race.[33] From this perspective, a decision-maker can be formally blind to race, but the functional equivalence between race and another variable means that intentional racial discrimination can still be surreptitiously smuggled in.[34] The racially inflected normative determination for judges (and now for programmers) becomes which variables are functionally equivalent to race and why.

    The other legal concern is a worry about the high correlations between race and the racial proxies, which result in racially disparate outcomes.[35] Here, the normative determination for judges and programmers becomes the level of racial differentiation that is permissible before differentiation becomes discrimination. In other words, judges draw a normative distinction between disparities produced by discrimination and disparities attributed to purportedly “raw” racial difference.

    Both strands of concern present important definitional challenges for identifying and resolving racial proxy discrimination. Once seen through the unifying project of defining the racial proxy, the distinction between the two strands appears less prominent[36] as courts move between and across these two variants and label a wide array of factors “proxies for race,” from geographic location[37] to ancestry.[38] At the same time, courts curiously decline to recognize racial proxy discrimination based on other highly race-correlated features—like a pending criminal charge or certain hairstyles heavily associated with race.[39]

    The unpredictable judicial construction of a racial proxy reveals that, apart from judges’ normative determinations, there is no abstract, formal property that turns a variable into a racial proxy. When courts describe something as a racial proxy, they exercise their powers of legal decree to make a political assertion about the necessary relationship between race and the racial proxy. Foundational to this determination is judges’ own “racial common sense.”[40] Translated into a world structured by machine learning algorithms, this jurisprudence gives those who develop and deploy algorithms the authority to identify racial proxies from a potentially infinite number of race-correlated variables whose effect on any given algorithmically produced outcome may be counterintuitive.[41]

    From this perspective, what some scholars of law and technology have identified as a haphazard approach among computer programmers to addressing racial proxy discrimination in algorithms can instead be understood as an expression of the judicial reasoning internal to the very development of the legal regimes permeating America’s racial jurisprudence.[42] Judicial thinking about the racial proxy variable frames its definition as somehow obvious or self-evident. Practically speaking, within the context of colorblindness, both judges and programmers can seize on a seemingly intuitive definition of the racial proxy to avoid rigorously justifying their decisions with reference to antidiscrimination principles and moral commitments.

    Recognizing the challenges inherent to defining racial proxies, emerging econometric solutions attempt to avoid the definitional question altogether. Part III discusses two methods that stand out because of their uptake in legal scholarship. One method promises to “purge” the proxy effects of race from all algorithmic inputs, allowing computer programmers to isolate and then excise the “racial” effects of all data inputs.[43] The other method is skeptical that excluding, orthogonalizing, or otherwise scrutinizing inputs will create a race-neutral algorithm.[44] Instead, the focus is on the racial outcomes produced by the algorithm.[45]
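
    The first method can be illustrated with a deliberately simplified linear version of the purging idea: regress each input on race and retain only the residual, which is by construction linearly uncorrelated with race. The sketch below uses invented data; the actual proposals in the econometric literature are considerably more elaborate than this toy version.

```python
# A simplified linear version of the "purging" idea on synthetic data:
# regress an input on race and keep only the residual, which is by
# construction linearly uncorrelated with race.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n).astype(float)
zip_income = rng.normal(45_000 + 15_000 * race, 10_000)

# Ordinary least squares of the input on race (with an intercept).
design = np.column_stack([np.ones(n), race])
coef, *_ = np.linalg.lstsq(design, zip_income, rcond=None)
purged = zip_income - design @ coef  # the residual, or "nonracial" part

print(np.corrcoef(race, zip_income)[0, 1])  # sizable correlation
print(np.corrcoef(race, purged)[0, 1])      # approximately zero
# Note the assumption doing the work: that a variable's "racial" and
# "nonracial" components can be cleanly separated by this regression.
```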

    While both approaches attempt to avoid the normative difficulty in defining the racial proxy, they subsequently rely on their own complex and contestable assumptions about the relationship between race and the racial proxy. Given the potential for these assumptions to be embedded in the technological systems that increasingly structure our world, this Part probes each approach for a deeper understanding of its underlying racial ontology.

    In response to techniques that seek to isolate and excise the racial effects of all data inputs, I raise an underexplored conceptual challenge to this approach, suggesting that it relies on a common but paradoxical understanding of race and its relationship to the racial proxy. The issue arises from the assumption that the race variable—which is usually secured via institutional ascription or self-identification—can be successfully isolated from the “proxies” that have historically been used to assign people to their proper race in the first place. This Section recasts race itself as a proxy, standing in for an authoritative background system of racial knowledge that has historically relied on so-called “racial proxies” to determine race. Any resulting algorithm can accurately be labeled “colorblind” only if one ignores the ways in which many proxy variables are historically constitutive of race itself.

    For approaches that assert the fallacy of scrutinizing input variables and instead advocate for a turn to algorithmic outputs, I applaud scholarly skepticism of what is often assumed to be a race-blind algorithm.[46] But I nevertheless identify the outstanding racial questions inherent to scrutinizing outputs, which raise many of the same dilemmas as the identification of racial proxies in input data. In particular, I underscore the issues raised when a programmer attempts to draw distinctions between legitimate levels of racial differentiation and algorithmically produced discrimination.[47] Indeed, the question of how much racial differentiation an otherwise inscrutable algorithm is permitted to produce before it is considered discriminatory implicates contestable notions of algorithmic fairness, algorithmic justice, and algorithmic discrimination.[48] But it also allows a programmer to draw distinctions between algorithmically induced racial disparities and those attributable to purportedly inherent racial differences. Yet, if a programmer accepts any level of innate racial difference, they risk reinforcing a mythical belief in the natural or static nature of race and then embedding this racial ideology in the algorithm.
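
    The output-oriented dilemma can also be made concrete. The sketch below compares a hypothetical algorithm’s approval rates across two groups and flags the gap against a tolerance threshold; the threshold itself, not the arithmetic, is where the normative judgment lies. All values are invented.

```python
# A minimal sketch of output-based scrutiny on synthetic data: compare
# approval rates across groups and flag the gap against a threshold.
import numpy as np

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)
# Stand-in for model outputs: group 1 is approved less often.
approved = rng.random(n) < np.where(race == 1, 0.45, 0.60)

rate_0 = approved[race == 0].mean()
rate_1 = approved[race == 1].mean()
gap = abs(rate_0 - rate_1)
print(f"approval rates: {rate_0:.2f} vs. {rate_1:.2f}; gap = {gap:.2f}")

# How much differentiation is "permissible"? The cutoff is a normative
# judgment, not a statistical one.
THRESHOLD = 0.10
print("flagged as discriminatory" if gap > THRESHOLD else "within tolerance")
```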

    By placing the racial theories permeating these methods at the center of inquiry, Part III reveals how differences in our theoretical analysis of race shape how machine learning algorithms are developed and deployed. The complex, normative assumptions that undergird econometric approaches to racial proxy discrimination open the door to profound racial dilemmas that programmers and regulators must grapple with. Moreover, these approaches are marked by paradoxical and taken-for-granted beliefs about race that have similarly pervaded judicial reasoning about the racial proxy. Drawing a throughline between judicial and statistical patterns of racial reasoning challenges the belief that algorithmic proxy discrimination can be addressed with a technical fix, rather than an explicitly moral one.

    Part IV continues a discussion of the material effects of competing racial proxy definitions, but this time focuses on how differences in our analysis of the racial proxy can instigate changes to the racial subject at the center of antidiscrimination law. I examine the recent and closely watched settlement between the U.S. Department of Justice (DOJ) and one of the largest technology conglomerates in the world—Meta Platforms, Inc. (Meta). I demonstrate how the acceptance of spurious and contestable notions of a racial proxy can limit our ability to recognize the full range of consequences associated with machine learning algorithms and, more importantly, regulate the powerful actors that develop and deploy them.[49] In the settlement, the government’s acceptance of a narrow and ambiguous notion of a racial proxy preserves Meta’s ability to construct new classes of people who, on the basis of digitally observable attributes, can be exposed to differentiated privileges, resources, opportunities, and statuses in some of the most legally significant domains of public life: housing, credit, and employment. I argue that such a power should not be in Meta’s control and that those concerned about algorithmic discrimination ought to attend to the novel political categories catalyzed by new information technologies as racialized categories.

    The consequences of this novel approach to group formation are yet unknown, as is the evolving relationship between these groupings and those traditionally understood as politically subordinated.[50] But understanding these groupings as evidence of an algorithmically induced racialization allows us to draw on a long-established tradition of scholarly explorations of race. This tradition teaches that if the overarching objective behind the process of group construction remains the pursuit of economic interests, machine learning algorithms threaten to expand the regime of racism by constructing groups and rendering them differentially vulnerable to exclusion, containment, and exploitation. Referred to in various texts as racial formation,[51] racial construction,[52] or racecraft,[53] among others, what unites these ideas is a focus on the political and economic processes that are obscured by the taken-for-granted “major concept” of race or, in this case, the racial proxy.[54] Understanding, contesting, and regulating these processes ought to be a primary concern of legal thought and action.

    I. The Dilemma

    Because not all blacks in the United States were former slaves, “freedmen” was a decidedly under-inclusive proxy for race.

    —Justice Clarence Thomas, Students for Fair Admissions, Inc. v. President & Fellows of Harvard College

    What precisely is a racial proxy? That question has long preoccupied legal scholars, whose responses have been both thoughtful and diverse.[55] Indeed, observing the contested meaning of a racial proxy is not a debate freshly inaugurated by new information technologies, and its implications extend beyond the realm of machine learning algorithms. Scholars have long discussed the consequences that so-called proxies for protected categories like race and gender pose for antidiscrimination law.[56]

    Constitutional equality law sharply restricts overt racial classifications in the distribution of government benefits and burdens.[57] Any intentional attempt to classify by race will be subject to the highest level of judicial scrutiny. Key civil rights statutes similarly prohibit racial discrimination by scrutinizing both racial classifications and racially disparate impacts in areas like housing, employment, and credit.[58] Imported to the world of machine learning algorithms, a concern about racial discrimination means that computer programmers will almost always exclude the formal variable of race from an algorithm to avoid running afoul of antidiscrimination law.[59]

    For decades, in response to this legal superstructure, legal scholars have emphasized the disconnect between antidiscrimination doctrine and the reality of racial discrimination, which routinely involves differential treatment based on factors closely related to racial identity even as it stops short of making an explicit racial classification.[60] This differential treatment, they argue, even when ostensibly race neutral, leaves a discriminatory residue of the sort that antidiscrimination law ought to address.[61] And some have even connected this legal framework to an impoverished understanding of race and other meaningful categories of constructed difference.[62]

    But the adoption of machine learning algorithms to inform decisions across an array of institutional sites intensifies and simultaneously refashions this debate in novel ways. For starters, the proliferation of machine learning algorithms opens the door to a new set of actors and decision-makers who assign a definition to the racial proxy variable. Programmers, developers, and others involved in the development and deployment of machine learning algorithms can now technologically embed their normative choices into algorithms.

    Consider, for example, an emerging trend to remove zip code and other geographic indicators from predictive models in the name of racial and economic equality.[63] Developers may remove or restrict geographic indicators from input data without thorough explanation or confrontation with antidiscrimination principles and interests.[64] In doing so, they exercise their powers of “legal entrepreneurship”[65] to define the universe of racial concern and the nature of racial remedies without fully confronting their underlying equality commitments or racial assumptions.

    Programmers who remove geographic indicators—while declining to restrict similarly race-correlated variables—exercise great control over highly configurable technological tools, often without the need to rigorously defend their underlying normative criteria for determinations. To be sure, one need not discount geographic segregation, or its historically state-mandated scaffolding,[66] to observe how the à la carte exclusion of racial proxies represents a powerful form of political line drawing. This power to embed highly contestable political decisions into the technology that increasingly structures human life illustrates the importance of interrogating the meaning of the racial proxy in algorithmic antidiscrimination discourse.

    Further intensifying discussion about the precise definition of a racial proxy is the fact that without intervention, proxy discrimination is an unavoidable aspect of machine learning technologies.[67] When potentially relevant data is withheld, an algorithm may “seek out” proxies for the omitted data.[68] This tendency flows from the structure of machine learning algorithms. These algorithms are trained on historical data to learn patterns and uncover relationships upon which future decision-making can rely. The patterns and relationships discovered within the data result in an underlying model, which can then be exposed to new data in order to automate certain processes, including predicting the probability of future outcomes.[69]

    Even if the formal category of race is withheld from a dataset, the algorithm may increase its reliance on variables that are correlated with race to account for the omitted race variable.[70] Theoretically, this concern is prevalent in human decision-making as well.[71] In the absence of explicit racial information, human decision-makers may rely on perceived racially inflected signifiers to inform their actions. Yet the sheer quantity of variables processed by algorithms sets them apart from human actors as the number of variables (or combination of variables) available for processing increases the number of potential racial proxies.[72] In fact, nearly all algorithmic inputs can be correlated with race in some way.[73]

    Of course, not every data point that is correlated with race will trigger concerns of racial discrimination. Distinguishing the correlations that matter from those that do not is precisely the critical task of identifying impermissible racial proxies. In a world structured by and through racial difference, nearly all recordable data contains some racial significance. Determining which variables correlate enough with race such that their inclusion in a dataset raises concerns about racial discrimination is a political and normative question that frequently evades scrutiny.
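
    A short sketch makes the line-drawing problem visible: scan each input’s correlation with race and flag those above a cutoff as “racial proxies.” Nothing in the statistics supplies the cutoff; the 0.3 used below is as defensible, and as arbitrary, as any other. The data and variable names are invented, and on this construction an intuitive proxy (prior arrests) falls below the line while a less intuitive one (online purchases) falls above it.

```python
# A sketch of the line-drawing problem on synthetic data: flag inputs
# whose correlation with race exceeds an arbitrary cutoff.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)
df = pd.DataFrame({
    "zip_income": rng.normal(45_000 + 15_000 * race, 10_000),
    "prior_arrests": rng.poisson(1.2 - 0.5 * race),
    "online_purchases": rng.normal(20 + 4 * race, 6),
    "age": rng.normal(40, 12, n),  # constructed to be race-neutral
})

CUTOFF = 0.3  # why 0.3 rather than 0.2 or 0.5? No technical answer.
for col in df.columns:
    r = np.corrcoef(race, df[col])[0, 1]
    verdict = "flagged as proxy" if abs(r) > CUTOFF else "kept"
    print(f"{col:>16}: corr = {r:+.2f} -> {verdict}")
```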

    Legal scholars skeptical of actuarial predictions consider certain variables problematic proxies for race, including criminal history,[74] neighborhood crime rate, and other economic indicators.[75] Yet explanations of the precise qualities that give these variables a sufficient “racial” character are often dropped from the discursive frame.[76] The risk is that without a clear articulation of what transforms these variables into racial proxies, one might assume that the designations are somehow self-evident, instead of firmly attached to views about race and the meaning of equality.

    Within racial proxy discourse, the legal impulse towards colorblindness combined with the proliferation of machine learning algorithms opens the door to new and surprising theoretical dilemmas. When programmers are tasked with systematic removal of not only race, but also racially inflected inputs from the algorithm, they must determine which variables to exclude and why.[77]

    To complicate matters further, from a technical and conceptual standpoint, these determinations are far from obvious.[78] As Talia Gillis explains, even input variables that are commonly understood as uncontestable “proxies for race” (such as a person’s zip code) may be less concerning from an algorithmic standpoint than other, less intuitive inputs, or a combination of inputs, in replicating racial information.[79] It is difficult to articulate how exactly a machine learning algorithm generates individual predictions from a set of input variables.[80] This dilemma is usually understood as part of the “black box” nature of many machine learning algorithms, which places any intuitive judgment about the definition of a racial proxy even further out of reach.[81] This raises a predicament: Without consensus around the meaning of a racial proxy or a definitive understanding of the complex interactions between variables, the exclusion of all racial proxies from algorithmic inputs is not feasible.[82]
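
    Gillis’s point can be illustrated on invented data: a single “intuitive” proxy like zip code may carry less racial information than a combination of individually weak, less intuitive inputs. The sketch below compares how well each hypothetical feature set predicts race, with coefficients chosen purely to demonstrate the effect.

```python
# A sketch on synthetic data: one "obvious" proxy versus a combination
# of individually weak, less intuitive inputs, compared by how well
# each predicts race.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)
zip_region = rng.normal(0.4 * race, 1.0)        # the "obvious" proxy
purchases = rng.normal(0.6 * race, 1.0)         # each weak alone...
location_pings = rng.normal(0.6 * race, 1.0)    # ...
group_membership = rng.normal(0.6 * race, 1.0)  # ...but jointly strong

def race_auc(X):
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X, race, cv=5, scoring="roc_auc").mean()

print(f"zip alone:           AUC = {race_auc(zip_region.reshape(-1, 1)):.2f}")
combo = np.column_stack([purchases, location_pings, group_membership])
print(f"non-intuitive combo: AUC = {race_auc(combo):.2f}")
```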

    Nor is excluding all racially correlated variables desirable when considering an algorithm’s utility. Consider that each data input is usually relevant to algorithmic predictions, and excluding data in the name of legal fairness could impact any resulting knowledge produced by the algorithm.[83] Much of the legal and statistical fairness literature suggests an inevitable tradeoff between “fairness” and “accuracy” to describe the impact of excluding inputs.[84] In this framework, legal regimes regulating input variables threaten to degrade the algorithm’s “accuracy” when they prevent the algorithm from accessing all forms of relevant data.[85]
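
    The tradeoff framing can be illustrated mechanically: predict a synthetic outcome with all inputs, then again after dropping a race-correlated input, and compare accuracy. The sketch shows only the mechanical effect on invented data; whether the resulting loss should be described as a cost of “fairness” is exactly what the framing assumes.

```python
# A sketch of the claimed fairness/accuracy tradeoff on synthetic data:
# compare predictive accuracy with and without a race-correlated input.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 2000
race = rng.integers(0, 2, n)
zip_income = rng.normal(0.8 * race, 1.0)  # race-correlated input
credit_len = rng.normal(0.0, 1.0, n)      # race-neutral input
# Synthetic outcome depends on both inputs by construction.
default = (0.9 * zip_income + 0.9 * credit_len + rng.normal(0, 1, n)) > 0

full = np.column_stack([zip_income, credit_len])
blind = credit_len.reshape(-1, 1)

model = LogisticRegression(max_iter=1000)
acc_full = cross_val_score(model, full, default, cv=5).mean()
acc_blind = cross_val_score(model, blind, default, cv=5).mean()
print(f"accuracy with the race-correlated input:    {acc_full:.2f}")
print(f"accuracy without the race-correlated input: {acc_blind:.2f}")
```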

    Taken together, these concerns throw into sharp relief the impracticability of legal conceptions of the racial proxy that assume that racial proxies must be perfectly correlated with race to raise concern about racial discrimination. In the quote that begins this Section, for example, U.S. Supreme Court Justice Clarence Thomas asserts that the Freedmen’s Bureau Acts—passed in the wake of the Civil War to expand protection for formerly enslaved people—were race-neutral legislation.[86] The Acts should be considered race neutral, Justice Thomas argues, because not all Black people in the United States at the time the Acts were passed were formerly enslaved.[87] That a category as racialized as being formerly enslaved in the post–Civil War United States would be considered race neutral demonstrates the ways in which racial proxy determinations are highly vulnerable to political distortions. In this case, Justice Thomas leaned on this bewildering racial proxy determination to defend a colorblind reading of the Fourteenth Amendment in originalist terms.[88]

    Presumably, by this normative reasoning, only a variable that is perfectly correlated with race could be identified as an impermissible racial proxy. Given without explanation or defense, this highly contestable vision of the racial proxy effectively contemplates no other role for courts than to determine the fact of perfect racial correlation. Yet, a perfectly race-correlated variable may not exist. Moreover, because there is often no way to tell if highly race-correlated variables are nevertheless reconstructing the effect of the omitted race variable, excluding only perfectly race-correlated variables from input data may not be enough to render an algorithm race blind.

    Yet, in presenting his dubious definition of a racial proxy as a statement of fact rather than as a product of political reasoning, Justice Thomas’s opinion can hardly be considered an oddity in law. Even as judges deviate from Justice Thomas’s narrow and formalistic ideas about perfect racial correlation, they rarely defend their racial proxy determinations with respect to antidiscrimination values, commitments, and principles.

    II. Legal Constructions of the Racial Proxy

    Choosing not to acknowledge the cause and effect of racial power and adopting a vow of silence is one response, but is no more effective than a child closing his eyes in order to make himself invisible.

    —Cheryl Harris, Critical Race Studies: An Introduction

    As with many other crucial choices relevant to the development of machine learning algorithms, there is no strict judicial guidance for fixing the legal definition of a racial proxy.[89] Courts have not provided a single theoretical baseline to determine whether any given feature is a proxy for race. Paradoxically, the lack of a precise definition has not prevented judges from invoking racial proxies to achieve some outcome within the framework of American equality law. In this Section, I explain how the irregular and opaque ways that programmers identify the racial proxy variable can be understood as a reflection of a racial commonsense approach internal to the very legal regimes governing antidiscrimination law.

    Particularly within the framework of colorblind antidiscrimination law, where any decision-making based on race is highly scrutinized as deleterious,[90] the racial proxy is a powerful technology. It represents a relationship between race and another variable, and seizing on this relationship permits judges to harness what is arguably the most forceful prohibition in American antidiscrimination law. Indeed, because many actions based on race are considered injurious—regardless of their ideological origins[91]—evoking a racial proxy permits legal actors to command judicial outcomes in the name of equality, while ostensibly avoiding moral reasoning or meaningful confrontation with principles of equality and antidiscrimination.

    To capture a racial proxy’s legal effect, however, judges must relate a given variable to race, and this has proved a vexing task. Judges appear conflicted about the relationship between race and a racial proxy, leading to contradictions and inconsistencies as they struggle to articulate precisely what gives a variable its racial quality. For example, courts might decide to eliminate prior criminal arrests from state predictions of an individual’s risk of missing a future court date on the grounds that such a variable is a proxy for race. But a court need not make the same decision for other variables that are, by the same measure, heavily race correlated.[92] To reach such an internally inconsistent outcome, however, requires some path of reasoned thought. To clear this path, judges could draw on factors like time, place, and history to explain why some race-correlated variables are identified as racial proxies, while others are not. But, as this Section demonstrates, judges often enlist taken-for-granted beliefs about the relationship between race and the racial proxy, gesturing to a presumably self-evident connection to justify their theoretical leaps and smooth over their conceptual discontinuities.

    Courts often frame the relationship between race and the racial proxy as clearly apparent and in no need of political explanation. This framing avoids deeper engagement with the racial proxy’s inherently normative and contingent nature. The resulting opinions unfold in inconsistent, underexplained, and at times paradoxical directions.

    The cases examined below demonstrate this phenomenon by introducing some of the varying and internally inconsistent definitions of a racial proxy found in judicial reasoning. These cases span the voting, criminal justice, and employment contexts, and the racial proxies themselves vary as courts consider whether geography, ancestry, criminal arrest history, and hair texture and style constitute racial proxies. The following examples are not meant to be an exhaustive presentation of how judges reference racial proxies in their judicial opinions; instead, these cases serve as examples of a pattern of judicial reasoning in which judges engage with the concept of a racial proxy but rely on definitions they understand to be self-evident or obvious. Not only do these taken-for-granted judgments lead to internally inconsistent results, but they also underscore the broad authority given to an algorithm’s developers to select a definition of the racial proxy intuitively from the many alternatives. From a technical standpoint, the trouble is that relying on their racial common sense represents the very genus of intuitive reasoning that a programmer cannot apply in the machine learning context. Moreover, the ability to make these contestable and significant racial determinations often falls to powerful, profit-driven companies, systems, and actors that develop and deploy algorithms.

    The cases discussed below throw into sharp relief a tradition of legal thought that has struggled with the identification of a racial proxy and has therefore presaged this digital era moment of extreme epistemic deference to the racial common sense of an algorithm’s developers. Long before these variables gained significance in algorithmic antidiscrimination discourse, judges struggled to identify the precise relationship between race and a racial proxy.

    A. A Racial Proxy as Suspect and Pejorative

    Harold Rice was a White man born and raised in Hawai‘i.[93] He was a descendant of missionaries and ranchers who migrated to the Hawaiian Islands in the 1800s.[94] According to the Supreme Court, Rice was a citizen of Hawai‘i “in a well-accepted sense of the term.”[95] He did not, however, have the requisite ancestry to vote in the election for trustees of the Office of Hawaiian Affairs (OHA), a state agency created to manage land and resources reserved for the benefit of Indigenous Hawaiians.[96] Under state law, the only people who could vote in OHA elections were “Hawaiians,” defined as descendants of people who inhabited the Hawaiian Islands in 1778 and continued to live there.[97] Rice challenged his exclusion from voting for OHA trustees under the Fourteenth and Fifteenth Amendments.[98] After losing his case in lower courts, Rice prevailed at the U.S. Supreme Court when the Court held that the State of Hawai‘i had used ancestry as an impermissible “proxy for race” to determine voting qualification, in violation of the Fifteenth Amendment.[99]

    On its face, the OHA’s voting plan made no racial reference. The reference to ancestry was designed to capture indigeneity, and it was crucial to the State’s attempts to enshrine protections for Indigenous people by demarcating the borders of the Indigenous group.[100] To grant descendants of Indigenous Hawaiians strategic and political rights to exercise control over their ancestral lands and resources, the State used ancestry to identify the descendants of people living in what is now the State of Hawai‘i prior to its colonization.[101]

    The Rice Court, however, reasoned that the State was using ancestry as a “proxy for race” in violation of the Fifteenth Amendment, which forbids states from denying or abridging the right to vote on account of race.[102] It likened Hawai‘i’s statute to Jim Crow–era voting restrictions that used White ancestry to identify voters who would be exempt from literacy tests.[103] In both cases, the Court reasoned, ancestry was used to perpetuate a “racial” exclusion in voting without making an explicit reference to race.

    In what sense, then, was the State’s reference to ancestry “racial”? According to the Court, ancestry exhibited a racial quality because conditioning voting rights on ancestral lineage was effectively an attempt to condition these rights on historical belonging to a “race” of people.[104] As the Court wrote, early Hawaiians were a “distinct people, commanding their own recognition and respect,” and the statute evinced “the State’s effort to preserve that commonality [of people] to the present day.”[105]

    The State raised several issues with this conceptualization of a racial proxy, including that the voting restriction was a classification “limited to those whose ancestors were in [Hawai‘i] at a particular time, regardless of their race.”[106] The State asserted that in 1778, Hawai‘i was not a racially homogeneous place; the Hawaiian Islands were inhabited by an ethnically diverse population that may have migrated from the Marquesas Islands, the Pacific Northwest, and Tahiti.[107] If the original inhabitants of the Hawaiian Islands were not a monolithic race of people, then one could not assert that ancestry was a tool of racial preservation.

    Further, the OHA law barred voting for any “person whose traceable ancestors were exclusively Polynesian if none of those ancestors resided in Hawaii in 1778.”[108] Conversely, a White person who could only trace, say, “one sixty-fourth” of their ancestry to people who inhabited the Islands in 1778 would be permitted to vote.[109] This was a crucial point of contention for the State: that race and ancestry do not completely overlap. Some White Hawaiians could satisfy the ancestry requirement, and some non-White Hawaiians could not.

    Thus, a rightful dispute emerged about the discontinuities between ancestry, a term referencing questions of descendance, and race, which the Court used to reference a people with common physical characteristics and a common culture.[110] Hardly a statement of fact, the Supreme Court’s insistence that ancestry amounted to a racial proxy necessitated a defense—one that explained why ancestry was a proxy for race despite its reference to a multiracial group of Indigenous inhabitants and its production of a multiracial voter block. But the Court’s defense of its proxy designation drew instead on ideas about a presumably self-evident relationship between race and the racial proxy.

    While the Court acknowledged that ancestry was not a perfect substitute for race,[111] it simultaneously held that any incongruence between the variables did not defeat the racial proxy designation.[112] Justice Anthony Kennedy, who delivered the opinion of the Court, wrote that “[s]imply because a class defined by ancestry does not include all members of the race does not suffice to make the classification race neutral.”[113] Rather, ancestry was a proxy for race because classification by race and classification by ancestry evinced the same suspect “purpose and operation.”[114] In the Court’s eyes, an inquiry into both race and ancestry “demeans the dignity and worth of a person” and is inconsistent with “respect based on the unique personality each of us possesses.”[115] This understanding of a racial proxy sees any inquiry into race as fundamentally pejorative, and it is this pejorative information that is captured by the variable of ancestry.

    Yet this reasoning still leaves a question internal to the Court’s logic unanswered: If the Court sees race as pejorative and sees ancestry and race as unidentical variables, what information does ancestry capture beyond pejorative race? What nonracial information did ancestry reflect, and why was this information insufficient to overcome suspicion stemming from the use of ancestry in determining voting rights? These questions were taken up in a dissent authored by Justice John Paul Stevens and joined by Justice Ruth Bader Ginsburg.

    For the dissenting Justices, ancestry could be understood as a proxy for race, yet it “by no means follows that ancestry is always a proxy for race.”[116] What distinguished the ancestry requirement at issue in Rice v. Cayetano from the literacy requirement at issue in Guinn v. United States[117] were “the realities of time, place, and history behind the voting restrictions being tested.”[118] In Guinn, as part of a flagrant effort to exclude Black voters, the State of Oklahoma exempted from its literacy requirement people whose ancestors were entitled to vote prior to the enactment of the Fifteenth Amendment.[119] In contrast, the OHA voting requirement attempted to use ancestry to politically empower members of a once sovereign, Indigenous people.[120] From this perspective, an inquiry into race would still be pejorative, but the purpose of the OHA classification “exists wholly apart from race.”[121] According to legal scholar Addie Rolnick, the Rice Court failed to see the independent significance of using ancestry as a mechanism to define “the beneficiary class affected by the positions being voted upon.”[122] An inquiry into ancestry was meant to capture indigeneity, which was relevant in determining who should qualify for voting rights.[123] Over valid critique about the imperfect correlations between race and ancestry, the Supreme Court simply declared, without full explanation, that “[a]ncestry can be a proxy for race. It is that proxy here.”[124]

    To be sure, not even the dissent explained precisely how much racial information must be captured to designate a variable a proxy for race, an omission that may signal the risks it assumed by accepting that there is such a thing as a racial proxy. But rather than embrace the false equivalence of the ancestry requirements in Guinn with those instituted by the State of Hawai‘i in Rice, the dissent engaged with their opposing ideological purposes, suggesting that one can only identify the “racial” quality of proxy variables by referencing historical and societal concerns.

    In contrast, Justice Kennedy’s opinion cast ancestry as undeniably “racial,” demonstrating judicial investment in a particular conception of the racial proxy as presumptively normatively irrelevant and existing entirely apart from the process of political line drawing in which the Court itself is engaged. In the context of colorblind constitutionalism, therefore, the racial proxy becomes a powerful discursive tool that permits judges to obscure their contestable normative claims, framing them as self-evident statements of fact and, in Rice, facilitating the erosion of Indigenous legal claims to self-determination and political consciousness.[125]

    B. A Racial Proxy as Highly Correlated with Race

    In September 2017, the Cook County Circuit Court implemented new pretrial release policies intended to reform cash bail.[126] The Cook County Sheriff, Thomas Dart, disagreed with the revised policies.[127] Acting as penal policymaker, Dart began to conduct his own administrative review of the court’s bail orders, at times substituting his assessment for the court’s and refusing to release individuals who were granted bail by a judge.[128]

    The plaintiffs in Williams v. Dart were a group of Black defendants, each charged with felonies and granted release on bail by a judge, only to have the release order ignored by the Cook County Sheriff based on his own “independent reviews.”[129] In conducting his rogue assessments, the Sheriff considered factors like the defendants’ arrest history, the charges they faced, and their neighborhood to determine whether he would comply with a judge’s order granting their release on bail.[130] The plaintiffs challenged this practice through several federal and state law claims, including a claim under the Equal Protection Clause of the Fourteenth Amendment.[131] They argued that the Sheriff targeted them for continued detention because of their race and that the Sheriff’s policy “disproportionately targets African Americans by using charge, prior arrests, and neighborhood to determine eligibility for release.”[132] The result was that out of more than eighty people detained despite a court order directing their release, four in five were Black.[133]

    Reversing the lower court’s holding that the plaintiffs had failed to make “plausible, nonconclusory allegations of intentional discrimination,” the Seventh Circuit held that the allegations of racial discrimination in the plaintiffs’ complaint were sufficient to meet the pleading standard.[134] Even though the variables considered by the Sheriff’s bail review policy were facially race neutral, the circuit court wrote that the policy could still raise an “inference of impermissible intent” by relying on criteria that “map[ped] so closely onto racial divisions that they allow racial targeting ‘with almost surgical precision.’”[135]

    Although the decision was a step toward vindicating the constitutional rights of the defendants, the notion that a variable’s racial quality stems from its so-called surgically precise overlap with race is hardly a workable standard. Many collectable data points have high correlations with race, and determining which high correlations matter and why is the normative task at hand for any decision-maker, be they judge or programmer. What transformed these highly race-correlated variables into racial proxies?

    In criteria reminiscent of a racial commonsense standard, the court took judicial notice that Chicago, the home of all nine plaintiffs, consistently ranked among the most racially segregated cities in the nation, and therefore, “neighborhood was a plausible proxy for race.”[136] In reaching this conclusion, the court cited a previous ruling that relied on evidence of a statistical correlation between race and geography in Chicago, as well as a magazine exposé on the economic pressures exerted on low-income residents of urban metropolises like Chicago.[137]

    The court concluded that arrest history is another proxy for race, “in light of Chicago’s alleged history of disproportionately arresting African Americans,”[138] which the court noted was an “allegation endorsed by the U.S. Department of Justice in 2017.”[139] Yet, when it came to the variable of pending criminal charges, the court wrote that although this variable may have been another racial proxy, the precise proxy mechanism was unexplained. The court stated that neighborhood and arrest history were enough to conclude that there was at least a plausible, nonconclusory allegation of intentional discrimination.[140] The court’s hesitation to cite to similar studies supporting the correlation between criminal charge and race warrants some scrutiny for its inconsistency with findings present elsewhere in the opinion.

    Racial disparities in charging disposition are a well-documented concern in the criminal legal system[141] such that some states have chosen to respond legislatively.[142] If the Williams court understood statistical evidence or common knowledge as criteria for establishing certain variables as racial proxies, then the court’s hesitation to label a criminal charge as a proxy for race appears inconsistent. Why not use the same criteria—judicial notice or scientific evidence—to label other, race-correlated variables racial proxies? Such internal inconsistency reveals how the transformation of a given feature into a racial proxy, a transformation that raises the specter of impermissible intent, is articulated by courts in a contingent manner. The problem is that such contingency is often obscured in judicial opinions where the borders demarcating a racial proxy are depicted without reference to moral commitments or deeper engagement with time, place, and history.

    Williams v. Dart demonstrates how judicial identification of racial proxies frequently avoids the very normative reasoning that is inherent to any interpretation of a racial proxy. This avoidance, however, results in internal inconsistencies, such as the exclusion of some highly race-correlated variables and the unexplained preservation of others.[143] With aspects reminiscent of federal cases determining the boundaries of racial categories, Williams v. Dart exemplifies how judges can selectively and actively construct the racial proxies they claim merely to identify.[144] Even within the same opinion, courts struggle with the question of what gives a proxy variable enough “racial” character such that its inclusion in state decision-making would run afoul of legal commitments to equal racial treatment. While the necessary correlation between race and the proxy variables may appear self-evident, a more thorough reading of judicial reasoning reveals the malleability at the core of these judicial determinations.

    C. A Racial Proxy as Immutable

    In another case, Equal Employment Opportunity Commission v. Catastrophe Management Solutions, the Equal Employment Opportunity Commission (EEOC) filed suit on behalf of a Black female job applicant whose offer of employment was rescinded when she refused to cut her hair in order to comply with the company’s formally race-neutral grooming policy.[145] The employee wore her hair in a locked style, and the EEOC argued that the company’s actions constituted discrimination on the basis of the employee’s race in violation of Title VII of the Civil Rights Act of 1964.[146] The Eleventh Circuit affirmed the lower court’s decision in favor of the company, writing that “[t]he EEOC’s allegations—individually or collectively—do not suggest that [the company] used that policy as proxy for intentional racial discrimination.”[147]

    The EEOC argued that by discriminating against those with locks, the policy was relying on an expressly protected characteristic, not because hairstyle is expressly protected, but because locks are “directly associated” with the immutable trait of race.[148] Thus, according to the EEOC, it was the close association between race and hairstyle that made the company’s actions constitutionally suspect.[149]

    The circuit court acknowledged that the EEOC’s appeal required the court “to consider, at least in part, what ‘race’ encompasses under Title VII.”[150] Race, according to the court’s dictionary sources, referred most commonly to “physical characteristics shared by a group of people and transmitted by their ancestors over time.”[151] The definition left open questions about the precise physical characteristics as well as the number of group members that should hold them in order to label any population a “race.” Indeed, one would be hard pressed to find anyone who considers people with green eyes, blonde hair, or short stature a race of people, even though these are all physical characteristics shared by a group of people and transmitted by their ancestors over time. Put another way, some other quality must turn certain shared physical characteristics into markers of race.

    Trading the vexation of the precise for the comfort of the self-evident, the court concluded that it was “not much of a linguistic stretch” to understand that these physical characteristics referred to immutable qualities, that is, those features considered “a matter of birth, and not culture.”[152] Although locks are “historically, physiologically, and culturally associated with . . . race,”[153] the court reasoned that this association did not make them an immutable characteristic of race, nor did it comport with a line of legal precedent that understood Title VII to protect against discrimination based on immutable characteristics.[154] Title VII could possibly protect litigants against discrimination on the basis of Black hair texture, the court suggested, but not against hairstyles, even those highly correlated with race.[155]

    Thus, rather than engaging with time, place, and history when reasoning about which physical features demarcate the boundaries of a racial proxy, the court leaned on taken-for-granted beliefs about race to conclude that immutability gives a variable its racial quality. This highly contestable and paradoxical definition was justified by the judges through their decree that such a definition amounted to “not much of a linguistic stretch.”[156] But in erecting the putative boundary between immutability and mutability, the court neglected the inconvenient contrary fact that certain immutable qualities—like eye color or height—are not understood as racial proxies in America’s reigning racial regime.[157]

    One glaring impact of transposing EEOC v. Catastrophe’s definition of a racial proxy into the algorithmic context is the obvious incongruence between algorithmically driven forms of discrimination and the preoccupation with immutability permeating America’s current equality law regimes.[158] Many of the features that antidiscrimination scholars, computer programmers, and advocates deem impermissible proxies for race fall into the category of seemingly mutable characteristics like zip code, income level, or prior criminal arrest history. If, for example, a programmer were to exclude all members of an online group whose mission was to share information about Black hairstyles, the exclusion would likely raise the danger of intentional discrimination. The ability of algorithmic tools to rely on input data not previously available or even comprehensible to human actors raises equally vast technical and conceptual problems for equality and antidiscrimination law.[159]

    EEOC v. Catastrophe is a paradigmatic example of how courts, in identifying what gives a proxy variable its racial character, make appeals to the self-evident and the obvious. But one need only scratch the surface of these self-evident criteria to uncover critical contradictions, inconsistencies, and unanswered questions. Despite the court’s insistence, such criteria are rarely ever obvious, unless one acknowledges the connective thread as the act of judicial decree that provides the supplemental quality transforming some immutable features into racial markers. These acts of judicial decree, which represent judges’ normative investments and commitments, are too frequently rendered invisible or inevitable behind the powerful taken-for-granted nature of race.

    Yet to anyone acquainted with courts’ jurisprudence of racial classification, it will come as little surprise that courts frame the contingent boundaries of a racial proxy as obvious or uncontestable.[160] Just as courts, witnesses, lawyers, and litigants have not required a consistent and exacting definition of race to “know race when [they] see it,”[161] courts evidently do not require a precise definition of a racial proxy before allowing the idea to shape their decision-making.[162]

    Understanding how judges engage with definitional disputes surrounding the racial proxy sheds new light on the capricious determinations made by those who develop and deploy algorithms.[163] Rather than seeing their decisions about what constitutes a racial proxy as ad hoc or unsystematic, we might instead notice the digital era emergence of the same political dilemmas that have permeated the very development of America’s racial jurisprudence.[164] Judicial processes of racial classification and assignment that developed over hundreds of years have not engendered a precise definition of race, nor its relationship to other closely related variables. Situating a developer’s technical decisions about the definition of a racial proxy within a broader set of legal dilemmas makes clear how a new set of actors and technologies are assuming this politically powerful and legally significant adjudicatory role.

    If programmers, like judges, draw on an obvious or self-evident relationship between race and the racial proxy, they too can cast their normative claims as ostensibly factual ones, replicating a legal ethos that closes one’s eyes to factors like time, place, and history in formulating the relationship between race and the racial proxies.

    III. The Racial Proxy and Emerging Statistical Solutions

    [T]here is . . . always something about race left unsaid. Always someone—a constitutive outside—whose very existence the identity of race depends on, and which is absolutely destined to return from its expelled and abjected position outside the signifying field to trouble the dreams of those who are comfortable inside.

    —Stuart Hall, Race the Floating Signifier: What More Is There to Say About “Race”?, in Selected Writings on Race and Difference

    The cases discussed above, divergent as they are in their reasoning about the racial proxy, point to an underlying commonality: A racial proxy is best understood as referencing the relationship between race and another variable. Within the ideological framework of colorblindness, judges can seize on this relationship to powerfully shape legal outcomes. However, courts struggle to articulate the exact shape and nature of this relationship, such that the jurisprudence of racial proxies results in an uncertain picture of the precise quality giving a variable its racial character.

    Recognizing the problem for predictive modeling as one rooted in the indeterminate definition of a racial proxy, scholars approaching the issue from an econometric lens have proffered their own solutions for addressing proxy discrimination in algorithms. Two approaches stand out in this literature for the ways they are taken up by legal scholars. The first proffers a more sophisticated and nuanced statistical approach that orthogonalizes inputs rather than haphazardly removing controversial inputs from a dataset.[165] The second deemphasizes the significance of algorithmic inputs altogether and focuses instead on the outcomes produced by the algorithm.[166]

    Unlike judicial solutions, these approaches suggest that a computer programmer may avoid the explicitly normative, definitional issues associated with the racial proxy completely. While both approaches are attractive in this way, this Section demonstrates that a shift away from explicitly defining a racial proxy does not avoid the most vexing normative task of racial proxy formation: determining the relationship between race and the racial proxies.

    In fact, both methodologies presuppose a particular understanding of race and its relationship to proxy variables. Yet this presupposition is frequently obscured by commonly held but paradoxical ideas about race—ideas which are so widespread they rarely receive scrutiny in discussions of algorithmic discrimination.

    This Part analyzes some of the racial assumptions and underexamined conceptual beliefs that undergird these proposals. Drawing on historical accounts of racial formation long cited by critical scholars of race and law, I argue that the racial proxy is properly understood as standing in a constitutive relationship to race, a conceptualization that complicates the very possibility of a “proxy for race.” A small group of scholars studying algorithmic discrimination have aptly approached a discussion of proxy discrimination from this vantage point,[167] and here I explain how this critical understanding of race challenges some of the statistical strategies proposed to address proxy discrimination. Indeed, an account of the racial proxy as constitutive of race poses a material challenge to statistical techniques that rely on orthogonalizing inputs to achieve the goals of colorblindness. By uncovering the racial ontologies that are omnipresent but rarely acknowledged in algorithmic antidiscrimination reform, this Part shows how the dream of bypassing normative disputes about the meaning of the racial proxy is continuously troubled by unstated racial assumptions and beliefs.

    A. Input-Focused: The Racial Proxy Paradox

    The method proposed by economists and some legal scholars permits a computer programmer to use a relatively simple statistical technique to isolate the “true” relationship between a racial proxy and the outcome the algorithm is trying to predict, where the “true” relationship is the correct coefficient between a racial proxy and the outcome of interest.[168] The key to this statistical solution is to “build on the fact that algorithmic predictions are formed through an estimation step and prediction step.”[169]

    In the estimation step, the algorithm considers all inputs, including race and all potential racial proxies, in estimating any predictive relationships.[170] At the prediction step, however, when the model is applied to a specific individual, racial information is withheld from the algorithm, with the population-average value standing in for each individual’s race.[171] For example, an algorithm designed to predict the risk of arrest within a certain timeframe will include all input variables in its estimation, including race and potential racial proxies like geographic indicators or arrest history.[172] A programmer’s inclusion of race and all race-correlates allows their algorithm to capture all information deemed relevant to the prediction.[173] After the programmer isolates the correct coefficient, meaning the information contained within the racial proxies that is orthogonal to race, they can then use this model to make the risk prediction, and the threat of discrimination posed by a racial proxy is purportedly removed.[174] Proponents of this approach assert that it can solve many dilemmas associated with the algorithmic racial proxy and deliver a truly race-neutral algorithm—one “purged” of both racial discrimination and proxy discrimination, while preserving the algorithm’s accuracy by including all relevant inputs.[175]
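    To make these two steps concrete, the following minimal sketch fits a linear model with race included and then predicts with race averaged out. The toy dataset, variable names, and linear specification are all invented for illustration; they are not drawn from any particular implementation described in the literature.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data: columns are
# [prior_arrests, neighborhood_index, race_indicator].
X_train = np.array([
    [0, 3.2, 0.0],
    [2, 1.1, 1.0],
    [1, 2.5, 0.0],
    [3, 0.9, 1.0],
    [1, 1.8, 1.0],
    [0, 2.9, 0.0],
])
y_train = np.array([0.1, 0.7, 0.3, 0.9, 0.6, 0.2])  # outcome to be predicted

# Estimation step: race is included, so the coefficients on the proxy
# variables absorb only their race-orthogonal predictive information.
model = LinearRegression().fit(X_train, y_train)

# Prediction step: each individual's race value is replaced with the
# training-population average, so race contributes an identical constant
# to every score and no longer differentiates individuals.
applicant = np.array([[2.0, 1.5, 1.0]])
applicant_blinded = applicant.copy()
applicant_blinded[:, 2] = X_train[:, 2].mean()

print(model.predict(applicant_blinded))
```

    On the method’s own terms, including race at estimation keeps the proxy coefficients from absorbing racial information, and withholding race at prediction removes its direct effect, yielding the purportedly race-neutral score.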

    Recognizing again that a racial proxy represents a relationship between race and another variable, however, this Section points to an important conceptual paradox associated with this approach. To be clear, the issue is not one of empirical rigor; rather, the conceptual consideration raised here is that this method relies on a common but paradoxical understanding of race and its relationship to the racial proxy. What makes this orthogonalization possible is the assumption that the variable of race can be understood independently of the racial proxies.

    To understand the theoretical paradox upon which this assumption relies, it is first important to understand how the race variable is typically understood and secured in algorithms. Within algorithms, race is typically understood as a personal, subjective quality which can be generated from “individual self-identification, institutional ascription, or both.”[176] Even those critical of formalistic conceptions of algorithmic fairness tend to accept this algorithmic understanding of race, although others have problematized such techniques.[177] My aim here is not to rehearse the problems inherent to securing a definition of the race variable, which have been laid out by others.[178] Rather, I highlight the paradoxical reasoning in attempts to isolate the effect of race from the effect of a racial proxy when one secures the race variable through institutional ascription or individual identification.

    To begin, stopping an inquiry into race at individual identification or institutional ascription leaves open a critical question: How do individuals or institutions know which race to ascribe? Indeed, few would accept the idea that individuals get to identify themselves as whatever race they choose, nor would one accept the idea that institutions can take such racial liberties.[179] Accepting individual self-identification or institutional ascription as the ground truth of race is possible not because one should be free to select their race; on the contrary, the myth of racial freedom ignites notions of racial theft, appropriation, and trespass.[180] Individual identification or institutional ascription is acceptable as a way to determine the race variable only because of an underlying assumption that individuals will not or cannot lie about their race. In other words, the assumption is that they will or must tell the truth about race based on an authoritative background system of racial knowledge.

    It is this authoritative background system of racial knowledge that is represented when one retrieves the race variable through self-identification or institutional ascription.[181] On this view, race is not merely a mathematical value garnered from personal or institutional viewpoints. Rather, the race variable is itself a proxy, standing in for the taken-for-granted system of social meaning that powerfully shapes individual and institutional actions and beliefs about racial identification.

    This is not an argument against securing the race variable via the techniques of individual identification or institutional ascription—any method of identification will necessarily involve some degree of racial essentialism, which neither can nor should be avoided entirely.[182] The paradox in this technique, however, is that the very beliefs about racial classification that the race variable represents have historically been dependent on the so-called “racial proxies.” That is, rather than exist in some relational posture to race, the racial proxies have historically been the political substrate of race since race itself has been determined by reference to the existence or absence of certain proxy variables.[183]

    Isolating racial information from the so-called racial proxies is paradoxical when many of the features that are considered racial proxies by courts, computer programmers, scholars, and advocates—features such as geographic area,[184] hair texture,[185] or ancestry[186]—are the very features that have historically been used to assign people to their proper race in the first place.[187]

    Take, for example, the contested racial assignment cases of the nineteenth century, where courts determined race by looking to morphological features including hair texture or nose shape.[188] These features were not simply “proxies for race” based on the notion that they exhibit a high correlation with some understanding of race already at work in the world. Instead, these variables have historically been constitutive of race itself, meaning a person’s race has turned on the presence or absence of these variables. Geographic indicators have similarly been used to assign individuals to racial groups when a person’s “appropriate” racial category was otherwise ambiguous. Writing about the logic that informed racial ascription in the late nineteenth-century census, Angela James explains,

    enumerators were instructed to use the socially dependent criteria of residential community to classify “half-breed” Indians. So, if such persons lived with Whites, and had the “habits of life” of Whites, they should be classified as Whites. If however, they lived among Indians they were to be classified as Indian.[189]

    The constitutive nature of race and the racial proxies calls into question the core framework of statistical methods that suggest the possibility of isolating race from its so-called proxies like geography or hair. Historically, these proxies have been embedded in the political marrow of racial classifications, not by some natural force or biological essence, but by repeated acts of legal enforcement.

    Rather than adjudicating the legal position of already existing and coherent racial categories, law plays a central role in shaping the boundaries of the categories themselves. This phenomenon is aptly established by legal scholars who demonstrate how judges, lawyers, juries, and litigants—relying on what they deemed common knowledge or scientific evidence—determined racial identity for a host of litigants based on spurious legal rationale that made reference to the presence or absence of certain proxy features such as ancestry or religion.[190]

    Recalling again the way the circuit court framed the question of how to racially classify Asian Indians in United States v. Thind,[191] the judges who were authorized to make racial determinations were confused about how to weigh proxies like “class and caste, blood and birthplace, and even religion” when identifying race.[192] The Supreme Court’s turn towards racial common sense as the deciding criterion in Thind worked to stabilize an inherently unstable racial ideology with a veneer of cohesion,[193] in ways similar to the courts’ holdings in Rice v. Cayetano, Williams v. Dart, and EEOC v. Catastrophe. In each of these cases, judges cast their political understandings of race as natural or self-evident, and these determinations were critical to upholding or ameliorating the stratification of political, economic, and social life.

    Whether a person was subject to freedom,[194] whether they could enjoy the privileges of citizenship,[195] whether they could testify in court,[196] or whether they could attend a school of their choice[197] have at some point in America’s legal history turned on the legal determination of a person’s race. Through powerful acts of legal decree and everyday performances of ritual enforcement, the illusion of fixed, inherent, or natural races emerged to rationalize and maintain a society organized by and through racial hierarchy.[198]

    Seen from this perspective, the very concept of a racial proxy variable is paradoxical since the notion redundantly places race and another variable into a false relationship of “proxy.” The historically constitutive relationship between race and racial proxy variables reveals the contradictions in attempts to leach racial information from the racial proxies. While this statistical method appears to bypass definitional disputes concerning the racial proxy, the normative selection is nevertheless revealed by the decision to place race and a racial proxy into an ahistorical and decontextualized relation of proxy. Only by ignoring the historically constitutive relationship between race and its so-called proxies can this method claim to deliver a race-neutral algorithm.

    B. Output-Focused: Racial Ontologies in Algorithmic Outcomes

    Some scholars bringing an econometric perspective to the question of algorithmic racial proxy discrimination have similarly cautioned against the “algorithmic myth of colorblindness.”[199] Simulating algorithmic credit pricing using a rich dataset of mortgage loans, Talia Gillis demonstrates some of the problems with excluding, limiting, or otherwise scrutinizing algorithmic inputs as an approach to addressing algorithmic racial discrimination.[200]

    When discussing the prohibition of race and racial proxy variables, Gillis shows how, in the big data context, the complex interactions between inputs mean one cannot rely on a racial common sense to determine which variables serve as racial proxies in a dataset.[201] Using this point to reinforce a broader caution against regulatory acquiescence to an “input fallacy,” Gillis urges a turn toward scrutinizing algorithmic outputs. When one cannot be sure about the racial consequences of algorithmic inputs, attention to racialized outputs may be the only workable solution to address racial proxy discrimination.[202] At the very least, scrutinizing algorithmic outputs renders certain normative decisions more transparent.

    For Gillis, legally relevant questions—such as the question of which individuals are “similarly situated” and should therefore be treated similarly by the algorithm—can be freshly examined by looking to outputs.[203] Scrutinizing algorithmic outputs also permits regulators to examine how the algorithm performs compared to human decision-makers when evaluating protected groups,[204] another metric of algorithmic bias that occupies a prominent place in contemporary discourse.[205]

    While these output-based assessments of discrimination bypass the difficulty inherent to identifying racial proxies in input data, they are hardly an escape from the racial reasoning that makes input scrutiny a difficult task for computer programmers. Consider, for example, that any scrutiny of outputs returns the programmer to issues to which legal scholars have been keenly attentive: the lack of consensus over the meaning of key, politically relevant concepts like algorithmic fairness, algorithmic equality, and algorithmic bias within antidiscrimination discourse.[206] This is a point that Gillis acknowledges.[207]

    Any approach to regulating algorithmic outcomes necessarily requires selecting definitions of algorithmic equality, discrimination, and bias from the many competing definitions that exist.[208] While legal scholars and computer scientists have been acutely aware of the normative disputes behind measurements of algorithmic fairness,[209] the underlying racial ontologies that become embedded in algorithmic outputs as a result of these disputes are rarely identified as part of the terrain of normative contest.

    Take, for example, the question of whether an algorithm’s disparate racial impact must reach a certain magnitude before violating civil rights law, a question that has preoccupied legal scholars and that directly implicates theories of antidiscrimination law.[210] To be sure, this is a question about the normative meaning of algorithmic discrimination. But it is also fundamentally a question about how much “race” can be said to explain algorithmically determined differences between groups once a programmer has presumably accounted for every other variable relevant to the outcome.

    It is here that a programmer must decide what amount of algorithmically produced differentiation can be counted as legitimate racial disparity.[211] Drawing these distinctions between warranted and unwarranted levels of racially disparate impact is an enormous conceptual challenge. If racial disparities in the outcome remain beneath a particular limit, the algorithm is not deemed the source of differentiation. Instead, race is the explanation for the difference. But for a programmer, accepting any level of innate racial difference risks embedding an erroneous ideology of natural, static, or intrinsic racial difference within the algorithm. Hence, the determination of how much racial differentiation is tolerable is one way that human decision-makers technologically embed racial ontologies into algorithms.[212] These embedded ontologies can, in turn, lend the algorithm some degree of explainability.
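    A brief sketch shows how this determination ultimately takes the form of a numerical parameter. The outcome rates and the tolerance value below are hypothetical; the point is only that some human-chosen cutoff decides when residual disparity is attributed to the algorithm rather than accepted as “legitimate” difference.

```python
# Hypothetical audit of an algorithm's outputs. TOLERANCE encodes the
# normative choice: ratios above this cutoff are treated as acceptable
# residual difference rather than algorithmic discrimination.
TOLERANCE = 0.80  # an invented, four-fifths-style cutoff

approval_rates = {"group_a": 0.62, "group_b": 0.54}  # invented outputs

ratio = min(approval_rates.values()) / max(approval_rates.values())
if ratio >= TOLERANCE:
    print(f"ratio {ratio:.2f}: disparity deemed tolerable")
else:
    print(f"ratio {ratio:.2f}: disparity flagged as discriminatory")
```

    Raise the invented cutoff from 0.80 to 0.90 and the same outputs are flagged as discriminatory. The arithmetic is trivial, but the choice of cutoff encodes a contestable theory of how much difference “race” may legitimately explain.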

    What I want to suggest is that when algorithmic determinations are otherwise difficult for a programmer to articulate or explain—perhaps due to the self-enhancing nature of the algorithm itself[213]—beliefs and assumptions about how race relates to the algorithmic output can form the basis of a programmer’s explanation of an otherwise inscrutable algorithmic outcome. Their beliefs echo legal constructions of race, where judges’ racial intuitions lend a veneer of coherence to otherwise inscrutable reasoning.[214]

    Aspects of this debate are not newly initiated by machine learning algorithms. Legal scholars have long established the fundamentally ambiguous nature of antidiscrimination discourse, describing how judges, scholars, and social movements put forth varying and competing definitions of equality that are later absorbed into law and policy.[215] Each legally absorbed definition contains embedded racial beliefs about the source and meanings of racially disparate outcomes.[216] But just as new information technologies usher in new types of disputes, these technologies also initiate new actors, arenas, and processes that resolve these disputes,[217] and each dispute must be clearly defined as a domain for normative contest and political struggle. Even if a turn to an outcomes-based analysis bypasses the programmer’s need to define a racial proxy in input data, it opens the door to a new, output-oriented genre of racial reasoning for computer programmers.

    To be sure, these unanswered questions of antidiscrimination law and racial ontology have not stemmed the tide of algorithmic proliferation. If anything, the law’s capricious understanding of a racial proxy leaves the door open to a breathtaking assertion of state and private power to define a racial proxy within the context of algorithms. That is, against this legal, statistical, and discursive backdrop, the definition of the racial proxy, and other questions much debated in equality law, are currently being answered by a novel list of extra-legal actors engaged in the development and deployment of algorithms. Without attention to the theoretical voids left open in the interaction between law and technology, the technologically embedded meaning of race and other constructed categories of difference, and the economic and political processes that guide this meaning, escape critical scholarly, advocacy, and regulatory attention.[218]

    IV. The New Racial Proxy Construction

    All that you touch you Change. All that you Change Changes you. The only lasting truth is Change.

    —Octavia E. Butler, Parable of the Sower

    The current judicial, statistical, and discursive understandings of a racial proxy provide capricious and at times paradoxical conceptual tools to define a racial proxy in input data. But such capriciousness ought not to be confused for randomness, nor should it be allowed to conceal the ways in which political and economic domination work to shape understandings of the algorithmic racial proxy with consequences that are both immediate and yet unknown.

    In the interstices of legal reasoning about the definition of the racial proxy is space for various “acts of legal entrepreneurship” taken by powerful political and economic actors.[219] To date, the practical result of courts’ inability to proffer a definition of the racial proxy, as well as a popular confusion about its relationship to race, has been a delegation of epistemic authority over racial proxy determinations to those who develop and deploy machine learning algorithms.

    To illustrate this point and its far-reaching consequences, this Section offers an examination of the definition of the racial proxy proffered by state and corporate actors in one of the most closely watched legal disputes about algorithmic racial discrimination, one between the DOJ and Meta, one of the largest multinational technology conglomerates in the world.[220] I examine how the U.S. federal government has ultimately sought to respond to racial proxy discrimination and how it has approached the problem of the indeterminate meaning of a racial proxy.

    While remaining sensitive to the ways the processes of defining a racial proxy can vary across algorithmic contexts, a close examination of this case can nevertheless offer a microcosmic glimpse into how misguided conceptions of the racial proxy reinforce the maldistribution of benefits and burdens along conventional racial lines, while also facilitating a change to the racial subject at the center of antidiscrimination law.

    The federal government’s case against Meta demonstrates how myopic understandings of the relationship between race and a racial proxy result in meaningful limitations on the ability to regulate new information technologies, while simultaneously facilitating the emergence of novel forms of racialization whose consequences have yet to be determined.

    A. The Lawsuit Against Meta’s Racial Proxy

    In October 2016, an article published in ProPublica revealed the results of an inquiry into the advertising portal of Facebook, now known as Meta.[221] Journalists entered the company’s portal to purchase housing ads, conducting a quasi-audit for discrimination.[222] Once within the portal, the journalists were able to target their ads by including or excluding from the ad’s eligible audience Facebook users with certain characteristics.[223]

    The ability to target ads to particular user characteristics was hardly a new or surprising revelation. Targeted advertisement is core to Meta’s business model.[224] The company maintains an extensive and unprecedented system of mass behavioral surveillance and data collection.[225] Meta tracks information related to individuals’ online activity across the company’s technologies (e.g., Facebook and Instagram), their activities with other websites and cellphone applications, their location data from websites and from phones, and their activity with other businesses.[226] The collected data then fuel Meta’s lucrative business of targeted advertising—nearly $135 billion in revenue in 2023.[227]

    While ad targeting based on user characteristics is key to the company’s financial success, the specific characteristics presented to housing advertisers drew media and advocacy scrutiny. Facebook’s ad portal allowed the ProPublica journalists to narrow the eligible audience for their housing ads by excluding users who were classified as having a certain “ethnic affinity” such as African American, Asian American, or Hispanic.[228] Housing advertisers could select from a drop-down menu of these categories and exclude people within these groups from the ad’s audience.[229] The legal question at hand was whether Facebook’s ad targeting and delivery system violated the Fair Housing Act (FHA).[230] Among other prohibitions, the FHA makes it unlawful to:

    [M]ake, print, or publish, or cause to be made, printed, or published any notice, statement, or advertisement, with respect to the sale or rental of a dwelling that indicates any preference, limitation, or discrimination based on race, color, religion, sex, [disability], familial status, or national origin, or an intention to make any such preference, limitation, or discrimination.[231]

    ProPublica likened the ethnic affinity categories to a racial exclusion option that gave advertisers the Jim Crow–era opportunity to place ads only in newspapers that went to White readers.[232] Civil rights lawyers contended that one would be hard pressed to find such a blatant violation of the FHA prohibition against racial discrimination in housing advertisements, and that allowing advertisers to narrow their advertising pool by ethnic affinity constitutes unlawful racial discrimination.[233] The ProPublica article launched a series of investigations and a lawsuit against Facebook filed by a group of civil rights advocates, National Fair Housing Alliance v. Facebook (NFHA v. Facebook).[234]

    In formulating the problem with Facebook’s ad system as one of racial discrimination under the FHA, civil rights advocates relied on their understanding of “ethnic affinity” as an impermissible racial proxy.[235] To grasp why the issue of proxy discrimination is relevant to the dispute, one must understand that the “ethnic affinity” categories that Facebook presented to ad buyers were not direct racial markers in the formal sense.[236] According to Meta, the company did not directly collect racial information.[237] Facebook’s ethnic affinity categories were instead algorithmically constructed conclusions about people’s interests or affinities based on insights culled from the massive trove of data that Facebook collects.[238] As the complaint in NFHA v. Facebook argued, “Facebook extracts data from its users’ online behavior, both on Facebook and off, and uses algorithms designed to sort that data, process it, and repackage it to group potential customers into new and salient categories for advertisers to choose from when targeting their ads.”[239]

    In other words, Facebook’s algorithms classify people as belonging to a particular “ethnic affinity” based on data about their online behaviors.[240] The company’s algorithms also create and sort users into hundreds of other novel categories based on their data.[241] Once assembled into these categories—like “parents with toddlers,” or “people with an interest in cooking”—advertisers can then include or exclude users from an ad’s audience based on their membership in the constructed category.[242]
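    A minimal sketch conveys how such categories can be assembled without any racial label. Everything below (the behavioral features, the clustering choice, the number of groups) is invented for illustration and is not a description of Meta’s proprietary systems.

```python
import numpy as np
from sklearn.cluster import KMeans

# Invented behavioral features per user, e.g., [pages_liked_topic_a,
# pages_liked_topic_b, hours_on_platform]. All values are hypothetical.
rng = np.random.default_rng(1)
behavior = rng.random((100, 3))

# Sort users into novel, unnamed categories based on behavior alone.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(behavior)

# An advertiser could then include or exclude entire clusters. The
# clusters carry no racial name, yet nothing prevents them from
# tracking race closely if online behavior is racially patterned.
audience = behavior[labels == 2]
print(f"cluster 2 contains {len(audience)} users")
```

    The sketch illustrates the larger point: the resulting groupings are “new and salient” in exactly the sense the NFHA complaint describes, even though no step in the pipeline references race by name.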

    The critique at the core of the NFHA complaint is that some of these algorithmically constructed categories reflect groups protected under the FHA,[243] even though the categories stop short of making explicit racial identifications. From this perspective, these algorithmically constructed groupings can track racial categories with almost surgical precision,[244] making them close proxies or pretexts for constitutionally protected groups.[245]

    In November 2016, Meta responded with a statement about Facebook’s “ethnic affinity” ad-targeting tool.[246] Framing the issue as a misuse of Facebook’s technology by bad actors who unlawfully sought to narrow ad audiences, Meta explained that it would disable the use of these ethnic affinity categories.[247] But, in 2018, after advocates reported that the problem persisted, the Assistant Secretary for Fair Housing and Equal Opportunity filed an administrative complaint with the U.S. Department of Housing and Urban Development (HUD).[248] Following discussions with Meta, the DOJ brought an action against Meta in district court in June 2022 for violations of the FHA.[249]

    Like the suit filed by civil rights advocates, the DOJ complaint alleged that Facebook facilitated housing discrimination based on “characteristics that are proxies for, or closely related to, FHA-protected categories.”[250] Facebook’s algorithmically constructed categories, like its “multicultural affinity” categories, helped advertisers identify African American, Latino, and White users and exclude them from housing advertisements.[251] According to the DOJ, “Facebook continues to rely upon its vast trove of user data to create the audiences for housing ads. FHA-protected characteristics are related to and encoded within this user data.”[252]

    The government’s discursive framing of protected characteristics as “encoded” within facially race-neutral data echoes a long line of antidiscrimination critiques of algorithms discussed within this Article. These critiques suggest race is fundamentally present within purportedly race-neutral data. These so-called “redundant encodings,” which occur when certain data is highly correlated with membership in a legally protected class, have been critiqued by computer scientists and antidiscrimination scholars alike.[253] And as previous sections explain, core to the racial proxy debate is the difficulty in determining which redundant encodings ought to be acceptable and which ones should be rejected.[254]
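    The technical literature offers diagnostics for such encodings, one common version of which is sketched below with invented data: if a classifier can recover protected-class membership from the facially neutral inputs well above chance, those inputs redundantly encode the protected characteristic. The dataset and its correlation structure are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Invented data: a protected-class label and two facially neutral inputs,
# one of which is constructed to correlate strongly with the label.
rng = np.random.default_rng(0)
race = rng.integers(0, 2, size=200)
features = np.column_stack([
    race * 2.0 + rng.normal(size=200),  # a highly race-correlated input
    rng.normal(size=200),               # an uncorrelated input
])

# High out-of-sample AUC means race is recoverable from the "neutral"
# inputs, i.e., the inputs redundantly encode race.
auc = cross_val_score(
    LogisticRegression(), features, race, cv=5, scoring="roc_auc"
).mean()
print(f"race recoverable from inputs with AUC of roughly {auc:.2f}")
```

    Notably, the diagnostic measures only correlation. It cannot answer the question this Article identifies as central: which redundant encodings ought to be acceptable and which should be rejected.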

    Importantly, the answer to this core theoretical question is glaringly absent from the government’s complaint. Beyond the assertion that these contentious variables are “closely related” to race,[255] the government appears to sidestep this definitional question altogether, opting instead for a self-evident understanding of a racial proxy. And while the complaint plainly asserts that FHA-protected characteristics such as race are “related to and encoded within” Meta’s massive trove of user data,[256] the specific mechanism and extent to which this racial coding occurs is not explicitly stated.

    Without giving an account of the histories of exclusion, segregation, and subordination that form the material basis of these redundant encodings—one that links race with its imagined proxies like zip code and geography[257]—the complaint leaves a significant conceptual void. As I argue below, the definition of the racial proxy that prevails in the case’s eventual settlement constitutes a regulatory concession to the technology company. This concession extends beyond inadequate transparency and disclosure requirements secured within the settlement and touches on the authority of Meta to instantiate new political groupings of great consequence to public life.

    B. Meta’s Epistemic Authority over the Racial Proxy

    Without proffering a precise definition of a racial proxy, the DOJ entered into a settlement agreement (Settlement) with Meta in June 2022.[258] Meta agreed to eliminate ad targeting options that it determined were “direct descriptors of, or semantically or conceptually related to, a person or group of people based on FHA-Protected Classes.”[259] This means that once Facebook’s algorithms assemble their novel and salient human groupings for ad targeting, the company cannot name these groupings after racial or other protected categories, nor can it name these groupings anything that would appear to describe legally protected groups.[260]

    While the Settlement still permits Meta to algorithmically assemble meaningful groupings for targeted advertising, under the Settlement’s terms, Meta must share the names of these new groupings with the government.[261] Using criteria that are not explicated in the agreement, the government will have the opportunity to review and contest these categories based on any “semantic or conceptual” overlaps with race and other protected characteristics.[262]

    Despite the government’s ability to observe and contest these potential racial proxies from a descriptive standpoint, Meta’s right to maintain the privacy of its proprietary and confidential information is explicitly protected by the Settlement.[263] This protection leaves open the question of whether the government has secured the level of transparency and accountability required to contest how Meta algorithmically assembles its groupings.

    Without guaranteeing the ability to contest the myriad human decisions that have shaped Meta’s group-classifying algorithms, the Settlement assumes that the threat of the impermissible racial proxy is purely semantic. The “racial” nature of the “ethnic affinity” proxy is addressed as a highly formalized question of naming a particular category, rather than as the human-driven algorithmic design decisions that such naming represents.
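    A naive sketch makes the thinness of a purely semantic screen apparent. The term list and category names below are invented; the screen can flag names that look racial, but it says nothing about how the underlying groups were algorithmically assembled.

```python
# Hypothetical semantic screen over category names (terms invented).
PROTECTED_TERMS = {"african american", "hispanic", "ethnic affinity"}

def semantically_suspect(category_name: str) -> bool:
    """Flag names that appear to describe a protected class."""
    name = category_name.lower()
    return any(term in name for term in PROTECTED_TERMS)

print(semantically_suspect("Ethnic Affinity: Hispanic"))  # True
print(semantically_suspect("Interest cluster #2041"))     # False, though
# the underlying cluster may still track race with high fidelity
```

    A renamed category sails through such a screen even if its membership is unchanged, which is precisely the gap the Settlement leaves open.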

    To be sure, a shift in naming practices alone is not entirely beside the point from an antidiscrimination standpoint. Renaming these groupings from “ethnic affinity” categories to categories that are less semantically or conceptually related to race may frustrate attempts by those seeking to intentionally exclude racial groups from online housing advertisements. Without a clear, intuitive, or recognizable racial descriptor, intentional discrimination by housing advertisers may be much harder.

    Nevertheless, by defining the relationship between race and a racial proxy as purely semantic, the Settlement improperly gives Meta the authority to use machine learning techniques to assemble distinct groupings of individuals that may then be ordered hierarchically. Therein lies the government’s regulatory concession. It is precisely this process—a process that categorizes and differentiates people for the purpose of exposing them to differentiated opportunities, resources, and privileges—that constitutes the racial proxy in need of legal regulation. And it is this process that should not be within Meta’s control.

    To be sure, framing Meta’s authority to algorithmically assemble new and salient classes of people as a process of racialization that ought to be removed from the company’s control turns in part on an understanding of race that is not widespread in law and technology literature.[264] While the idea that race is a sociopolitical construction is accepted in most scholarly circles, the full import of this understanding has yet to inform legal codes or much of legal thought.[265] And it has proven unsuccessful at dislodging commonly held assumptions about the role of race within scientific fields.[266]

    While there continue to be those who cynically exploit biological theories of race to advance political-economic agendas, the durability of beliefs about the existence of innate races may also reflect just how salient race is as a sociopolitical construction. One would be hard pressed to point to a corner of American life where the reality of race is not glaringly evident—the sheer material import of race may lead some to believe race must have natural origins. Combine race’s material salience with the reality that scholarship tracing the emergence of race as a historically contingent (often violently coercive) sociopolitical process[267] is difficult to encounter because such accounts are powerfully suppressed,[268] and the persistence of belief in innate races becomes easier to understand.

    Despite this knowledge suppression, a focus on these sociopolitical processes has led scholars of race and technology to conceptualize race itself as a “technology,” one that mediates between individuals and techniques to systemize certain outcomes.[269] Emerging from historically contingent processes, systems of racial classifications have been constantly subject to change and revision as social, economic, and political conditions change.[270] At each step, law has provided the ideological and coercive scaffolding.[271]

    Today, legal doctrine strictly scrutinizes actions based on overt racial classifications but has failed to acknowledge or correct the role that law has played in constructing the categories themselves.[272] One encounters strict scrutiny even when groups are classified for the purpose of eliminating racial subordination.[273] Law sees race as the cause of some present or future act of discrimination that must be avoided, but not as evidence of the phenomena of differential treatment and discrimination.[274]

    To argue that Meta’s group-assembling algorithm constitutes a process of racialization is to assert that the assignment of a group status, based on attributes that are regarded as unchangeable and inherent, for the purpose of hierarchical human ordering, is a process that touches on the heart of what is meant by the term “racial.”[275] Although Meta’s algorithmically constructed categories may be semantically distinct from race, they still operate in a virtually identical way as race,[276] meaning members of these algorithmic groups are assumed to be meaningfully dissimilar from others by virtue of the presence or absence of certain traits. This presumed difference then becomes the basis for the assignment of differential opportunities, resources, privileges, and statuses within hierarchies of disposability.

    While the agreement between Meta and the U.S. government settles the definitional dispute about the racial proxy—defining it as a category with a semantic or conceptual overlap with race—it settles it in such a way as to neglect the myriad value-laden choices that allow Meta to create these distinct classes of people whose differentiation can be accepted as uncontestable or obvious. Permitting Meta to algorithmically configure these groupings protects processes of ad personalization—processes that are practically useful to some Facebook users and that are, without a doubt, immensely lucrative for the company.[277] But it does so at the unknown cost of relinquishing control over the algorithmic construction of categories that can order and organize humans in legally relevant areas of public life, including housing, credit, and employment.

    Approaches that allow Meta to self-regulate this process risk acquiescence to larger patterns observed by scholars of technology who demonstrate that when powerful companies in the technology sector are occasionally forced to alter their practices in response to some external opposition, their “executives and engineers produce superficial but tactically effective adaptations that satisfy the immediate demands of government authorities, court rulings, and public opinion.”[278] Yet these adaptations may do nothing to alter the company’s fundamental enjoyment of epistemic deference and domination.

    Consider that advertisers who run housing, employment, and credit ads on Meta platforms will now encounter restrictions in the creation of their target audiences. According to Meta, an advertiser will:

    not [be] permitted to target based on gender, age, or interests that appear to describe people of a certain race, religion, ethnicity, sexual orientation, disability status, or other protected class. If they opt to target by location, that location targeting must have a minimum 15-mile radius.[279]

    Nevertheless, the company’s voluntary disclosures reveal almost nothing about what features “appear to describe people of a certain race,” or how the company selected a 15-mile radius as the minimum location-targeting zone, presumably to eliminate the discriminatory effect of place-based racial proxies. While Meta asserts that “the categories that remain available to these advertisers were the result of in-depth conversations with civil rights stakeholders,”[280] the precise details of this consultation process remain obscured. Therefore, the company maintains virtually exclusive epistemic control over the process of racial proxy selection.

    From this perspective, the definition of a racial proxy as “semantically or conceptually” related to race may satisfy the narrow dictates of antidiscrimination law, but it does little to shape the underlying value-laden design decisions that structure Meta’s group-constructing algorithm. In this state of affairs, Meta retains authority over the algorithmically constructed categories that hierarchically order and organize individuals and expose groups of people to divergent opportunities and resources in key areas of legal and policy concern.

    By defining a racial proxy as a fixed, semantic choice that can be intuitively recognized, the Settlement confines the antidiscrimination debate to questions of Meta’s phrasing. Yet algorithmic processes of sorting and classifying individuals necessitate a plethora of human decisions beyond naming the resulting groups.[281] Each development and deployment decision eventually determines the shape and nature of Meta’s “ethnic affinity” category, and each decision ought to be available for meaningful scrutiny and challenge by regulators when the outcome of these decisions is to sort individuals into novel, materially significant classifications.

    Urgently needed is an approach to algorithmic discrimination that turns attention away from the names ascribed to racial proxy groupings and towards the human and algorithmic decisions required to bring these algorithmically constructed groups into existence and assign them material resources in the world. Indeed, it is their construction—undertaken for the purpose of distributing resources asymmetrically across groups—that transforms these algorithmically constructed groups into racial proxies.

    The importance of understanding how these different conceptual definitions of a racial proxy can result in vastly different regulatory approaches lies partially in the unpredictable potential of new information technologies as both socially repressive and socially redemptive tools.[282] What Meta’s control over the value-based design decisions that construct race-like categories does for the categories themselves remains an open question. It is, of course, impossible to predict the future. But the technical design decisions overseen by Meta will be shaped by the company’s own interests and agendas.

    One can expect that Meta’s algorithmically constructed groupings will be driven by the company’s primary pursuit of profit maximization. In other words, the logic of capital accumulation in the digital age is likely to drive these new and capricious forms of racial common sense because, for companies like Meta, the reputational benefits or legal consequences that stem from demonstrating fair algorithms have not proven to outweigh the commercial risks of self-auditing and disclosure.[283] The company therefore acts within its profit-making interests when it constructs these categories in ways that prioritize commercial benefit rather than the prevention of racialized distributions of economic opportunity and privilege.

    In 2023, advertising drove almost all of Meta’s revenue.[284] While this Settlement institutes a racial reform, it is unrealistic to assume the company will engage in the process of category construction in ways that are not guided by economic imperatives like protecting its revenue through the continued extraction and monetization of consumer data.[285]

    Through a close examination of Meta’s Settlement, we see how law enables a powerful set of economic actors to drive developments in racial formation. This is the digital era instantiation of a historical process. As the law—a major and meaningful agent of racial production—began to enshrine the ideology of race and White supremacy into the American colonies, it was fueled by the real and immediate quest to solidify economic structures and secure profound economic gain. As Barbara and Karen Fields write,

    [p]ractical needs—the need to clarify the property rights of slaveholders and the need to discourage free people from fraternizing with slaves—called forth the law. And once practical needs of this sort are ritualized often enough either as conforming behavior or as punishment for non-conforming behavior, they acquire an ideological rationale that explains to those who take part in the ritual why it is both automatic and natural to do so.[286]

    The ideological rationale to which the Fields refer is race itself, a powerful technology developed to reconcile a complex and contradictory social and political reality in which the enslavement of African descendants stood as an “anomalous exception” to an otherwise avowed commitment to human liberty.[287] Part of the power of racial ideology is that it assumes its own aura of normality by virtue of its reference to an embodied “underlying and unchangeable essence” that makes racial borders seem otherwise unpassable.[288]

    When vision is clouded by the ideology of race, it becomes difficult to recognize that Meta’s presumably fixed categorical borders—those based on data culled from online activity instead of skin, hair, or bone—are in fact porous, contestable, and actively constructed.[289] In the interstices left open by the Settlement’s definition of the racial proxy, Meta retains the power over novel forms of racial formation.

    C. Meta’s Output-Focused Approach in Practice

    To be sure, a shift in semantic and conceptual framing is not all Meta intends to do to comply with the FHA. The company also intends to deploy a new system—the Variance Reduction System (VRS)—to reduce the variance between an ad’s eligible audience and the audience who ultimately sees the ad.[290]

    Broadly speaking, the VRS uses reinforcement learning, a type of machine learning, to achieve a predefined outcome. In this case, the system is instructed to minimize the variance in advertisement impressions across demographic groups.[291] Once a housing advertiser selects a target audience for its advertisement, the eligible demographic ratio for the ad is determined. Then, as the ad is being delivered, the VRS will measure the proportion of impressions of that ad across demographic subgroups, including gender, age, and estimated race.[292] Since Meta contends that it does not collect user information about race, race here is estimated using a method known as Bayesian Improved Surname Geocoding (BISG).[293] In brief, BISG estimates racial identity using name and location data pulled from the U.S. Census.[294] If the VRS determines that an ad is being shown in a manner that does not reflect the demographic distribution of the eligible audience, a controller system will shift the distribution of the ad’s impressions toward the eligible ratio.[295]
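    The Bayesian step at the heart of BISG can be sketched simply. The probability tables below are invented stand-ins for the Census surname and geography data that real implementations use, and this simplified version omits refinements (such as base-rate corrections) found in fuller treatments.

```python
# Simplified BISG-style estimate with invented probability tables.
SURNAME_TABLE = {  # hypothetical P(race | surname)
    "washington": {"black": 0.87, "white": 0.09, "hispanic": 0.04},
}
GEO_TABLE = {      # hypothetical racial composition of a census tract
    "tract_a": {"black": 0.60, "white": 0.30, "hispanic": 0.10},
}

def estimate_race(surname: str, tract: str) -> dict:
    """Combine surname and geography evidence, then renormalize."""
    prior = SURNAME_TABLE[surname.lower()]
    composition = GEO_TABLE[tract]
    joint = {r: prior[r] * composition[r] for r in prior}
    total = sum(joint.values())
    return {r: round(p / total, 3) for r, p in joint.items()}

print(estimate_race("Washington", "tract_a"))
# {'black': 0.944, 'white': 0.049, 'hispanic': 0.007}
```

    Because the estimate is probabilistic, the VRS measures impressions against a statistical guess about each viewer’s race, a point worth bearing in mind when evaluating what the system can and cannot verify.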

    Although the move toward a more outcomes-based test as the primary mechanism for detecting algorithmic racial discrimination is a constructive one, it does not adequately put to rest disputes about racial meaning and effect. Many important disputes about the meaning of algorithmic fairness and the effect of race on a given outcome will ultimately be technically embedded in the VRS. These disputes will be resolved by Meta’s design choices with some engagement from the government. As Meta explains:

    Across the industry, approaches to algorithmic fairness are still evolving, particularly as it relates to digital advertising. But we know we cannot wait for consensus to make progress in addressing important concerns about the potential for discrimination – especially when it comes to housing, employment, and credit ads, where the enduring effects of historically unequal treatment still have the tendency to shape economic opportunities.[296]

    For Meta, the recognition that there is no “consensus” in the “evolving” notions of algorithmic fairness helps fuel its rationale that the company can fill the theoretical void.[297] And fill the void it does by relocating important debates about the meaning of algorithmic discrimination from “open, public participatory processes” to private development choices.[298]

    For example, the question of what degree of variance is required before the VRS should trigger the controller represents its own distinct racial determination about what level of racial disparity is both ethically and technically permissible.[299] As discussed in previous sections, this technical choice reflects a theoretical perspective on the degree to which algorithmically produced differential outcomes can be attributed to racial differences. According to the Settlement, Meta and the government will “meet and confer in good faith in an effort to agree on metrics for how much the VRS will reduce any variances.”[300] But the Settlement does not clarify which values, commitments, and principles will guide Meta’s design decisions or the government’s regulatory approach. Nor are these infinitely contestable values left open for public debate, discussion, and dissent.[301]
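    A stylized sketch shows where that contested parameter sits in a controller of this kind. The audience shares, the trigger value, and the correction step below are all invented; the Settlement does not disclose the VRS design at this level of detail.

```python
# Stylized VRS-style correction step (all values invented).
eligible_share = {"black": 0.30, "white": 0.55, "hispanic": 0.15}
observed_share = {"black": 0.18, "white": 0.68, "hispanic": 0.14}

TRIGGER = 0.10  # how much drift is tolerated before correction:
                # the normative determination discussed above
STEP = 0.5      # how aggressively delivery is nudged back

def corrected_weights(eligible: dict, observed: dict) -> dict:
    """Reweight delivery toward the eligible ratio for drifting groups."""
    weights = {}
    for group, target in eligible.items():
        drift = observed[group] - target
        # Correct only once drift exceeds the trigger threshold.
        weights[group] = 1.0 - STEP * drift if abs(drift) > TRIGGER else 1.0
    return weights

print(corrected_weights(eligible_share, observed_share))
# black drifted -0.12 and is upweighted; white drifted +0.13 and is
# downweighted; hispanic drifted -0.01 and is left alone
```

    Set TRIGGER to 0.15 in this toy example and no correction occurs at all. Whoever selects that value decides, in effect, how much racial skew in ad delivery is acceptable.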

    The Settlement also fails to articulate how the VRS’s third-party reviewer—which was suggested by Meta—will engage the public in its review process. Most importantly, the development, deployment, and oversight of the VRS itself highlight the counterintuitive ethics of outsourcing civil rights regulation to machine learning technologies that are designed, owned, and operated by parties accused of civil rights violations.

    Ultimately, a study of Meta’s agreement with the U.S. government—particularly its treatment of the racial proxy—highlights what is at stake in the definition of the indeterminate racial proxy. To be sure, at stake are the meanings of contestable terms like algorithmic discrimination, equality, and fairness, but a closer examination reveals that we may also be witnessing Meta’s ability to algorithmically alter the very nature of the political categories that animate these terms. Indeed, regulatory approaches to machine learning algorithms must consider how contested, constitutionally relevant concepts, like intent or colorblindness, translate uneasily to the algorithmic context and can therefore be manipulated by programmers.[302] But there is also a need to understand how the racial subject—the very subject at the center of antidiscrimination law—may similarly be reconstructed by those who develop and deploy machine learning algorithms.

    Conclusion

    The concern that formally race-blind algorithmic decision-making can perpetuate racial discrimination by relying on racial proxies has become a cornerstone of debates over algorithmic discrimination. In laying the groundwork for a discussion of the theoretical and material implications of this racial proxy debate, this Article has advanced several claims. I have shown how our present jurisprudential terrain gives those who develop and deploy algorithms a powerful adjudicatory role, which was once exclusively reserved for judges—that is, to use their racial intuition to decide what gives a variable its racial quality. Normatively, I argue that the answer to the question of what constitutes a racial proxy requires an explicitly moral and political solution and cannot be resolved with a purely technical fix. Most importantly, what is at stake in the ability to define a racial proxy is the production of new and meaningful classes of individuals that can later be exposed to differing resources, opportunities, subordination, and privilege. Power over this novel process of racial construction ought not to rest with profit-driven technology companies, which can fashion these groups in service of their own economically lucrative futures. What is urgently needed in both law and technology is an understanding of race as itself a proxy—one imbued with social, historic, and political meaning to be constructed in service of a just human future.


    Copyright © 2026 Fanna Gamal, Assistant Professor of Law, UCLA School of Law. This Article is indebted to the wisdom and generosity of many people. For your time and extraordinary insights, thank you Angela P. Harris, Issa Kohler-Hausmann, Jerry Kang, Jessica Eaglin, Dorothy Roberts, Noah Zatz, Cheryl Harris, Joseph Fishkin, Mark McKenna, and Pauline Kim. This Article benefitted greatly from thoughtful comments, feedback, and conversations in faculty workshops at UC Davis School of Law, UC Berkeley School of Law, UCLA School of Law, UC Irvine School of Law, and University of Pennsylvania Carey School of Law. Many thanks are due to the Privacy Law Scholars Conference for incubating this work and for awarding it the Reidenberg-Kerr Award for Outstanding Scholarship by a Junior Scholar. For invaluable research support, I thank Nicola Haubold Sanz De Santamari, Jada Evans, Sherry Leysen, Brian Raphael, and Jonathan Rogers. The editors of the California Law Review provided exceptional editing and feedback. This research was made possible in part by support from the UCLA Initiative to Study Hate and the UCLA Luskin Institute on Inequality and Democracy. The Mesa Writer’s Refuge provided the gift of time and space to complete this project. Kyle Halle-Erby, thank you for introducing me to this literature all those years ago. This Article is dedicated to my brothers, Zeine and Jamal.

    [1].     In re Thind, 268 F. 683 (D. Or. 1920).

    [2].     An Act to Establish an Uniform Rule of Naturalization, ch. 3, § 1, 1 Stat. 103 (1790) (repealed 1795); An Act to Amend the Naturalization Laws and to Punish Crimes Against the Same, and for Other Purposes., ch. 254, § 7, 16 Stat. 254 (1870).

    [3].     United States v. Thind, 261 U.S. 204 (1923).

    [4].     Id. at 206.

    [5].     Id. at 215 (“It is a matter of familiar observation and knowledge that the physical group characteristics of the Hindus render them readily distinguishable from the various groups of persons in this country commonly recognized as white.”).

    [6].     See Ian Haney López, White By Law: The Legal Construction of Race 62 (10th anniversary ed. 1996) (“The language betrays entrenched beliefs about the racial significance of class and caste, blood and birthplace, and even religion in establishing racial identity.”).

    [7].     See generally id. at 78–108.

    [8].     See David Lehr & Paul Ohm, Playing with the Data: What Legal Scholars Should Know About Machine Learning, 51 U.C. Davis L. Rev. 653, 670 (2017).

    [9].     Id. at 671.

    [10].     Id.

    [11].     Crystal S. Yang & Will Dobbie, Equal Protection Under Algorithms: A New Statistical and Legal Framework, 119 Mich. L. Rev. 291, 297–98 (2020) (“Surveying the field, we find that all commonly used predictive algorithms exclude race as an input. The universal exclusion of race as an algorithmic input is unsurprising given the mainstream legal view that the direct use of race as an input would be unconstitutional.”).

    [12].     Id.

    [13].     See Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold & Richard Zemel, Fairness Through Awareness, in ITCS ‘12: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference 214, 218 (2012) (referring to “fairness through blindness”); Lily Morse, Mike Horia M. Teodorescu, Yazeed Awwad & Gerald C. Kane, Do the Ends Justify the Means? Variation in the Distributive and Procedural Fairness of Machine Learning Algorithms, 181 J. Bus. Ethics 1083, 1087 (2022); Amina A. Abdu, Irene V. Pasquetto & Abigail Z. Jacobs, An Empirical Analysis of Racial Categories in the Algorithmic Fairness Literature, in FAccT 2023: Proceedings of the 6th ACM Conference on Fairness, Accountability, and Transparency 1324, 1332 (2023) (explaining that ethical considerations in technical research and AI are influenced by legal compliance among other aspects).

    [14].     Arvind Narayanan, Tutorial: 21 Fairness Definitions and Their Politics, YouTube (Mar. 1, 2018), https://youtu.be/jIXIuYdnyyk (on file with the California Law Review).

    [15].     Abdu, Pasquetto & Jacobs, supra note 13, at 1324; Pauline T. Kim, Race-Aware Algorithms: Fairness, Nondiscrimination and Affirmative Action, 110 Calif. L. Rev. 1539, 1545 (2022) (“[E]fforts to make a model less biased could involve taking race into account in many different ways. Exactly when and how a given de-biasing strategy does so is critically important for judging its legality.”); Yang & Dobbie, supra note 11, at 297–99; Solon Barocas & Andrew D. Selbst, Big Data’s Disparate Impact, 104 Calif. L. Rev. 671, 695 (2016); Sonja B. Starr, Evidence-Based Sentencing and the Scientific Rationalization of Discrimination, 66 Stan. L. Rev. 803, 817–21 (2014); Anya E.R. Prince & Daniel Schwarcz, Proxy Discrimination in the Age of Artificial Intelligence and Big Data, 105 Iowa L. Rev. 1257, 1265–67 (2020).

    [16].     See, e.g., Bernard E. Harcourt, Risk as a Proxy for Race: The Dangers of Risk Assessment, 27 Fed. Sent’g Rep. 237, 238 (2015); Barocas & Selbst, supra note 15, at 695; Starr, supra note 15, at 838; Prince & Schwarcz, supra note 15, at 1260–63.

    [17].     See Betsy Anne Williams, Catherine F. Brooks & Yotam Shmargad, How Algorithms Discriminate Based on Data They Lack: Challenges, Solutions, and Policy Implications, 8 J. Info. Pol’y 78, 83–85, 89 (2018).

    [18].     See Talia B. Gillis & Jann L. Spiess, Big Data and Discrimination, 86 U. Chi. L. Rev. 459, 469 (2019) (“In very high-dimensional data, and when complex, highly nonlinear prediction functions are used, this problem that one input variable can be reconstructed jointly from the other input variables becomes ubiquitous.”); Prince & Schwarcz, supra note 15, at 1273–76; Yang & Dobbie, supra note 11, at 311–18; Kim, supra note 15, at 1546.

    [19].     Prince & Schwarcz, supra note 15, at 1260 (“[O]ne of the most important threats to anti-discrimination regimes posed by big data and AI is largely unexplored or misunderstood in the extant legal literature. This is the risk that modern AIs will result in ‘proxy discrimination.’”); see also Pauline T. Kim, Data-Driven Discrimination at Work, 58 Wm. & Mary L. Rev. 857, 898–99 (2017); Aziz Z. Huq, Racial Equity in Algorithmic Criminal Justice, 68 Duke L.J. 1043, 1099–1100 (2019) [hereinafter Huq, Racial Equity]; Dorothy E. Roberts, Digitizing the Carceral State, 132 Harv. L. Rev. 1684, 1719 (2019) (reviewing Virginia Eubanks, Automating Inequality: How High-Tech Tools Profile, Police, and Punish the Poor (1st ed. 2018)); Sandra G. Mayson, Bias In, Bias Out, 128 Yale L.J. 2218, 2263 (2019); Barocas & Selbst, supra note 15, at 691; Jessica M. Eaglin, Constructing Recidivism Risk, 67 Emory L.J. 59, 95–97 (2017); Harcourt, supra note 16, at 237; Starr, supra note 15, at 821–42; Yang & Dobbie, supra note 11, at 311–18; Aziz Z. Huq, Constitutional Rights in the Machine Learning State, 105 Cornell L. Rev. 1875, 1925–26 (2020) [hereinafter Huq, Constitutional Rights].

    [20].     Yang & Dobbie, supra note 11; Talia B. Gillis, The Input Fallacy, 106 Minn. L. Rev. 1175 (2022).

    [21].     See Gillis & Spiess, supra note 18, at 460 (“Unlike human decision-making, the exclusion of data from consideration can be guaranteed in the algorithmic context. However, forbidding inputs alone does not assure equal pricing and can even increase pricing disparities between protected groups.”).

    [22].     See Gillis, supra note 20, at 1192; Yang & Dobbie, supra note 11, at 312, 315–19.

    [23].     Michael Carl Tschantz, What is Proxy Discrimination?, in FAccT ‘22: Proceedings of the 2022 5th ACM Conference on Fairness, Accountability, and Transparency 1993 (2022).

    [24].     Gillis, supra note 20, at 1235–36.

    [25].     Yang & Dobbie, supra note 11, at 333.

    [26].     Barocas & Selbst, supra note 15, at 728 (“Abandoning a belief in the efficacy of procedural solutions leaves policy makers in an awkward position because there is no definite or consensus answer to questions about the fairness of specific outcomes. These need to be worked out on the basis of different normative principles.”).

    [27].     Mayson, supra note 19, at 2224.

    [28].     Gillis, supra note 20, at 1236 (“If what we are truly interested in is the ability to recover a person’s protected characteristics, intuitive judgments are insufficient to determine which features to exclude. Features that intuitively feel like proxies might correlate less than features that do not feel like proxies.”).

    [29].     For background on the capricious and contradictory manner in which courts, litigants, and lawyers construct racial categories, boundaries, and meaning, see generally Haney López, supra note 6; Ariela J. Gross, What Blood Won’t Tell: A History of Race on Trial in America (2008).

    [30].     See Ian Haney López, Intentional Blindness, 87 N.Y.U. L. Rev. 1779, 1783 (2012) (“Colorblindness today applies when a government actor explicitly employs a racial classification. In practice, this covers affirmative action policies and little else. Under colorblindness, the remedial motives behind affirmative action are irrelevant.”).

    [31].     Cheryl Harris, Critical Race Studies: An Introduction, 49 UCLA L. Rev. 1215, 1229 (2002).

    [32].     See infra Part II.

    [33].     Noah D. Zatz, Disparate Impact and the Unity of Equality Law, 97 B.U. L. Rev. 1357, 1382 (2017) (discussing scenarios that “elicit confusion about whether they constitute disparate treatment” based on the functional equivalency of a characteristic with a protected class).

    [34].     Rice v. Cayetano, 528 U.S. 495 (2000) (holding ancestry is a proxy for race because it captures a pejorative purpose to classify).

    [35].     Barocas & Selbst, supra note 15, at 714.

    [36].     Equality law treats these variants as analytically distinct, but antidiscrimination scholars have long pointed to their conceptual unity beneath overarching principles or frameworks. See, e.g., Zatz, supra note 33, at 1383 (unifying the variants of equality law under the framework of status causation); see also Cass R. Sunstein, The Anticaste Principle, 92 Mich. L. Rev. 2410 (1994) (uniting the variants of equality law under an anticaste principle); Joseph Fishkin, The Anti-Bottleneck Principle in Employment Discrimination Law, 91 Wash. U. L. Rev. 1429 (2014) (uniting the variants of antidiscrimination law under an antibottleneck principle, designed to prevent the emergence and perpetuation of opportunity “bottlenecks”); Deborah Hellman, Measuring Algorithmic Fairness, 106 Va. L. Rev. 811 (2020).

    [37].     Williams v. Dart, 967 F.3d 625, 638 (7th Cir. 2020).

    [38].     Rice, 528 U.S. 495.

    [39].     See discussion infra Part II.C.

    [40].     Gross, supra note 29, at 16.

    [41].     Gillis & Spiess, supra note 21, at 469 (“For example, if an applicant’s neighborhood is highly correlated with an applicant’s race, we may want to restrict the use of one’s neighborhood in pricing a loan. A major challenge of this approach is the required articulation of the conditions under which exclusion of data inputs is necessary.”).

    [42].     See Haney López, supra note 6, at 56–77 (discussing the variation of Supreme Court racial reasoning in Ozawa v. United States, 260 U.S. 178 (1922), and United States v. Thind, 261 U.S. 204 (1923); decided mere months apart, the two cases reflect inconsistent ideas about what evidences racial differences: racial science or common knowledge).

    [43].     Yang & Dobbie, supra note 11, at 346–48; Devin G. Pope & Justin R. Sydnor, Implementing Anti-Discrimination Policies in Statistical Profiling Models, 3 Am. Econ. J.: Econ. Pol’y 206 (2011).

    [44].     Gillis, supra note 20, at 1180–81 (“The input fallacy creates an algorithmic myth of colorblindness by fostering the false hope that input exclusion can create non-discriminatory algorithms.”); see also Talia B. Gillis, Orthogonalizing Inputs, in CS&Law ‘24: Proceedings of the Third Symposium on Computer Science and Law 1 (2024) [hereinafter Gillis, Orthogonalizing Inputs].

    [45].     Gillis, supra note 20, at 1186 (“For when it is no longer possible to scrutinize inputs, outcome analysis provides the only way to evaluate whether a pricing method leads to impermissible disparities.”); Gillis, Orthogonalizing Inputs, supra note 44, at 7.

    [46].     See Jessica M. Eaglin, On “Color-blind” and the Algorithm, 112 Geo. L.J. 1385, 1389 (2024).

    [47].     Gillis, supra note 20, at 1250 (“In a world in which there is no credible way to determine at the outset whether a protected characteristic is being used to price, the closest alternative would be to ask: are the prices different for protected groups, controlling for the legitimate grounds for differentiation? . . . Only the unexplained component of price disparity would then be the basis of discrimination and not the raw disparities alone.”).

    [48].     See id. at 1187, 1219.

    [49].     Press Release, Off. of Pub. Affs., U.S. Dep’t of Just., Justice Department Secures Groundbreaking Settlement Agreement with Meta Platforms, Formerly Known as Facebook, to Resolve Allegations of Discriminatory Advertising (June 21, 2022), https://www.justice.gov/opa/pr/justice-department-secures-groundbreaking-settlement-agreement-meta-platforms-formerly-known [https://perma.cc/8Z26-NG7H].

    [50].     See Sandra Wachter, The Theory of Artificial Immutability: Protecting Algorithmic Groups under Anti-Discrimination Law, 97 Tul. L. Rev. 149, 153–55 (2022) (analyzing AI’s construction of new groupings and the challenges this poses for antidiscrimination law).

    [51].     Michael Omi & Howard Winant, Racial Formation in the United States: From the 1960s to the 1990s (2d ed. 1994).

    [52].     Ian F. Haney López, The Social Construction of Race: Some Observations on Illusion, Fabrication, and Choice, 29 Harv. C.R.-C.L. L. Rev. 1 (1994).

    [53].     Karen E. Fields & Barbara J. Fields, Racecraft: The Soul of Inequality in American Life (2012).

    [54].     Stuart Hall, Race, the Floating Signifier: What More Is There to Say About “Race”?, in Selected Writings on Race and Difference 359 (Paul Gilroy & Ruth Wilson Gilmore eds., 2021) (“Well, to put it crudely, race is one of those major concepts which organize the great classificatory systems of difference, which operate in human societies.”).

    [55].     See, e.g., Richard T. Ford, Race as Culture? Why Not?, 47 UCLA L. Rev. 1803, 1804 (2000); Mari J. Matsuda, Voices of America: Accent, Antidiscrimination Law, and a Jurisprudence for the Last Reconstruction, 100 Yale L.J. 1329, 1348–57, 1360–67 (1991); Angela Onwuachi-Willig & Mario L. Barnes, By Any Other Name?: On Being “Regarded As” Black, and Why Title VII Should Apply Even if Lakisha and Jamal Are White, 2005 Wis. L. Rev. 1283, 1288–89, 1297–1312; Lauren Sudeall Lucas, Identity as Proxy, 115 Colum. L. Rev. 1605, 1613–34 (2015).

    [56].     See Devon W. Carbado & Mitu Gulati, Working Identity, 85 Cornell L. Rev. 1259, 1298 (2000) (discussing racial discrimination based on racial performance); Mary Anne Case, “The Very Stereotype the Law Condemns”: Constitutional Sex Discrimination Law as a Quest for Perfect Proxies, 85 Cornell L. Rev. 1447, 1449 (1999); Ford, supra note 55; Matsuda, supra note 55; Onwuachi-Willig & Barnes, supra note 55.

    [57].     Parents Involved in Cmty. Schs. v. Seattle Sch. Dist. No. 1, 551 U.S. 701, 720 (2007).

    [58].     See Fair Housing Act, 42 U.S.C. §§ 3601–3619; Title VII of the Civil Rights Act of 1964, 42 U.S.C. §§ 2000e–2000e-17; Equal Credit Opportunity Act, 15 U.S.C. § 1691.

    [59].     Yang & Dobbie, supra note 11, at 297 (surveying the use of predictive algorithms in the criminal legal system and finding that “all commonly used predictive algorithms exclude race as an input”).

    [60].     See, e.g., Camille Gear Rich, Performing Racial and Ethnic Identity: Discrimination by Proxy and the Future of Title VII, 79 N.Y.U. L. Rev. 1134, 1140 (2004); Onwuachi-Willig & Barnes, supra note 55, at 1297–1312; Lu-in Wang, Race as Proxy: Situational Racism and Self-Fulfilling Stereotypes, 53 DePaul L. Rev. 1013, 1015–16 (2004).

    [61].     W. Kerrel Murray, Discriminatory Taint, 135 Harv. L. Rev. 1190 (2022).

    [62].     Matsuda, supra note 55.

    [63].     See Complaint at 24, United States v. Meta Platforms, Inc., No. 22-cv-05187 (S.D.N.Y. June 21, 2022) [hereinafter DOJ Meta Complaint] (indicating that zip code would no longer be considered in connection with Facebook’s Lookalike Audience tool); The Los Angeles County Risk Stratification Pilot: An Overview and One Year Update 73 (Aug. 29, 2022), https://dcfs.lacounty.gov/wp-content/uploads/2022/08/Risk-Stratification-One-Year-Update_8.24.22.pdf [https://perma.cc/K29X-K887] (“To help mitigate bias, several modeling and implementation decisions were made, including . . . (3) excluding zip code and geographic indicators from the model . . . .”); see also Barocas & Selbst, supra note 15, at 722 (discussing the elimination of certain geographic indicators in a workforce optimization algorithm).

    [64].     See Emily Putnam-Hornstein, Rhema Vaithianathan, Jacquelyn McCroskey & Daniel Webster, Los Angeles County Risk Stratification: Model Methodology & Implementation Report (Model Version 1.0), Child.’s Data Network (Aug. 2022), https://dcfs.lacounty.gov/wp-content/uploads/2022/08/Risk-Stratification-Methodology-Report_8.29.22.pdf [https://perma.cc/HH6A-RXEF].

    [65].     I borrow this term from Julie E. Cohen, Between Truth and Power: The Legal Constructions of Informational Capitalism 25 (2019).

    [66].     See generally Rashida Richardson, Racial Segregation and the Data-Driven Society: How our Failure to Reckon with Root Causes Perpetuates Separate and Unequal Realities, 36 Berkeley Tech. L.J. 1051 (2021).

    [67].     See Yang & Dobbie, supra note 11, at 298.

    [68].     Prince & Schwarcz, supra note 15, at 1276; Williams, Brooks & Shmargad, supra note 17, at 106.

    [69].     See, e.g., Huq, Racial Equity, supra note 19, at 1062 (discussing “instruments” that are “generally calibrated using one pool of data and then applied to new data as a means of identifying or predicting crime that was previously unknown and that, typically, has not yet occurred”); Kim, supra note 15, at 1551 (“[A] predictive model is developed by applying mathematical tools to extract patterns from an existing dataset called the training data. Those observed patterns are then used to make inferences about what will happen in future cases.”).

    [70].     See Anupam Datta, Matt Fredrikson, Gihyuk Ko, Piotr Mardziel & Shayak Sen, Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs, in CCS ‘17: Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security 1193, 1194 (2017); see also Gillis, supra note 20, at 1183 (“[I]nformation about a person’s protected characteristics is embedded in other information about the individual, so that a protected characteristic can be ‘known’ to an algorithm even when it is formally excluded.”).

    [71].     Cary Coglianese & Alicia Lai, Algorithm vs. Algorithm, 71 Duke L.J. 1281, 1297–98 (2022).

    [72].     See generally Alice Xiang, Reconciling Legal and Technical Approaches to Algorithmic Bias, 88 Tenn. L. Rev. 649 (2021) (explaining the processing capacity of algorithms and mentioning their potential negative effects on equality).

    [73].     Barocas & Selbst, supra note 15, at 721 (“[I]f you wanted to remove everything correlated with race, you couldn’t use anything.”) (citing Nadya Labi, Misfortune Teller, Atlantic (Jan./Feb. 2012), https://www.theatlantic.com/magazine/archive/2012/01/misfortune-teller/308846/ [https://perma.cc/74PC-T4WP]); see also Yang & Dobbie, supra note 11, at 291.

    [74].     Harcourt, supra note 16, at 240 (identifying prior criminal history as a proxy for race: “Current actuarial instruments vary widely in the number and type of risk factors that they include, but all place heavy weight on criminal history. Unfortunately, reliance on criminal history has proven devastating to African American communities and can only continue to have disproportionate impacts in the future”); Starr, supra note 15, at 838 (arguing that factors like criminal history and geographic indicators raise concerns about the impermissible racial proxy).

    [75].     Starr, supra note 15, at 838 (“[Even when some actuarial instruments exclude race], the socioeconomic and family variables that they do include are highly correlated with race, as is criminal history, so they are likely to have a racially disparate impact. Given widespread de facto residential segregation and the concentration of crime in urban neighborhoods of color, the neighborhood crime rate variables found in some instruments are particularly disturbing.” (citations omitted)).

    [76].     See, e.g., id.

    [77].     Yang and Dobbie call this the “formalistic solution” of excluding inputs to the algorithm. Yang & Dobbie, supra note 11, at 298, 343–46.

    [78].     Gillis, supra note 20, at 1185.

    [79].     Id.

    [80].     Prince & Schwarcz, supra note 15, at 1304 (“Making matters even worse, the black box nature of AIs and the vastness of big data mean that intuition alone will often be inadequate to identify an AI’s use of a proxy variable, even after the fact.”).

    [81].     See id.

    [82].     Gillis, supra note 20, at 1184–85; Yang & Dobbie, supra note 11, at 315, 334.

    [83].     Robert Brauneis & Ellen P. Goodman, Algorithmic Transparency for the Smart City, 20 Yale J.L. & Tech. 103, 125 (2018) (“However, due to residential segregation, zip codes are often proxies for race. Knowing this, agencies may choose to exclude zip codes as inputs to predictive algorithms even where they improve the algorithm’s predictive power.”).

    [84].     See, e.g., Gillis, supra note 20, at 1237 (“[I]nput exclusion comes at the price of prediction accuracy, which may hurt vulnerable populations.”); Yang & Dobbie, supra note 11, at 319 (“[C]hoosing to exclude protected characteristics comes at the cost of predictive accuracy.”); Richard Berk, Hoda Heidari, Shahin Jabbari, Michael Kearns & Aaron Roth, Fairness in Criminal Justice Risk Assessments: The State of the Art, 50 Socio. Methods & Rsch. 3, 3 (2021); Richard Berk, Accuracy and Fairness for Juvenile Justice Risk Assessments, 16 J. Empirical Legal Stud. 175, 175–76 (2019); Tameem Adel, Isabel Valera, Zoubin Ghahramani & Adrian Weller, One-Network Adversarial Fairness, in AAAI-19: Proceedings of the 33rd AAAI Conference on Artificial Intelligence 2412, 2412 (2019).

    [85].     See, e.g., Gillis, supra note 20, at 1237; Yang & Dobbie, supra note 11, at 319.

    [86].     Students for Fair Admissions, Inc. v. President & Fellows of Harvard Coll., 143 S. Ct. 2141, 2185–86 (2023) (Thomas, J., concurring).

    [87].     Id.

    [88].     Id.; Jack M. Balkin & Reva B. Siegel, The American Civil Rights Tradition: Anticlassification or Antisubordination, 58 U. Mia. L. Rev. 9, 11 (2003).

    [89].     Lehr & Ohm, supra note 8, at 682 (explaining that in many of the most important decisions in the development of algorithms, analysts will have to make “judgment calls” and “bring subject matter knowledge to bear”).

    [90].     Students for Fair Admissions, 143 S. Ct. at 2165–66.

    [91].     Id. (strictly scrutinizing race-based admissions designed to advance racial diversity in higher education).

    [92].     See, e.g., Williams v. Dart, 967 F.3d 625, 638 (7th Cir. 2020).

    [93].     See Rice v. Cayetano, 528 U.S. 495, 510 (2000).

    [94].     See id.

    [95].     Id. at 499.

    [96].     Id.

    [97].     Id. at 498–99.

    [98].     Id. at 510.

    [99].     Id. at 514.

    [100].     See Addie C. Rolnick, Indigenous Subjects, 131 Yale L.J. 2652, 2667–68 (2022) [hereinafter Rolnick, Indigenous Subjects]; see also Addie C. Rolnick, The Promise of Mancari: Indian Political Rights as Racial Remedy, 86 N.Y.U. L. Rev. 958, 1008 (2011) [hereinafter Rolnick, The Promise].

    [101].     Rolnick, Indigenous Subjects, supra note 100, at 2693.

    [102].     Rice, 528 U.S. at 510, 514.

    [103].     Id. at 513–14.

    [104].     Id. at 514–15.

    [105].     Id. at 515.

    [106].     Id. at 514.

    [107].     Id. at 500–01, 514.

    [108].     Id. at 514.

    [109].     Id.

    [110].     Id. at 515–16.

    [111].     See generally id. at 514.

    [112].     Id. at 517.

    [113].     Id. at 516–17.

    [114].     Id. at 517 (explaining that the State’s inquiry into ancestral lines evinced “the same grave concerns as a classification specifying a particular race by name”).

    [115].     Id.

    [116].     Id. at 539–40 (Stevens, J., dissenting).

    [117].     238 U.S. 347 (1915).

    [118].     Rice, 528 U.S. at 540 (Stevens, J., dissenting).

    [119].     Guinn, 238 U.S. at 367–68.

    [120].     Rice, 528 U.S. at 540 (Stevens, J., dissenting).

    [121].     Id. at 541.

          [122].     Rolnick, Indigenous Subjects, supra note 100, at 2695.

    [123].     Id. at 2722.

    [124].     Rice, 528 U.S. at 514.

          [125].     J. Kehaulani Kauanui, The Politics of Blood and Sovereignty in Rice v. Cayetano, 25 Pol. & Legal Anthropology Rev. 110 (2002).

          [126].     Williams v. Dart, 967 F.3d 625, 630 (7th Cir. 2020).

    [127].     Id.

    [128].     Id.

    [129].     Id. at 632.

    [130].     Id. at 638.

    [131].     Id. at 637–38.

    [132].     Id. at 638.

    [133].     Id.

    [134].     Id.

    [135].     Id.

    [136].     Id.

    [137].     Alana Semuels, Chicago’s Awful Divide, Atlantic (Mar. 28, 2018), https://www.theatlantic.com/business/archive/2018/03/chicago-segregation-poverty/556649/ [https://perma.cc/NYU5-2TBE].

    [138].     Williams, 967 F.3d at 638.

    [139].     Id.

    [140].     Id. The court was, however, less sure about the “proxy mechanism” in the case of criminal charges.

    [141].     See, e.g., Cassia Spohn, Race, Crime, and Punishment in the Twentieth and Twenty-First Centuries, 44 Crime & Just. 49, 57 (2015); Yu Du, Racial Bias Still Exists in Criminal Justice System? A Review of Recent Empirical Research, 37 Touro L. Rev. 79, 92–94 (2021); Paul Butler, Race and Adjudication, in 3 Reforming Criminal Justice: Pretrial and Trial Processes 211, 215 (Erik Luna ed., 2017).

    [142].     Cal. Penal Code § 741 (requiring the state Department of Justice to develop race-blind charging guidelines).

    [143].     See Williams, 967 F.3d at 638 (acknowledging that arrest history and neighborhood are proxies for race without explaining why pending criminal charges may not be).

    [144].     See id.; Rice v. Cayetano, 528 U.S. 495, 514–15 (2000).

    [145].     852 F.3d 1018, 1020 (11th Cir. 2016).

    [146].     Id.

    [147].     Id. at 1030.

    [148].     Id. at 1024.

    [149].     Id. at 1022.

    [150].     Id. at 1026.

    [151].     Id. at 1027.

    [152].     Id.

    [153].     Id. at 1030.

    [154].     See id. at 1033–35.

    [155].     Id. at 1030 (“[D]iscrimination on the basis of black hair texture (an immutable characteristic) is prohibited by Title VII, while adverse action on the basis of black hairstyle (a mutable choice) is not.”).

    [156].     See id. at 1027.

    [157].     “Racial regimes are constructed social systems in which race is proposed as a justification for the relations of power. While necessarily articulated with accruals of power, the covering conceit of a racial regime is a makeshift patchwork masquerading as memory and the immutable.” Cedric J. Robinson, Forgeries of Memory and Meaning: Blacks and the Regimes of Race in American Theater and Film Before World War II, at xii–xiii (2007).

    [158].     852 F.3d at 1033–35.

    [159].     See Wachter, supra note 50, at 154 (“AI systems frequently use input data containing features that cannot be meaningfully interpreted by a human observer due to their scale (both small and large), volume, complexity, or source.”).

    [160].     See Haney López, supra note 6, at 35–55, app. A at 163 (discussing racial prerequisite to citizenship cases such as Ozawa v. United States, 260 U.S. 178 (1922), and United States v. Thind, 261 U.S. 204 (1923), that move between a reliance on so-called scientific evidence and racial common sense).

    [161].     Gross, supra note 29, at 98–99; see Haney López, supra note 6, at 35–55, app. A at 163.

    [162].     See Students for Fair Admissions, Inc., v. President & Fellows of Harvard Coll., 143 S. Ct. 2141, 2185 (2023) (Thomas, J., concurring) (“[B]ecause ‘not all blacks in the United States were former slaves,’ ‘freedman’ was a decidedly under-inclusive proxy for race.” (quoting Michael B. Rappaport, Originalism and the Colorblind Institution, 89 Notre Dame L. Rev. 71, 98 (2013))).

    [163].     See Yang & Dobbie, supra note 11, at 43 (“[T]hese algorithms take a very haphazard approach to dealing with nonrace correlates and proxy effects, sometimes excluding inputs deemed to be correlated with race out of fairness concerns (even if a loss to accuracy) yet also retaining others that are also likely correlated with race, including in particular current offense and criminal history.”).

    [164].     Haney López, supra note 6, at 56–77 (discussing the variation of Supreme Court racial reasoning in Ozawa, 260 U.S. 178, and Thind, 261 U.S. 204, which, although decided mere months apart, reflect inconsistent ideas about what evidences racial differences: racial science or common knowledge).

    [165].     Yang & Dobbie, supra note 11, at 298; Pope & Sydnor, supra note 43, at 207.

    [166].     Gillis, supra note 20, at 1186.

    [167].     See generally Lily Hu, What is “Race” in Algorithmic Discrimination on the Basis of Race?, 21 J. Moral Phil. 1 (2024); Lily Hu & Issa Kohler-Hausmann, What Is Perceived When Race Is Perceived and Why It Matters for Causal Inference and Discrimination Studies, 59 Law & Soc’y Rev. 239 (2025).

    [168].     Pope & Sydnor, supra note 43, at 207; Yang & Dobbie, supra note 11, at 343 (“Our first recommended solution purges all algorithmic inputs of the proxy effects of race in the estimation step of the predictive algorithm, and then uses these ‘colorblind’ inputs to predict outcomes in the prediction step.”).

    [169].     Yang & Dobbie, supra note 11, at 343.

    [170].     Id. at 346.

    [171].     Id. at 346–47.

    [172].     See id. at 346.

    [173].     See id. (“[T]he colorblinding-inputs algorithm does not exclude race and race correlates in the estimation step. In fact, it uses all inputs to estimate predictive relationships . . . .” (alteration in original)).

    [174].     See id. at 346–48.

    [175].     Id. at 346.

    [176].     Sebastian Benthall & Bruce D. Haynes, Racial Categories in Machine Learning, in FAT ’19: Proceedings of the 2019 Conference on Fairness, Accountability and Transparency 289, 295 (2019).

    [177].     For work that problematizes approaches to securing the race variable in algorithms, see Jessica M. Eaglin, Racializing Algorithms, 111 Calif. L. Rev. 753, 787–88 (2023); Abdu, Pasquetto & Jacobs, supra note 13, at 1327–33.

    [178].     See, e.g., White Logic, White Methods: Racism and Methodology (Tukufu Zuberi & Eduardo Bonilla-Silva eds., 2008); Laura E. Gómez, Looking for Race in All the Wrong Places, 46 Law & Soc’y Rev. 221, 229–34 (2012).

    [179].     See United States v. Johnson, 122 F. Supp. 3d 272, 331 (M.D.N.C. 2015) (suggesting that a statistical study was unreliable because its author identified people as “Hispanic” based on self-reports and observations of who “appeared to be” Hispanic).

    [180].     Cheryl I. Harris, Whiteness as Property, 106 Harv. L. Rev. 1707, 1710–11 (1993).

    [181].     See Khiara M. Bridges, Race in the Machine: Racial Disparities in Health and Medical AI, 110 Va. L. Rev. 243, 286 (2024) (discussing how the variable of “black race” in machine learning algorithms can conceal the “persistence of the myth of biological race”).

    [182].     Devon W. Carbado & Cheryl I. Harris, Intersectionality at 30: Mapping the Margins of Anti-Essentialism, Intersectionality, and Dominance Theory, 132 Harv. L. Rev. 2193, 2193 (2019) (“[C]ontest[ing] the view that feminism and critical theory must always avoid essentialism to achieve normative commitments to social transformation.”).

    [183].     Hall, supra note 54, at 359–73 (showing race as signs, symbols, and language).

    [184].     DOJ Meta Complaint, supra note 63 (referencing the usage of zip code in Facebook’s Lookalike Audience tool).

    [185].     EEOC v. Catastrophe Mgmt. Sols., 852 F.3d 1018, 1030 (11th Cir. 2016) (explicitly suggesting that hair texture, as opposed to hair style, can be a proxy for race).

    [186].     Rice v. Cayetano, 528 U.S. 495, 496 (2000) (using Hawaiian ancestry as a proxy for race).

    [187].     See, e.g., Hudgins v. Wrights, 11 Va. (1 Hen. & M.) 134 (1806) (assigning race by reference to morphological features other than complexion, including hair and nose shape); Abdullahi v. Prada USA Corp., 520 F.3d 710, 712 (7th Cir. 2008) (racially classifying Iranians by reference to certain proxy variables including national origin, accent, and experiences of discrimination); Hernandez v. Texas, 347 U.S. 475, 479–81 (1954) (expanding equal protection rights to Mexican Americans as a racialized class by reference to racial proxies such as surname, community attitudes, and experiences of segregation); Saint Francis Coll. v. Al-Khazraji, 481 U.S. 604, 613 (1987) (racially classifying Iraqi Americans based on racial proxies such as ancestry or ethnic characteristics).

    [188].     See Ariela J. Gross, Litigating Whiteness: Trials of Racial Determination in the Nineteenth-Century South, 108 Yale L.J. 109, 129–30 (1998).

    [189].     Angela James, Making Sense of Race and Racial Classification, in White Logic, White Methods: Racism and Methodology 31, 36–38 (Tukufu Zuberi & Eduardo Bonilla-Silva eds., 2008).

    [190].     Haney López, supra note 6, at 61–65 (discussing United States v. Thind, 261 U.S. 204 (1923)); Gross, supra note 29, at 23 (discussing Hudgins v. Wrights, in which an enslaved woman sued for her freedom claiming her mother was of Indian ancestry).

    [191].     Thind, 261 U.S. at 206.

    [192].     Haney López, supra note 6, at 62.

    [193].     See Thind, 261 U.S. at 215.

    [194].     See generally Hudgins v. Wrights, 11 Va. (1 Hen. & M.) 134 (1806) (using genealogy to evaluate the race of a Native American person and to determine whether the person should be enslaved).

    [195].     See generally Thind, 261 U.S. 204; Ozawa v. United States, 260 U.S. 178 (1922) (evaluating whether a person of Japanese descent was White to determine whether he could be naturalized).

    [196].     See generally People v. Hall, 4 Cal. 399 (1854) (striking testimony from a Chinese witness after determining him to be Black under a law that excluded testimony from Black witnesses in criminal trials with White defendants).

    [197].     See generally Gong Lum v. Rice, 275 U.S. 78 (1927) (determining that a child of Chinese descent was “colored” and therefore her placement at a segregated “colored school” was not a violation of the Fourteenth Amendment).

    [198].     Fields & Fields, supra note 53, at 130 (“Race does not explain that law. Rather, the law shows society in the act of inventing race.” (alteration in original)).

    [199].     Gillis, supra note 20, at 1180.

    [200].     Id. at 1181.

    [201].     Id. at 1234–35.

    [202].     Id. at 1257; see also Anupam Chander, The Racist Algorithm?, 115 Mich. L. Rev. 1023, 1039 (2017) (“Instead of transparency in the design of the algorithm, what we need is a transparency of inputs and outputs.”).

    [203].     Gillis, supra note 20, at 1249–52.

    [204].     Id. at 1254–56.

    [205].     Coglianese & Lai, supra note 71, at 1286 (“Any meaningful assessment of AI in the public sector must therefore start with an acknowledgement that government as it exists today is already grounded in a set of imperfect algorithms. These existing algorithms are inherent in human decision-making.”).

    [206].     See, e.g., Kim, supra note 19, at 898; see also Huq, Racial Equity, supra note 19, at 1099–1100; Barocas & Selbst, supra note 15, at 695; Mayson, supra note 19, at 2241.

    [207].     See Gillis, supra note 20, at 1247 (“The exact criteria to be used in outcome analysis cannot be defined without clear definition of what discrimination law, and disparate impact in particular, are meant to achieve.”).

    [208].     See, e.g., Sam Corbett-Davies, Hamed Nilforoshan, Ravi Shroff & Sharad Goel, The Measure and Mismeasure of Fairness, 24 J. Mach. Learning Rsch. 1 (2023).

    [209].     Hellman, supra note 36; Barocas & Selbst, supra note 15, at 723 (“Data mining discrimination will force a confrontation between the two divergent principles underlying antidiscrimination law: anticlassification and antisubordination.”).

    [210].     Chander, supra note 202, at 1039 (arguing for an understanding of “algorithmic affirmative action” that scrutinizes racially disparate outputs); see also Nicholas O. Stephanopoulos, Disparate Impact, Unified Law, 128 Yale L.J. 1566, 1611–13 (2019) (explaining that the question of how much disparate impact is permissible under antidiscrimination law is an open and important question); Barocas & Selbst, supra note 15, at 676 (“In certain cases, data mining will make it simply impossible to rectify discriminatory results without engaging with the question of what level of substantive inequality is proper or acceptable in a given context.”).

    [211].     See Gillis, supra note 20, at 1250 (“In a world in which there is no credible way to determine at the outset whether a protected characteristic is being used to price, the closest alternative would be to ask: are the prices different for protected groups, controlling for the legitimate grounds for differentiation?”).

    [212].     See, e.g., discussion infra Part IV about Meta’s Variance Reduction System.

    [213].     Lehr & Ohm, supra note 8, at 692 (“It is often difficult to put into intuitive, understandable prose how exactly a machine-learning algorithm generates, for each subject, a prediction from all of the subject’s input variable values.”); Leo Breiman, Statistical Modeling: The Two Cultures, 16 Stat. Sci. 199, 206 (2001) (“Unfortunately, in prediction, accuracy and simplicity (interpretability) are in conflict. For instance, linear regression gives a fairly interpretable picture of the y, x relation. But its accuracy is usually less than that of the less interpretable neural nets.”).

    [214].     Ozawa v. United States, 260 U.S. 178, 198 (1922) (“The effect of the conclusion that the words ‘white person’ mean a Caucasian is not to establish a sharp line of demarcation between those who are entitled to naturalization, but rather a zone of more or less debatable ground outside of which, upon the other hand, are those clearly ineligible for citizenship.”).

    [215].     Kimberlé Williams Crenshaw, Race, Reform, and Retrenchment: Transformation and Legitimation in Antidiscrimination Law, 101 Harv. L. Rev. 1331 (1988); see also Balkin & Siegel, supra note 88.

    [216].     Crenshaw, supra note 215, at 1341–46 (discussing the competing restrictive and expansive views of antidiscrimination law).

    [217].     Cohen, supra note 65, at 140.

    [218].     Jerry Kang, Cyber-Race, 113 Harv. L. Rev. 1130 (2000); Sonia K. Katyal & Jessica Y. Jung, The Gender Panopticon: AI, Gender, and Design Justice, 68 UCLA L. Rev. 692 (2021).

    [219].     Julie E. Cohen, Surveillance Capitalism as Legal Entrepreneurship, 17 Surveillance & Soc’y 240, 242 (2019) (reviewing Shoshana Zuboff, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power (2019)).

    [220].     See, e.g., Press Release, Meta Platforms, Inc., Meta Reports Fourth Quarter and Full Year 2023 Results; Initiates Quarterly Dividend (Feb. 1, 2024), https://www.prnewswire.com/news-releases/meta-reports-fourth-quarter-and-full-year-2023-results-initiates-quarterly-dividend-302051285.html [https://perma.cc/2FUS-7U8R]; Karl Russell & Joe Rennison, These Seven Tech Stocks Are Driving the Market, N.Y. Times (Jan. 22, 2024), https://www.nytimes.com/interactive/2024/01/22/business/magnificent-seven-stocks-tech.html [https://perma.cc/MU8Y-TN7S] (reporting Meta among the “Magnificent Seven” technology stock companies); Who We Are, L.A. Cnty. Dep’t of Child. & Fam. Servs., https://dcfs.lacounty.gov/about/who-we-are/ [https://perma.cc/P635-PEBU] (stating responsibility for “more than 2 million children” across eighty-eight cities in Los Angeles County); Letter from Fesia A. Davenport, CEO, L.A. Cnty., to Bd. of Supervisors, L.A. Cnty., attach. 1 at 4 (June 24, 2024), https://file.lacounty.gov/SDSInter/bos/supdocs/192623.pdf [https://perma.cc/PJ9A-63G3] (recommending in the 2024–25 budget letter $2.069 billion for the administration of children and family services).

    [221].     Julia Angwin & Terry Parris Jr., Facebook Lets Advertisers Exclude Users by Race, ProPublica (Oct. 28, 2016), https://www.propublica.org/article/facebook-lets-advertisers-exclude-users-by-race [https://perma.cc/6G7P-L7JZ].

    [222].     Id.

    [223].     Id.

    [224].     Audience Ad Targeting, Meta, https://www.facebook.com/business/ads/ad-targeting [https://perma.cc/K9FU-UNQS].

    [225].     See About Facebook Ads, Meta, https://www.facebook.com/ads/about/?entry_product=ad_preferences [https://perma.cc/NE3W-SKGL].

    [226].     Id.

    [227].     Id.; see also Meta Platforms, Inc., Annual Report (Form 10-K), at 59 (Feb. 1, 2024).

    [228].     Stephen Engelberg, HUD Has ‘Serious Concerns’ About Facebook’s Ethnic Targeting, ProPublica (Nov. 7, 2016), https://www.propublica.org/article/hud-has-serious-concerns-about-facebooks-ethnic-targeting [https://perma.cc/TYE3-ZXPB].

    [229].     Julia Angwin, Ariana Tobin & Madeleine Varner, Facebook (Still) Letting Housing Advertisers Exclude Users by Race, ProPublica (Nov. 21, 2017), https://www.propublica.org/article/facebook-advertising-discrimination-housing-race-sex-national-origin [https://perma.cc/936B-3FXD].

    [230].     First Amended Complaint at 1, Nat’l Fair Hous. All. v. Facebook, Inc., No. 1:18-CV-02689 (S.D.N.Y. June 25, 2018) (dismissed Mar. 29, 2019) [hereinafter NFHA First Amended Complaint] (“For decades, the FHA has prohibited both publishers and advertisers from ‘targeting’ ads based on sex, family status, disability, national origin, and other protected characteristics. Given this milestone, it is all the more egregious and shocking that Defendant Facebook continues to create content for landlords and real estate brokers to bar families with children, women, and others from receiving rental and sales ads for housing.”).

    [231].     42 U.S.C. § 3604(c).

    [232].     Angwin & Parris, supra note 221 (“Imagine if, during the Jim Crow era, a newspaper offered advertisers the option of placing ads only in copies that went to white readers. That’s basically what Facebook is doing nowadays.”).

    [233].     Id. (“When [ProPublica] showed Facebook’s racial exclusion options to a prominent civil rights lawyer John Relman, he gasped and said, ‘This is horrifying. This is massively illegal. This is about as blatant a violation of the federal Fair Housing Act as one can find.’”).

    [234].     NFHA First Amended Complaint, supra note 230.

    [235].     Id. at 18–19.

    [236].     Angwin & Parris, supra note 221 (“[T]he ‘Ethnic Affinity’ is not the same as race — which Facebook does not ask its members about.”).

    [237].     NFHA First Amended Complaint, supra note 230, at 14 (“[O]nly a handful of categories (age, gender, location, language, university, field of study, employer, and any ‘liked’ pages) are self-reported by users.”).

    [238].     Id. at 12 (“Although Facebook users often voluntarily provide limited personal information, such as their age, gender, employer, and limited other categories, most of the data Facebook collects is not self-reported. The vast majority of this information comes from Facebook’s collection, evaluation, and processing of their users’ behavior both on and off Facebook to learn about users’ demographics (for example, their family status), their interests (for example, their political leanings or hobbies), and their behaviors (for example, that they are ‘recent mortgage borrowers’ or that their ‘spending method’ is ‘primarily cash’).”).

    [239].     Id. at 13.

    [240].     Angwin & Parris, supra note 221 (“Facebook assigns members an ‘Ethnic Affinity’ based on pages and posts they have liked or engaged with on Facebook.”).

    [241].     See NFHA First Amended Complaint, supra note 230, at 14.

    [242].     Id.

    [243].     Id. at 34–35 (“These Facebook-created ‘interest’ categories are the equivalent of demographic exclusion categories labeled ‘disability’ or ‘Hispanic.’”).

    [244].     Id. at 17 (“Using the ‘exclusions’ feature within Ad Manager, the employee selected the demographic preset options of ‘African-Americans’ and ‘Hispanics’ to exclude African-Americans and Hispanics from the ad’s potential audience. Facebook approved this ad.”); see also Joseph Blass, Note, Algorithmic Advertising Discrimination, 114 Nw. U. L. Rev. 415, 421 (2019) (“At least prior to Facebook’s settlement, Facebook’s ad targeting system could be used to create audiences homogenous along a protected characteristic, which therefore discriminated by excluding those without that characteristic. For example, targeting an ad at users interested in the brand ‘Marie Claire’ generated an audience that was 90% female.”).

    [245].     Statement of Interest of the United States of America at 5–6, Nat’l Fair Hous. All. v. Facebook, Inc., No. 1:18-cv-02689 (S.D.N.Y. Aug. 17, 2018) (dismissed Mar. 29, 2019) (“Facebook invites such advertisers to construct a desired audience by including and excluding demographic and other traits; using the results of its algorithms, Facebook then delivers the ad only to users that Facebook determines matches those preferences.”).

    [246].     Erin Egan, Improving Enforcement and Promoting Diversity: Updates to Ethnic Affinity Marketing, Meta (Nov. 11, 2016), https://about.fb.com/news/2016/11/updates-to-ethnic-affinity-marketing/ [https://perma.cc/B3AF-7QQS].

    [247].     Id. (“Recently, policymakers and civil rights leaders have expressed concerns that advertisers could misuse some aspects of our affinity marketing segments. Specifically, they’ve raised the possibility that some advertisers might use these segments to run ads that discriminate against people, particularly in areas where certain groups have historically faced discrimination — housing, employment and the extension of credit.”).

    [248].     Many of the events discussed in the ProPublica story formed the basis of the government’s complaint. The complaint alleged that Meta used an advertisement targeting and delivery system that violated the FHA. Housing Discrimination Complaint, Assistant Sec’y for Fair Hous. & Equal Opportunity v. Facebook, Inc. (Aug. 13, 2018), https://archives.hud.gov/news/2018/HUD_01-18-0323_Complaint.pdf [https://perma.cc/X9GF-3Y4U].

    [249].     DOJ Meta Complaint, supra note 63.

    [250].     Id. at 3.

    [251].     Id. at 15. Other problems with Meta’s ad delivery system highlighted in the DOJ Complaint include “‘Lookalike’ Targeting,” whereby Meta employs a machine learning algorithm to help housing advertisers find audiences for their ads that resemble their typical audience based in part on FHA-protected characteristics. See id. at 2.

    [252].     Id. at 25.

    [253].     Barocas & Selbst, supra note 15, at 691–92; Dwork et al., supra note 13, at 217; see also Deirdre K. Mulligan & Kenneth A. Bamberger, Saving Governance-by-Design, 106 Calif. L. Rev. 697, 728 (2018) (“Because protected traits that are predictive of relevant differences will be redundantly encoded in other data that is mined to produce classifications, recognizing and eliminating such classifications depend upon access to data about protected classes.”).

    [254].     See HBR IdeaCast, When Not to Trust the Algorithm, Harv. Bus. Rev. (Oct. 6, 2016), https://hbr.org/podcast/2016/10/when-not-to-trust-the-algorithm [https://perma.cc/9ERT-FQQN]; Ignacio N. Cofone, Algorithmic Discrimination Is an Information Problem,70 Hastings L.J. 1389, 1413 (2019).

    [255].     DOJ Meta Complaint, supra note 63, at 17.

    [256].     Id. at 25.

    [257].     Richardson, supra note 66, at 1071 (“[R]acial segregation inevitably influences and shapes data sources, the data mining processes, and human biases and practices in the technology development process.”).

    [258].     Settlement Agreement, United States v. Meta Platforms, Inc., No. 22-cv-05187 (S.D.N.Y. June 21, 2022) [hereinafter DOJ Meta Settlement Agreement]. The settlement was announced the same day the complaint against Meta was filed. What this suggests about the ongoing discussions between the company and the federal government is unclear.

    [259].     See id. at 5–6.

    [260].     Id. (“‘Direct descriptors’ means targeting options whose names directly describe persons in FHA-Protected Classes. ‘Semantically or conceptually related to’ means targeting options whose names appear to be associated with FHA-Protected Classes or persons in FHA-Protected Classes.”).

    [261].     Id. at 6.

    [262].     Id. (“Targeting options that comply with the standards in paragraph 9.a. may be added to the Housing Ad Flows in accordance with the following procedure. Meta will provide such targeting options to the United States, which will be given thirty (30) days to review and notify Meta of any objections based on the standards in paragraph 9.a. before the targeting options are added to the Housing Ad Flows. In the event that the Parties cannot reach agreement on whether a targeting option meets the standards in paragraph 9.a., Meta may not add such option without Court approval.”).

    [263].     Id. at 11 (Meta maintains the right to “ensure the privacy and protection of its confidential, privileged, or otherwise proprietary information, including but not limited to user data and Meta’s intellectual property and trade secrets”).

    [264].     There are, however, important exceptions. See generally Kang, supra note 218 (bringing a racial-constructionist lens to the meaning of race in cyberspace). See also Jessica M. Eaglin, When Critical Race Theory Enters the Law & Technology Frame, 26 Mich. J. Race & L. 151, 165–67 (2021) (discussing how critical understandings of race can constructively intervene in law and technology literature).

    [265].     See generally Khiara M. Bridges, The Dangerous Law of Biological Race, 82 Fordham L. Rev. 21 (2013).

    [266].     Dorothy Roberts, Fatal Invention: How Science, Politics, and Big Business Re-create Race in the Twenty-first Century (2011).

    [267].     See, e.g., Harris, supra note 180, at 1715; Omi & Winant, supra note 51, at 43–76; Hall, supra note 183, at 363; Fields & Fields, supra note 53.

    [268].     CRT Forward, UCLA Sch. of L. Critical Race Stud. Program, https://crtforward.law.ucla.edu/ [https://perma.cc/5EFC-SY7W].

    [269].     Wendy Hui Kyong Chun, Race and/as Technology or How to Do Things to Race, in Race After the Internet 38 (Lisa Nakamura & Peter A. Chow-White eds., 2012); Ruha Benjamin, Race After Technology (2019).

    [270].     Fields & Fields, supra note 53, at 121–24.

    [271].     Haney López, supra note 6, at 1; Gross, supra note 29, at 11; Laura E. Gómez, Inventing Latinos 171–73 (2022).

     [272].     Sudeall Lucas, supra note 55, at 1607.

    [273].     See generally Students for Fair Admissions, Inc. v. President & Fellows of Harvard Coll., 143 S. Ct. 2141 (2023).

    [274].     Harris, supra note 31, at 1229; Angela P. Harris, Race and Essentialism in Feminist Legal Theory, 42 Stan. L. Rev. 581, 585, 611 (1990); Haney López, supra note 6; Gross, supra note 29; Fields & Fields, supra note 53, at 119–20.

     [275].     Gargi Bhattacharyya, Rethinking Racial Capitalism: Questions of Reproduction and Survival 96 (2018); Paul Gilroy, Against Race: Imagining Political Culture Beyond the Color Line (2000); Hall, supra note 183.

    [276].     Thao Phan & Scott Wark, What Personalization Can Do for You! Or: How to Do Racial Discrimination Without ‘Race’, 2021 Culture Mach. 12, https://culturemachine.net/wp-content/uploads/2021/09/Phan-Wark.pdf [https://perma.cc/DG6J-4RNC].

    [277].     Id. at 11.

    [278].     Shoshana Zuboff, The Age of Surveillance Capitalism: The Fight for a Human Future at the New Frontier of Power 139–40 (2019).

    [279].     Miranda Bogen, Pushkar Tripathi, Aditya Srinivas Timmaraju, Mehdi Mashayekhi, Qi Zeng, Rabyd (Rob) Roudani, Sean Gahagan, Andrew Howard & Isabella Leone, Meta, Toward Fairness in Personalized Ads 9–10 (2023), https://about.fb.com/wp-content/uploads/2023/01/Toward_fairness_in_personalized_ads.pdf [https://perma.cc/8F83-NVQE].

    [280].     Id. at 10.

    [281].     Lehr & Ohm, supra note 8 (discussing problem definition, data collection, data cleaning, data partitioning, and model selection as only a few of the human decisions that constitute the development of an algorithm).

    [282].     Kang, supra note 218, at 1135 (“Such design questions are urgent because cyberspace holds redemptive and repressive potential.”).

    [283].     Cathy O’Neil, Holli Sargeant & Jacob Appel, Explainable Fairness in Regulatory Algorithmic Auditing, 127 W. Va. L. Rev. 79, 82–83 (2024); see also Alex C. Engler, Independent Auditors Are Struggling to Hold AI Companies Accountable, Fast Co. (Jan. 26, 2021), https://www.fastcompany.com/90597594/ai-algorithm-auditing-hirevue [https://perma.cc/5LPC-CD3V].

    [284].     Meta Platforms, Inc., Annual Report (Form 10-K), at 59 (Feb. 1, 2024).

    [285].     See Crenshaw, supra note 215, at 1380–81; see also Zuboff, supra note 278, at 137.

    [286].     Fields & Fields, supra note 53, at 130–31.

    [287].     Id. at 142.

    [288].     Bhattacharyya, supra note 275, at 2.

    [289].     See, e.g., Dorothy E. Roberts, Digitizing the Carceral State, 132 Harv. L. Rev. 1684, 1713–14 (2019) (“[P]rediction has long been one of racism’s central features. Race itself is a form of state categorization that ranks people by supposedly innate traits that are claimed to predict their behavior and character.”); E. Tendayi Achiume, Racial Borders, 110 Geo. L.J. 445, 480–88 (2022); see also Harris, supra note 180.

    [290].     DOJ Meta Settlement Agreement, supra note 258, at 6–8.

    [291].     Bogen et al., supra note 279, at 14.

    [292].     Id.

    [293].     See RAND Bayesian Improved Surname Geocoding, RAND, https://www.rand.org/health-care/tools-methods/bisg.html [https://perma.cc/AA96-SZTA].

    [294].     Id.
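
    A note on method: notes [293] and [294] cite the BISG technique used to estimate race without collecting racial data. BISG combines a surname-conditioned race distribution with the racial composition of a geographic area through Bayes’ rule. The sketch below is a minimal illustration under stated assumptions: the probability tables, the surname, the tract label, and the conditional-independence assumption are invented stand-ins, not RAND’s actual data or code.

```python
# Hypothetical sketch of Bayesian Improved Surname Geocoding (BISG).
# Every probability table below is an invented illustration, not RAND data.

# P(race | surname): illustrative surname-conditioned race distribution.
P_RACE_GIVEN_SURNAME = {
    "garcia": {"white": 0.05, "black": 0.01, "hispanic": 0.92, "asian": 0.02},
}

# P(race | geography): illustrative racial composition of one census tract.
P_RACE_GIVEN_GEO = {
    "tract_123": {"white": 0.60, "black": 0.20, "hispanic": 0.15, "asian": 0.05},
}

# P(race): illustrative national base rates, divided out so the geographic
# evidence is not double counted.
P_RACE = {"white": 0.60, "black": 0.13, "hispanic": 0.19, "asian": 0.06}


def bisg_posterior(surname: str, tract: str) -> dict[str, float]:
    """Combine surname and geography evidence via Bayes' rule, assuming the
    two are conditionally independent given race:
    P(race | surname, geo) is proportional to
    P(race | surname) * P(race | geo) / P(race)."""
    p_s = P_RACE_GIVEN_SURNAME[surname.lower()]
    p_g = P_RACE_GIVEN_GEO[tract]
    unnormalized = {r: p_s[r] * p_g[r] / P_RACE[r] for r in P_RACE}
    total = sum(unnormalized.values())
    return {race: value / total for race, value in unnormalized.items()}


print(bisg_posterior("Garcia", "tract_123"))
```

    Even in this toy version, each input table reflects a human choice about what counts as racial evidence, so the probabilities BISG outputs are only as meaningful as those choices.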

    [295].     Bogen et al., supra note 279, at 14–18.

    [296].     Roy L. Austin, Jr., An Update on Our Ads Fairness Efforts, Meta (Jan. 9, 2023), https://about.fb.com/news/2023/01/an-update-on-our-ads-fairness-efforts/ [https://perma.cc/VVM2-JUVE].

    [297].     Id.

    [298].     Mulligan & Bamberger, supra note 253, at 726 (describing and critiquing the relocation of important, value-laden governance questions to design decisions made in closed-door sessions).

    [299].     Bogen et al., supra note 279, at 17 (“For a given ad, the controller experiments with different ways to apply multipliers that most effectively reduce impression variance. The controller is periodically provided with updated aggregated impression variance measurements that signal whether the strategy used has been effective in reducing impression variance or not, and inform whether a new strategy should be deployed.”).
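
    The controller behavior quoted in note [299] amounts to a measure-and-adjust feedback loop. The sketch below is a speculative reconstruction under stated assumptions: the measurement callback, the candidate multipliers, and the variance target are all invented for illustration, and nothing here reflects Meta’s actual Variance Reduction System code, which is not public.

```python
# Speculative sketch of the feedback loop described in note [299]: a controller
# periodically receives aggregated impression-variance measurements and
# deploys a new multiplier strategy when the current one is ineffective.
# The callback, candidates, and target are invented; this is not Meta's code.
from typing import Callable


def run_controller(
    measure_variance: Callable[[float], float],  # hypothetical aggregated feedback
    candidates: list[float],                     # hypothetical multiplier strategies
    target: float,                               # hypothetical acceptable variance
    rounds: int = 10,                            # hypothetical review periods
) -> float:
    """Keep the current multiplier while measured variance meets the target;
    otherwise experiment with the candidates and deploy the one that most
    reduces impression variance."""
    current = candidates[0]
    for _ in range(rounds):
        if measure_variance(current) <= target:
            continue  # strategy has been effective; keep it
        current = min(candidates, key=measure_variance)  # deploy a new strategy
    return current


# Illustrative run: variance shrinks as the multiplier approaches 1.5.
print(run_controller(lambda m: abs(m - 1.5), [1.0, 1.25, 1.5], target=0.1))
```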

    [300].     DOJ Meta Settlement Agreement, supra note 258, at 7.

    [301].     Publication of Meta’s design choices and greater corporate commitments to transparency are dubious substitutes for interventions that promote public participation and control. Transparency alone is an unpredictable reform tool absent explicit interventions that halt the arbitrary accumulation of power through secrecy. See, e.g., David E. Pozen, Transparency’s Ideological Drift, 128 Yale L.J. 100, 163 (2018); see also Stop LAPD Spying Coalition, Know Your Fights: Using Public Records Laws in Abolitionist Organizing (2022), https://stoplapdspying.org/wp-content/uploads/2022/10/KNOW-YOUR-FIGHTS.pdf [https://perma.cc/275J-SXEM] (describing the limitations of transparency discourse without robust mechanisms for accountability).

    [302].     Barocas & Selbst, supra note 15; see Huq, Constitutional Rights, supra note 19.
