Source Collect: The Algorithmic Racial Proxy

This podcast episode accompanies Professor Gamal's article, The Algorithmic Racial Proxy.

Transcript

SPEAKERS

Host: Davis Rich
Guest/Author: Professor Fanna Gamal, UCLA School of Law

Judge Thelton E. Henderson  00:04

And that's what sustains our system, is that having one's day in court, feeling you were heard, and even though you don't agree with the ruling, you feel you've been through a fair process.

Davis Rich  00:15

Algorithms shape our modern world, determining everything from which ads we might see on Instagram to who is afforded access to credit. Yet if you're not a machine learning engineer, it's hard to discern what decisions go into the development of these algorithms. That question -- what input decisions go into the creation of machine learning algorithms -- motivated Professor Fanna Gamal's latest article, The Algorithmic Racial Proxy. Professor Gamal, Assistant Professor of Law at UCLA School of Law, noticed that developers often exclude race and racial proxy variables as inputs when creating machine learning algorithms.

So what exactly is a racial proxy? Why is that a legally difficult question to answer, and what are the implications of allowing those who develop machine learning algorithms to decide? My name is Davis Rich, and this is Source Collect, the podcast of the California Law Review. At the California Law Review, we strive to collect sources that underscore how law shapes society and how society shapes the law. The goal of our podcast is to provide an accessible and thought-provoking overview of the scholarship we publish. On today's episode, Professor Gamal joins Source Collect to discuss her article, The Algorithmic Racial Proxy, published in the April 2026 issue of the California Law Review.

Davis Rich  01:49

Professor Gamal, thank you so much for taking the time to speak with me today about your article. To start, what motivated you to write The Algorithmic Racial Proxy?

Prof. Fanna Gamal  02:01

Thank you for having me. So the origins of this article really rest in an ongoing, collaborative research project between my clinic, the Community Lawyering in Education Clinic at the UCLA School of Law, and a really incredible organized collective of parents working in South Los Angeles named CADRE. CADRE and my clinic partnered to investigate, with the hope of understanding, what interests and forces influence the procurement of data-driven technologies in the Los Angeles Department of Children and Family Services. And this is the largest family regulation agency in the nation. So a few years ago, the agency announced that it would pilot a machine learning algorithm that, at a very high level, helped busy caseworkers determine which families to investigate after some allegation of abuse or neglect had been made.

When we were investigating this algorithm with CADRE, my clinical students, and a very talented lawyer in the area of surveillance and technology, Shakeer Rahman, I noticed that the developers made certain decisions around what variables to include or exclude in the algorithm's input data. So they excluded race from the algorithm's input data, and I know we'll talk a little bit later about why that's not that surprising. But they also made the decision to exclude other kinds of race-correlated variables, like geographic indicators. And I was really interested in their decision to exclude these race-correlated variables, in part because they also made the decision to include other very highly race-correlated variables in the input data. So this sort of inconsistency about what variables to include or exclude was just curious to me. And it started me on this path of trying to understand not only how these developers make decisions about the inclusion or exclusion of certain kinds of data that are related to race in machine learning algorithms, but more broadly, how courts, statisticians, and developers make those types of determinations.

Davis Rich  04:28

That's great. So interesting. Could you summarize the main arguments of your piece for our audience?

Prof. Fanna Gamal  04:36

I'll start by giving some context for how I got to some of my arguments. So this paper, broadly speaking, is about the legal regulation of racial data in machine learning algorithms. Machine learning algorithms, at a very high level, are mathematical models that search for correlations and patterns in historical data in order to achieve some designated outcome. And often that outcome is the probability that some future event is going to occur. These models represent a range of techniques and approaches; they're not, for the most part, generalizable. But one thing that does unite these models is the presence and the significance of human decision making. And one of the most important human decisions is the decision about whether to include or exclude certain variables from input data.

Now, as is the case for most of the human decisions that are relevant to the development and deployment of machine learning algorithms or new information technologies, the law provides very little direct instruction about the selection and exclusion of input data. But one important exception to this general lack of legal scaffolding is the entrenched legal principle that one's race should not determine whether one is benefited or burdened by certain government action, and also certain private actions in legally relevant, important areas of public life, like housing, employment, or credit -- areas where civil rights law is implicated. So we might understand this, broadly speaking, as the legal impulse towards colorblindness. And we can talk a little bit more about what that means. But in practice, this legal impulse towards colorblindness means that, as a general matter, race will not be included as an input variable in machine learning algorithms that are designed to predict the probability of some future outcome -- so, for example, risk assessments used in the criminal legal system, or predictions of someone's probability of default on a loan. If machine learning algorithms are used to accomplish these particular tasks, chances are that they will not include race as an input variable.

But the issue is that, because of the nature of statistical modeling, simply removing race doesn't necessarily blind the algorithm to race, because the effect of race on an outcome can still be recreated from the variables that remain. In other words, variables that appear formally race neutral, like one's income or education level or their prior involvement in the legal system, all contain racial meaning. So the question that sits at the feet of computer programmers, of judges, of advocates, of legal scholars, is what to do with these racial proxy variables.
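[Editor's note: The dynamic Professor Gamal describes here -- that deleting the race variable does not delete the racial signal -- can be illustrated with a few lines of code. The sketch below is our own construction on synthetic data; the feature names and numbers are hypothetical and are not drawn from the article or from any real system.]

```python
# A minimal, hypothetical sketch: a classifier that is never shown race can
# still recover it from "race-neutral" features that correlate with race.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical protected attribute -- never given to the final model.
race = rng.integers(0, 2, size=n)

# "Formally race-neutral" proxies, constructed here to correlate with race,
# standing in for things like income or neighborhood poverty rate.
income = 50 + 15 * race + rng.normal(0, 10, size=n)
neighborhood_poverty = 0.30 - 0.10 * race + rng.normal(0, 0.05, size=n)
X = np.column_stack([income, neighborhood_poverty])

# Train on the proxies alone; the model recovers race far better than chance.
X_train, X_test, y_train, y_test = train_test_split(X, race, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print(f"Race recovered from 'neutral' features: {clf.score(X_test, y_test):.0%}")
```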

And my paper sort of intervenes in this discussion and argues that what we do with these racial proxies depends a lot on how we identify a racial proxy. And that's not actually a very straightforward task, because it forces us to ask the question: what precisely gives any given variable its racial character? And this racial question has proven really difficult for judges, scholars, and programmers alike. So the paper surveys the various answers to this question to argue that what we're witnessing today is a really important shift away from judges and courts and towards those who develop and deploy algorithms, who answer this important question by relying on their own racial intuitions. And in answering it, they're actually ushering in new actors and new processes that help reconstruct race in the digital age.

Davis Rich  08:59

You’ve described this really fascinating, contemporary challenge. Your piece opens with the story of Bhagat Singh Thind. Can you share that story with our audience and explain what it teaches us about racial proxies and the law?

Prof. Fanna Gamal  09:17

Sure. Thind was an immigrant from India and a military veteran. He was an educated individual, had caste privilege, and he was trying to naturalize in 1920, at a time when the right to naturalize was limited to white people and people of African descent. An Oregon district court said that he could naturalize. But when the federal government appealed, the Ninth Circuit was actually confused about what to do with this case, and so it asked the Supreme Court for instruction on the question of whether or not Thind was a white person within the meaning of the act. They didn't just ask, "Is this person a white person?" They framed the question in a particular way. They mentioned his caste, they mentioned his religion, they mentioned his birthplace, his blood, in the question of whether or not Thind was, in fact, white. I think this is actually quite telling, because while the Supreme Court eventually answered the question in the negative, saying that it is effectively common knowledge that Indians are not white for purposes of the act, the question asked by the Ninth Circuit actually disputes this framing and shows that it wasn't actually common knowledge. Race was very much in dispute, and importantly, the question of how these other variables -- his birthplace, his caste, his education -- related to his race was also in dispute. And that's a question that we are still very much in the process of asking. And the answer to that question, which is an answer that implicates how courts construct race, is also very much still being answered.

Davis Rich  11:14

So I'd like to set up the modern challenge your article addresses, which we touched on a little bit. Why does the concept of "fairness through blindness" in computer programming exist, and why has it been criticized?

Prof. Fanna Gamal  11:27

I think the idea of "fairness through blindness" -- or the idea that you can simply remove the race variable or correlated race variables from an algorithm in order to effectively achieve what many are trying to achieve, which is some race-neutral or colorblind algorithm -- the reason why it exists, I think, is because it fits quite neatly with legal impulses of colorblindness: that making something race blind by removing race actually, in fact, makes it race neutral. And it's been critiqued for a lot of the same reasons that colorblindness in the law is critiqued, which is to say that something doesn't have to explicitly name race to take race into account. So we can think about the current debates over college admissions. An admissions officer or admissions process might not take formal race into account, but to the extent that they take other factors into account that are extremely race-laden -- like AP classes, like financial need, like an applicant's test scores, like their participation in sports -- there is still certainly racial meaning in those different variables. And so just eliminating race does not necessarily "colorblind" -- and I'm putting that in quotes -- doesn't necessarily remove racial meaning from the outcome.

Davis Rich  13:04

Given these challenges of defining a racial proxy in the context of machine learning development, we might expect programmers, or we may ourselves, to look to the courts for guidance. But your article describes how judges are perhaps inconsistent in how they describe the relationship between race and racial proxies. Can you share some of the different ways that courts have approached this question?

Prof. Fanna Gamal  13:38

So, courts have taken, like you said, very different types of approaches to defining racial proxies. Sometimes they think about a racial proxy as something inherently pejorative, right? That the reason why a certain variable is racial is because it represents something pejorative. Other times they talk about a racial proxy as being identifiable because it is a variable that's highly correlated with race. And sometimes they say that it is a racial proxy because it is immutable -- it is sort of an immutable quality.

Each of these things opens up a different set of questions that courts have not been good at answering. So, we can even take the question of something highly correlated with race. Of course, how strongly correlated with race must something be? Almost every conceivable piece of data one could collect has some correlation with race. So, drawing those lines beyond just using common sense or racial intuition -- which is, in fact, what courts typically lean on -- is extraordinarily important, and not something that is answered in the jurisprudence.

We can think about that question too with respect to immutable qualities. Certainly, courts have said things like hair texture is some immutable quality, such that if you discriminate based on someone's hair texture, you might be crossing into racial discrimination. But it must be something other than the fact that it is an immutable quality. I mean, one can be discriminated against based on one's height, you know, or eye color -- other types of immutable qualities -- but we don't necessarily think of that as racial discrimination, right? And "Why not?" is the question. What makes those types of immutable qualities nonracial, versus hair texture racial? The answer to that is, of course, historical and, you know, political, and has been answered over time. But the very fact of accepting immutable characteristics or immutability as one of the foundations or ground truths of race is called into question by these other immutable qualities that we don't understand as racial.

Davis Rich  16:01

So these questions that you raise about the different judicial approaches to racial proxies lead to another: how do these different judicial approaches affect folks who are developing machine learning algorithms and doing the programming, and what are the implications of that choice?

Prof. Fanna Gamal  16:23

That's such a great question, and not one that my paper, I think, answers very well, honestly. I'm not sure how they affect the decisions of individual programmers, and in fact, my guess is that they probably don't that much. But I think what these cases do show is the reliance on a racial intuition that judges have used to answer these difficult questions. And now, in our current digital age, these same questions are being answered not by judges, who are writing legal opinions and, to some extent, trying to justify their reasoning, but by developers and programmers, who maybe you could argue are better positioned, or maybe even not as well positioned. But the fact is, we're witnessing a transformation in that a novel group of actors and a novel set of players are now involved in using their racial intuition to make these very important decisions.

Davis Rich  17:41

This is perhaps an issue that folks who study this specific area have thought about as the problem of proxy discrimination. And your paper examines a couple of approaches that exist to addressing proxy discrimination. I'm wondering if you could explain to our audience the two approaches that your paper examines and what you see as the shortcomings of those approaches.

Prof. Fanna Gamal  18:11

Yeah, so one approach is what we could call the orthogonalizing approach. In that approach, you effectively try to isolate the effect of race from the non-racial effect of the proxy variable on the outcome. You do this by creating a model that includes race, and then you obtain the presumably race-neutral coefficients of the proxy variables. Then, once you apply your model to an individual case, you put in a placeholder value for race, and you presumably achieve an orthogonalized outcome. The problem with this, which I point out in the paper, is fundamentally that it relies on a belief that there is such a thing as an orthogonal relationship between race and these proxies. But if we think about this question historically, one's race has often turned on the presence or absence of certain proxy variables. So, factors like skin color, or geographic location, or your associations were not just understood by courts as proxies for race; people's race actually turned on the presence or absence of these proxies. And so, what I'm pointing to is the paradox of understanding race and these proxies as somehow in an orthogonal relationship, or an orthogonal posture.
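[Editor's note: A rough sketch of the orthogonalizing approach as described above, in our own hypothetical code. Fitting with race included keeps the proxy coefficient from absorbing race's effect; at scoring time, every individual's race is replaced with the same placeholder value. The variables and numbers are illustrative only.]

```python
# Hypothetical sketch of "orthogonalization": estimate coefficients with race
# in the model, then score everyone with race fixed to a placeholder value.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5_000
race = rng.integers(0, 2, size=n).astype(float)
income = 50 + 15 * race + rng.normal(0, 10, size=n)         # race-correlated proxy
outcome = 2.0 * income + 8.0 * race + rng.normal(0, 5, size=n)

# Step 1: fit WITH race, so income's coefficient is estimated net of race,
# rather than also carrying race's effect (as it would if race were dropped).
model = LinearRegression().fit(np.column_stack([income, race]), outcome)

# Step 2: at decision time, substitute a placeholder (here, the mean) for
# every individual's race, keeping the presumably "purged" proxy coefficients.
placeholder = np.full(n, race.mean())
scores = model.predict(np.column_stack([income, placeholder]))
```

[The sketch takes for granted that an "orthogonal" race effect exists to be removed -- which is precisely the assumption the article disputes.]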

The other approach that I discuss is an outcome-based approach. That approach looks to algorithmic outcomes to determine whether or not some racial discrimination has occurred via the algorithm. I'm not opposed to scrutinizing outputs; I think that we ought to be scrutinizing algorithmic outputs. The concern I have, or the addendum I have to that approach, is to think about how that approach also necessitates using some kind of racial intuition, or embedding an idea of race into an algorithm. So, for example, a computer programmer will have to decide how much of the racial differentiation produced by the algorithm is too much, and how much differentiation is acceptable based on their idea of what "raw racial difference" is. And that idea of certain differences being attributed to race -- not to other types of variables that are presumably included in the outcome, but to race itself -- is also an important racial determination. And in fact, I would say that any acceptance of some raw racial difference returns us to some kind of understanding of race as biological or, you know, inherent or static, you know, fixed, and that has proved to be a very problematic ideology. So, my concern is to draw attention to this problem of how we distinguish algorithmically-produced differentiation from algorithmically-produced racial discrimination.
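[Editor's note: The outcome-based approach can likewise be sketched in a few hypothetical lines. The audit below computes a gap in positive-decision rates between two groups and compares it to a tolerance; as Professor Gamal notes, choosing that tolerance is itself a racial judgment. All values are invented for illustration.]

```python
# Hypothetical output audit: flag an algorithm when the gap in favorable
# decision rates across groups exceeds a chosen tolerance.
import numpy as np

def selection_rate_gap(decisions: np.ndarray, group: np.ndarray) -> float:
    """Absolute difference in positive-decision rates between groups 1 and 0."""
    return abs(decisions[group == 1].mean() - decisions[group == 0].mean())

decisions = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # made-up algorithm outputs
group     = np.array([1, 1, 1, 1, 0, 0, 0, 0])  # made-up group labels

TOLERANCE = 0.2  # the contested human choice: how much gap is "too much"?
gap = selection_rate_gap(decisions, group)
print(f"gap = {gap:.2f}; flagged as discriminatory: {gap > TOLERANCE}")
```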

Davis Rich  21:58

One concern that you raise in your article is what you call "a delegation of epistemic authority over racial proxy determinations to those who develop and deploy machine learning algorithms." This concern is exemplified in an algorithmic racial discrimination case between the U.S. Department of Justice and Meta. Can you give our listeners some background on this case and how the government framed its complaint?

Prof. Fanna Gamal  22:28

The case against Meta was brought because of practices that Facebook was permitting in its ad delivery system. In particular, Facebook allowed people who were seeking to buy ads for housing the opportunity to include or exclude certain groups from an ad's eligible audience. And that is, of course, not a surprising thing. Ad targeting, which includes, you know, the ability to include or exclude people from being the intended audience for the ads, is core to Meta's business model. It is the foundation for their incredibly lucrative operation. The issue with these housing ads is that people who were purchasing them were permitted to include or exclude groups that were defined as or labeled "ethnic affinity groups" from seeing a housing ad. So that means you could, if you were placing a housing ad, go into the ad portal and select from a drop-down menu "ethnic affinity groups" -- like Asian-American, like African-American, like Hispanic -- to exclude those groups from seeing the ad.

Now this raised a really important question about violations of the Fair Housing Act, which prohibits racial discrimination in many different areas, but in particular in housing advertising. The retort of the company was that this actually was not racial discrimination -- that their ad portal was effectively race neutral, because these ethnic affinity groups were not direct racial markers. And in fact, Meta contended that it doesn't collect information about race; it doesn't have users' racial information. Instead of being directly tied to users' race, these ethnic affinity groups were algorithmically constructed groupings based on people's digital history. Using the collection of data that Meta had about its users, it would algorithmically organize them into particular groups, and those groups represented people who had an affinity for certain ethnic groups. It wasn't, in their view, a direct racial marker.

So, just assuming that we accept that argument from Meta, the DOJ and civil rights organizations argued that even if these weren't direct racial markers, this was a clear case of racial proxy discrimination. In other words, these groups mapped onto racial groups with "almost surgical precision" -- and that was a term that was used in the complaint. And so, this was, in fact, racial discrimination, but via proxy. Eventually, Meta and the DOJ reached a settlement in 2022. And in that settlement, Meta agreed to change some things about its ad targeting and the types of categories that it would permit people who are buying advertising to use when constructing their ad audience. And I should say, it wasn't just housing advertisers that are now going to encounter these new types of groups, but also advertisers in credit and employment.

Davis Rich  26:50

So, I want to home in a little bit on the settlement and its terms as they relate to racial proxies and Meta's authority to determine the relationship between race and racial proxies in its algorithms. Can you speak a little bit more about some of the key terms of the settlement related to racial proxies and what you make of this result?

Prof. Fanna Gamal  27:14

Okay, so I think that this is a really important question to ask, and I think that we ought to, you know, as legal scholars, be able to investigate questions in the settlements with these platforms, because many of the legal disputes with the most powerful platforms end up settling. So as part of the legal settlement, Meta agreed that it would no longer permit housing advertisers to include or exclude users based on their membership in a protected class. When it came to the question of membership in ethnic affinity groups, Meta agreed that it would no longer allow housing advertisers to include or exclude people based on their membership in any group that was "semantically or conceptually related to race" -- and that is a quote from the settlement. These groups can no longer be semantically or conceptually related to race.

This is really a very interesting outcome of the settlement, because, of course, it raises the question: "What about these groups is semantically or conceptually related to race, or what about particular algorithmically constructed groups in general makes them semantically or conceptually related to race?" Now in the paper, I point to this very important shift to thinking about the racial proxy, or what makes something racial, on the semantic and conceptual plane, instead of on the terrain of the more substantive human decisions that go into assembling the algorithmic group. How is the algorithm itself organizing or ordering people? What decisions did the developers make with respect to the data or the techniques, the methods that they used? Rather than peer into these questions and explore them, the settlement accepts that Facebook or Meta can algorithmically construct these groups as long as it doesn't label them as something "semantically or conceptually related to race." And it's not just that Facebook can't name something as "semantically or conceptually related to race." It's that Facebook can't name something that Facebook believes is "semantically or conceptually related to race." And here, of course, is the delegation of epistemic authority that I mentioned, right? Facebook, in some ways, decides what is "semantically or conceptually related to race" as it's building its ad targeting system.

The other part of your question was what I make of this result. I think it's a regulatory conceit, because it doesn't permit any additional peering into the substantive questions that go into developing the algorithm. It only allows challenges to the semantic decisions -- the linguistic or rhetorical and conceptual decisions -- that Facebook makes. And I think that that is a missed opportunity. And I especially think it's a missed opportunity because what we are talking about here is the ability of Facebook to algorithmically assemble people into groups and then use membership in those groups to expose them to differing access to resources, privileges, and opportunities. And I think that it's that process that makes these groups racial proxies. That is what gives these groups their racial character -- not what Facebook calls them, but the fact that Facebook is assembling them and then using them to dole out resources in these really important areas of public life, like housing, like employment, and like credit.

Davis Rich  31:45

I want to follow up about the power that you just described. The shift from interrogating the human decisions that go into developing an algorithm to ensuring only that Meta itself, the creator of the algorithm, doesn't think that the groups its algorithm creates are “semantically or conceptually related to race” feels very important. What are the implications of this power shift beyond the Meta case that we've discussed?

Prof. Fanna Gamal  32:16

I think it's important to think about -- from a perspective of race as a process, as something that you do, not necessarily something that is -- what race allows people to do, and what ideas of race allow people to do. And right now, the legal understandings of race we have are like the one embedded in the settlement between the DOJ and Meta. What that idea of race is allowing Meta to do is to assemble all types of people into different groups and then use those groups to dispense really important resources. And that is discriminatory. You know, there is a discriminatory nature to that, right?

The question that the law asks is whether or not this is racial discrimination based on the law's understanding of race. And I think that that is a problem, because we are all exposed to that type of discrimination, right? This idea of race as, kind of, you know, inhering in some semantic or conceptual approach is giving Meta this really, really powerful tool, right? This really powerful opportunity to dispense important resources in housing and employment and credit -- these areas of life that are so fundamental. And it's allowing them to do it with very little oversight or regulation, you know, very little peering under the hood.

And I think the fact that we have civil rights laws in these areas can motivate us to understand that we can peer under the hood of these algorithmic practices. We can look at these algorithms. I think that that is very much in relation to, or in the spirit of, the civil rights laws: that these areas of life are really important, and so how people act -- how companies, individuals, public and private actors act in these areas -- gives us as a public more authority to intervene in what they're doing. And I think that this is something that regulators ought to understand, that scholars ought to understand and push for. We have a responsibility to explain that, and the public, I think, can also demand more.

Davis Rich  35:13

To conclude, what do you hope a reader of your article or a listener of this podcast takes away from The Algorithmic Racial Proxy?

Prof. Fanna Gamal  35:22

So, I think there are really two things that I want people to take away from the piece. For scholars who are interested in racial construction, I want them to take away a deeper understanding of the role that those who develop and deploy algorithms are playing in reconstructing race in the digital age, because it's really a novel set of actors that get to embed their racial intuition into machine learning algorithms that are, of course, increasingly important to human life. And for those who are interested in or concerned about the power of data-driven technologies more broadly, I want them to understand how narrow conceptions of race allow really powerful companies like Meta to dole out access to really important resources and opportunities, and that that process happens in the service of protecting Meta's really lucrative practices of targeted advertisement. So, I'm hoping that the article sheds light on the interests of both of those groups of people.

Davis Rich  36:30

Well, thank you so much, Professor Gamal. This has been such a great conversation.

Prof. Fanna Gamal  36:34

Thank you for having me.

Davis Rich  36:36

Thank you for listening to this episode of Source Collect. If you would like to read Professor Gamal's article, you can find it at californialawreview.org. This episode was recorded in April 2026. For updates on new episodes and articles, please follow us on Instagram @californialawreview. A complete list of our socials is available on our website. Lastly, you can find a list of the editors who worked on this episode of the podcast in the show notes. See you next time.
