camaleon's log Adapting's Blog

The paradox of fairness: bias without bias


Yes, it turns out a selection process can be both biased and unbiased at the same time. It can also be biased while free of discrimination or discriminatory but without bias. So what to do if we want to make our processes fairer?

The first step is to understand what these concepts really mean; then you’ll be able to map your definition of fairness onto one of them, or onto a combination. This allows for the calculation of an objective measure of fairness1 that can help drive progress, not only for AI but also for human-run processes.

1. Unbiased bias, it’s Berkeley 1973

A famous foundational case in the application of statistical methods to the study of fairness is the evaluation of the 1973 graduate admissions process at the University of California, Berkeley (a centre of America’s progressivism). The alarms went off when gender bias was identified in graduate admissions: 44% of male applicants were admitted, while for women the figure was only 35%.

A natural first step is to look at the data by department, to narrow down the possible source of the issue. The results were surprising: in most departments women had admission rates as high as or higher than men’s! So it seemed like there was indeed gender bias, but in the opposite direction, which wasn’t considered problematic at the time because women were a clear minority in universities, so such a bias would, if anything, help even things up.

Berkeley campus (Memorial Glade and Sather Tower). Source: Wikipedia, by Gku.

These kinds of contradictory results are an example of Simpson’s Paradox, which is not truly a paradox: it has a perfectly comprehensible explanation. Women were applying in higher numbers to departments that had lower admission rates; therefore, even though their rates per department were higher, because they concentrated on “harder” departments their average admission rate looked lower.
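To make the paradox concrete, here’s a tiny sketch with made-up numbers (not the real 1973 figures): women are admitted at a higher rate in both departments, yet their overall rate is lower, simply because most of them apply to the “hard” department.

```python
# Illustrative numbers (not the real 1973 figures): two departments,
# an "easy" one with a high admission rate and a "hard" one with a low one.
applicants = {
    # group: {department: (admitted, applied)}
    "men":   {"easy": (480, 600), "hard": (30, 100)},
    "women": {"easy": (90, 100),  "hard": (200, 600)},
}

def rate(admitted, applied):
    return admitted / applied

for group, depts in applicants.items():
    admitted = sum(a for a, _ in depts.values())
    applied = sum(n for _, n in depts.values())
    per_dept = {d: round(rate(*x), 2) for d, x in depts.items()}
    print(group, per_dept, "overall:", round(rate(admitted, applied), 2))

# Women win in every department (0.90 vs 0.80 and 0.33 vs 0.30), yet lose
# overall (0.41 vs 0.73), because 6 out of 7 women applied to the "hard"
# department while 6 out of 7 men applied to the "easy" one.
```

Play with the application numbers and the effect disappears: the “paradox” lives entirely in how applicants distribute across departments.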

The case was closed, although as we will see later there’s more to it. In any case, if we just look at the data we could say there was gender bias in favour of men and gender bias in favour of women, both at the same time! This is counterintuitive to say the least, and raises the question of whether bias is truly useful for evaluating fairness. So let’s take a closer look at what bias really means.

2. Definitions, it’s always definitions

Well, not always, but very often paradoxes or disagreements arise from holding two different definitions of the same word simultaneously. That happens to be the case here.

Bias etymologically means “oblique or diagonal line, at an angle” and in its current use it has at least two meanings:

  1. A broad one, meaning deviation from evenness or equilibrium. Justice has traditionally been linked to the notion of equilibrium (represented by a woman holding a balance scale); in that sense, bias literally means unfairness.
  2. A specific statistical one, meaning deviation from an expected value. The concept of “expectation” is also mathematically loaded, but let’s just say that if we expect admission rates to be equal and they are not, there’s a bias.
A representation of Justitia in Frankfurt. Source: Wikipedia, by Mylius.
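The statistical sense in 2. is easy to make concrete. A minimal sketch using the overall rates from the Berkeley case, and assuming (purely for illustration) that we “expect” an even admission rate of 40% for everyone:

```python
# Statistical bias: deviation of an observed value from an expected one.
# The 40% "expected" rate is an assumption made purely for illustration.
expected_rate = 0.40
observed = {"men": 0.44, "women": 0.35}

bias = {group: round(r - expected_rate, 2) for group, r in observed.items()}
print(bias)  # {'men': 0.04, 'women': -0.05}
```

A positive number means bias in favour of the group, a negative one against it; note that the signs depend entirely on what we chose to expect.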

I hope this helps shed some light on the previous confusion. If the Berkeley admission process was biased as in 1., it could only be biased one way; but when we measure bias as in 2., we are just doing mathematical operations that can fall either way, or both ways, depending on how we operate on the data. Because we use the same word, we can’t help but read the meaning of 1. into 2.; hence the confusion. At the same time, the two definitions are intimately connected; otherwise you would have spotted the source of the issue from the get-go (perhaps you did! But given the debate this sparked when it was first discovered, I assume it’s generally not evident).

Given the above, one could describe the quantitative study of fairness as finding the measure of 2. (among all the possibilities) that best represents the meaning in 1. But is it really that simple?

3. Discrimination is not bias

3.1 A closer look at the Berkeley case

From what we’ve seen, looking at aggregate data seems like a bad idea. Overall admission rates make it look as if men were being favoured in the selection process, but per-department rates make it look as if women actually received preferential treatment. I say “it looks like” because neither picture is very consistent with our understanding of what the world was like back in 1973. So let’s dig a little deeper and see if we can make more sense of these figures.

One key aspect is that, unsurprisingly for the time, there were twice as many male applicants as female applicants. Historically, women were discouraged from pursuing higher studies, as their role was considered to be family life. This means that, unlike men, few women went to college to meet their parents’ or community’s expectations; they went because they really wanted to. Since women faced negative external pressures, it’s reasonable to assume that the ones who got through were, on average, smarter or more driven than their male counterparts2 (later on we’ll discuss an interesting dataset for testing this hypothesis).

3.2 Enter causal inference. The role of counterfactuals

This would mean that the higher admission rates for women were justified. In fact, the disparity should have been much larger than it was, meaning there could have been discrimination against women despite the data showing a bias in their favour! And here is the issue: data alone is not enough to identify discrimination.

Discrimination can be thought of as: “If I weren’t X, I would have received better treatment”. It’s essentially a counterfactual, a statement about a world that does not exist, but gives insight into our own. Data is by definition factual, so in order to make counterfactual statements to assess discrimination we need more than that.

What we need is basically a causal model. When data is used to extract the causal mechanisms underlying the system, we can create alternative worlds by changing just the variables we want (gender in this case) and seeing the effects down the causal chain. If there’s no discrimination, we would see no change in the end result: “If I were a woman, my chances of admission would have been the same”. Note that the same data can often be explained by different causal structures, meaning some degree of theoretical understanding of how the world works is required to make the right choice.
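As a sketch of what such a counterfactual query looks like, here’s a toy causal model (all numbers made up) in which gender influences admission only through the choice of department:

```python
# Toy structural causal model: gender -> department choice -> admission.
# Admission depends only on the department, so by construction this model
# contains no direct gender discrimination. All numbers are made up.
P_DEPT = {                                # P(department | gender)
    "woman": {"easy": 0.2, "hard": 0.8},
    "man":   {"easy": 0.8, "hard": 0.2},
}
P_ADMIT = {"easy": 0.6, "hard": 0.3}      # P(admission | department)

def admission_rate(gender):
    """Marginal admission rate implied by the model."""
    return sum(p * P_ADMIT[dept] for dept, p in P_DEPT[gender].items())

def counterfactual_admission(dept):
    """'If my gender were different, my department held fixed, would my
    chance of admission change?' Here it can't: admission depends on the
    department alone, so the query ignores gender entirely."""
    return P_ADMIT[dept]

print(round(admission_rate("man"), 2))    # 0.54
print(round(admission_rate("woman"), 2))  # 0.36
# Aggregate rates differ, yet flipping gender while holding the department
# fixed leaves the admission probability untouched: no discrimination
# under this causal structure.
```

Under a different causal structure generating the same data (say, one where admission depends on gender directly), the counterfactual answer would change, which is exactly why the choice of model matters.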

So is discrimination what we were after all along? Well, it’s complicated. This is starting to get annoying, I know, but it’s just the way this works. One thing is clear at this point: if you care about fairness, you can have quick answers or you can have right answers, but not both.

3.3 Discrimination to reduce bias

A key example where discrimination is not considered unfair is affirmative action, a common policy at many US universities that has evoked a lot of controversy, reaching the Supreme Court multiple times. In Europe this is sometimes known as positive discrimination, since it’s a form of discrimination that aims to counterbalance pre-existing unfairness. The idea is that some people start the race a few miles behind due to societal factors out of their control, so if you want to make the race fairer, you might as well push them forward to even things up.

There’s no consensus as to whether this is okay. Taking the US example (since we started with Berkeley’s case), some forms of affirmative action were given the green light by the US Supreme Court, first in 1978 (Regents of the University of California v. Bakke) and then in 2003 (Grutter v. Bollinger); but in 2023 this was overturned, as it was considered a kind of unequal treatment that was unconstitutional (Students for Fair Admissions v. Harvard). Interestingly, both sides ground their arguments in equality or fairness, but one considers the equality of the process alone, while the other considers the equality of the process plus societal factors.

Setting that discussion aside, since positive discrimination is compensatory in nature, one interesting challenge it faces is assessing the strength of the societal factors it aims to compensate for. How far forward should the disadvantaged be pushed? The most common answer is up to population parity, which is essentially a no-bias criterion; what a comeback! When enforced strictly this is known as a quota system, which is actually nothing new: it has been used in many places around the world throughout history.
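The parity criterion itself is easy to turn into a number. A minimal sketch (toy data, hypothetical group names): a strict quota system would drive this gap to zero.

```python
def demographic_parity_gap(outcomes):
    """Difference between the highest and lowest selection rates across
    groups; 0 means exact population parity (the no-bias criterion)."""
    rates = {g: sum(xs) / len(xs) for g, xs in outcomes.items()}
    return max(rates.values()) - min(rates.values())

# 1 = selected, 0 = not selected (toy data, illustrative group names)
outcomes = {
    "group_a": [1, 1, 0, 0],  # 50% selected
    "group_b": [1, 0, 0, 0],  # 25% selected
}
print(demographic_parity_gap(outcomes))  # 0.25
```

Note that this measure says nothing about *why* the rates differ, which is precisely the point debated above.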

A famous example is that of the reforms introduced in the imperial examination system (the selection process for government officials in Imperial China) by Wu Zetian, the one and only (literally3) Chinese Empress, in the 7th century CE. At the time, the government was mainly controlled by a few highly influential aristocratic families near the capital. Among other reforms, Wu Zetian introduced regional quotas with the goal of integrating the local elites on the periphery of the empire, which in turn was aimed at increasing the cohesiveness of the state and reducing the chances of internal conflict or secession.

Wu Zetian
Wu Zetian. Source: Wikipedia.

This may feel like a digression, but it points to a crucial feature of fairness: it makes society better off as a whole. In fact, American universities in court have generally argued not so much for compensatory discrimination4, but for the positive impact of diversity on the quality of campus life and on creating better formative experiences for their students.

Still, coming back to a recurring theme of this post, it’s not that simple. In the last US Supreme Court case (2023), at the center of the story was the claim that Harvard’s admission process discriminated against Asian Americans. This minority tends to be overrepresented in academic settings because they often come from cultures where education is deeply valued, and they are under a lot of pressure to deliver good results. Should this count as an unfair advantage? If fairness and diversity are in conflict, which should prevail?

Once again, there’s no easy answer, or at least not a universal one5, but that doesn’t mean there’s no point in trying; quite the opposite. Some answers are clearly better than others, and it’s by digging deeper into each of them that we get increasingly better at making the right choices.

Wrapping up

The topic is deep, and there are many more things to discuss, but I hope this served as a thought-provoking introduction that encourages you to engage more with the matter. If there are just three things to take home, I’d suggest the following:

  1. Being precise in the definitions of the concepts we handle is key in order to come up with measures of fairness that truly reflect our values.
  2. There are multiple ways to measure fairness, which can be contradictory or orthogonal. It’s crucial to have a deep understanding of what fair means to you. There’s no universal solution; it requires contextual knowledge of the problem at hand.
  3. Giving more visibility to matters of fairness, through well-defined and clear measures, helps drive the conversation forward and iteratively reach better solutions for all.

Footnotes

  1. Note that it’s the measure that’s objective, not the definition of fairness upon which the measure is based. Many of the things we care about are subjective after all, but introducing a little bit of objectivity helps make processes clearer and more transparent, which is a necessary condition for continuous improvement.

  2. This is, by the way, called selection bias: when the sample you have is not representative of the population, our general intuitions may no longer be valid. Interestingly, it seems that introducing one kind of bias can cause other biases (potentially in the opposite direction) down the road.

  3. But also figuratively, she was indeed a remarkable ruler. This made me think of reviewing the female rulers I know and comparing them with their male counterparts. I didn’t make a comprehensive analysis by any means, but it seemed to me that female rulers were on average more competent, which would be evidence in favour of how a negative selection bias creates a “positive competence bias in the selected”.

    I once found a table scoring Roman emperors according to their competence (which I loved by the way); if someone can find a compilation of scores for rulers throughout history and the world, we could compare the average rating for males and females, and assess more rigorously if the negative selection bias meant that the few women that got there were generally more competent.

  4. Explicit quota systems were considered unconstitutional from the start anyway. If you want to read about Harvard’s defence, here you have the court memorandum for the aforementioned case (the file corresponds to the Massachusetts ruling, before it was elevated to the Supreme Court). For a more academic discussion, Michael Sandel has a great lecture, delivered precisely at Harvard as well.

  5. For reference, Google lists three different fairness criteria.