By Christoph Heitz (ZHAW)
Translated from the original German version published at Inside IT
Can a prisoner be released early, or released on bail? A judge making this decision should also consider the risk that the person to be released will reoffend. Wouldn’t it be an advantage to be able to assess this risk objectively and reliably? This was the idea behind the COMPAS system developed by the US company Northpointe.
The system makes an individual prediction of the probability of recidivism for imprisoned offenders, based on a wide range of personal data. The result is a risk score between 1 and 10, where 10 corresponds to a very high risk of recidivism. The system has been used for many years in various U.S. states to support judges’ decision-making – more than one million prisoners have already been evaluated with COMPAS. The advantages seem obvious: the system produces an objective risk prediction that has been developed and validated on the basis of thousands of cases.
In May 2016, however, the investigative newsroom ProPublica published research suggesting that this software systematically discriminates against black people and overestimates their risk (Angwin et al. 2016): 45 percent of black offenders who did not reoffend after their release had been classified as high-risk. In the corresponding group of whites, only 23 percent were assigned a high risk by the algorithm. The probability of being falsely assigned a high risk of recidivism is thus nearly twice as high for a black person as for a white person.
Algorithm-based software systems like COMPAS affect more and more areas of our lives – often in the background, without those affected being aware of it. They make decisions autonomously or, as in the case of COMPAS, support human decision-makers. Algorithms influence which job applications a human resources manager reads, who gets a loan to buy a house, in which urban areas police surveillance is intensified, which unemployed people receive support and which do not, and who gets to see which kind of information. The basis for these decisions or recommendations is always data – in most cases personal data.
There are good reasons for the increasing use of algorithms: in many cases, algorithms consistently make better decisions than humans. As computer programs, they are objective and incorruptible, can be trained on millions of data records, have no prejudices, and their decisions are reproducible.
Nevertheless, the example of COMPAS shows that important social values such as justice, equal opportunity, and freedom from discrimination can be at risk. Algorithms can create social injustice. COMPAS is one of the most cited examples, but there are many others. A study published in September 2019 by the German Federal Anti-Discrimination Agency describes no fewer than 47 documented cases of discrimination by algorithms (Orwat 2019).
This type of discrimination is particularly critical because it is usually not built into the algorithms intentionally, and is often detected only much later – or not at all. The problem of “algorithmic fairness” has therefore been on the radar of science and society only for a few years.
For many years, discussions about Big Data focused on the problem of data protection: who is allowed to do what with personal data, and how can it be protected against unauthorized use? The EU’s General Data Protection Regulation (GDPR), in force since 2018, was developed in this spirit.
In the meantime, however, data-based decision-making systems have spread massively and are rapidly penetrating more and more areas of life, where they have a very concrete impact on the lives of countless people. In recent years, the issue has therefore been the subject of increasing debate: how are people and our society as a whole affected when such algorithms increasingly control our lives? How do we ensure that social achievements and values are not thrown overboard suddenly and perhaps even unnoticed?
Can algorithms be fair?
An algorithm is a rule that computes an output from input data – for example, the probability that a person will reoffend. Such a calculation rule can be derived with different methods, for example statistical modeling or machine learning. The rule is usually optimized on training data, so that the result is an “optimal” calculation rule – for example, one that predicts future offenses as accurately as possible. The outputs of such an algorithm are then used as the basis for a decision.
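To make this concrete, here is a minimal sketch of how such a calculation rule can be derived from training data and turned into a risk score. It is not the COMPAS model, whose features and method are proprietary; the data and the two features below are purely hypothetical.

```python
# Minimal sketch: deriving a "calculation rule" (risk score) from training data.
# The data is synthetic and the features are placeholders; COMPAS itself uses
# a proprietary questionnaire and model that is not reproduced here.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical training data: one row per person (e.g. age, prior offenses);
# y_train records whether the person actually reoffended.
X_train = rng.normal(size=(1000, 2))
y_train = (X_train[:, 0] + 0.5 * X_train[:, 1] + rng.normal(size=1000) > 0).astype(int)

# "Optimizing the rule": fit a model that predicts recidivism from the features.
model = LogisticRegression().fit(X_train, y_train)

# The learned rule maps a new person's data to a probability of recidivism,
# which can then be binned into a score from 1 (low) to 10 (very high risk).
prob = model.predict_proba(rng.normal(size=(1, 2)))[0, 1]
score = max(1, int(np.ceil(prob * 10)))
print(f"predicted recidivism probability {prob:.2f} -> risk score {score}")
```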
An algorithm is objective, incorruptible, unemotional, and always works the same way. But does that make it fair or just? The sobering answer is no: data-based algorithms are usually not fair. The reason is simple: the goal of developers is not to produce fairness, but good predictions. And the algorithm that makes the best predictions is at best fair by chance – usually it is unfair, as countless concrete examples show. So we are right to be seriously concerned.
This leads to a second question: if fairness does not come automatically, can fairness be built into algorithms? It can. In recent years there has been intensive research on this topic worldwide, and there is now a great deal of knowledge about how to develop fair decision algorithms.
In fact, the fairness of algorithms can be measured. This is exactly what the ProPublica reporters did with the COMPAS system: what percentage of blacks who did not reoffend was rated “high risk” by the algorithm, and what percentage of whites? Unlike human decision-makers, algorithms can be audited on a large number of cases to determine their properties, including their fairness properties.
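As a sketch of how such a measurement works (using illustrative toy data rather than the actual ProPublica dataset), one can compute the false positive rate separately for each group: the share of people who did not reoffend but were nevertheless labelled high risk.

```python
# Group-wise false positive rates, ProPublica-style: among people who did NOT
# reoffend, what fraction was nevertheless labelled "high risk"?
# The data below is illustrative, not the actual COMPAS dataset.
import pandas as pd

df = pd.DataFrame({
    "group":      ["black", "black", "black", "white", "white", "white"],
    "high_risk":  [1, 0, 1, 0, 0, 1],   # the algorithm's label
    "reoffended": [0, 0, 1, 0, 0, 1],   # the observed outcome
})

# Keep only those who did not reoffend, then compute the share labelled high risk.
did_not_reoffend = df[df["reoffended"] == 0]
fpr_by_group = did_not_reoffend.groupby("group")["high_risk"].mean()
print(fpr_by_group)  # ProPublica reported roughly 0.45 (black) vs. 0.23 (white)
```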
What exactly is fairness?
It is precisely at this point that things become ambiguous, because it is not clear how exactly fairness should be measured. Arvind Narayanan of Princeton University has catalogued no fewer than 21 different fairness criteria used in the technical literature (Narayanan 2018). Crucially, it can be proven mathematically that many of these criteria are mutually exclusive: they cannot all be satisfied at the same time (Chouldechova 2017).
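One way to see why such conflicts are unavoidable is an identity derived by Chouldechova (2017). Within any group, it ties together the false positive rate (FPR), the false negative rate (FNR), the positive predictive value (PPV), and the group’s base rate of recidivism p:

```latex
% Chouldechova (2017): within a group with recidivism base rate p,
% the error rates and the predictive value are linked by
\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr)
```

If two groups have different base rates p, a score cannot have equal PPV and equal FNR in both groups without their FPRs differing; calibration-style fairness and equal error rates therefore cannot hold simultaneously, which is exactly the tension at the heart of the COMPAS debate.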
A closer analysis shows that this is not a technical problem but an ethical one. The various statistically defined, measurable fairness criteria correspond to different notions of social justice. For example, “fairness” can mean that the same rules must apply to everyone. But “fairness” can also mean that everyone should have the same opportunities – which is not the same thing.
The example of school grades in physical education shows that these two ideas of fairness can be mutually exclusive: if girls and boys get a 6 (the top grade in Swiss schools) for the same distance in the long throw – the same rules for everyone – girls obviously have worse chances of getting a good grade. In this case we cannot demand equal rules for all and, at the same time, equal opportunities for all. This applies to every decision-making mechanism, including algorithms.
Someone must therefore decide what kind of fairness or social justice an algorithm should ensure. This requires an ethical debate in which competing values usually have to be weighed against each other. In the political arena we are used to such discussions. The field of algorithmic fairness is different: here, the connection between a concrete debate about values on the one hand and the technical implementation in decision algorithms on the other is still in its infancy. How can we connect ethical discourse with engineering? Where do ethicists and engineers find the common ground on which socially acceptable algorithms can be developed?
Interdisciplinary collaboration in Zurich
Initial approaches are currently being developed in an interdisciplinary research collaboration between ZHAW (School of Engineering) and the University of Zurich (Ethics). This research will help to ensure that the undeniable advantages of modern data-based decision algorithms create practical benefits without harming our social values.
References
- Angwin, Julia; Larson, Jeff; Mattu, Surya; Kirchner, Lauren (2016): Machine bias: There’s software used across the country to predict future criminals. And it’s biased against blacks, in: ProPublica, online.
- Chouldechova, Alexandra (2017): Fair prediction with disparate impact: A study of bias in recidivism prediction instruments, in: Big Data, vol. 5, no. 2, pp. 153-163.
- Narayanan, Arvind (2018): FAT* tutorial: 21 fairness definitions and their politics.
- Orwat, Carsten (2019): Diskriminierungsrisiken durch Verwendung von Algorithmen. Berlin: Antidiskriminierungsstelle des Bundes.