Kevin's blog

In these posts I elaborate on my thoughts and concerns regarding science, technology and society.

Regulating AI: Discrimination and Data Efficiency

Last September, Wired released an article, The Fight to Define When AI Is ‘High Risk’, which addresses the shared concern of more than 300 organizations about the EU's Artificial Intelligence Act and whether it sufficiently safeguards people against potential harm from AI and the effects of its applications.

Tech giants like Google and Facebook are, rather obviously, against what they style as over-regulation, or intend to differentiate between suppliers of 'generic' AI and its deployers, as the article states. Here the term 'generic' is important. As companies and public organizations increasingly move parts of their infrastructure to 'the cloud', and software (like our devices) becomes less a thing we own and more a subscription-based service (SaaS), the greatest profit lies in developing widely deployable software with little if any possibility of local control. Even public management organizations increasingly rely on external companies for the software that manages the public realm and its data.

What strikes me is that public policy making at the national and EU levels includes hardly any of the software engineers and data scientists who are in fact developing AI and algorithms. It seems to me that as long as we don't include the actual developers who write software with increasingly autonomous functions, we will never really understand what we are regulating, and will therefore fall short. And I don't think this is a one-way street either: from what I've seen of rather large organizations and companies, there are often only a handful of engineers or system architects who can be said to truly understand the linkages between the multitude of (legacy and new) systems in use, and the ephemeral spaces potentially created by easy fixes and shortcuts in IT system development.

Of course, this might be over-complicating the issue. But I wonder about the relationship between the IT infrastructures we have created in the past and the continued development of software that is increasingly adaptive to, and integrated with, these systems and the public domain. There's a distinct dimension to AI which continues a cyber-cultural tradition of digital utopianism or tech optimism, in which solutions for social problems are sought in the technical domain.

I have met data scientists who assured their managers that of course the algorithm "does not discriminate". Obviously the algorithm discriminates. Algorithms do nothing but discriminate on the basis of input: they are devices of calculated discrimination; it is their singular function. I do, of course, understand the social associations implied by discrimination, and the discourse that an algorithm should not discriminate against marginalized people of any kind. But I'd like to stress a difference in meanings here. When data scientists talk about discrimination in the social sense, they don't talk about discrimination. They talk about data efficiency and how it relates to distributional shift, which increases the safety and robustness of models. These words and concepts comprise the realm which should be included in the further invention of the ethics of AI. In this realm, sadly, human oversight is considered "particularly costly".
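To make the point concrete in the technical sense, here is a deliberately toy decision rule (entirely made up for illustration; the thresholds, postcodes and function name are my own assumptions, not any real system). A classifier is, by construction, a device that discriminates between inputs; the question is which inputs it is allowed to discriminate on:

```python
def approve_loan(income: float, postcode: str) -> bool:
    """Toy decision rule: discriminates on income -- and, more
    problematically, on postcode, which can act as a proxy for
    protected attributes like ethnicity or class."""
    high_risk_postcodes = {"1104", "3072"}  # invented values
    return income > 30_000 and postcode not in high_risk_postcodes

# Two applicants with identical income get different outcomes
# purely because of where they live:
print(approve_loan(45_000, "1011"))  # True
print(approve_loan(45_000, "1104"))  # False
```

The rule never mentions any protected attribute, yet it still discriminates in the social sense, which is exactly why "the algorithm does not discriminate" is such a hollow assurance.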

The question I raise is whether we understand to what great extent we would need to guide technological developments in order to create sufficient feedback loops at governmental levels. Is our current neoliberal capitalist system sufficiently equipped to deal with the organizational change necessary to think critically about these socio-technically relevant questions, if the next big profit lies around the corner?

The 'High Risk' label for AI, which Wired mentions as a positive direction, does not seem a bad thing to me. However, I do think this notion reveals our cultural immaturity when it comes to thinking about technology and innovation on a grand scale. Labels might give legitimacy to certain developments and not to others, but the idea of a label is that it can be taken off and swapped easily. The more labelling we do, the more we need organizations to keep track of these labels, without them turning into lethargic bureaucratic monstrosities or underfunded agencies. I wonder whether, at a time when EU oversight of data and privacy protection is so underfunded, we understand what it takes to commit to the regulation of AI and advanced technologies.

Situating the Project: Engaging Multiple Disciplines

This post was first uploaded on 25 February 2022.

As Erik Tjong Kim Sang mentioned in the first news update, he and Stefan Andrade have been working on the project for several months now, both part-time. One of the main roles on the project, that of a full-time post-doctoral researcher (me), started at the beginning of this year. I'm very excited about this project, as it touches on many dimensions I find important and interesting. In this news update, I will briefly elaborate on these dimensions.

Narrative Research on Covid-19

Narratives are rich and detailed accounts of how people make sense of what happens in their lives. Although what a narrative is, is interpreted differently by different authors, how people story, give words to, take a position in, and attribute (non)agency in a story (for instance) matters for their well-being and outlook on life, and is also important for what comes to matter in a social and political sense. Stories about Covid-19 are a particularly relevant contemporary example of this mattering practice of narratives.

Doing narrative research also means reducing the complexity of people's stories into abstract data, and we should take care that nothing salient gets lost in the process. For instance, dealing with increasing amounts of data could mean that we, as human researchers, do not stay close to the meanings embedded in texts. We skim over them (perhaps for lack of time), while trusting programs to make decisions on the relevance of particular phrases for our research, and taking those decisions at face value. This is something we want to avoid.

One of the difficult tasks of this project becomes clear here: we are going to translate people's stories into data points interpretable by computer programs, for the sake of training black-boxed algorithms. This means that the team is going to make decisions on what is and is not important in other people's stories, with the risk of either attributing meaning where the text intends none, or mislabeling the meaning of words, sentences, sentiments, or actions which really should have different sorts of meaning. Of course, these decisions could also be data-driven, e.g., through explorative data analyses. Moreover, there is the risk of misinterpretation or bias, and of data inefficiency (more on this in a later post). In other words, the choices made on a daily basis by this team are social, political, and technical choices, inextricably intertwined. It's my job, then, to keep this perspective in mind, especially when diving deep into the code (which, thankfully, is mostly done by my colleagues!).
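A minimal sketch of what this mislabeling risk can look like in practice (the word list and rule below are my own invented toy example, not our project's actual pipeline). A naive keyword-based labeler happily attaches a sentiment to a sentence whose actual meaning is the opposite:

```python
# Invented toy example: reducing a story fragment to a single label.
POSITIVE = {"fine", "happy", "well"}

def label_sentiment(sentence: str) -> str:
    """Label a sentence 'positive' if it contains any 'positive' word.
    This ignores negation, irony, and context entirely."""
    words = sentence.lower().strip(".!?").split()
    if any(w in POSITIVE for w in words):
        return "positive"
    return "neutral"

print(label_sentiment("I am happy again."))        # positive
print(label_sentiment("I am not happy anymore."))  # also positive -- mislabeled
```

The second sentence is mislabeled precisely because the program stays at the level of words rather than meaning, which is the kind of reduction we have to watch for when stories become data points.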

When is Multidisciplinarity? Or: Programming for the Social Sciences & Humanities (SSH)

One way to keep remembering how intertwined the social, political and technical dimensions are, is by means of an interdisciplinary approach to project-based work. I have worked both in 'soft-core' qualitative projects and in 'hard-core' technical development and operational teams - neither of which is really 'hard' or 'soft', by the way. One of the things I regularly noticed, however, is the tension between a push by business for 'deliverables' on the one hand and 'experiential experimentation' on the other. In many companies, the main concern is to deliver working software as soon and as often as possible. I'm not going into the neoliberal politics of this frame of thought here, but I will say that what constitutes working software in these contexts is always a matter of negotiation between the things which work (and are successful) and other promised features which don't work as nicely, and are often 'nuanced' just to continue the external story of success and growth.

In this research project, we want to do things a bit differently. This doesn't mean that we don't want to deliver working software, but it does mean that we primarily want to learn how to deliver meaningful programs. Because this is a research project, we're first and foremost interested in what computational analysis (such as Digital Story Grammar1) can mean for research in narrative psychology, and more broadly, the social sciences and humanities.
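To give a rough sense of the kind of analysis involved: Digital Story Grammar extracts grammatical building blocks, such as subjects, verbs and objects, from narrative text. The sketch below is a deliberately naive toy heuristic of my own, NOT the published method (which relies on proper dependency parsing), but it shows the shape of the output such an analysis aims at:

```python
def naive_svo(sentence: str):
    """Toy heuristic: treat the first word as subject, the second as
    verb, and the rest as object. Real narrative analysis uses full
    dependency parsing; this only illustrates the target structure."""
    words = sentence.strip(" .").split()
    if len(words) >= 3:
        return (words[0], words[1], " ".join(words[2:]))
    return None

print(naive_svo("I lost my job."))  # ('I', 'lost', 'my job')
```

Even a triple this crude hints at why the approach is attractive for narrative research: who acts, what they do, and what is acted upon can be aggregated across thousands of stories.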

Teaching Computers to Read: A Different Sort of Primary School

Translating computational analysis into language oriented towards the Social Sciences and Humanities (SSH), I could say we're dealing here with teaching computers how to read and interpret stories. To draw an analogy with how humans learn language: many primary schools start by teaching children to recognize simple letters, words and sentence structures. Teachers use (or used to use, when I was there) whiteboards and simple illustrations to instruct children, aided by assignments built on repetition. It is a humbling thought that this might be the language level we are dealing with in this project.
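Staying with the primary-school analogy (and with a toy example of my own, not our project's actual code): at this level, "reading" amounts to recognizing and counting the words the program has been taught, with no grasp of grammar or context at all:

```python
from collections import Counter

def read_like_a_beginner(text: str, known_words: set) -> Counter:
    """Count occurrences of taught words; everything else is invisible.
    No grammar, no context, no meaning -- just word recognition."""
    words = text.lower().replace(",", "").replace(".", "").split()
    return Counter(w for w in words if w in known_words)

story = "The lockdown was hard. The days felt long, and the house felt small."
vocabulary = {"lockdown", "days", "house", "long", "small", "hard"}
print(read_like_a_beginner(story, vocabulary))
```

Everything outside the taught vocabulary, including the word "felt" that carries the emotional weight of the sentence, simply does not exist for this reader.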

But of course, we want to arrive at answers to more difficult questions: how do we train computer algorithms to recognize (or even understand?) that narratives have contexts, subtleties, nuances, and often ambiguities, all of which are crucial for the meaning of narratives? In the words of literary scholar Kenneth Burke, we might approach this question by looking for the "... terms that clearly reveal the strategic spots at which ambiguities necessarily arise".2 And, the next step: how do we create visualizations that enable researchers to navigate the results of the computational analysis of texts? Of course, there are many differences between teaching humans to read and teaching computers to read. Humans are able to learn because our brains are quite plastic and nearly endlessly able to adapt to new situations, especially due to the 'programming' of our psychological unconscious and how it is embedded in neuronal networks.3 On this topic, and on the parallels and differences between the human unconscious and a computer unconscious (e.g., algorithms), I encourage you to read Babel: Could a machine have an unconscious?,4 on AI (GPT-3), large language models and the creating and solving of problems by technology, written by the brilliant Meghan O'Gieblyn and published in N+1 magazine.

Registering Changes in How People Make Sense of Times of Transition

Eventually, the point of the project is not simply to create programs for computational analysis. The point is rather to be able to analyze stories in times of transition, and to analyze more of them. This is a social and political goal: we want to register changes in how people make sense of times of transition. What did (does?) Covid-19 really mean for people in precarious situations? What do political measures against Covid-19 mean for those with chronic illnesses, and what does the reversal of those measures mean for them? As Anneke Sools5 asked in her research: how do people narrate their futures in uncertain times, and how can this knowledge improve our world? In collaboration with other organizations, we have access to (open) databases of these personal stories, which we can use to train and test our software, all the while staying close to the texts and to what's critical in these stories.

  1. Andrade, Stefan Bastholm, and Ditte Andersen. 2020. ‘Digital Story Grammar: A Quantitative Methodology for Narrative Analysis’. International Journal of Social Research Methodology 23(4):405–21. doi: 10.1080/13645579.2020.1723205. 

  2. Burke, Kenneth. 1969. A Grammar of Motives. Berkeley, California: University of California Press, p. xviii. 

  3. Malabou, Catherine. 2008. What Should We Do with Our Brain? New York: Fordham University Press. 

  4. O'Gieblyn, Meghan. 2021. Babel: Could a machine have an unconscious? in N+1 Magazine. 

  5. Sools, Anneke. 2020. ‘Back from the Future: A Narrative Approach to Study the Imagination of Personal Futures’. International Journal of Social Research Methodology 23(4):451–65. doi: 10.1080/13645579.2020.1719617.