Toxic data, waste: feeding AI’s voracious addiction

Our data is our society. Our data is our culture. Our data is our attitudes. Our data is our prejudices. Our data is our history. And our history is full of inaccurate, out-of-date, irrelevant, poor quality, racist, misogynist, cruel and biased data. And all this data is incredibly poorly managed. In fact, much of it we don’t even know exists anymore.

People and organizations are really, really bad at managing the data they produce. And with the advent of modern technology, we have become much, much worse. I see far worse information architecture, classification and metadata skills today than I did in 2000. It is disturbing that in an age when data is exploding, the skills to professionally manage and organize it are in decline.

Blame human laziness and our obsession with convenience. The core promise of information technology has been that we no longer need to think about these problems. Data centers solved the big data problem by giving us storage so cheap and convenient that we can store basically everything. Search engines then help us find everything, so we don’t need to organize anything. And then along comes AI, which promises to make things even better.

We are feeding and training AI on massive quantities of the worst possible data. This data is not simply wrong. It reflects the worst of human society. The misogyny, the racism, the greed, the cruelty, the immature machismo upon which Silicon Valley has fed for so long.

A recent study found that AI is reintroducing “harmful, race-based medicine.” The researchers tested a range of AI health systems and found that “All models had examples of perpetuating race-based medicine in their responses. Models were not always consistent in their responses when asked the same question repeatedly.” They concluded that AI “could potentially cause harm by perpetuating debunked, racist ideas.” A few years ago, I read about an AI health system that was misdiagnosing women presenting with heart attack symptoms because it had been trained on forty years of data from misogynist male doctors who had deliberately misdiagnosed women. A society with a history of racism and misogyny will create racist, misogynist AI.
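
The mechanism is worth spelling out, because it is mundane: a model trained on biased records inherits the bias. Below is a minimal synthetic sketch of how historical misdiagnosis of women would translate directly into an AI that under-diagnoses women. The dataset, the 60% dismissal rate and the simple logistic model are all invented for illustration; nothing here is from the study itself.

```python
# Toy sketch: a model trained on biased historical labels reproduces the bias.
# All data is synthetic; rates and features are assumptions for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

sex = rng.integers(0, 2, n)                  # 0 = male, 1 = female
symptom = rng.normal(0, 1, n)                # a single "chest pain severity" score
true_mi = (symptom + rng.normal(0, 0.5, n) > 1).astype(int)  # ground truth

# Historical labels: many true heart attacks in women were dismissed,
# so the recorded diagnosis is "negative" despite identical symptoms.
label = true_mi.copy()
dismissed = (sex == 1) & (true_mi == 1) & (rng.uniform(size=n) < 0.6)
label[dismissed] = 0

# Train on the biased record, exactly as a naive health AI would.
X = np.column_stack([symptom, sex])
model = LogisticRegression().fit(X, label)

# Same symptoms, different sex: the model assigns women a lower risk.
test = np.array([[1.5, 0.0], [1.5, 1.0]])
print(model.predict_proba(test)[:, 1])       # predicted heart attack probability
```

With identical symptoms, the model assigns the woman a markedly lower probability of heart attack, purely because the training labels taught it to.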

It is also true that AI systems are, at multiple levels, remarkably stupid and easy to fool and undermine. A tool called Nightshade allows visual artists to add invisible “poison” to their images that can severely damage the AI models that train on them. Right now, scammers, grifters and state propagandists are feeding AI masses of fake information in order to rewrite the past and the present.
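
To make that concrete, here is a minimal toy sketch of the principle behind Nightshade-style poisoning, not Nightshade’s actual algorithm: a perturbation small enough to be invisible to the eye is optimized so that the image’s machine-readable features drift toward an unrelated “decoy” concept. The random linear feature map, the 0.03 pixel budget and the step size are stand-in assumptions; real tools optimize against actual model feature extractors.

```python
# Toy sketch of Nightshade-style data poisoning (illustrative only).
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a model's image feature extractor:
# a random linear map from 64 pixels to 16 features.
W = rng.normal(size=(16, 64))

def features(x):
    return W @ x

original = rng.uniform(0, 1, size=64)   # the artist's image, flattened
decoy = rng.uniform(0, 1, size=64)      # an unrelated concept to mimic
eps = 0.03                              # per-pixel budget: change stays invisible

poisoned = original.copy()
target = features(decoy)
for _ in range(200):
    # Gradient of ||features(poisoned) - target||^2 with respect to the pixels.
    grad = 2 * W.T @ (features(poisoned) - target)
    poisoned -= 0.001 * grad
    # Project back into the invisibility budget and the valid pixel range.
    poisoned = np.clip(poisoned, original - eps, original + eps)
    poisoned = np.clip(poisoned, 0.0, 1.0)

print("max pixel change:", np.abs(poisoned - original).max())   # stays <= eps
print("feature gap before:", np.linalg.norm(features(original) - target))
print("feature gap after: ", np.linalg.norm(features(poisoned) - target))
```

Train an image model on enough images doctored this way and it begins to associate the artist’s subject with the decoy concept, which is how the poisoning degrades the model.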

Meanwhile, the people who have designed AI don’t even know how it truly works. They have moved so fast, broken so many things along the way, and fed AI all the data they could possibly find, that they now have systems so complex and opaque that only through use will their biases, lies and various nefarious capacities be discovered. The AI designers have let AI out into the wild, driven by greed and ambition, and by a profound ignorance and sense of entitlement that they can do whatever they want because they’re “innovating” and making change happen. As usual with technology, we launch first and then figure out how to deal with the harms. With AI, that may turn out to be one of the biggest mistakes we have ever made.

This new data poisoning tool lets artists fight back against generative AI, Melissa Heikkilä, MIT Technology Review, 2023

Large language models propagate race-based medicine, Jesutofunmi A. Omiye et al., npj Digital Medicine, 2023
