Be very careful what you feed AI

When I was younger, I met a respected computer scientist. He explained how he had created a model for predicting the agricultural output of an island in the Pacific. As I listened I was in awe, thinking I was in the presence of a genius. He described a complexity that made my brain whirl. With his model he was predicting exact crop growth rates twelve months out.

After his description I was gobsmacked. “Of course,” he said after a few moments’ silence, “any change in precipitation has very major impacts on the model.”

“How likely is it that rainfall levels would change for that island?” I asked.

“Oh very likely, very likely indeed,” he replied.

What an enormous idiot, I thought to myself as I got on my bicycle. His whole model of tremendous complexity was essentially useless, because even the slightest change in rainfall had huge impacts on all the other variables. You might as well examine which way the hairs on the back of a donkey were standing and then predict next year’s crop.

Models can be hugely powerful. Models can be hugely wrong. And the geniuses that create models often make extraordinarily naïve assumptions. Models are not reality, and the problem is that some who work intensely on models think they are reality, think that they are reality makers. Models can be right until they go horribly wrong.

AI and all these models depend on data. AI cannot develop anything approaching intelligence without being fed with lots and lots of data. If my experience of almost thirty years of working with the data of some of the largest organizations in the world is anything to go by, then we should be genuinely fearful of the artificial ‘intelligence’ that will emerge.

Most data has huge social and historical bias. The prejudice of race, class and gender is written deep into data. And most data is incredibly poorly maintained. Not just that, most organizations I’ve dealt with have no clue what data they have, let alone what its quality is. What’s more, nobody cares. Even auditing the data that exists within an organization is seen as not worth the effort: there’s just too much of it, and it’s just too hard.

Organizations approach technology with a cult-like belief and magical thinking. They feel they can turn AI loose on all this data and that AI will make sense of it all. “Garbage in, garbage out” is a very old saying in the computer industry. Most data is garbage and needs serious cleaning. Only about 5–10% of data in a typical organization is actually useful. If you bring your AI to the garbage dump and let it feed off the garbage and the bias, then you get garbage AI.

Organizations claim that data is critical, and yet there is hardly anything that gets less care and attention than data. At least we take out our physical garbage. We don’t even do that with data garbage. We just store it, leave it lying there in the Cloud until the costs rise to such a point that we’re forced to do something.

Quality data is critical. Quality data should not be left to rot among the garbage data, because the vast quantities of garbage degrade the quality. Separating the quality from the garbage requires HW (human wisdom), that attribute that seems to get rarer as ‘intelligence’ increases.