We need to talk about data. Crap data. We’re destroying our environment to create and store trillions of blurred photos and cat videos, to binge-watch Netflix, and to ask ChatGPT inane questions and get instant wrong answers. We’re destroying our environment to store copies of copies of copies of stuff we have no intention of ever looking at again. We’re destroying our environment to take 1.4 trillion photos every year. That’s more photos taken in a single year in the 2020s than were taken in the entire 20th century. Ten trillion photos and growing, stored in the Cloud, the vast majority of which will never be viewed again. Exactly as Big Tech wants it.
I have spent almost thirty years working with some of the largest organizations in the world, trying to help them better manage their content and data. More than 90% of commercial and government data is crap, total absolute crap. Period. It should never have been created. It certainly should never have been stored. The rise of digital saw an explosion of data crap production. Content management systems were like giving staff diesel-fueled diggers, where before they had only data shovels. I remember a conversation around 2010 with a Microsoft manager, who estimated that there were about 14 million pages on Microsoft.com at that stage, and that four million of them had never been visited. Four million, I thought. That’s roughly the population of Ireland, in pages that nobody has ever visited. Why were they created? All the time and effort and energy and waste that went into pages nobody had ever read. We are destroying our environment to create crap.
Everywhere I went, it was the same old story. Data crap everywhere. Distributed publishing that let basically anyone publish anything they wanted on the intranet, and nobody maintaining any of it. When Kyndryl, the world’s largest provider of IT infrastructure services, was spun off by its parent, IBM, it found data scattered across 100 disparate data warehouses. Multiple teams had multiple copies of the same data. After the cleanup, 90% of the data had been deleted. There are 10 million stories like this.
Scottish Enterprise had 753 pages on its website; just 47 of them, about 6% of the total, got 80% of visits. A large organization I worked for had 100 million visits a year to its website, with 5% of pages getting 80% of visits. Some 100,000 of its pages had not been reviewed in 10 years. “A huge percentage of the data that gets processed is less than 24 hours old,” computer engineer Jordan Tigani has stated. “By the time data gets to be a week old, it is probably 20 times less likely to be queried than from the most recent day. After a month, data mostly just sits there.” An analysis of the Southampton University public website found that 0.2% of pages got 90% of visits, and that only 4% of pages were ever visited. So 96% of its roughly four million pages were never visited at all. One organization had 1,500 terabytes of data, of which less than 2% had ever been accessed after it was first stored. There are 20 million more stories like these.
Most organizations have no clue what content they have. It’s worse. Most organizations don’t even know where all their data is stored. It’s even worse. Most organizations don’t even know how many computers they have. At least 50% of the data in a typical organization is sitting on some server, and nobody in management even knows it exists. The average organization has hundreds of unsanctioned third-party app subscriptions, paid for on some manager’s credit card, storing everything from project chats to draft reports to product prototypes.
The Cloud made crap data infinitely worse. The Cloud is what happens when the cost of storing data is less than the cost of figuring out what to do with the crap. One study found that data stored by the engineering and construction industry had risen from an average of 3 terabytes in 2018 to 26 terabytes in 2023, a compound annual growth rate of more than 50%. That sort of crap data explosion happened, and is still happening, everywhere. And this is what AI is being trained on. Crap data.
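For anyone who wants to check that growth rate, here is a minimal back-of-the-envelope sketch in Python, using only the 3-terabyte (2018) and 26-terabyte (2023) averages quoted above; the variable names and the script itself are mine for illustration, not the study’s.

```python
# Back-of-the-envelope check of the growth rate implied by the figures above:
# an average of 3 terabytes stored in 2018 rising to 26 terabytes in 2023.
start_tb, end_tb = 3.0, 26.0   # average terabytes stored, 2018 and 2023
years = 2023 - 2018            # five-year span

# Compound annual growth rate: the constant yearly growth that turns 3 TB into 26 TB.
cagr = (end_tb / start_tb) ** (1 / years) - 1
print(f"Compound annual growth rate: {cagr:.0%}")   # roughly 54%, i.e. more than 50% a year
```

At more than 50% a year, the volume of stored data roughly doubles every year and a half.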