Around the year 2007, for the first time, we created a zettabyte of data in a 12-month period. In that single year, we created more data than humanity had created in the 20th, 19th, 18th and all previous centuries of civilization combined. By 2010, we were creating two zettabytes a year. By 2035, we will be creating more than 2,000 zettabytes a year. This is not sustainable.
A zettabyte is an almost unimaginable number. I calculated that to print out one zettabyte of data you’d need the paper from 20 trillion trees. There are only about 3.5 trillion trees left on the planet, so that’s not possible. That’s one zettabyte. Try to imagine 2,000.
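The arithmetic behind that figure is easy to check. Here's a rough sketch, assuming about 5,000 bytes of text fit on a printed page and that one tree yields roughly 10,000 pages; both are back-of-envelope assumptions, not precise figures:

```python
# Back-of-envelope check of the "20 trillion trees" figure.
# Assumptions (rough, not precise): ~5,000 bytes of text per printed
# page, and roughly 10,000 printable pages from one tree.

ZETTABYTE = 10**21         # bytes in one zettabyte
BYTES_PER_PAGE = 5_000     # rough plain-text capacity of one page
PAGES_PER_TREE = 10_000    # rough paper yield of one tree

pages = ZETTABYTE / BYTES_PER_PAGE   # pages needed to print 1 ZB
trees = pages / PAGES_PER_TREE       # trees needed for that much paper

print(f"Pages needed: {pages:.1e}")                   # ~2.0e+17 pages
print(f"Trees needed: {trees / 1e12:.0f} trillion")   # ~20 trillion trees
```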
Statista estimates that we store only about 2% of the data we create in any one year. In other words, 98% of the data produced each year is used once, or not at all, and then discarded. By 2020, the world was storing about seven zettabytes in total. In 2023, we will produce about 100 zettabytes, and data centers will hold about 10 zettabytes in total.
In 2035, just 12 years away, Statista estimates that yearly data production will exceed 2,000 zettabytes, with somewhere in the region of 200 zettabytes being stored. That is roughly 20 times what data centers hold today, which means we would need to build 20 times more data centers than we have now.
It’s estimated that data centers consume 1–2% of total global electricity and somewhere in the region of 2% of global water use. In 2022, about 70 million servers were in use storing data. Manufacturing each one emitted 1–2 tons of CO2, and that year about 20 million of them became e-waste. A twentyfold increase in all of the above, during a period when we should be radically reducing energy consumption, and when we are heading into a severe water crisis, should be unthinkable.
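To make that twentyfold increase concrete, here is the same back-of-envelope arithmetic scaled up, using only the figures above:

```python
# Scaling the 2022 server figures from the text by twenty.
servers_2022 = 70_000_000   # servers in use in 2022
co2_per_server = (1, 2)     # tons of CO2 emitted to manufacture one server
ewaste_2022 = 20_000_000    # servers becoming e-waste per year

scale = 20                  # the projected twentyfold increase

servers_scaled = servers_2022 * scale                 # 1.4 billion servers
co2_low = servers_scaled * co2_per_server[0] / 1e9    # ~1.4 billion tons
co2_high = servers_scaled * co2_per_server[1] / 1e9   # ~2.8 billion tons
ewaste_scaled = ewaste_2022 * scale                   # 400 million per year

print(f"Servers: {servers_scaled / 1e9:.1f} billion")
print(f"Embodied CO2: {co2_low:.1f}-{co2_high:.1f} billion tons")
print(f"E-waste: {ewaste_scaled / 1e6:.0f} million servers per year")
```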
People say, don’t worry, innovation will solve the problem. Some cite Moore’s Law, which for many decades saw computer processing become more powerful and cheaper. The consensus is that Moore’s Law is in severe decline. A related phenomenon, Dennard scaling, for decades delivered significant reductions in the power consumption of processors: as transistors shrank, their power density stayed roughly constant, so chips could get faster without using more power. Dennard scaling began to break down around 2005–2007. The basic reason behind the decline of both Moore’s Law and Dennard scaling is physics. Transistors have become so small that they are reaching hard physical limits.
The best definition of Big Data I’ve ever come across is: “When the cost of storing data is less than the cost of figuring out what to do with it.” At least 90% of stored data is crap. The growth of data is yet another story of modern human waste. So bad is the state of data ‘management’ that most organizations have no clue how much data they have. The Cloud and data centers have a wonderful business model. They are data dumps that charge luxury hotel prices. Tech consultancies have a wonderful business model. They provide dump trucks to ‘migrate’ vast quantities of crap data to the Cloud while charging limousine prices. It’s all a con, but then so much of modern life is a con.
In the middle of multiple environmental crises, we have a technology industry that has grown rich on producing and storing crap data. We are destroying our environment for crap and waste.