Keeping all data is no longer an option

Data production is growing at around 25% per year. Humans produced about 60 zettabytes of data in 2020, and Statista estimates that more than 2,000 zettabytes will be produced annually by 2035. This growth rate is wholly unsustainable, and it will have cataclysmic impacts on the environment if it is not radically reduced.
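These growth figures can be checked with simple compound-growth arithmetic. The sketch below is a minimal illustration, assuming a perfectly constant annual growth rate (real growth fluctuates from year to year). The helper names `project_volume` and `implied_rate` are hypothetical, not part of any published model. It projects the 2035 volume from the 2020 baseline of 60 zettabytes at 25% a year, and works out the constant rate that would be needed to reach 2,000 zettabytes.

```python
# Hypothetical sketch: annual data creation under constant compound growth.
# The 60 ZB (2020) baseline and ~25% rate come from the article; a perfectly
# constant rate is a simplifying assumption.

def project_volume(base_zb: float, rate: float, years: int) -> float:
    """Annual volume after `years` of compound growth at `rate`."""
    return base_zb * (1 + rate) ** years


def implied_rate(base_zb: float, target_zb: float, years: int) -> float:
    """Constant annual rate needed to grow from base_zb to target_zb."""
    return (target_zb / base_zb) ** (1 / years) - 1


years = 2035 - 2020
v2035 = project_volume(60, 0.25, years)
r = implied_rate(60, 2000, years)
print(f"2035 at 25%/yr: {v2035:,.0f} ZB")
print(f"rate implied by 2,000 ZB: {r:.1%}")
```

Under these assumptions, a constant 25% rate gives roughly 1,700 zettabytes a year by 2035, while reaching 2,000 zettabytes implies a rate of about 26%. The two estimates land in the same range, which is what "around 25%" suggests.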

The culture of information technology is to store everything because you never know what might be important in the future. That was never a wise strategy, but IT departments and CIOs could get away with it when data volumes and storage costs were both relatively low. Today, data storage can eat up 30% of an IT department’s budget.

There are two challenges here: what to store, and what not to create in the first place. According to Bob Clark, director of archives at the Rockefeller Archive Center in the US, the rule of thumb among professional archivists is that at most 5% of stuff is worth saving. In my experience of almost thirty years working with data and content, 90% of data in practically any environment can be easily deleted, and things will work better for it.

Someone in the organization needs to start actually managing data. Right now, too many IT departments behave like crude warehouses or, more accurately, data landfills. IT sees its job as adding more space to dump data into. It is not asking the crucial questions:
Why are we storing this?
Why are we creating it in the first place?

When new IT systems are installed, often the old systems don’t get properly decommissioned and all the data—regardless of quality—gets migrated to the new system. “They’re not performing an overall analysis of why we have got that particular application and its physical hardware,” data center expert John Booth told me. “Why are we moving something that’s already zombie into the Cloud? A lot of IT departments treat every single application that they have as mission critical, when actually it certainly isn’t.”

Just because you can create data doesn’t mean you should. It is simply not sustainable to create thousands of zettabytes of data every year. Almost 100 zettabytes of data may have been created in 2022. Storing it all required about 70 million servers, and manufacturing each server generates between one and two tons of CO2. Storing 2,000 zettabytes would require about 1.5 billion servers. That’s not sustainable.
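The server figures follow roughly from scaling the 2022 numbers linearly. This is a rough sketch, assuming the number of servers grows in direct proportion to the data stored (in practice storage density improves, so the count may grow more slowly). The constants and function names are illustrative, not taken from any published model.

```python
# Hypothetical sketch: scale server count and manufacturing emissions
# linearly with stored data. Baselines are from the article (~100 ZB needing
# ~70 million servers; 1-2 tons of CO2 per server manufactured). Linear
# scaling is an assumption: storage density improves over time.

SERVERS_PER_ZB = 70_000_000 / 100      # ~700,000 servers per zettabyte
CO2_TONS_PER_SERVER = (1.0, 2.0)       # low and high estimates


def servers_needed(zettabytes: float) -> float:
    """Servers required to store `zettabytes`, assuming linear scaling."""
    return zettabytes * SERVERS_PER_ZB


def manufacturing_co2_tons(servers: float) -> tuple[float, float]:
    """Low/high tons of CO2 emitted manufacturing `servers` machines."""
    low, high = CO2_TONS_PER_SERVER
    return servers * low, servers * high


for zb in (100, 2000):
    servers = servers_needed(zb)
    low, high = manufacturing_co2_tons(servers)
    print(f"{zb:>5} ZB -> {servers / 1e9:.2f} billion servers, "
          f"{low / 1e6:,.0f}-{high / 1e6:,.0f} million tons CO2")
```

Under this assumption, 2,000 zettabytes needs about 1.4 billion servers, close to the 1.5 billion figure, and manufacturing them would emit somewhere between roughly 1.4 and 2.8 billion tons of CO2.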

Data growth is out of control. Most data is useless. The emergence of AI, automation, and the Internet of Things means we are only at the beginning of the data explosion. The cost of creating, managing, analyzing and storing data is going to vastly outstrip the value it creates.

We must create much less data of a much higher quality. That will require a huge cultural shift among technology professionals. We will need many more data editors, whose primary job will be to decide what data not to create. We will need many more data archivists, whose primary job will be to decide that 95% of data already created needs to be deleted.

"Your Memories. Their Cloud.", Kashmir Hill, The New York Times, 2022

State of Unstructured Data Management Report, Komprise, 2022

Podcast with John Booth: Data centers: Data theatre and the tsunami of frivolous data
