Frugal data: Heavy data is a heavy polluter

Ninety per cent of data is waste. In more than 25 years of working with some of the largest organizations in the world, this has been my consistent experience. Data management ranges from amateur to non-existent.

The philosophy of practically every organization is to collect everything – we’ll sort it out later, and anyway it’s cheap to store. As a consequence, many organizations have no idea how much data they have. In fact, many organizations have no idea how much technology they have.

We need a new culture when it comes to data. One that is frugal. One that is conscious that a byte saved is a leaf saved, that one less byte is one less unit of energy, one less piece of stress on a physical device.

Richard Campos is a designer focused on creating the lightest, most environmentally friendly software possible. He’s been chatting with me about principles such as sending only the data that is absolutely needed as part of a request.

“A typical spreadsheet experience is to have a date column reference for every datum, in the format mm/dd/yyyy, which is 10 characters long,” he explains. “For a typical year's worth of analysis, this will have 365 days x 10 characters = 3650 plus 364 separators, giving 4014 characters of text to indicate all the dates. That’s enormously wasteful. So, I set up an algorithm that receives only the start and end dates with 1 separator and generates the entire calendar sequence from them (including leap year logic). Two dates and one separator = 21 characters of text instead of 4014, so that’s a 99.5% reduction in text weight to achieve the same goal of producing an indicator time series.”
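Richard's own implementation isn't shown here, so what follows is only a minimal Python sketch of the idea he describes: send two dates and one separator, then regenerate the full sequence on the receiving side. The function name `expand_date_range`, the comma separator and the year 2023 are assumptions made for illustration. Leap years are handled by Python's standard `datetime` module.

```python
from datetime import datetime, timedelta

DATE_FORMAT = "%m/%d/%Y"  # mm/dd/yyyy, the format mentioned in the quote


def expand_date_range(compact: str, separator: str = ",") -> list[str]:
    """Expand 'start<separator>end' into every date in between, inclusive.

    Hypothetical helper for illustration, not Richard's actual code.
    """
    start_text, end_text = compact.split(separator)
    start = datetime.strptime(start_text, DATE_FORMAT).date()
    end = datetime.strptime(end_text, DATE_FORMAT).date()
    if end < start:
        raise ValueError("end date precedes start date")
    day_count = (end - start).days + 1
    # date arithmetic takes care of month lengths and leap years
    return [
        (start + timedelta(days=offset)).strftime(DATE_FORMAT)
        for offset in range(day_count)
    ]


compact = "01/01/2023,12/31/2023"   # what is actually sent
dates = expand_date_range(compact)  # what is rebuilt on arrival
full_text = ",".join(dates)         # what would otherwise have been sent

saving = 1 - len(compact) / len(full_text)
print(len(compact), len(full_text), f"{saving:.1%}")
```

For a non-leap year this reproduces the figures in the quote: 21 characters sent instead of 4014, a saving of roughly 99.5%.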

I have come across hundreds if not thousands of examples like the above, where applying the right principles vastly reduces the amount of data being created and stored. Another principle is to store only what is absolutely necessary. In one example, I had a folder for processing survey results that contained 13.7 GB of data. I discovered huge duplication and unnecessary files, and after cleaning everything out, I was left with 1.07 GB of data, a 92% reduction. Nobody – nobody – wants to do this sort of work, and as a result most data centers are in fact data dumps. We’re using all these devices, wasting all this electricity, to store trash.
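That clean-up was done by hand, but the duplication part of the job can be partly automated. The sketch below is one possible approach, not the method actually used on the survey folder: it groups files by a SHA-256 hash of their contents and estimates how many bytes would be freed by keeping one copy of each. The function names are hypothetical, and it only finds exact duplicates. Deciding which files are simply unnecessary still takes a human.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's contents in chunks so large files are not loaded whole."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        while chunk := handle.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Return groups of files under root that have identical contents."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    # keep only the hashes shared by more than one file
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


def reclaimable_bytes(duplicates: dict[str, list[Path]]) -> int:
    """Bytes freed if one copy of each duplicated file is kept."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling `find_duplicates(Path("survey_results"))` on a folder of your own (the folder name here is made up) returns the duplicate groups without deleting anything, so the results can be reviewed before any file is removed.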

Richard works with organizations to help people, such as surgeons or accountants, make better decisions with the support of data. Often, that data is presented as charts. But charts, particularly when they are rendered as images, can be 50-100 times heavier than the equivalent text. Another Frugal Data principle is to present using the least data possible.

“You've challenged me to think about when charts are truly necessary, and I would argue that most of the time they aren’t,” Richard states. “The visual could easily be replaced with a single, action-focused text. For example, the budget management chart could be replaced with text that says, ‘10 days into the month, the budget is 70% spent. The historical average is 25%. It’s time to request the department leads to limit their spend to 30% over the remainder of the month.’ A simple narrative like that, I do think it applies to most analytics.”
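To make the idea concrete, here is a small Python sketch that turns three numbers into the kind of sentence Richard describes. It is an illustration only: the function `budget_summary` and its parameters are hypothetical, and the rule that the advice appears only when spending runs ahead of the historical average is an assumption, not something Richard specified.

```python
def budget_summary(day_of_month: int, spent_pct: float, historical_pct: float) -> str:
    """Build a short, action-focused budget message from three numbers."""
    remaining_pct = 100 - spent_pct
    message = (
        f"{day_of_month} days into the month, the budget is {spent_pct:.0f}% spent. "
        f"The historical average is {historical_pct:.0f}%."
    )
    # assumption: only recommend action when spending is ahead of the usual pace
    if spent_pct > historical_pct:
        message += (
            " It's time to request the department leads to limit their spend "
            f"to {remaining_pct:.0f}% over the remainder of the month."
        )
    return message


summary = budget_summary(day_of_month=10, spent_pct=70, historical_pct=25)
print(summary)
print(len(summary.encode("utf-8")), "bytes of text")
```

The whole message comes to a couple of hundred bytes, which gives a feel for how little data a narrative needs compared with a chart delivered as an image.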

Be frugal with your data. Good for the climate. Good for communication.

Richard’s company, Passplot
