Fallible data

Data is not fact, and fact is often just a hypothesis anyway.
We humans design how data is created and we humans are the ones who interpret
data and draw conclusions from it. Therefore, data will always be inherently
fallible. Unless we approach it with a sense of humility and a willingness to
acknowledge our opinions as hypotheses to be tested, we will end up using data
to entrench the gut-instinct behaviors we claim we want it to replace.

To many, data is the new god. It can do no wrong. “The data
says” has a certainty and absoluteness to it. “Data doesn’t say anything,”
explains Andrea Jones-Rooy in
an excellent article for Quartz magazine. “Humans say things. They say
what they notice or look for in data—data that only exists in the first place
because humans chose to collect it, and they collected it using human-made
tools.”

For those involved in customer or user experience, or in the
creation of content, or in the digital design world in general, data can be a
powerful advocate. Again and again, I meet web professionals who struggle with
organizational politics and ego. The opinions of other stakeholders often demand
the creation of more and more digital stuff, much of it unnecessary, and some
of it counterproductive to a good customer experience.

Many of these stakeholders are in senior management, so if
your response to them is your opinion, it is their opinion that will invariably
win. Or even if you win, your career will lose because, whether we like it or
not, pleasing our superiors by agreeing with and implementing their opinions
is still the safest way to promotion. Those who champion the customer are often
seen as awkward, obstructive, stubborn. Not a great career move.

Data can take the heat for you. You can use data to convince
others. In the best scenario, data can help you change minds and opinions, so
that stakeholders develop new opinions based on data. That’s a genuine win-win
situation.

However, once you enter the arena of data, you have to be
careful. Some will see your behavior as a claim to have a higher truth, to have
more robust facts. Some will challenge your data. You need to ensure
that your data is as valid as possible, while at the same time clearly
outlining its weaknesses and limitations.

“Data is an
imperfect approximation of some aspect of the world at a certain time and place,”
Jones-Rooy explains. “It’s what results when humans want to know something
about something, try to measure it, and then combine those measurements in
particular ways.”

According to Jones-Rooy, we can introduce imperfections into data in a number
of ways. Random errors occur as a result of faulty equipment or human mistakes.
A systematic error is built into the way the data is collected:
“using data from Twitter posts to understand public
sentiment about a particular issue is flawed because most of us don’t tweet—and
those who do don’t always post their true feelings,” Jones-Rooy states.

You may also choose the wrong things to measure. You might measure how long people spend on your website on the assumption that more time means a better experience, when it may instead reflect time wasted on confusing navigation and low-quality search results and content. Errors of exclusion happen when you leave a particular segment of the population out of your data and then assume that your conclusions apply to them too.

I’m a data scientist who is skeptical about data