Big data has unconscious bias too

We all have unconscious biases – it’s a fascinating subject. But just recently I realised that there is bias in the way we collect and handle data too. Particularly big data.

We all have unconscious bias

In 1952 the Boston Symphony Orchestra initiated blind auditions to help diversify its male-dominated roster, but the auditions still skewed heavily towards men. Once musicians were asked to remove their shoes, nearly 50 per cent of the women cleared the first audition. It turned out the sound of their high heels had been biasing the judges subconsciously.

(I read the story in this month’s IET magazine, but it is also in the Guardian, Upworthy and even a TED talk.)

We all have biases. Our human biases are sometimes hard to foresee and apparently learning that you’re biased doesn’t change your decisions. It needs something more than that. In BT the Diversity and Inclusion subject group in the Academy has some recommendations:

  • Understand different types of bias – the psychology is now very well-known through popular books such as Irrationality
  • Top tips such as avoiding multi-tasking, challenging your own first impressions, being careful with gut instinct and challenging each other
  • Plus lots of resources (including a 1-hour workshop from Google Ventures) and training modules about the key areas of managing recruitment and managing performance without bias

Excellent stuff. But I think there’s a module missing.

Big data has biases too

When the municipal authority in charge of Boston, Massachusetts, was looking for a smarter way to find which roads it needed to repair, it hit on the idea of crowdsourcing the data. The authority released a mobile app called Street Bump in 2011 that employed an elegantly simple idea: use a smartphone’s accelerometer to detect jolts as cars go over potholes and look up the location using the Global Positioning System. Here’s a news item from that time celebrating the innovation. But the approach ran into a pothole of its own.

The system reported a disproportionate number of potholes in wealthier neighbourhoods. It turned out it was oversampling the younger, more affluent citizens who were digitally clued up enough to download and use the app in the first place. The city reacted quickly, but the incident shows how easy it is to develop a system that can handle large quantities of data but which, through its own design, is still unlikely to have enough data to work as planned.
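The oversampling effect can be illustrated with a toy calculation. This is only a sketch under made-up assumptions, not how Street Bump actually works: the two neighbourhoods, their pothole counts, the app adoption rates and the `TRIPS_PER_POTHOLE` constant are all hypothetical numbers chosen for illustration. Both areas are given the same number of real potholes, yet the raw reports differ because only app users generate data.

```python
from dataclasses import dataclass


@dataclass
class Neighbourhood:
    name: str
    true_potholes: int    # made-up ground truth; unknown to the city in practice
    app_adoption: float   # made-up share of drivers who run the app


# Assumption: every pothole is driven over this many times in the period.
TRIPS_PER_POTHOLE = 100


def expected_reports(n: Neighbourhood) -> float:
    """Expected jolts reported: only trips made by app users are recorded."""
    return n.true_potholes * TRIPS_PER_POTHOLE * n.app_adoption


def reweighted_estimate(reports: float, n: Neighbourhood) -> float:
    """Undo the sampling bias by dividing by the share of trips observed."""
    return reports / (TRIPS_PER_POTHOLE * n.app_adoption)


areas = [
    Neighbourhood("Affluent", true_potholes=50, app_adoption=0.40),
    Neighbourhood("Less affluent", true_potholes=50, app_adoption=0.10),
]

raw = {n.name: expected_reports(n) for n in areas}
corrected = {n.name: reweighted_estimate(raw[n.name], n) for n in areas}

for n in areas:
    print(f"{n.name}: raw reports {raw[n.name]:.0f}, "
          f"reweighted potholes {corrected[n.name]:.0f}")
```

With these invented figures the affluent area produces four times as many reports despite having identical roads. The reweighting step only helps if the adoption rate is known and above zero; a neighbourhood where nobody runs the app stays invisible however much data is collected elsewhere.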

Here’s what the Harvard Business Review said in 2013 about hidden biases in the data from that project. It also pointed out the flaws in other projects, like the Hurricane Sandy Twitter study and Google Flu Trends. You will have seen the effects of partial data in those oddly specific adverts that appear across the internet after you have looked at a possible purchase on eBay or Amazon, or in Facebook’s attempts to amplify your opinions by showing you content that reinforces what you already believe. They happen because the algorithms see some data and act on it. But of course that data isn’t a complete picture of “you”. It’s a tiny slice. Think what happens if an insurance company bases your premiums on a similar tiny slice of your data. Or if your health-care options were entirely computer-recommended based on the selective history of things you told your GP. Does this affect how we in BT think about digital marketing?

It would seem that there is no such thing as “raw data”. Never mind the bias when statistical techniques are mismatched to the data. Or the deplorable distortions by selective corporate funding of research. Even the collection mechanism introduces unconscious bias.

What about the culture in some organisations which values highly the things that you can count, and sometimes performance-manages those numbers to the exclusion of the bigger picture? It’s well known that as soon as data moves from being insight to being a measurable target, gaming behaviours kick in and all attention goes to the numbers, with tunnel vision. We have a bias towards the things that can be counted. Do we really believe “If you can’t measure it, you can’t manage it” and its corollary “so it doesn’t matter”? Much as I value the insight that comes from evidence, I know it always needs interpretation, and it does disproportionately grab our attention.

The IET magazine article that started my interest in this subject has an interesting quote from Jim Adler at Toyota Research Institute (a company famous for data-based performance improvements).

Geeks, suits and wonks

“Policymakers will say, ‘there’s a decision here let’s take it’, without really looking at what led to it. Was the data trustworthy, clean?” The “geeks, suits and wonks” have been used to operating sequentially. Geeks create technology, suits make it successful and wonks manage the repercussions. But the pace of progress is pushing their worlds together, he says. “It’s not serial any more. Everyone needs to come together at the same time.”

So I wonder if there could be some synergy between the “geeks, suits and wonks” in our organisations: the growing set of technologists who work on Big Data and the Internet of Things, the management, and the people who work on unconscious bias and diversity policies?

There are obstacles to even talking about this. How do we deal with the embarrassment that comes when noticing that you have been unconsciously biased? Will we get told off for pointing out possible bias? Can we speak openly about our own biases? It feels a bit politically incorrect.

Despite that, I am really interested to hear your stories, in the comments below, about unconscious bias you have noticed – whether it’s in data or human interactions. And, if you are really brave, your own biases.

