Image: Vincent Brady via It’s Okay To Be Smart
Recently I’ve been to three talks about open data at the Open Data Institute. I started to get interested in open data when I organised a talk about it in Shift Surrey.
Open data is information that is made accessible to everyone. This could be information from governments, universities, and businesses about anything from health, to pollution, to poverty or house prices. Making this information accessible means that it can be used for a great many applications such as finding out more about why people have certain health conditions, or finding out where to locate new businesses depending on the behaviours of the locals. Making data open could have hugely positive social impacts; in the wrong hands, it could also have hugely negative social impacts.
Here are some ideas I learnt about at the three talks I went to. They raise some interesting questions. See what you think.
‘The value of open data to business – the Open Data 500 Study’
This talk was about the complex problem of businesses, NGOs, government, and citizens using open data usefully.
The main thing I took away from this talk was that 10% of the vast quantities of information we have provides 90% of the usefulness. This is a good thing to bear in mind when facing the challenge of how much data exists only on paper. All that data needs to be digitised and catalogued, and potentially anonymised… it’s not a fast process, so picking which data to digitise and share is the first challenge.
‘Why Anonymity Fails’
I didn’t know about anonymising data before this talk but it was useful knowledge when I went to a talk about GPs and innovation at Nesta.
If we share medical data, we have the potential to find out why certain illnesses, diseases and conditions develop. For example, we can find out if there is a correlation between developing asthma and living in cities – we can find high risk areas. There are huge privacy implications though; the argument about health insurance rates is often cited too. (Personally I don’t have health insurance so that argument is not a factor for me.)
Health data sets are made anonymous; that means that information that could personally identify you is stripped away. Your name is removed from the record. However, some data sets keep your postcode, and others keep your ethnicity. If someone gets hold of two such data sets and matches them up, in some cases it is possible to identify individuals.
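The matching described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the two data sets, their field names, the postcodes and every value in them are made up, and real linkage attacks are more sophisticated than an exact match on shared fields.

```python
from collections import defaultdict


def group_by(records, keys):
    """Group records by the values they hold for the given fields."""
    groups = defaultdict(list)
    for record in records:
        groups[tuple(record[k] for k in keys)].append(record)
    return groups


def link_unique(first, second, keys):
    """Merge records whose shared fields match exactly one row in each data set."""
    groups_first = group_by(first, keys)
    groups_second = group_by(second, keys)
    linked = []
    for key, rows_first in groups_first.items():
        rows_second = groups_second.get(key, [])
        # Only a one-to-one match lets us join the rows with confidence.
        if len(rows_first) == 1 and len(rows_second) == 1:
            linked.append({**rows_first[0], **rows_second[0]})
    return linked


# Hypothetical "anonymised" health release: no names, but postcode is kept.
health = [
    {"age": 24, "sex": "F", "postcode": "ZZ1", "condition": "asthma"},
    {"age": 31, "sex": "M", "postcode": "ZZ1", "condition": "diabetes"},
    {"age": 31, "sex": "M", "postcode": "ZZ1", "condition": "none"},
    {"age": 45, "sex": "F", "postcode": "ZZ2", "condition": "none"},
]

# Hypothetical second release: no names, but ethnicity is kept.
survey = [
    {"age": 24, "sex": "F", "postcode": "ZZ1", "ethnicity": "mixed"},
    {"age": 31, "sex": "M", "postcode": "ZZ1", "ethnicity": "white"},
    {"age": 31, "sex": "M", "postcode": "ZZ1", "ethnicity": "asian"},
    {"age": 45, "sex": "F", "postcode": "ZZ2", "ethnicity": "black"},
]

linked = link_unique(health, survey, ["age", "sex", "postcode"])
for row in linked:
    print(row)
```

Neither data set contains a name, yet the combined rows now pair ethnicity with a health condition for anyone whose age, sex and postcode are unique locally. The two 31-year-old men are not linked, because the match is ambiguous; that ambiguity is what protects them.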
If, for example, you are the only 24-year-old mixed-race female in your postcode, then by matching up two or more data sets it may be possible to find out whether you in particular have a certain condition. Many people say that they ‘have nothing to hide’. However, thinking ethically about it, there may be implications further down the line. For example, if you develop a hereditary condition and are open about it because you are comfortable with that, one day your children may not be comfortable if they inherit the condition and it harms their job prospects. They might prefer to be discreet, but you would already have made the choice on their behalf.
I’m not sure how I feel. I think we ought to challenge the work culture, if fear of losing out on work is what holds us back from sharing data about health. Knowing more about health could make all our lives healthier. And anyone who becomes very unhealthy knows that health is more precious than whatever job they have.
Another scenario is where a particular person has a rare condition, and someone who wants to target them, a terrorist for example, could use open health data to find where they live.
How could you stop data getting into the wrong hands? Is health data open to absolutely everyone, or just via the NHS and universities? Is it right to make health information accessible to anyone? Should it be truly ‘open’?
‘Oceans of Data’
Adam Leadbetter, British Oceanographic Data Centre
This talk was about how oceanographers are trying to share data across Europe in order to get a fuller picture of what is happening. The data they share helps predict weather (as far as I understood). The major challenge they face is that different countries have adopted different terms to mean the same thing; in order to pool their information, they need to tag the terms in a universal way. The analogy the speaker used was an image of the meeting of Stephenson’s and Brunel’s railroads. Because they used different standards, there was chaos.
I imagine that this technical challenge of taking data and translating it all into the same language crops up in many different fields, from literal language translations to technical terms used by different groups.
One solution idea:
Develop a system that data can be put through to translate terms across fields. Save workers the time of manually tagging all the terms; automate it instead. Make this system adaptable for any data set in any field – users would input which terms mean the same thing, and then let the programme do the rest.
If this already exists, the solution is to make sure that people know about it and can use it!
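To make the idea concrete, here is a rough sketch in Python of what such a translator might look like. Everything in it is an assumption for illustration: the standard terms, the synonym lists and the sample records are invented, not taken from the talk or from any real oceanographic vocabulary.

```python
def build_lookup(synonyms):
    """Turn {standard term: [local variants]} into a variant -> standard lookup."""
    lookup = {}
    for standard, variants in synonyms.items():
        lookup[standard.lower()] = standard
        for variant in variants:
            lookup[variant.strip().lower()] = standard
    return lookup


def harmonise(record, lookup):
    """Rename a record's fields to the standard terms.

    Returns the translated record plus a list of terms that were not
    recognised, so a person can review them instead of the program guessing.
    """
    translated, unknown = {}, []
    for term, value in record.items():
        standard = lookup.get(term.strip().lower())
        if standard is None:
            unknown.append(term)
        else:
            translated[standard] = value
    return translated, unknown


# Hypothetical vocabulary: someone who knows the field says which terms match.
SYNONYMS = {
    "sea_surface_temperature": ["SST", "sea temp", "Meerestemperatur"],
    "salinity": ["salt content", "Salzgehalt"],
}
lookup = build_lookup(SYNONYMS)

# Two made-up records from different countries using different local terms.
record_a = {"SST": 12.4, "salt content": 35.1}
record_b = {"Meerestemperatur": 11.9, "Salzgehalt": 34.8, "Wellenhoehe": 1.2}

print(harmonise(record_a, lookup))
print(harmonise(record_b, lookup))
```

The human effort goes into the synonym list once; after that every new record is translated automatically. Unknown terms are handed back rather than silently dropped, because a wrong guess would pollute the pooled data.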
Some questions to leave you with
Should health data be available for everyone to look at? Why/why not?
What data might be useful to you in your work?
How might you start to find out about where that data might already be held? Perhaps your council already has it.
What could you do differently if you knew more about the people/things you want to affect?
How could you use large data sets, if you had access to them?
In which fields in particular do you think open data would be the most useful?