The Guardian - UK
Science
Nick Clarke

Healthcare to humanitarian aid: making the data explosion work for us

Data could revolutionise medical treatment for individuals or to combat a wider spread of disease, such as Ebola. Photograph: Alamy

In 140 AD, Ptolemy noted that an oar appeared to change direction when it went underwater. Like any good scientist, he compiled measurements, allowing him to make predictions about how an object would appear as it entered the water at different angles.

Centuries later, scientists developed accurate techniques for filling in the gaps between the limited set of measured data points, opening up new understanding of the world. We could now extrapolate from measurements to make new predictions, from the size of the earth to how chemicals react.

Recently the challenge has moved from collecting measurement data – which is now abundant – to taking full advantage of the opportunities it presents.

The modern economy runs on data. It has changed how we shop and travel, how governments make decisions, and how companies develop everything from drugs to shampoos. But there is much, much more it can do.

This is one of the biggest scientific opportunities of our time, but we are missing out because people don’t think of it as a scientific issue. This is a shame, since the history of science is one of collecting, testing, and understanding data. By better applying our hard-won knowledge of the scientific method to the data generated by our everyday lives, we can live longer, transform economies and reduce inequality.

A more personalised world

Retail has been the public trailblazer of the new approach to data. Amazon recommendations and Tesco Clubcard are part of our lives. The goal is personalisation – matching customers to products they want to buy.

In the past, total sales figures told you how popular a product was, but not with whom. New data changes all that, recording what actually happens and crucially linking each search or sale back to an individual. The challenge in retail is to take the myriad things that make us individuals – sex, age, income, location, etc – and fuse our behaviour with that of others like us to predict other products we are likely to buy.

The key is to build models using data from similar people. Predictions for 21-year-old single men will be less accurate if data from 70-year-old widows feed the same model. But equally, not all 21-year-old single men are the same, and averaging across their many subtle differences still blurs the predictions.
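A minimal sketch of that averaging effect, using invented numbers rather than real retail data: the segment labels, the monthly spend figures and the function names below are all assumptions made for illustration. It compares one pooled average with a separate average per group.

```python
from statistics import mean

# Hypothetical data: (segment, monthly spend on razors in pounds).
purchases = [
    ("21-year-old single man", 8.0),
    ("21-year-old single man", 10.0),
    ("21-year-old single man", 12.0),
    ("70-year-old widow", 0.0),
    ("70-year-old widow", 1.0),
    ("70-year-old widow", 2.0),
]

def pooled_prediction(rows):
    """One average for everybody, ignoring who the customer is."""
    return mean(spend for _, spend in rows)

def segment_predictions(rows):
    """One average per segment, built only from similar people."""
    by_segment = {}
    for segment, spend in rows:
        by_segment.setdefault(segment, []).append(spend)
    return {segment: mean(values) for segment, values in by_segment.items()}

def mean_abs_error(rows, predict):
    """Average distance between each actual spend and its prediction."""
    return mean(abs(spend - predict(segment)) for segment, spend in rows)

pooled = pooled_prediction(purchases)
per_segment = segment_predictions(purchases)

pooled_error = mean_abs_error(purchases, lambda segment: pooled)
segment_error = mean_abs_error(purchases, lambda segment: per_segment[segment])

print(f"pooled prediction: {pooled:.2f}, error: {pooled_error:.2f}")
print(f"segment predictions: {per_segment}, error: {segment_error:.2f}")
```

On this toy data the pooled model predicts a spend that matches nobody, while the per-segment model is closer for both groups. The error that remains inside each segment is the variability between individuals that a real model would still have to account for.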

This is where the science comes in, because at the heart of the scientific method is the identification, control and elimination of variability. The design and execution of finely crafted experiments eliminates any variability other than what you are interested in, reducing unnecessary averaging of the results. Applying a similar scientific approach to everyday data eliminates hidden biases, screens out anomalies and helps you see when correlations between behaviours implied by the data are just accidents. So-called spurious correlations become more common as the data gets bigger, making a scientific approach ever more important.
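One way to see why spurious correlations multiply is to simulate them. The sketch below is an illustration under stated assumptions, not an analysis of any real data set: every column is independent random noise, and the 20 observations, the 0.5 threshold and the function names are arbitrary choices made for the example. It counts how many pairs of columns look strongly correlated when there are 10 variables and when there are 100.

```python
import math
import random
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def count_strong_pairs(columns, threshold=0.5):
    """Return (pairs with |r| above the threshold, total pairs examined)."""
    pairs = list(combinations(columns, 2))
    strong = sum(1 for a, b in pairs if abs(pearson(a, b)) > threshold)
    return strong, len(pairs)

rng = random.Random(0)   # fixed seed so the run is repeatable
N_OBSERVATIONS = 20      # assumed number of measurements per variable

# 100 unrelated variables: any correlation between them is an accident.
columns = [[rng.gauss(0, 1) for _ in range(N_OBSERVATIONS)] for _ in range(100)]

strong_few, pairs_few = count_strong_pairs(columns[:10])
strong_many, pairs_many = count_strong_pairs(columns)

print(f"10 variables:  {strong_few} strong correlations in {pairs_few} pairs")
print(f"100 variables: {strong_many} strong correlations in {pairs_many} pairs")
```

Ten variables give 45 pairs to compare; 100 variables give 4,950. Because none of the columns is related to any other, every "strong" pair found here is a coincidence, and the more pairs there are to search, the more coincidences turn up.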

Data analysis for a better life

Retail may be our most familiar contact with personalisation. However, if our personal data becomes richer and more available, it will present significant opportunities to improve our lives.

Health is the big one. Smart watches are increasingly able to collect data about your heart rate, sleeping patterns, exercise routine, etc. Home tests can spot changes in your wellbeing. Information about your genetic makeup can be gathered by medical professionals.

This could trigger a step change in the personalisation of healthcare – from advice on how much you should exercise, to a highly targeted treatment for a serious illness.

Right now, a lot of healthcare is based on population averages. Recommendations about calorie intake or body mass index are based on data sampled from a highly variable population. Similarly, pharmaceuticals are developed to work across many diverse cases. As with the retail example, the averaging of the variability reduces the targeting. If your specific case sits at the far edge of the diversity, how well will an “averaged” treatment work for you?

In a future data-driven world, your watch or phone will know you as an individual and measure how you behave. Lifestyle apps connected to your shopping account could nudge you towards healthier choices. Your doctor could be notified if you exhibit symptoms of a condition to which you are genetically predisposed. This will all be based on combining knowledge of you with evidence of what works for others like you.

Even more exciting is disease treatment. Right now most pharmaceuticals are tested against diverse populations in order to be licensed. Drugs that are effective, but only in a sub-population, can be missed. There is a drive already underway to make the design and administration of drugs more targeted (to different genetics, ages, diets, etc), using the wealth of personalised data now starting to be collected. The scientific method is essential if we are to accurately understand and manage the huge variability in the underlying data used to match complex clinical observations with the detailed evidence for the genetic basis of disease. You can afford to make the odd mistake in your targeting when up-selling a razor. Not with cancer.

Tackling global problems

This scientific approach doesn’t need to have nice clean data sets to be effective. Take, for example, work by Oxford University on making better decisions about complex, fast-moving global issues such as the spread of Ebola. Unlike shopping online, Ebola doesn’t carefully monitor and submit information back to you. But we do have data – from social media, and from agency reports of varying levels of detail.

Using these different sources and combining them with established, scientifically validated models of epidemiology, we can set up neural networks – computer programs that behave like the brain – trained to spot signatures within highly varied data sources.

Neural networks must be trained on what to ignore and where to focus, so that over time they can function on their own. The combination of technology, trained alongside expert human knowledge which understands likely biases, produces a more reliable model. Together they provide better, faster information to those needing to take difficult decisions.

This kind of approach could be very valuable for the current migration crisis, for example, allowing aid organisations and governments to predict where support will need to be most effectively focused.

Doing more with data

An oft-cited article from 2008, The end of theory, argued that finding patterns in our booming quantities of data would answer any question, negating the need to understand causal links or biases – ie that Google search analytics would replace the need for a model. High-profile failures, caused by not understanding the variability driving the patterns, make it abundantly clear this is not the case. But many technologists still happily put their data into a black box, hoping that relevant answers will come out the other side.

Instead, we should view everyday data analysis as just another application of the scientific method.

Better than ever before, it allows us to fuse the informed guesswork that has underpinned almost all innovation until recent times with the more measurement-driven approach that scientists have long favoured.

We can move from vanilla products or services suited to a diverse population, to those tailored to specific groups, linked to genetic profiles and personal preferences. We can make reliable predictions about how products – including complex products such as drugs – will behave in the complex real world. For companies this means increased profits and reduced risk.

Of course, all of this comes with challenges around how much individuals and organisations should be willing or obliged to share. But the technology exists to collect data and the techniques to analyse it are advanced. With existing capabilities, we can make dramatic improvements to our society in a relatively short space of time.

The challenge is bringing data from many sources together in front of the right people. But first we need to appreciate that this is a scientific opportunity as much as a technological one.

  • Dr Nick Clarke is head of analytics at Tessella. You can find him on Twitter: @analytics_lab.