Big data – what is it and why is it important?

May 20, 2016

The government is currently asking the public to comment on a draft Guide to big data and the Australian Privacy Principles. Alan Stevenson says we need to take a step back and talk about both the power and limitations of big data.

We are being asked for input into the government’s paper on big data. However, I believe that most people don’t even know what the term means, or how it can affect, and already is affecting, our daily lives. This is a discussion which must be brought front and centre.

Big data is simply the mass of data which has been generated by the users of the internet and which is stored on millions of computers around the world and can be accessed by others.

In recent years, a suite of new tools, such as impact investing, shared value, social enterprise and social innovation, has emerged to help us ease some of our most pressing societal problems. Gradually, all sectors have come to recognise the value of these models, albeit with some disagreement over the scale of that value.

One tool has only just begun to be considered for its potential contribution to positive social impact – “big data”. The easiest way to think about big data is in terms of what it enables us to do with large volumes of information. Put simply, big data allows us to analyse and see things in ways that smaller data sets do not allow.

Most importantly, the fact that “most published research findings are false”, as famously reported by John Ioannidis, an epidemiologist from Stanford University, underlines that data is not the same as facts; one critical dataset – the conclusions of peer reviewed studies – is not to be relied on without evidence of good experimental design and rigorous statistical analysis. When the Berlin Wall came down, the Stasi files were made available for analysis and found to be only about 50% accurate. Yet many now claim that we live in the “data age”. If you count research findings themselves as an important class of data, it is very worrying to find that they are more likely to be false (incorrect) than true.
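To see how a published finding can be more likely false than true, it helps to work through the arithmetic. The sketch below uses the positive predictive value formula from Ioannidis's paper: the chance that a "significant" result is actually true depends on the prior odds that the tested relationship is real, the study's statistical power, and the significance threshold. The function name and the two sets of input numbers are illustrative assumptions of mine, not figures from any particular study.

```python
def positive_predictive_value(prior_odds, power, alpha):
    """Probability that a statistically significant finding is true.

    prior_odds: ratio of true to false relationships among those tested (R)
    power:      chance a real effect is detected (1 - beta)
    alpha:      chance a non-existent effect is 'detected' anyway
    """
    true_positives = power * prior_odds
    false_positives = alpha
    return true_positives / (true_positives + false_positives)


# Hypothetical scenario 1: a well-designed confirmatory study.
well_designed = positive_predictive_value(prior_odds=1.0, power=0.8, alpha=0.05)

# Hypothetical scenario 2: an underpowered exploratory study
# testing a long-shot hypothesis.
exploratory = positive_predictive_value(prior_odds=0.1, power=0.2, alpha=0.05)

print(f"well-designed study: {well_designed:.2f}")
print(f"exploratory study:   {exploratory:.2f}")
```

Under these assumed inputs the first kind of study is right most of the time, while the second is wrong more often than it is right, even though both report a "significant" result.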

The worship of big data downplays many issues. To make sense of all this data, researchers are using a type of artificial intelligence known as neural networks. But no matter their “depth” and sophistication, they merely fit curves to existing data. They can fail in circumstances beyond the range of the data used to train them. All they can, in effect, say is that “based on the people we have seen and treated before, we expect the patient in front of us now to do this”.
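The point about curve fitting can be shown with a toy example. The sketch below is not a neural network; it uses a plain straight-line fit as a stand-in, and the "true" process (a simple square) and the data range are assumptions chosen only for illustration. The model does tolerably well inside the range it was trained on and goes badly wrong once it is asked about a case far outside that range.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def true_process(x):
    # Assumed "real world" behaviour, unknown to the model.
    return x ** 2


# Training data only covers the range 0.0 to 1.0.
train_x = [i / 20 for i in range(21)]
train_y = [true_process(x) for x in train_x]
a, b = fit_line(train_x, train_y)


def predict(x):
    return a + b * x


def abs_error(x):
    return abs(predict(x) - true_process(x))


worst_inside = max(abs_error(x) for x in train_x)  # within the training range
error_outside = abs_error(10.0)                    # far beyond it

print(f"worst error inside training range: {worst_inside:.3f}")
print(f"error at x = 10:                   {error_outside:.3f}")
```

A deeper or more flexible model would trace the training data more closely, but it faces the same problem: nothing in the data tells it what happens beyond what it has already seen.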

Data can’t be racist or sexist, but the way it is used can help reinforce discrimination. The internet means more data is collected about us than ever before and it is used to make automatic decisions that can hugely affect our lives, from our credit scores to our employment opportunities.

Palantir (a US company apparently named after the ‘seeing stones’ in the Lord of the Rings) was initially backed by the CIA. One of the more intriguing companies working in big data analytics, it merges information from some of the most obscure sources, including videos, photos, geospatial sources, tweets, satellite imagery and newspaper articles, to monitor crisis situations such as epidemics, typhoons and conflict environments. Aid and relief organisations can then use detailed maps and tracking systems to determine where need is greatest, and how to navigate to those areas while avoiding obstructed roads or landmines. The technology is so advanced that Palantir can also predict environmental, financial or biological crises.

Professor Richard Berk at the University of Pennsylvania specialises in criminology and statistics. He claims that his data analysis can predict whether a prisoner released on parole will be involved in a murder (either as the victim or as the killer) with a probability of at least 75 per cent, by tracking detailed data on their demographics, history of offences and more. As correctional facilities across the world struggle with the cost of recidivism and swelling numbers of prisoners, the potential of this sort of analysis, and of the cost savings it offers, is significant. But bearing in mind the number of prisoners now being found ‘not guilty’, is a 75 per cent probability enough to keep a person behind bars?

Organisations like UNICEF are now exploring how to partner with leaders in the wearable market, to track and prevent health crises for young children in developing countries, keeping a “pulse” on issues from nutrition rates, to the risk of viral or bacterial infections.

Big data can identify with strong accuracy which behaviours are correlated with an individual developing a chronic health condition. For example, we might be concerned about people who purchase large amounts of sugary snacks and who may therefore be at a higher risk of diabetes. Potentially, government and social service providers could use this data to target those individuals with awareness campaigns, or with discounts on healthy snack alternatives. The potential cost savings from such preventative health programs are in the billions of dollars, so a portion of those savings could be used to pay for the data analysis.

However, the constant weak link in all this is the human tendency to overlay all data with (often unconscious) biases and preconceptions.



  1. Alan Stevenson

    May 22, 2016 at 11:28 pm

    As an addendum to my blog, I would like to say that although I think the subject is very important I was aghast at my lack of knowledge about it when asked to define the subject. I therefore spent the rest of the day with The Conversation, the New Scientist, the Guardian and the ABC radio transcripts getting to know a bit more about it. I am hoping that this will start a discussion in which we will all learn a lot more. A.S.

    • Alex_Fox

      June 14, 2016 at 2:32 pm

      Very interesting. I learnt a lot that was new from your post, thanks.