Protecting anonymous data

May 27, 2020

On 14 May, the Australian parliament passed legislation setting out the framework for the collection and use of data from the COVIDSafe contact-tracing app. The law amended the Privacy Act 1988 ‘to support the COVIDSafe app and provide strong ongoing privacy protections’.

Much focus has rightly fallen on who can access the data gathered by the app, and for what purpose; however, little attention has been given to a provision that permits the collection and analysis of de-identified data for statistical purposes.

This provision is not, of itself, troubling. De-identification, or anonymisation, is the process of stripping out characteristics that can identify the people who provided the data. For data analysts, it’s a routine way to use large datasets to uncover trends and insights while protecting the privacy of individuals.
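To make the process concrete, here is a minimal sketch of de-identification as the article describes it: removing direct identifiers from records before analysis. The field names and values are entirely hypothetical, and real de-identification schemes involve far more than dropping columns.

```python
# Hypothetical example: strip direct identifiers from records so the
# remaining fields can be used for aggregate analysis.

DIRECT_IDENTIFIERS = {"name", "date_of_birth", "phone"}

def de_identify(record: dict) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}

records = [
    {"name": "Alice", "date_of_birth": "1984-03-02", "phone": "0400 000 001",
     "postcode": "3000", "diagnosis": "asthma"},
    {"name": "Bob", "date_of_birth": "1990-11-17", "phone": "0400 000 002",
     "postcode": "2600", "diagnosis": "diabetes"},
]

anonymised = [de_identify(r) for r in records]
# The surviving fields (postcode, diagnosis) still support statistical
# work, such as counting diagnoses per region.
print(anonymised[0])  # {'postcode': '3000', 'diagnosis': 'asthma'}
```

As the rest of the article explains, the fields that survive this kind of stripping can themselves become identifying when combined.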

But de-identification, while laudable in principle, suffers from a technical problem—it is possible to re-identify, or de-anonymise, the individuals.

This isn’t a far-fetched or theoretical risk. A notable example was the revelation by University of Melbourne researchers in 2017 that confidential, de-identified federal patient data could be re-identified without the use of decryption. In another study, the same researchers were able to re-identify users of Victoria’s public transport system, demonstrating that only two data points were needed to re-identify individuals from large datasets.

The problem of re-identification is likely to be made worse by advances in artificial intelligence. In a recent UK report on AI, Olivier Thereaux of the Open Data Institute noted that even with ‘pretty good’ de-identification methods, AI systems can re-identify data by cross-referencing against other datasets.
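The mechanics of such a linkage attack are easy to illustrate. The toy example below uses invented travel records in the spirit of the public-transport study: an ‘anonymised’ dataset is cross-referenced against two externally observed trips, and those two data points alone single out one record.

```python
# Toy illustration (all data hypothetical) of re-identification by
# linkage: cross-referencing an anonymised dataset against a few
# externally known facts about a target.

anonymised_trips = {
    "card_A": [("Flinders St", "08:02"), ("Parliament", "17:31"),
               ("Southern Cross", "19:05")],
    "card_B": [("Flinders St", "08:02"), ("Richmond", "18:10")],
    "card_C": [("Parliament", "17:31"), ("Richmond", "18:10")],
}

# Suppose two of the target's trips are known from another source,
# e.g. timestamped social-media posts.
known_trips = {("Flinders St", "08:02"), ("Parliament", "17:31")}

matches = [card for card, trips in anonymised_trips.items()
           if known_trips <= set(trips)]

print(matches)  # ['card_A'] -- two data points uniquely identify the card
```

At the scale of a real dataset the same logic applies: the more attributes each record carries, the fewer external observations are needed to make it unique.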

So, there are trade-offs. The more effective a technology is at drawing out insights from data, the more likely it is that individuals’ privacy will be undermined. And the more we attempt to strip out information to protect privacy, the less useful the dataset becomes for legitimate research.
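That trade-off can be seen in miniature by generalising a quasi-identifier. In this hypothetical sketch, coarsening postcodes makes records less unique (a rough proxy for re-identification risk), but also makes the data less precise for research.

```python
# Hypothetical sketch of the privacy/utility trade-off: generalising a
# quasi-identifier (full postcode -> first digit) reduces uniqueness,
# at the cost of coarser, less useful data.

from collections import Counter

records = [
    {"postcode": "3000", "diagnosis": "asthma"},
    {"postcode": "3056", "diagnosis": "asthma"},
    {"postcode": "2600", "diagnosis": "diabetes"},
]

def uniqueness(rows, key):
    """Fraction of rows whose value for `key` appears only once."""
    counts = Counter(r[key] for r in rows)
    return sum(1 for r in rows if counts[r[key]] == 1) / len(rows)

print(uniqueness(records, "postcode"))  # 1.0 -- every record is unique

for r in records:
    r["postcode"] = r["postcode"][0]  # generalise to the leading digit

print(uniqueness(records, "postcode"))  # ~0.33 -- safer, but coarser
```

Every step toward the left of this dial protects individuals; every step toward the right serves research. There is no setting that fully does both.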

This is not inherently problematic, or new. UK Information Commissioner Elizabeth Denham says in the same report that there’s ‘no such thing as perfect anonymisation; there is anonymisation that is sufficient’. Her comments are in the context of the UK Data Protection Act, which criminalises illegitimate dataset re-identification.

If Australia is serious about protecting data collected by government departments, agencies and other organisations to which the Privacy Act applies, it should follow the UK example.

The government has considered this issue before. In response to the 2017 incident with the patient dataset, the Coalition introduced a bill to criminalise illegitimate re-identification of datasets. However, the bill died in review because of concerns that it would have a chilling effect on cybersecurity research.

Those concerns are legitimate. Researchers shouldn’t feel under pressure to stop calling out poor privacy practices. It’s in the public interest for organisations to improve how they collect and publish datasets.

But the need to enable legitimate cybersecurity research is also not a difficult obstacle to overcome. An exception could be made in the act for public-interest research by bodies such as universities, think-tanks and NGOs. That could be done in one of two ways.

Researchers could be given the right to apply for the equivalent of a licence to re-identify government datasets. Something similar has been done under the Defence Trade Controls Act, which includes a requirement for researchers to get a licence if they want to collaborate internationally on certain technologies.

Such an approach would ensure that anyone seeking to re-identify data is thoroughly checked. But it’s also overkill. The government’s original re-identification bill called for licensing or obtaining ministerial permission for such work, and it was criticised for failing to provide a clear exemption for public-interest research.

The other way is to write a defence, or an exception to liability, into the law. Unlike licensing, which would require researchers to demonstrate upfront that their work is in the public interest, a defence places the legal burden of proof on the state.

That approach has the benefit of providing a clear exemption for researchers, and it’s unlikely that prosecutors would be able to prove that university research isn’t in the public interest.

Britain took that approach in its Data Protection Act, which includes a public-interest defence to criminal liability.

Of course, neither approach blocks the ability to re-identify datasets, any more than criminalising murder prevents people from murdering. But it would be an effective deterrent and moral indicator of unacceptable data practices.

Moreover, such a provision would demonstrate the government’s commitment to protecting privacy. This is especially important now as the government tries to persuade 10 million of us to download a tracing app.

The concerns raised by the COVIDSafe app suggest that Australians care a lot about privacy, at least when information to be held by the government is involved. Facebook’s data-collection policies don’t appear to cause nearly as much concern. Let’s turn that passion into action, starting with bolstering the privacy protections on large datasets.

This article was published by The Strategist.