If it’s public then it’s not private. Really?
Can Metcalfe's Law be applied to personal data management?
It is often said that if data about someone is already in the public domain, then that information is no longer private. Sounds reasonable, but I reckon that can become an insidious furphy.
"The data is already public" was the chief debating point advanced by proponents of searchable white pages. They argued that because publicly available paper white pages reveal everyone's phone numbers, surely having a searchable database didn't change anything. But a searchable digital white pages really is different. And not just quantitatively — it makes reversing names from numbers vastly more efficient — but also qualitatively.
For one thing, the very act of searching generates new types of information, much of which is private (and commercially valuable). For instance, whoever owns the searchable white pages also gets to know things like who else is interested in my phone number, and why. The owner can synthesise brand new information, none of which is accessible to me, even though nothing other than my 'already public' number has been revealed.
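To make the point concrete, here is a minimal sketch of a searchable directory. Everything in it is hypothetical: the names, the numbers, and the identifiers `WHITE_PAGES`, `query_log` and `reverse_lookup` are invented for illustration and do not describe any real white pages service. It shows two things: the reverse index costs one line to build, and every search leaves a record that only the operator can see.

```python
from collections import defaultdict

# Hypothetical directory entries (name, number); all values are invented.
WHITE_PAGES = [
    ("A. Citizen", "5550 0101"),
    ("B. Example", "5550 0102"),
]

# The paper directory is ordered by name, so name -> number is the cheap lookup.
by_name = {name: number for name, number in WHITE_PAGES}

# In a database the reverse index (number -> name) is just as cheap to build.
by_number = {number: name for name, number in WHITE_PAGES}

# Record of who searched for which number. Only the operator holds this.
query_log = defaultdict(list)


def reverse_lookup(number, searcher):
    """Return the name listed against a number and log who asked for it."""
    query_log[number].append(searcher)
    return by_number.get(number)
```

After a few calls to `reverse_lookup`, `query_log` holds data that was never printed in any directory: who is interested in which number. The person listed cannot see it.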
Obviously the same synthesis of new information is what underpins so much of the value and power of Google and its kin.
Consider also Zoominfo, the amazing collection of individuals' CVs, apparently generated automatically by clever new web crawlers that collate information from websites, conference brochures, media releases and so on. I contend that aggregating publicly available data into some fresh sort of whole surely generates new information, which cannot have previously been public. For anyone to deny that this synthesised information is new is simply implausible; Zoominfo wouldn't go to all this trouble if its outputs were trivial.
If I'm on the right track, then David Brin's hypothesis, discussed in Malcolm Crompton's recent blog, that in future privacy won't matter because we will all know everything about each other, is probably invalidated. Nobody can ever know everything, because new information is being synthesised at an accelerating rate, as are new methods for synthesis. There will always be an imbalance where some people know more than others.
So I wonder out loud how to build a counter-argument to the simplistic line that "if it's public then it's no longer private". Can the newness and value of aggregated data be measured? Does something like Metcalfe's Law apply (namely that the value of a network is proportional to some power of the number of users)? Maybe a variant of Metcalfe's Law applies directly if data snippets were to be equated to nodes in a network?
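As a rough sketch of what such a variant might look like, assume each data snippet is a node and every pair of snippets is a potential link that an aggregator could draw. That assumption is mine, not an established result, and the function names `pairwise_links` and `metcalfe_value`, along with the exponent and the constant `k`, are purely illustrative. Metcalfe's Law in its usual form uses the square of the number of nodes.

```python
def pairwise_links(n):
    """Number of distinct pairs among n data snippets: n(n-1)/2."""
    return n * (n - 1) // 2


def metcalfe_value(n, exponent=2.0, k=1.0):
    """Metcalfe-style value: proportional to some power of the node count.

    exponent=2.0 is the usual statement of the law; k is an arbitrary
    constant of proportionality with no empirical basis here.
    """
    return k * n ** exponent


# Compare how the snippet count and the possible cross-references grow.
for n in (10, 100, 1000):
    print(n, pairwise_links(n), metcalfe_value(n))
```

If anything like this holds, multiplying the number of public snippets by ten multiplies the possible cross-references by roughly a hundred, which is one way to argue that the aggregate is more than the sum of its public parts.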
Maybe this is just a more analytical angle to the hoary old line that data is different from information is different from knowledge.
Cheers,
Stephen Wilson
Lockstep
www.lockstep.com.au
——————-
Lockstep Consulting provides independent specialist advice and analysis
on authentication, PKI and smartcards. Lockstep Technologies develops
unique new smart ID solutions that safeguard identity and privacy.
