Anyone Brushing Off NSA Surveillance Because It's 'Just Metadata' Doesn't Know What Metadata Is
from the your-metadata-reveals-quite-a-bit dept
One of the key themes that has come out from the revelations concerning NSA surveillance is a bunch of defenders of the program claiming "it's just metadata." This is wrong on multiple levels. First of all, only some of the revealed programs involve "just metadata." The so-called "business records" data is metadata, but other programs, such as PRISM, can also include actual content. But, even if we were just talking about "just metadata," the idea that it somehow is no big deal, and people have nothing to worry about when it comes to metadata is ridiculous to anyone who knows even the slightest thing about metadata. In fact, anyone who claims that "it's just metadata" in an attempt to minimize what's happening is basically revealing that they haven't the slightest clue about what metadata is. Here are a few examples of why.
Just a few months ago, Nature published a study all about how much a little metadata can reveal, entitled Unique in the Crowd: The privacy bounds of human mobility by Yves-Alexandre de Montjoye, Cesar A. Hidalgo, Michel Verleysen, and Vincent D. Blondel. The basic conclusion: metadata reveals a ton, and even "coarse datasets" provide almost no anonymity:
A simply anonymized dataset does not contain name, home address, phone number or other obvious identifier. Yet, if individual's patterns are unique enough, outside information can be used to link the data back to an individual. For instance, in one study, a medical database was successfully combined with a voters list to extract the health record of the governor of Massachusetts [27]. In another, mobile phone data have been re-identified using users' top locations [28]. Finally, part of the Netflix challenge dataset was re-identified using outside information from The Internet Movie Database [29].
All together, the ubiquity of mobility datasets, the uniqueness of human traces, and the information that can be inferred from them highlight the importance of understanding the privacy bounds of human mobility. We show that the uniqueness of human mobility traces is high and that mobility datasets are likely to be re-identifiable using information only on a few outside locations. Finally, we show that one formula determines the uniqueness of mobility traces providing mathematical bounds to the privacy of mobility data. The uniqueness of traces is found to decrease according to a power function with an exponent that scales linearly with the number of known spatio-temporal points. This implies that even coarse datasets provide little anonymity.
Some of the figures they presented show how easy it is to track individuals and their locations, which can paint a pretty significant and revealing portrait of who they are and what they've done.
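For a rough feel of what "uniqueness" means here, the following is a minimal sketch, not the paper's actual method or data. It assumes each person's trace is just a set of (place, hour) points and asks how often a handful of known points matches exactly one person. The `unicity` function, the `toy_traces` data, and the user names are all invented for illustration, and the brute-force check over every subset only suits tiny datasets.

```python
from itertools import combinations


def unicity(traces, p):
    """Average share of p-point samples that match exactly one user.

    traces maps a user id to a set of (place, hour) points. This brute-force
    version checks every p-point subset, so it only suits tiny toy data.
    """
    shares = []
    for points in traces.values():
        if len(points) < p:
            continue  # this user has too few points to sample p of them
        subsets = list(combinations(sorted(points), p))
        unique = sum(
            1
            for subset in subsets
            if sum(1 for other in traces.values() if set(subset) <= other) == 1
        )
        shares.append(unique / len(subsets))
    return sum(shares) / len(shares) if shares else 0.0


# Hypothetical toy traces: (antenna, hour of day) pairs, invented for illustration.
toy_traces = {
    "user_a": {("north", 9), ("mall", 18)},
    "user_b": {("north", 9), ("gym", 7)},
    "user_c": {("north", 9), ("mall", 18), ("clinic", 14)},
}

for p in (1, 2, 3):
    print(p, round(unicity(toy_traces, p), 3))
```

Even on three made-up users, the share of samples that single out one person rises as more points are known (about 0.28 with one point, about 0.56 with two). No names or phone numbers are needed for that, only where and when.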
"We use the analogy of the fingerprint," said de Montjoye in a phone interview today. "In the 1930s, Edmond Locard, one of the first forensic science pioneers, showed that each fingerprint is unique, and you need 12 points to identify it. So here what we did is we took a large-scale database of mobility traces and basically computed the number of points so that 95 percent of people would be unique in the dataset."Others are discovering the same thing. Ethan Zuckerman, who recently co-taught a class with one of the authors of the paper above, Cesar Hidalgo, wrote about how two students in the class created a project called Immersion, with Hidalgo, which takes your Gmail metadata ("just metadata") and maps out your social network. As Zuckerman notes, his own use of Immersion reveals some things that could be questionable or dangerous.
Anyone who knows me reasonably well could have guessed at the existence of these ties. But there’s other information in the graph that’s more complicated and potentially more sensitive. My primary Media Lab collaborators are my students and staff – Cesar is the only Media Lab node who’s not affiliated with Civic who shows up on my network, which suggests that I’m collaborating less with my Media Lab colleagues than I might hope to be. One might read into my relationships with the students I advise based on the email volume I exchange with them – I’d suggest that the patterns have something to do with our preferred channels of communication, but it certainly shows who’s demanding and receiving attention via email. In other words, absence from a social network map is at least as revealing as presence on it.
Separately, more than two years ago, we wrote about how a German politician named Malte Spitz got access to all of the metadata that Deutsche Telekom had on him over a period of six months, and then worked with the German newspaper Die Zeit to put together an amazing visualization that lets you track six months of his life entirely via his metadata, combined with public information, such as his Twitter feed.
In Germany, whenever the government begins to infringe on individual freedom, society stands up. Given our history, we Germans are not willing to trade in our liberty for potentially better security. Germans have experienced firsthand what happens when the government knows too much about someone. In the past 80 years, Germans have felt the betrayal of neighbors who informed for the Gestapo and the fear that best friends might be potential informants for the Stasi. Homes were tapped. Millions were monitored.
Although these two dictatorships, Nazi and Communist, are gone and we now live in a unified and stable democracy, we have not forgotten what happens when secret police or intelligence agencies disregard privacy. It is an integral part of our history and gives young and old alike a critical perspective on state surveillance systems.
"Just metadata" isn't "just" anything, other than a massive violation of basic privacy rights.
Filed Under: cesar hidalgo, ethan zuckerman, fingerprints, immersion, malte spitz, metadata, nsa surveillance, prism, privacy, surveillance
Reader Comments
How to explain?
I've been trying to figure out a quick, simple, "sound bite" way of explaining this. It's something that people don't seem to instinctively understand, I think because they focus on a single tree at a time, when the problem is really the forest.
[ link to this | view in chronology ]
Re: How to explain?
If all this metadata is so useless as far as figuring out what someone is up to, then why does the government have so many multi-million/billion dollar programs designed to capture it all?
[ link to this | view in chronology ]
Re: How to explain?
Metadata is data which describes other data and is thus completely harmless. As an example, Snowden didn't release any of the actual data held by the NSA, only information about it - thus he only released metadata which is ... completely ... um ... harmless ... hang on, scratch that. Guys? We need a better explanation!
[ link to this | view in chronology ]
Yay! SOMEBODY remembered the 20th century.
[ link to this | view in chronology ]
OK, some things to think about
1. Corporate advertising: do you think they DON'T KNOW ENOUGH? Think about credit agencies. Add that Target credit card, and they know every purchase you made.
2. Spam: with small amounts of data, they can get information on you that would scare the >>>> out of you, and empty your bank account.
3. The US government has LESS info on you than the corporations do. The IRS can't even track corporate tax accounts.
If the government grabbed the corporate info and the credit agencies' data and merged it with Social Security records, it would have all the info it needs.
The problem they are TRYING to solve tends to be with communication. There is so much of it: HOW do you sort it out?
Consider an adversary that WAS RAISED and taught to send communications without being tracked: direct communications and message drops, interlinked groups that have no REAL connections except by message drops.
Phone spying won't help (they might buy a burner phone, send one message, and throw it away). Net spying won't help (there are so many games and sites for sending data over the net that it's astounding). Trying to track illegals is hard enough, and if they are staying below the radar, it's even harder.
[ link to this | view in chronology ]
Same with Google for "mere" search terms and websites.
Don't leave out "The Google" when you write of privacy rights. Corporations don't have any right to track persons, EITHER. As a minion wrote just today: "It's almost as though these entities assume they have an innate right to access personal data without ... any consideration for the rights of those whose information they're sweeping up."
Unlike corporatist Mike, I'd bet most people believe that corporations CAN violate your rights just as much as gov't does, and often more so, besides annoyingly.
[ link to this | view in chronology ]
Just a suggestion, but if you ever want to, you know, be taken seriously, learn the difference between giving a company information, and having the information taken whether you like it or not by the government. Until then, you'll just continue to be seen as the willfully blind, paranoid individual that you are now.
[ link to this | view in chronology ]
Re: Same with Google for "mere" search terms and websites.
What happened over the holiday? Did Google piss in your Cheerios or make it rain on your parade or something?
[ link to this | view in chronology ]
Re: Same with Google for "mere" search terms and websites.
Everyone here on Techdirt agrees with you about the amount of data that Google collects, but we're not worried about the Big G. We're worried about the Other Big G, Uncle Sam, which is infinitely scarier than a single corporation. Yet, for some reason, you can't process the thought that there are worse things than Google out there.
[ link to this | view in chronology ]
Re: Same with Google for "mere" search terms and websites.
Or you're getting paid for this bunch of nonsense?
[ link to this | view in chronology ]
Re: Same with Google for "mere" search terms and websites.
Let's compare shall we?
Google's data gathering supports a whole range of really useful products and services that most people seem to love; there is no evidence of Google ever nefariously abusing their ability to collect their users' data; and if you don't like them or their actions, you simply don't use them.
On the other hand, the USG's data gathering has yet to be proven useful to the general public other than to provide a false sense of security to those paranoid about terrorism; there is a long and sordid history of governments and law enforcement abusing their access to such data; and if you don't like all this you're shit out of luck.
Do us all a favour and don't ever compare these two things again, ok?
[ link to this | view in chronology ]
Your bank has a huge amount of saleable metadata about you.
This would be saleable data. Is there anything in your contract with them that says they cannot do that?
[ link to this | view in chronology ]
Imagine someone in a trench coat and hat, following you around all day long, writing down everything you do and when it happens: every phone number you dial or send a text to, every location you visit, and every email address you send an email to.
Creepy, isn't it?
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
Re: Re:
Gaaah! You guys are not getting it!
[ link to this | view in chronology ]
Re: Re: Re:
The Techdirt comment system needs another click-able button, right alongside "Insightful" and "Funny"
-- this one should be labelled "Depressing"...
[ link to this | view in chronology ]
Only metadata
Wouldn't it?
[ link to this | view in chronology ]
Yeah, not sure about the Google-hate.
So, granted, Google's network reconnoiters you in very much the same way that NSA does with PRISM, using information from emails, searches, etc. to determine your interests in order to target relevant advertising at you (and augment the usefulness of their many services). Google was doing a great job of demonstrating the benign side of surveillance.
Also, Google's gone to great lengths to defend their core database from court subpoenas. Really it should be truly inaccessible, though we've also had the occasional Google-tech stalker get fired over misuse of it.
But now thanks to the FISC, that core database is a threat to the privacy of all its users. Hopefully this won't destroy the Google business model.
[ link to this | view in chronology ]
Re: Yeah, not sure about the Google-hate.
a) their surveillance database was inadmissible in court, ergo it couldn't be used to prosecute anyone for anything (the DoD is going to disappear terrorists anyway), and...
b) they did as Google did and provided a whole bevy of services to the surveilled populace (instant statistical analyses, calendars, phone books, email and so on), all covered ad-free from the NSA budget.
This isn't a perfect solution, but it certainly would make the whole ordeal a bit more tolerable.
[ link to this | view in chronology ]
Re: Yeah, not sure about the Google-hate.
Well, no, not even remotely in the same way as the NSA does. But that aside, a huge difference is that you can avoid the vast majority of Google's spying by not using their services. You can avoid almost all Google spying by blocking access to their domains.
You can't avoid NSA spying.
[ link to this | view in chronology ]
A very simple argument
[ link to this | view in chronology ]
Best Explanation...
Our smart phones and electronic devices have such data on them. The metadata stored on your smart phone can include, and does store, the IP address, MAC address, your name, your registered name, your phone number, who you last called, who last called you, who last texted you, who you last texted, your credit card info, how many minutes each call was, how much space your texting used, and the last text message sent or received. Any questions so far?
[ link to this | view in chronology ]
It's just metadata!
In reality, anyone who asserts this is disingenuous in the extreme. They know what metadata is, but they want to keep the general public from seeing what this really means for them!
[ link to this | view in chronology ]
Secret Police
[ link to this | view in chronology ]
If you, your parents, or your grandparents did not live in a dictatorship, then like the average person you just don't know how valuable freedom is.
[ link to this | view in chronology ]
As Jane Mayer at the New Yorker recently explained, the metadata issue is the one we should be most frightened about:
“The public doesn’t understand,” [mathematician and
former Sun Microsystems engineer Susan Landau] told me,
speaking about so-called metadata. “It’s much more
intrusive than content.” She explained that the
government can learn immense amounts of proprietary
information by studying “who you call, and who they
call. If you can track that, you know exactly what is
happening—you don’t need the content.”
https://www.techdirt.com/articles/20130609/23180323386/is-bigger-concern-nsa-getting-phone-records-prism-just-everything.shtml
[ link to this | view in chronology ]
Uh, no. The German public does not give a damn. They can't even be bothered to leave Deutsche Telekom in droves now that they've laid out their strategy of destroying net neutrality.
[ link to this | view in chronology ]
Words and definitions
After all, it doesn't look bad, it looks official and geeky enough so that ordinary people don't look twice at it.
"I don't understand that word, but it's not a bad word."
We do need a better word to describe the concept:
"Everything you have done in the past, everywhere you've been, every single penny you have spent, everyone you know or have talked to... all in our database."
Or as Wikipedia states: "The term metadata refers to 'data about data'. The term is ambiguous, as it is used for two fundamentally different concepts (types)."
Ambiguous, indeed.
[ link to this | view in chronology ]
No distinction
[ link to this | view in chronology ]
Metadata composes 99% of the information that's actually present within any data pool; 1% is the initial data itself. But only about 5% of the metadata is actually visible until you begin to collect more data samples, at which point the metadata from every new sample begins to build on the previous metadata, filling in logic gaps and allowing new virtual layers to be added, much like solving a sudoku puzzle. He told me the payoff per additional data sample for classical data analysis is logarithmic, meaning that every new piece of information is less valuable than the piece before it, with 100% of the data never being acquired for a given operation through metadata analysis alone.
However, with multidimensional integration of virtual data sets (metadata), particularly with extremely large sample sizes such as the one the NSA is amassing, it is possible to use information about the information to infer the original content, and so much more. The types of things that would become comprehensible with the right framework are terrifying and unimaginable. Imagine being able to know almost anything about what is happening, has happened, or could happen: this is the power of metadata. Anyone who claims otherwise is full of absolute shit.
[ link to this | view in chronology ]