It Took Just 5 Minutes Of Movement Data To Identify 'Anonymous' VR Users
from the no-such-thing-as-anonymous dept
As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is "anonymized" -- or stripped of personal identifiers like social security numbers. But time and time again, studies have shown that this is cold comfort, given that with access to a few other data sets it takes only a little effort to quickly identify a person. Yet most companies, many privacy policy folk, and even government officials still like to act as if "anonymizing" your data means something.
The latest case in point: new research out of Stanford (first spotted by the German website Mixed) found that it took researchers just five minutes of examining the movement data of VR users to identify them in the real world. The paper says participants using an HTC Vive headset and controllers watched five 20-second clips from a randomized set of 360-degree videos, then answered a set of questions in VR, the responses to which were covered in a separate research paper.
The movement data (including height, posture, head movement speed, and what participants looked at and for how long) was then plugged into three machine learning algorithms, which, from a pool of 511 participants, were able to correctly identify 95% of users "when trained on less than 5 min of tracking data per person." The researchers went on to note that while VR headset makers (like every other company) assure users that "de-identified" or "anonymized" data will protect their identities, that's really not the case:
"In both the privacy policy of Oculus and HTC, makers of two of the most popular VR headsets in 2020, the companies are permitted to share any de-identified data,” the paper notes. “If the tracking data is shared according to rules for de-identified data, then regardless of what is promised in principle, in practice taking one’s name off a dataset accomplishes very little."
If you don't like this study, there's just an absolute ocean of research over the last decade making the same point: "anonymized" or "de-identified" doesn't actually mean "anonymous." Researchers from the University of Washington and the University of California, San Diego, for example, found that they could identify drivers based on just 15 minutes’ worth of data collected from brake pedal usage alone. Researchers from Stanford and Princeton universities found that they could correctly identify an "anonymized" user 70% of the time just by comparing their browsing data to their social media activity.
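The browsing-history attack works on the same basic principle: the "anonymized" record and a public record describe the same behavior, so you look for the account whose public activity overlaps most with the nameless history. The actual study used far more sophisticated statistical matching against Twitter links, but a toy version of the idea (entirely invented accounts and URLs) fits in a few lines of Python:

    # Toy illustration only -- not the Stanford/Princeton method itself. Match an
    # "anonymized" browsing history to a public account by simple set overlap.
    anonymous_history = {"techdirt.com/a", "eff.org/b", "arstechnica.com/c", "example.com/d"}

    public_accounts = {
        "@alice": {"techdirt.com/a", "eff.org/b", "arstechnica.com/c", "nytimes.com/x"},
        "@bob":   {"espn.com/1", "example.com/d", "reddit.com/2"},
        "@carol": {"nytimes.com/x", "reddit.com/2", "cnn.com/3"},
    }

    def jaccard(a, b):
        # Fraction of links the two sets have in common.
        return len(a & b) / len(a | b)

    best = max(public_accounts, key=lambda acct: jaccard(anonymous_history, public_accounts[acct]))
    print("best match:", best)   # prints "@alice" -- no name needed in the "anonymized" data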
The more data that's available to researchers (or corporations or governments), the easier it is to identify you. And with hacks, data leaks, and breaches dumping an endless ocean of existing datasets into the public domain, and no serious rules of the road governing things like the collection of location and other sensitive data, it shouldn't be too hard to see how the idea of "privacy" is a myth. Especially if the company is, say, Facebook, which is now tying your entire online Facebook experience to VR whether you like it or not.
It's all something to keep in mind for whenever the U.S. gets off its ass and finally crafts a meaningful privacy law for the internet era. Especially given that "don't worry, your data is anonymized!" will be an endless refrain by industry as they try to ensure any rules are as feeble as possible.
Filed Under: anonymity, anonymized data, data, de-anonymized, vr
Reader Comments
Don't hold your breath
This may never happen because:
Re: Don't hold your breath
Will never happen. There's too much money to be made for the corps to stop doing it on their own. Even if there weren't, governments around the world would ensure that the price went up to keep the tap wide open, if for nothing else so that governments can have their scapegoat when the plebs get angry about it, like they do every once in a blue moon.
If you were actually serious about stopping it, the first thing would be a general ban on data collection in consumer products. No, I don't care about the "experience" needing to be optimized, or development feedback. You idiots got greedy and now it's time to take your toys away.
Another rule: a general ban on using consumer devices to auction off eyeballs. That should never have been permitted in the first place. Using the visitor's browser as a bot to make money should have died a quick death, both because of how much it slows down page loading and because the resources are being stolen from visitors without compensation. I don't think everyone visiting Walmart in person would consent to whoring themselves out for 5-30 minutes (because humans are slower than computers) to advertisers at the edge of each aisle just to enter the store. No one should be allowed to demand that of virtual visitors.
You can't do anything about the data that is already out there, so there's no ban here on selling the info they already have; it would be a never-ending endeavor for the courts to try to stop it. But the data collection can be observed, and therefore stopped, on consumer devices.
It's only a myth because people don't give a shit about their own safety. Let alone anyone else's. They don't care if someone else is in that photo they sent to Facebook. They are selfish and believe that the person should be honored to be seen on their account. They don't care if someone could use all of the tweets they've made to figure out their lifestyle and falsely portray themselves to take advantage of them. They have to post that location update. My employees don't wanna put up with my workplace app tracking their location, audio/video recordings, and time? Well, I need better employees then. Poisoning? Nope, they have to post that food pic. Theft? Nope, gotta post about the fact they left the front door unlocked and how funny it is. Rape? Nope, gotta post about going out to get drunk right now, at this specific bar, alone. Murder? Nope, they actually took the damn cellphone with them and actually asked Siri where to bury the body, how to destroy evidence, etc.
Go ahead and try to avoid these things where you have input, you'll find out just how selfish society really is. Hell, some of them may even try to re-educate you, or worse, punish you over it.
The irony of it all...
On the internet, everyone who gets your data stream knows you're a dog.
I'm still trying to decide whether law enforcement should have access to this kind of info. On the one hand, it's a huge government intrusion; on the other hand, it would (eventually) allow courts to be stricter about requiring data for probable cause and search warrants, and to get rid of the "my years of experience" justification when it isn't backed by data.
Re: The irony of it all...
Much more likely, the collection and use of such data would be normalized and then layered on top of 'years of experience', so no, they definitely shouldn't have that data.
Misleading Marketing
The term "anonymized" is shaping up to be another deceptive term, akin to "unlimited data plan" or "all-natural". As soon as someone starts trying to sell you on this, know that they're probably trying to con you into giving up your privacy.
'You first.'
Anyone who tries to argue that data like that is anonymous, or not a privacy concern because it will be 'anonymised', should be presented with a 'put up or shut up' challenge: either admit that they're wrong or lying, or have their own data given that treatment and then pored through by a third party to see just how much can be learned about them from 'anonymous' data. That would show soon enough that they were wrong or lying.
One Simple Rule
Anonymized data is worthless (i.e. no one wants to pay for it); therefore, sold data will always be de-anonymizable.