It Took Just 5 Minutes Of Movement Data To Identify 'Anonymous' VR Users
from the no-such-thing-as-anonymous dept
As companies and governments increasingly hoover up our personal data, a common refrain to keep people from worrying is the claim that nothing can go wrong because the data itself is "anonymized," or stripped of personal identifiers like Social Security numbers. But time and time again, studies have shown that this is cold comfort, given that it takes only a little effort to identify a person by cross-referencing other data sets. Yet most companies, many privacy policy folk, and even government officials still like to act as if "anonymizing" your data means something.
The latest case in point: new research out of Stanford (first spotted by the German website Mixed) found that just five minutes of movement data was enough for researchers to identify VR users in the real world. The paper says participants using an HTC Vive headset and controllers watched five 20-second clips from a randomized set of 360-degree videos, then answered a set of questions in VR that were tracked in a separate research paper.
The movement data (including height, posture, head movement speed, and what participants looked at and for how long) was then plugged into three machine learning algorithms, which, from a pool of 511 participants, correctly identified 95% of users "when trained on less than 5 min of tracking data per person." The researchers went on to note that while VR headset makers (like every other company) assure users that "de-identified" or "anonymized" data protects their identities, that's really not the case:
"In both the privacy policy of Oculus and HTC, makers of two of the most popular VR headsets in 2020, the companies are permitted to share any de-identified data," the paper notes. "If the tracking data is shared according to rules for de-identified data, then regardless of what is promised in principle, in practice taking one's name off a dataset accomplishes very little."
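To make the mechanism concrete: once someone holds a few labeled sessions per person, matching a nameless session back to a name can be as simple as finding the closest known movement profile. Below is a minimal Python sketch under stated assumptions. The per-session features (height, average head speed, posture tilt), the participant names, and every number are hypothetical and made up for illustration. The nearest-centroid matcher is a deliberately simple stand-in, not the Stanford team's method; their features and algorithms were more elaborate. The point it shows is that deleting the name from a session leaves the measurements, which are what actually identify you, untouched.

```python
import math
from statistics import mean, pstdev

# Hypothetical enrollment data: sessions where identity is known.
# Features per session: (height_m, mean_head_speed_deg_per_s, posture_tilt_deg).
# Toy numbers for illustration only, not from the study.
enrolled = {
    "alice": [(1.62, 41.0, 3.1), (1.63, 39.5, 2.9)],
    "bob":   [(1.85, 55.2, 6.0), (1.84, 57.1, 6.3)],
    "chen":  [(1.74, 30.4, 1.2), (1.75, 31.0, 1.5)],
}

def fit_scaler(rows):
    """Per-feature mean and std so no single feature dominates the distance."""
    cols = list(zip(*rows))
    means = [mean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero
    return means, stds

def scale(row, scaler):
    means, stds = scaler
    return tuple((x - m) / s for x, m, s in zip(row, means, stds))

def train(enrolled):
    """Build one averaged (centroid) movement profile per known person."""
    all_rows = [r for samples in enrolled.values() for r in samples]
    scaler = fit_scaler(all_rows)
    centroids = {
        name: tuple(mean(col) for col in zip(*(scale(r, scaler) for r in samples)))
        for name, samples in enrolled.items()
    }
    return scaler, centroids

def identify(model, features):
    """Re-identify a 'de-identified' session: return the nearest known profile."""
    scaler, centroids = model
    x = scale(features, scaler)
    return min(centroids, key=lambda name: math.dist(centroids[name], x))

if __name__ == "__main__":
    model = train(enrolled)
    anonymous_session = (1.84, 56.0, 6.1)  # name stripped, measurements intact
    print(identify(model, anonymous_session))
```

Run as a script, this prints `bob`: the "anonymous" session lands right next to Bob's stored profile. Real systems use far more features and stronger classifiers, which only makes the match easier.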
If you don't like this study, there's just an absolute ocean of research over the last decade making the same point: "anonymized" or "de-identified" doesn't actually mean "anonymous." Researchers from the University of Washington and the University of California, San Diego, for example, found that they could identify drivers based on just 15 minutes’ worth of data collected from brake pedal usage alone. Researchers from Stanford and Princeton universities found that they could correctly identify an "anonymized" user 70% of the time just by comparing their browsing data to their social media activity.
The more data that's available to researchers (or corporations or governments), the easier it is to identify you. And with hacks, data leaks, and breaches dumping an endless stream of existing datasets into the wild, and no serious rules of the road governing things like the collection of location and other sensitive data, it shouldn't be too hard to see how the idea of "privacy" is a myth. Especially if the company is, say, Facebook, which is now tying your VR experience to your Facebook account whether you like it or not.
It's all something to keep in mind for whenever the U.S. gets off its ass and finally crafts a meaningful privacy law for the internet era. Especially given that "don't worry, your data is anonymized!" will be a constant refrain from industry as it tries to ensure any rules are as feeble as possible.