'Anonymized Data' Is A Gibberish Term, And Rampant Location Data Sales Is Still A Problem
from the doing-nothing-helpful dept
As companies and governments increasingly hoover up our personal data, a common refrain is that nothing can go wrong because the data itself is "anonymized" -- or stripped of personal identifiers like social security numbers. But time and time again, studies have shown that this is cold comfort, given it takes only a little effort to identify a person by cross-referencing other data sets. Yet most companies, many privacy policy folk, and even government officials still like to act as if "anonymizing" your data actually means something.
That's a particular problem when it comes to user location data, which has been repeatedly abused by everybody from stalkers to law enforcement. The data, which is collected by wireless companies, app makers and others, is routinely bought and sold up and down a long chain of different companies and data brokers, providing layers of deniability. This often happens with very little disclosure to or control by the user (though companies certainly like to pretend they're being transparent and providing user control of what data is traded and sold).
For example, last year a company named Veraset handed over billions of location data records to the DC government as part of a COVID tracking effort, something revealed courtesy of a FOIA request by the EFF. While there's no evidence the data was abused in this instance, EFF technologist Bennett Cyphers told the Washington Post Veraset is one of countless companies allowed to operate so non-transparently. Nobody even knows where the datasets they're selling and trading are coming from:
"A lot of these data brokers' existence depends on people not knowing too much about them because they're universally unpopular," Cyphers said. "Veraset refuses to reveal even how they get their data or which apps they purchase it from, and I think that's because if anyone realized the app you're using … also opts you into having your location data sold on the open market, people would be angry and creeped out."
While a long list of companies continue to insist that the massive scale this data is bought and sold at is no big deal because the data is "anonymous," experts (with mixed success) keep pointing out that's not really true:
"If you look at a map of where a device spends its time, you can learn a lot: where you sleep at night, where you work, where you eat lunch, what bars and parks you go to," Cyphers said. Because of that, he added, it's extremely simple "to associate one of these location traces to a real person."
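To make that concrete, here's a rough sketch of the kind of re-identification Cyphers is describing. Everything in it is made up for illustration: the device ID, the coordinates, the `ADDRESS_BOOK` lookup, and the function names are assumptions, not anything Veraset or any other broker is known to use. The idea is simply that the place a device sits most often overnight is probably home, and a second data set that maps places to names does the rest.

```python
from collections import Counter
from datetime import datetime

# Hypothetical "anonymized" pings: (device_id, ISO timestamp, latitude, longitude).
# No name, no phone number, just a random-looking device ID.
PINGS = [
    ("device-7f3a", "2021-03-01T02:10:00", 38.90712, -77.03710),
    ("device-7f3a", "2021-03-01T13:05:00", 38.89771, -77.03652),
    ("device-7f3a", "2021-03-02T01:40:00", 38.90718, -77.03698),
    ("device-7f3a", "2021-03-02T12:30:00", 38.89768, -77.03660),
    ("device-7f3a", "2021-03-03T03:15:00", 38.90709, -77.03705),
]

# Hypothetical second data set (think property records or a marketing list)
# that maps a coarse grid cell to whoever lives there.
ADDRESS_BOOK = {(38907, -77037): "Resident A"}


def cell(lat, lon):
    """Snap a coordinate to a grid cell of roughly 100 meters (3 decimal places)."""
    return (round(lat * 1000), round(lon * 1000))


def likely_home(pings, device_id, night_hours=range(0, 6)):
    """Return the grid cell where the device shows up most often overnight."""
    counts = Counter(
        cell(lat, lon)
        for dev, ts, lat, lon in pings
        if dev == device_id and datetime.fromisoformat(ts).hour in night_hours
    )
    return counts.most_common(1)[0][0] if counts else None


def reidentify(pings, device_id, address_book):
    """Link the inferred home cell to a name, if the second data set has one."""
    return address_book.get(likely_home(pings, device_id))


print(reidentify(PINGS, "device-7f3a", ADDRESS_BOOK))
```

Nothing in the "anonymized" records names anyone. The name comes entirely from joining the inferred home location against another data set, which is why stripping identifiers alone offers so little protection.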
After major location data scandals at both Securus and wireless carriers, it looked like we might see actual reform on this front, but those efforts have largely stalled. Bills specifically targeting location data have gone nowhere. The occasional fines levied against such companies are a tiny fraction of the revenues made from the data in the first place. And our 20-year effort to have anything even vaguely resembling a useful federal privacy law for the internet era remains mired in gridlock thanks to a massive coalition of cross industry lobbying opposition with a near-unlimited budget.
Which means most of these companies are going to keep collecting and selling access to this data, while pretending they don't sell access, that the data they collect is anonymous and harmless, and that absolutely any oversight or transparency requirements are unnecessary. And the parade of scandals, breaches, and abuse of this data will continue, until eventually there's a scandal so large that the problem can no longer be cavalierly brushed aside.
Filed Under: anonymized data, location data, privacy
Companies: veraset
Reader Comments
I'm pretty sure there's already overlap between those two groups.
[ link to this | view in thread ]
Re:
... in both duties and in criminal charges.
[ link to this | view in thread ]
As long as the data contains any reference or can be tied to the real world it isn't anonymized.
TL;DR: Anonymized data isn't.
[ link to this | view in thread ]
"Anonymized"
You know if you asked these brokers/companies how they specifically anonymized the data, you would be a) ignored, b) lied to or c) denied due to 'trade secrets'
If nothing else, I'd like to know how a data set is anonymized, what data fields are in the set and to be able to see my own 'anonymized' data records.
[ link to this | view in thread ]
There may be....
"a scandal so large that the problem can no longer be cavalierly brushed aside", but I doubt that there is a scandal so large that it cannot be forgotten, given enough time.
The abusers know this, so if such a scandal hits, they will orchestrate a massive "do something" runaround that doesn't actually achieve anything and keep it going until we the people forget the original scandal.
[ link to this | view in thread ]
Had to take one of those dumbass Online Security Training things at work last week.
Got marked wrong for putting "first name" under "can be used to identify an individual."
Which just tells me the test was written by somebody whose first name isn't Thaddeus.
[ link to this | view in thread ]
Re:
The first thing I would've asked this clown was "Did your parents give you first and last names? Did they use these names throughout your childhood? And did you answer to either of those names because you knew you were being singled out from every one else..... i.e. identified?"
If you get a deer-in-the-headlights look, or more likely, a ration of "I'm in charge, don't question me!", tell the people who hired this asshat that he's incompetent. (Being as this was online, substitute "company" for "he" or "him".)
[ link to this | view in thread ]
Re: Re:
I've learned not to bother.
One company I worked at (okay it was GoDaddy) my first week I had to take this online security test which, among other things, recommended that you comply with the "mixed case plus symbols" requirement by starting your password with a capital letter and ending with an exclamation point.
I e-mailed the security team to let them know this was terrible advice. Never heard back.
That was...2013, I think? I'm sure they're not using that test anymore (because it was Flash-based), but that doesn't mean whatever test they're using now is any better.
[ link to this | view in thread ]
That's very interesting. How then does anyone even know the datasets aren't totally fake?
[ link to this | view in thread ]
Simplify
". . . thanks to corruption." would be much simpler. Maybe not as explanatory, but just as accurate.
[ link to this | view in thread ]
Re: "Anonymized"
A way to do it properly is only aggregate generic statistics without information which may be used as proxies like say time of day or of course IP address. You can say "the traffic spiked Thanksgiving Weekend, 65% of it from Chrome". That of course reduces the value of the data but it has some uses.
[ link to this | view in thread ]
Re:
For one, statistics can give hints. For example, one red flag in accounting is all digits occurring a nearly equal number of times, as opposed to being biased towards the "lower half", which is what you'd expect from how sums work and which values are more common.
[ link to this | view in thread ]
Gotta disagree with you
It's not gibberish, it is a perfectly good euphemism, like 'retrenchment' for 'fired', or 'wet work' for 'murder'. Just another way of protecting delicate, snooping ears from reality.
[ link to this | view in thread ]
wasnt long ago,
that the adverts I would see on Roku, and the internet, trying to show my location were off by over 40 miles, and farther in the past, it was around 100 miles.
At this time, they are within 10 miles, and even naming my town.
But who has released this data?
ISP
Chrome
Firefox
Microstuff?(soft)
Amazon
Newegg
Roku
My router? Which has been registered in my name, and has a # in it specific to this device. which the ISP needs.
Or even the gov. forcing this data to come out and matching up all the random data. It has been shown over time that with enough data, they can finally figure out enough about all of us.
What logic can we figure out that would interconnect us to the whole system so we're easily identified? Between your router and modem, which both have nice, intricate numbers, what would it take, and who would you have to hack, to get this?
[ link to this | view in thread ]
Re: Gotta disagree with you
I've always liked "rejuvenation area" as a euphemism for "deforested area"...
[ link to this | view in thread ]
Re: Re: Re:
Isn't that how everyone complies with such requirements? That, or underscores as spaces if using multiple words.
Asking people to make gibberish passwords is the terrible advice, because they won't remember them and will have to resort to writing them down or using the same one everywhere.
[ link to this | view in thread ]