Harvard Students Again Show 'Anonymized' Data Isn't Really Anonymous
from the I-know-more-about-you-than-you-do dept
As companies and governments increasingly hoover up our personal data, a common refrain meant to keep people from worrying is the claim that nothing can go wrong because the data is "anonymized" -- stripped of personal identifiers like social security numbers. But study after study has shown this to be cold comfort, given that it takes only modest effort to identify a person once you have access to a few other data sets. Yet most companies, many privacy policy folks, and even government officials still like to act as if "anonymizing" your data means something.
A pair of Harvard students have once again highlighted that it very much doesn't.
As part of a class study, two Harvard computer science students built a tool to analyze the thousands of data sets leaked over the last five years or so, ranging from the 2015 Experian hack to the countless other privacy scandals that have plagued everyone from social media giants to porn websites. Their tool collected all this data and matched records across breaches by email address. What they found, again (surprise!), is that anonymized data is in no way anonymous:
"An individual leak is like a puzzle piece,” Harvard researcher Dasha Metropolitansky told Motherboard. “On its own, it isn’t particularly powerful, but when multiple leaks are brought together, they form a surprisingly clear picture of our identities. People may move on from these leaks, but hackers have long memories."
"We showed that an 'anonymized' dataset from one place can easily be linked to a non-anonymized dataset from somewhere else via a column that appears in both datasets," Metropolitansky said. "So we shouldn't assume that our personal information is safe just because a company claims to limit how much they collect and store."
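The students' actual tool isn't public, but the linkage attack Metropolitansky describes is easy to sketch. The Python snippet below is a toy illustration with made-up data and column names, not their code: it joins an "anonymized" table to a leaked, non-anonymized one on the columns they share, re-attaching email addresses to supposedly anonymous records.

```python
# A minimal sketch of a linkage attack, NOT the researchers' actual tool.
# All records, email addresses, and column names here are hypothetical.
import pandas as pd

# An "anonymized" dataset: names stripped, but quasi-identifiers remain.
anonymized = pd.DataFrame({
    "zip_code":   ["02138", "02139", "02140"],
    "birth_year": [1994, 1987, 1990],
    "gender":     ["F", "M", "F"],
    "diagnosis":  ["asthma", "diabetes", "hypertension"],
})

# A separate, non-anonymized dataset (say, a leaked mailing list)
# that happens to share the same quasi-identifier columns.
leaked = pd.DataFrame({
    "email":      ["alice@example.com", "bob@example.com"],
    "zip_code":   ["02138", "02139"],
    "birth_year": [1994, 1987],
    "gender":     ["F", "M"],
})

# Joining on the shared columns re-attaches identities to the
# "anonymized" records wherever the combination of values is unique.
reidentified = anonymized.merge(leaked, on=["zip_code", "birth_year", "gender"])
print(reidentified[["email", "diagnosis"]])
```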
It's hardly an isolated finding. One UK study showed that machine learning could correctly re-identify 99.98% of Americans in an anonymized data set using just 15 characteristics. An MIT study of "anonymized" credit card data showed that users could be identified 90% of the time using just four points of information. And one German study (pdf) found that just 15 minutes of brake pedal data was enough to pick the right driver out of 15 candidates 90% of the time.
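To get an intuition for why a handful of attributes is enough, here's a back-of-the-envelope sketch (nothing like the statistical models in the studies above, and using entirely hypothetical data) that counts how many records become unique, and therefore linkable, as you combine more columns.

```python
# A back-of-the-envelope illustration of why a few attributes can single
# people out. Data and column names are entirely hypothetical.
import pandas as pd

def fraction_unique(df, quasi_identifiers):
    """Share of records that are the only one with their particular
    combination of values across the given columns."""
    combo_sizes = df.groupby(list(quasi_identifiers)).size()
    return (combo_sizes == 1).sum() / len(df)

# Hypothetical "anonymized" records: no names, just coarse attributes.
people = pd.DataFrame({
    "zip_code":   ["02138", "02138", "02139", "02139", "02140", "02140"],
    "birth_year": [1990, 1990, 1985, 1990, 1985, 1979],
    "gender":     ["F", "M", "F", "F", "M", "M"],
})

# With one attribute, many people share a value; combine a couple more
# and most records become one-of-a-kind, i.e. re-identifiable.
for cols in [["zip_code"],
             ["zip_code", "birth_year"],
             ["zip_code", "birth_year", "gender"]]:
    print(cols, f"{fraction_unique(people, cols):.0%} unique")
```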
Take that data and fuse it with, say... the location data hoovered up by your cell phone provider, or the smart meter data collected by your local power utility, and it's possible for a hacker, researcher, or corporation to build the kind of detailed profile of your daily movements and habits that might surprise even you or your spouse. And since we still don't have even a basic U.S. privacy law for the internet era, nothing really seems to change, and any penalties for abusing the public trust are, well, routinely pathetic.
Yet somehow, every time there's another massive new hack or breach, the companies involved (as we just saw with the Avast antivirus privacy scandal) like to downplay the threat by insisting the data collected was anonymized, and that there's therefore just no way it could help specifically identify or target individuals. There's simply never been any indication that's actually true.
Filed Under: anonymity, anonymous data, study