Once Again With Feeling: 'Anonymized' Data Isn't Really Anonymous

from the we-can-see-you dept

For years, the companies that hoover up your internet browsing and other data have proclaimed that you don't really have anything to worry about, because the data collected on you is "anonymized." In other words, because the data collected about you is assigned a random number instead of your name, you should be entirely comfortable with everything from your car to your smart toaster recording your daily habits and selling them to the highest bidder. But studies have repeatedly shown that it only takes a few additional contextual clues to re-identify the individuals behind those numbers. So in an era of cellular location, GPS, and even smart electricity data collection, it doesn't take much work to build a fairly reliable profile of who you are and what you've been up to.
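To see why swapping a name for a random number does so little, consider a minimal Python sketch of a linkage attack. Everything in it is hypothetical: the records, the coarse "home cell" and "work cell" fields standing in for the contextual clues mentioned above, and the auxiliary list an attacker is assumed to already hold. It joins the two datasets on those clues and reports any random ID that matches exactly one named person:

```python
from collections import defaultdict

# Hypothetical "anonymized" location records: the name has been replaced by a
# random ID, but each record still says where the device sleeps and works.
pseudonymized = [
    {"id": "a91f", "home_cell": "cell-17", "work_cell": "cell-42"},
    {"id": "c03b", "home_cell": "cell-17", "work_cell": "cell-08"},
    {"id": "77de", "home_cell": "cell-23", "work_cell": "cell-42"},
]

# Hypothetical auxiliary data an attacker might already hold (for example from
# a public profile): a real name plus the same two contextual clues.
auxiliary = [
    {"name": "Alice", "home_cell": "cell-17", "work_cell": "cell-42"},
    {"name": "Bob", "home_cell": "cell-23", "work_cell": "cell-42"},
]


def link_records(pseudo, aux, keys=("home_cell", "work_cell")):
    """Map each pseudonymous ID to a name when its clue values match
    exactly one person in the auxiliary data."""
    index = defaultdict(list)
    for row in aux:
        index[tuple(row[k] for k in keys)].append(row["name"])
    links = {}
    for row in pseudo:
        candidates = index.get(tuple(row[k] for k in keys), [])
        if len(candidates) == 1:  # unique match: the random ID now has a name
            links[row["id"]] = candidates[0]
    return links


print(link_records(pseudonymized, auxiliary))  # {'a91f': 'Alice', '77de': 'Bob'}
```

Restricting the join to a single clue (`keys=("work_cell",)`) finds no unique matches in this toy data, while two clues re-identify two of the three IDs: the "few additional contextual clues" effect in miniature.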

The latest case in point: German journalist Svea Eckert and data scientist Andreas Dewes recently descended upon Defcon to once again make this point, releasing a new report highlighting how "anonymous" browsing data is anything but. The duo found it relatively trivial to obtain clickstream browsing data from numerous companies simply by posing as a fake marketing company, replete with a website filled with “many nice pictures and some marketing buzzwords.” Ironically, some of this data was gleaned from companies that profess to offer you additional layers of privacy, including “safe surfing” tool Web of Trust.

It didn't take long before the pair had obtained a database of more than 3 billion URLs from roughly three million German internet users, spread across some 9 million different websites. As easy as it was to obtain this "private" and "anonymous" browsing data, using it to identify individual users was even easier:

"Dewes described some methods by which a canny broker can find an individual in the noise, just from a long list of URLs and timestamps. Some make things very easy: for instance, anyone who visits their own analytics page on Twitter ends up with a URL in their browsing record which contains their Twitter username, and is only visible to them. Find that URL, and you’ve linked the anonymous data to an actual person. A similar trick works for German social networking site Xing."
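Mechanically, the Twitter trick is just a pattern match over the URL list. Here is a minimal Python sketch, assuming the clickstream is a list of `(timestamp, url)` pairs and that the analytics URL carries the handle as the path segment after `/user/`. That URL layout is an assumption for illustration, since the quoted description only says the URL contains the username; the function name and sample history are hypothetical:

```python
import re
from urllib.parse import urlsplit

# ASSUMPTION: the self-only analytics page puts the account handle in the path,
# e.g. https://analytics.twitter.com/user/<handle>/home. This layout is
# illustrative; it is not taken from the article or any documented format.
ANALYTICS_HOST = "analytics.twitter.com"
# Twitter handles are 1-15 letters, digits or underscores.
USER_PATH = re.compile(r"^/user/([A-Za-z0-9_]{1,15})(?:/|$)")


def find_self_identifying_user(clickstream):
    """Return the first handle leaked by a self-only analytics URL in a list
    of (timestamp, url) pairs, or None if no such URL appears."""
    for _timestamp, url in clickstream:
        parts = urlsplit(url)
        if parts.hostname != ANALYTICS_HOST:
            continue  # only the self-only analytics pages leak the handle
        match = USER_PATH.match(parts.path)
        if match:
            return match.group(1)
    return None


# Hypothetical browsing record for one pseudonymous user.
history = [
    (1501800000, "https://news.example/politik"),
    (1501800300, "https://analytics.twitter.com/user/some_handle/home"),
    (1501800600, "https://bank.example/login"),
]

print(find_self_identifying_user(history))  # some_handle
```

One such URL anywhere in a history of thousands is enough; every other entry in that record then belongs to a named account.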

The pair also highlighted how repeat visits to websites specific to you (your bank, your hobbies, your neighborhood) help narrow down your identity further:

"For other users, a more probabilistic approach can deanonymise them. For instance, a mere 10 URLs can be enough to uniquely identify someone – just think, for instance, of how few people there are at your company, with your bank, your hobby, your preferred newspaper and your mobile phone provider. By creating “fingerprints” from the data, it’s possible to compare it to other, more public, sources of what URLs people have visited, such as social media accounts, or public YouTube playlists."
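The fingerprint comparison can be sketched in a few lines of Python. This is a hypothetical illustration, not the researchers' method: it reduces each browsing record to the set of hostnames visited and scores candidates with Jaccard similarity (shared hosts divided by all hosts), a common set-overlap measure chosen here for simplicity. The profiles and URLs are invented, and `best_match` assumes at least one candidate profile:

```python
from urllib.parse import urlsplit


def fingerprint(urls):
    """Reduce a list of URLs to the set of hostnames visited (a deliberately
    coarse fingerprint; paths and timestamps are discarded)."""
    hosts = (urlsplit(u).hostname for u in urls)
    return {h for h in hosts if h}  # skip strings with no parsable host


def jaccard(a, b):
    """Set overlap len(a & b) / len(a | b); defined as 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(anonymous_urls, public_profiles):
    """Return (name, score) for the public profile whose fingerprint overlaps
    most with the anonymous record. Assumes public_profiles is non-empty."""
    target = fingerprint(anonymous_urls)
    scores = {name: jaccard(target, fingerprint(urls))
              for name, urls in public_profiles.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


# Hypothetical "anonymous" record: bank, hobby, local paper, employer.
anonymous = [
    "https://bank.example/login",
    "https://chess-club.example/forum",
    "https://local-paper.example/",
    "https://employer.example/intranet",
]

# Hypothetical public sources, e.g. links shared from social media accounts.
public = {
    "alice": ["https://chess-club.example/forum",
              "https://local-paper.example/sport",
              "https://employer.example/jobs"],
    "bob": ["https://video.example/watch",
            "https://local-paper.example/"],
}

print(best_match(anonymous, public))  # ('alice', 0.75)
```

A real attack would likely weigh rare sites more heavily than ubiquitous ones (nearly everyone visits a search engine; few people visit your employer's intranet), but even this crude overlap singles out the right candidate in the toy data.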

Of course, this is nothing new, and researchers have been making this precise point for years. Princeton researcher Arvind Narayanan in particular has been warning that anonymous data isn't really anonymous for the better part of the last decade, yet somehow the message never seems to resonate, and everyone from broadband providers to internet of things companies continues to pretend that "anonymization" of data is some kind of impenetrable, mystical firewall preventing companies or hackers from identifying you.

Filed Under: anonymized data, privacy


Reader Comments

  1.
    Anonymous Anonymous Coward (profile), 4 Aug 2017 @ 3:08pm

    Breaking down the breakdown

    They are just following the Government's lead. The Government says it's just metadata and doesn't mean anything, so the advertising firms take this (without grains of salt) and make the same claim.

    Then, because metadata is actually a whole lot more than nothing, they send you advertising for things you don't want because their data mining said that you would want it. Following the same recipe, product creators pay the advertisers for sending advertising no one wants, and everybody is happy. Well...almost.

  2.
    MyNameHere (profile), 4 Aug 2017 @ 6:04pm

    ..and?

    Karl, if you follow an individual around during the day and note all the places they go when in public, you can draw the same conclusions.

    I can randomly pick a person and, within a few days, tell you who their immediate family is, what they like to eat, where they work, and so on.

    The internet is a public place. Even if you make the effort to hide yourself, the reality is that you are walking in public places. Like it or not, everything you do online has a certain public nature to it.

  3.
    orbitalinsertion (profile), 4 Aug 2017 @ 6:32pm

    Re: ..and?

    Go follow ten million people in an afternoon and let us know how you make out.

    The internet is a public place, but companies bulk harvesting data are essentially going through your pockets, then removing your name from the generated report of the contents and claiming that is anonymous.

    None of this is the same as following an individual around meatspace or the internet. (Never mind that what your car or appliances or what have you do is not "the internet".) Repeated claims of anonymization, when those claims are complete bunk, are rather more the point. If you followed someone around, reported your gathered intel to another party merely without the subject's name, and claimed that was anonymous, would you be telling the truth? (And not a merely technical truth-flavoured thing.)

  4.
    Anonymous Coward, 4 Aug 2017 @ 7:40pm

    Re: ..and?

    You do realize that stalking is illegal in places, don't you?

  5.
    Anonymous Coward, 4 Aug 2017 @ 7:54pm

    The overriding issue here is ..

    that javascript has been perverted for evil.

    If you browse without blocking a single thing .. then yeah, sure you are going to be tracked in six degrees of separation.

    The only defense (such as it is) anyone has is to run NoScript and RequestPolicy in a default "block everything" configuration, enabling only the JavaScript that is needed to make a website functional. Even then, I have to run Fiddler to load stripped-out .js files for googletagmanager et al. so that the rest of the website will function.

    For example:

    window.confirm("This is the blank gtm.js file loading.");

    The next line of defense is to not use social media. Period. Fuck social media.

    I could go on but I do not think anybody really cares anymore.

    *sigh*

  6.
    MyNameHere (profile), 5 Aug 2017 @ 3:19am

    Re: Re: ..and?

    "companies bulk harvesting data are essentially going through your pockets, then removing your name from the generated report of the contents and claiming that is anonymous."

    Colorful description aside, you miss the point. Technology allows it. Face it, technology can track you. The phone in your pocket is a beacon to your location. Facial recognition cameras can pinpoint your location, and everything from your fastpass car transponder to your refillable public transit card is tracking your every move. Technology allows for it, and it's often an unavoidable trade-off for the technology to even work.

    The internet is no different in reality. Google and a large number of other companies are tracking you every day. What makes this story sort of funny is that coming to Techdirt triggers over 40 tracking cookies from half a dozen sources. Each page view sends your visit data (anonymous, natch) to SoundCloud and others, who can track your interest in the sorts of things discussed here.

    For reference, EFF.ORG sets a single cookie for their own use only. A visit to the Drudge Report triggers hundreds of cookies.

    There hasn't been this level of tracking in "meatspace" because technology hasn't supported it in the past. But the cell phone alone has clearly changed all that, and all those other things I mentioned before are conspiring to tell the world where you have been and what you do.

    Is the data anonymous? At each point, it is. Combined, perhaps less so. Can we really stop one company from using your data because combining a second or third data set might be the tipping point on your anonymous life? Do you not think it's already happened?

    You already gave it up. You just don't realize it.

  7.
    Anonymous Coward, 5 Aug 2017 @ 7:38am

    Re: Re: Re: ..and?

    Technology allows it.

    Technology allows a lot of things, both rightful and wrongful. That you think that makes it alright says something about you.

  8.
    Sok Puppette (profile), 5 Aug 2017 @ 8:02am

    So I assume...

  9.
    Anonymous Coward, 5 Aug 2017 @ 9:26am

    Re: ..and?

    You miss the point that this data is stored for later abuse, and collecting it takes much less effort than spending the day following someone.

  10.
    Anonymous Coward, 5 Aug 2017 @ 12:55pm

    Once more with feeling...

    Buffy is a good allegory. The early internet was fun, frustrating, dynamic and human... until all our friends came along and pulled us out of heaven into the squalid cease pit that is face fuck and the other big 5, except in this case their intentions were never good; it was obvious that it was a devil's bargain from the start.

    https://www.youtube.com/watch?v=L8uH26GlWeo

    Whistle while you work :)

  11.
    MyNameHere (profile), 5 Aug 2017 @ 1:19pm

    Re: Re: Re: Re: ..and?

    Actually, it's the same logic that has Techdirt supporting piracy - technology allows it, so figure out a way to live with it.

    Usually technology is the friend of the Techdirt readers. When the tables are turned, it's fun to watch the song change.

  12.
    Brad Dump-Tish, 5 Aug 2017 @ 3:57pm

    Re: Breaking down the breakdown

    And of course the Government has never meta-data it didn't like.

  13.
    Anonymous Anonymous Coward (profile), 5 Aug 2017 @ 4:32pm

    Re: Re: Re: Re: Re: ..and?

    Yep, technology allowed me to record songs off the FM radio. Perfectly legal. Technology allowed me to record TV shows and movies off the air. Perfectly legal.

    So, what was your point?

  14.
    Arioch (profile), 5 Aug 2017 @ 7:01pm

    Re: The overriding issue here is ..

    "I could go on but I do not think anybody really cares anymore."

    An interesting point of view.

    May I point out that I and many others openly take action against these scumbags.

    While I am sure that they are making every effort to collect any scrap of personal data they can find on me, I make sure that the information is hilariously bogus.

  15.
    Anonymous Coward, 6 Aug 2017 @ 11:05am

    Re: Re: Re: Re: Re: ..and?

    "Actually, it's the same logic that has Techdirt supporting piracy"

    False equivalence and false accusation - awesome.


    "technology allows it, so figure out a way to live with it."

    Tech provides tools that can do things that you may not survive ... you had better get busy and figure out a way to live with it.

  16.
    Anonymous Coward, 6 Aug 2017 @ 11:08am

    Re:

    "you are lucky if half the world doesn't know when you fart."

    Echo knows

  17.
    Anonymous Coward, 6 Aug 2017 @ 11:09am

    Re: Once more with feeling...

    "cease pit "

    LOL

  18.
    MyNameHere (profile), 6 Aug 2017 @ 1:25pm

    Re: Re: Re: Re: Re: Re: ..and?

    What's the issue? One of the cornerstones of Mike's tacit approval of piracy is that it is something that technology allows, so creators should suck it up and deal with it, and find ways to profit from it, rather than thinking about the losses that may occur.

    Tracking is exactly the same thing. It's something that technology allows (even requires in the case of cell phones), so you should suck it up and deal with it, rather than losing sleep over what you can't avoid.

  19.
    Anonymous Coward, 7 Aug 2017 @ 7:18am

    Re: Re: Re: Re: Re: Re: Re: ..and?

    "tacit approval of piracy "

    In your mind. You state this as though everyone agrees.


    "Tracking is exactly the same thing."

    No it is not.


    "you should suck it up and deal with it, rather than losing sleep over what you can't avoid"

    Yeah, listen to your mother and no complaining now kids.

  20.
    Thad, 8 Aug 2017 @ 9:33am

    Re: So I assume...

    Why? What's that article got to do with this one?

    That article is about how when ISPs resell data, they're actually selling targeted advertisements. This article is about how companies that sell "anonymized" data are selling data that still contain personally identifiable information. What's the connection?
