Researchers Feeling Conflicted Over AOL Data

from the in-two-minds dept

The leak of a ton of search data at AOL has been nothing short of a mess, culminating Monday in the termination of some employees at the company. The privacy concerns overshadowed how interesting the data was, and AOL's mistake in not stripping out personally identifiable information undid its original good intention: to give researchers a look at a large amount of search data, something that's often difficult for them to get their hands on. Though AOL pulled the data, plenty of people downloaded it before it got yanked, and many search researchers have been examining it. Some, however, feel enough ethical pangs that they can't bring themselves to look at it. It's nice to see these people have some ethical concerns, but as long as they're using the information responsibly, it doesn't seem like they have much to be worried about. However, as some researchers point out, the ongoing effect of the AOL gaffe will be to make search companies think twice about releasing any kind of data, even if they have anonymized it. That's really not an ideal outcome, as it limits the ability of people outside search companies to research and refine search technology. The answer is to release the data responsibly, taking users' privacy into consideration. In the meantime, these researchers should probably just carry on with the data, since it's the last they're likely to see for a while -- just try to avoid fingering individual users by their search habits.


Reader Comments


  • Anonymous Coward, 23 Aug 2006 @ 10:32am

    I don't believe it's unethical to use that data for ethical purposes. It was only unethical for it to be released: not only will it fall into the hands of those using it for unethical purposes, it also breaches the trust of the users. There is no such implicit trust relationship between researchers and AOL users, although there is a more primitive ethical imperative for them to do no evil with it. Keep right on using that data. The damage is already done, so let us gain as much from it as we can.

  • NGUVU, 23 Aug 2006 @ 10:45am

    I am so glad...

    ...that I don't use AOL!

  • AOL Grrrrr, 23 Aug 2006 @ 11:06am

    Spam King

    The big question is: did any AOL user search for "How to bury Gold in your grandfather's Garden"?

  • Grandfather Time, 23 Aug 2006 @ 11:53am

    It takes something like this for thousands upon thousands to realize that the AOL service and security they have been paying for, for years, is all just a waste of time and money. Faster, more secure, more reliable... more like slower, more expensive, and easier to hack.

  • techdirtReader, 23 Aug 2006 @ 12:00pm

    I don't need their help

    I'm not a big fan of companies sharing my data, even anonymously, with other companies to "improve my user experience". Somehow, every time a company wants to "improve my user experience", the company ends up with more revenue and I end up with a Big Brother-esque experience (aka Amazon suggestions based on past history).

    Hence, I'm all for limiting the ability of people outside search companies to research and refine search technology.

  • aReader, 23 Aug 2006 @ 1:32pm

    Track me not

    This has made the extension http://mrl.nyu.edu/~dhowe/TrackMeNot/ very popular. I don't really like the extension, since creating ghost traffic could backfire: the search engine servers will have to deal with ghost queries even if they do not intend to store and publish the search data.

  • Luke Metcalfe, 23 Aug 2006 @ 2:31pm

    Other means of getting data for research

    Why don't researchers make more use of proxies? They could have subjects specify a proxy in their browser and just keep searching. That way they can control for the types of people using the data, and get to see more than just what they did on the AOL site. (Although this data is quite rich, showing which sites they went to after searching.)

  • gene becker, 23 Aug 2006 @ 9:20pm

    Using the AOL data is not ethical

    I think the ethical problems with doing research on the AOL dataset are significant. See this post for a longer discussion: The Ethics of the AOL Search Data Disclosure
