The Story Behind Facebook Threatening To Sue Developer Into Oblivion For Highlighting Useful Facebook Data

from the how-nice-of-them dept

Facebook's lawyers have been getting pretty nasty lately. We recently covered the company's threats against the creator of a useful Greasemonkey script, and now a developer named Pete Warden has shared the sordid details of his legal run-in with Facebook -- where they threatened to sue him for his activity aggregating publicly available data found on Facebook.

You should read the full story, but basically, he built a simple crawler for public Facebook info, initially for his own purposes. He made sure that Facebook's robots.txt didn't block such crawlers -- and he also emailed someone at Facebook (who he had dealt with before), but didn't hear back from anyone. As his crawler worked, it started collecting a bunch of interesting data, and so he set up a website to let people explore some of this (again, public) data.
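
For anyone wondering what "making sure robots.txt didn't block such crawlers" looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. To be clear about the assumptions: the robots.txt contents, the `ExampleCrawler` user agent, the `example.com` URLs and the `is_allowed` helper are all made up for illustration. This is not Facebook's actual file and not Warden's actual code, just the general shape of the check a polite crawler performs before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, invented for this example.
# This is NOT Facebook's real robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleCrawler" and the example.com URLs are placeholders.
    for url in (
        "https://example.com/public/profile",
        "https://example.com/private/settings",
    ):
        print(url, is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleCrawler", url))
```

A real crawler would download the site's live robots.txt rather than parse a hard-coded string. And, as the rest of this story shows, passing this check says nothing about whether the site's lawyers will agree with you.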

After playing with some of the data himself, he started making some interesting maps and charts with the data, and did a simple analysis of geographic locations of Facebook friend connections to show people what you could do with the data. He noted that if others (such as professional researchers) wanted to dig into the data, he would let them access a version of the data set (with identifying info stripped). The chart he released got picked up by a variety of sites and quickly got passed around.

And that's when the lawyers called:

    On Sunday around 25,000 people read the article, via YCombinator and Reddit. After that a whole bunch of mainstream news sites picked it up, and over 150,000 people visited it on Monday. On Tuesday I was hanging out with my friends at Gnip trying to make sense of it all when my cell phone rang. It was Facebook's attorney.

    He was with the head of their security team, who I knew slightly because I'd reported several security holes to Facebook over the years. The attorney said that they were just about to sue me into oblivion, but in light of my previous good relationship with their security team, they'd give me one chance to stop the process. They asked and received a verbal assurance from me that I wouldn't publish the data, and sent me on a letter to sign confirming that. Their contention was robots.txt had no legal force and they could sue anyone for accessing their site even if they scrupulously obeyed the instructions it contained. The only legal way to access any web site with a crawler was to obtain prior written permission.

Mathew Ingram reported on the data getting forced down, and got a statement from Facebook that seems to miss the point:

    Andrew Noyes, manager of public policy communications at Facebook, said in an email that Warden "aggregated a large amount of data from over 200 million users without our permission, in violation of our terms. He also publicly stated he intended to make that raw data freely available to others." Noyes also noted that Facebook's statement of rights and responsibilities says that users agree not to collect users' content or information "using automated means (such as harvesting bots, robots, spiders, or scrapers) without our permission."

But I still don't see what the legal argument is. At best, I could see them terminating his account for disobeying the terms of service -- but even then the whole thing doesn't make much sense. The data is publicly available and, as Pete notes, it's pretty much standard practice for people to aggregate and analyze such data. However, he also pointed out that he couldn't afford to be a legal test case, and so he gave in and negotiated with Facebook to remove the data.

In the end, though, this shows Facebook's rather schizophrenic view towards data and privacy. On the one hand, it tries to push everyone to open up their info, but then if anyone does anything useful with it, they threaten to sue?

Filed Under: crawler, facebook, legal threats, public information


Reader Comments

  • dave blevins, 7 Apr 2010 @ 11:15am

    Goodbye Facebook, it wasn't nice to know you.

  • Richard, 7 Apr 2010 @ 11:28am

    since forever

    weeeelll this is an old battle, remember the aggregators of the late 90's? hotspot err altavista (before Fast bot) and so many others I can't remember. I don't really know what happened with any of those cases except that companies were sued out of business. The search engines (also aggregators) lasted and the others ended up on the penny exchange. I think the tried and true "sued into oblivion" strategy is the real story here. I mean, that's a massive failure of the legal system. It's denying justice to the poor and that's unconstitutional.

  • Anonymous Coward, 7 Apr 2010 @ 11:30am

    He's lucky he wasn't arrested. Many prosecutors still seem to think that violating a site's terms of service means you've violated federal hacking laws.

    Does data that requires you to login constitute "public" data though? Where's the threshold?

    • Ryan, 7 Apr 2010 @ 11:31am

      Re:

      Yeah, didn't he just commit federal computer fraud under the Lori Drew law? Or was that only in Missouri (for now)?

    • Anonymous Coward, 7 Apr 2010 @ 11:42am

      Re:

      thanks to facebook's new policy most of that data is public. not stuff you log in to see, but just go to facebook and you can see it.

      in fact everyone should log out of facebook and google their names to make sure they know what is posted publicly.

  • Anonymous Coward, 7 Apr 2010 @ 11:46am

    We have reached a point in time where corporate declarations hold the rule of law. When a company says you cannot use Product X to do Action Y, then the courts will say aye, tis the law of the land.

  • Beta, 7 Apr 2010 @ 11:46am

    I know logic isn't involved, but...

    If Facebook's entire argument is based on his using an automated tool to gather this information, then he could crowdsource it: announce his plan to Facebook users and invite them to contribute information which they collect by hand.

    And by "could", I mean "could have". By which I mean that once he made the announcement, it'd be hard to prove that the data in his possession hadn't come from big crowds of helpful Facebookers.

    • david G, 7 Apr 2010 @ 11:54am

      Re: I know logic isn't involved, but...

      "And by "could", I mean "could have". By which I mean that once he made the announcement, it'd be hard to prove that the data in his possession hadn't come from big crowds of helpful Facebookers."

      One problem, with today's standard you are guilty until you prove YOU DID NOT get it by other means.

  • Dave G, 7 Apr 2010 @ 11:49am

    I saw this yesterday and it really irked me

    I saw this yesterday on another site. I went and read the blog and this really irked me. I wish people would get together and say enough when we see these types of abuses. Some people argued that their EULA states you can't spider their site without prior permission, but I say, then don't allow it in your robots.txt file. You can't play the open card, then shut the door when it doesn't serve your purpose. I feel the same way about people who set up RSS feeds, then state you cannot use them in an open manner in some blurb on the website, but not in the feed itself.

  • John Doe, 7 Apr 2010 @ 11:58am

    All about control...

    They only want you to open up your info if THEY can control it. They don't want anyone else to have it; you must come to them to get it. If it is that useful, they will want to charge for it.

    Personally I don't believe they have a legal leg to stand on, but our court system is for the rich as the rest of us can't afford to fight.

  • Anonymous Coward, 7 Apr 2010 @ 11:59am

    Who gives a rat's @$$

  • JackSombra, 7 Apr 2010 @ 12:14pm

    "On the one hand, it tries to push everyone to open up their info, but then if anyone does anything useful with it, they threaten to sue? "
    The reason is very simple if you ask yourself a simple question, how does facebook make money?

    Via two methods. The obvious one is advertising, the second, not so obvious method is selling info like what was collected by this guy. He was cutting into their revenue stream, hence the trigger happy (but imo toothless) lawyers

  • Dental Chicken, 7 Apr 2010 @ 12:41pm

    This sounds like something for the EFF to handle.

    If this is not a violation of the FB EULA (and I can't say whether it is or not), then it seems to me this falls squarely within the charter of the EFF.

  • JP, 7 Apr 2010 @ 12:58pm

    Trespass to Chattels

    Facebook could use Trespass to Chattels [http://en.wikipedia.org/wiki/Trespass_to_chattels] to win their case. It's been done before. As much BS as it is, I wish the EFF would help.

  • cc, 7 Apr 2010 @ 1:25pm

    What kind of an argument is, "You can do it, but you can't make a computer do it for you"?

    • Anonymous Coward, 7 Apr 2010 @ 3:02pm

      Re:

      it is a question of speed and volume, just like a library compared to a torrent. as a single person clicking and making notes you might get a few dozen pieces of information. a bot running 24 hours per day will collect far more data than anyone would personally need. scale is key.

      • Anonymous Coward, 7 Apr 2010 @ 6:38pm

        Re: Re:

        Except...anyone personally studying Facebook's publicly available data, of course.

      • nasch, 8 Apr 2010 @ 7:58am

        Re: Re:

        We all understand computers can do it faster, but how does that change anything LEGALLY?

  • mark, 8 Apr 2010 @ 7:59am

    Why don't they...

    ...just hire the guy? I'm baffled.

  • V for Vendetta, 31 Jul 2010 @ 7:13am

    Facebook data leak - download all files here

    The original work, released a few days ago, was done on a Unix machine, and therefore, used Unix compression, which is woefully inadequate when compared to even WinRAR.

    So I took all the original Facebook data, decompressed it, and tested three Windows-based compressors - WinRAR won out (the other contestants were 7-Zip and WinZip).

    The original data are merely huge text files, and came in at a hefty 15GB. With WinRAR, I was able to get that to just a bit over 2GB.

    If you would like the files, you can download them yourselves from RS, much faster I suspect than from a torrent. Here are the links:

    http://rapidshare.com/files/409949014/Facebook.repacked.part01.rar
    http://rapidshare.com/files/409947525/Facebook.repacked.part02.rar
    http://rapidshare.com/files/409947812/Facebook.repacked.part03.rar
    http://rapidshare.com/files/409997211/Facebook.repacked.part04.rar
    http://rapidshare.com/files/409997597/Facebook.repacked.part05.rar

    [b]YOU MUST DOWNLOAD all five files to get the data.[/b] Click FREE USER button if not a Premium Member.

    It's ALL public information, so is all legal - kinda fun to peruse through, though not exciting.

    Enjoy.
