Can Scraping Non-Infringing Content Become Copyright Infringement... Because Of How Scrapers Work?
from the this-seems-troubling dept
Earlier this year, we couldn't figure out how Facebook's lawsuit against Power.com made any sense. Power.com tried to aggregate various social networking accounts in a single place, so you could manage them all at once through a single interface. Yet Facebook charged the company with all sorts of complaints, including copyright and trademark infringement, unlawful competition and violation of the Computer Fraud and Abuse Act. Power.com asked for the case to be dismissed, but last month the judge sided with Facebook, and did so in a troubling way: by basically suggesting that since Facebook's terms of service prohibited these uses, violating them was copyright infringement. Michael Scott points us to lawyer Jeff Neuberger's take on the ruling, and separately Tom O'Toole has a good analysis of the ruling. Neuberger states the following:
Judge Fogel concluded that the allegations of the complaint made out a sufficient claim of copyright infringement because Power Ventures "need only access and copy one page to commit copyright infringement." The court also found that the ToU prohibited downloading, scraping or distributing content from the Facebook Web site content except that belonging to the user, and that in any event, using automated methods, i.e., "data mining, robots, scraping, or similar data gathering or extraction methods" to access any content were also prohibited by the ToU. Thus, the court found that the allegation that Power Ventures accessed Facebook via automated means constituted made out a claim of direct copyright infringement, while the allegation that Facebook users utilized the Power.com interface to access their own profile pages made out claim of secondary copyright infringement.
Thus, because the terms of service said you can't do any automated scraping of the site, it's suddenly infringing?
Even worse, the court found that even though the data being used by Power.com isn't owned by Facebook (it's the users') the scraping was still copyright infringement, because in order to scrape the non-infringing content, Power.com had to first "scrape" the whole page. O'Toole explains:
OK, so far the court has found that Power.com made unauthorized copies of the Facebook Web site. What about the fact that Facebook does not own the copyright in its users' profile data? Facebook surmounted this hurdle by arguing that the content of the Facebook page that surrounded the user's data is copyrightable and is owned by Facebook. According to Facebook, the Power.com scraper operated in a manner that required it to copy the entire Web page in order to extract the user's profile data.... Note that the court is conditioning its ruling on the assertion that the Power Ventures scraper necessarily copied the entire Web page before it processed the page and extracted the profile data. That comports with my (limited) understanding of how a Web scraper works. But is it true? If it were true, couldn't an argument be made that this is a fair use of the page? I'll leave that for better lawyers.
All of this seems a bit troubling, as it would effectively rule out scraping even non-infringing content, just because the scraper had to first read through copyrighted content to get to the non-infringing stuff. That seems to go against the entire purpose of copyright law. The fact that the scraper reads copyrighted content shouldn't mean that it's infringement. It's not doing anything with that content other than using it to find the content it can make use of. Anyway, this ruling probably doesn't mean all that much, since it only rejected the motion to dismiss, but it does seem odd that the judge gave so much weight to Facebook's terms of service, and seemed to indicate that the mere act of scraping can be copyright infringement.
Filed Under: copyright, infringement, scraping, terms of service, trademark
Companies: facebook, power
Reader Comments
[ link to this | view in chronology ]
As far as I know, copyright infringement requires a medium - in other words, it is not infringement for Power.com to view copyrighted Facebook material without authorization, but it may be infringement for Power.com to republish copyrighted Facebook material without authorization. Admittedly I don't know exactly how Power.com's application works, but it would seem that if they are only republishing user information (info that Facebook does not own the rights to) then their scraper tool would fall into the first category.
[ link to this | view in chronology ]
Re:
But that would be logical and rational.
Asking any large company to be logical or rational is like pulling teeth. It just very rarely prevails.
[ link to this | view in chronology ]
Re: Re:
No it wouldn't. How would such an option be enforced? At best Facebook could set up its API to play nice with such an option, but it wouldn't stop things like scrapers, or even just a 'friend' cut-and-pasting info to their blog.
[ link to this | view in chronology ]
What does the law actually say regarding automated retrieval of content?
I write scrapers, and the very way that they work requires you to use an HTTP GET or POST request to retrieve a target URL, the same way your browser does, then parse the results for the content you are after. Mostly I do all this client side, so it isn't done by my company's servers but rather in JavaScript in the client's own browser.
So really it's not my company's server that's accessing the content, but a client's computer, and in the case of a Facebook profile, presumably the client has agreed authorized access with Facebook. Surely therefore it's not breaching the ToS to access your Facebook profile with your own browser through a secondary interface (scraper)?
No webmaster likes their precious site being scraped, but in all honesty there's not a great deal they can do about it unless they put a CAPTCHA image on every page, which would seriously detract from the user experience.
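For anyone who hasn't written one, the fetch-then-parse flow described in this comment can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Power.com's or anyone else's actual code: the `id="profile"` markup, the sample page, and the names `fetch_page`, `ProfileExtractor` and `extract_profile` are all made up for the example. The point it shows is that the scraper receives the whole page as a string first and only then picks out the part it wants.

```python
from html.parser import HTMLParser
from urllib.request import urlopen

# Tags that never have a closing tag, so they must not change nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class ProfileExtractor(HTMLParser):
    """Collects text inside the element whose id equals target_id (hypothetical markup)."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())


def fetch_page(url):
    # Step 1: an HTTP GET returns the entire page body, as a browser would receive it.
    # (Defined for completeness; not called below so the sketch runs offline.)
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def extract_profile(page_html, target_id="profile"):
    # Step 2: parse the full page and keep only the text inside the target element.
    parser = ProfileExtractor(target_id)
    parser.feed(page_html)
    parser.close()
    return " ".join(parser.chunks)


# Assumed sample page standing in for whatever fetch_page() would return.
SAMPLE_PAGE = """<html><body>
<div id="header">Site navigation and branding</div>
<div id="profile"><p>Name: Alice</p><p>City: Springfield</p></div>
<div id="footer">Site footer</div>
</body></html>"""

profile_text = extract_profile(SAMPLE_PAGE)
print(profile_text)  # Name: Alice City: Springfield
```

Note that `SAMPLE_PAGE` (header, footer and all) sits in memory before `extract_profile` ever runs, even though only the profile text is kept afterwards.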
[ link to this | view in chronology ]
Re:
This is what popped into my head when I read the post. Why would a Browser not be a problem but a Scraper would be?
Also, as far as I know, it's not copyright infringement until you produce a copy. If the product is just information FB doesn't control, there should be no case for infringement.
[ link to this | view in chronology ]
Re:
i would like to learn more about scrapers. plz email to chat in confidence at:
malleylaw@gmail.com
thanks
joe malley
[ link to this | view in chronology ]
Also, why should the service be held to terms of service that the users, not the service, have agreed to?
[ link to this | view in chronology ]
Dirty scrapers
Ok that's a forced argument - but the point is, if you don't want information to be available, don't publish it on the web. And this is as nothing to Facebook asking for my email password. What a stupid question.
[ link to this | view in chronology ]
Scrapers
We often scrape our own websites' content and then display it on another of our websites to reduce the number of times we publish content.
I am sure one could write a scraper that copied only specific content directly from a full web page and displayed it.
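A rough Python sketch of that idea, under assumptions: the page is fed to the parser in small chunks (as if it were arriving over the network) and only the text of one element is retained. The `id="status"` markup, the sample page and the names `TargetOnly`, `scrape_stream` and `in_chunks` are hypothetical. It is a streaming variant, not a way around the download itself: every byte of the page still passes through the parser, it just isn't kept.

```python
from html.parser import HTMLParser


class TargetOnly(HTMLParser):
    """Keeps text from one element (matched by a hypothetical id) and drops the rest."""

    VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.depth = 0      # greater than 0 while inside the target element
        self.kept = []

    def handle_starttag(self, tag, attrs):
        if tag in self.VOID_TAGS:
            return
        if self.depth:
            self.depth += 1
        elif dict(attrs).get("id") == self.target_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        # Text can arrive in pieces when a chunk boundary falls mid-sentence.
        if self.depth:
            self.kept.append(data)


def scrape_stream(chunks, target_id):
    parser = TargetOnly(target_id)
    for chunk in chunks:
        parser.feed(chunk)          # the parser buffers any tag split across chunks
    parser.close()
    # Re-join the pieces and normalize whitespace.
    return " ".join("".join(parser.kept).split())


def in_chunks(text, size):
    # Stand-in for reading a network response a few bytes at a time.
    for start in range(0, len(text), size):
        yield text[start:start + size]


PAGE = ('<html><body><div id="nav">Menu</div>'
        '<span id="status">Out for lunch</span>'
        '<div id="ads">Buy now</div></body></html>')

status = scrape_stream(in_chunks(PAGE, 16), "status")
print(status)  # Out for lunch
```

Whether discarding the surrounding markup as it streams past changes the legal picture is a question for the lawyers; technically, the page is still transferred in full.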
[ link to this | view in chronology ]
The argument, it seems to me, goes thusly:
When a program or a browser or some other software accesses a website, it is making a copy of the content of that site. That content may be copyrighted. By default, this would be a violation of copyright law. However, the Ticketmaster website included some terms of use that permitted this - effectively granting a limited license for some uses, but not others. Thus, an ordinary user with a browser was not in violation, because they were accessing the content under a license.
RMG's product, on the other hand, explicitly went outside the bounds of this license, and thus fell back under the aegis of copyright law, where such copying is prohibited. I suppose RMG could have mounted a fair-use defense, but it appears they explicitly did not.
Now, you could argue, what about the millions of websites that do not have terms of use granting a license? Is just browsing these sites a commission of copyright infringement?
The answer is likely no, based on the Perfect 10 ruling. This ruling indicated that some caching can be fair use; in that case, the caching was "noncommercial, transformative, and has a minimal impact on the potential market for the original work." The difference in Ticketmaster was that the fair use defense was not adequately asserted, and moreover the RMG software was in pretty clear violation of Ticketmaster's terms of use.
[ link to this | view in chronology ]
Re:
None of those tiny details should matter at all.
Copyright is a horribly outdated pre-internet system that just isn't needed anymore.
Scraping like this should be less than no problem. Simple.
Granting tiny limited licenses? C'mon. It's dumb.
[ link to this | view in chronology ]
What about Google?
[ link to this | view in chronology ]
Re: What about Google?
I thought the same thing, but it goes beyond Facebook. Is this yet another ruling that would effectively criminalize Google if it were applied to everyone (of course it won't be)?
[ link to this | view in chronology ]
uh?
Truly there is bad lawyering, or the judge is unfamiliar with the internet.
[ link to this | view in chronology ]
Re: uh?
Nah, you seem to be laboring under the illusion that the law applies equally to everyone. It doesn't.
[ link to this | view in chronology ]
whaaaaa faceplant
including,
1) we aren't making any money off of it so it must be illegal somehow.
2) our TOS states that you can not out innovate us
3) we own all user submitted content, including copyright
[ link to this | view in chronology ]
1) Get entire page
2) Parse for content (either automatically, or by a human reading the interesting parts and skipping things like ads)
3) ???
4) Profit
[ link to this | view in chronology ]
Search Engines come to mind.....
[ link to this | view in chronology ]