Search Engine Cache Isn't Copyright Infringement
from the good-news-for-search-engines dept
There are some out there who have suggested that search engines such as Google and Yahoo are basically just massive copyright violators, because they scan, index and keep an archive of websites. That copied archive (usually called a cache) is, according to these commenters, an unauthorized copy. Now a court has basically destroyed that argument, noting that putting content online is giving an implicit license for search engines to index and copy. The lawsuit also claimed that individuals who visited the cached version were also infringers -- but the court also rejected that argument, claiming that the implied license extends to those users. The only part of the case that seems to be moving forward is whether or not this implicit license was broken after the lawsuit started and search engines still didn't take down the content. The idea there was that any explicit notification by the content holder might override the implicit license -- and thus search engines should have taken down the content as soon as the lawsuit started (thus signaling an explicit revoke of the license). Of course, the whole thing seems pretty silly. If the guy didn't want his content indexed, he should learn what a robots.txt file is for.
Filed Under: cache, copyright, search engine
Companies: google, microsoft, yahoo
Reader Comments
robots.txt
[ link to this | view in chronology ]
Public
If the data can be indexed/scanned because proper security measures aren't taken, then it's the website's fault.
If someone released non-public data against the website's wishes, then the search engine should purge that data.
[ link to this | view in chronology ]
Not the site
In spite of the cached copy, a search engine is NOT the cached site. A search engine is, roughly, a travel guide. A conveniently arranged source of information that tells you how to get where you want to go.
Or, in the words of a particular Eastern religion, the finger pointing at the moon is not the moon.
[ link to this | view in chronology ]
Re: Not the site
I'm kinda on the fence here. It seems like we're carving out a copyright exception to deal with the way the digital world works instead of just fixing the law in the first place.
I'd rather see a few less exceptions carved out long enough to force the law to be fixed.
But, I could be wrong, and the law could be correct in the first place.
[ link to this | view in chronology ]
Re: Re: Not the site
[ link to this | view in chronology ]
This whole problem though, does go right along with your ideas about reworking copyright laws. Allowing copyright laws to block search engines would severely hinder "progress". But then again, search engines profit from this data which technically violates copyright law.
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
Re:
If you want to use a good analogy I would look to Hollywood. Many celebs have their name protected under copyright laws. Say a celebrity then decides to get a phone hooked up in their home and they did not take the time to get their number unlisted. Now if the new phone book came out and included the celeb in the listing, would this fall under a copyright violation? These kinds of lawsuits are on par with the “I spilled coffee on my lap, now I will sue McDonalds for 3 million” kind. Just as a celeb would need to take steps to get their number unlisted, a web designer would need to take steps to incorporate a robots.txt file. Most people want Google and Yahoo to scan and index their site.
Now if Sean Penn said, “What, my name's in the phone book!! That’s it, I am going to court!!”, a judge would throw the case out of court. Unfortunately, when it comes to technology, a majority of people are not familiar enough with the environment to really make a good decision on it.
[ link to this | view in chronology ]
Re:
Incorrect. It is not against the law to profit from copyrighted works if it falls under fair use.
[ link to this | view in chronology ]
Re:
No, not even remotely close. Putting a website online is specifically and proactively saying "here it is, we're open for business!" When one computer sends a request to visit the other, the other actively welcomes it and tells you to make a copy of it.
"But then again, search engines profit from this data which technically violates copyright law."
Profiting does not technically violate copyright law. It's one factor of many, and there are plenty of cases where companies have been allowed to "profit" from others' copyrighted works. In fact, it's quite common.
[ link to this | view in chronology ]
Cached sites
[ link to this | view in chronology ]
Robber analogy
[ link to this | view in chronology ]
Re: Robber analogy
As for the browser caching a copy, yes they do and always have. So putting stuff on the internet means you accept that behavior.
[ link to this | view in chronology ]
Re: Re: Robber analogy
[ link to this | view in chronology ]
Re: Re: Re: Robber analogy
[ link to this | view in chronology ]
Re: Re: Robber analogy
Why are people putting content on the web if they don't want people to find it? This business model makes no sense to me.
[ link to this | view in chronology ]
Re: Re: Re: Robber analogy
You did see that this article is about caching the content and not just indexing it? Indexing wouldn't be a problem. You are just reporting you found these words on this site. Caching the actual site is making a copy of it and does run afoul of the copyright law. This kind of thing is exactly why Mike says copyright law needs to be re-thought out.
[ link to this | view in chronology ]
Re: Re: Re: Re: Robber analogy
Not in and of itself, no, but as part of a larger creative work, like a collage, you most certainly can. If one wanted to argue that Google's indexing of the world is a creative endeavor.
Or as a catalog of factual data. If one wanted to argue that all the search engine is doing is publishing a factual list of things ("sites") and their locations.
Or a travel guide.
Or perhaps dozens of other examples.
And, neverminding that Google does not in fact profit from the cache views - they sell no advertising on those pages, and they PROMINENTLY announce that it is a cache view and NOT the actual site, so there is no argument of "confusion" here, either.
[ link to this | view in chronology ]
Re: Re: Robber analogy
Your computer on your desk in your house, you expect to be private. Were "Teh Goog" caching that for public search, I can see you getting bunged.
A website is a billboard on a busy freeway. All Google does is say "There's a billboard here". One puts a billboard on a busy freeway, presumably, because one wants it to be seen. Having a 3rd party point at it and say, "look over there" hardly seems to be "taking" your content. That is, nobody goes to Google to see your billboard, they go to Google to see where to find your billboard.
Indeed. So who are these webmasters who are unfamiliar with Google and its purpose, again?
[ link to this | view in chronology ]
Re: Re: Re: Robber analogy
[ link to this | view in chronology ]
Re: Re: Re: Re: Robber analogy
Ah. Fine. Perhaps they've been under a rock. It could happen.
Say you write a book. It's a great book. You turn it over to $BigPublishingHouse... but then you notice that it's in bookstores! Those BASTARDS! Just what do they think they're up to??!? You had no idea they were going to sell it, fer Christ sake!
You had the reasonable expectation that nobody would ever know about your book unless you told them of it personally, or ... I don't know ... maybe magic fairies clubbed them with the Stick of Obscure Realization or something.
But to think $BigPublishingHouse would take your manuscript and make it public?? Unthinkable.
[ link to this | view in chronology ]
Re: Robber analogy
I would also like to note that:
a) cached pages are out of date;
b) most cached pages don't include much of the content of the original page, or it doesn't work properly (Javascript)
[ link to this | view in chronology ]
Re: Re: Robber analogy
[ link to this | view in chronology ]
robots.txt is useless in a lot of cases
In many cases the search engines are doing a serious injustice by caching the content of sites, especially in the case of my site, which is entirely data driven.
Depending on the entry page and the options chosen, the content will be different from person to person, and since there is a live system in use by end users, even the content of a given page is outdated in a matter of minutes in some cases...
All in all, I have a robots.txt file, and then I also monitor my logs for spiders and just block their IP addresses.
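For anyone wanting to try the log-monitoring approach, here is a rough sketch of what it might look like. It assumes the server writes the common "combined" log format and that spiders identify themselves in the user-agent string; the sample lines, the marker list, and the function name `spider_ips` are all made up for illustration, and user agents are trivially spoofed, so treat the result as a starting point for a block list rather than a reliable detector.

```python
import re
from collections import Counter

# Hypothetical access-log lines in "combined" format. The IPs come from
# documentation ranges and the user agents are examples only.
SAMPLE_LOG = [
    '192.0.2.10 - - [01/Jan/2008:10:00:00 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '198.51.100.7 - - [01/Jan/2008:10:00:05 +0000] "GET /a HTTP/1.1" 200 512 "-" '
    '"Mozilla/5.0 (Windows; U) Firefox/2.0"',
    '192.0.2.10 - - [01/Jan/2008:10:00:09 +0000] "GET /b HTTP/1.1" 200 300 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

# Assumed substrings that suggest a spider; a real list needs upkeep.
SPIDER_MARKERS = ("bot", "spider", "crawl", "slurp")

# The client IP is the first field; the user agent is the last quoted field.
LINE_RE = re.compile(r'^(\S+) .* "([^"]*)"$')


def spider_ips(lines):
    """Count requests per IP whose user agent looks like a spider."""
    hits = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the assumed format
        ip, agent = match.groups()
        if any(marker in agent.lower() for marker in SPIDER_MARKERS):
            hits[ip] += 1
    return hits


print(spider_ips(SAMPLE_LOG))
```

The IPs it reports would then go into whatever blocking mechanism the server offers, which is outside the scope of this sketch.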
[ link to this | view in chronology ]
No Index Meta Tags
[ link to this | view in chronology ]
Not quite black & white - (new robbery analogy)
[ link to this | view in chronology ]
1) Google profiting from caching: Since Google does not "seem" to put their own ads on cached pages, I do not see that they are directly profiting from the caching. They are making money on the ads on their indexing pages, not their caching.
2) Copyright truly needs to have limits attached. Not only of duration, but also of the owner's absolute control. As far as I know, a library does not have to pay special fees to purchase a book in order to lend it to the public. It is understood that this is in the public interest. Yet I can easily imagine some authors screaming "They are accessing my work for free! Each reader owes me X dollars!". This would be ridiculous, but I bet some think it appropriate. The same could reasonably be applied to Google. Yes, I realize the library paid for the one copy of the book, and Google did not, but neither did it really cost the author/publisher anything for Google to possess a copy, while the library has paper/printing/binding to pay for.
And really, who goes to the cached page anyway, if the real page is up, and intact? I certainly only use them as backup access.
[ link to this | view in chronology ]
Re:
They're also handy to peek at search results that might otherwise get swallowed by *ork's web filters.
[ link to this | view in chronology ]
What makes this special in terms of search engines is that the cached copy is available - it's being redistributed by the search engine.
Robots.txt does not cover caching. That's what the appropriate meta tags are for. You can be indexed but not cached with the correct setup, provided the search engines decide to obey what is basically a voluntary standard.
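The comment does not name the tag, but the one usually meant is a robots meta tag carrying a `noarchive` directive. As a minimal sketch of how a well-behaved crawler might honor it (the class and function names and the sample page are hypothetical, and nothing here claims to be how any real search engine does it):

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None for bare attributes, so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for item in attr_map.get("content", "").split(","):
            item = item.strip().lower()
            if item:
                self.directives.add(item)


def robots_directives(html_text):
    """Return the set of robots meta directives found in a page."""
    parser = RobotsMetaParser()
    parser.feed(html_text)
    return parser.directives


# Hypothetical page that asks to be indexed but not cached.
PAGE = (
    '<html><head><meta name="robots" content="index, noarchive">'
    "</head><body>Hi</body></html>"
)

directives = robots_directives(PAGE)
may_cache = "noarchive" not in directives
print(sorted(directives), may_cache)
```

As the comment says, this is a voluntary standard: the tag only has an effect if the crawler chooses to run a check like this one.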
Remember, before the people who cloaked sites got wise, how you'd hit the cache of a paywalled site in Google? That's what this is about. And, of course, trying to suck blood from the lumbering behemoth which is Google.
[ link to this | view in chronology ]
If anyone wants information to be unsearchable, they need only make a page with a button (not set to be a "submit" type, but which does an onclick=form.submit) that you have to press to proceed to the "protected" content. Then make sure not to link to that content any other way, or include it in your sitemap. Spiders won't see the protected stuff, humans won't have any trouble getting to it.
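To see why that works against a simple spider, here is a sketch of a naive link collector that only follows `<a href>` links, run against a hypothetical gate page. The page markup, the `/members.html` path, and the names `LinkCollector` and `discoverable_links` are invented for the example. A crawler that reads form actions or executes scripts could still find the page, so this models only the simplest case.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Gather href values from anchor tags, as a naive spider might."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def discoverable_links(html_text):
    """Return the URLs an anchor-only spider would queue from this page."""
    collector = LinkCollector()
    collector.feed(html_text)
    return collector.links


# Hypothetical gate page: one ordinary link plus a button-driven form.
GATE_PAGE = """
<a href="/about.html">About</a>
<form action="/members.html" method="post">
  <input type="button" value="Enter" onclick="this.form.submit()">
</form>
"""

print(discoverable_links(GATE_PAGE))
```

The form target never shows up in the list, which is the whole trick: humans click the button, the anchor-only spider never learns the URL exists.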
...but the whole idea of the web is to share information, so this seems like a bass-ackwards way to use it right from the start.
[ link to this | view in chronology ]
Point of the Web
The person who is suing is an idiot. If they put info out there for me to read, I can easily print it, copy it, store it. I don't need permission to use the technology that the info was put on.
[ link to this | view in chronology ]
Robber analogy
This is exactly like leaving your doors unlocked. Anyone can come in unless you:
1. Tell them that they can no longer come in
2. Put up a no trespassing sign
3. Lock your doors
At any of those points, if someone then goes in anyway THEN they can say the guy entered illegally. ROBOTS.TXT is the web's way of posting a No Trespassing sign. You can also use a META tag, which is more like telling individuals that they cannot come in. Or, you can password-protect, which is like locking your doors.
If a guy goes into your house (with unlocked doors), takes a picture of the insides of your house and goes to the town square and shows everyone the pictures and you complained to the cops, the cops would laugh at you and tell you to install door locks or put up a No Trespassing sign.
This is EXACTLY the same, and the judge made a good ruling.
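For anyone who has not seen the "sign" itself, here is a small sketch using Python's standard `urllib.robotparser` to show how a polite crawler would read one. The robots.txt contents, the crawler names `SomeCrawler` and `ExampleBot`, the helper `build_rules`, and the example.com URLs are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: everyone stays out of /private/,
# and one named crawler is asked to stay out entirely.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: ExampleBot
Disallow: /
"""


def build_rules(text):
    """Parse robots.txt text into a rule set a crawler can query."""
    rules = RobotFileParser()
    rules.parse(text.splitlines())
    return rules


rules = build_rules(ROBOTS_TXT)

# A polite crawler asks before every fetch.
for agent, url in [
    ("SomeCrawler", "http://example.com/index.html"),
    ("SomeCrawler", "http://example.com/private/data.html"),
    ("ExampleBot", "http://example.com/index.html"),
]:
    print(agent, url, rules.can_fetch(agent, url))
```

Like the sign, the file only works on visitors who bother to read it; keeping out the ones who don't takes the password, i.e. the locked door.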
[ link to this | view in chronology ]
Re: Robber analogy
So, so close. It has to do with the expectation that a random person should be granted entry. Even if the doors on your house are unlocked, there's a certain reasonable expectation that not just any random schmoe should be wandering through your living room.
Replace "house" with something of a more public venue, like a convenience store. You expect public traffic. It's pretty much kinda the entire point... like a website. And then, say, there's this weird kid in town who, instead of cutting grass, has figured out how to make a few bucks telling other people what you have in your store. Not selling it, not taking away from your profits, more likely increasing your profits by sending extra people to your wares.
He doesn't charge you a cent. He doesn't cost you a cent. He increases your traffic and profits. He saves you a certain amount of contracting out advertising on your own.
Boy howdy, it's the wonder kid. One would think a "thank you" might be appropriate.
One would think.
[ link to this | view in chronology ]
Kind of like walkin outside naked
[ link to this | view in chronology ]
This would solve the complaint, no ?
[ link to this | view in chronology ]
How is "search engine" defined?
I wonder what the borders look like that separate a (big and very general) search engine like Google from specialized ones like auto-generated link lists...
Thanks, A.
[ link to this | view in chronology ]
Mike: Don't ever pick horses
Clearly, without question, your legal instincts are bad. See today's Google story for proof.
My guess is you're still hoping for a Ron Paul upset in the election.
Time to get out of the basement and see the world for what it REALLY is.
[ link to this | view in chronology ]
Copyright is a guaranteed right, not a voluntary right
All of the analogies here are completely ridiculous. Breaking into someone's house? Photographing people's wares? Looking at people naked?!
A far better analogy is that somebody creates a lounge for people to come listen to music for free. Then the Yellow Pages stops in and, besides merely adding the venue information to its phone book, encloses a free DVD with a reproduction of the recordings being played at the lounge in their entirety -- unless the venue owners explicitly post on their front door "unauthorized recording is prohibited."
Well last I checked, copyright law is not a voluntary right. It is a guaranteed right with the only conditional exemption being fair use -- which primarily applies to personal, non-commercial uses. So the venue owners shouldn't have to post anything to secure the protections of copyright against willful infringement.
To say that Google is NOT violating copyright law, is setting the legal precedent that "it is okay to make reproductions of publicly performed copyrighted works available in their entirety so long as you claim they are a cached copy from a search engine".
Thus pirates can now freely exchange music and movies by simply creating a "search engine" that records radio shows and TV programs (in which the public performance was provided free of charge) and makes those "cached copies" available without charge to the public while generating revenue from ancillary advertising on the search results pages.
Heck, I see some interesting business prospects here.
--Randall
[ link to this | view in chronology ]