
Search Engine Cache Isn't Copyright Infringement

from the good-news-for-search-engines dept

Fri, Oct 24th 2008 5:37am — Mike Masnick

Some have suggested that search engines such as Google and Yahoo are basically just massive copyright violators, because they scan, index and keep an archive of websites. That copied archive (usually called a cache) is, according to these commenters, an unauthorized copy. Now a court has basically destroyed that argument, noting that putting content online grants search engines an implicit license to index and copy it. The lawsuit also claimed that individuals who visited the cached version were infringers, but the court rejected that argument too, holding that the implied license extends to those users. The only part of the case that seems to be moving forward is whether this implicit license was revoked once the lawsuit started and the search engines still didn't take down the content. The idea is that an explicit notification by the content holder might override the implicit license, so the search engines should have taken down the content as soon as the lawsuit was filed (the filing signaling an explicit revocation of the license). Of course, the whole thing seems pretty silly. If the guy didn't want his content indexed, he should learn what a robots.txt file is for.
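
As a rough illustration of what a robots.txt file is for, here is a minimal Python sketch of how a well-behaved crawler might consult it before fetching a page. It uses the standard library's urllib.robotparser; the rules, the bot name ExampleBot and the example.com URLs are made up for illustration, and nothing here reflects how any particular search engine is actually implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents: keep every crawler out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def may_fetch(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# ExampleBot and the URLs below are placeholders, not real crawlers or pages.
public_ok = may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/index.html")
private_ok = may_fetch(ROBOTS_TXT, "ExampleBot", "http://example.com/private/notes.html")
print(public_ok, private_ok)
```

Compliance is voluntary: the file only states the site owner's wishes, and it is up to the crawler to run a check like this and honor the answer.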

Filed Under: cache, copyright, search engine
Companies: google, microsoft, yahoo

39 Comments


Reader Comments



eleete, 24 Oct 2008 @ 5:48am

robots.txt
The robots file is good also, but why not just password 'protect' the content? That is available to them also. Fact is, there are many ways a content publisher can remove their content from a search engine. Many users trying to get content IN to the engines make one small mistake and find themselves out or on page 12 of the results. Sounds like they were enjoying the traffic until something they didn't approve of happened. Then the copyfight broke out. I agree with this ruling. If you don't want your content out there, protect it yourself. There are many things a content owner can do to avoid this. I think this particular case was about getting money out of the big boys. Greed.
Benjie, 24 Oct 2008 @ 6:18am

Public
If the data/information is publicly available, then it's the website's fault.

If the data is able to be indexed/scanned because proper security measures aren't taken, then it's the website's fault.

If someone released non-public data against the website's wishes, then the search engine should purge that data.
Dosquatch, 24 Oct 2008 @ 6:26am

Not the site
In spite of the cached copy, a search engine is NOT the cached site. A search engine is, roughly, a travel guide. A conveniently arranged source of information that tells you how to get where you want to go.

Or, in the words of a particular Eastern religion, the finger pointing at the moon is not the moon.
John Doe, 24 Oct 2008 @ 6:41am

Interesting problem here. Funny that you mention that it is up to the website operator to use the Robots file to stop search engine indexing. That is like saying a guy should lock his doors if he doesn't want robbed. Well yea, but that still doesn't excuse the robber. And just like a robber breaking a window, the search engine could ignore the robots file.

This whole problem though, does go right along with your ideas about reworking copyright laws. Allowing copyright laws to block search engines would severely hinder "progress". But then again, search engines profit from this data which technically violates copyright law.
Anonymous Coward, 24 Oct 2008 @ 6:46am

Cached sites
Correct me if I am wrong, but when you browse any site, your browser downloads a copy of the images and pages to your browser cache on your local computer. Does this mean that every browser out there was considered illegal by these commenters?
Ben Robinson, 24 Oct 2008 @ 6:52am

Robber analogy
I think that is a terrible analogy. More apt would be to say you have your doors wide open to the whole world but get pissed when one particular person comes in. Robots.txt is like a little sign out front that can say "No Google Allowed". Sure, they could ignore the sign, but 1. they won't ignore it and 2. if they did, at least then you would have a legitimate gripe.
Michial, 24 Oct 2008 @ 6:54am

robots.txt is useless in a lot of cases
The rules you set in robots.txt are only as good as the search spider's willingness to follow them. It seems only the major engines abide by your wishes.

In many cases the search engines are doing a serious injustice by caching the content of sites. Especially in the case of my site which is entirely data driven.

Depending on the entry page and the options chosen, the content will be different from person to person, and since there is a live system in use by end users, even the content of a given page is outdated in a matter of minutes in some cases...

All in all I have a robots.txt file, and then I also monitor my logs for spiders and just block their IP addresses.
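
The log-monitoring approach described above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: it assumes the server writes the common "combined" access log format (client IP first, user agent as the last quoted field), and the substrings in SPIDER_HINTS are hypothetical markers, not a reliable or complete way to identify crawlers. The sample log lines and addresses are invented.

```python
import re

# Assumed "combined" log layout: IP at the start, user agent as the final quoted field.
LOG_LINE = re.compile(r'^(?P<ip>\S+) .*"(?P<agent>[^"]*)"$')

# Hypothetical substrings that often show up in crawler user agents.
SPIDER_HINTS = ("bot", "spider", "crawl")

def spider_ips(log_lines):
    """Collect the client IPs of requests whose user agent looks like a spider."""
    ips = set()
    for line in log_lines:
        match = LOG_LINE.match(line.strip())
        if match and any(hint in match.group("agent").lower() for hint in SPIDER_HINTS):
            ips.add(match.group("ip"))
    return ips

# Invented sample entries using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [24/Oct/2008:06:54:00 +0000] "GET /page HTTP/1.1" 200 512 "-" "ExampleSpider/1.0"',
    '198.51.100.7 - - [24/Oct/2008:06:55:10 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (Windows; U)"',
]
blocked = spider_ips(SAMPLE_LOG)
print(sorted(blocked))
```

The resulting set would then be fed into whatever blocking mechanism the server offers; a spider that fakes a browser user agent would slip past a check like this.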
John Doe, 24 Oct 2008 @ 6:58am

Re: Robber analogy
Ah, but you left the doors wide open for them to "view" the goods not "take" them.

As for the browser caching a copy, yes they do and always have. So putting stuff on the internet means you accept that behavior.
Jim, 24 Oct 2008 @ 7:09am

Re: Re: Robber analogy
Not to be mean, but you have twice demonstrated the fact that you don't understand how search engines work.
John Doe, 24 Oct 2008 @ 7:14am

Re: Re: Re: Robber analogy
Would you care to elaborate? Making statements like this without backing it up is kind of pointless.
Dex, 24 Oct 2008 @ 7:22am

Re: Re: Robber analogy
That analogy makes no sense: no one is "taking" anything. It's more like they went in and took a picture of your stuff, and then let people look at the pictures, and told people how to go see your stuff for themselves if they so desire.

Why are people putting content on the web if they don't want people to find it? This business model makes no sense to me.
John Doe, 24 Oct 2008 @ 7:28am

Re: Re: Re: Robber analogy
You are taking a picture of "copyrighted" material. You can't take a photo of a painting and use it for profit.

You did see that this article is about caching the content and not just indexing it? Indexing wouldn't be a problem. You are just reporting you found these words on this site. Caching the actual site is making a copy of it and does run afoul of the copyright law. This kind of thing is exactly why Mike says copyright law needs to be re-thought out.
Dosquatch, 24 Oct 2008 @ 7:31am

Re: Re: Robber analogy
Your computer on your desk in your house, you expect to be private. Were "Teh Goog" caching that for public search, I can see you getting bunged.

A website is a billboard on a busy freeway. All Google does is say "There's a billboard here". One puts a billboard on a busy freeway, presumably, because one wants it to be seen. Having a 3rd party point at it and say, "look over there" hardly seems to be "taking" your content. That is, nobody goes to Google to see your billboard, they go to Google to see where to find your billboard.

As for the browser caching a copy, yes they do and always have. So putting stuff on the internet means you accept that behavior.

Indeed. So who are these webmasters who are unfamiliar with Google and its purpose, again?
John Doe, 24 Oct 2008 @ 7:34am

Re: Re: Re: Robber analogy
Obviously the ones suing are unfamiliar. To continue the devil's advocate role, who says search engines have to cache content? By indexing and returning searchers to the site, they see whatever is there at that moment. By caching the site, they have created a copy of "copyrighted" work for re-display. This is a very fine point here.
Anonymous Coward, 24 Oct 2008 @ 7:38am

Re: Robber analogy
The robber analogy is flawed, because the robber is taking something away from the house, while the cache is just making a (usually inferior) copy of it; a more appropriate analogy would be looking in an open window and taking a note of what is inside the house.

I would also like to note that:

a) cached pages are out of date;

b) most cached pages don't include much of the content of the original page, or it doesn't work properly (Javascript)
Dosquatch, 24 Oct 2008 @ 7:43am

Re: Re: Re: Re: Robber analogy
You are taking a picture of "copyrighted" material. You can't take a photo of a painting and use it for profit.

Not in and of itself, no, but as part of a larger creative work, like a collage, you most certainly can. If one wanted to argue that Google's indexing of the world is a creative endeavor.

Or as a catalog of factual data. If one wanted to argue that all the search engine is doing is publishing a factual list of things ("sites") and their locations.

Or a travel guide.

Or perhaps dozens of other examples.

And, neverminding that Google does not in fact profit from the cache views - they sell no advertising on those pages, and they PROMINENTLY announce that it is a cache view and NOT the actual site, so there is no argument of "confusion" here, either.
John Doe, 24 Oct 2008 @ 7:44am

Re: Re: Robber analogy
The "flaw" is the whole crux of Mike's argument about copyright law. This goes back to copying music, video, games or digital whatever.
InanimateOne, 24 Oct 2008 @ 7:54am

No Index Meta Tags
I knew prior to ever uploading my website that search engines kept a cached copy of pages, that I needed to specify what pages I did not want indexed, and that I needed a robots.txt file. The fact this guy didn't know that is his fault, not the search engines'. The "no follow, no index" meta tag works great. Kind of ironic that a simple Google search for "do not index" could have saved this dude a lot of trouble.
Dosquatch, 24 Oct 2008 @ 7:56am

Re: Re: Re: Re: Robber analogy
Obviously the ones suing are unfamiliar.

Ah. Fine. Perhaps they've been under a rock. It could happen.

Say you write a book. It's a great book. You turn it over to $BigPublishingHouse... but then you notice that it's in bookstores! Those BASTARDS! Just what do they think they're up to??!? You had no idea they were going to sell it, fer Christ sake!

You had the reasonable expectation that nobody would ever know about your book unless you told them of it personally, or ... I don't know ... maybe magic fairies clubbed them with the Stick of Obscure Realization or something.

But to think $BigPublishingHouse would take your manuscript and make it public?? Unthinkable.
DitchDigger, 24 Oct 2008 @ 7:58am

Not quite black & white - (new robbery analogy)
The robbery analogy somewhere up in this thread oversimplifies (as is pretty common to 'puter folk) the problem. Even in the brick & mortar world, walking into a house through an unlocked door does not constitute a robbery. At worst, the act's considered trespassing. That analogy would only apply if the header had a Nofollow/Noindex/nocache or some such - which may be seen as a "lock", but I'd say more of a "No Solicitors" sign on the door. And even an .htaccess file is not an *explicit* lock - it only brings on a more interesting question - is the expectation of privacy on the net justified, and, if so, why?
jonnyq, 24 Oct 2008 @ 8:05am

Re: Not the site
Do a google search and click "cached version" next to a result. That's what we're talking about.

I'm kinda on the fence here. It seems like we're carving out a copyright exception to deal with the way the digital world works instead of just fixing the law in the first place.

I'd rather see a few less exceptions carved out long enough to force the law to be fixed.

But, I could be wrong, and the law could be correct in the first place.
nasch, 24 Oct 2008 @ 8:37am

Re:
It's more like, if you're going to put your stuff out on the sidewalk, if you don't want people to take it you should put up a sign that says please don't take this stuff.
Ed, 24 Oct 2008 @ 8:40am

The problem seems twofold.

1) Google profiting from caching: Since Google does not "seem" to put their own ads on cached pages, I do not see that they are directly profiting from the caching. They are making money on the ads on their indexing pages, not their caching.

2) Copyright truly needs to have limits attached. Not only of duration, but also of the owner's absolute control. As far as I know, a library does not have to pay special fees to purchase a book in order to lend it to the public. It is understood that this is in the public interest. Yet I can easily imagine some authors screaming "They are accessing my work for free! Each reader owes me X dollars!". This would be ridiculous, but I bet some think it appropriate. The same could reasonably be applied to Google. Yes, I realize the library paid for the one copy of the book, and Google did not, but neither did it really cost the author/publisher anything for Google to possess a copy, while the library has paper/printing/binding to pay for.

And really, who goes to the cached page anyway, if the real page is up, and intact? I certainly only use them as backup access.
Dosquatch, 24 Oct 2008 @ 8:43am

Re: Re: Not the site
Like I said, in these comments even, they PROMINENTLY announce that it is a cache view and NOT the actual site, so there is no argument of "confusion" here.
Dosquatch, 24 Oct 2008 @ 8:47am

Re:
And really, who goes to the cached page anyway, if the real page is up, and intact? I certainly only use them as backup access.

They're also handy to peek at search results that might otherwise get swallowed by *ork's web filters.
Anonymous Coward, 24 Oct 2008 @ 8:55am

I'm not sure caching, in and of itself, is the issue. You have to download a copy to view in your browser - whether or not this is saved in RAM or on a hard drive, as well. I suppose you could try to re-implement browsers such that you receive a stream and they must parse that data into the appropriately tokenized tree.

What makes this special in terms of search engines is that the cached copy is available - it's being redistributed by the search engine.

Robots.txt does not cover caching. That's what the appropriate meta tags are for. You can be indexed but not cached with the correct setup, provided the search engines decide to obey what is basically a voluntary standard.

Remember, before the people who cloaked sites got wise, how you'd hit the cache of a paywalled site in Google? That's what this is about. And, of course, trying to suck blood from the lumbering behemoth which is Google.
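
To make the "indexed but not cached" setup concrete, here is a minimal Python sketch of how a crawler could read the robots meta tag from a page and decide whether to keep a cached copy. It uses only the standard library's html.parser. The class and function names are invented for this example, the sample page is made up, and "noarchive" is used as the directive asking engines not to offer a cached copy; whether a given engine honors it is, as noted above, voluntary.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collects the directives from any <meta name="robots" content="..."> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for token in attr.get("content", "").split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)

def robots_directives(html):
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives

# Invented sample page: allow indexing, but ask engines not to show a cached copy.
PAGE = '<html><head><meta name="robots" content="index, noarchive"></head><body>hi</body></html>'
directives = robots_directives(PAGE)
may_index = "noindex" not in directives
may_cache = "noarchive" not in directives
print(may_index, may_cache)
```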
Rodney Dunham, 24 Oct 2008 @ 9:21am

Publishing text/images/movies/audio/whatever to a website is just that, publishing. Making available to the great unwashed for free. Anyone who doesn't get that should really hire someone who does before they make a website.

If anyone wants information to be unsearchable, they need only make a page with a button (not set to be a "submit" type, but which does an onclick=form.submit) that you have to press to proceed to the "protected" content. Then make sure not to link to that content any other way, or include it in your sitemap. Spiders won't see the protected stuff, humans won't have any trouble getting to it.

...but the whole idea of the web is to share information, so this seems like a bass-ackwards way to use it right from the start.
Jane Somebody, 24 Oct 2008 @ 9:33am

Point of the Web
The web was created to enable the free and easy distribution of information (Scientific info at that time). If you choose to use something that was created for sharing information freely, then abide by that or create gates/locks to your information.

The person who is suing is an idiot. If they put info out there for me to read, I can easily print it, copy it, store it. I don't need permission to use the technology that the info was put on.
Anonymous Coward, 24 Oct 2008 @ 9:41am

Re:
I just disagree here. I think these claims about search engine caches or even thumbnail images being copyright infringement are ridiculous.
If you want to use a good analogy I would look to Hollywood. Many celebs have their name protected under copyright laws. Say a celebrity then decides to get a phone hooked up in their home and does not take the time to get their number unlisted. Now if the new phone book came out and included the celeb in the listing, would this fall under a copyright violation? These kinds of lawsuits are on par with the, “I spilled coffee on my lap now I will sue McDonalds for 3 million”. Just like a celeb would need to take steps to get their number unlisted, so would a web designer need to take steps to incorporate a robots.txt file. Most people want Google and Yahoo to scan and index their site.
Now if Sean Penn said, “What, my name's in the phone book!! That’s it, I am going to court!!”, a judge would throw the case out of court. Unfortunately when it comes to technology a majority of people are not familiar enough with the environment to really make a good decision on it.
DanC, 24 Oct 2008 @ 10:07am

Re:
But then again, search engines profit from this data which technically violates copyright law.

Incorrect. It is not against the law to profit from copyrighted works if it falls under fair use.
Mike (profile), 24 Oct 2008 @ 10:37am

Re:
That is like saying a guy should lock his doors if he doesn't want robbed. Well yea, but that still doesn't excuse the robber. And just like a robber breaking a window, the search engine could ignore the robots file.

No, not even remotely close. Putting a website online is specifically and proactively saying "here it is, we're open for business!" When one computer sends a request to visit the other, the other actively welcomes it and tells you to make a copy of it.

But then again, search engines profit from this data which technically violates copyright law.

Profiting does not technically violate copyright law. It's one factor of many, and there are plenty of cases where companies have been allowed to "profit" from others' copyrighted works. In fact, it's quite common.
David, 24 Oct 2008 @ 10:51am

Wow, really glad about this ruling. After the way the Perfect 10 v. Google lawsuits were going I was thinking that some copyright decisions were going to make it very hard for search engines to keep operating the way we expect them to.
PRMan, 24 Oct 2008 @ 11:49am

Robber analogy
Funny that you mention that it is up to the website operator to use the Robots file to stop search engine indexing. That is like saying a guy should lock his doors if he doesn't want robbed.

This is exactly like leaving your doors unlocked. Anyone can come in unless you:

1. Tell them that they can no longer come in
2. Put up a no trespassing sign
3. Lock your doors

At any of those points, if someone then goes in anyway THEN they can say the guy entered illegally. ROBOTS.TXT is the web's way of posting a No Trespassing sign. You can also use a META tag, which is more like telling individuals that they can not come in. Or, you can password-protect, which is like locking your doors.

If a guy goes into your house (with unlocked doors), takes a picture of the insides of your house and goes to the town square and shows everyone the pictures and you complained to the cops, the cops would laugh at you and tell you to install door locks or put up a No Trespassing sign.

This is EXACTLY the same, and the judge made a good ruling.
Dosquatch, 24 Oct 2008 @ 1:29pm

Re: Robber analogy
If a guy goes into your house

So, so close. It has to do with the expectation that a random person should be granted entry. Even if the doors on your house are unlocked, there's a certain reasonable expectation that not just any random schmoe should be wandering through your living room.

Replace "house" with something of a more public venue, like a convenience store. You expect public traffic. It's pretty much kinda the entire point... like a website. And then, say, there's this weird kid in town who, instead of cutting grass, has figured out how to make a few bucks telling other people what you have in your store. Not selling it, not taking away from your profits, more likely increasing your profits by sending extra people to your wares.

He doesn't charge you a cent. He doesn't cost you a cent. He increases your traffic and profits. He saves you a certain amount of contracting out advertising on your own.

Boy howdy, it's the wonder kid. One would think a "thank you" might be appropriate.

One would think.
satan, 24 Oct 2008 @ 2:23pm

Kind of like walkin outside naked
and suing people for looking at you.
No Six Pack, 24 Oct 2008 @ 5:01pm

Google should remove said sites from the search results.
This would solve the complaint, no?
Estigy, 27 Oct 2008 @ 7:09am

How is "search engine" defined?
Can somebody please tell me, where I can look up the (legal) definition of a search engine?

I wonder what the borders look like that separate a (big and very general) search engine like Google from specialized ones like auto-generated link lists...

Thanks, A.
MansickWRONGAgain, 28 Oct 2008 @ 8:37am

Mike: Don't ever pick horses
Mike:

Clearly, without question, your legal instincts are bad. See today's Google story for proof.

My guess is you're still hoping for a Ron Paul upset in the election.

Time to get out of the basement and see the world for what it REALLY is.
Randall Krause (profile), 15 Aug 2011 @ 10:18am

Copyright is a guaranteed right, not a voluntary right
It amazes me how many people do not even comprehend copyright law in the U.S. Copyright was established to strike a balance between the interests of the public to gain access to artistic works while encouraging creators to continue to create artistic works as commodities for public consumption.

All of the analogies here are completely ridiculous. Breaking into someone's house? Photographing people's wares? Looking at people naked?!

A far better analogy is that somebody creates a lounge for people to come listen to music for free. Then the Yellow Pages stops in and besides merely adding the venue information to its phone book, encloses a free DVD with a reproduction of the recordings being played at the lounge in their entirety -- unless the venue owners explicitly post on its front door "unauthorized recording is prohibited."

Well last I checked, copyright law is not a voluntary right. It is a guaranteed right with the only conditional exemption being fair use -- which primarily applies to personal, non-commercial uses. So the venue owners shouldn't have to post anything to secure the protections of copyright against willful infringement.

To say that Google is NOT violating copyright law, is setting the legal precedent that "it is okay to make reproductions of publicly performed copyrighted works available in their entirety so long as you claim they are a cached copy from a search engine".

Thus pirates can now freely exchange music and movies by simply creating a "search engine" that records radio shows and TV programs (in which the public performance was provided free of charge) and makes those "cached copies" available without charge to the public while generating revenue from ancillary advertising on the search results pages.

Heck, I see some interesting business prospects here.

--Randall