European Newspapers Look To Reinvent Robots.txt
from the let's-try-this-again dept
After losing its appeal on Friday, Google relented and has actually posted the entire text of the ruling against it on the Google.be site. It does look a bit odd to see so much text on a Google homepage, though; Google put it in a tiny font without any formatting at all, making it pretty difficult to read. It's still not at all clear what purpose this serves.
Speaking of serving pretty much no purpose at all, it turns out that, in the wake of all this, a bunch of European newspapers are trying to create a new system by which they can tell Google not to index them. If this sounds suspiciously like the already available robots.txt file, you'd be right. In fact, in explaining the reason behind this, the publishers working on it state: "Since search engine operators rely on robotic 'spiders' to manage their automated processes, publishers' Web sites need to start speaking a language which the operators can teach their robots to understand. What is required is a standardized way of describing the permissions which apply to a Web site or Web page so that it can be decoded by a dumb machine without the help of an expensive lawyer." You know... if they only had a search engine, where they might do some searches to see if something like that already existed, it might help. But I guess, according to many of those publishers, that would be copyright infringement.
To be fair, it does sound like these publishers are looking to put together something that goes a bit further than robots.txt, but it still looks like they're reinventing the wheel rather than building on what's already there. Once again, though, this all seems to go back to the jealousy issue. This isn't about protecting content. It's about being jealous that Google has built a successful business making their content more valuable -- and they feel that the increased traffic and increased ad revenue aren't enough of a payment.
It still takes quite a misunderstanding of the internet to complain that someone who sends you traffic isn't paying you enough for it. It wouldn't be unfair to then suggest that Google stop sending them traffic altogether. These publishers seem to assume they're in the power seat here, and that it's their content that makes Google valuable. That's not the case. The value in Google is its ability to make that content easier to search and find. If the publishers want to go back to the days when it was harder to find their content, that's their problem -- but it seems quite likely they'll regret it if that comes to pass.
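For readers who haven't looked at robots.txt, here is a minimal sketch of how a crawler already decodes site permissions "without the help of an expensive lawyer," using Python's standard-library `urllib.robotparser`. The site (example.com), the `/archive/` path, the second crawler name, and the rules themselves are hypothetical placeholders; only the `User-agent`/`Disallow` syntax and the `Googlebot` crawler name are real.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a newspaper site. example.com and the /archive/
# path are placeholders: Googlebot is kept out of the archive, while every
# other crawler may fetch everything ("Disallow:" with no path allows all).
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /archive/

User-agent: *
Disallow:
"""


def build_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved crawler would."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules


if __name__ == "__main__":
    rules = build_rules(ROBOTS_TXT)
    for agent in ("Googlebot", "SomeOtherBot"):
        for url in ("https://example.com/news/today.html",
                    "https://example.com/archive/2006/story.html"):
            verdict = "allowed" if rules.can_fetch(agent, url) else "blocked"
            print(f"{agent:<12} {url}: {verdict}")
```

A paper that wanted out of Google altogether would need only two lines: `User-agent: Googlebot` followed by `Disallow: /`.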
Reader Comments
Google not indexing the news sites.
Still you gotta laugh...
Possibly there are groups of users out there who just read the indexed copy and go no further, never actually visiting the papers' websites, and I would assume this is what the papers are most worried about.
However.... I would also assume that the kind of people who just read the first few lines of an indexed or cached page are probably the kind of people who wouldn't go to a newspaper site anyway.
Seems to me the main interest is in securing revenue for their particular papers by ensuring that if you want to read the news you have to go there, however....
1) The less you advertise the less traffic you are going to get (in effect Google is giving them free advertising)
2) Unless you get a complete monopoly and every paper and news source joins in wholeheartedly, people are just going to go elsewhere cos it’s easier (and the internet is not an environment which looks kindly on monopolies)
As regards the new 'improved' robots.txt they seem to be proposing, why not just use the one already there? If their websites are so insecure that a simple spider can get into members-only pages, which they seem to suggest, I would suggest in return that the issue is with their own crap web programmers, not Google. If you want a new system to work, it will most likely have to be done in cooperation with the major search engines; taking them to court on a first date probably isn't the way to go about this.
In a day and age when I can go to any bookstore and find at least two books devoted to getting the Googlebot to visit more often, I can't help but laugh at the ridiculousness of the order.
If you want more people to visit your news site and possibly pay for it make the copy more engaging, more informed, better than the competition. Don't ban people from advertising it!
I can't wait for the next instalment
They also obviously have very little clue about how websites work, or how search engines work.
And what about other search engines, such as Yahoo or MSN? Are they banned from doing the same thing Google apparently did that's so wrong? Or is it okay for them to do it?
GREAT Idea, HORRIBLE reason to implement it
We really do need an XML-based "site directory" that includes different "content handling permissions" for different content: permission levels, syndication rights, time limits, and all sorts of other wonderful things to allow sites to better control how their content gets used. Right, wrong, or indifferent, it should be up to the content owners how their content gets used.
HOWEVER, if a Belgian court (which CLEARLY doesn't understand what the hell it is doing) drives this technology improvement, it will ABSOLUTELY get fubar'd along the way. This is the type of thing that needs to be created as an industry standard by a working group, and done in such a way that everyone benefits from ease of management and enhanced search engine visibility. A court having any say in the matter will ensure it's nothing but an example of waste and abuse: some government WASTING its own money ABUSING a company providing a perfectly valid product.
(this was not intended as a troll btw, so if I look like an idiot saying this, then so be it, I'm probably an idiot.)
IEEE'd
Google takes a lot of crap from people (especially stupid ones) because of its spiders. However, if a site can't lock down its material to people who have a login or some sort of authentication, then it deserves to get linked. This is clearly a case of somebody who probably got too drunk one night before work, forgot because of a bad hangover to restrict members-only pages to members friggin' only, and, not wanting to lose his job, passed the buck on to Google.
Coward's right though - standardization is key here; an IEEE for Search Engine Optimization.
Google is stealing.....plain and simple
Re: Google is stealing.....plain and simple
You were joking, right?
I don't see that many sites going this way, and those that do, deserve what they (don't) get.
Re:
Woot! I love that idea. I'm going to buy some stock in NYT and then opt all other newspapers out of Google News.
But don't worry, I'll spoof your IP address when I do it.
Google Breaketh not the 8th Commandment
And if you're thinking of reaching beyond that into a more broad search engine blast, are you actually suggesting that search engines in their purest form are stealing? Do you realize how ridiculously impossible it would be to use something as vast as the Internet successfully without a search engine?
I'm not saying Google is perfect, but you have to respect the fact that they have made some amazing strides in SEO, advertising, and a myriad of other fields as well (go to Google and click on Other...).
Companies and money
Belgian court is so clueless...
I realize they're a democracy, but I'm pretty proud to be an American because at least our retarded officials aren't retarded to THIS extreme... at least I hope they're not.
What's next?
I think the court needs to forget about technology and focus on their best known export - WAFFLES. :-P
How many people want to bet the judge in charge of the case can't send an e-mail with an attachment, let alone properly format a search query?
SOLUTION: GIVE THE LUDDITES BACK THEIR TYPEWRITERS AND BANISH THEIR ASSES FROM THE NET!
Correction fluid, anyone?
User Review
That leads to another question: What happens to the members of those news feeds that use Google to preview the content? Do they no longer get to see the content they pay for without logging into the site? That also seems like a waste of time for someone just wanting the headlines.
IEEE
Sort of like saying "I am the god of counter-strike". I'm not actually God, just the god of counter-strike.
Re:
You know, you've identified (historically) the right body.
Unfortunately, the W3C doesn't have a clue about getting standards adopted. They need to be disposed of and replaced with something useful.
Who in their right fricking mind releases a standard KNOWING that not a single product currently conforms to it, and also fails to release ANYTHING that actually helps companies get their products to spec?
Also, the W3C has made all sorts of asinine requirements to be up to spec that have absolutely no benefit to the software makers. What good is a spec if you can't get anyone on board with implementing it?
At least the IEEE can actually convince companies to follow its specs. The W3C doesn't seem to want anyone to follow theirs.
/offtopic-ness
theyarejealousofgoogle.com
I couldn't have said it in a more accurate way.
Ditto!
Google is one big thief!
Belgians have Newspapers?
Maybe they view the internet as another Belgian Congo and are out to rape and pillage whatever they can in cyberspace too.
I don't know about waffles, but their chocolates and beer are great, they should stick to what they do well.
Re: Google is one big thief!
For the love of all that is logical, GET A CLUE! What google is doing is NOT STEALING.
If you are trying to spread the anti-google message, you should probably just silence yourself, as you are NOT doing your cause a favor by making all anti-google zealots look like driveling morons.
There I have stooped low enough to respond to a troll. Now I need to go wash myself. I feel so dirty...
who knows
Google is dead in the wrong
However, this is NOT what Google is doing. Google News visits the newspaper's website, pulls ALL of the copyrighted news headlines and story snippets, and sticks it all up on Google's own site. Google returns to the newspaper's website the next day and does it again. And the next day. And the next. Google repeatedly rips off the intellectual property of these other websites day in and day out, and puts that content on Google's own website. Google is NOT simply *providing a link* to those websites to send them traffic.
The fact of the matter is that *Google's value* is increased by all of the content that has been created by many other people. It creates value for Google because people are able to visit Google and see ALL of those headlines and news snippets and associated photographs (yes, Google even steals the photos that appear with the news headlines from those newspapers' sites) in one place - the Google News website. This in turn drives increased traffic to *Google's* news site - and thereby enables Google to generate millions of dollars selling advertising space.
The news headlines, story details, and so forth are the valuable commodity that brings Google its traffic. Without that content, Google's visitors would only see a list of links to online news sites, and the Google News site's value/usefulness would be greatly reduced. *THAT* is what's at the root of this issue - and Google is clearly in the wrong. Do you think those newspaper articles (headlines and all) just write themselves? Intellectual property is intellectual property, and the news sites have spent time and effort preparing and presenting that content ON THEIR WEBSITES. It *IS* copyrighted material, just as any published paperback sold in Barnes & Noble, and it *IS* protected by copyright law.
If Google wishes to publish this material to their website, they must seek permission and pay any royalties which the original authors/publishers may request, just as Google would have to do if it decided it was going to post a new chapter of the latest "Harry Potter" novel on its website every day.
What's the actual original complaint about?
I doubt that the issue is the 1.5 lines of context the search term appeared in, or that search itself was the problem. With a "portal" like Google's, I can have headlines all over the page, but I assume the complaint wasn't related to that either.
It might have been because of Google's cache, where the newspaper's content is now hosted on Google's servers. If so, the newspapers could have a case: Google is [mis]appropriating their content. If they put a copyright notice on the page, Google is violating their copyright.
I know my local newspaper keeps the news online for 2 weeks and then charges for viewing back issues. If Google is going to cache the pages and then allow people to view that copyrighted material for free, I can see where it might cost the local paper a revenue stream.
Re: Google is dead in the wrong
This is false. Google does not show the entire article. It shows the headline and (sometimes) a small snippet which is clearly fair use, along with a link. So, your premise is wrong.
The fact of the matter is that *Google's value* is increased by all of the content that has been created by many other people.
Whether or not it increases Google's value is beside the point. The issue is whether or not the newspaper sites are harmed by the practice. If they are not, they have no complaint. The fact that Google's value is increased shouldn't make a difference.
It creates value for Google because people are able to visit Google and see ALL of those headlines and news snippets and associated photographs (yes, Google even steals the photos that appear with the news headlines from those newspapers' sites) in one place - the Google News website.
Note that the purpose of Google News is to drive people who visit it to those individual sites that originally published the news. So, it benefits them by giving them more traffic they might not have received otherwise.
This in turn drives increased traffic to *Google's* news site - and thereby enables Google to generate millions of dollars selling advertising space.
Again, this is false. Google places no ads on their news site.
You seem to be under some false assumptions here. Google does not display full articles, but simply links and drives traffic. Also, Google does not put ads on its news site. If you don't know those two basic facts, it's hard to take the rest of your complaint seriously.
Do you think those newspaper articles (headlines and all) just write themselves?
Do you think that people magically find newspaper articles online by themselves? No, they need to generate traffic... which is exactly what Google News does.
It *IS* copyrighted material, just as any published paperback sold in Barnes & Noble, and it *IS* protected by copyright law.
And copyright law *DOES* have something called "fair use" which is what Google uses in showing snippets. Look it up.
Re: What's the actual original complaint about?
The text of the decision is on that Google site, and we wrote about the original case when it came out (that links to the actual court order as well). That has the details, which shows the court was quite confused.
It might have been because of Google's cache, where the newspaper's content is now hosted on Google's servers. If so, the newspapers could have a case: Google is [mis]appropriating their content. If they put a copyright notice on the page, Google is violating their copyright.
If you read the decision, you see that the judge (and possibly the newspapers) were very confused. They keep switching back and forth between Google's cache and Google News, treating them as interchangeable and using whichever suits them. That's just part of the problem with the case. It seems that the judge and the newspapers aren't even sure what the complaint is about.
I know my local newspaper keeps the news online for 2 weeks and then charges for viewing back issues. If Google is going to cache the pages and then allow people to view that copyrighted material for free, I can see where it might cost the local paper a revenue stream.
Again, there are very easy ways to opt out of the cache, so it's hardly a reasonable complaint to then ban all French and German papers from Google.be.
Re: Re: What's the actual original complaint about
Find that the activities of Google News and the use of the "Google cache" violate in particular the laws on copyright and ancillary rights (1994) and the law on data bases (1998);
Poking around Google (less than 5 minutes' worth; 3 or 4 clicks from the main page), you can find: "The 'Cached' link will be missing for sites that have not been indexed, as well as for sites whose owners have requested we not cache their content." This indicates that Google will not cache your site if you request ... but it wasn't worth more than 5 minutes trying to figure out how to avoid being cached.
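For reference, the per-page opt-out Google has long documented is a robots meta tag in the page's head, such as `<meta name="robots" content="noarchive">`. Below is a minimal sketch, assuming only Python's standard-library `html.parser`, of how a crawler might check a page for that directive before showing a cached copy. The class and function names are hypothetical, and real crawlers honor many more directives (and HTTP headers) than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()  # lowercased directives, e.g. {"noarchive"}

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() not in ("robots", "googlebot"):
            return
        # content="noindex, noarchive" -> {"noindex", "noarchive"}
        for directive in attributes.get("content", "").split(","):
            self.directives.add(directive.strip().lower())


def may_show_cached_copy(html):
    """Return False if the page asks search engines not to keep a cached copy."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noarchive" not in parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noarchive"></head></html>'
    print(may_show_cached_copy(page))  # prints False
```

Self-closing tags like `<meta ... />` are routed through `handle_starttag` by `HTMLParser`'s default `handle_startendtag`, so both spellings are caught.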
Re: Re: Re: What's the actual original complaint a
(NOTE: I don't agree with this statement; this is just what the order states.)
Like you say, the case jumps from Google News to Google search as it suits the lawyers, so even then it's not easy to follow.
The order doesn't actually say ALL German- and French-speaking papers, just those that are represented by the plaintiff.
(Unfortunately I couldn't find exactly which papers these are.)
Oddly enough, German isn't a national language in Belgium; French and Flemish are. Flemish is basically the Belgian dialect of Dutch and not at all the same as German.
Google does deprive newspapers of revenue