Search Engines Should Ignore Bossy Publishers

from the disallow dept

Thu, Dec 6th 2007 9:55am — Timothy Lee

James Grimmelmann has an in-depth look at ACAP, the new "standard" for website access control that we discussed last Friday. I put "standard" in scare quotes because, as Grimmelmann points out, the specs clearly weren't written by people with any experience in writing technical standards. While a well-written standard will very precisely specify which behaviors are required, which are prohibited, and under what circumstances, the ACAP spec is full of vague directives and confusing terminology. Some parts of the standard are apparently designed to "only be interpreted by prior arrangement." Also, despite the "1.0" branding, the latest version of the specification has several sections that are labeled "not yet fully ready for implementation." It is, in short, a big mess.

Of course, this shouldn't surprise us, because it's not really a technical standard at all. Robots.txt works just fine for almost everyone, and search engines aren't clamoring to replace it. Rather, some publishers are using the trappings of a technical standard to try to micromanage the uses to which search engines put their content, and they're laying the groundwork for lawsuits if search engines fail to heed the demands embedded in ACAP files. Not only are the rules vague and confused, but the "standard" also helpfully notes that the rules "may change or be withdrawn without notice." In other words, a search engine that committed to complying with ACAP directives would be setting itself up to have its functionality micromanaged by the publishers who control the ACAP specifications.

Luckily, as Mike pointed out on Friday, search engines have the upper hand here. So here's my suggestion for search engines: instead of trying to comply with every nitpicky detail of the ACAP standard, just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file. Websites would discover pretty quickly that posting ACAP directives on their sites just caused their content to disappear from search engines. As much as they might bluster about other search engines "stealing" their content, the reality is that they can't afford to give up the traffic that search engines send their way. If search engines simply refused to include ACAP-restricted pages in their index, publishers would quickly realize that those old robots.txt files aren't so bad after all.
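To make the suggestion concrete, here is a minimal sketch of what "treat every ACAP line as a Disallow" might look like inside a crawler. It is only an illustration under stated assumptions: the function name `acap_lines_as_disallow`, the sample field names (`ACAP-crawler`, `ACAP-allow-crawl`, `ACAP-disallow-crawl`), and the example.com URLs are made up for the example and are not taken from the ACAP spec or from any search engine's code. The resulting rules are checked with Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser


def acap_lines_as_disallow(lines):
    """Rewrite ACAP-style directives as plain robots.txt Disallow rules.

    Hypothetical policy sketch: any line whose field starts with "ACAP-"
    and whose value looks like a path is turned into "Disallow: <path>",
    no matter what the directive was meant to permit or restrict.
    """
    rules = ["User-agent: *"]
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        if not field.lower().startswith("acap-"):
            continue  # ordinary robots.txt lines are left out of this sketch
        if value.startswith("/"):  # only path-like values can become rules
            rules.append(f"Disallow: {value}")
    return rules


# Illustrative input; the field names are assumptions, not quoted from the spec.
sample = [
    "ACAP-crawler: *",
    "ACAP-allow-crawl: /news/",
    "ACAP-disallow-crawl: /archive/",
]

rules = acap_lines_as_disallow(sample)

# Feed the rewritten rules to the standard-library robots.txt parser.
parser = RobotFileParser()
parser.parse(rules)

news_ok = parser.can_fetch("ExampleBot", "https://example.com/news/story.html")
about_ok = parser.can_fetch("ExampleBot", "https://example.com/about.html")
```

Under this policy even the "allow" directive ends up blocking `/news/`, while pages never mentioned in the ACAP lines stay crawlable. That is the point of the proposal: the more directives a publisher posts, the less of the site gets indexed.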

Filed Under: publishers, robots.txt, search engines
Companies: associated press, google, microsoft, yahoo

Reader Comments

Steve R. (profile), 6 Dec 2007 @ 10:26am

Publishers may not actually own the content.
James Grimmelmann wrote in his report: "All in all, it’s an interesting start. I’m concerned that the publishers will soon argue that failure to respect every last detail expressed in an ACAP file will constitute automatic copyright infringement, breach of contract, trespass to computer systems, a violation of the Computer Fraud and Abuse Act (and related state statutes), trespass vi et armis, highway robbery, land-piracy, misappropriation, alienation of affection, and/or manslaughter."

I reiterate that these DRM schemes to control access to content fail to consider the fact that the content may not even be owned by the content distributor. Further, if the content is not owned by the distributor and this is discovered, there appears to be no mechanism for this DRM technology to be disabled.

Basically, we are devolving into an economic/legal system where a content distributor can assert ownership without proof and can take adverse action against a so-called "infringer" without due process.
Chronno S. Trigger, 6 Dec 2007 @ 10:28am

Robot.txt
Just looked up how a robots.txt file worked (never had to use one before). It seems pretty adaptable already. You can tell specific search engines what they can't look at down to a specific file. Why do they need to create a new one? At most I'd say add an allow function to say that these search engines are allowed despite the disallow *.

Kept searching and found that Google had indexed the NY Times robots.txt file.

"just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file."

That's exactly what I would do.
Bob C, 6 Dec 2007 @ 11:10am

Yawn...
I've been saying the search engines should ignore bossy publishers since Techdirt started mentioning the issue. You'll find comments to that effect after many of your articles.

Glad to see you've finally come around to suggesting that yourself.
BTR1701, 6 Dec 2007 @ 1:05pm

ACAP
> If search engines simply refused to include
> ACAP-restricted pages in their index, publishers
> would quickly realize that those old robots.txt
> files aren't so bad after all.

No, they'd just go pay... err "convince" Congress to pass a law requiring search engines to use ACAP data and/or that treating it as a "disallow" is an unlawful restraint of trade or somesuch nonsense.