Search Engines Should Ignore Bossy Publishers
from the disallow dept
James Grimmelmann has an in-depth look at ACAP (the Automated Content Access Protocol), the new "standard" for website access control that we discussed last Friday. I put "standard" in scare quotes because, as Grimmelmann points out, the specs clearly weren't written by people with any experience writing technical standards. A well-written standard specifies precisely which behaviors are required, which are prohibited, and under what circumstances; the ACAP spec, by contrast, is full of vague directives and confusing terminology. Some parts of the standard are apparently designed to "only be interpreted by prior arrangement." And despite the "1.0" branding, the latest version of the specification has several sections labeled "not yet fully ready for implementation." It is, in short, a big mess.
Of course, this shouldn't surprise us, because it's not really a technical standard at all. Robots.txt works just fine for almost everyone, and search engines aren't clamoring to replace it. Rather, some publishers are using the trappings of a technical standard to try to micromanage the uses to which search engines put their content, and they're laying the groundwork for lawsuits if search engines fail to heed the demands embedded in ACAP files. Not only are the rules vague and confused, but the "standard" also helpfully notes that the rules "may change or be withdrawn without notice." In other words, a search engine that committed to complying with ACAP directives would be setting itself up to have its functionality micromanaged by the publishers who control the ACAP specification.
Luckily, as Mike pointed out on Friday, search engines have the upper hand here. So here's my suggestion for search engines: instead of trying to comply with every nitpicky detail of the ACAP standard, just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file. Websites would discover pretty quickly that posting ACAP directives on their sites just caused their content to disappear from search engines. As much as they might bluster about search engines "stealing" their content, the reality is that they can't afford to give up the traffic that search engines send their way. If search engines simply refused to include ACAP-restricted pages in their index, publishers would quickly realize that those old robots.txt files aren't so bad after all.
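To make the proposal concrete, here is a minimal Python sketch of how a crawler might apply it. It assumes that ACAP directives show up as robots.txt-style fields whose names start with `ACAP-` and whose value is a URL path; the field names in the sample (`ACAP-disallow-crawl`, `ACAP-allow-crawl`) and the helper `acap_as_disallow` are illustrative assumptions, not quotations from the ACAP specification. Every such line, whatever it actually asks for, is rewritten as a plain `Disallow` for the same path, and the result is handed to Python's standard `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt mixing a normal rule with ACAP-style fields.
# The ACAP field names here are illustrative assumptions.
SAMPLE = """\
User-agent: *
Disallow: /private/
ACAP-disallow-crawl: /archive/
ACAP-allow-crawl: /news/
"""


def acap_as_disallow(robots_txt: str) -> list[str]:
    """Return the robots.txt lines with every ACAP-prefixed line replaced
    by a plain 'Disallow' for the same path (the policy proposed above)."""
    out = []
    for raw in robots_txt.splitlines():
        line = raw.strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower().startswith("acap-"):
            path = value.strip()
            if path.startswith("/"):
                # Whatever the ACAP directive says, treat it as "keep out".
                out.append(f"Disallow: {path}")
            # ACAP lines without a path value (e.g. a crawler name) are dropped.
        else:
            # Ordinary robots.txt lines pass through unchanged.
            out.append(line)
    return out


if __name__ == "__main__":
    rp = RobotFileParser()
    rp.parse(acap_as_disallow(SAMPLE))
    print(rp.can_fetch("ExampleBot", "https://example.com/news/today.html"))  # False
    print(rp.can_fetch("ExampleBot", "https://example.com/about.html"))  # True
```

Note that even the `ACAP-allow-crawl` line ends up blocking `/news/`; that is the whole point of the proposal. Dropping ACAP lines that carry no path is just one reasonable choice for this sketch.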
Filed Under: publishers, robots.txt, search engines
Companies: associated press, google, microsoft, yahoo
Reader Comments
Publishers may not actually own the content.
I reiterate that these DRM schemes to control access to content fail to consider the fact that the content may not even be owned by the content distributor. Further, if the content is not owned by the distributor and this is discovered, there appears to be no mechanism for this DRM technology to be disabled.
Basically, we are devolving into an economic/legal system where a content distributor can assert ownership without proof and can take adverse action against a so-called "infringer" without due process.
Robots.txt
Kept searching and found that Google had indexed the NY Times robots.txt file.
"just announce that every line of an ACAP file will be interpreted as the equivalent of a "Disallow" line in a robots.txt file."
That's exactly what I would do.
Yawn...
Glad to see you've finally come around to suggesting that yourself.
ACAP
> ACAP-restricted pages in their index, publishers
> would quickly realize that those old robots.txt
> files aren't so bad after all.
No, they'd just go pay... err "convince" Congress to pass a law requiring search engines to use ACAP data and/or that treating it as a "disallow" is an unlawful restraint of trade or somesuch nonsense.