Google Report: 99.95 Percent Of DMCA Takedown Notices Are Bot-Generated Bullshit Buckshot
from the overplaying-their-hand dept
Google, being the search giant that it is, has been banging the drum for some time about the silly way the DMCA has been abused by those that wield it like a cudgel. Here at Techdirt, we too have described the many ways that the well-intentioned DMCA and the way it's implemented by service providers have deviated from its intended purpose. Still, the vast majority of our stories discuss deliberate attempts by human beings to silence critics and competition using the takedown process. Google, on the other hand, has been far more focused on statistics showing that DMCA takedown notices are sent with wanton disregard for what the process was supposed to be used for. That makes sense of course, as the abuse of the takedown process is a burden on the search company. In that first link, for instance, Google noted that more than half the takedown notices it was receiving in 2009 were mere attempts by one business to target a competitor, while over a third of the notices contained nothing in the way of a valid copyright dispute.
But if those numbers were striking in 2009, Google's latest comment to the Copyright Office (see our own comment here) on what's happening in the DMCA 512 notice-and-takedown world shows some stats for takedown notices received through its Trusted Copyright Removal Program... and makes the whole ordeal look completely silly.
A significant portion of the recent increases in DMCA submission volumes for Google Search stem from notices that appear to be duplicative, unnecessary, or mistaken. As we explained at the San Francisco Roundtable, a substantial number of takedown requests submitted to Google are for URLs that have never been in our search index, and therefore could never have appeared in our search results. For example, in January 2017, the most prolific submitter submitted notices that Google honored for 16,457,433 URLs. But on further inspection, 16,450,129 (99.97%) of those URLs were not in our search index in the first place. Nor is this problem limited to one submitter: in total, 99.95% of all URLs processed from our Trusted Copyright Removal Program in January 2017 were not in our index.
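To put the quoted counts for that single submitter in perspective, here is a quick back-of-the-envelope sketch. The two counts are taken from Google's comment above; the variable names are ours, and the percentage is just a straight division of those counts. That division comes out a hair under the 99.97% printed in the filing, which is left as quoted.

```python
# Counts quoted in Google's comment for the most prolific submitter, January 2017.
submitted_urls = 16_457_433   # URLs in the notices Google processed
not_in_index = 16_450_129     # of those, URLs that were never in the search index

# URLs that could actually have appeared in search results.
in_index = submitted_urls - not_in_index

# Share of submitted URLs that were never indexed, by straight division.
share_not_in_index = 100 * not_in_index / submitted_urls

print(f"URLs actually in the index: {in_index:,}")
print(f"Share not in the index: {share_not_in_index:.2f}%")
```

In other words, out of roughly 16.5 million URLs, only a few thousand pointed at anything Google had ever indexed.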
Now, because Google is Google, the company doesn't generally have a great deal of sympathy heaped upon it by the public, never mind by copyright protectionists. But, come on, this is simply nuts. When the share of claims coming through the system that don't even pertain to results listed by Google can reasonably be rounded up to 100%, that's putting a burden on a company for no valid reason whatsoever. Even if you hate Google, or distrust it, it should be plain as day that it's unfair for it to have to wade through all this muck just to appease the entertainment industries.
And it's important to note that this isn't all of the notices received, but just those coming through the Trusted Copyright Removal system -- meaning that these are organizations that are supposed to have at least enough credibility not to be submitting totally bogus notices. But, apparently, they don't actually give a damn.
The problem, as you may have already guessed, is that most of these claims are being generated through automated systems designed to shotgun-blast DMCA notices with reckless abandon.
These numbers are simply staggering, with only a tiny number of the millions of requests reflecting actual pages in the search index. Rather, 99.95% of the processed URLs from Google’s trusted submitter program are machine-generated URLs that do not involve actual pages in the search index. Given that data, Google notes that the claim that the large number of requests correlates to infringing content on the Internet is incorrect:
Nor is the large number of takedown requests to Google a good proxy even for the volume of infringing material available on the Internet. Many of these submissions appear to be generated by merely scrambling the words in a search query and appending that to a URL, so that each query makes a different URL that nonetheless leads to the same page of results.
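To see why scrambled queries inflate the count without pointing at anything new, here is a minimal sketch of the kind of generation Google describes. Everything in it is hypothetical: the `example.com` search URL, the sample query, and the helper names are ours, and it assumes (as the quote suggests) that the target site returns the same results regardless of word order. It is not a reconstruction of any actual notice-sending bot.

```python
from itertools import permutations
from urllib.parse import parse_qs, quote_plus, urlsplit

# Hypothetical search endpoint, used purely for illustration.
BASE = "https://example.com/search?q="

def scrambled_urls(query: str) -> list[str]:
    """Build one URL for every ordering of the words in the query."""
    words = query.split()
    return [BASE + quote_plus(" ".join(order)) for order in permutations(words)]

def canonical_key(url: str) -> tuple[str, ...]:
    """Ignore word order, so reordered queries collapse to one key."""
    q = parse_qs(urlsplit(url).query).get("q", [""])[0]
    return tuple(sorted(q.lower().split()))

urls = scrambled_urls("some movie title download")
distinct_urls = set(urls)
distinct_pages = {canonical_key(u) for u in urls}

print(len(distinct_urls), "distinct URLs,", len(distinct_pages), "underlying query")
```

A four-word query already yields 24 different URLs for a single page of results, so the raw URL count in a notice says very little about how much infringing material exists.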
The claim by the entertainment industry that one can see what a problem piracy is by looking at the sheer volume of DMCA notices sent to search engines shall hereby be declared dead, having been buried by the industry's fellow takedown-notice-filers. That claim never made much sense, but these stats completely sever any link between takedown notice numbers and actual piracy. And there needs to be a remedy for this, whether it's punishment for the abusers or rules for how notices can be filed. Because these numbers are ridiculous.
Filed Under: censorship, copyright, copyright office, dmca, dmca 512, false takedowns, free speech, notice and takedown, takedowns
Companies: google
Reader Comments
(I suppose Bot-Built Bullshit Buckshot was too much of a stretch.)
Those numbers are not ridiculous, they are political propaganda, and the politicians will just look at the large numbers, and not bother to verify that they are based on facts.
99.5 or 99.95?
Re: 99.5 or 99.95?
Tim is a writer, not a mathematician. I've updated the title and will send him to remedial math class.
Re: Re: 99.5 or 99.95?
Probably more effective to just fix my fingers to get them to press the keys I intended to, but it's certainly true that I don't math good...
Nothing Surprising
Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
Means ZERO if Google hasn't indexed it. If a pirate site puts "norobots.txt" up, then presumably Google doesn't index it! To claim that this proves anything is the "damn lies" part of the above phrase.
Nonetheless, we all know that piracy is going on.
Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
There's absolutely *NO* pirating going on here.
Pirating requires use of Ships at sea, helmed by bearded peg-legged Somalians with slingshots.
File-sharing is *NOT* nor has it ever been, pirating.
People who file-share are the same folks that recorded music off the radio, converted vinyl to 8-track or cassette.
They recorded movies off of HBO and handed them out to friends and family that didn't have HBO.
They bought DVDs or Blu-Rays and then ripped them to recordable media so that their kids wouldn't destroy the originals.
They opted to watch a rip or cam-cord version of a movie to see if it met the hype before spending a 4th of a day's wages for a trip to the movie theater.
In any case, file-sharing has caused exactly $0.00 loss for the entire Movie and Music industry globally.
In most cases, those industries' profits would actually be smaller if it weren't for the file-sharing exposing people to content they wouldn't otherwise see.
Thanks for playing "What lies, damned lies and double-damned lies, will the Jurassic media content industry say next?"
Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
Whether the site of interest manually excludes themselves from Google's indexing is irrelevant... the supposedly trustworthy requesting parties are overwhelmingly flooding Google with invalid takedown requests. Is Google supposed to de-list URLs that they never listed in the first place?
Re: Re: Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
Google fails to comply with at least 99.95% of takedown requests, because it doesn't act to delist them after receiving the notice.
Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
I agree: DMCA takedown notices are as effective as tiger repellant.
Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
Means ZERO if Google hasn't indexed it.
Hence, sending Google takedown notices...
Also, you do realize these "people" send takedowns on URLs that have never actually existed, right?
Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!
Not that it matters. It's the same out_of_the_blue bullshit. Did you finally decide to take a whiff of oxygen after having your lips surgically attached to Cary Sherman's phallus?
Force the problem back on the DMCA filers
Google could undoubtedly produce millions of bogus requests that could fill hundreds of boxes on the docket of a court challenge. If the other side or the court would object to how burdensome this is, then Google could ask one to consider that this is just a sampling, and imagine how burdensome it is for Google. It is objectively unreasonable that Google could have infinite resources and infinite processing speeds for increasing bogus DMCA requests.
The court needs to set a precedent. The legislators need to fix the broken DMCA to impose a statutory penalty for every bogus DMCA. And the "legitimate" DMCA filers, if there even is such a thing, need to get behind this, since it is in their interest for Google to be able to process these hypothetical "legitimate" DMCA takedowns.
Re: Force the problem back on the DMCA filers
Here's the solution to the bogus DMCA problem. File a bogus claim, the copyright for the "claimed" media is immediately placed into public domain, if the claimant actually owned the copyright in the first place. Once placed into public domain, nobody can ever claim copyright on the content ever again.
If the claimant did not own the copyright, then the claimant owes 10 times what a copyright violation costs for each and every one of the invalid claims. A nice little 1.5 million per violation, that should dry up the "bogus" DMCA spammers.
Oh, now there's a thought.
Since the submissions are being placed over the internet, wouldn't that make it "wire-fraud"?
And if the entire RIAA/MPAA groups are doing this, then they can *ALL* be charged for each and every bogus claim (wire-fraud), where each invalid claim is a *count* of the wire-fraud.
Let's get us some MPAA/RIAA executives to sit in prison for hundreds if not thousands of lifetimes because of the egregiously large number of false claims.
I think that would also suffice for malicious intent behind the false claims.
What's missing from the Td article is how many of these bogus claims go against users doing nothing more than providing videos (mostly under Fair Use).
Even if this assessment conflates two separate issues, the reality is bogus DMCA takedowns affect everyone, at some point.
In 1990, when I was first introduced to the ramifications of copyright and software, I tried my best to voice my opposition to the vague threats issued by the entertainment industry, but could do nothing but watch the knee-jerk protectionism of our government pass a bad bill into law.
I now fully understand why it's called "Checks" and balances.
Re:
This is a good point - it'd be an interesting ancillary statistic to see how much of the 0.05% of valid URLs are actually anywhere close to valid copyright claims. I'm guessing that, even if you left in anything that even might be valid if you tilt your head and squint really hard, you'd struggle to rise above 0.025% valid notices.
Obviously Sarcastic
search engine
Re: search engine
Unfortunately I don't have the luxury of using a search engine that doesn't work, so... back to Google.
Re:
https://www.popehat.com/2016/06/14/lawsplainer-its-not-rico-dammit/
In other words
How is this legal?
I mean seriously, the DMCA process is supposed to have protections to prevent fraudulent claims, right? And iirc you do have to show it was willful, right? How is an almost 100% rate of inapplicable notices, across literally millions of them, anything BUT willful?
Re: How is this legal?
Because those 'protections' have more holes than a target at a gun range hosting a 'Free bullets' day, to the point that it is effectively impossible for them to trigger, barring the accused literally admitting in court that they knew that they were filing a bogus DMCA claim and did it anyway, and even then I wouldn't put good odds on their being punished to any real extent.
The fact that the law theoretically requires a statement made under penalty of perjury, and that bots, which cannot make one, are allowed to send DMCA claims should be all the demonstration you need of how pathetic the 'protections to prevent fraudulent claims' are.
The law was meant from the get-go to be entirely one-sided, it's 'legal' because it's working as intended.
an ex-claim
that ceased to be
that kicked its bucket
that met its maker
:-) Cheers
The major problem is that Google are deluged with DMCA Notices every single day and have to cut corners (usually via automation as well). This is then exacerbated by the fact that many anti-piracy agencies work for hundreds of clients (they wouldn't be profitable otherwise) and thus have to rely on automated bots that dynamically generate URLs.
As long as the site is already known as a pirate site/on a blacklist, it's automatically approved for deletion from Google's search index. Ultimately, this means that a DMCA Notice could be erroneous and yet the content is still removed. This happens very frequently.
James Brandes - Digital Copyright Consultancy
Yeah, that's pretty much all it is
just tell me what ranges of ip i need to block
hello
I own one of those bad sites !!
and dmca bots are driving me crazy
do you know what is dmca bot?
amazon/google/... bot?
what do I need to limit to get fewer dmca complaints?
of course I will put this as freelancer project but maybe you guys have any idea
I have a site which posts every single scene release plus all the Important encodes from ipt bots to rm bots
so
I post 2 to 2.5 K/day different files
to 5 different hosts
You can guess amount of dmca reports I get/day?!