Google Report: 99.95 Percent Of DMCA Takedown Notices Are Bot-Generated Bullshit Buckshot

from the overplaying-their-hand dept

Google, being the search giant that it is, has been banging the drum for some time about the silly way the DMCA has been abused by those who wield it like a cudgel. Here at Techdirt, we too have described the many ways that the well-intentioned DMCA, and the way it's implemented by service providers, have deviated from the law's intended purpose. Still, the vast majority of our stories discuss deliberate attempts by human beings to silence critics and competitors using the takedown process. Google, on the other hand, has focused far more on statistics showing takedown notices sent in wanton disregard of what the process was supposed to be used for. That makes sense, of course, as abuse of the takedown process is a burden on the search company. In that first link, for instance, Google noted that more than half the takedown notices it was receiving in 2009 were mere attempts by one business to target a competitor, while over a third of the notices contained nothing in the way of a valid copyright dispute.

But if those numbers were striking in 2009, Google's latest comment to the Copyright Office (see our own comment here) on what's happening in the DMCA 512 notice-and-takedown world includes stats for takedown notices received through its Trusted Copyright Removal Program... and they make the whole ordeal look completely silly:

A significant portion of the recent increases in DMCA submission volumes for Google Search stem from notices that appear to be duplicative, unnecessary, or mistaken. As we explained at the San Francisco Roundtable, a substantial number of takedown requests submitted to Google are for URLs that have never been in our search index, and therefore could never have appeared in our search results. For example, in January 2017, the most prolific submitter submitted notices that Google honored for 16,457,433 URLs. But on further inspection, 16,450,129 (99.97%) of those URLs were not in our search index in the first place. Nor is this problem limited to one submitter: in total, 99.95% of all URLs processed from our Trusted Copyright Removal Program in January 2017 were not in our index.
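Google's check here amounts to a membership test: for each URL in a notice, was that URL in the index at all? As a minimal sketch, assuming the index can be modeled as an in-memory set of URL strings (a real search index is vastly more complex) and using made-up example.com URLs, the share of requested URLs that were never indexed can be computed like this:

```python
def share_not_indexed(requested_urls, index):
    """Return the fraction of requested URLs that are absent from the index.

    `index` is modeled here as a plain set of URL strings, a deliberate
    simplification for illustration only (hypothetical, not Google's system).
    """
    if not requested_urls:
        return 0.0
    missing = sum(1 for url in requested_urls if url not in index)
    return missing / len(requested_urls)


# Hypothetical data: one indexed page, three requested URLs.
index = {"https://example.com/search?q=some+movie"}
requested = [
    "https://example.com/search?q=some+movie",
    "https://example.com/search?q=movie+some",
    "https://example.com/search?q=some+movie+free",
]
print(f"{share_not_indexed(requested, index):.2%} not in index")
# prints: 66.67% not in index
```

Each entry in `requested_urls` is counted separately, so a URL submitted twice counts twice, which mirrors how raw notice volumes get tallied.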

Now, because Google is Google, the company doesn't generally get a great deal of sympathy from the public, never mind from copyright protectionists. But, come on, this is simply nuts. When the share of URLs in these notices that don't even correspond to results Google lists can reasonably be rounded up to 100%, that's putting a burden on a company for no valid reason whatsoever. Even if you hate Google, or distrust it, it should be plain as day that it's unfair for the company to have to wade through all this muck just to appease the entertainment industries.

And it's important to note that this isn't all of the notices received, just those coming through the Trusted Copyright Removal Program, meaning they come from organizations that are supposed to have at least enough credibility not to submit totally bogus notices. But, apparently, they don't actually give a damn.

The problem, as you may have already guessed, is that most of these claims are being generated through automated systems designed to shotgun-blast DMCA notices with reckless abandon.

These numbers are simply staggering, with only a tiny fraction of the millions of requested URLs reflecting actual pages in the search index. Rather, 99.95% of the processed URLs from Google’s trusted submitter program are machine-generated URLs that do not correspond to actual pages in the search index. Given that data, Google notes that the claim that the large number of requests correlates with the amount of infringing content on the Internet is incorrect:

Nor is the large number of takedown requests to Google a good proxy even for the volume of infringing material available on the Internet. Many of these submissions appear to be generated by merely scrambling the words in a search query and appending that to a URL, so that each query makes a different URL that nonetheless leads to the same page of results.
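To see why scrambled queries inflate the count without pointing at any new infringing material, here is a minimal sketch. It assumes a hypothetical search page at example.com that takes a `q` parameter and shows the same results regardless of word order; the site, the query, and the helper names `scrambled_urls` and `canonical_page` are all made up for illustration. Every ordering of a query's words yields a distinct URL, but normalizing the word order collapses them back to a single results page.

```python
from itertools import permutations
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "https://example.com/search"  # hypothetical site, for illustration only


def scrambled_urls(query):
    """Build one URL per ordering of the query's words."""
    words = query.split()
    return [BASE + "?" + urlencode({"q": " ".join(p)}) for p in permutations(words)]


def canonical_page(url):
    """Map a URL to the results page it would show, assuming word order is ignored."""
    q = parse_qs(urlparse(url).query)["q"][0]
    return " ".join(sorted(q.split()))


urls = scrambled_urls("watch movie title free online")
print(len(urls), "distinct URLs")  # prints: 120 distinct URLs
print(len({canonical_page(u) for u in urls}), "distinct results page(s)")  # prints: 1 distinct results page(s)
```

With n distinct words there are n! orderings, so a five-word query alone yields 120 URLs for what is, under this assumption, one page of results.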

The claim by the entertainment industry that one can see what a problem piracy is by looking at the sheer volume of DMCA notices sent to search engines shall hereby be declared dead, having been buried by the industry's fellow takedown-notice-filers. That claim never made much sense, but these stats sever any link between takedown notice numbers and actual piracy completely. And there needs to be a remedy for this, whether it's punishment for the abusers or rules for how notices can be filed. Because these numbers are ridiculous.

Filed Under: censorship, copyright, copyright office, dmca, dmca 512, false takedowns, free speech, notice and takedown, takedowns
Companies: google


Reader Comments

  • sorrykb, 23 Feb 2017 @ 10:52am

    Nice alliteration. :-)

    (I suppose Bot-Built Bullshit Buckshot was too much of a stretch.)

  • Anonymous Coward, 23 Feb 2017 @ 11:00am

    Because these numbers are ridiculous.

    Those numbers are not ridiculous; they are political propaganda, and the politicians will just look at the large numbers and not bother to verify that they are based on facts.

  • Anonymous Coward, 23 Feb 2017 @ 11:13am

    99.5 or 99.95?

    The headline does not match the content.

    • Mike Masnick, 23 Feb 2017 @ 11:28am

      Re: 99.5 or 99.95?

      The headline does not match the content.

      Tim is a writer, not a mathematician. I've updated the title and will send him to remedial math class.

      • Dark Helmet, 23 Feb 2017 @ 12:22pm

        Re: Re: 99.5 or 99.95?

        "Tim is a writer, not a mathematician. I've updated the title and will send him to remedial math class."

        Probably more effective to just fix my fingers to get them to press the keys I intended to, but it's certainly true that I don't math good...

  • Steve R., 23 Feb 2017 @ 11:15am

    Nothing Surprising

    These numbers are an expected and obvious result when companies can "bully" other companies and private parties without the fear of any retribution.

  • Anonymous Coward, 23 Feb 2017 @ 11:19am (flagged by the community)

    Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

    This is the STATISTICS in "lies, damn lies, and statistics", on a level with Tiger Repellent that's 100% effective because haven't seen any tigers since using it.

    Means ZERO if Google hasn't indexed it. If a pirate site puts "norobots.txt" up, then presumably Google doesn't index it! To claim that this proves anything is the "damn lies" part of the above phrase.

    Nonetheless, we all know that piracy is going on.

    • FesteringPussPocket, 23 Feb 2017 @ 11:28am

      Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

      Uh... hold up.

      There's absolutely *NO* pirating going on here.

      Pirating requires use of Ships at sea, helmed by bearded peg-legged Somalians with slingshots.

      File-sharing is *NOT* nor has it ever been, pirating.

      People who file-share are the same folks that recorded music off the radio, converted vinyl to 8-track or cassette.
      They recorded movies off of HBO and handed them out to friends and family that didn't have HBO.
      They bought DVDs or Blu-Rays and then ripped them to recordable media so that their kids wouldn't destroy the originals.
      They opted to watch a rip or cam-cord version of a movie to see if it met the hype before spending a 4th of a day's wages for a trip to the movie theater.

      In any case, file-sharing has caused exactly $0.00 loss for the entire Movie and Music industry globally.

      In most cases, those industries' profits would actually be smaller if it weren't for file-sharing exposing people to content they wouldn't otherwise see.

      Thanks for playing "What lies, damned lies and double-damned lies, will the Jurassic media content industry say next?"

    • Jason, 23 Feb 2017 @ 11:28am

      Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

      Are you sure you read the whole article? The entire point was that "99.95% of all URLs" that the "trusted" submitter sent in for removal from Google's search results were not in their index, and therefore "could never have appeared" in search results in the first place.

      Whether the site of interest manually excludes themselves from Google's indexing is irrelevant... the supposedly trustworthy requesting parties are overwhelmingly flooding Google with invalid takedown requests. Is Google supposed to de-list URLs that they never listed in the first place?

      • Anonymous Coward, 23 Feb 2017 @ 12:56pm

        Re: Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

        So Google complies with at least 99.95% of notices it receives? It doesn't list the sites it's told to delist, after all...

        • Anonymous Coward, 23 Feb 2017 @ 2:23pm

          Re: Re: Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

          Come on, now, that's not a very RIAA way of expressing it.

          Google fails to comply with at least 99.95% of takedown requests, because it doesn't act to delist them after receiving the notice.

    • Chris-Mouse, 23 Feb 2017 @ 11:37am

      Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

      If the site has a robots.txt file that keeps out Google, then the URL will never appear in Google's results. So why insist that Google has to take down something it does not, and never will, have?

    • Thad, 23 Feb 2017 @ 11:38am

      Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

      This is the STATISTICS in "lies, damn lies, and statistics", on a level with Tiger Repellent that's 100% effective because haven't seen any tigers since using it.

      I agree: DMCA takedown notices are as effective as tiger repellant.

    • orbitalinsertion, 23 Feb 2017 @ 12:54pm

      Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

      Means ZERO if Google hasn't indexed it.

      Hence, sending Google takedown notices...

      Also, you do realize these "people" send takedowns on URLs that have never actually existed, right?

    • Anonymous Coward, 23 Feb 2017 @ 6:09pm (flagged by the community)

      Re: Does Google even index Pirate Bay and other known pirate sites? Should be able to raise their stats!

      You forgot your "Football" pseudonym, jackass.

      Not that it matters. It's the same out_of_the_blue bullshit. Did you finally decide to take a whiff of oxygen after having your lips surgically attached to Cary Sherman's phallus?

  • DannyB, 23 Feb 2017 @ 11:25am

    Force the problem back on the DMCA filers

    If processing of DMCA gets slower and slower, how can they possibly complain?

    Google could undoubtedly produce millions of bogus requests that could fill hundreds of boxes on the docket of a court challenge. If the other side or the court objected to how burdensome this is, Google could ask them to consider that this is just a sampling, and to imagine how burdensome it is for Google. It is objectively unreasonable to expect Google to have infinite resources and infinite processing speed for ever-increasing numbers of bogus DMCA requests.

    The court needs to set a precedent. The legislators need to fix the broken DMCA to impose a statutory penalty for every bogus DMCA. And the "legitimate" DMCA filers, if there even is such a thing, need to get behind this, since it is in their interest for Google to be able to process these hypothetical "legitimate" DMCA takedowns.

    • FesteringPussPocket, 23 Feb 2017 @ 11:36am

      Re: Force the problem back on the DMCA filers

      Here's the solution to the bogus DMCA problem.
      File a bogus claim, the copyright for the "claimed" media is immediately placed into public domain, if the claimant actually owned the copyright in the first place. Once placed into public domain, nobody can ever claim copyright on the content ever again.

      If the claimant did not own the copyright, then the claimant owes 10 times what a copyright violation costs for each and every one of the invalid claims. A nice little 1.5 million per violation, that should dry up the "bogus" DMCA spammers.

      Oh, now there's a thought.
      Since the submissions are being placed over the internet, wouldn't that make it "wire-fraud"?
      And if the entire RIAA/MPAA groups are doing this, then they can *ALL* be charged for each and every bogus claim (wire-fraud), where each invalid claim is a *count* of the wire-fraud.

      Let's get us some MPAA/RIAA executives to sit in prison for hundreds if not thousands of lifetimes because of the egregiously large number of false claims.

      I think that would also suffice for malicious intent behind the false claims.

      • Steve R., 23 Feb 2017 @ 12:11pm

        Re: Re: Force the problem back on the DMCA filers

        A better approach, restore copyright to its original intent and eliminate the automatic unsupervised ability of a private company to have quasi-judicial powers to force the removal of content without any judicial review.

  • Jinxed, 23 Feb 2017 @ 11:28am

    I may be conflating two separate issues in my post, but this is troubling not just for Google but for those who use Google services and have been booted for "copyright infringement".

    What's missing from the Td article is how much of these bogus claims go against users, doing nothing more than providing videos (mostly under Fair Use).

    Even if this assessment conflates two separate issues, the reality is bogus DMCA takedowns affect everyone, at some point.

    In 1990, when I was first introduced to the ramifications of copyright and software, I tried my best to voice my opposition to the vague threats issued by the entertainment industry, but could do nothing but watch the knee-jerk protectionism of our government pass a bad bill into law.

    I now fully understand why it's called "Checks" and balances.

    • Not an Electronic Rodent, 24 Feb 2017 @ 2:34am

      Re:

      What's missing from the Td article is how much of these bogus claims go against users, doing nothing more than providing videos (mostly under Fair Use).

      This is a good point - it'd be an interesting ancillary statistic to see how much of the 0.05% of valid URLs are actually anywhere close to valid copyright claims. I'm guessing that, even if you left in anything that even might be valid if you tilt your head and squint really hard, you'd struggle to rise above 0.025% valid notices.

  • Anonymous Coward, 23 Feb 2017 @ 11:31am

    Obviously Sarcastic

    Wow. Just simply wow. Look at all those *anomalies*! Thank the AA's they definitely don't show a symptom of anything wrong!

  • Anonymous Coward, 23 Feb 2017 @ 11:57am

    I finally brought proof that Mike is a google shill. Unfortunately it was DMCA'd by the RIAA.

    • Anonymous Coward, 23 Feb 2017 @ 12:14pm

      Re:

      Well, if their DMCA claim was in the 0.05% of requests that weren't malformed, and in the small fraction of that fraction that are valid (the RIAA holds the copyright), then that would suggest the "proof" was authored by the RIAA, which negates any claimed validity.

  • Steve Carr, 23 Feb 2017 @ 12:26pm (flagged by the community)

    search engine

    Freedom of speech and freedom of the internet: net neutrality was a way for the government to get their greedy hands on the internet. Stop the government from spying on everybody. Use the search engine that does not change its results for political reasons and respects your privacy, just good old-fashioned results that are not tracked. Lookseek.com Have a great day

    • orbitalinsertion, 23 Feb 2017 @ 12:57pm

      Re: search engine

      double facepalm

    • Anonymous Coward, 23 Feb 2017 @ 12:57pm

      Re: search engine

      Thanks for the heads up, I just tried it! Searched for a business in the area that I needed info on. Lookseek had no results. Same search in Google pointed me to their manta, yellowpages, whitepages, yelp, buzzfile, a local business listing page, and enough other relevant pages to fill the first results screen.

      Unfortunately I don't have the luxury of using a search engine that doesn't work, so... back to Google.

  • Anonymous Coward, 23 Feb 2017 @ 12:52pm

    Why isn't Google requiring CAPTCHA on the notices? They ARE supposed to be filed by humans after all, no?

  • Anonymous Coward, 23 Feb 2017 @ 1:53pm

    I wonder if I can pirate one of the programs that sends these notices.

  • Peter, 23 Feb 2017 @ 2:32pm

    How come Dotcom is charged under the RICO Act, and the networks of copyright holders bending the rules and abusing the system are not?

  • Advocate, 23 Feb 2017 @ 3:27pm

    We do not need copyright any longer. Abolish copyright.

  • Anonymous Coward, 23 Feb 2017 @ 3:41pm

    And 99.95% of what the MAFIAA says is bullshit.

  • Anonymous Coward, 23 Feb 2017 @ 4:27pm

    In other words

    The claimants lied. Stating on a legally binding document that Google is listing a site it never listed, and that it now has to take it down, is perjury. Google should automate its legal processes and file a new case against the rights holder who committed perjury each time.

  • Anonymous Coward, 23 Feb 2017 @ 8:36pm

    How is this legal?

    I just don't understand how or why Google hasn't tried to go to court over this. Especially in the realm of something like YouTube, where the constant push for automated takedowns regularly removes legitimate parody and commentary.

    I mean seriously, the DMCA process is supposed to have protections to prevent fraudulent claims, right? And IIRC you do have to show it was willful, right? How is an almost 100% rate of inapplicable notices, across literally millions of notices, anything BUT willful?

    • That One Guy, 24 Feb 2017 @ 12:03am

      Re: How is this legal?

      Because those 'protections' have more holes than a target at a gun range hosting a 'Free bullets' day, to the point that it is effectively impossible for them to trigger, barring the accused literally admitting in court that they knew that they were filing a bogus DMCA claim and did it anyway, and even then I wouldn't put good odds on their being punished to any real extent.

      The fact that the law theoretically requires a statement made under penalty of perjury, yet bots, which cannot make one, are allowed to send DMCA claims, should be all the demonstration you need of how pathetic the 'protections to prevent fraudulent claims' are.

      The law was meant from the get-go to be entirely one-sided, it's 'legal' because it's working as intended.

  • oliver, 23 Feb 2017 @ 11:34pm

    The claim by the entertainment industry is...

    an ex-claim
    that ceased to be
    that kicked its bucket
    that met its maker

    :-) Cheers

  • Anonymous Coward, 24 Feb 2017 @ 9:50am

    Hey Russia, DDOS the source IP.

  • James Brandes - Digital Copyright Consultancy, 28 Feb 2017 @ 2:39am

    There is undoubtedly a major issue with DMCA abuse concerning Google.

    The major problem is that Google are deluged with DMCA Notices every single day and have to cut corners (usually via automation as well). This is then exacerbated by the fact that many anti-piracy agencies work for hundreds of clients (they wouldn't be profitable otherwise) and thus have to rely on automated bots that dynamically generate URLs.

    As long as the site is already known as a pirate site/on a blacklist, it's automatically approved for deletion from Google's search index. Ultimately, this means that a DMCA Notice could be erroneous and yet the content is still removed. This happens very frequently.

    James Brandes - Digital Copyright Consultancy

  • Christopher Burdick, 16 Jul 2018 @ 11:10am

    Yeah, that's pretty much all it is

    Over 90% of these DMCA Complaints, are indeed bot-generated bullshit.

  • ARIO, 27 Mar 2019 @ 8:13am

    just tell me what IP ranges I need to block

    hello
    I own one of those bad sites !!
    and DMCA bots are driving me crazy

    do you know what the DMCA bot is?
    amazon/google/... bot?

    what do I need to limit to get fewer DMCA complaints?

    of course I will post this as a freelancer project, but maybe you guys have an idea

    I have a site which posts every single scene release plus all the important encodes from ipt bots to rm bots

    so
    I post 2 to 2.5 K different files per day
    to 5 different hosts

    Can you guess the number of DMCA reports I get per day?!
