A Numerical Exploration Of How The EU's Article 13 Will Lead To Massive Censorship

from the it's-not-good-folks dept

One of the key talking points from those in favor of Article 13 in the EU Copyright Directive is that people who claim it will lead to widespread censorship are simply making it up. We've explained many times why this is untrue, and how any time you put in place a system for taking down content, tons of perfectly legitimate content gets caught up in it. Some of this is from malicious takedowns, but much of it is just because algorithms make mistakes. And when you make mistakes at scale, bad things happen. Most of you are familiar with the concept of "Type 1" and "Type 2" errors in statistics. These can be more simply described as false positives and false negatives, respectively. Over the weekend, Alec Muffett decided to put together a quick "false positive" emulator to show how much of an impact this would have at scale and tweeted out quite a thread, which has since been un-threaded into a webpage for easier reading. In short, at scale, the "false positive" problem is pretty intense. A ton of non-infringing content is likely to get swept up in the mess.

Using a baseline of 10 million pieces of content, a much-higher-than-reality level of accuracy (99.5%), and an assumption that 1 in 10,000 items is "bad" (i.e., "infringing"), you end up with a ton of legitimate content taken down to stop just a bit of infringement.

So basically, in an effort to stop 1,000 pieces of infringing content, you'd end up pulling down 50,000 pieces of legitimate content. And that's with an incredible (and unbelievable) 99.5% accuracy rate. Drop the accuracy rate to a still optimistic 90%, and the results are even more stark.

Now we're talking about pulling down one million legitimate, non-infringing pieces of content in pursuit of just 1,000 infringing ones (many of which the system still misses).
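The arithmetic behind those two scenarios can be reproduced in a few lines. What follows is a minimal sketch, not Muffett's emulator: the function name `filter_outcomes` and its parameters are made up for illustration, and it assumes the stated accuracy applies equally to infringing and non-infringing items (so the false positive rate is simply one minus the accuracy).

```python
from fractions import Fraction


def filter_outcomes(total_items, one_in, accuracy):
    """Expected counts for a filter that is right `accuracy` of the time.

    Assumption (not from the article): the same accuracy applies to
    infringing and non-infringing items alike.
    """
    accuracy = Fraction(accuracy)        # "0.995" becomes exactly 199/200
    infringing = total_items // one_in   # 1 item in `one_in` is infringing
    legitimate = total_items - infringing
    return {
        "infringing_caught": round(infringing * accuracy),
        "infringing_missed": round(infringing * (1 - accuracy)),
        "legitimate_removed": round(legitimate * (1 - accuracy)),
    }


# The two scenarios above: 10 million items, 1 in 10,000 infringing.
for accuracy in ("0.995", "0.9"):
    print(accuracy, filter_outcomes(10_000_000, 10_000, accuracy))
```

Under that assumption, the 99.5% case removes 49,995 legitimate items while catching 995 infringing ones, and the 90% case removes 999,900 legitimate items while catching only 900, which is where the "50,000" and "one million" figures come from.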

Of course, I can hear the howls from the usual crew, complaining that the 1 in 10,000 number is unrealistic (it's not). Lots of folks in the legacy copyright industries want to pretend that the only reason people use big platforms like YouTube and Facebook is to upload infringing material, but that's laughably wrong. Infringing material is actually a very, very small percentage of the content on such platforms. And, remember, of course, Article 13 will apply to basically any platform that hosts content, even ones that are rarely used for infringement.

But, just to humor those who think infringement is a lot more widespread than it really is, Muffett also ran the emulator with a scenario in which 1 out of every 500 pieces of content is infringing and the filter has (a still impossible) 98.5% accuracy. It's still a disaster.

In that totally unrealistic scenario with a lot more infringement than is actually happening and with accuracy rates way above reality, you still end up pulling down 150,000 non-infringing items... just to stop fewer than 20,000 infringing pieces of content.

Indeed, Muffett then figures out that with a 98.5% accuracy rate, a platform where 1 in 67 items is infringing is the point at which you "break even": the filter catches about as many non-infringing items (147,000) as infringing ones. But that still means censoring nearly 150,000 pieces of non-infringing content.
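The break-even point falls out of a one-line bit of algebra. This is a rough sketch under the same assumption as before (one accuracy figure for both infringing and non-infringing items); the helper name `break_even_one_in` is hypothetical. If a fraction p of items is infringing and the accuracy is a, wrongly removed legitimate items equal correctly caught infringing items when (1 - a)(1 - p) = a p, which simplifies to p = 1 - a.

```python
from fractions import Fraction


def break_even_one_in(accuracy):
    """Return n such that, at "1 in n" items infringing, a filter removes as
    many legitimate items as infringing ones.

    Assumes one accuracy figure for both kinds of item (an assumption of
    this sketch, not something the article states).
    """
    a = Fraction(accuracy)
    # (1 - a) * (1 - p) == a * p  simplifies to  p == 1 - a
    return 1 / (1 - a)


n = break_even_one_in("0.985")
print(float(n))  # a little under 67, i.e. roughly "1 in 67"

# Sanity check against 10 million items with 1 in 67 infringing.
total = 10_000_000
infringing = total // 67
caught = round(infringing * Fraction("0.985"))
wrongly_removed = round((total - infringing) * Fraction("0.015"))
print(caught, wrongly_removed)
```

Both counts in the sanity check land between 147,000 and 148,000, in line with the figure quoted above.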

This is one of the major problems that people don't seem to comprehend when they talk about filtering (or even human moderating) content at scale. Even at impossibly high accuracy rates, a "small" percentage of false positives leads to a massive amount of non-infringing content being taken offline.
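Another way to look at the same numbers is to ask what share of everything the filter removes is actually legitimate. The sketch below is illustrative only: `legitimate_share_of_removals` is a hypothetical helper, and it again assumes that a single accuracy figure applies to both infringing and non-infringing items.

```python
from fractions import Fraction


def legitimate_share_of_removals(one_in, accuracy):
    """Fraction of all removed items that were not infringing."""
    a = Fraction(accuracy)
    p = Fraction(1, one_in)                # share of items that infringe
    wrongly_removed = (1 - p) * (1 - a)    # legitimate items flagged
    correctly_removed = p * a              # infringing items flagged
    return wrongly_removed / (wrongly_removed + correctly_removed)


# The three scenarios discussed in the post.
for one_in, accuracy in ((10_000, "0.995"), (10_000, "0.9"), (500, "0.985")):
    share = legitimate_share_of_removals(one_in, accuracy)
    print(f"1 in {one_in} infringing, accuracy {accuracy}: "
          f"{float(share):.1%} of removals are legitimate")
```

With these inputs, roughly 98%, 99.9% and 88% of removed items are non-infringing in the three scenarios, respectively. The total number of items cancels out, so the share depends only on how rare infringement is and how accurate the filter is.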

Perhaps some people feel that this is acceptable "collateral damage" to deal with the relatively small amount of infringement on various platforms, but to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality.

Filed Under: algorithms, censorship, censorship machines, copyright, copyright directive, eu, eu copyright directive, false positives, filters, type 1 errors, type 2 errors


Reader Comments

  1. Anonymous Coward, 10 Jul 2018 @ 10:23am

    yeah but that stuff being censored won't be our stuff cause we're gonna sell it on CD and DVD
    - the IAA's

  2. Anonymous Coward, 10 Jul 2018 @ 10:28am

    And in reality, the legacy industry will demand that the filters are changed to capture the few false negatives, even if it means ten times more false positives.

  3. Anonymous Coward, 10 Jul 2018 @ 10:35am

    the carpetbombing incentive

    Since many of the takedown artists reporting supposedly-infringing content are companies for hire, it's to their advantage to set up their key-word algorithms to cast a very wide net and cause as much "collateral damage" as possible, since there is essentially no penalty, while there are many benefits to reap by showing their clients an apparently huge work output of "infringing" takedowns that required very little time and effort to produce.

    A casual glance at the Chilling Effects/Lumen database will easily show that many of the named page links are sloppily concocted keyword searches that don't even link to the actual content they claim to, and in many cases use long lists of keyword searches that have no perceptible relationship to the protected content.

  4. Carlie Coats, 10 Jul 2018 @ 10:41am

    Just to be fair...

    False takedown notices should be subject to the same penalties as copyright infringement.

    IMNHO.

  5. Anonymous Coward, 10 Jul 2018 @ 10:51am

    Re: Just to be fair...

    Honestly, if they are not, I plan on obtaining rights to something obscure and filing takedown with literally everyone, on everything. Let them deal with having their accounts locked due to too many violations.

  6. Ninja, 10 Jul 2018 @ 11:02am

    Collateral damage? I can't see it under the pile of my own greed. - MAFIAA

  7. Anonymous Coward, 10 Jul 2018 @ 11:42am

    I don't know if this was posted before... Canadian CPA's commercial about *IAA taking advice.

    https://www.youtube.com/watch?v=xknM7g9a7-g

  8. Anonymous Coward, 10 Jul 2018 @ 11:55am

    Re:

    Sure, we might lose a million legitimate works of art, but it's not as if copyright exists to promote the creation of new works. How many of the affected individuals would even know they have a copyright? OTOH we've gotta protect the God-given right of corporations to profit.

  9. ECA, 10 Jul 2018 @ 12:16pm

    MORE WORK FOR THE WICKED?

    So now this is another way to get Server farms to watch over and EDIT THINGS??
    So what is going to happen?
    ASK YOUTUBE.. Go out and look at EVERY video? and see if it infringes? Or TAKE DOWN and dont care?? and deal with the SIMPLE CONSUMERS??
    Anyone got a phone number to youtube/google??
    (NOW you know why they dont have a direct phone number)
    (press one to talk to another computer)

    DO YOU REALLY want to create JOBS?? LET humans do the checking and verification of ALL DATA on the net.
    ENFORCE THAT and we will NEVER run out of jobs..

  10. Anonymous Coward, 10 Jul 2018 @ 12:49pm

    'to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality'

    but this is exactly what the entertainment and copyright industries want. remember, they thrive on make believe, on made up stuff, not on reality and expect their way of thinking to be the only way of thinking. they won't be happy until they have got complete control of the best media distribution platform on the planet at the moment. everything they have condemned to date will magically become the best thing since sliced bread, simply because they will be able to use it themselves how they want, for what they want and CHARGE for that use! and be prepared to pay more than high street prices for media downloaded, even though you'll be using YOUR broadband connection, your device(s), your disks, your software, your burner and your printer. the cost to you will escalate considerably while their costs will diminish. and you will need permission and have to pay fees to get to the sites to download the stuff!!

  11. Anonymous Coward, 10 Jul 2018 @ 12:51pm

    Re: MORE WORK FOR THE WICKED?

    Your style of alternating quickly between talking and yelling comes across as severely bipolar. And paragraphs are a thing, have been for a very long time.

  12. Anonymous Coward, 10 Jul 2018 @ 2:38pm

    Re: Re: MORE WORK FOR THE WICKED?

    Yeah, ECA has a... unique style. And yet somehow manages to be more coherent on the whole than some other notable visitors. xD

  13. Anonymous Coward, 10 Jul 2018 @ 4:06pm

    The real numbers

    According to Big Content, the real number for infringing content in user generated platforms is 1 in 1. They then expect 100% accuracy, which becomes easy. The only remaining lawful systems are those with pre-licenced content only.

  14. Anonymous Coward, 10 Jul 2018 @ 4:36pm

    Re: The real numbers

    That's the thing though, they aren't actually too far off. Copyright laws are written such that nearly everything is automatically granted a copyright from the moment of its creation, which makes most user generated content (with the exception of short comments like these) be either 1) infringing on existing copyright or 2) newly copyrighted works. Everything really is copyrighted, even if that is both irrelevant and ignored for almost everything.

  15. Anonymous Coward, 10 Jul 2018 @ 5:53pm

    Re: Re: Re: MORE WORK FOR THE WICKED?

    Seconded. ECA is a bit off, but never gets my DMCA votes because they actually have something to say. Not just throwing their toys out of the pram like some blues I could mention.

  16. Anonymous Coward, 10 Jul 2018 @ 6:53pm

    Re: Re: The real numbers

    The original joke aside, there is a huge difference between "everything is infringing" and "everything is copyrighted". Neither of which are true.

    A factual statement isn't copyrightable (theoretically). An analysis of a copyrighted work may be fair use, and so not infringing. Neither of which are amenable to an automated filter.

  17. Anonymous Coward, 10 Jul 2018 @ 9:26pm

    This is copyright.

    Kill it now.

  18. Anonymous Coward, 10 Jul 2018 @ 9:30pm

    Re: Re: Re: The real numbers

    A factual statement isn't copyrightable (theoretically).

    Not true. The facts themselves may not be copyrightable, but the statement expressing them can be.

  19. That One Guy, 11 Jul 2018 @ 12:46am

    "How much of that is ours?" "...very little?" "Don't care then."

    While it's worthwhile to highlight the massive negative impact on speech and content in the flailing about to get those dastardly infringers, the problem is that the ones pushing such plans almost certainly do not care.

    They'll see a hundred and fifty thousand 'innocent' posts killed for every twenty thousand that are actually infringing and completely ignore the first number, caring only that twenty thousand infringing posts were removed.

    After all, if the content isn't theirs, and there is no penalty for false accusations or removal, then why would they care?

  20. That One Guy, 11 Jul 2018 @ 12:48am

    "We don't adapt to the market, the market adapts to US."

    "... how is that going to help us sell CD's?"

    Yeah, that is pretty much them and a few other industries in a nutshell.

  21. Anonymous Coward, 11 Jul 2018 @ 4:10am

    Re: Re: Re: Re: The real numbers

    Possibly. It depends how much of the statement is the pure expression of fact, and whether any of it has a creative element. "2+2=4" is not copyrightable

  22. Anonymous Coward, 11 Jul 2018 @ 4:11am

    Re: "How much of that is ours?" "...very little?" "Don't care then."

    So, how come their content rarely gets hit by incorrect takedowns? And is there a way to increase that ratio?

  23. That One Guy, 11 Jul 2018 @ 4:31am

    Re: Re: "How much of that is ours?" "...very little?" "Don't care then."

    'One rule for me, and another for thee' I suspect, where claims made against any large group (political, entertainment, what have you) are given the benefit of the doubt and treated differently than any flagged content that was put up by the rabble. Not to mention a large platform is much more likely to pay attention to a counter-notice made by one of them, such that any incorrect takedowns are likely to only be in effect for a short amount of time.

    It does happen occasionally though, and it tends to be downright hilarious when it does, as the ones pushing for filters that they insist are 'easy' suddenly find themselves on the receiving end of those filters.

  24. JoeCool, 11 Jul 2018 @ 5:27am

    Re: "How much of that is ours?" "...very little?" "Don't care then."

    Ah yes. They watched a bit too much Harry Filth as kids, I suspect. :)

    https://youtu.be/rjtBDO5FL2Y?t=93

  25. PaulT, 11 Jul 2018 @ 6:10am

    "But that still means censoring nearly 150,000 pieces of non-infringing content."

    ...which will, of course, mostly be the work of independent artists and not works owned by major corporations.

  26. PaulT, 11 Jul 2018 @ 6:11am

    Re: Re: MORE WORK FOR THE WICKED?

    It's his schtick for some reason. I have tried telling him that all it does is put people off trying to even start reading his posts, but it's never taken on board. A shame, as he does have some good points on the odd occasion I don't just scroll past.

  27. That One Guy, 11 Jul 2018 @ 6:19am

    Re:

    Purely coincidence of course, and really, if they wanted to avoid unfortunate occurrences like that then they should have signed up with a label to protect them from that sort of thing.

  28. Anonymous Coward, 11 Jul 2018 @ 6:25am

    Re: Re:

    That is if they can get the attention of a label in the first place. Gatekeeping access to limited facilities, like record and CD pressing plants, made sense, sort of; gatekeeping the Internet makes absolutely no sense.

  29. Anonymous Coward, 11 Jul 2018 @ 7:25am

    Re: Re: Re: Re: Re: The real numbers

    Possibly.

    Exactly. Which makes your previous statement untrue.

  30. Toom1275, 11 Jul 2018 @ 11:48am

    Re: Just to be fair...

    Platforms should use a three-strikes system.
    First strike: Written warning and a reminder on the basics of fair use.
    Second strike: Temporary ban on filing takedowns.
    Third strike: Permanently forbidden from filing takedowns.

  31. Anonymous Coward, 15 Jul 2018 @ 9:57pm

    Re: Re: Just to be fair...

    Personally I favor taking it a step further. Make them have to sign and prove they have the rights to what they claim to have first. If they get three strikes that means that the ownership of the copyright is given to public domain as punishment and there is a carte blanche on reproduction on all of their works. They abused their privilege of copyright now they have lost it completely.

    That would make them 'get a team of lawyers before every filing' cautious and that is very much a good thing.
