A Numerical Exploration Of How The EU's Article 13 Will Lead To Massive Censorship
from the it's-not-good-folks dept
One of the key talking points from those in favor of Article 13 in the EU Copyright Directive is that people who claim it will lead to widespread censorship are simply making it up. We've explained many times why this is untrue, and how any time you put in place a system for taking down content, tons of perfectly legitimate content gets caught up in it. Some of this is from malicious takedowns, but much of it is just because algorithms make mistakes. And when you make mistakes at scale, bad things happen. Most of you are familiar with the concepts of "Type 1" and "Type 2" errors in statistics. These can be more simply described as false positives (here, legitimate content wrongly flagged as infringing) and false negatives (infringing content the filter misses). Over the weekend, Alec Muffett decided to put together a quick "false positive" emulator to show how much of an impact this would have at scale and tweeted out quite a thread that has since been un-threaded into a webpage for easier reading. In short, at scale, the "false positive" problem is pretty intense. A ton of non-infringing content is likely to get swept up in the mess.
Using a baseline of 10 million pieces of content, a much-higher-than-reality level of accuracy (99.5%), and an assumption that 1 in 10,000 items are "bad" (i.e., "infringing"), you end up with a ton of legitimate content taken down to stop just a bit of infringement:
So basically in an effort to stop 1,000 pieces of infringing content, you'd end up pulling down 50,000 pieces of legitimate content. And that's with an incredible (and unbelievable) 99.5% accuracy rate. Drop the accuracy rate to a still optimistic 90%, and the results are even more stark:
Now we're talking about pulling down one million legitimate, non-infringing pieces of content in pursuit of just 1,000 infringing ones (many of which the system still misses).
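For anyone who wants to check the arithmetic in these two scenarios, here is a minimal sketch. It is not Muffett's emulator: the function name `filter_outcomes` is made up for illustration, and it assumes "accuracy" means a single rate that applies equally to infringing and non-infringing items.

```python
def filter_outcomes(total_items, infringing_one_in, accuracy):
    """Expected counts for a filter, assuming (hypothetically) that one
    symmetric accuracy rate applies to both infringing and legitimate items."""
    infringing = total_items / infringing_one_in
    legitimate = total_items - infringing
    return {
        "infringing_caught": infringing * accuracy,
        "infringing_missed": infringing * (1 - accuracy),
        "legitimate_removed": legitimate * (1 - accuracy),
    }


# The two scenarios above: 10 million items, 1 in 10,000 infringing.
for accuracy in (0.995, 0.90):
    counts = filter_outcomes(10_000_000, 10_000, accuracy)
    print(accuracy, {name: round(value) for name, value in counts.items()})
```

Under those assumptions the 99.5% case removes just under 50,000 legitimate items, and the 90% case removes just under one million while missing about 100 of the 1,000 infringing ones.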
Of course, I can hear the howls from the usual crew, complaining that the 1 in 10,000 number is unrealistic (it's not). Lots of folks in the legacy copyright industries want to pretend that the only reason people use big platforms like YouTube and Facebook is to upload infringing material, but that's laughably wrong. It's actually a very, very small percentage of such content. And, remember, of course, Article 13 will apply to basically any platform that hosts content, even ones that are rarely used for infringement.
But, just to humor those who think infringement is a lot more widespread than it really is, Muffett also ran the emulator with a scenario in which 1 out of every 500 pieces of content is infringing and the filter has (a still impossible) 98.5% accuracy. It's still a disaster:
In that totally unrealistic scenario, with a lot more infringement than is actually happening and with accuracy rates way above reality, you still end up pulling down about 150,000 non-infringing items... just to stop fewer than 20,000 infringing pieces of content.
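Another way to look at the same numbers is to ask what share of everything the filter takes down was actually legitimate. The sketch below is illustrative only: the function name is invented here, and it again assumes one symmetric accuracy rate for both kinds of content.

```python
def legitimate_share_of_takedowns(infringing_one_in, accuracy):
    """Fraction of all flagged items that were not infringing, assuming
    (hypothetically) the same accuracy rate for both classes of content."""
    prevalence = 1 / infringing_one_in
    correctly_flagged = prevalence * accuracy
    wrongly_flagged = (1 - prevalence) * (1 - accuracy)
    return wrongly_flagged / (correctly_flagged + wrongly_flagged)


# Scenarios from the post: (1-in-N infringing, accuracy).
for one_in, accuracy in ((10_000, 0.995), (10_000, 0.90), (500, 0.985)):
    share = legitimate_share_of_takedowns(one_in, accuracy)
    print(f"1 in {one_in} at {accuracy:.1%}: {share:.1%} of takedowns hit legitimate content")
```

Even in the scenario most generous to the filter (1 in 500 infringing, 98.5% accuracy), roughly 88% of what gets removed is legitimate under these assumptions.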
Indeed, Muffett then works out that with a 98.5% accuracy rate, a platform would need 1 in 67 of its items to be infringing before it "breaks even," meaning the filter catches roughly as many infringing items as non-infringing ones (about 147,000 of each). But that still means censoring nearly 150,000 pieces of non-infringing content.
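The break-even point can be derived rather than found by trial and error. If p is the share of items that infringe and a is the accuracy, the filter catches a·p infringing items and wrongly removes (1 − a)·(1 − p) legitimate ones per item uploaded; setting those equal gives p = 1 − a. A quick check, assuming the same symmetric accuracy rate as before (the function name is illustrative, not from Muffett's tool):

```python
from fractions import Fraction


def break_even_prevalence(accuracy):
    """Prevalence p at which caught infringing items equal wrongly removed
    legitimate items. Solving a*p == (1 - a)*(1 - p) for p gives p = 1 - a."""
    return 1 - accuracy


accuracy = Fraction(985, 1000)  # 98.5%, kept exact to avoid rounding noise
p = break_even_prevalence(accuracy)
total_items = 10_000_000

caught = total_items * accuracy * p
wrongly_removed = total_items * (1 - accuracy) * (1 - p)
print(f"break-even at about 1 in {float(1 / p):.1f} items")
print(f"infringing caught: {float(caught):,.0f}; legitimate removed: {float(wrongly_removed):,.0f}")
```

At 98.5% accuracy that works out to 1.5% of content, or about 1 item in 67, which matches the figure above.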
This is one of the major problems that people don't seem to comprehend when they talk about filtering (or even human moderating) content at scale. Even at impossibly high accuracy rates, a "small" percentage of false positives leads to a massive amount of non-infringing content being taken offline.
Perhaps some people feel that this is acceptable "collateral damage" to deal with the relatively small amount of infringement on various platforms, but to deny that it will create widespread censorship of legitimate and non-infringing content is to deny reality.
Filed Under: algorithms, censorship, censorship machines, copyright, copyright directive, eu, eu copyright directive, false positives, filters, type 1 errors, type 2 errors
Reader Comments
- the IAA's
[ link to this | view in chronology ]
the carpetbombing incentive
A casual glance at the Chilling Effects/Lumen database will easily show that many of the named page links are sloppily concocted keyword searches that don't even link to the actual content they claim to, and in many cases use long lists of keyword searches that have no perceptible relationship to the protected content.
[ link to this | view in chronology ]
Just to be fair...
IMNHO.
[ link to this | view in chronology ]
Re: Just to be fair...
[ link to this | view in chronology ]
Re: Just to be fair...
First strike: Written warning and a reminder on the basics of fair use
Second strike: Temporary ban on filing takedowns.
Third strike: Permanently forbidden from filing takedowns.
[ link to this | view in chronology ]
Re: Re: Just to be fair...
That would make them 'get a team of lawyers before every filing' cautious and that is very much a good thing.
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
https://www.youtube.com/watch?v=xknM7g9a7-g
[ link to this | view in chronology ]
"We don't adapt to the market, the market adapts to US."
"... how is that going to help us sell CD's?"
Yeah, that is pretty much them and a few other industries in a nutshell.
[ link to this | view in chronology ]
MORE WORK FOR THE WICKED?
So what is going to happen?
ASK YOUTUBE.. Go out and look at EVERY video? and see if it infringes? Or TAKE DOWN and dont care?? and deal with the SIMPLE CONSUMERS??
Anyone got a phone number to youtube/google??
(NOW you know why they dont have a direct phone number)
(press one to talk to another computer)
DO YOU REALLY want to create JOBS?? LET humans do the checking and verification of ALL DATA on the net.
ENFORCE THAT and we will NEVER run out of jobs..
[ link to this | view in chronology ]
Re: MORE WORK FOR THE WICKED?
[ link to this | view in chronology ]
Re: Re: MORE WORK FOR THE WICKED?
[ link to this | view in chronology ]
Re: Re: Re: MORE WORK FOR THE WICKED?
[ link to this | view in chronology ]
Re: Re: MORE WORK FOR THE WICKED?
[ link to this | view in chronology ]
but this is exactly what the entertainment and copyright industries want. remember, they thrive on make believe, on made up stuff, not on reality and expect their way of thinking to be the only way of thinking. they won't be happy until they have got complete control of the best media distribution platform on the planet at the moment. everything they have condemned to date will magically become the best thing since sliced bread, simply because they will be able to use it themselves how they want, for what they want and CHARGE for that use! and be prepared to pay more than high street prices for media downloaded, even though you'll be using YOUR broadband connection, your device(s), your disks, your software, your burner and your printer. the cost to you will escalate considerably while their costs will diminish. and you will need permission and have to pay fees to get to the sites to download the stuff!!
[ link to this | view in chronology ]
The real numbers
[ link to this | view in chronology ]
Re: The real numbers
[ link to this | view in chronology ]
Re: Re: The real numbers
A factual statement isn't copyrightable (theoretically). An analysis of a copyrighted work may be fair use, and so not infringing. Neither of which are amenable to an automated filter.
[ link to this | view in chronology ]
Re: Re: Re: The real numbers
Not true. The facts themselves may not be copyrightable, but the statement expressing them can be.
[ link to this | view in chronology ]
Re: Re: Re: Re: The real numbers
[ link to this | view in chronology ]
Re: Re: Re: Re: Re: The real numbers
Exactly. Which makes your previous statement untrue.
[ link to this | view in chronology ]
This is copyright.
[ link to this | view in chronology ]
"How much of that is ours?" "...very little?" "Don't care then."
While it's worthwhile to highlight the massive negative impact on speech and content in the flailing about to get those dastardly infringers, the problem is that the ones pushing such plans almost certainly do not care.
They'll see a hundred and fifty thousand 'innocent' posts killed for every twenty thousand that are actually infringing and completely ignore the first number, caring only that twenty thousand infringing posts were removed.
After all, if the content isn't theirs, and there is no penalty for false accusations or removal, then why would they care?
[ link to this | view in chronology ]
Re: "How much of that is ours?" "...very little?" "Don't care then."
So, how come their content rarely gets hit by incorrect takedowns? And is there a way to increase that ratio?
[ link to this | view in chronology ]
Re: Re: "How much of that is ours?" "...very little?" "Don't care then."
'One rule for me, and another for thee' I suspect, where claims made against any large group (political, entertainment, what have you) are given the benefit of the doubt and treated differently than any content flagged that was put up by the rabble. Not to mention a large platform is much more likely to pay attention to a counter-notice made by one of them, such that any incorrect takedowns are likely to only be in effect for a short amount of time.
It does happen occasionally though, and it tends to be downright hilarious when it does as the ones pushing for filters that they insist are 'easy' suddenly find themselves on the receiving end of those filters.
[ link to this | view in chronology ]
Re: "How much of that is ours?" "...very little?" "Don't care then."
https://youtu.be/rjtBDO5FL2Y?t=93
[ link to this | view in chronology ]
...which will, of course, mostly be the work of independent artists and not works owned by major corporations.
[ link to this | view in chronology ]
Re:
[ link to this | view in chronology ]
Re: Re:
[ link to this | view in chronology ]