stories filed under: "error rates"

The Copyright Industry Wants Everything Filtered As It Is Uploaded; Here's Why That Will Be A Disaster

from the i'm-sorry,-we-can't-let-you-post-that,-dave dept

Wed, Dec 29th 2021 1:42pm — Glyn Moody

The history of copyright can be seen as one of increasing control by companies over what ordinary people can do with material created by others. For the online world, the endgame is where copyright holders get to check and approve every single file that is uploaded, with the power to block anything they regard as infringing. That digital dystopia moved much closer two years ago, with the passage of the EU Copyright Directive. At the heart of the Directive lie precisely these kinds of upload filters – even though the legislation's supporters insisted that they would not be needed. When the law was safely passed – despite voting issues – only then did they admit that upload filters would indeed be required.

The parts of the EU Copyright Directive dealing with upload filters are so badly crafted that most of the EU’s Member States are struggling to implement them in their national laws in any coherent way. This means the full impact of the legislation's upload filters won't be known for some time.

Until then, we can look at the real-life effects of a similar approach, as used by YouTube. Content ID is a digital fingerprinting system developed by Google at great cost – around $100 million by 2018 – which is designed to spot and block allegedly infringing material on YouTube. Content ID’s flaws are well known, particularly in terms of overblocking perfectly legal uploads. This is the fundamental problem with all upload filters: there is no way that an automated, algorithmic system can encompass the complexities of global copyright laws, which even trained lawyers struggle with. The problem of overblocking is widely known on an anecdotal basis, but we have not had reliable data about the scale of the problem. That has finally changed with the release of YouTube’s first Copyright Transparency Report. The Kluwer Copyright Blog has a good analysis and summary of the report by Paul Keller, Director of Policy at openfuture.eu:

The overall take-away is that automated content removal is a big numbers game. In total YouTube processed 729.3 million copyright actions in the first half of 2021 of which the vast majority (99%) were processed via Content ID (as opposed to other tools, such as Copyright Match Tool and the Webform). And while YouTube claims that ContentID is much more accurate and less prone to abuse than its other systems ContentID has still received 3.7 million disputes from uploaders claiming that the actions (these can be blocks/removals but also demonetisation actions) taken against them are unjustified. 60% of these disputes have ultimately been decided in favour of the uploaders, which means that in the first half of 2021 Content ID has generated at least a 2.2 million unjustified copyright actions against its users on behalf of rightholders. In other words, over-enforcement (both unjustified blocking and unjustified demonetisation) is a very real issue that affects the rights of a substantial number of uploaders on a regular basis.

As Keller rightly notes in his post, the real number of unjustified copyright actions is likely to be larger than 2.2 million. When blocked by the Content ID system, many people will just give up, rather than instituting a formal dispute of the block. Unlike copyright companies’ well-paid lawyers, ordinary people do not have the time, money or expertise to engage in this kind of legal battle.
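
The arithmetic behind those figures is easy to check. The sketch below uses only the numbers quoted from Keller's summary. The 99% share is a rounded figure, so the results are approximate, and the constant and function names are made up for illustration rather than taken from YouTube's report.

```python
# Figures quoted above from Keller's summary of YouTube's report (first half of 2021).
TOTAL_ACTIONS = 729_300_000    # all copyright actions processed
CONTENT_ID_SHARE = 0.99        # rounded share handled via Content ID
DISPUTES = 3_700_000           # Content ID actions disputed by uploaders
RESOLVED_FOR_UPLOADER = 0.60   # share of disputes decided for the uploader


def content_id_summary():
    """Return the rough counts and rates implied by the quoted figures."""
    content_id_actions = TOTAL_ACTIONS * CONTENT_ID_SHARE
    overturned = DISPUTES * RESOLVED_FOR_UPLOADER
    return {
        "content_id_actions": content_id_actions,
        "overturned": overturned,
        "dispute_rate": DISPUTES / content_id_actions,
        # A floor, not an estimate: unjustified actions that nobody
        # disputed never show up in these numbers.
        "min_unjustified_rate": overturned / content_id_actions,
    }


summary = content_id_summary()
print(f"Content ID actions:      {summary['content_id_actions']:,.0f}")
print(f"Overturned on dispute:   {summary['overturned']:,.0f}")
print(f"Share disputed:          {summary['dispute_rate']:.2%}")
print(f"Minimum unjustified:     {summary['min_unjustified_rate']:.2%}")
```

A floor of roughly 0.3% sounds small, but at this volume it still means more than two million unjustified actions in six months, and it only counts the cases where someone bothered to file a dispute.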

The figure for YouTube overblocking is bad enough. The situation once the EU Copyright Directive’s upload filters come into operation across the continent will be far worse, for a number of reasons that were widely explored by experts before the law was passed, but almost completely ignored by the EU politicians. Perhaps the most worrying aspect of the imminent upload filters is that they must apply to every kind of copyright material. YouTube only deals with music and video, and even then has enough problems with overblocking, as the new report indicates. The upload filters required by the new EU law will apply to text, images, photos, maps, music scores, ballet scores, software and 3D models amongst other things. There are currently no systems comparable to Content ID for these domains, nor are there likely to be for a long time, if ever, given the huge cost involved in developing them.

Despite this glaring omission, EU Member States are required to bring in new copyright laws, which will inevitably come with upload filter rules. This seems like a huge disaster waiting to happen – all thanks to the selfish desire of copyright companies to control down to the last byte what ordinary people do online.

Follow me @glynmoody on Twitter, Diaspora, or Mastodon.

Originally posted to the Walled Culture blog.

Filed Under: contentid, copyright, error rates, eu, filters

Militias Still Recruiting On Facebook Demonstrates The Impossibility Of Content Moderation At Scale

Content Moderation

from the people-will-always-find-a-way dept

Thu, Mar 25th 2021 10:53am — Mike Masnick

Yesterday, in a (deliberately, I assume) well-timed release, the Tech Transparency Project released a report entitled Facebook's Militia Mess, detailing how there are tons of "militia groups" organizing on the platform (first found via a report on Buzzfeed). You may recall that, just days after the insurrection at the Capitol, Facebook's COO Sheryl Sandberg made the extremely disingenuous claim that only Facebook had the smarts to stop these groups, and that most of the organizing of the Capitol insurrection must have happened elsewhere. Multiple reports debunked that claim, and this new one takes it even further, showing that (1) these groups are still organizing on Facebook, and (2) Facebook's recommendation algorithm is still pushing people to them:

TTP identified 201 Facebook militia pages and 13 groups that were active on the platform as of March 18. These included “DFW Beacon Unit” in Dallas-Fort Worth, Texas, which describes itself as a “legitimate militia” and posted March 21 about a training session; “Central Kentucky Freedom Fighters,” whose Facebook page posts near-daily content about government infringing on people’s rights; and the "New River Militia" in North Carolina, which posted about the need to “wake up the other lions” two days after the Capitol riot.

Strikingly, about 70% (140) of the Facebook pages identified by TTP had “militia” in their name. That’s a hard-to-miss affiliation, especially for a company that says its artificial intelligence systems are successfully detecting and removing policy violations like hate speech and terrorist content.

In addition, the TTP investigation found 31 militia-related profiles, which display their militia sympathies through their names, logos, patches, posts, or recruiting efforts. In more than half the cases (20), the profiles had the word “militia” in their name.

And, this stuff certainly doesn't look great:

Facebook is not just missing militia content. It’s also, in some cases, creating it.

About 17 percent of the militia pages identified by TTP (34) were actually auto-generated by Facebook, most of them with the word “militia” in their names. This has been a recurring issue with Facebook. A TTP investigation in May 2020 found that Facebook had auto-generated business pages for white supremacist groups.

Auto-generated pages are not managed by an administrator, but they can still play a role in amplifying extremist views. For example, if a Facebook user “likes” one of these pages, the page gets added to the “about” section of the user’s profile, giving it more visibility. This can also serve as a signal to potential recruiters about pro-militia sympathies.

Meanwhile, Facebook’s recommendation algorithm is pushing users who “like” militia pages toward other militia content.

When TTP “liked” the page for “Wo Co Militia,” Facebook recommended a page called “Arkansas Intelligent citizen,” which features a large Three Percenter logo as the page header. (The “history” section in the page transparency shows that it was previously named “3%ERS – Arkansas.”)

Of course, this certainly appears to be a strong contrast with what Facebook itself is claiming. In Mark Zuckerberg's testimony today before Congress on dealing with disinformation, he again suggests that Facebook has an "industry-leading" approach to dealing with this kind of content:

We remove Groups that represent QAnon, even if they contain no violent content. And we do not allow militarized social movements—such as militias or groups that support and organize violent acts amid protests—to have a presence on our platform. In addition, last year we temporarily stopped recommending US civic or political Groups, and earlier this year we announced that policy would be kept in place and expanded globally. We’ve instituted a recommendation waiting period for new Groups so that our systems can monitor the quality of the content in the Group before determining whether the Group should be recommended to people. And we limit the number of Group invites a person can send in a single day, which can help reduce the spread of harmful content from violating Groups.

We also take action to prevent people who repeatedly violate our Community Standards from creating new Groups. Our recidivism policy stops the administrators of a previously removed Group from creating another Group similar to the one removed, and an administrator or moderator who has had Groups taken down for policy violations cannot create any new Groups for a period of time. Posts from members who have violated any Community Standards in a Group must be approved by an administrator or moderator for 30 days following the violation. If administrators or moderators repeatedly approve posts that violate our Community Standards, we’ll remove the Group.

Our enforcement effort in Groups demonstrates our commitment to keeping content that violates these policies off the platform. In September, we shared that over the previous year we removed about 1.5 million pieces of content in Groups for violating our policies on organized hate, 91 percent of which we found proactively. We also removed about 12 million pieces of content in Groups for violating our policies on hate speech, 87 percent of which we found proactively. When it comes to Groups themselves, we will remove an entire Group if it repeatedly breaks our rules or if it was set up with the intent to violate our standards. We took down more than one million Groups for violating our policies in that same time period.

So, on the one hand, you have a report finding these kinds of groups still on the site, despite apparently being banned. And, on the other hand, you have Facebook talking about all of the proactive measures it's taken to deal with these groups. Both of them are telling the truth, but this highlights the impossibility of doing things well.

First, note the scale of the issue. Zuckerberg notes that Facebook has removed more than one million groups. The TTP found 13 militia groups, and 201 militia pages. At Facebook's scale, some things that should be removed are always going to slip through. Some might argue that if the TTP could find these pages, then clearly Facebook could as well. But that raises two separate issues. First, what exactly are they looking for? There are so many things that could violate policies that I'm sure Facebook trust & safety folks are constantly doing searches like these -- but just because they don't do the exact same search as the TTP does, it doesn't mean that they're not looking for this stuff. Indeed, one could argue that finding just 13 such groups is pretty good.

On top of that, what exactly is the policy violation? Facebook says that it bans militia groups "that support and organize violent acts amid protests." But that doesn't mean every group that refers to itself as a "militia" is going to violate those policies. You can easily see how many might not. On top of that, assuming that these groups recognize how Facebook has been cracking down, it's quite likely that many will simply try to "hide" behind other language to make themselves more difficult for Facebook to find. Indeed, the TTP report points to one example of a "militia" group saying it needs to change the name of the group. In fact, in that example, it says that local law enforcement was the one that suggested changing the name.

So, there's always going to be some element of cat-and-mouse on these kinds of things, and some level of subjectivity in determining whether a group is actually violating Facebook's policies or not. It's easy to play a "gotcha" game and find groups like this, but that's because at scale it's always going to be impossible to be correct 100% of the time. Indeed, it's also quite likely that these efforts over-blocked in some cases, and took down groups that should not have been removed. Any effort at content moderation, especially at scale, is going to run into both Type I (false positive) and Type II (false negative) mistakes. Finding and highlighting just a few of those mistakes doesn't mean that the company is failing overall -- though it may provide some suggestions on how and where the company can improve.
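
To make the Type I and Type II point concrete, here is a minimal sketch of how both kinds of mistake pile up at scale. Every input is hypothetical: the item count, the share of violating content, and the error rates are illustrative assumptions, not Facebook's real numbers, and `moderation_errors` is a made-up helper.

```python
def moderation_errors(total_items, violating_share,
                      false_positive_rate, false_negative_rate):
    """Expected error counts for a filter applied to total_items.

    false_positive_rate: share of acceptable items wrongly removed (Type I).
    false_negative_rate: share of violating items wrongly left up (Type II).
    """
    violating = total_items * violating_share
    acceptable = total_items - violating
    return {
        "wrongly_removed": acceptable * false_positive_rate,
        "wrongly_left_up": violating * false_negative_rate,
    }


# Hypothetical inputs: 100 million items, 1% of them violating, and a filter
# that is wrong just 1% of the time in each direction.
errors = moderation_errors(100_000_000, 0.01, 0.01, 0.01)
print(f"Type I  (wrongly removed): {errors['wrongly_removed']:,.0f}")
print(f"Type II (wrongly left up): {errors['wrongly_left_up']:,.0f}")
```

Under these assumed inputs, a filter that is right 99% of the time still wrongly removes hundreds of thousands of items and misses thousands more, so anyone looking for a handful of either kind of mistake will find them.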

Filed Under: content moderation, error rates, mark zuckerberg, militias
Companies: facebook

Oversight Says FBI's Facial Recognition System Has Gotten Bigger, But Not Better

Privacy

from the it's-not-like-the-FBI-is-some-tiny-underfunded-agency... dept

Wed, Jun 12th 2019 3:19pm — Tim Cushing

It appears the FBI's facial recognition program will never live up to the minimal expectations its oversight has placed on it. The FBI's database went live in 2014, far preceding the Privacy Impact Assessment that was supposed to be delivered in 2012.

Two years after its debut, the Government Accountability Office found the FBI's database -- which went live with a 20% failure rate -- was still a mess. The FBI showed little interest in improving the accuracy of its searches. It also showed little interest in periodically testing the system to see if it was improving or, quite possibly, getting worse.

The FBI's hands-off approach to facial recognition only applies to its oversight of the program. Otherwise, it's an enthusiastic participant. At the time of the GAO's examination, the FBI's database contained 411 million photos, drawn from both criminal and non-criminal databases. Indicative of the FBI's lackadaisical approach to facial recognition was a bank robbery case in Colorado, where the feds pitched in to help arrest the wrong person twice.

A year later, the House Oversight Committee noted nothing had improved since the GAO's 2016 recommendations. Input and output remained flawed, and the FBI still showed little interest in fixing the problems reported by the GAO.

Two years later, it's deja vu all over again. The GAO's latest report [PDF] says the only thing that's really changed is the size of the database. Since its last assessment, the FBI has added 230 million photos, bringing the total to 641 million face shots. But otherwise, there's been little improvement. The GAO made six recommendations in 2016. To date, the FBI has only fully implemented one, and has taken no action at all on three of them.

As for the Privacy Impact Assessment the FBI was supposed to deliver in 2012? It's still in the works seven years later.

In its May 2016 report, GAO found that DOJ did not complete or publish key privacy documents for FBI’s face recognition systems in a timely manner and made two recommendations to DOJ regarding its processes for developing these documents. These included privacy impact assessments (PIA), which analyze how personal information is collected, stored, shared, and managed in federal systems, and system of records notices, which inform the public about, among other things, the existence of the systems and the types of data collected. DOJ has taken actions to expedite the development process of the PIA.

As for the system's accuracy, little forward progress has been made. The FBI is at least engaging in limited audits of the system, but only to ensure face searches are done according to policy. The problem with accuracy remains virtually untested. The FBI's testimony claims its vendor delivers a 99% accuracy rate, but as the GAO points out, this number comes from limited testing of batch sizes that may not be representative of those most commonly seen by the system's users.

GAO found that the FBI conducted limited assessments of the accuracy of face recognition searches prior to accepting and deploying its face recognition system. The face recognition system automatically generates a list of photos containing the requested number of best matched photos. The FBI assessed accuracy when users requested a list of 50 possible matches, but did not test other list sizes. GAO recommended accuracy testing on different list sizes.
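
Why does list size matter? An accuracy figure measured on 50-candidate lists says little about what a user sees when asking for a shorter list. The sketch below is one rough way to picture that, assuming "accuracy" means the correct photo appears somewhere in the returned list. The ranks are invented for illustration and `hit_rate_at_k` is a hypothetical helper, not anything from the FBI or its vendor.

```python
from typing import Optional, Sequence


def hit_rate_at_k(true_match_ranks: Sequence[Optional[int]], k: int) -> float:
    """Share of searches whose true match appears in the top k candidates.

    Each entry is the 1-based rank of the correct photo in the returned
    list, or None when the correct photo was not returned at all.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not true_match_ranks:
        raise ValueError("need at least one search result")
    hits = sum(1 for rank in true_match_ranks if rank is not None and rank <= k)
    return hits / len(true_match_ranks)


# Hypothetical ranks for ten searches; not real FBI or vendor data.
ranks = [1, 3, 1, 12, 48, 2, None, 7, 30, 1]
for k in (2, 10, 50):
    print(f"list size {k:>2}: {hit_rate_at_k(ranks, k):.0%}")
```

With these made-up ranks, the same system scores 90% on lists of 50 but only 40% on lists of 2, which is why a figure tested at a single list size can't simply be carried over to the others.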

On top of that, the FBI has no idea how accurate the outside systems it utilizes are. Its own vendor might be delivering 99% accuracy, but the FBI makes use of databases and software used by other federal and state agencies. Despite being notified of this issue in 2016, the FBI has yet to assess the accuracy of these external systems.

This refusal to better police its system explains why the House Oversight Committee was less than impressed with the FBI's performance since it last took a look at the agency's facial recognition tech. The FBI's testimony was constantly undercut by the GAO's report, and this resulted in plenty of criticism from members of Congress.

During a hearing, members of the House Oversight Committee questioned witnesses on the steps being taken to ensure the facial recognition tools used by their agencies aren’t infringing on individuals’ privacy and civil liberties. By and large, lawmakers on both sides of the aisle seemed unsatisfied with their answers.

[...]

Lawmakers criticized Kimberly Del Greco, deputy assistant director of the FBI’s Criminal Justice Information Services division, over the bureau’s failure to correct multiple flaws in the way it evaluates its primary facial recognition tool.

Maybe this will finally prompt the FBI to follow up on the issues found in the GAO's latest assessment. But I wouldn't count on it. This same cycle of events played out in 2016 and 2017 -- a GAO report followed by Congressional tongue-lashing -- and the FBI still chose to completely ignore three of the GAO's recommendations. Maybe Congress should just tell the FBI it can't use the tech until it fixes the problems and see if that finally motivates the agency. Nothing else has worked so far. All the FBI has proven is that it can't be trusted with facial recognition tech.

Filed Under: doj, error rates, facial recognition, fbi

New Study Shows Massive Error Rates In E-Voting Machines

Studies

from the that-can-swing-an-election dept

Fri, Mar 21st 2008 7:42pm — Mike Masnick

Just as e-voting firm Sequoia is resisting having its machines reviewed independently, the Brookings Institution has put a bunch of e-voting machines to the test, and found error rates around 3% on some of the machines. These weren't errors due to software problems, but usability problems, where the design of the system resulted in people voting for a candidate they did not want. 3% is a huge number, and could easily change the results of an election. While the study found that people generally like e-voting technology, that still doesn't mean it's particularly effective. One other interesting part of the finding: when there was a voter-verified paper trail, it didn't cut down on errors. This suggests that many voters were either confused or didn't even bother to verify their vote. This should all be very worrisome. Even ignoring the technology problems that these machines have been shown to have, the fact that the design tends to create so many mistaken votes should lead people to seriously question the use of e-voting machines.
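
For a rough sense of why 3% matters, the sketch below compares the number of miscast ballots with a margin of victory. The turnout and margins are hypothetical, and the check is a worst-case bound: it assumes every miscast ballot moved from the winner to the runner-up, so each one shifts the margin by two votes. Real errors would be spread across candidates.

```python
def miscast_votes(ballots_cast: int, error_rate: float) -> float:
    """Expected number of ballots recorded for an unintended candidate."""
    return ballots_cast * error_rate


def could_flip(ballots_cast: int, error_rate: float, margin_votes: int) -> bool:
    """Worst case: every miscast ballot went from the winner to the runner-up.

    Each such ballot removes one vote from one side and adds one to the
    other, so it moves the margin by two.
    """
    return 2 * miscast_votes(ballots_cast, error_rate) >= margin_votes


# Hypothetical race: one million ballots and the roughly 3% error rate above.
ballots = 1_000_000
print(f"Miscast ballots: {miscast_votes(ballots, 0.03):,.0f}")
print("50,000-vote margin at risk: ", could_flip(ballots, 0.03, 50_000))
print("100,000-vote margin at risk:", could_flip(ballots, 0.03, 100_000))
```

In this made-up race, a 3% error rate means about 30,000 miscast ballots, enough in the worst case to erase a 50,000-vote lead but not a 100,000-vote one.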

Filed Under: e-voting, error rates
Companies: brookings, diebold, es&s, sequoia
