The Tech Policy Greenhouse is an online symposium where experts tackle the most difficult policy challenges facing innovation and technology today. These are problems that don't have easy solutions, where every decision involves tradeoffs and unintended consequences, so we've gathered a wide variety of voices to help dissect existing policy proposals and better inform new ones.

Content Moderation Knowledge Sharing Shouldn't Be A Backdoor To Cross-Platform Censorship

from the too-big-of-a-problem-to-tackle-alone dept

Ten thousand moderators at YouTube. Fifteen thousand moderators at Facebook. Billions of users, millions of decisions a day. These are the kinds of numbers that dominate most discussions of content moderation today. But we should also be talking about 10, 5, or even 1: the numbers of moderators at sites like Automattic (WordPress), Pinterest, Medium, and JustPasteIt—sites that host millions of user-generated posts but have far fewer resources than the social media giants.

There is a plethora of smaller services on the web that host videos, images, blogs, discussion fora, product reviews, comments sections, and private file storage. And they face many of the same difficult decisions about the user-generated content (UGC) they host, be it removing child sexual abuse material (CSAM), fighting terrorist abuse of their services, addressing hate speech and harassment, or responding to allegations of copyright infringement. While they may not see the same scale of abuse that Facebook or YouTube does, they also have vastly smaller teams. Even Twitter, often spoken of in the same breath as a “social media giant,” has an order of magnitude fewer moderators, at around 1,500.

One response to this resource disparity has been to focus on knowledge and technology sharing across different sites. Smaller sites, the theory goes, can benefit from the lessons learned (and the R&D dollars spent) by the biggest companies as they’ve tried to tackle the practical challenges of content moderation. These challenges include both responding to illegal material and enforcing content policies that govern lawful-but-awful (and merely lawful-but-off-topic) posts.

Some of the earliest efforts at cross-platform information-sharing tackled spam and malware. The Mail Abuse Prevention System (MAPS), for example, maintains blacklists of IP addresses associated with sending spam, which mail operators can query when deciding whether to accept a message. Employees at different companies have also informally shared information about emerging trends and threats, and the recently launched Trust & Safety Professional Association is intended to provide people working in content moderation with access to “best practices” and “knowledge sharing” across the field.
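To make the mechanism concrete: blocklists like MAPS are conventionally published as DNS zones. A mail server reverses the octets of a connecting IP address, appends the blocklist's zone, and treats a successful lookup as "listed." The short Python sketch below illustrates that query pattern only; the zone name dnsbl.example.org is a hypothetical placeholder rather than any real blocklist, so as written the check will simply report "not listed."

    import socket

    def is_listed(ip: str, dnsbl_zone: str = "dnsbl.example.org") -> bool:
        """Check an IPv4 address against a DNS-based blocklist (DNSBL).

        DNSBLs are queried by reversing the address's octets and appending
        the blocklist's zone; a successful A-record lookup means the
        address is listed.
        """
        query = ".".join(reversed(ip.split("."))) + "." + dnsbl_zone
        try:
            socket.gethostbyname(query)  # resolves only if the IP is listed
            return True
        except socket.gaierror:
            return False  # lookup failed (e.g. NXDOMAIN): treat as not listed

    # Example: a mail server could run this check before accepting a connection.
    print(is_listed("192.0.2.1"))  # documentation-range address; prints False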

There have also been organized efforts to share specific technical approaches to blocking content across different services, namely, hash-matching tools that enable an operator to compare uploaded files to a pre-existing list of content. Microsoft, for example, made its PhotoDNA tool freely available to other sites to use in detecting previously reported images of CSAM. Facebook adopted the tool in May 2011, and by 2016 it was being used by over 50 companies.
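Hash-matching of this kind boils down to computing a fingerprint for each upload and checking it against a shared list of fingerprints of previously reported material. PhotoDNA itself uses a proprietary perceptual hash designed to survive resizing and re-encoding; the sketch below substitutes an exact SHA-256 digest purely to illustrate the workflow, and the known_hashes entry and file path are hypothetical placeholders.

    import hashlib
    from pathlib import Path

    # Hypothetical set of fingerprints of previously reported images. Real
    # systems like PhotoDNA use perceptual hashes rather than cryptographic
    # ones; an exact digest is used here only to show the matching step.
    known_hashes = {
        "placeholder-digest-of-a-previously-reported-image",
    }

    def matches_known_content(upload: Path) -> bool:
        """Return True if the uploaded file's digest appears in the shared list."""
        digest = hashlib.sha256(upload.read_bytes()).hexdigest()
        return digest in known_hashes

    # A host would run this check at upload time and route any match to review:
    # print(matches_known_content(Path("incoming/upload.jpg")))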

Hash-sharing also sits at the center of the Global Internet Forum to Counter Terrorism (GIFCT), an industry-led initiative that includes knowledge-sharing and capacity-building across the industry as one of its four main goals. GIFCT works with Tech Against Terrorism, a public-private partnership launched by the UN Counter-Terrorism Executive Directorate, to “shar[e] best practices and tools between the GIFCT companies and small tech companies and startups.” Thirteen companies (including GIFCT founding companies Facebook, Google, Microsoft, and Twitter) now participate in the hash-sharing consortium.
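At its core, a hash-sharing consortium is a pooled database: each member contributes fingerprints of content it has removed, tagged with a category, and every member can check new uploads against the combined set. The sketch below is purely illustrative and assumes nothing about GIFCT's actual systems; the SharedHashDB class, its methods, and the labels are hypothetical.

    from collections import defaultdict

    class SharedHashDB:
        """Illustrative pooled hash list shared across member companies."""

        def __init__(self):
            # fingerprint -> set of (contributing member, content label)
            self._entries = defaultdict(set)

        def contribute(self, content_hash: str, member: str, label: str) -> None:
            """Record that a member has flagged this fingerprint."""
            self._entries[content_hash].add((member, label))

        def lookup(self, content_hash: str) -> list:
            """Return the (member, label) pairs recorded for a fingerprint, if any."""
            return sorted(self._entries.get(content_hash, set()))

    # One member contributes; another checks an upload against the pool.
    db = SharedHashDB()
    db.contribute("placeholder-digest", member="ExampleCo", label="terrorist-content")
    print(db.lookup("placeholder-digest"))  # [('ExampleCo', 'terrorist-content')]
    print(db.lookup("unseen-digest"))       # []

Whether a lookup returns contributing members and labels or just a yes/no is itself a design and policy choice; the opacity of the existing database is one of the concerns about GIFCT discussed later in this piece.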

There are many potential upsides to sharing tools, techniques, and information about threats across different sites. Content moderation is still a relatively new field, and it requires content hosts to consider an enormous range of issues, from the unimaginably atrocious to the benignly absurd. Smaller sites face resource constraints in the number of staff they can devote to moderation, and thus in the range of language fluency, subject matter expertise, and cultural backgrounds that they can apply to the task. They may not have access to — or the resources to develop — technology that can facilitate moderation.

When people who work in moderation share their best practices, and especially their failures, it can help small moderation teams avoid pitfalls and prevent abuse on their sites. And cross-site information-sharing is likely essential to combating cross-site abuse. As scholar evelyn douek discusses (with a strong note of caution) in her Content Cartels paper, there’s currently a focus among major services in sharing information about “coordinated inauthentic behavior” and election interference.

There are also potential downsides to sites coordinating their approaches to content moderation. If sites are sharing their practices for defining prohibited content, it risks creating a de facto standard of acceptable speech across the Internet. This undermines site operators’ ability to set the specific content standards that best enable their communities to thrive — one of the key ways that the Internet can support people’s freedom of expression. And company-to-company technology transfer can give smaller players a leg up, but if that technology comes with a specific definition of “acceptable speech” baked in, it can end up homogenizing the speech available online.

Cross-site knowledge-sharing could also suppress the diversity of approaches to content moderation, especially if knowledge-sharing is viewed as a one-way street, from giant companies to small ones. Smaller services can and do experiment with different ways of grappling with UGC that don’t necessarily rely on a centralized content moderation team, such as Reddit’s moderation powers for subreddits, Wikipedia’s extensive community-run moderation system, or Periscope’s use of “juries” of users to help moderate comments on live video streams. And differences in the business model and core functionality of a site can significantly affect the kind of moderation that actually works for them.

There’s also the risk that policymakers will take nascent “industry best practices” and convert them into new legal mandates. That risk is especially high in the current legislative environment, as policymakers on both sides of the Atlantic are actively debating all sorts of revisions and additions to intermediary liability frameworks.

Early versions of the EU’s Terrorist Content Regulation, for example, would have required intermediaries to adopt “proactive measures” to detect and remove terrorist propaganda, and pointed to the GIFCT’s hash database as an example of what that could look like (CDT recently joined a coalition of 16 human rights organizations in highlighting a number of concerns about the structure of GIFCT and the opacity of the hash database). And the EARN IT Act in the US is aimed at effectively requiring intermediaries to use tools like PhotoDNA—and not to implement end-to-end encryption.

Potential policymaker overreach is not a reason for content moderators to stop talking to and learning from each other. But it does mean that knowledge-sharing initiatives, especially formalized ones like the GIFCT, need to be attuned to the risks of cross-site censorship and eliminating diversity among online fora. These initiatives should proceed with a clear articulation of what they are able to accomplish (useful exchange of problem-solving strategies, issue-spotting, and instructive failures) and also what they aren’t (creating one standard for prohibited — much less illegal — speech that can be operationalized across the entire Internet).

Crucially, this information exchange needs to be a two-way street. The resource constraints faced by smaller platforms can also lead to innovative ways to tackle abuse and specific techniques that work well for specific communities and use-cases. Different approaches should be explored and examined for their merit, not viewed with suspicion as a deviation from the “standard” way of moderating. Any recommendations and best practices should be flexible enough to be incorporated into different services’ unique approaches to content moderation, rather than act as a forcing function to standardize towards one top-down, centralized model. As much as there is to be gained from sharing knowledge, insights, and technology across different services, there’s no one-size-fits-all approach to content moderation.

Emma Llansó is the Director of CDT’s Free Expression Project, which works to promote law and policy that support Internet users’ free expression rights in the United States and around the world. Emma also serves on the Board of the Global Network Initiative, a multistakeholder organization that works to advance individuals’ privacy and free expression rights in the ICT sector around the world. She is also a member of the multistakeholder Freedom Online Coalition Advisory Network, which provides advice to FOC member governments aimed at advancing human rights online.



Filed Under: best practices, censorship, content moderation, cross-platform, gifct, hashes, knowledge sharing, maps


Reader Comments

    ECA (profile), 21 Aug 2020 @ 1:25pm

    From the past.

    This seems like the old days, when programmers would bounce back and forth between companies making games and share knowledge.
    That was before a few idiots wanted to copyright their work to death, when a small bit of code could send people to court to fight over lines of text/code/anything.

    Anonymous Coward, 21 Aug 2020 @ 2:00pm

    annnnnnnd this

    At the end of the day, in the corporate world, it is all about the money and how it can be done cheaper. Then some yahoo starts up a CMaaS (Content-Moderation-as-a-Service) company in the cloud! Then it isn't "so-and-so got kicked off of platform A." Bonus! Risk transference! "It was just our CMaaS provider - we totally love you!"

    Then the beauty of the whole thing is the next set of screaming is aimed at something you now care about. Don't blame the automation. sigh

    While I have actually been enjoying the discussions regarding content moderation, and it is a tough nut to crack, we really need to stop saying things like content moderation is new. It isn't. We do it every day, with technology and without. I used to just call it "noise reduction" and have my desktop showing a picture of the signal-to-noise ratio.

    Anonymous Coward, 21 Aug 2020 @ 2:50pm

    These tools...

    When we talk about laws like EARN IT, we are rarely talking about PhotoDNA. PhotoDNA has many issues with transparency, privacy, and scope creep (the definition of child pornography grows by the year). But what they really want is something far more sinister and far more insidious: CSAI. Tools like CSAI only ever protect "non-existent children" from abuse (CG) and have a certain false positive rate. As an AI, it is opaque by nature and impossible to audit.

    Are these tools capable of stopping child pornography from spreading across the Web? If they aren't, are they worth the impact on our civil liberties? Is the cost of trying to stop it entirely proportionate? 99% of that effort is going to be on people sharing it, rather than the most despicable originators.

      Anonymous Anonymous Coward (profile), 21 Aug 2020 @ 6:02pm

      Re: These tools...

      "Are these tools capable of stopping child pornography from spreading across the Web?"

      Probably not, but that, in and of itself, is little reason for politicians to stop working on ways to enable control over the populace. The stated goal is rarely the same as the intended goal. The intended goal is never spoken about, in public.

    sumgai (profile), 21 Aug 2020 @ 7:05pm

    If sites are sharing their practices for defining prohibited content, it risks creating a de facto standard of acceptable speech across the Internet. This undermines site operators’ ability to set the specific content standards that best enable their communities to thrive.

    Hmmm. Perhaps those particular site operators might wish to re-evaluate what their community considers to be their central tenet. I posit that if it's something that the much greater majority of the population deems "unacceptable", then it might be prudent to avoid "going there".

    Free speech is one thing, the consequences of speaking freely are another thing entirely. The very term "unacceptable" connotes that a reaction will likely follow, and such may not stop with a simple banning and the like.

    And yes, there are going to be a lot of "false positives" that require human intervention, I get that. But I'm also dead certain that I won't be here on this mortal coil before they finally get it all straightened out and everyone is happy with the way the content moderation on the Internet works. It's a long road, and probably more bumpy than some people will like, but if it happened any quicker, somebody is just gonna keep tugging on Superman's cape, so to speak.

      Thad (profile), 22 Aug 2020 @ 9:35am

      Re:

      Hmmm. Perhaps those particular site operators might wish to re-evaluate what their community considers to be their central tenet. I posit that if it's something that the much greater majority of the population deems "unacceptable", then it might be prudent to avoid "going there".

      Free speech is one thing, the consequences of speaking freely are another thing entirely. The very term "unacceptable" connotes that a reaction will likely follow, and such may not stop with a simple banning and the like.

      But acceptable speech depends greatly on the forum and the context. A Quentin Tarantino fan forum is going to have a different standard for what's acceptable than a Sesame Street forum, and both of them are going to have a different standard for what's acceptable than a porn site.

      The idea that there's one single "community standard" for acceptable speech that can be applied across the entire Internet is, I think, part of the problem.

      But I'm also dead certain that I won't be here on this mortal coil before they finally get it all straightened out and everyone is happy with the way the content moderation on the Internet works.

      My friend, the heat death of the universe will occur before someone finds a standard for content moderation that makes everyone happy.
