Content Moderation Case Study: Game Developer Deals With Sexual Content Generated By Users And Its Own AI (2021)
from the ai-to-the-unrescue? dept
Summary: Moderating user-generated content from humans is already quite tricky, but those challenges can reach a different level when artificial intelligence is generating content as well. While the cautionary tale of Microsoft’s AI chatbot Tay may be well known, other developers are still grappling with the challenges of moderating AI-generated content.
AI Dungeon wasn't the first online text game to leverage the power of artificial intelligence. For nearly as long as gaming has been around, attempts have been made to pair players with algorithmically-generated content to create unique experiences.
AI Dungeon has proven incredibly popular with players, thanks to its use of powerful machine learning algorithms created by OpenAI, the latest version of which was trained on substantially more data and is capable of generating text that, in many cases, is indistinguishable from text written by humans.
For its first few months of existence, AI Dungeon used an older version of OpenAI's machine learning algorithm. It wasn't until OpenAI granted access to the most powerful version of this software (Generative Pre-trained Transformer 3, or GPT-3) that content problems began to develop.
As Tom Simonite reported for Wired, OpenAI's moderation of AI Dungeon input and interaction uncovered some disturbing content being crafted by players as well as by the game's own AI.
A new monitoring system revealed that some players were typing words that caused the game to generate stories depicting sexual encounters involving children. OpenAI asked Latitude to take immediate action. "Content moderation decisions are difficult in some cases, but not this one," OpenAI CEO Sam Altman said in a statement. "This is not the future for AI that any of us want."
While Latitude (AI Dungeon's developer) had relied on limited moderation methods during the game's first few iterations, its new partnership with OpenAI, and the inappropriate content that followed, made it impossible for Latitude to keep that light touch and allow this content to go unmoderated. It was also clear that the inappropriate content wasn't always a case of users feeding the AI input to steer it toward generating sexually abusive content. Some users reported seeing the AI generate sexual content on its own, without any prompts from players. What may have originally been limited to a few users deliberately pushing the AI toward questionable content had expanded because of the AI's own behavior, which treated all input sources as valid and usable material when generating its own text.
Company Considerations:
- How can content created by a tool specifically designed to iteratively generate content be effectively moderated to limit the generation of impermissible or unwanted content?
- What should companies do to stave off the inevitability that their powerful algorithms will be used (and abused) in unexpected (or expected) ways?
- How should companies apply moderation standards to published content? How should these standards be applied to content that remains private and solely in the possession of the user?
- How effective are blocklists when dealing with a program capable of generating an infinite amount of content in response to user interaction?
Issue Considerations:
- What steps can be taken to ensure a powerful AI algorithm doesn't become weaponized by users seeking to generate abusive content?
Resolution: Latitude's first response to OpenAI's concerns was to implement a blocklist that would prevent users from nudging the AI toward generating questionable content, as well as prevent the AI from creating this content in response to user interactions.
Unfortunately, this initial response generated a number of false positives, and many users became angry once it became apparent that their private content was being subjected to keyword searches and read by moderators.
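To see why this kind of keyword blocklist produces so many false positives (the classic "Scunthorpe problem"), here is a minimal illustrative sketch in Python. The blocked terms and matching logic are assumptions made up for the example, not Latitude's actual filter:

```python
import re

# Purely illustrative blocklist -- these terms and the matching logic are
# hypothetical, not Latitude's actual filter.
BLOCKLIST = {"minor", "assault"}

def naive_flag(text: str) -> bool:
    # Substring matching: cheap, but it flags innocent words that merely
    # contain a blocked term (the "Scunthorpe problem").
    lowered = text.lower()
    return any(term in lowered for term in BLOCKLIST)

def word_boundary_flag(text: str) -> bool:
    # Whole-word matching reduces those false positives, but players can
    # still evade it with euphemisms or creative spellings.
    lowered = text.lower()
    return any(re.search(rf"\b{re.escape(term)}\b", lowered) for term in BLOCKLIST)

story_beat = "A minority of wizards opposed the king."
print(naive_flag(story_beat))          # True  -- false positive on "minority"
print(word_boundary_flag(story_beat))  # False -- no blocked word actually appears
```

Real moderation filters are far more elaborate than this, but the underlying tension is the same: the looser the matching, the more legitimate stories get caught.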
AI Dungeon's creator made tweaks to the filters in hopes of mitigating the collateral damage. Finally, Latitude arrived at a solution that addressed over-blocking but still allowed it access to OpenAI's algorithm. This is from the developer's latest update on AI Dungeon's moderation efforts, published in mid-August 2021:
We’ve agreed upon a new approach with OpenAI that will allow us to shift AI Dungeon’s filtering to have fewer incorrect flags and allow users more freedom in their experience. The biggest change is that instead of being blocked from playing when input triggers OpenAI’s filter, those requests will be handled by our own AI models. This will allow users to continue playing without broader filters that go beyond Latitude’s content policies.
While the fix addressed the overblocking problem, it did create other issues for players, as AI Dungeon's developer acknowledged in the same post. Users whose inputs were shunted to Latitude's own AI models would see lower performance due to slower processing. On the other hand, routing around OpenAI's filtering system would give AI Dungeon users more flexibility when crafting stories and limit false flags and account suspensions.
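As a rough illustration of the routing described above, here is a short Python sketch of how flagged inputs might be diverted to an in-house model instead of blocking the player. Every function name and behavior here is a hypothetical stand-in, not Latitude's or OpenAI's actual API:

```python
# Sketch only: inputs that trip the upstream filter are diverted to the
# developer's own model rather than blocking the player outright. All
# functions below are hypothetical placeholders.

def openai_filter_flags(text: str) -> bool:
    # Stand-in for the upstream content filter (assumed to return a
    # boolean flag for a given player input).
    return "forbidden" in text.lower()

def openai_model(text: str) -> str:
    # Stand-in for the faster, more capable upstream model.
    return f"[OpenAI-backed continuation of: {text!r}]"

def latitude_model(text: str) -> str:
    # Stand-in for the slower in-house model that applies only the
    # developer's own content policies.
    return f"[In-house continuation of: {text!r}]"

def generate_story_text(player_input: str) -> str:
    if openai_filter_flags(player_input):
        # Flagged input: keep the player in the game, just on the slower path.
        return latitude_model(player_input)
    return openai_model(player_input)

print(generate_story_text("The knight enters the tavern."))
```

The design trade-off matches the one the developer describes: flagged requests stay playable, but on a slower, less capable model.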
Originally posted to the Trust & Safety Foundation website.
Filed Under: ai, content moderation, generated content, sexual content
Reader Comments
AI Dungeon was great when it was free but now it's so restricted and money-grubbing that it's not interesting to play with anymore. Go look at the app store reviews. Everyone is upset about the change.
The overblocking is likely a Scunthorpe problem
Blocking prohibited keywords is not easy, as people use euphemisms and other ways of replacing prohibited words with innocent ones. This is where context gets really, really hard for an AI to understand.
The way AI-generated stories work is that you frequently re-generate prompts before you submit them.
The way the suspension worked was that if you submitted a prompt the AI had generated back into the AI to get the next prompt, you were auto-banned. It was a load of horseshit.
And of course, the now repeated phrase:
Won't someone please think of the procedurally generated children?
Re:
Addendum: wholesale monitoring of people's porn habits is not the future anyone should want either. Policing fictional porn generation in case someone imagines the wrong thing (let alone the computer procedurally generating it) is ridiculous.
Monkey see, Monkey do.
"Some users reported seeing the AI generate sexual content on its own without any prompts from players."
What really surprises me is that the developers failed to foresee that an algorithm dedicated to learning from online behaviors would come up with Rule 34. I'm sure the techies who programmed it will be all too happy to invoke the Abigail Oath when they gleefully inform their employers that "learns to mimic human behavior" means exactly that.
Re: Re:
There's an argument to be made that it's better to let people interested in it have all the fake child porn they want so there might be less demand for the real thing, and thus fewer children being abused. But there are zero plus or minus zero politicians who want to campaign on a platform of improved accessibility of child porn.
Re: Re: Re:
It's not even that - this is just in the "icky" category. Just a disturbing figment of someone's imagination. But it's just that - not real.
Re: Re: Re: Re:
Right, that's what I'm saying. It may be better to let this not real stuff happen legally, because it's not hurting any children.
[ link to this | view in thread ]
Re: Monkey see, Monkey do.
DUH!
Until humans are willing to understand themselves, and not be bashful, they will never understand each other.
Re: The overblocking is likely a Scunthorpe problem
You're right, user-generated content moderation is not obvious, especially in the gaming sector, which has real online hate issues (misogyny, threats, harassment, etc.). Bodyguard.ai, Two Hat, Sentropy, or Spectrum Labs can help! They use adaptive / contextual moderation 👌
AI Dungeon has always been weird.
Played an adventure where I typed 'wear pants', then had a "wizard" appear from then on, trying to seduce me by taking his pants off for every command I typed...
"a wizard removes his pants sexily," "a wizard removes his pants erotically," "a wizard removes his pants hurriedly," etc.
But, sword and boobs?
Seriously. Medieval dungeon boobies.
This is not exactly new. And fully should have been expected.
Stories of heroes in loincloths saving women and then fornicating date to pre-Roman times.
Hell, by the 800s we had stories of naked women saving abused men (girl power!).
That this would not be a known aspect of user-generated content from day one…? Facepalm.
The choice of solution to this is up to them. But only a prudish fool would think it wouldn't have come up.
Oh, and save the dirty nerd cliché! Some of the most brutally assaultive sexual stories are written by women for women.
Seriously, go pick up a romance novel.