The Scunthorpe Problem, And Why AI Is Not A Silver Bullet For Moderating Platform Content At Scale
from the what's-in-a-name dept
Maybe someday AI will be sophisticated, nuanced, and accurate enough to help us with platform content moderation, but that day isn't today.
Today it prevents an awful lot of perfectly normal and presumably TOS-abiding people from even signing up for platforms. A recent tweet from someone unable to sign up to use an app because it didn't like her name, as well as many, many, MANY replies from people who've had similar experiences, drove this point home:
Been there
— Matt Cummings (@MattCummingsDB) August 29, 2018
As a person named James Butts, I know these problems.
— James (@justjames8) August 28, 2018
As a Dickman I know the struggle is real
— Mike Dickman (@TheMikeDickman) August 29, 2018
I get this a lot surprisingly
— Kyle Medick (@medick32) August 28, 2018
We have quite similar circumstances here
— Jacob Cockrill (@jacob_cockrill) August 29, 2018
Tom Hiscock reporting in.
— aWildWatermelon (@aWildWatermelon) August 29, 2018
Uhm, my name is Analise. That’s exact spelling. Been through this many of times.
— AP (@aannpp23) August 29, 2018
Join the club
— Craig Cockburn (@siliconglen) August 29, 2018
Oh! Am I too late to join this club?
— James Ho (@IndieVideoJames) August 29, 2018
Happens to me often, as you can imagine.
— MatthewDicks (@MatthewDicks) August 29, 2018
Facebook, despite insisting that users go by their real names, seems particularly bad at letting people actually use their real names.
A large part of my family uses a shortened form of our last name because many places, including Facebook, don't think Buckmaster is a real last name.
But Buck, Buckbuck, Bucker, Bucky and many more are all "real" >.>
— The Autistech (@theAutistech) August 28, 2018
Yeah - Facebook won't allow my real name of "Talks" so had to come up with something else. Although my wife's account is okay ...
It gets better because when Collette Talks put "in a relationship with Mike Torkelson", basically the family gossip went into overdrive!
— Mike Talks 💚💚💚 (@TestSheepNZ) August 29, 2018
My last name is Player and Facebook still won’t let me have that as a last name because it’s a “street name.”
— Sav (@TheSavannahOW) August 29, 2018
I have family members who use alternate names on Facebook because it wouldn't accept Lick
— Chris Hannas (@cjhannas) August 29, 2018
But of course, Facebook is not the only place where censorship rules based on bare pattern matching interfere not just with speech but with speakers' ability to even get online to speak.
Can’t even create my own player in a Madden franchise. Smh.
— Ben Schmuck (@benschmuck13) August 28, 2018
Ha! I had the same damn thing happen to me today when I tried to RSVP for a webinar.
— Jen Dick (@Jennifer_Dick) August 28, 2018
You're right in there with Alan Cumming, the actor, whose name was autocensored by the late City of Heroes MMO's official forums. (The COH forums also auto-nixed Dick Grayson, which was... amusing... on a forum where superheroes got discussed a lot.)
— The Phantom of the Ottoman (@zgryphon) August 28, 2018
This dynamic is what's known as the Scunthorpe Problem. Scunthorpe is a town in the UK whose residents have had an appallingly difficult time using the Internet due to a naughty word being contained within the town name.
The Scunthorpe problem is the blocking of e-mails, forum posts or search results by a spam filter or search engine because their text contains a string of letters that are shared with another (usually obscene) word. While computers can easily identify strings of text within a document, broad blocking rules may result in false positives, causing innocent phrases to be blocked.
The problem was named after an incident in 1996 in which AOL's profanity filter prevented residents of the town of Scunthorpe, North Lincolnshire, England from creating accounts with AOL, because the town's name contains the substring cunt. Years later, Google's opt-in SafeSearch filters apparently made the same mistake, preventing residents from searching for local businesses that included Scunthorpe in their names.
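The failure mode is easy to reproduce: a filter that checks for banned words as bare substrings, with no regard for word boundaries, will flag "Scunthorpe" on the strength of four of its letters. A minimal sketch of such a filter (the banned-word list here is purely illustrative):

```python
# A naive profanity filter of the kind behind the Scunthorpe problem:
# it rejects any input that contains a banned word as a bare substring,
# paying no attention to word boundaries.
BANNED = {"cunt", "ass", "cock"}  # illustrative word list


def naive_filter(text: str) -> bool:
    """Return True if the text would be blocked by substring matching."""
    lowered = text.lower()
    return any(word in lowered for word in BANNED)


# False positives: innocent names and places get blocked.
print(naive_filter("Scunthorpe"))       # True -- "cunt" is a substring
print(naive_filter("Craig Cockburn"))   # True -- "cock" is a substring
print(naive_filter("classic"))          # True -- "ass" is a substring
print(naive_filter("hello world"))      # False
```

A boundary-aware check (e.g. matching `r"\bcunt\b"` with the `re` module) avoids these particular false positives, though names that *are* exact matches, like "Dick" or "Butts", still slip through, which is exactly the bind the people quoted above find themselves in.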
(A related dynamic, the Clbuttic Problem, creates issues of its own when, instead of outright blocking, software automatically replaces the allegedly naughty words with ostensibly less-naughty words instead. People attempting to discuss such non-prurient topics as Buttbuttin's Creed and the Lincoln Buttbuttination find this sort of officious editing particularly unhelpful…)
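The Clbuttic variant substitutes rather than blocks, and the same boundary-blind matching then rewrites innocent words from the inside out. A sketch of how that goes wrong, again with a purely illustrative substitution table:

```python
import re

# Naive "clean-up" that swaps euphemisms in for banned substrings
# without checking word boundaries -- the Clbuttic problem.
SUBSTITUTIONS = {"ass": "butt", "tit": "breast"}  # illustrative table


def naive_clean(text: str) -> str:
    """Replace every occurrence of a banned substring, wherever it appears."""
    for bad, euphemism in SUBSTITUTIONS.items():
        text = re.sub(bad, euphemism, text, flags=re.IGNORECASE)
    return text


print(naive_clean("classic"))           # "clbuttic"
print(naive_clean("Assassin's Creed"))  # "buttbuttin's Creed"
print(naive_clean("constitution"))      # "consbreastution"
```

Note that the replacement even destroys the capitalization of the words it mangles; a filter this crude has no notion of what a word is, let alone what it means.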
While examples of these dynamics can be amusing, each is also quite chilling, both to speech itself and to the speakers trying to engage in it.
With the last name ‘Dicks’, I have to remind people to check their spam folder more often than a Nigerian prince.
— Chain of Lynx (@chainoflynx) August 28, 2018
The word Spam is literally in my last name. My husband’s family warned me that my last name can/will be marked as spam.
— Angela Spampata (@bird5445) August 29, 2018
Used to work with a lady whose last name is Wang, and it took us a few days to add exceptions to all the email filters
— Destroyer of Jeeps (@NewKindOfClown) August 29, 2018
This is not something we should be demanding more of, yet every time people call for "AI" as a solution to online content challenges, these are the censorship problems the call invites.
A big part of the problem is that calls for "AI" tend to treat it like some magical incantation, as if just adding it will solve all our problems. But in the end, AI is just software. Software can be very good at doing certain things, like finding patterns, including patterns in words (and people's names…). But it's not necessarily good at knowing what to make of those patterns.
Our net Nanny at work flagged a co-worker for offensive language. He dealt with a lot of crane contractors. Net nanny told his boss he was sending lots of emails with the word erection. Lol.
— GoGoATL (@GoGoATL) August 28, 2018
More sophisticated software may be better at understanding context, or even sometimes learning context, but there are still limits to what we can expect from these tools. They are at best imperfect reflections of the imperfect humans who created them, and it's a mistake to forget that they have not yet replicated, or replaced, human judgment, which itself is often imperfect.
Which is not to say that there is no role for software to help in content moderation. The things that software is good at can make it an important tool to help support human decision-making about online content, especially at scale. But it is a mistake to expect software to supplant human decision-making. Because, as we see from these accruing examples, when we over-rely on them, it ends up being real humans that we hurt.
Had this on a website for the kids, the kids demanded to know why, our last name is ‘Clithero’ Interesting conversation. 😳
— DougHero 🇬🇧 (@ClitheroDoug) August 29, 2018
I know that feel pic.twitter.com/nMbjfTKGcZ
— Nazi Paikidze-Barnes (@NaziPaiki) August 29, 2018
Filed Under: ai, artificial intelligence, content moderation, language, natalie weiner, scunthorpe