From Uzbek To Klingon, The Machine Cracks The Code

from the statistical-machine-translation dept

Move over Babelfish, there's a new translation technology in town. The NY Times has discovered that in just the past few years there have been some fairly impressive advancements in statistical machine translation. Traditional machine translation systems rely on a bilingual programmer to help map one language onto the other, but with statistical machine translation, you just feed the system the same texts in multiple languages and let the machine figure it out. It sounds like, for now, the technology works in some cases, and is probably most useful in developing fast translation systems that might miss more nuanced language issues. Some of those who believe in the traditional methods scoff at the idea that the statistical method will ever be useful for anything more than very basic translations. However, with the rate of improvement over the past few years, it wouldn't be surprising to see statistical machine translation systems improve even more in the near future.
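
For the curious, here is a rough sketch of what "let the machine figure it out" can look like in practice. It is a toy version of the textbook IBM Model 1 approach, which learns word translation probabilities from sentence pairs with expectation-maximization. This is an illustration only, not necessarily what the systems in the NY Times article do; the three-sentence corpus and the names `train_model1` and `best_translation` are made up for this example.

```python
from collections import defaultdict


def train_model1(pairs, iterations=10):
    """Estimate word translation probabilities t[(f, e)] = P(f | e) with EM.

    `pairs` is a list of (source_sentence, target_sentence) strings that
    say the same thing in two languages. No dictionary is supplied.
    """
    corpus = [(src.split(), tgt.split()) for src, tgt in pairs]
    tgt_vocab = {w for _, tgt in corpus for w in tgt}

    # Start with no knowledge: every target word is equally likely.
    t = defaultdict(lambda: 1.0 / len(tgt_vocab))

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) pairings
        total = defaultdict(float)  # expected count of e being paired at all

        # E-step: share each target word among the source words in its sentence,
        # in proportion to the current translation probabilities.
        for src, tgt in corpus:
            for f in tgt:
                norm = sum(t[(f, e)] for e in src)
                for e in src:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac

        # M-step: re-estimate probabilities from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]

    return t


def best_translation(t, word):
    """Return the target word with the highest learned probability for `word`."""
    candidates = {f: p for (f, e), p in t.items() if e == word}
    return max(candidates, key=candidates.get)


# Hypothetical parallel corpus: the same three phrases in English and German.
pairs = [
    ("the house", "das Haus"),
    ("the book", "das Buch"),
    ("a book", "ein Buch"),
]

t = train_model1(pairs)
for word in ["the", "house", "book", "a"]:
    print(word, "->", best_translation(t, word))
```

Nothing in the code knows any German. The pairings fall out of which words keep showing up together across sentences, which is also why this kind of system needs a lot of parallel text and can stumble on the nuances discussed in the comments below.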

Reader Comments


  1. Anonymous Coward, 31 Jul 2003 @ 1:38am

    Oh please

    There was one passage in there that made the article lose all credibility for me --

    "If we can learn how to translate even Klingon into English, then most human languages are easy by comparison," he said.

    Klingon was invented by English-speaking sci-fi fans, with their narrow imagination. It does not represent another culture.

    The fact remains that languages have concepts that are not understood in other languages. As with the cliche about Inuits having 20 words for snow, the Japanese have about 20 ways to refer to "self", as well as nuanced honorifics that simply do not translate into English. American Sign Language, with its parallel-speaking forms, has no equivalent in written or spoken English. Even in its simple form, a literal translation of an American Sign Language sentence would go something like "Past night movie me see wow me."

    Since ASL is so conceptually different, a new writing system called SignWriting is also evolving.

    http://www.at-links.gc.ca/guide/l3i-E.asp

  2. Jack Ass, 31 Jul 2003 @ 6:58am

    Re: Oh please

    Of course Klingon represents a different culture. No one ever said it had to be a real culture. It's actually a pretty interesting little language. Kind of like a cross between Hebrew, Russian, and Spanish with thick pronunciation keys. It's pretty developed for a fictional language.

  3. Anonymous Coward, 31 Jul 2003 @ 8:14am

    Re: Oh please

    So, a bunch of sci-fi dorks took a bunch of foreign-sounding sounds and made up a "language". Only within a Western context, too. Pretty lame.

  4. TAD, 31 Jul 2003 @ 8:55am

    Nuances Don't Matter Much

    If you're translating technical or scientific documents, language nuances won't matter too much. Japanese honorifics probably won't be used much when discussing particle physics or nanotechnology. I doubt that in discussing object-oriented programming, the 20 words for snow used by the Inuit will be encountered.

    Nuances obviously matter when translating literary fiction or popular culture artifacts. I doubt this translation engine would handle most manga well. But it should work at least as well as BabelFish and perhaps, eventually, much better.

  5. Anonymous Coward, 31 Jul 2003 @ 9:09am

    Re: Nuances Don't Matter Much

    >If you're translating technical or scientific documents, language nuances won't matter too much. Japanese honorifics probably won't be used much when discussing particle physics or nanotechnology.

    I've done some technical Japanese-to-English translation work before. The problem is in the level of ambiguity allowed in the Japanese language, which can make an English translation of a scientific paper sound vague, contradictory, or stupid. Often, one has to talk to the author directly to nail down an exact translation, though he may not be happy with the way it's expressed.


    >I doubt that in discussing object-oriented programming, the 20 words for snow used by the Inuit will be encountered.

    Depends -- if someone writes an Inuit manual and decides to use the 20 words for "snow" to illustrate polymorphism, then an English speaker is in trouble.


    >But it should work at least as well as BabelFish and perhaps, eventually, much better.

    BabelFish won't be too hard to beat.

  6. Newob, 31 Jul 2003 @ 2:59pm

    20 words for snow

    It doesn't matter how many words for snow the Inuit have any more than it matters how many words for music (aka "genres") Americans/Westerners have. There is often no literal meaning in our own language for words that we use on a regular basis. How do we figure out what they are supposed to mean? Usually we don't consult a dictionary in the middle of conversation. Statistical machine translation may be as good a method of translation as the ones we employ on a daily basis. While there may be more precise idioms for a particular referent in one dialect than in another, words that don't translate directly can be ported over into a reasonably flexible language; programs that can aid in translation may hasten the merging of today's languages into a universal pan-human language.

  7. Anonymous Coward, 31 Jul 2003 @ 10:37pm

    Re: 20 words for snow

    >It doesn't matter how many words for snow the Inuit have any more than it matters how many words for music (aka "genres") Americans/Westerners have....There is often no literal meaning in our own language for words that we use on a regular basis.

    That sums up a street hustler's anti-intellectual attitude, but in the world of serious translation, such attitudes do not fly. Translations have important legal, scientific, and medical consequences. The wrong instructions given to a patient can kill him. The wrong wording for treaties can lead to wars. The wrong instructions for operating a crane can kill lots of people, and this has happened before.

    >words that don't translate directly can be ported over into a reasonably flexible language; programs that can aid in translation may hasten the merging of today's languages into a universal pan-human language.

    People have tried this with Esperanto or other Klingon-league languages, but they have no real world use.

  8. Anonymous Coward, 1 Aug 2003 @ 4:56am

    Re: Nuances Don't Matter Much

    Technical Japanese translation into English?

    Between the specialized kanji and the fscking loanwords, it's pretty hopeless...

    Hell, even the movie "roadshows" are the last on the planet to be released (of course that's probably just the Japanese movie industry cashing in on its monopoly)... some of the alternative titles are funny though...

  9. Anonymous Coward, 1 Aug 2003 @ 4:58am

    Re: 20 words for snow

    [...] Klingon-league languages, but they have no real world use [...]

    except for insulting newbies on-line...

    ...and letting the police try to find a translator who can read you your rights...

  10. Kevin Hendzel, 12 Aug 2003 @ 5:13pm

    Re: Nuances Don't Matter Much

    Actually, "language nuances" matter a very great deal in translating scientific and technical documents. For example, a literal (mis)translation of a common phrase in Russian electronics reads, "the scheme is under reverse link." This is very commonly encountered in MT systems, even statistical ones. You might wonder what the "scheme" is, or what a "reverse link" means. In fact, as anybody who knows electronics and Russian could tell you, the sentence should read, "Feedback was applied to the circuit."

    Even technical terms have hundreds of meanings ("skhema" in Russian has about 3 dozen) all of which depend on subtle clues in the text. Even skilled humans, who are pre-wired for ambiguity, have difficulty sorting them out.

  11. Kevin Hendzel, 16 Aug 2003 @ 12:48pm

    Re: Oh please

    No, "a bunch of sci-fi dorks" did not "take a bunch of foreign-sounding sounds and made up a language." Where does this stuff come from?

    Klingon was developed by a professional linguist, Marc Okrand, who did his graduate work in linguistics at Berkeley (his undergrad degree is from Georgetown), on contract to Paramount Pictures for Star Trek III. He is certainly not a "sci-fi dork," but he is most certainly a very accomplished professional linguist. Although the language is technically "invented," it has specific and fairly complex rules of grammar, syntax and morphology. There is even an invented culture to support specific terminology, complex multiple meanings, etc. so the semantics are well developed.


