Deciphering Google's language translation

Here's a fascinating essay/interview about language translation, published this morning, from Bill Softky, chief algorithmist at an Internet advertising startup. The interview is with Google's Franz Och. Among the questions addressed:

How did Google manage to beat mighty IBM in this language-translation contest conducted by NIST?

How does Google manage such a feat when its engineers neither speak nor understand the languages being translated?

Why is Google's best still not good enough? ... Or is it?

And why are computers better at chess than language translation? (OK, even a trade-press journalist can handle that one.)

Softky writes:

Ever since the Second World War there have been two competing approaches to automatic translation: expert rules vs. statistical deciphering.

Expert-rule buffs have tried to automate the grammar-school approach of diagramming sentences (using modifiers, phrases, and clauses): for example, "I visited (the house next to (the park) )." But like other optimistic software efforts, the exact rules foundered on the ambiguities of real human languages. (Think not? Try explaining this sentence: "Time flies like an arrow, but fruit flies like a banana.")

The competing statistical approach began with cryptography: treat the second language as an unknown code, and use statistical cues to find a mathematical formula to decode it, like the Allies did with Hitler's famous Enigma code. While those early "deciphering" efforts foundered on a lack of computing power, they have been resurrected in the "Statistical Machine Translation" approach used by Google, which eschews strict rules in favor of noticing the statistical correlations between "white house" and "casa blanca." Statistics deals with ambiguity better than rules do, it turns out.
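To make the "white house" / "casa blanca" idea concrete, here is a toy sketch of my own, not anything from Softky's piece and certainly not Google's actual system. It assumes a tiny, invented set of aligned English-Spanish sentence pairs (`PAIRS`), counts which words show up together across those pairs, and scores each word pair with a Dice coefficient. The function names `cooccurrence_scores` and `best_match` are made up for illustration; real statistical translation systems use far larger corpora and more elaborate alignment models.

```python
from collections import Counter
from itertools import product

# Invented toy corpus: each entry is (English sentence, Spanish sentence).
PAIRS = [
    ("the white house", "la casa blanca"),
    ("the white car", "el coche blanco"),
    ("the big house", "la casa grande"),
    ("a white house", "una casa blanca"),
]


def cooccurrence_scores(pairs):
    """Score every (source word, target word) pair by the Dice coefficient."""
    src_count, tgt_count, joint = Counter(), Counter(), Counter()
    for src, tgt in pairs:
        src_words, tgt_words = set(src.split()), set(tgt.split())
        src_count.update(src_words)   # sentences containing each source word
        tgt_count.update(tgt_words)   # sentences containing each target word
        joint.update(product(src_words, tgt_words))  # sentence pairs containing both
    # Dice = 2 * joint / (source count + target count); 1.0 means "always together".
    return {
        (s, t): 2 * n / (src_count[s] + tgt_count[t])
        for (s, t), n in joint.items()
    }


def best_match(word, scores):
    """Return the target word that correlates most strongly with `word`."""
    candidates = {t: v for (s, t), v in scores.items() if s == word}
    return max(candidates, key=candidates.get)


scores = cooccurrence_scores(PAIRS)
print(best_match("house", scores))
print(best_match("white", scores))
```

No grammar rules appear anywhere in the sketch: the pairing of words falls out of the counts alone, which is the spirit of the approach Softky describes.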

It's a good read.

And if you're looking for related material, here's a Network World story about IBM's contention (NIST results aside) that language translation will be one of its five major innovation victories over the next five years.

And a critique of the widely accepted methodology, known as BLEU, that's used for measuring the effectiveness of translation software.


Copyright © 2007 IDG Communications, Inc.
