Tuesday, January 06, 2009

Google's Approach To Language Translation

Machine Translations Have Been Pretty Iffy
In a former life I worked as a translator and an interpreter. That experience has made me quite skeptical of machine-translated text. Translations I've seen from software packages have ranged from "OK" to outright terrible. Whenever someone needed translations done and was considering buying software to do the work, I strongly advised against machine translation in favor of a human translator. The cost may have been higher, but the quality of the work more than made up for it.

To be sure, human translations are not always perfect. Subtle nuances can be lost when going from one language to another, and if the translator is not familiar with the technical jargon, mistakes are quite possible. During my time as a linguist I embarrassed myself on more than one occasion by completely botching a phrase. But, on the whole, humans pick up on things and know when to consult a dictionary in ways software could not, no matter how well the package was written.

I think the biggest failing of those software packages, though, was the limited memory and processing power available to them. Even today's home computers are not powerful enough to run the heavy computations a good translation package requires.

Google Has The Tools To Pull Off Better Translations
This is where the folks at Google shine. A recent posting in the Google Research Blog by Shankar Kumar and Wolfgang Macherey touches upon this very subject. Google has a good approach to language translation. First, they treat translation as a search problem. This makes a lot of sense, because much of the same thought process and many of the same algorithms that go into selecting relevant web sites based on keywords also go into selecting the appropriate words and phrases when moving from one language to another. Second, Google has the processing power to run very complex calculations, including routines that would be impossible for an individual PC to handle.
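To give a rough idea of what "translation as search" can mean, here is a minimal sketch, not Google's system: a hypothetical toy phrase table with invented probabilities, and a simple dynamic-programming search over ways to split the sentence into phrases, keeping the highest-scoring translation. Real decoders also score fluency with a language model, allow word reordering, and prune with beam search; all of that is left out here.

```python
import math

# Hypothetical toy phrase table: source phrase (tuple of words) -> list of
# (target phrase, log-probability). Entries and numbers are invented.
PHRASE_TABLE = {
    ("das",): [("the", math.log(0.7)), ("that", math.log(0.3))],
    ("haus",): [("house", math.log(0.8)), ("home", math.log(0.2))],
    ("das", "haus"): [("the house", math.log(0.6))],
    ("ist",): [("is", math.log(0.9))],
    ("klein",): [("small", math.log(0.6)), ("little", math.log(0.4))],
}


def decode(source, table, max_phrase_len=3):
    """Search all monotone phrase segmentations; return (log score, text)."""
    words = source.split()
    n = len(words)
    # best[i] holds the best (score, translation) covering the first i words.
    best = [(-math.inf, "")] * (n + 1)
    best[0] = (0.0, "")
    for i in range(n):
        score_i, text_i = best[i]
        if score_i == -math.inf:
            continue  # no way to reach position i with this phrase table
        for j in range(i + 1, min(n, i + max_phrase_len) + 1):
            for target, logp in table.get(tuple(words[i:j]), []):
                candidate = score_i + logp
                if candidate > best[j][0]:
                    best[j] = (candidate, (text_i + " " + target).strip())
    return best[n]


score, text = decode("das haus ist klein", PHRASE_TABLE)
print(text)  # the house is small
```

Note that the search prefers the two-word phrase "das haus" -> "the house" (0.6) over translating word by word (0.7 × 0.8 = 0.56), which is the kind of trade-off a search over many possible translations is meant to settle.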

Basically, the Google method pulls a large number of possible translations for a given phrase and narrows them down to the "safest" one, the one that will cause the "least amount of damage." There is a link on the blog post which leads to the details of their method. I won't even pretend to understand most of the calculations, but the basic methodology appears similar to the way a person might work through a translation, especially if there is time to mull it over and check things once or twice. I think this makes their translations far superior to those previously possible by machines.
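The "least amount of damage" idea matches a standard technique in statistical machine translation called minimum Bayes-risk (MBR) decoding. The sketch below assumes that is roughly what the post describes and is only an illustration, not Google's implementation. Each candidate translation is weighted by its model score, and instead of simply taking the top-scoring candidate, we pick the one whose expected loss against all the other candidates is lowest. The similarity function is a crude stand-in for a real metric such as BLEU, and the candidates and scores in the example are invented.

```python
from collections import Counter
import math


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def similarity(hyp, ref, max_n=2):
    """Crude BLEU-like score in [0, 1]: average clipped n-gram precision."""
    hyp_t, ref_t = hyp.split(), ref.split()
    scores = []
    for n in range(1, max_n + 1):
        h = ngrams(hyp_t, n)
        total = sum(h.values())
        if total == 0:
            scores.append(0.0)
            continue
        overlap = sum((h & ngrams(ref_t, n)).values())  # clipped matches
        scores.append(overlap / total)
    return sum(scores) / len(scores)


def mbr_select(candidates):
    """Return the candidate with the lowest expected loss (1 - similarity).

    candidates: list of (translation, log_score) pairs from some model.
    """
    # Softmax the log scores into a probability distribution.
    top = max(s for _, s in candidates)
    weights = [math.exp(s - top) for _, s in candidates]
    z = sum(weights)
    probs = [w / z for w in weights]

    best, best_risk = None, float("inf")
    for hyp, _ in candidates:
        # Expected "damage" if the true translation were drawn from probs.
        risk = sum(p * (1.0 - similarity(hyp, ref))
                   for (ref, _), p in zip(candidates, probs))
        if risk < best_risk:
            best, best_risk = hyp, risk
    return best


# Invented example: a literal rendering of an idiom scores highest on its own,
# but the two candidates that agree with each other carry more combined weight.
cands = [("he kicked the bucket", 0.0), ("he died", -0.1), ("he died suddenly", -0.1)]
print(mbr_select(cands))  # he died
```

Here the single highest-scoring candidate is the odd one out, so it carries the most risk; MBR settles on the "safe" consensus choice, which is the intuition behind doing the least amount of damage.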

Google's Method Seems To Work Well
I have used Google's Language Tools to help me quickly go through articles in German and Czech (two languages I used to know very well, at least). From what I can see, their methodology works quite nicely. I've come to trust it enough to translate items from English to French for our intranet when the items are shown at our Montreal, Canada location. I'm no Francophile, but from what I can tell, at least the basic points come across in the translation.

I love the way Kumar and Macherey describe getting the best translation as doing "the least amount of damage." As I mentioned above, I've caused a little bit of damage, mostly to my own ego, with some of my translation faux pas. Perhaps I'll regale you with a story or two another day.

