In 2014, Brian Krebs made an interesting discovery.

The security writer and analyst found that when he keyed “lorem ipsum” into Google Translate, he got a number of different translations in Chinese. And they made no sense.

The machine translation (MT) system “auto-detected” the source language as Latin and translated the text, often used as a placeholder in design, to “China.” When he capitalized the first letter of each word, the translation became NATO, the acronym for the North Atlantic Treaty Organization.

Krebs later found that changing the capitalization and arrangement of the words produced even more bizarre translations. The phrase “lorem lorem” came back as “China’s Internet.” “Ipsum ipsum” came out as “it is,” while the all-lowercase “ipsum ipsum” was translated as the word “exam.”

Some believed the odd translations were the work of a techie at Google aiming to create a secret language only recognizable to a select few. Others simply saw the glitch as an entertaining way to kill free time.

Mistakes like these underscore the fact that free, web-based MT tools still aren’t quite at their full potential for professional users. In the case of Google, this is definitely not the first time the company has provided a faulty translation.

But beyond the risk of inaccurate translations lies the question of where your data goes when you use Google Translate. Many cost-conscious businesses have played around with the idea of translating documents via Google. And many don’t know what they risk if they choose not to hire a translation company and rely on the search engine.

Where Does Your Data Go?

Google Translate is a fast way to decipher documents, but it can prove costly for anyone concerned with confidentiality and information security, including HR departments and those dealing with any type of legally sensitive material.

Using the service could violate non-disclosure agreements and potentially result in heavy fines and a loss of trust among clients. In the business world, confidentiality is everything; violating this privacy puts both companies and clients at risk.

Translating with free, online MT systems over unsecured internet connections can pose a security risk and potentially compromise data.

Another risk, which might not come to mind right away, is what Google or other translation tools could do with your data. If you’ve ever read over a site’s “terms of service,” you’ve likely found that once you submit information to an entity, it has more leeway than you think.

Take an excerpt from Google’s, for instance:

“Some of our Services allow you to upload, submit, store, send or receive content. You retain ownership of any intellectual property rights that you hold in that content. In short, what belongs to you stays yours.

When you upload, submit, store, send or receive content to or through our Services, you give Google (and those we work with) a worldwide license to use, host, store, reproduce, modify, create derivative works (such as those resulting from translations, adaptations or other changes we make so that your content works better with our Services), communicate, publish, publicly perform, publicly display and distribute such content.”

In other words, Google has free rein when it comes to your information. This raises the question: Is your “confidential” information still confidential when you use Google Translate? Based on what’s described in the company’s terms of service, it doesn’t seem like it is.

With that in mind, how could a scrupulous attorney use Google for eDiscovery? How could a competent HR director enter employee-sensitive information into the system?

Advancements Still Lagging

Last year Google announced it was overhauling its translation system, using the up-and-coming neural machine translation (NMT) approach to improve accuracy. The company said it saw a drastic increase in translation accuracy with the new technology.

But the improvements aren’t all they’re made out to be. When the news broke, many headlines included phrases that described Google’s new system as “approaching human accuracy,” which is still far from the truth.

Businesses need to realize that MT should only be used for getting the “gist” of documents. A strong translation is made up of a number of different components, including humans and machines working together to ensure accuracy.

Just because “Becky down the hall” speaks Spanish doesn’t mean she’s in any position to translate a stack of legal documents the firm needs to analyze.

Jake Schild

A former newspaper reporter and native Minnesotan, Jake Schild is a staff writer in the marketing department at ULG.
