Tag Archives: Google Translate

Recent Advances in Google Translate



Advances in machine learning (ML), including the GNMT neural translation model introduced in Translate in 2016, have greatly improved the quality of automated translation for over 100 languages. Nevertheless, state-of-the-art systems lag significantly behind human performance in all but the most specific translation tasks. And while the research community has developed techniques that are successful for high-resource languages like Spanish and German, for which there exist copious amounts of training data, performance on low-resource languages, like Yoruba or Malayalam, still leaves much to be desired. Many techniques have demonstrated significant gains for low-resource languages in controlled research settings (e.g., the WMT Evaluation Campaign); however, these results on smaller, publicly available datasets may not easily transfer to large, web-crawled datasets.

In this post, we share some recent progress we have made in translation quality for supported languages, especially for those that are low-resource, by synthesizing and expanding a variety of recent advances, and demonstrate how they can be applied at scale to noisy, web-mined data. These techniques span improvements to model architecture and training, improved treatment of noise in datasets, increased multilingual transfer learning through M4 modeling, and use of monolingual data. The quality improvements, which averaged +5 BLEU score over all 100+ languages, are visualized below.
BLEU score of Google Translate models since shortly after its inception in 2006. The improvements since the implementation of the new techniques over the last year are highlighted at the end of the animation.
Advances for Both High- and Low-Resource Languages
Hybrid Model Architecture: Four years ago we introduced the RNN-based GNMT model, which yielded large quality improvements and enabled Translate to cover many more languages. Following our work decoupling different aspects of model performance, we have replaced the original GNMT system, instead training models with a transformer encoder and an RNN decoder, implemented in Lingvo (a TensorFlow framework). Transformer models have been demonstrated to be generally more effective at machine translation than RNN models, but our work suggested that most of these quality gains were from the transformer encoder, and that the transformer decoder was not significantly better than the RNN decoder. Since the RNN decoder is much faster at inference time, we applied a variety of optimizations before coupling it with the transformer encoder. The resulting hybrid models are higher-quality, more stable in training, and exhibit lower latency.
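To make the hybrid design concrete, here is a deliberately tiny pure-Python sketch. It is an illustration only, not Lingvo code and not Google's model: a transformer-style encoder layer in which every source position attends to every other, feeding an RNN decoder that updates one fixed-size hidden state per output step and attends over the encoder output. Multi-head projections, layer normalization, feed-forward sublayers, learned weights and the output vocabulary are all omitted; the random fixed weights and all function names are assumptions.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def attend(query, keys, values):
    """Scaled dot-product attention: a softmax-weighted average of values."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]

def encode(src_vectors):
    """Toy transformer-style encoder layer: each position attends to every
    source position in parallel (no learned projections, heads or
    feed-forward sublayer), plus a residual connection."""
    return [[x + c for x, c in zip(vec, attend(vec, src_vectors, src_vectors))]
            for vec in src_vectors]

class RNNDecoder:
    """Toy Elman-style RNN decoder. The random fixed weights are an assumption
    for illustration; real weights are learned during training."""

    def __init__(self, dim, seed=0):
        rng = random.Random(seed)
        self.w_h = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
        self.w_x = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]

    def step(self, hidden, prev_vec, memory):
        # Cross-attention over the encoder output, then one recurrent update.
        context = attend(hidden, memory, memory)
        return [math.tanh(dot(self.w_h[i], hidden) + dot(self.w_x[i], prev_vec) + context[i])
                for i in range(len(hidden))]

def run_decoder(decoder, memory, steps):
    dim = len(memory[0])
    hidden, prev = [0.0] * dim, [0.0] * dim
    states = []
    for _ in range(steps):
        hidden = decoder.step(hidden, prev, memory)
        states.append(hidden)
        prev = hidden  # stand-in for the embedding of the token just emitted
    return states

# Usage: three 4-d source "token vectors", five decoder steps.
memory = encode([[1.0, 0.0, 0.0, 0.5], [0.0, 1.0, 0.0, 0.5], [0.0, 0.0, 1.0, 0.5]])
states = run_decoder(RNNDecoder(dim=4), memory, steps=5)
```

One plausible reason for the latency gain shows up in `RNNDecoder.step`: each output step costs the same no matter how many tokens have already been generated, whereas a transformer decoder's self-attention at step t looks back over all t earlier positions.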

Web Crawl: Neural Machine Translation (NMT) models are trained using examples of translated sentences and documents, which are typically collected from the public web. Compared to phrase-based machine translation, NMT has been found to be more sensitive to data quality. As such, we replaced the previous data collection system with a new data miner that focuses more on precision than recall, which allows the collection of higher-quality training data from the public web. Additionally, we switched the web crawler from a dictionary-based model to an embedding-based model for 14 large language pairs, which increased the number of sentences collected by an average of 29 percent without loss of precision.
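The post does not describe the embedding-based miner in detail. A common way to mine parallel sentences with embeddings, sketched here under that assumption, is to embed sentences from both languages into a shared vector space, pair each source sentence with its nearest target by cosine similarity, and keep only mutual nearest neighbours above a threshold. The 3-dimensional vectors, the 0.8 threshold and the `mine_pairs` name are hypothetical; a real system would use a learned multilingual sentence encoder and approximate nearest-neighbour search instead of this brute-force loop.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def mine_pairs(src, tgt, threshold=0.8):
    """src, tgt: dict of sentence -> embedding (hypothetical encoder output).
    Returns (source, target, score) for mutual nearest neighbours whose
    cosine similarity is at least `threshold`."""
    def best(query, pool):
        return max(pool, key=lambda s: cosine(query, pool[s]))

    pairs = []
    for s, s_vec in src.items():
        t = best(s_vec, tgt)
        if best(tgt[t], src) != s:  # require the match to hold in both directions
            continue
        score = cosine(s_vec, tgt[t])
        if score >= threshold:
            pairs.append((s, t, score))
    return pairs

# Toy 3-d "embeddings"; a real miner would use a multilingual sentence encoder.
src = {"Hola mundo": [0.9, 0.1, 0.0], "Buenos días": [0.1, 0.9, 0.1]}
tgt = {"Hello world": [0.88, 0.12, 0.02], "Good morning": [0.12, 0.85, 0.15],
       "Buy now": [0.0, 0.2, 0.95]}
pairs = mine_pairs(src, tgt)
```

The unrelated page text "Buy now" is never selected, which is the precision-over-recall behaviour the paragraph describes: a candidate is dropped unless the match is both mutual and confident.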

Modeling Data Noise: Data with significant noise is not only redundant but also lowers the quality of models trained on it. In order to address data noise, we used our results on denoising NMT training to assign a score to every training example using preliminary models trained on noisy data and fine-tuned on clean data. We then treat training as a curriculum learning problem — the models start out training on all data, and then gradually train on smaller and cleaner subsets.
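A minimal sketch of this curriculum follows. It assumes that an example's score is the difference between its log-probability under the model fine-tuned on clean data and under the preliminary noisy model (higher means cleaner), and it assumes a linear schedule that shrinks the training subset phase by phase. The tuple layout, the schedule and the numbers are illustrative, not Translate's actual settings.

```python
def noise_score(logp_clean, logp_noisy):
    """Higher = cleaner: the clean fine-tuned model likes the pair more than
    the noisy preliminary model does (an assumed scoring rule)."""
    return logp_clean - logp_noisy

def curriculum(examples, num_phases, final_fraction):
    """examples: list of (pair, logp_noisy, logp_clean).
    Yields the training subset for each phase: phase 0 uses all data, later
    phases keep a linearly shrinking top fraction ranked by score."""
    ranked = sorted(examples, key=lambda e: noise_score(e[2], e[1]), reverse=True)
    for phase in range(num_phases):
        frac = 1.0 - (1.0 - final_fraction) * phase / max(1, num_phases - 1)
        keep = max(1, round(len(ranked) * frac))
        yield [e[0] for e in ranked[:keep]]

# Hypothetical scored examples: (pair, log p under noisy model, log p under clean model).
examples = [
    (("a", "A"), -2.0, -1.0),   # score  1.0
    (("b", "B"), -1.0, -3.0),   # score -2.0
    (("c", "C"), -1.5, -1.0),   # score  0.5
    (("d", "D"), -2.0, -2.5),   # score -0.5
]
phases = list(curriculum(examples, num_phases=3, final_fraction=0.5))
```

Each phase is a subset of the previous one, so the model first sees everything and then concentrates on the examples the two preliminary models agree are cleanest.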

Advances That Benefited Low-Resource Languages in Particular
Back-Translation: Widely adopted in state-of-the-art machine translation systems, back-translation is especially helpful for low-resource languages, where parallel data is scarce. This technique augments parallel training data (where each sentence in one language is paired with its translation) with synthetic parallel data, where the sentences in one language are written by a human, but their translations have been generated by a neural translation model. By incorporating back-translation into Google Translate, we can make use of the more abundant monolingual text data for low-resource languages on the web for training our models. This is especially helpful in increasing fluency of model output, which is an area in which low-resource translation models underperform.
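The data flow of back-translation can be sketched as follows. Suppose the forward model being trained translates English to German (German stands in for a lower-resource target language purely for readability). Human-written German sentences from a monolingual pool are translated back into English by a reverse, German-to-English model, and each synthetic English source is paired with its genuine German target. The word-by-word `toy_reverse_model` and the sentences are hypothetical stand-ins for a trained NMT model and web data.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def back_translate(monolingual_target: List[str],
                   reverse_model: Callable[[str], str]) -> List[Pair]:
    """Target side is human-written; source side is machine-generated."""
    return [(reverse_model(t), t) for t in monolingual_target]

def build_training_set(parallel: List[Pair], monolingual_target: List[str],
                       reverse_model: Callable[[str], str]) -> List[Pair]:
    """Augment genuine parallel data with synthetic back-translated pairs."""
    return parallel + back_translate(monolingual_target, reverse_model)

def toy_reverse_model(german: str) -> str:
    # Hypothetical word-by-word German->English "model", for illustration only.
    lexicon = {"guten": "good", "morgen": "morning"}
    return " ".join(lexicon.get(w, w) for w in german.split())

training = build_training_set([("thank you", "danke")], ["guten morgen"], toy_reverse_model)
```

Because the target side of every synthetic pair is real human text, the forward model learns to produce fluent target-language output even though its inputs are imperfect, which matches the fluency benefit described above.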

M4 Modeling: Another technique that has been especially helpful for low-resource languages is M4, which uses a single, giant model to translate between all languages and English. This allows for transfer learning at a massive scale. For example, a lower-resource language like Yiddish benefits from co-training with a wide array of related Germanic languages (e.g., German, Dutch, Danish), as well as almost a hundred other languages that may not share a known linguistic connection but may still provide a useful signal to the model.
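The post does not say how the single M4 model is told which language to produce, or how its training data is balanced. The sketch below assumes two conventions from multilingual NMT research: a target-language token such as `<2en>` prepended to the source sentence, and temperature-based sampling over language pairs, where a higher temperature upsamples low-resource pairs such as Yiddish. The pair sizes and the temperature are made-up numbers.

```python
from typing import Dict, Tuple

def tag_example(src: str, tgt: str, tgt_lang: str) -> Tuple[str, str]:
    """Prepend a target-language token so one model can serve many pairs."""
    return (f"<2{tgt_lang}> {src}", tgt)

def sampling_probs(sizes: Dict[str, int], temperature: float) -> Dict[str, float]:
    """Temperature-based sampling over language pairs: T=1 samples in
    proportion to data size; larger T flattens toward uniform."""
    total = sum(sizes.values())
    weights = {pair: (n / total) ** (1.0 / temperature) for pair, n in sizes.items()}
    z = sum(weights.values())
    return {pair: w / z for pair, w in weights.items()}

example = tag_example("Guten Morgen", "Good morning", "en")
# Hypothetical corpus sizes (sentence pairs).
probs = sampling_probs({"de-en": 9_000_000, "yi-en": 1_000_000}, temperature=2.0)
```

With these numbers, Yiddish-English holds 10% of the data but would be sampled 25% of the time at T=2, one way a shared model can give a small language more exposure while still learning from its larger relatives.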

Judging Translation Quality
A popular metric for automatic quality evaluation of machine translation systems is the BLEU score, which is based on the similarity between a system’s translation and reference translations that were generated by people. With these latest updates, we see an average BLEU gain of +5 points over the previous GNMT models, with the 50 lowest-resource languages seeing an average gain of +7 BLEU. This improvement is comparable to the gain observed four years ago when transitioning from phrase-based translation to NMT.
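To make the metric concrete, here is a minimal corpus-level BLEU sketch: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty that punishes output shorter than the references. It assumes whitespace tokenization, a single reference per sentence and no smoothing; standard tools such as sacreBLEU differ in tokenization and other details, so its scores are not comparable with published numbers.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clip each n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(0, len(h) - n + 1)
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

score = corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
```

A perfect match scores 100, while the single substitution in the example costs close to half the score, because it breaks every higher-order n-gram that spans the changed word.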

Although BLEU score is a well-known approximate measure, it is known to have various pitfalls for systems that are already high-quality. For instance, several works have demonstrated how the BLEU score can be biased by translationese effects on the source side or target side, a phenomenon where translated text can sound awkward, containing attributes (like word order) from the source language. For this reason, we performed human side-by-side evaluations on all new models, which confirmed the gains in BLEU.

In addition to general quality improvements, the new models show increased robustness to machine translation hallucination, a phenomenon in which models produce strange “translations” when given nonsense input. This is a common problem for models that have been trained on small amounts of data, and affects many low-resource languages. For example, when given the string of Telugu characters “ష ష ష ష ష ష ష ష ష ష ష ష ష ష ష”, the old model produced the nonsensical output “Shenzhen Shenzhen Shaw International Airport (SSH)”, seemingly trying to make sense of the sounds, whereas the new model correctly learns to transliterate this as “Sh sh sh sh sh sh sh sh sh sh sh sh sh sh sh sh sh”.

Conclusion
Although these are impressive strides forward for a machine, one must remember that, especially for low-resource languages, automatic translation quality is far from perfect. These models still fall prey to typical machine translation errors, including poor performance on particular genres of subject matter (“domains”), conflating different dialects of a language, producing overly literal translations, and poor performance on informal and spoken language.

Nonetheless, with this update, we are proud to provide automatic translations that are relatively coherent, even for the lowest-resource of the 108 supported languages. We are grateful for the research that has enabled this from the active community of machine translation researchers in academia and industry.

Acknowledgements
This effort is built on contributions from Tao Yu, Ali Dabirmoghaddam, Klaus Macherey, Pidong Wang, Ye Tian, Jeff Klingner, Jumpei Takeuchi, Yuichiro Sawai, Hideto Kazawa, Apu Shah, Manisha Jain, Keith Stevens, Fangxiaoyu Feng, Chao Tian, John Richardson, Rajat Tibrewal, Orhan Firat, Mia Chen, Ankur Bapna, Naveen Arivazhagan, Dmitry Lepikhin, Wei Wang, Wolfgang Macherey, Katrin Tomanek, Qin Gao, Mengmeng Niu, and Macduff Hughes.

Source: Google AI Blog


A Scalable Approach to Reducing Gender Bias in Google Translate



Machine learning (ML) models for language translation can be skewed by societal biases reflected in their training data. One such example, gender bias, often becomes more apparent when translating between a gender-specific language and one that is less so. For instance, Google Translate historically translated the Turkish equivalent of “He/she is a doctor” into the masculine form, and the Turkish equivalent of “He/she is a nurse” into the feminine form.

In line with Google’s AI Principles, which emphasize the importance of avoiding creating or reinforcing unfair biases, in December 2018 we announced gender-specific translations. This feature in Google Translate provides options for both feminine and masculine translations when translating queries that are gender-neutral in the source language. For this work, we developed a three-step approach: detecting gender-neutral queries, generating gender-specific translations, and checking for accuracy. We used this approach to enable gender-specific translations for phrases and sentences in Turkish-to-English and have now expanded it to English-to-Spanish translations, the most popular language pair in Google Translate.
Left: Early example of the translation of a gender-neutral English phrase to a gender-specific Spanish counterpart. In this case, only a biased example is given. Right: The new Translate provides both a feminine and a masculine translation option.
But as this approach was applied to more languages, it became apparent that it had trouble scaling. Specifically, generating masculine and feminine translations independently using a neural machine translation (NMT) system resulted in low recall, failing to show gender-specific translations for up to 40% of eligible queries, because the two independently generated translations often differed in ways other than gender. Additionally, building a classifier to detect gender-neutrality for each source language was data intensive.

Today, along with the release of the new English-to-Spanish gender-specific translations, we announce an improved approach that uses a dramatically different paradigm to address gender bias by rewriting or post-editing the initial translation. This approach is more scalable, especially when translating from gender-neutral languages to English, since it does not require a gender-neutrality detector. Using this approach we have expanded gender-specific translations to include Finnish, Hungarian, and Persian-to-English. We have also replaced the previous Turkish-to-English system using the new rewriting-based method.

Rewriting-Based Gender-Specific Translation
The first step in the rewriting-based method is to generate the initial translation. The translation is then reviewed to identify instances where a gender-neutral source phrase yielded a gender-specific translation. If that is the case, we apply a sentence-level rewriter to generate an alternative gendered translation. Finally, both the initial and the rewritten translations are reviewed to ensure that the only difference is the gender.
Top: The original approach. Bottom: The new rewriting-based approach.
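The four steps of the rewriting-based method can be sketched as a small driver function. The three model calls are passed in as callables because the post does not expose them: `translate`, `needs_alternative` and `rewrite` are hypothetical placeholders. The final check is one simple way to verify that only gender differs: the two translations must have the same tokens except at positions where both contain an English gendered pronoun.

```python
from typing import Callable, List

# Hypothetical English gendered-word list used by the final check.
GENDERED = {"he", "she", "him", "her", "his", "hers", "himself", "herself"}

def differs_only_in_gender(a: str, b: str) -> bool:
    """True if a and b differ, and every differing token is a gendered word."""
    ta, tb = a.lower().split(), b.lower().split()
    if len(ta) != len(tb):
        return False
    diffs = [(x, y) for x, y in zip(ta, tb) if x != y]
    return bool(diffs) and all(x in GENDERED and y in GENDERED for x, y in diffs)

def gender_specific_translations(
    source: str,
    translate: Callable[[str], str],
    needs_alternative: Callable[[str, str], bool],
    rewrite: Callable[[str], str],
) -> List[str]:
    initial = translate(source)                       # 1. initial translation
    if not needs_alternative(source, initial):        # 2. neutral source, gendered output?
        return [initial]
    alternative = rewrite(initial)                    # 3. sentence-level rewriter
    if differs_only_in_gender(initial, alternative):  # 4. only the gender may differ
        return [initial, alternative]
    return [initial]                                  # check failed: show one translation

# Hypothetical stand-ins for the real models.
def toy_translate(source: str) -> str:
    return "He is a doctor."  # always picks the masculine default

def toy_needs_alt(source: str, translation: str) -> bool:
    return any(w in GENDERED for w in translation.lower().split())

def toy_rewrite(translation: str) -> str:
    return "She is a doctor."

options = gender_specific_translations("O bir doktor.", toy_translate, toy_needs_alt, toy_rewrite)
```

If the rewriter ever changed something besides the pronoun (for example "doctor" to "nurse"), the check in step 4 would reject it and only the original translation would be shown.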
Rewriter
Building a rewriter involved generating millions of training examples composed of pairs of phrases, each of which included both masculine and feminine translations. Because such data was not readily available, we generated a new dataset for this purpose. Starting with a large monolingual dataset, we programmatically generated candidate rewrites by swapping gendered pronouns from masculine to feminine, or vice versa. Since there can be multiple valid candidates, depending on the context — for example the feminine pronoun “her” can map to either “him” or “his” and the masculine pronoun “his” can map to “her” or “hers” — a mechanism was needed for choosing the correct one. To resolve this tie, one can either use a syntactic parser or a language model. Because a syntactic parsing model would require training with labeled datasets in each language, it is less scalable than a language model, which can learn in an unsupervised fashion. So, we select the best candidate using an in-house language model trained on millions of English sentences.
This table demonstrates the data generation process. We start with the input, generate candidates and finally break the tie using a language model.
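The generate-then-rank procedure can be sketched with a pronoun swap table and a language model. The in-house language model is replaced here by a toy add-one-smoothed bigram model trained on three sentences, purely so the example runs; the swap table, corpus and function names are assumptions. Ambiguous pronouns expand into several candidates, and the language model picks the most fluent one. The sketch assumes the sentence has a single gendered referent, so every pronoun is swapped.

```python
import math
from collections import Counter
from itertools import product

# Pronoun swaps; "her" and "his" are ambiguous, as described above.
SWAPS = {
    "he": ["she"], "she": ["he"],
    "him": ["her"], "her": ["him", "his"],
    "his": ["her", "hers"], "hers": ["his"],
    "himself": ["herself"], "herself": ["himself"],
}

def candidates(sentence):
    """All rewrites obtained by swapping every gendered pronoun."""
    options = [SWAPS.get(tok, [tok]) for tok in sentence.lower().split()]
    return [" ".join(choice) for choice in product(*options)]

class BigramLM:
    """Toy add-one-smoothed bigram LM standing in for the in-house LM."""

    def __init__(self, corpus):
        self.unigrams, self.bigrams = Counter(), Counter()
        for sentence in corpus:
            toks = ["<s>"] + sentence.lower().split() + ["</s>"]
            self.unigrams.update(toks[:-1])
            self.bigrams.update(zip(toks, toks[1:]))
        self.vocab_size = len(self.unigrams) + 1  # +1 for "</s>"

    def logprob(self, sentence):
        toks = ["<s>"] + sentence.lower().split() + ["</s>"]
        return sum(math.log((self.bigrams[(a, b)] + 1) / (self.unigrams[a] + self.vocab_size))
                   for a, b in zip(toks, toks[1:]))

def best_rewrite(sentence, lm):
    return max(candidates(sentence), key=lm.logprob)

lm = BigramLM(["he gave his book to her", "she read his letter", "they gave him a gift"])
rewrites = candidates("she gave her book to him")
chosen = best_rewrite("she gave her book to him", lm)
```

Here "her" could become "him" or "his"; the bigram "his book" occurs in the toy corpus while "him book" does not, so the language model prefers "he gave his book to her".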
The above data generation process results in training data that goes from a masculine input to a feminine output and vice versa. We merge data from both these directions and train a one-layer transformer-based sequence-to-sequence model on it. We introduce punctuation and casing variants in the training data to increase the model robustness. Our final model can reliably produce the requested masculine or feminine rewrites 99% of the time.
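A sketch of how the two directions might be merged, with simple casing and punctuation variants added for robustness. The specific variants (a sentence-initial capital, all caps, toggling a final period) are assumptions chosen for illustration; the post does not list the variants it used.

```python
def _capitalize(s):
    # Unlike str.capitalize, this leaves the rest of the sentence untouched.
    return s[:1].upper() + s[1:]

def _toggle_period(s):
    return s[:-1] if s.endswith(".") else s + "."

def augment(src, tgt):
    """Yield the original pair plus hypothetical casing/punctuation variants,
    applying the same transformation to both sides."""
    yield src, tgt
    yield _capitalize(src), _capitalize(tgt)
    yield _toggle_period(src), _toggle_period(tgt)
    yield src.upper(), tgt.upper()

def build_rewriter_data(pairs):
    """pairs: (masculine, feminine) sentences from the generation step."""
    examples = []
    for masc, fem in pairs:
        for src, tgt in ((masc, fem), (fem, masc)):  # merge both directions
            examples.extend(augment(src, tgt))
    return list(dict.fromkeys(examples))  # drop exact duplicates, keep order

data = build_rewriter_data([("he is a doctor", "she is a doctor")])
```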

Evaluation
We also devised a new method of evaluation, named bias reduction, which measures the relative reduction of bias between the new translation system and the existing system. Here “bias” is defined as making a gender choice in the translation that is unspecified in the source. For example, if the current system is biased 90% of the time and the new system is biased 45% of the time, this results in a 50% relative bias reduction. Using this metric, the new approach results in a bias reduction of ≥90% for translations from Hungarian, Finnish and Persian-to-English. The bias reduction of the existing Turkish-to-English system improved from 60% to 95% with the new approach. Our system triggers gender-specific translations with an average precision of 97% (i.e., when we decide to show gender-specific translations we’re right 97% of the time).
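The arithmetic behind the bias-reduction metric and the trigger precision, using the worked example from the text. Representing each gender-neutral query as a boolean (did the system make an unspecified gender choice?) and the function names are assumptions for illustration.

```python
def bias_rate(made_unspecified_choice):
    """Fraction of gender-neutral source queries for which the system made a
    gender choice that the source did not specify."""
    return sum(made_unspecified_choice) / len(made_unspecified_choice)

def relative_bias_reduction(old_rate, new_rate):
    return (old_rate - new_rate) / old_rate

def trigger_precision(correct, shown):
    """Of the queries where gender-specific translations were shown,
    the fraction where showing them was correct."""
    return correct / shown

reduction = relative_bias_reduction(0.90, 0.45)  # the example from the text: 50%
```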
We’ve made significant progress since our initial launch by increasing the quality of gender-specific translations and also expanding it to 4 more language-pairs. We are committed to further addressing gender bias in Google Translate and plan to extend this work to document-level translation, as well.

Acknowledgements:
This effort has been successful thanks to the hard work of many people, including, but not limited to the following (in alphabetical order of last name): Anja Austermann, Jennifer Choi‎, Hossein Emami, Rick Genter, Megan Hancock, Mikio Hirabayashi‎, Macduff Hughes, Tolga Kayadelen, Mira Keskinen, Michelle Linch, Klaus Macherey‎, Gergely Morvay, Tetsuji Nakagawa, Thom Nelson, Mengmeng Niu, Jennimaria Palomaki‎, Alex Rudnick, Apu Shah, Jason Smith, Romina Stella, Vilis Urban, Colin Young, Angie Whitnah, Pendar Yousefi, Tao Yu

Source: Google AI Blog


Now you can transcribe speech with Google Translate

Recently, I was at my friend’s family gathering, where her grandmother told a story from her childhood. I could see that she was excited to share it with everyone but there was a problem—she told the story in Spanish, a language that I don’t understand. I pulled out Google Translate to transcribe the speech as it was happening. As she was telling the story, the English translation appeared on my phone so that I could follow along—it fostered a moment of understanding that would have otherwise been lost. And now anyone can do this—starting today, you can use the Google Translate Android app to transcribe foreign language speech as it’s happening.

Transcribe will be rolling out in the next few days with support for any combination of the following eight languages: English, French, German, Hindi, Portuguese, Russian, Spanish and Thai. 

Ongoing translated transcript


To try the transcribe feature, go to your Translate app on Android, and make sure you have the latest updates from the Play store. Tap on the “Transcribe” icon from the home screen and select the source and target languages from the language dropdown at the top. You can pause or restart transcription by tapping on the mic icon. You also can see the original transcript, change the text size or choose a dark theme in the settings menu. 

On the left: redesigned home screen. On the right: how to change the settings for a comfortable read.

We’ll continue to make speech translations available in a variety of situations. Right now, the transcribe feature will work best in a quiet environment with one person speaking at a time. In other situations, the app will still do its best to provide the gist of what's being said. Conversation mode in the app will continue to help you to have a back and forth translated conversation with someone.  

Try it out and give us feedback on how we can be better. 

Source: Translate


Google Translate adds five languages

Millions of people around the world use Google Translate, whether in a verbal conversation, or while navigating a menu or reading a webpage online. Translate learns from existing translations, which are most often found on the web. Languages without a lot of web content have traditionally been challenging to translate, but through advancements in our machine learning technology, coupled with active involvement of the Google Translate Community, we’ve added support for five languages: Kinyarwanda, Odia (Oriya), Tatar, Turkmen and Uyghur. These languages, spoken by more than 75 million people worldwide, are the first languages we’ve added to Google Translate in four years, and expand the capabilities of Google Translate to 108 languages.

Translate supports both text translation and website translation for each of these languages. In addition, Translate supports virtual keyboard input for Kinyarwanda, Tatar and Uyghur. Below you can see our team motto, “Enable everyone, everywhere to understand the world and express themselves across languages,” translated into the five new languages. 

Animation: the Translate team motto in Kinyarwanda, Odia, Tatar, Turkmen and Uyghur.

If you speak any of these languages and are interested in helping, please join the Google Translate Community and improve our translations.

Source: Translate


Google Translate improves offline translation

When you’re traveling somewhere without access to the internet or don’t want to use your data plan, you can still use the Google Translate app on Android and iOS when your phone is offline. Offline translation is getting better: now, in 59 languages, offline translation is 12 percent more accurate, with improved word choice, grammar and sentence structure. In some languages like Japanese, Korean, Thai, Polish, and Hindi the quality gain is more than 20 percent. 

It can be particularly hard to pronounce and spell words in languages that are written in a script you're not familiar with. To help you in these cases, Translate offers transliteration, which gives an equivalent spelling in the alphabet you're used to. For example, when you translate “hello” to Hindi, you will see “नमस्ते” and “namaste” in the translation card, where “namaste” is the transliteration of “नमस्ते.” This is a great tool for learning how to communicate in a different language, and Translate now has offline transliteration support for 10 new languages, including Arabic, Bengali, Gujarati, Kannada, Marathi, Tamil, Telugu and Urdu.

Transliteration
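Translate's transliteration system is not described here, and production systems are typically learned from data. As a toy illustration of what transliteration does, the rule-based sketch below handles a small subset of Hindi Devanagari: each consonant carries an inherent "a" unless a vowel sign or virama follows, and that inherent vowel is dropped at the end of a word (a simplified form of Hindi schwa deletion). The tables and romanization choices are assumptions, the tables are incomplete, and characters outside them (such as the anusvara) pass through unchanged.

```python
# Toy tables (a small subset of Devanagari), keyed by code point.
CONSONANTS = {
    "\u0915": "k", "\u0916": "kh", "\u0917": "g", "\u091a": "ch", "\u091c": "j",
    "\u0924": "t", "\u0926": "d", "\u0928": "n", "\u092a": "p", "\u092c": "b",
    "\u092e": "m", "\u092f": "y", "\u0930": "r", "\u0932": "l", "\u0935": "v",
    "\u0938": "s", "\u0939": "h",
}
VOWEL_SIGNS = {  # dependent vowel signs that follow a consonant
    "\u093e": "aa", "\u093f": "i", "\u0940": "ee", "\u0941": "u", "\u0942": "oo",
    "\u0947": "e", "\u0948": "ai", "\u094b": "o", "\u094c": "au",
}
INDEPENDENT_VOWELS = {
    "\u0905": "a", "\u0906": "aa", "\u0907": "i", "\u0908": "ee", "\u0909": "u",
    "\u090a": "oo", "\u090f": "e", "\u0910": "ai", "\u0913": "o", "\u0914": "au",
}
VIRAMA = "\u094d"  # suppresses a consonant's inherent "a"

def transliterate_word(word: str) -> str:
    out, i = [], 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 2
                continue
            if nxt == VIRAMA:
                i += 2
                continue
            if nxt:  # inherent vowel; dropped word-finally (simplified schwa deletion)
                out.append("a")
        elif ch in INDEPENDENT_VOWELS:
            out.append(INDEPENDENT_VOWELS[ch])
        else:
            out.append(ch)  # anything outside the toy tables passes through
        i += 1
    return "".join(out)

def transliterate(text: str) -> str:
    return " ".join(transliterate_word(w) for w in text.split())

print(transliterate("\u0928\u092e\u0938\u094d\u0924\u0947"))  # नमस्ते, prints namaste
```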

To try our improved offline translation and transliteration, go to your Translate app on Android or iOS. If you do not have the app, you can download it. Make sure you have the latest updates from the Play Store or App Store. If you’ve used offline translation before, you’ll see a banner on your home screen that will take you to the right place to update your offline files. If not, go to your offline translation settings and tap the arrow next to the language name to download that language. Now you’ll be ready to translate text whether you’re online or not.


Source: Translate


Speak easy while traveling with Google Maps

Google Maps has made travel easier than ever before. You can scout out a neighborhood before booking a hotel, get directions on the go and even see what nearby restaurants the locals recommend thanks to auto-translated reviews.

But when you're in a foreign country where you don't speak or read the language, getting around can still be difficult -- especially when you need to speak with someone. Think about that anxiety-inducing time you tried to talk to a taxi driver, or that moment you tried to casually ask a passerby for directions.

To help, we're bringing Google Maps and Google Translate closer together. This month, we’re adding a new translator feature that enables your phone to speak out a place's name and address in the local lingo. Simply tap the new speaker button next to the place name or address, and Google Maps will say it out loud, making your next trip that much simpler. And when you want to have a deeper conversation, Google Maps will quickly link you to the Google Translate app.

This text-to-speech technology automatically detects what language your phone is using to determine which places you might need help translating. For instance, if your phone is set to English and you’re looking at a place of interest in Tokyo, you’ll see the new speaker icon next to the place’s name and address so you can get a real-time translation. 

The new feature will be rolling out this month on Android and iOS with support for 50 languages and more on the way. 

Source: Translate


Google Translate’s instant camera translation gets an upgrade

Google Translate allows you to explore unfamiliar lands, communicate in different languages, and make connections that would be otherwise impossible. One of my favorite features on the Google Translate mobile app is instant camera translation, which allows you to see the world in your language by just pointing your camera lens at the foreign text. Similar to the real-time translation feature we recently launched in Google Lens, this is an intuitive way to understand your surroundings, and it’s especially helpful when you’re traveling abroad as it works even when you’re not connected to Wi-Fi or using cellular data. Today, we’re launching new upgrades to this feature, so that it’s even more useful.

Translate from 88 languages into 100+ languages


Instant camera translation adds support for 60 more languages, such as Arabic, Hindi, Malay, Thai and Vietnamese, for a total of 88 supported languages.

Even more exciting: previously you could only translate between English and other languages, but now you can translate into any of the 100+ languages supported on Google Translate. For example, you can now translate from Arabic to French, or from Japanese to Chinese.

Automatically detect the language

When traveling abroad, especially in a region with multiple languages, it can be hard to tell which language the text you need to translate is in. We took care of that: in the new version of the app, you can just select “Detect language” as the source language, and the Translate app will automatically detect the language and translate. Say you’re traveling through South America, where both Portuguese and Spanish are spoken, and you encounter a sign. The Translate app can now determine what language the sign is in, and then translate it for you into your language of choice.
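A toy sketch of the “Detect language” idea for the South America example: count distinctive Portuguese and Spanish function words in the recognized sign text and pick the language with more evidence. The word lists are hypothetical and tiny, the camera and text-recognition steps are omitted, and real language identification typically relies on trained models (for example character n-gram classifiers) rather than hand-picked lists.

```python
import re

# Hypothetical function-word lists; words common to both languages
# (e.g. "de", "que", "a", "o", "no") are deliberately left out.
FUNCTION_WORDS = {
    "pt": {"os", "e", "do", "na", "nas", "não", "uma", "é", "em", "ao", "com", "você"},
    "es": {"el", "los", "las", "y", "del", "la", "una", "es", "en", "al", "con", "usted"},
}

def detect_language(text: str) -> str:
    tokens = re.findall(r"\w+", text.lower())  # \w matches accented letters too
    scores = {lang: sum(tok in words for tok in tokens)
              for lang, words in FUNCTION_WORDS.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (best, top), (_, runner_up) = ranked[0], ranked[1]
    return best if top > runner_up else "und"  # "und" = undetermined (BCP 47 tag)

sign_pt = detect_language("Proibido fumar na área de embarque")
sign_es = detect_language("Prohibido fumar en la zona de embarque")
```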

Better translations powered by Neural Machine Translation

For the first time, Neural Machine Translation (NMT) technology is built into instant camera translations. This produces more accurate and natural translations, reducing errors by 55-85 percent in certain language pairs. And most of the languages can be downloaded onto your device, so that you can use the feature without an internet connection. However, when your device is connected to the internet, the feature uses that connection to produce higher quality translations.

A new look

Last but not least, the feature has a new look and is more intuitive to use. In the past, you might have noticed the translated text would flicker when viewed on your phone, making it difficult to read. We’ve reduced that flickering, making the text more stable and easier to understand. The new look has all three camera translation features conveniently located on the bottom of the app: “Instant” translates foreign text when you point your camera at it. "Scan" lets you take a photo and use your finger to highlight text you want translated. And “Import” lets you translate text from photos on your camera roll. 


To try out the instant camera translation feature, download the Google Translate app.

Source: Translate