Tag Archives: Google Translate

A Scalable Approach to Reducing Gender Bias in Google Translate



Machine learning (ML) models for language translation can be skewed by societal biases reflected in their training data. One such example, gender bias, often becomes more apparent when translating between a gender-specific language and one that is less so. For instance, Google Translate historically translated the Turkish equivalent of “He/she is a doctor” into the masculine form, and the Turkish equivalent of “He/she is a nurse” into the feminine form.

In line with Google’s AI Principles, which emphasize the importance of avoiding creating or reinforcing unfair biases, in December 2018 we announced gender-specific translations. This feature in Google Translate provides options for both feminine and masculine translations when translating queries that are gender-neutral in the source language. For this work, we developed a three-step approach, which involved detecting gender-neutral queries, generating gender-specific translations and checking for accuracy. We used this approach to enable gender-specific translations for phrases and sentences in Turkish-to-English and have now expanded this approach for English-to-Spanish translations, the most popular language-pair in Google Translate.
Left: Early example of the translation of a gender-neutral English phrase to a gender-specific Spanish counterpart. In this case, only a biased example is given. Right: The new Translate provides both a feminine and a masculine translation option.
But as this approach was applied to more languages, it became apparent that there were issues in scaling. Specifically, generating masculine and feminine translations independently using a neural machine translation (NMT) system resulted in low recall, failing to show gender-specific translations for up to 40% of eligible queries, because the two translations often weren’t exactly equivalent, except for gender-related phenomena. Additionally, building a classifier to detect gender-neutrality for each source language was data intensive.

Today, along with the release of the new English-to-Spanish gender-specific translations, we announce an improved approach that uses a dramatically different paradigm to address gender bias by rewriting or post-editing the initial translation. This approach is more scalable, especially when translating from gender-neutral languages to English, since it does not require a gender-neutrality detector. Using this approach we have expanded gender-specific translations to include Finnish, Hungarian, and Persian-to-English. We have also replaced the previous Turkish-to-English system using the new rewriting-based method.

Rewriting-Based Gender-Specific Translation
The first step in the rewriting-based method is to generate the initial translation. The translation is then reviewed to identify instances where a gender-neutral source phrase yielded a gender-specific translation. If that is the case, we apply a sentence-level rewriter to generate an alternative gendered translation. Finally, both the initial and the rewritten translations are reviewed to ensure that the only difference is the gender.
Top: The original approach. Bottom: The new rewriting-based approach.
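The four steps of the rewriting-based method can be sketched roughly as follows. This is a minimal illustration, not the production system: the function names, the pronoun lists, and the toy translator, rewriter and checker are all assumptions made for the example, and the sketch assumes the source language does not mark gender in its pronouns, so any gendered pronoun in the English output is treated as an unspecified choice.

```python
from typing import Callable, List, Optional

# Hypothetical pronoun lists used only for this sketch.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def gender_of(sentence: str) -> Optional[str]:
    """Return 'masculine' or 'feminine', or None if the sentence has no
    gendered pronouns (or mixes both, which this sketch does not handle)."""
    words = {w.strip(".,!?\"'").lower() for w in sentence.split()}
    has_m, has_f = bool(words & MASCULINE), bool(words & FEMININE)
    if has_m == has_f:
        return None
    return "masculine" if has_m else "feminine"


def gender_specific_translations(
    source: str,
    translate: Callable[[str], str],
    rewrite: Callable[[str, str], str],
    differs_only_in_gender: Callable[[str, str], bool],
) -> List[str]:
    initial = translate(source)                      # 1. initial translation
    gender = gender_of(initial)                      # 2. did it pick a gender?
    if gender is None:
        return [initial]
    target = "feminine" if gender == "masculine" else "masculine"
    alternative = rewrite(initial, target)           # 3. sentence-level rewrite
    if not differs_only_in_gender(initial, alternative):
        return [initial]                             # 4. fall back to one result
    pair = {gender: initial, target: alternative}
    return [pair["feminine"], pair["masculine"]]


# Toy stand-ins (assumptions) for the real translator, rewriter and checker.
def toy_translate(source: str) -> str:
    return {"O bir doktor": "He is a doctor"}.get(source, source)


def toy_rewrite(text: str, target: str) -> str:
    old, new = {"feminine": ("He", "She"), "masculine": ("She", "He")}[target]
    return text.replace(old, new, 1)


def toy_check(a: str, b: str) -> bool:
    wa, wb = a.split(), b.split()
    return len(wa) == len(wb) and sum(x != y for x, y in zip(wa, wb)) == 1


result = gender_specific_translations(
    "O bir doktor", toy_translate, toy_rewrite, toy_check
)
print(result)
```

Passing the translator, rewriter and checker in as callables keeps the control flow visible without pretending to know how each component is built.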
Rewriter
Building a rewriter involved generating millions of training examples composed of pairs of phrases, each of which included both masculine and feminine translations. Because such data was not readily available, we generated a new dataset for this purpose. Starting with a large monolingual dataset, we programmatically generated candidate rewrites by swapping gendered pronouns from masculine to feminine, or vice versa. Since there can be multiple valid candidates, depending on the context — for example the feminine pronoun “her” can map to either “him” or “his” and the masculine pronoun “his” can map to “her” or “hers” — a mechanism was needed for choosing the correct one. To resolve this tie, one can either use a syntactic parser or a language model. Because a syntactic parsing model would require training with labeled datasets in each language, it is less scalable than a language model, which can learn in an unsupervised fashion. So, we select the best candidate using an in-house language model trained on millions of English sentences.
This table demonstrates the data generation process. We start with the input, generate candidates and finally break the tie using a language model.
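As a rough sketch of that data generation process, the code below swaps pronouns to produce every candidate rewrite and then lets a scoring function pick one. The swap tables follow the pronoun mappings described above; everything else is an assumption for illustration. In particular, `toy_score` is a tiny bigram counter standing in for the in-house language model, and its four-sentence corpus is invented.

```python
from itertools import product
from typing import Callable, Dict, List

# Masculine -> feminine swaps; "his" is ambiguous ("her" or "hers").
M2F: Dict[str, List[str]] = {
    "he": ["she"], "him": ["her"], "his": ["her", "hers"], "himself": ["herself"],
}
# Feminine -> masculine swaps; "her" is ambiguous ("him" or "his").
F2M: Dict[str, List[str]] = {
    "she": ["he"], "her": ["him", "his"], "hers": ["his"], "herself": ["himself"],
}


def match_case(original: str, replacement: str) -> str:
    """Capitalize the replacement if the original word was capitalized."""
    return replacement.capitalize() if original[:1].isupper() else replacement


def candidates(sentence: str, table: Dict[str, List[str]]) -> List[str]:
    """Generate every rewrite obtained by swapping each gendered pronoun."""
    options = []
    for tok in sentence.split():
        core = tok.rstrip(".,!?")
        tail = tok[len(core):]          # trailing punctuation, if any
        swaps = table.get(core.lower())
        if swaps is None:
            options.append([tok])
        else:
            options.append([match_case(core, s) + tail for s in swaps])
    return [" ".join(choice) for choice in product(*options)]


# Invented mini-corpus; a real system would use a trained language model.
CORPUS = ["she lost her book", "the book was hers", "he called him", "that is his car"]
BIGRAMS = {pair for line in CORPUS for pair in zip(line.split(), line.split()[1:])}


def toy_score(sentence: str) -> int:
    """Stand-in language model: count bigrams that appear in the corpus."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    return sum(pair in BIGRAMS for pair in zip(words, words[1:]))


def best_rewrite(sentence: str, table: Dict[str, List[str]],
                 score: Callable[[str], float]) -> str:
    """Break ties between candidates by taking the highest-scoring one."""
    return max(candidates(sentence, table), key=score)


print(candidates("This is his book.", M2F))
print(best_rewrite("This is his book.", M2F, toy_score))
print(best_rewrite("The book was his.", M2F, toy_score))
```

The same word ("his") resolves to "her" in one sentence and "hers" in the other, which is the tie the language model is there to break.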
The above data generation process results in training data that goes from a masculine input to a feminine output and vice versa. We merge data from both these directions and train a one-layer transformer-based sequence-to-sequence model on it. We introduce punctuation and casing variants in the training data to increase the model robustness. Our final model can reliably produce the requested masculine or feminine rewrites 99% of the time.
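The post does not say which punctuation and casing variants were used, so the following is only a guess at what such augmentation might look like: each transformation is applied to both sides of a training pair so that input and output stay aligned. The `variants` helper and its list of transformations are assumptions.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical set of variants: unchanged, lowercased, uppercased,
# and with trailing sentence punctuation removed.
TRANSFORMS: List[Callable[[str], str]] = [
    lambda s: s,
    str.lower,
    str.upper,
    lambda s: s.rstrip(".!?"),
]


def variants(source: str, target: str) -> Iterator[Tuple[str, str]]:
    """Yield (source, target) training pairs under each transformation,
    skipping duplicates."""
    seen = set()
    for transform in TRANSFORMS:
        pair = (transform(source), transform(target))
        if pair not in seen:
            seen.add(pair)
            yield pair


augmented = list(variants("He is a doctor.", "She is a doctor."))
for pair in augmented:
    print(pair)
```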

Evaluation
We also devised a new method of evaluation, named bias reduction, which measures the relative reduction of bias between the new translation system and the existing system. Here “bias” is defined as making a gender choice in the translation that is unspecified in the source. For example, if the current system is biased 90% of the time and the new system is biased 45% of the time, this results in a 50% relative bias reduction. Using this metric, the new approach results in a bias reduction of ≥90% for translations from Hungarian, Finnish and Persian-to-English. The bias reduction of the existing Turkish-to-English system improved from 60% to 95% with the new approach. Our system triggers gender-specific translations with an average precision of 97% (i.e., when we decide to show gender-specific translations we’re right 97% of the time).
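The bias reduction arithmetic in the example above can be written out as a small function. The function name and signature are made up for illustration; the formula is simply the relative drop in the bias rate.

```python
def relative_bias_reduction(old_bias_rate: float, new_bias_rate: float) -> float:
    """Relative reduction in how often a system makes a gender choice
    that is unspecified in the source."""
    if old_bias_rate <= 0:
        raise ValueError("old_bias_rate must be positive")
    return (old_bias_rate - new_bias_rate) / old_bias_rate


# Existing system biased 90% of the time, new system 45% of the time.
reduction = relative_bias_reduction(0.90, 0.45)
print(f"{reduction:.0%}")
```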
We’ve made significant progress since our initial launch by increasing the quality of gender-specific translations and also expanding it to 4 more language-pairs. We are committed to further addressing gender bias in Google Translate and plan to extend this work to document-level translation, as well.

Acknowledgements:
This effort has been successful thanks to the hard work of many people, including, but not limited to the following (in alphabetical order of last name): Anja Austermann, Jennifer Choi, Hossein Emami, Rick Genter, Megan Hancock, Mikio Hirabayashi, Macduff Hughes, Tolga Kayadelen, Mira Keskinen, Michelle Linch, Klaus Macherey, Gergely Morvay, Tetsuji Nakagawa, Thom Nelson, Mengmeng Niu, Jennimaria Palomaki, Alex Rudnick, Apu Shah, Jason Smith, Romina Stella, Vilis Urban, Colin Young, Angie Whitnah, Pendar Yousefi, Tao Yu

Source: Google AI Blog


Google Translate adds five languages

Millions of people around the world use Google Translate, whether in a verbal conversation, or while navigating a menu or reading a webpage online. Translate learns from existing translations, which are most often found on the web. Languages without a lot of web content have traditionally been challenging to translate, but through advancements in our machine learning technology, coupled with active involvement of the Google Translate Community, we’ve added support for five languages: Kinyarwanda, Odia (Oriya), Tatar, Turkmen and Uyghur. These languages, spoken by more than 75 million people worldwide, are the first languages we’ve added to Google Translate in four years, and expand the capabilities of Google Translate to 108 languages.

Translate supports both text translation and website translation for each of these languages. In addition, Translate supports virtual keyboard input for Kinyarwanda, Tatar and Uyghur. Below you can see our team motto, “Enable everyone, everywhere to understand the world and express themselves across languages,” translated into the five new languages. 

Translate Mission.gif

If you speak any of these languages and are interested in helping, please join the Google Translate Community and improve our translations.

Source: Translate


Google Translate improves offline translation

When you’re traveling somewhere without access to the internet or don’t want to use your data plan, you can still use the Google Translate app on Android and iOS when your phone is offline. Offline translation is getting better: now, in 59 languages, offline translation is 12 percent more accurate, with improved word choice, grammar and sentence structure. In some languages like Japanese, Korean, Thai, Polish, and Hindi the quality gain is more than 20 percent. 


It can be particularly hard to pronounce and spell words in languages that are written in a script you're not familiar with. To help you in these cases, Translate offers transliteration, which gives an equivalent spelling in the alphabet you're used to. For example, when you translate “hello” to Hindi, you will see “नमस्ते” and “namaste” in the translation card where “namaste” is the transliteration of “नमस्ते.” This is a great tool for learning how to communicate in a different language, and Translate has offline transliteration support for 10 new languages, including Arabic, Bengali, Gujarati, Kannada, Marathi, Tamil, Telugu and Urdu.

Transliteration

To try our improved offline translation and transliteration, go to your Translate app on Android or iOS. If you do not have the app, you can download it. Make sure you have the latest updates from the Play or App store. If you’ve used offline translation before, you’ll see a banner on your home screen that will take you to the right place to update your offline files. If not, go to your offline translation settings and tap the arrow next to the language name to download that language. Now you’ll be ready to translate text whether you’re online or not.


Source: Translate


Speak easy while traveling with Google Maps

Google Maps has made travel easier than ever before. You can scout out a neighborhood before booking a hotel, get directions on the go and even see what nearby restaurants the locals recommend thanks to auto-translated reviews.

But when you're in a foreign country where you don't speak or read the language, getting around can still be difficult -- especially when you need to speak with someone. Think about that anxiety-inducing time you tried to talk to a taxi driver, or that moment you tried to casually ask a passerby for directions.

To help, we're bringing Google Maps and Google Translate closer together. This month, we’re adding a new translator feature that enables your phone to speak out a place's name and address in the local lingo. Simply tap the new speaker button next to the place name or address, and Google Maps will say it out loud, making your next trip that much simpler. And when you want to have a deeper conversation, Google Maps will quickly link you to the Google Translate app.


This text-to-speech technology automatically detects what language your phone is using to determine which places you might need help translating. For instance, if your phone is set to English and you’re looking at a place of interest in Tokyo, you’ll see the new speaker icon next to the place’s name and address so you can get a real-time translation. 

The new feature will be rolling out this month on Android and iOS with support for 50 languages and more on the way. 

Source: Translate


Google Translate’s instant camera translation gets an upgrade

Google Translate allows you to explore unfamiliar lands, communicate in different languages, and make connections that would be otherwise impossible. One of my favorite features on the Google Translate mobile app is instant camera translation, which allows you to see the world in your language by just pointing your camera lens at the foreign text. Similar to the real-time translation feature we recently launched in Google Lens, this is an intuitive way to understand your surroundings, and it’s especially helpful when you’re traveling abroad as it works even when you’re not connected to Wi-Fi or using cellular data. Today, we’re launching new upgrades to this feature, so that it’s even more useful.


Translate from 88 languages into 100+ languages


The instant camera translation adds support for 60 more languages, such as Arabic, Hindi, Malay, Thai and Vietnamese. Here’s a full list of all 88 supported languages.

Even more exciting, you are no longer limited to translating between English and other languages: you can now translate into any of the 100+ languages supported on Google Translate. This means you can now translate from Arabic to French, or from Japanese to Chinese, etc. 

Automatically detect the language

When traveling abroad, especially in a region with multiple languages, it can be challenging for people to determine the language of the text that they need to translate. We took care of that—in the new version of the app, you can just select “Detect language” as the source language, and the Translate app will automatically detect the language and translate. Say you’re traveling through South America, where both Portuguese and Spanish are spoken, and you encounter a sign. The Translate app can now determine what language the sign is in, and then translate it for you into your language of choice.

Better translations powered by Neural Machine Translation

For the first time, Neural Machine Translation (NMT) technology is built into instant camera translations. This produces more accurate and natural translations, reducing errors by 55-85 percent in certain language pairs. And most of the languages can be downloaded onto your device, so that you can use the feature without an internet connection. However, when your device is connected to the internet, the feature uses that connection to produce higher quality translations.

A new look

Last but not least, the feature has a new look and is more intuitive to use. In the past, you might have noticed the translated text would flicker when viewed on your phone, making it difficult to read. We’ve reduced that flickering, making the text more stable and easier to understand. The new look has all three camera translation features conveniently located on the bottom of the app: “Instant” translates foreign text when you point your camera at it. "Scan" lets you take a photo and use your finger to highlight text you want translated. And “Import” lets you translate text from photos on your camera roll. 


To try out the instant camera translation feature, download the Google Translate app.

Source: Translate


Providing Gender-Specific Translations in Google Translate



Over the past few years, Google Translate has made significant improvements to translation quality by switching to an end-to-end neural network-based system. At the same time, we realized that translations from our models can reflect societal biases, such as gender bias. Specifically, languages differ a lot in how they represent gender, and when there are ambiguities during translation, the systems tend to pick gender choices that reflect societal asymmetries, resulting in biased translations. For instance, Google Translate historically translated the Turkish equivalent of “He/she is a doctor” into the masculine form, and the Turkish equivalent of “He/she is a nurse” into the feminine form.

Recently, we announced that we’re taking the first step toward reducing gender bias in our translations. We now provide both feminine and masculine translations when translating single-word queries from English to four different languages (French, Italian, Portuguese, and Spanish), and when translating phrases and sentences from Turkish to English.
Gender-specific translations on the Google Translate website.
Supporting gender-specific translations for single-word queries involved enriching our underlying dictionary with gender attributes. Supporting gender-specific translations for longer queries (phrases and sentences) was particularly challenging and involved making significant changes to our translation framework. For these longer queries, we focused initially on Turkish-to-English translation. We developed a three-step approach to solve the problem of providing a masculine and feminine translation in English for a gender-neutral query in Turkish.
Detecting Gender-Neutral Queries
Many Turkish sentences that refer to people are gender-neutral, but not all are. Detecting which queries are eligible for gender-specific translations is a hard problem because Turkish is morphologically complex, meaning that a reference to a person can be either explicit, with a gender-neutral pronoun (e.g. O, Ona), or implicitly encoded. For example, the sentence “Biliyor mu?” has no explicit gender-neutral pronoun but can be translated as either “Does she know?” or “Does he know?”. This complexity means that we cannot use a simple list of gender-neutral pronouns to detect gender-neutral Turkish queries and need a machine-learned system. We estimate that approximately 10% of Turkish Translate queries are ambiguous, and eligible for both feminine and masculine translations.

To detect these queries, we use state-of-the-art text classification algorithms (same as those used in our Cloud Natural Language API) to build a system that is able to detect when a given Turkish query is gender-neutral. Since this introduces an additional step before obtaining the translations, we had to carefully balance model complexity with latency. We trained our system on thousands of human-rated Turkish examples, where raters were asked to judge whether a given example is gender-neutral or not. Our final classification system is a convolutional neural network that can accurately detect queries which require gender-specific translations.

Generating Gender-Specific Translations
Next, we enhanced our underlying Neural Machine Translation (NMT) system to produce feminine and masculine translations when requested. When no gender is requested, we trained the model to produce the default translation. This involved:
  • Identifying and dividing our parallel training data into those with feminine words, those with masculine and those with ungendered words.
  • Adding an additional input token to the beginning of the sentence to specify the required gender to translate to, similar to how we build multilingual NMT systems:
    • <2MALE> O bir doktor → He is a doctor
    • <2FEMALE> O bir doktor → She is a doctor
  • Training our enhanced NMT model on the feminine, masculine and ungendered data sources. We experimented with various mixing ratios for these sources to enable the model to perform equally well on the three tasks.
If a user's query is determined to be gender-neutral, we add a gender prefix to the translation request. For these requests, our final NMT model can reliably produce feminine and masculine translations 99% of the time. Additionally, the system maintains translation quality on queries without the gender prefix.
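The prefix-token mechanism described above could look roughly like this on the request side. The `<2MALE>` and `<2FEMALE>` tokens come from the examples in the list; the function names and the dictionary of requests are assumptions for illustration only.

```python
from typing import Dict, Optional

# Tokens as shown in the examples above.
GENDER_TOKENS = {"masculine": "<2MALE>", "feminine": "<2FEMALE>"}


def build_request(source: str, gender: Optional[str] = None) -> str:
    """Prepend a gender token to the source sentence, or leave it
    untouched to ask for the default translation."""
    if gender is None:
        return source
    return f"{GENDER_TOKENS[gender]} {source}"


def requests_for(source: str, is_gender_neutral: bool) -> Dict[str, str]:
    """Build the model inputs to send for one user query."""
    if not is_gender_neutral:
        return {"default": build_request(source)}
    return {
        "feminine": build_request(source, "feminine"),
        "masculine": build_request(source, "masculine"),
        "default": build_request(source),
    }


batch = requests_for("O bir doktor", is_gender_neutral=True)
for name, text in batch.items():
    print(f"{name}: {text}")
```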

Checking for Accuracy
Finally, we have a step that decides whether to display the gender-specific translations. Since the training data that produces the masculine translation is different from the training data that produces the feminine translation, there may be differences between the two translations unrelated to gender. If the gender-specific translations are determined to be low quality, we show only the single default translation. To determine the quality of the gender-specific translations, we verify:
  • If the requested feminine translation is feminine.
  • If the requested masculine translation is masculine.
  • If the feminine and masculine translations are exactly equivalent with the exception of gender-related changes. Even minor changes in the wording between the translations will result in being filtered.
Top: The masculine and feminine translations differ only with respect to gender i.e. “he” and “his” vs “she” and “her”. Hence, we will show gender-specific translations. Bottom: The masculine and feminine translations differ correctly with respect to gender i.e. “he” vs “she”. However, the change from “really” to “actually” is not related to gender. Hence, we will filter gender-specific translations and display the default translation.
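One way to approximate the third check is a token-by-token comparison that tolerates only known masculine/feminine pronoun pairs. This is a simplified sketch under stated assumptions: the pair list is hand-written, the example sentences are invented to mirror the "really" versus "actually" case in the figure, and the production check is presumably more sophisticated.

```python
import re
from typing import List

# Allowed (masculine, feminine) substitutions; an assumption for this sketch.
GENDER_PAIRS = {
    ("he", "she"), ("him", "her"), ("his", "her"),
    ("his", "hers"), ("himself", "herself"),
}


def tokenize(sentence: str) -> List[str]:
    """Lowercase and split into words and punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence.lower())


def differs_only_in_gender(masculine: str, feminine: str) -> bool:
    """True if the two translations are identical except for at least
    one allowed pronoun substitution."""
    m_tokens, f_tokens = tokenize(masculine), tokenize(feminine)
    if len(m_tokens) != len(f_tokens):
        return False
    changed = False
    for m, f in zip(m_tokens, f_tokens):
        if m == f:
            continue
        if (m, f) not in GENDER_PAIRS:
            return False      # any non-gender difference filters the pair
        changed = True
    return changed


# Invented example sentences.
print(differs_only_in_gender("He really loves his job.", "She really loves her job."))
print(differs_only_in_gender("He really loves his job.", "She actually loves her job."))
```

The first pair would be shown as gender-specific translations; the second would be filtered in favor of the default translation.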
Putting it all together, input sentences first go through the classifier, which detects whether they’re eligible for gender-specific translations. If the classifier says “yes”, we send three requests to our enhanced NMT model—a feminine request, a masculine request and an ungendered request. Our final step takes into account all three responses and decides whether to display gender-specific translations or a single default translation. This step is still quite conservative in order to maximize the quality of gender-specific translations shown; hence our overall recall is only around 60%. We plan to increase our coverage and add support for more complex sentences in future iterations.

This is just the first step toward addressing gender bias in machine-translation systems and reiterates Google’s commitment to fairness in machine learning. In the future, we plan to extend gender-specific translations to more languages and to address non-binary gender in translations.

Acknowledgements:
This effort has been successful thanks to the hard work of a lot of people including, but not limited to, the following (in alphabetical order of last name): Lindsey Boran, HyunJeong Choe, Héctor Fernández Alcalde, Orhan Firat, Qin Gao, Rick Genter, Macduff Hughes, Tolga Kayadelen, James Kuczmarski, Tatiana Lando, Liu Liu, Michael Mandl, Nihal Meriç Atilla, Mengmeng Niu, Adnan Ozturel, Emily Pitler, Kathy Ray, John Richardson, Larissa Rinaldi, Alex Rudnick, Apu Shah, Jason Smith, Antonio Stella, Romina Stella, Jana Strnadova, Katrin Tomanek, Barak Turovsky, Dan Schwarz, Shilp Vaishnav, Clayton Watts, Kellie Webster, Colin Young, Pendar Yousefi, Candice Zhang and Min Zhao.

Source: Google AI Blog


Reducing gender bias in Google Translate

Over the course of this year, there’s been an effort across Google to promote fairness and reduce bias in machine learning. Our latest development in this effort addresses gender bias by providing feminine and masculine translations for some gender-neutral words on the Google Translate website.


Google Translate learns from hundreds of millions of already-translated examples from the web. Historically, it has provided only one translation for a query, even if the translation could have either a feminine or masculine form. So when the model produced one translation, it inadvertently replicated gender biases that already existed. For example: it would skew masculine for words like “strong” or “doctor,” and feminine for other words, like “nurse” or “beautiful.”


Now you’ll get both a feminine and masculine translation for a single word—like “surgeon”—when translating from English into French, Italian, Portuguese or Spanish. You’ll also get both translations when translating phrases and sentences from Turkish to English. For example, if you type “o bir doktor” in Turkish, you’ll now get “she is a doctor” and “he is a doctor” as the gender-specific translations.


gender specific translation

Gender-specific translations on the Google Translate website.

In the future, we plan to extend gender-specific translations to more languages, launch on other Translate surfaces like our iOS and Android apps, and address gender bias in features like query auto-complete. And we're already thinking about how to address non-binary gender in translations, though it’s not part of this initial launch.


To check out gender-specific translations, visit the Google Translate website, and you can get more information on our Google Translate Help Center page.

Source: Translate


A new look for Google Translate on the web

It’s been twelve years since the launch of Google Translate, and since then Translate has evolved to keep up with the ways people use it. At launch it translated only between English and Arabic; we now translate 30 trillion sentences per year across 103 languages.

Google Translate has become an essential tool for communicating across languages, and we recently redesigned the Translate website to make it easier to use. Here’s what you need to know:

  • The site’s new look is now consistent with other Google products, and updated labeling and typography make it easier to navigate. For instance, you’ve always been able to upload documents for translation, but now that feature is easier to find. 
  • Now it’s even more convenient to save and organize important translations you regularly use or search for. We’ve added labels to each saved translation, so if you speak multiple languages, you can sort and group your translations with a single click.
  • We've made the website responsive so it can adjust dynamically for your screen size. So when we launch new features, you get a great web experience across all your devices: mobile, tablet, or desktop. 
translate web redesign gif

The new responsive website adjusts dynamically to your screen size.

To check out the new site, visit translate.google.com.

Source: Translate


Bringing hope to a refugee family, using Google Translate

In 2015, I joined Google to be a part of a company using technology to help others. I’m proud that Google’s commitment to its mission—to organize the world’s information and make it universally accessible and useful—remains strong 20 years in. I knew I wanted to be a part of it all, but had no idea that I would experience the power of our mission firsthand, and that it would help me to forge a friendship when I least expected it.

For the past three years, my wife and I have been working with organizations involved with refugee resettlement efforts. We both have immigrant parents, so we’ve heard stories about resettling in a country to make a better life for your children, but being forced to leave a country is very different. These refugees are often fleeing from life-threatening situations. Aside from dealing with their past trauma and being in an unfamiliar place without a support system, they often can’t speak the local language.

My wife and I learned of a family of four—Nour, Mariam, three-year-old Sanah and six-month-old Yousuf—who settled in Rialto, 45 minutes from where my wife and I grew up in Southern California. Through the assistance of organizations such as Hearts of Mercy and Miry’s List, they settled into an apartment shortly before Yousuf was born. Still recovering from injuries sustained in Syria, Nour was unable to work, and had to rely on the help of others to get by. Without a car, their options were further limited. Then, in April of this year, they faced their hardest challenge yet: their daughter Sanah was diagnosed with Stage 4 Neuroblastoma.

We wanted to help, but didn’t know where to start—and as new parents ourselves, we could relate on a personal level. We fundraised for the family and collected toys for Yousuf and Sanah in hopes that they could feel supported. More than that, we wanted to help them get through Sanah’s treatments with as little to worry about as possible.

A few weeks after we first heard of their story, we went to their home to meet in person. Nour was waiting outside for us, and we quickly realized there was a challenge that we had overlooked: the family only spoke Arabic. There I was, face to face with Nour, wanting to hear his story and reassure him that he’s surrounded by a supportive community, but I couldn't convey those thoughts or give Nour the ability to convey his. The only option I could think of was Google Translate, which I had used on previous international trips and hoped would bridge this gap.

I opened the app to translate a few words, but we couldn’t get far by manually typing sentences. Instead, I tried "conversation" mode, which allows for real-time audio translations and makes the interaction feel more natural. We talked about his family’s story and what they were up against. I learned that back in Syria, Nour was shot twice in the back, and endured the deaths of his brothers. Now, Nour and Mariam are giving up everything to take care of Sanah and spend up to two hours commuting on a bus to and from her hospital treatments. Through all of this, they continue to be optimistic and hopeful, and are grateful for being able to make it to America.


A snapshot of my visit with Nour.

I never imagined that we could sustain a 90-minute conversation in two languages, and that it would bring us closer together, inspiring me in a way I didn’t expect. Without Translate, we would have exchanged a few pleasantries, shared poorly communicated words and parted ways. Instead, we walked away with a bond built on an understanding of one another—we were just two fathers, talking about our fears and hopes for our family’s future. To this day, we stay connected on how the family is doing, and I’m looking forward to keeping this relationship going for a long time.

Refugee families often find themselves in situations that may seem normal to you and me—like at the DMV trying to get a driver’s license—or worse, in a dire situation like a hospital, with no way of communicating. We generally think of technology as an enabler of change, driving efficiency or making the impossible happen. But in this case, technology allowed me to make a life-changing connection, and brought me closer to a family who was very far away from home.

Source: Translate


Tune in for the world’s first Google Translate music tour

Eleven years ago, Google Translate was created to break down language barriers. Since then, it has enabled billions of people and businesses all over the world to talk, connect and understand each other in new ways.

And we’re still re-imagining how it can be used—most recently, with music. The music industry in Sweden is one of the world's most successful exporters of hit music in English—with artists such as Abba, The Cardigans and Avicii originating from the country. But there are still many talented Swedish artists who may not get the recognition or success they deserve beyond a small country up in the north.

This sparked an idea: might it be possible to use Google Translate with the sole purpose of breaking a Swedish band internationally?

Today, we’re presenting Translate Tour, in which up-and-coming Swedish indie pop group Vita Bergen will be using Google Translate to perform their new single “Tänd Ljusen” in three different languages—English, Spanish and French—on the streets of three different European cities. In just a couple of days, the band will set off to London, Paris and Madrid to sing their locally adapted songs in public—with the aim of spreading Swedish music culture and inviting people all over the world to tune into the band’s cross-European indie pop music.

Translate Tour 2_Credit Anton Olin.jpg

William Hellström from Vita Bergen will be performing his song in English, Spanish and French.

Last year Google Translate switched from phrase-based translation to Google Neural Machine Translation, which means that the tool now translates whole sentences at a time, rather than just piece by piece. It uses this broader context to figure out the most relevant translation, which it then rearranges and adjusts to be more like a human speaking with proper grammar.

Using this updated version of Google Translate, the English, Spanish and French translations of the song were close to flawless. The translations will also continue to improve as more people use the system and it learns from them.

Tune in to Vita Bergen’s release event, live streamed on YouTube today at 5:00 p.m. CEST, or listen to the songs in Swedish (“Tänd Ljusen”), English (“Light the Lights”), Spanish (“Enciende las Luces”) and French (“Allumez les Lumières”).

Source: Translate