Making the internet more inclusive in India

More than 400 million people in India use the internet, and more are coming online every day. But the vast majority of India’s online content is in English, which only 20 percent of the country’s population speaks—meaning most Indians have a hard time finding content and services in their language.

Building for everyone means first and foremost making things work in the languages people speak. That’s why we’ve now brought our new neural machine translation technology to translations between English and nine widely used Indian languages—Hindi, Bengali, Marathi, Gujarati, Punjabi, Tamil, Telugu, Malayalam and Kannada.

Neural machine translation translates whole sentences at a time, instead of pieces of a sentence, and uses this broader context to work out the most relevant translation. The result is higher-quality, more human-sounding translations.

Just as it’s easier to learn a language when you already know a related one, our neural technology speaks each language better when it learns several at a time. For example, we have a whole lot more sample data for Hindi than for its relatives Marathi and Bengali, but when we train them all together, the translations for all three improve more than if we’d trained each one individually.
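For readers curious what "training them all together" might look like in practice, here is a minimal sketch in Python. It is not a description of our production system: it assumes one commonly described approach, in which sentence pairs from several languages are pooled into a single training set and each English source sentence is prefixed with a tag naming the target language. The tag format (`<2hi>`), the function names and the placeholder sentences are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, Tuple

SentencePair = Tuple[str, str]  # (English source, target-language sentence)

# Hypothetical toy corpora keyed by target-language code. The strings are
# placeholders, not real data; Hindi simply has more pairs than the others.
CORPORA: Dict[str, List[SentencePair]] = {
    "hi": [("source 1", "hindi 1"), ("source 2", "hindi 2"), ("source 3", "hindi 3")],
    "mr": [("source 4", "marathi 1")],
    "bn": [("source 5", "bengali 1")],
}


def target_tag(lang: str) -> str:
    """Build the (assumed) tag that tells a shared model which language to produce."""
    return f"<2{lang}>"


def build_pooled_training_set(corpora: Dict[str, List[SentencePair]]) -> List[SentencePair]:
    """Merge every language's pairs into one list, tagging each source sentence.

    A single model trained on this pooled list sees all languages at once,
    so the lower-resource languages can benefit from the larger one.
    """
    pooled: List[SentencePair] = []
    for lang in sorted(corpora):  # sorted only to make the order deterministic
        for source, target in corpora[lang]:
            pooled.append((f"{target_tag(lang)} {source}", target))
    return pooled


if __name__ == "__main__":
    for source, target in build_pooled_training_set(CORPORA):
        print(source, "->", target)
```

The point of the sketch is only the data layout: one shared pool rather than three separate training sets, which is what lets languages with less sample data borrow from a better-resourced relative.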

[Image: phrase-based translation (left) compared with neural machine translation (right)]

These improvements to Google Translate in India join several other updates we announced at an event in New Delhi today, including neural machine translation in Chrome and bringing the Rajpal & Sons Hindi dictionary online so it’s easier for Hindi speakers to find word meanings right in search results. All these improvements help make the web more useful for hundreds of millions of Indians, and bring them closer to benefiting from the full value of the internet.