Tag Archives: Translation

Improving Indian Language Transliterations in Google Maps

Nearly 75% of India’s population — which possesses the second highest number of internet users in the world — interacts with the web primarily using Indian languages, rather than English. Over the next five years, that number is expected to rise to 90%. In order to make Google Maps as accessible as possible to the next billion users, it must allow people to use it in their preferred language, enabling them to explore anywhere in the world.

However, the names of most Indian places of interest (POIs) in Google Maps are not generally available in the native scripts of the languages of India. These names are often in English and may be combined with acronyms based on the Latin script, as well as Indian language words and names. Addressing such mixed-language representations requires a transliteration system that maps characters from one script to another, based on the source and target languages, while accounting for the phonetic properties of the words as well.

For example, consider a user in Ahmedabad, Gujarat, who is looking for a nearby hospital, KD Hospital. They issue the search query, કેડી હોસ્પિટલ, in the native script of Gujarati, the 6th most widely spoken language in India. Here, કેડી (“kay-dee”) is the sounding out of the acronym KD, and હોસ્પિટલ is “hospital”. In this search, Google Maps knows to look for hospitals, but it doesn't understand that કેડી is KD, hence it finds another hospital, CIMS. As a consequence of the relative sparsity of names available in the Gujarati script for places of interest (POIs) in India, instead of their desired result, the user is shown a result that is further away.

To address this challenge, we have built an ensemble of learned models to transliterate names of Latin script POIs into 10 languages prominent in India: Hindi, Bangla, Marathi, Telugu, Tamil, Gujarati, Kannada, Malayalam, Punjabi, and Odia. Using this ensemble, we have added names in these languages to millions of POIs in India, increasing the coverage nearly twenty-fold in some languages. This will immediately benefit millions of existing Indian users who don't speak English, enabling them to find doctors, hospitals, grocery stores, banks, bus stops, train stations and other essential services in their own language.

Transliteration vs. Transcription vs. Translation

Our goal was to design a system that will transliterate from a reference Latin script name into the scripts and orthographies native to the above-mentioned languages. For example, the Devanagari script is the native script for both Hindi and Marathi (the language native to Nagpur, Maharashtra). Transliterating the Latin script names for NIT Garden and Chandramani Garden, both POIs in Nagpur, results in एनआईटी गार्डन and चंद्रमणी गार्डन, respectively, depending on the specific language’s orthography in that script.

It is important to note that the transliterated POI names are not translations. Transliteration is only concerned with writing the same words in a different script, much like an English language newspaper might choose to write the name Горбачёв from the Cyrillic script as “Gorbachev” for their readers who do not read the Cyrillic script. For example, the second word in both of the transliterated POI names above is still pronounced “garden”, and the second word of the Gujarati example earlier is still “hospital” — they remain the English words “garden” and “hospital”, just written in the other script. Indeed, common English words are frequently used in POI names in India, even when written in the native script. How the name is written in these scripts is largely driven by its pronunciation; so एनआईटी from the acronym NIT is pronounced “en-aye-tee”, not as the English word “nit”. Knowing that NIT is a common acronym from the region is one piece of evidence that can be used when deriving the correct transliteration.

Note also that, while we use the term transliteration, following convention in the NLP community for mapping directly between writing systems, romanization in South Asian languages regardless of the script is generally pronunciation-driven, and hence one could call these methods transcription rather than transliteration. The task remains, however, mapping between scripts, since pronunciation is only relatively coarsely captured in the Latin script for these languages, and there remain many script-specific correspondences that must be accounted for. This, coupled with the lack of standard spelling in the Latin script and the resulting variability, is what makes the task challenging.

Transliteration Ensemble

We use an ensemble of models to automatically transliterate from the reference Latin script name (such as NIT Garden or Chandramani Garden) into the scripts and orthographies native to the above-mentioned languages. Candidate transliterations are derived from a pair of sequence-to-sequence (seq2seq) models. One is a finite-state model for general text transliteration, trained in a manner similar to models used by Gboard on-device for transliteration keyboards. The other is a neural long short-term memory (LSTM) model trained, in part, on the publicly released Dakshina dataset. This dataset contains Latin and native script data drawn from Wikipedia in 12 South Asian languages, including all but one of the languages mentioned above, and permits training and evaluation of various transliteration methods. Because the two models have such different characteristics, together they produce a greater variety of transliteration candidates.

To deal with the tricky phenomena of acronyms (such as the “NIT” and “KD” examples above), we developed a specialized transliteration module that generates additional candidate transliterations for these cases.

For each native language script, the ensemble makes use of specialized romanization dictionaries of varying provenance that are tailored for place names, proper names, or common words. Examples of such romanization dictionaries are found in the Dakshina dataset.

Scoring in the Ensemble

The ensemble combines scores for the possible transliterations in a weighted mixture, the parameters of which are tuned specifically for POI name accuracy using small targeted development sets for such names.

For each native script token in candidate transliterations, the ensemble also weights the result according to its frequency in a very large sample of on-line text. Additional candidate scoring is based on a deterministic romanization approach derived from the ISO 15919 romanization standard, which maps each native script token to a unique Latin script string. This string allows the ensemble to track certain key correspondences when compared to the original Latin script token being transliterated, even though the ISO-derived mapping itself does not always perfectly correspond to how the given native script word is typically written in the Latin script.

In aggregate, these many moving parts provide substantially higher quality transliterations than possible for any of the individual methods alone.

Coverage

The following table provides the per-language quality and coverage improvements due to the ensemble over existing automatic transliterations of POI names. The coverage improvement measures the increase in items for which an automatic transliteration has been made available. Quality improvement measures the ratio of updated transliterations that were judged to be improvements versus those that were judged to be inferior to existing automatic transliterations.

Language 

Coverage Improvement

Quality Improvement

Hindi

3.2x

1.8x

Bengali

19x

3.3x

Marathi

19x

2.9x

Telugu

3.9x

2.6x

Tamil

19x

3.6x

Gujarati

19x

2.5x

Kannada

24x

2.3x

Malayalam

24x

1.7x

Odia

960x

*

Punjabi

24x

*


* Unknown / No Baseline.


Conclusion

As with any machine learned system, the resulting automatic transliterations may contain a few errors or infelicities, but the large increase in coverage in these widely spoken languages marks a substantial expansion of the accessibility of information within Google Maps in India. Future work will include using the ensemble for transliteration of other classes of entities within Maps and its extension to other languages and scripts, including Perso-Arabic scripts, which are also commonly used in the region.


Acknowledgments: This work was a collaboration between the authors and Jacob Farner, Jonathan Herbert, Anna Katanova, Andre Lebedev, Chris Miles, Brian Roark, Anurag Sharma, Kevin Wang, Andy Wildenberg, and many others.

Posted by Cibu Johny, Software Engineer, Google Research and Saumya Dalal, Product Manager, Google Geo


The latest Android App Bundle updates including the additional languages API

Posted by Wojtek Kaliciński, Developer Advocate, Android

Last year, we launched Android App Bundles and Google Play's Dynamic Delivery to introduce modular development, reduce app size and streamline the release process. Since then, we've seen developers quickly adopt this new app model in over 60,000 production apps. We've been excited to see developers experience significant app size savings and reductions in the time needed to manage each release, and have documented these benefits in case studies with Duolingo and redBus.

Thank you to everyone who took the time to give us feedback on our initial launch. We're always open to new ideas, and today, we're happy to announce some new improvements based on your suggestions:

  • A new additional languages install API, which supports in-app language pickers
  • A streamlined publishing process for instant-enabled app bundles
  • A new enrollment option for app signing by Google Play
  • The ability to permanently uninstall dynamic feature modules that are included in your app's initial install


Additional languages API

When you adopt the Android App Bundle as the publishing format for your app, Google Play is able to optimize the installation by delivering only the language resources that match the device's system locales. If a user changes the system locale after the app is installed, Play automatically downloads the required resources.

Some developers choose to decouple the app's display language from the system locale by adding an in-app language switcher. With the latest release of the Play Core library (version 1.4.0), we're introducing a new additional languages API that makes it possible to build in-app language pickers while retaining the full benefits of smaller installs provided by using app bundles.

With the additional languages API, apps can now request the Play Store to install resources for a new language configuration on demand and immediately start using it.

Get a list of installed languages

The app can get a list of languages that are already installed using the SplitInstallManager#getInstalledLanguages() method.

val splitInstallManager = SplitInstallManagerFactory.create(context)
val langs: Set<String> = splitInstallManager.installedLanguages

Requesting additional languages

Requesting an additional language is similar to requesting an on demand module. You can do this by specifying a language in the request through SplitInstallRequest.Builder#addLanguage(java.util.Locale).

val installRequestBuilder = SplitInstallRequest.newBuilder()
installRequestBuilder.addLanguage(Locale.forLanguageTag("pl"))
splitInstallManager.startInstall(installRequestBuilder.build())

The app can also monitor install success with callbacks and monitor the download state with a listener, just like when requesting an on demand module.

Remember to handle the SplitInstallSessionStatus.REQUIRES_USER_CONFIRMATION state. Please note that there was an API change in a recent Play Core release, which means you should use the new SplitInstallManager#startConfirmationDialogForResult() together with Activity#onActivityResult(). The previous method of using SplitInstallSessionState#resolutionIntent() with startIntentSender() has been deprecated.

Check out the updated Play Core Library documentation for more information on how to access the newly installed language resources in your activity.

We've also updated our dynamic features sample on GitHub with the additional languages API, including how to store the user's language preference and apply it to your activities at startup.

Please note that while the additional languages API is now available to all developers, on demand modules are in a closed beta for the time being. You can experiment with on demand modules in your internal, open, and closed test tracks, while we work with our partners to make sure this feature is ready for production apps.

Instant-enabled App Bundle

In Android Studio 3.3, we introduced a way to build app bundles that contain both the regular, installed version of your app as well as a Google Play Instant experience for modules marked with the dist:instant="true" attribute in their AndroidManifest.xml:

<manifest ... xmlns:dist="http://schemas.android.com/apk/distribution">
    <dist:module dist:instant="true" />
    ...
</manifest>

Even though you could use a single project to generate the installed and instant versions of your app, up until now, developers were still required to use product flavors in order to build two separate app bundles and upload both to Play.

We're happy to announce that we have now removed this restriction. It's now possible to upload a single, unified app bundle artifact, containing modules enabled for the instant experience. This functionality is now available for everyone.

After you build an instant-enabled app bundle, upload it to any track on the Play Console, and you'll be able to select it when creating a new instant app release. This also means that the installed and instant versions of your app no longer need different version codes, which will simplify the release workflow.

Opt in to app signing by Google Play

You need to enable app signing by Google Play to publish your app using an Android App Bundle and automatically benefit from Dynamic Delivery optimizations. It is also a more secure way to manage your signing key, which we recommend to everyone, even if you want to keep publishing regular APKs for now.

Based on your feedback, we've revamped the sign-up flow for new apps to make it easier to initialize the key you want to use for signing your app.

Now developers can explicitly choose to upload their existing key without needing to upload a self-signed artifact first. You can also choose to start with a key generated by Google Play, so that the key used to locally sign your app bundle can become your upload key.

Read more about the new flow.

Permanent uninstallation of install time modules

We have now added the ability to permanently uninstall dynamic feature modules that are included in your app's initial install.

This is a behavior change, which means you can now call the existing SplitInstallManager#deferredUninstall() API on modules that set onDemand="false". The module will be permanently uninstalled, even when the app is updated.

This opens up new possibilities for developers to further reduce the installed app size. For example, you can now uninstall a heavy sign-up module or any other onboarding content once the user completes it. If the user navigates to a section of your app that has been uninstalled, you can reinstall it using the standard on demand modules install API.

We hope you enjoy these improvements and test them out in your apps. Continue to share your feedback as we work to make these features even more useful for you!

How useful did you find this blog post?

Empowering a new generation of localization professionals

Posted by The Google Localization Team

When her grandmother turned 80, Christina Hayek — Arabic Language Manager at Google — and her sisters wanted to give their beloved sitto a gift that would bring her closer to them. Chadia lives in Lebanon but her children and grandchildren are spread across the world. To bridge this geographical gap, Christina and her siblings gave their grandmother an Android smartphone. Much to Chadia’s surprise, she was able to use her phone in Arabic straight out of the box.

This isn’t magic—it’s the work of a dedicated localization team at Google. Spread over more than 30 countries, our team makes sure that all Google products are fun and easy to use in more than 70 languages. Localization goes beyond translation. While references to baseball and donuts work well in the US, these are not necessarily popular concepts in other cultures. Therefore we change these, for example, to football in Italy and croissant in France. Our mission is to create a diverse user experience that fits every language and every culture. We do this through a network of passionate translators and reviewers who localize Google products to make sure they sound natural to people everywhere.

With more and more people from around the world coming online every day, the localization industry keeps growing—and so does the demand for great translators, reviewers, and localization professionals. So, as part of Google’s mission to build products for everyone and make the web globally accessible, no matter where users are, we’re launching a massive open online course (MOOC) called Localization Essentials. In the words of Peter Lubbers, Google's Head of Developer Training:

"The language industry is one of the fastest growing in the world today, and as a former Internationalization Product Manager (and Dutch translator), I am absolutely thrilled that we've added Localization Essentials to our Google/Udacity training course catalog. The course is now available—free of charge—to students all over the world. This was a huge cross-functional effort; a large team of localization experts across Google came together and rallied to create this course. It was great to see how everybody poured their heart and soul into this effort and it really shows in the course quality."

Localization Essentials was developed in collaboration with Udacity, and is free to access. It covers all localization basics needed to develop global products. This is how Bert Vander Meeren, Director of Localization at Google, described the collaboration:

“Today, localization is becoming more and more important because the internet user base is growing rapidly, especially in non-English speaking countries. At the same time, education opportunities in the field are limited. This is an issue for our team and any business in need of large numbers of localization resources. So we decided to take the lead and address the issue, because who knows localization better than dedicated localization professionals with years of experience? Udacity already helped us develop and host several successful courses for Android developers, so this partnership was more than logical. This course is a great opportunity for anyone who wants to get knowledge and new skills in a still lesser-known field that’s important to develop products for a truly global audience. Whether you are a student, a professional, or an entrepreneur, you will learn a lot and expand your horizons.”

By sharing our knowledge we hope that more culturally relevant products will become available to users everywhere, to provide opportunities to them that they didn’t have before.

We’re looking forward to seeing how sharing this localization knowledge will impact users from all over the world.

Request a professional app translation from the Google Play Console and reach new users

Posted by Rahim Nathwani, Product Manager, Google Play
Localizing your app or game is an important step in allowing you to reach the widest possible audience. It helps you increase downloads and provide better experiences for your audience.
To help do this, Google Play offers an app translation service. The service, by professional linguists, can translate app user interface strings, Play Store text, in-app products and universal app campaign ads. We've made the app translation service available directly from inside the Google Play Console, making it easy and quick to get started.
  • Choose from a selection of professional translation vendors.
  • Order, receive and apply translations, without leaving the Play Console.
  • Pay online with Google Wallet.
  • Translations from your previous orders (if any) are reused, so you never pay for the same translation twice. Great if you release new versions frequently.
Using the app translation service to translate a typical app and store description into one language may cost around US$50. (cost depends on the amount of text and languages).
Find out more and get started with the app translation service.

How useful did you find this blogpost?