Tag Archives: Gboard

Advances in private training for production on-device language models

Language models (LMs) trained to predict the next word given input text are the key technology for many applications [1, 2]. In Gboard, LMs are used to improve users’ typing experience by supporting features like next word prediction (NWP), Smart Compose, smart completion and suggestion, slide to type, and proofread. Deploying models on users’ devices rather than enterprise servers has advantages like lower latency and better privacy for model usage. While training on-device models directly from user data effectively improves the utility performance for applications such as NWP and smart text selection, protecting the privacy of user data for model training is important.

Gboard features powered by on-device language models.

In this blog we discuss how years of research advances now power the private training of Gboard LMs, since the proof-of-concept development of federated learning (FL) in 2017 and formal differential privacy (DP) guarantees in 2022. FL enables mobile phones to collaboratively learn a model while keeping all the training data on device, and DP provides a quantifiable measure of data anonymization. Formally, DP is often characterized by (ε, δ) with smaller values representing stronger guarantees. Machine learning (ML) models are considered to have reasonable DP guarantees for ε=10 and strong DP guarantees for ε=1 when δ is small.

As of today, all NWP neural network LMs in Gboard are trained with FL with formal DP guarantees, and all future launches of Gboard LMs trained on user data require DP. These 30+ Gboard on-device LMs are launched in 7+ languages and 15+ countries, and satisfy (ɛ, δ)-DP guarantees of small δ of 10-10 and ɛ between 0.994 and 13.69. To the best of our knowledge, this is the largest known deployment of user-level DP in production at Google or anywhere, and the first time a strong DP guarantee of ɛ < 1 is announced for models trained directly on user data.


Privacy principles and practices in Gboard

In “Private Federated Learning in Gboard”, we discussed how different privacy principles are currently reflected in production models, including:

  • Transparency and user control: We provide disclosure of what data is used, what purpose it is used for, how it is processed in various channels, and how Gboard users can easily configure the data usage in learning models.
  • Data minimization: FL immediately aggregates only focused updates that improve a specific model. Secure aggregation (SecAgg) is an encryption method to further guarantee that only aggregated results of the ephemeral updates can be accessed.
  • Data anonymization: DP is applied by the server to prevent models from memorizing the unique information in individual user’s training data.
  • Auditability and verifiability: We have made public the key algorithmic approaches and privacy accounting in open-sourced code (TFF aggregator, TFP DPQuery, DP accounting, and FL system).

A brief history

In recent years, FL has become the default method for training Gboard on-device LMs from user data. In 2020, a DP mechanism that clips and adds noise to model updates was used to prevent memorization for training the Spanish LM in Spain, which satisfies finite DP guarantees (Tier 3 described in “How to DP-fy ML“ guide). In 2022, with the help of the DP-Follow-The-Regularized-Leader (DP-FTRL) algorithm, the Spanish LM became the first production neural network trained directly on user data announced with a formal DP guarantee of (ε=8.9, δ=10-10)-DP (equivalent to the reported ρ=0.81 zero-Concentrated-Differential-Privacy), and therefore satisfies reasonable privacy guarantees (Tier 2).


Differential privacy by default in federated learning

In “Federated Learning of Gboard Language Models with Differential Privacy”, we announced that all the NWP neural network LMs in Gboard have DP guarantees, and all future launches of Gboard LMs trained on user data require DP guarantees. DP is enabled in FL by applying the following practices:

  • Pre-train the model with the multilingual C4 dataset.
  • Via simulation experiments on public datasets, find a large DP-noise-to-signal ratio that allows for high utility. Increasing the number of clients contributing to one round of model update improves privacy while keeping the noise ratio fixed for good utility, up to the point the DP target is met, or the maximum allowed by the system and the size of the population.
  • Configure the parameter to restrict the frequency each client can contribute (e.g., once every few days) based on computation budget and estimated population in the FL system.
  • Run DP-FTRL training with limits on the magnitude of per-device updates chosen either via adaptive clipping, or fixed based on experience.

SecAgg can be additionally applied by adopting the advances in improving computation and communication for scales and sensitivity.

Federated learning with differential privacy and (SecAgg).

Reporting DP guarantees

The DP guarantees of launched Gboard NWP LMs are visualized in the barplot below. The x-axis shows LMs labeled by language-locale and trained on corresponding populations; the y-axis shows the ε value when δ is fixed to a small value of 10-10 for (ε, δ)-DP (lower is better). The utility of these models are either significantly better than previous non-neural models in production, or comparable with previous LMs without DP, measured based on user-interactions metrics during A/B testing. For example, by applying the best practices, the DP guarantee of the Spanish model in Spain is improved from ε=8.9 to ε=5.37. SecAgg is additionally used for training the Spanish model in Spain and English model in the US. More details of the DP guarantees are reported in the appendix following the guidelines outlined in “How to DP-fy ML”.


Towards stronger DP guarantees

The ε~10 DP guarantees of many launched LMs are already considered reasonable for ML models in practice, while the journey of DP FL in Gboard continues for improving user typing experience while protecting data privacy. We are excited to announce that, for the first time, production LMs of Portuguese in Brazil and Spanish in Latin America are trained and launched with a DP guarantee of ε ≤ 1, which satisfies Tier 1 strong privacy guarantees. Specifically, the (ε=0.994, δ=10-10)-DP guarantee is achieved by running the advanced Matrix Factorization DP-FTRL (MF-DP-FTRL) algorithm, with 12,000+ devices participating in every training round of server model update larger than the common setting of 6500+ devices, and a carefully configured policy to restrict each client to at most participate twice in the total 2000 rounds of training in 14 days in the large Portuguese user population of Brazil. Using a similar setting, the es-US Spanish LM was trained in a large population combining multiple countries in Latin America to achieve (ε=0.994, δ=10-10)-DP. The ε ≤ 1 es-US model significantly improved the utility in many countries, and launched in Colombia, Ecuador, Guatemala, Mexico, and Venezuela. For the smaller population in Spain, the DP guarantee of es-ES LM is improved from ε=5.37 to ε=3.42 by only replacing DP-FTRL with MF-DP-FTRL without increasing the number of devices participating every round. More technical details are disclosed in the colab for privacy accounting.

DP guarantees for Gboard NWP LMs (the purple bar represents the first es-ES launch of ε=8.9; cyan bars represent privacy improvements for models trained with MF-DP-FTRL; tiers are from “How to DP-fy ML“ guide; en-US* and es-ES* are additionally trained with SecAgg).

Discussion and next steps

Our experience suggests that DP can be achieved in practice through system algorithm co-design on client participation, and that both privacy and utility can be strong when populations are large and a large number of devices' contributions are aggregated. Privacy-utility-computation trade-offs can be improved by using public data, the new MF-DP-FTRL algorithm, and tightening accounting. With these techniques, a strong DP guarantee of ε ≤ 1 is possible but still challenging. Active research on empirical privacy auditing [1, 2] suggests that DP models are potentially more private than the worst-case DP guarantees imply. While we keep pushing the frontier of algorithms, which dimension of privacy-utility-computation should be prioritized?

We are actively working on all privacy aspects of ML, including extending DP-FTRL to distributed DP and improving auditability and verifiability. Trusted Execution Environment opens the opportunity for substantially increasing the model size with verifiable privacy. The recent breakthrough in large LMs (LLMs) motivates us to rethink the usage of public information in private training and more future interactions between LLMs, on-device LMs, and Gboard production.


Acknowledgments

The authors would like to thank Peter Kairouz, Brendan McMahan, and Daniel Ramage for their early feedback on the blog post itself, Shaofeng Li and Tom Small for helping with the animated figures, and the teams at Google that helped with algorithm design, infrastructure implementation, and production maintenance. The collaborators below directly contribute to the presented results:

Research and algorithm development: Galen Andrew, Stanislav Chiknavaryan, Christopher A. Choquette-Choo, Arun Ganesh, Peter Kairouz, Ryan McKenna, H. Brendan McMahan, Jesse Rosenstock, Timon Van Overveldt, Keith Rush, Shuang Song, Thomas Steinke, Abhradeep Guha Thakurta, Om Thakkar, and Yuanbo Zhang.

Infrastructure, production and leadership support: Mingqing Chen, Stefan Dierauf, Billy Dou, Hubert Eichner, Zachary Garrett, Jeremy Gillula, Jianpeng Hou, Hui Li, Xu Liu, Wenzhi Mao, Brett McLarnon, Mengchen Pei, Daniel Ramage, Swaroop Ramaswamy, Haicheng Sun, Andreas Terzis, Yun Wang, Shanshan Wu, Yu Xiao, and Shumin Zhai.

Source: Google AI Blog


Grammar Correction as You Type, on Pixel 6

Despite the success and widespread adoption of smartphones, using them to compose longer pieces of text is still quite cumbersome. As one writes, grammatical errors can often creep into the text (especially undesirable in formal situations), and correcting these errors can be time consuming on a small display with limited controls.

To address some of these challenges, we are launching a grammar correction feature that is directly built into Gboard on Pixel 6 that works entirely on-device to preserve privacy, detecting and suggesting corrections for grammatical errors while the user is typing. Building such functionality required addressing a few key obstacles: memory size limitations, latency requirements, and handling partial sentences. Currently, the feature is capable of correcting English sentences (we plan to expand to more languages in the near future) and available on almost any app with Gboard1.

Gboard suggests how to correct an ungrammatical sentence as the user types.

Model Architecture
We trained a sequence-to-sequence neural network to take an input sentence (or a sentence prefix) and output the grammatically correct version — if the original text is already grammatically correct, the output of the model is identical to its input, indicating that no corrections are needed. The model uses a hybrid architecture that combines a Transformer encoder with an LSTM decoder, a combination that provides a good balance of quality and latency.

Overview of the grammatical error correction (GEC) model architecture.

Mobile devices are constrained by limited memory and computational power, which make it more difficult to build a high quality grammar checking system. There are a few techniques we use to build a small, efficient, and capable model.

  • Shared embedding: Because the input and output of the model are structurally similar (e.g., both are text in the same language), we share some of the model weights between the Transformer encoder and the LSTM decoder, which reduces the model file size considerably without unduly affecting accuracy.
  • Factorized embedding: The model splits a sentence into a sequence of predefined tokens. To achieve good quality, we find that it is important to use a large vocabulary of predefined tokens, however, this substantially increases the model size. A factorized embedding separates the size of the hidden layers from the size of the vocabulary embedding. This enables us to have a model with a large vocabulary without significantly increasing the number of total weights.
  • Quantization: To reduce the model size further, we perform post-training quantization, which allows us to store each 32-bit floating point weight using only 8-bits. While this means that each weight is stored with lower fidelity, nevertheless, we find that the quality of the model is not materially affected.

By employing these techniques, the resulting model takes up only 20MB of storage and performs inference on 60 input characters under 22ms on the Google Pixel 6 CPU.

Training the Model
In order to train the model, we needed training data in the form of <original, corrected> text pairs.

One possible approach to generating a small on-device model would be to use the same training data as a large cloud-based grammar model. While this data produces a reasonably high quality on-device model, we found that using a technique called hard distillation to generate training data that is better-matched to the on-device domain yields even better quality results.

Hard distillation works as follows: We first collected hundreds of millions of English sentences from across the public web. We then used the large cloud-based grammar model to generate grammar corrections for those sentences. This training dataset of <original, corrected> sentence pairs is then used to train a smaller on-device model that can correct full sentences. We found that the on-device model built from this training dataset produces significantly higher quality suggestions than a similar-sized on-device model built on the original data used to train the cloud-based model.

Before training the model from this data, however, there is another issue to address. To enable the model to correct grammar as the user types (an important capability of mobile devices) it needs to be able to handle sentence prefixes. While this enables grammar correction when the user has only typed part of a sentence, this capability is particularly useful in messaging apps, where the user often omits the final period in a sentence and presses the send button as soon as they finish typing. If grammar correction is only triggered on complete sentences, it might miss many errors.

This raises the question of how to decide whether a given sentence prefix is grammatically correct. We used a heuristic to solve this — if a given sentence prefix can be completed to form a grammatically correct sentence, we then consider it grammatically correct. If not, it is assumed to be incorrect.

What the user has typed so far       Suggested grammar correction
She puts a lot
She puts a lot of
She puts a lot of effort
She puts a lot of effort yesterday   Replace "puts" with "put in".
GEC on incomplete sentences. There is no correction for valid sentence prefixes.

We created a second dataset suitable for training a large cloud-based model, but this time focusing on sentence prefixes. We generated the data using the aforementioned heuristic by taking the <original, corrected> sentence pairs from the cloud-based model’s training dataset and randomly sampling aligned prefixes from them.

For example, given the <original, corrected> sentence pair:

Original sentence: She puts a lot of effort yesterday afternoon.
Corrected sentence: She put in a lot of effort yesterday afternoon.

We might sample the following prefix pairs:

Original prefix: She puts
Corrected prefix: She put in

Original prefix: She puts a lot of effort yesterday
Corrected prefix: She put in a lot of effort yesterday

We then autocompleted each original prefix to a full sentence using a neural language model (similar in spirit to that used by SmartCompose). If a full-sentence grammar model finds no errors in the full sentence, then that means there is at least one possible way to complete this original prefix without making any grammatical errors, so we consider the original prefix to be correct and output <original prefix, original prefix> as a training example. Otherwise, we output <original prefix, corrected prefix>. We used this training data to train a large cloud-based model that can correct sentence prefixes, then used that model for hard distillation, generating new <original, corrected> sentence prefix pairs that are better-matched to the on-device domain.

Finally, we constructed the final training data for the on-device model by combining these new sentence prefix pairs with the full sentence pairs. The on-device model trained on this combined data is then capable of correcting both full sentences as well as sentence prefixes.

Training data for the on-device model is generated from cloud-based models.

Grammar Correction On-Device
Gboard sends a request to the on-device grammar model whenever the user has typed more than three words, whether the sentence is completed or not. To provide a quality user experience, we underline the grammar mistakes and provide replacement suggestions when the user interacts with them. However, the model outputs only corrected sentences, so those need to be transformed into replacement suggestions. To do this, we align the original sentence and the corrected sentence by minimizing the Levenshtein distance (i.e., the number of edits that are needed to transform the original sentence to the corrected sentence).

Extracting edits by aligning the corrected sentence to the original sentence.

Finally, we transform the insertion edits and deletion edits to be replacement edits. In the above example, we transform the suggested insertion of "in" to be an edit that suggests replacing "puts" with "put in". And we similarly suggest replacing “effort on” with “effort”.

Conclusion
We have built a small high-quality grammar correction model by designing a compact model architecture and leveraging a cloud-based grammar system during training via hard distillation. This compact model enables users to correct their text entirely on their own device without ever needing to send their keystrokes to a remote server.

Acknowledgements
We gratefully acknowledge the key contributions of the other team members, including Abhanshu Sharma, Akshay Kannan, Bharath Mankalale, Chenxi Ni, Felix Stahlberg, Florian Hartmann, Jacek Jurewicz, Jayakumar Hoskere, Jenny Chin, Kohsuke Yatoh, Lukas Zilka, Martin Sundermeyer, Matt Sharifi, Max Gubin, Nick Pezzotti, Nithi Gupta, Olivia Graham, Qi Wang, Sam Jaffee, Sebastian Millius, Shankar Kumar, Sina Hassani, Vishal Kumawat, and Yuanbo Zhang, Yunpeng Li, Yuxin Dai. We would also like to thank Xu Liu and David Petrou for their support.


1The feature will eventually be available in all apps with Gboard, but is currently unavailable for those in WebView

Source: Google AI Blog


Privacy-Preserving Smart Input with Gboard

Google Keyboard (a.k.a Gboard) has a critical mission to provide frictionless input on Android to empower users to communicate accurately and express themselves effortlessly. In order to accomplish this mission, Gboard must also protect users' private and sensitive data. Nothing users type is sent to Google servers. We recently launched privacy-preserving input by further advancing the latest federated technologies. In Android 11, Gboard also launched the contextual input suggestion experience by integrating on-device smarts into the user's daily communication in a privacy-preserving way.

Before Android 11, input suggestions were surfaced to users in several different places. In Android 11, Gboard launched a consistent and coordinated approach to access contextual input suggestions. For the first time, we've brought Smart Replies to the keyboard suggestions - powered by system intelligence running entirely on device. The smart input suggestions are rendered with a transparent layer on top of Gboard’s suggestion strip. This structure maintains the trust boundaries between the Android platform and Gboard, meaning sensitive personal content cannot be not accessed by Gboard. The suggestions are only sent to the app after the user taps to accept them.

For instance, when a user receives the message “Have a virtual coffee at 5pm?” in Whatsapp, on-device system intelligence predicts smart text and emoji replies “Sounds great!” and “?”. Android system intelligence can see the incoming message but Gboard cannot. In Android 11, these Smart Replies are rendered by the Android platform on Gboard’s suggestion strip as a transparent layer. The suggested reply is generated by the system intelligence. When the user taps the suggestion, Android platform sends it to the input field directly. If the user doesn't tap the suggestion, gBoard and the app cannot see it. In this way, Android and Gboard surface the best of Google smarts whilst keeping users' data private: none of their data goes to any app, including the keyboard, unless they've tapped a suggestion.

Additionally, federated learning has enabled Gboard to train intelligent input models across many devices while keeping everything individual users type on their device. Today, the emoji is as common as punctuation - and have become the way for our users to express themselves in messaging. Our users want a way to have fresh and diversified emojis to better express their thoughts in messaging apps. Recently, we launched new on-device transformer models that are fine-tuned with federated learning in Gboard, to produce more contextual emoji predictions for English, Spanish and Portuguese.

Furthermore, following the success of privacy-preserving machine learning techniques, Gboard continues to leverage federated analytics to understand how Gboard is used from decentralized data. What we've learned from privacy-preserving analysis has let us make better decisions in our product.

When a user shares an emoji in a conversation, their phone keeps an ongoing count of which emojis are used. Later, when the phone is idle, plugged in, and connected to WiFi, Google’s federated analytics server invites the device to join a “round” of federated analytics data computation with hundreds of other participating phones. Every device involved in one round will compute the emoji share frequency, encrypt the result and send it a federated analytics server. Although the server can’t decrypt the data individually, the final tally of total emoji counts can be decrypted when combining encrypted data across devices. The aggregated data shows that the most popular emoji is ? in Whatsapp, ? in Roblox(gaming), and ✔ in Google Docs. Emoji ? moved up from 119th to 42nd in terms of frequency during COVID-19.

Gboard always has a strong commitment to Google’s Privacy Principles. Gboard strives to build privacy-preserving effortless input products for users to freely express their thoughts in 900+ languages while safeguarding user data. We will keep pushing the state of the art in smart input technologies on Android while safeguarding user data. Stay tuned!

Create stickers for Gboard on Google Play

Posted by Alan Ni, Associate Product Manager, Gboard

Messaging is getting more and more expressive -- today you can say I love you with an emoji, a gif, or a sticker. Millions of users share expressive content every day on Android devices using Gboard as their default keyboard. We want to push expression even further by allowing developers to create their own stickers for Gboard. Some of our early partners include Bitmoji, Disney, and even our own Allo team. Once published, your stickers could be seen and shared by millions of users around the world.

Using the Firebase App Indexing API, you'll be able index any sticker assets you create, publish your app to the Play Store, and get featured in the Gboard Sticker collection. Once a user downloads your sticker pack from the Play store, they'll be able to send those stickers directly from their keyboard in any Android app that supports image insertion!

Getting Started with Stickers

To kick things off, you'll need to add the Firebase App Indexing library. Visit the Firebase Getting Started Guide for details. Once you've set up Firebase App Indexing, read through our sticker guide to learn how to index those stickers. Next, create your sticker assets!

You should build and index stickers on first run after update or install to minimize the lag between a user installing the app and seeing the stickers in Gboard. Our sample app should give an idea of the end-to-end flow.

Making your Stickers Searchable

Users often look for stickers via searching on keywords. That means you'll want to add appropriate keywords to allow users to find your stickers and you can use the put method to add keywords. In the code snippet below, you'll see that a Snoopy sticker is tagged with the keywords: "bye", "snoopy", "see ya", and "good bye".

new Indexable.Builder("Sticker")
   .setName("Bye")
   // add url for sticker asset 
   .setImage("http://www.snoopysticker.com?id=1234")
   // see: Support links to your app content section
   .setUrl("http://sticker/canonical/image/bye")
   // Set the accessibility label for the sticker.
   .setDescription("A sticker for Bye")
   // Add search keywords.
   .put("keywords", "bye", "snoopy", "see ya", "good bye")
   .put("isPartOf",
        new Indexable.Builder("StickerPack")
          .setName("Snoopy Pack")
          .build())
   .build())};

For larger sticker packs, you'll want to make sure you've tagged stickers with keywords so that they're easier for users to find. We've come up with a list of common English phrases/keywords you can use to tag your stickers. But don't forget to internationalize your stickers -- to do this you'll want to first detect the device language and then index keywords that correspond to that language.

Get Featured in the Sticker Collection

Finally, share your stickers with the world! To be featured in our Sticker Collection on the Play Store, fill out this form. But first, make sure to thoroughly test the sticker pack using the latest build of Gboard, If your app has high-quality stickers and is working well with Gboard, we'll add it to our sticker collection; this is the best way to get it seen by millions of Gboard users!

We're really excited to see what sticker packs you're able to build.

The Machine Intelligence Behind Gboard



Most people spend a significant amount of time each day using mobile-device keyboards: composing emails, texting, engaging in social media, and more. Yet, mobile keyboards are still cumbersome to handle. The average user is roughly 35% slower typing on a mobile device than on a physical keyboard. To change that, we recently provided many exciting improvements to Gboard for Android, working towards our vision of creating an intelligent mechanism that enables faster input while offering suggestions and correcting mistakes, in any language you choose.

With the realization that the way a mobile keyboard translates touch inputs into text is similar to how a speech recognition system translates voice inputs into text, we leveraged our experience in Speech Recognition to pursue our vision. First, we created robust spatial models that map fuzzy sequences of raw touch points to keys on the keyboard, just like acoustic models map sequences of sound bites to phonetic units. Second, we built a powerful core decoding engine based on finite state transducers (FST) to determine the likeliest word sequence given an input touch sequence. With its mathematical formalism and broad success in speech applications, we knew that an FST decoder would offer the flexibility needed to support a variety of complex keyboard input behaviors as well as language features. In this post, we will detail what went into the development of both of these systems.

Neural Spatial Models
Mobile keyboard input is subject to errors that are generally attributed to “fat finger typing” (or tracing spatially similar words in glide typing, as illustrated below) along with cognitive and motor errors (manifesting in misspellings, character insertions, deletions or swaps, etc). An intelligent keyboard needs to be able to account for these errors and predict the intended words rapidly and accurately. As such, we built a spatial model for Gboard that addresses these errors at the character level, mapping the touch points on the screen to actual keys.
Average glide trails for two spatially-similar words: “Vampire” and “Value”.
Up to recently, Gboard used a Gaussian model to quantify the probability of tapping neighboring keys and a rule-based model to represent cognitive and motor errors. These models were simple and intuitive, but they didn’t allow us to directly optimize metrics that correlate with better typing quality. Drawing on our experience with Voice Search acoustic models we replaced both the Gaussian and rule-based models with a single, highly efficient long short-term memory (LSTM) model trained with a connectionist temporal classification (CTC) criterion.

However, training this model turned out to be a lot more complicated than we had anticipated. While acoustic models are trained from human-transcribed audio data, one cannot easily transcribe millions of touch point sequences and glide traces. So the team exploited user-interaction signals, e.g. reverted auto-corrections and suggestion picks as negative and positive semi-supervised learning signals, to form rich training and test sets.
Raw data points corresponding to the word “could” (left), and normalized sampled trajectory with per-sample variances (right).
A plethora of techniques from the speech recognition literature was used to iterate on the NSM models to make them small and fast enough to be run on any device. The TensorFlow infrastructure was used to train hundreds of models, optimizing various signals surfaced by the keyboard: completions, suggestions, gliding, etc. After more than a year of work, the resulting models were about 6 times faster and 10 times smaller than the initial versions, they also showed about 15% reduction in bad autocorrects and 10% reduction in wrongly decoded gestures on offline datasets.

Finite-State Transducers
While the NSM uses spatial information to help determine what was tapped or swiped, there are additional constraints — lexical and grammatical — that can be brought to bear. A lexicon tells us what words occur in a language and a probabilistic grammar tells us what words are likely to follow other words. To encode this information we use finite-state transducers. FSTs have long been a key component of Google’s speech recognition and synthesis systems. They provide a principled way to represent various probabilistic models (lexicons, grammars, normalizers, etc) used in natural language processing together with the mathematical framework needed to manipulate, optimize, combine and search the models*.

In Gboard, a key-to-word transducer compactly represents the keyboard lexicon as shown in the figure below. It encodes the mapping from key sequences to words, allowing for alternative key sequences and optional spaces.
This transducer encodes “I”, “I’ve”, “If” along paths from the start state (the bold circle 1) to final states (the double circle states 0 and 1). Each arc is labeled with an input key (before the “:”) and a corresponding output word (after the “:”) where ε encodes the empty symbol. The apostrophe in “I’ve” can be omitted. The user may skip the space bar sometimes. To account for that, the space key transition between words in the transducer is optional. The ε and space back arcs allow accepting more than one word.
A probabilistic n-gram transducer is used to represent the language model for the keyboard. A state in the model represents an (up to) n-1 word context and an arc leaving that state is labeled with a successor word together with its probability of following that context (estimated from textual data). These, together with the spatial model that gives the likelihoods of sequences of key touches (discrete tap entries or continuous gestures in glide typing), are combined and explored with a beam search.

Generic FST principles, such as streaming, support for dynamic models, etc took us a long way towards building a new keyboard decoder, but several new functionalities also had to be added. When you speak, you don’t need the decoder to complete your words or guess what you will say next to save you a few syllables; but when you type, you appreciate the help of word completions and predictions. Also, we wanted the keyboard to provide seamless multilingual support, as shown below.
Trilingual input typing in Gboard.
It was a complex effort to get our new decoder off the ground, but the principled nature of FSTs has many benefits. For example, supporting transliterations for languages like Hindi is just a simple extension of the generic decoder.

Transliteration Models
In many languages with complex scripts, romanization systems have been developed to map characters into the Latin alphabet, often according to their phonetic pronunciations. For example, the Pinyin “xièxiè” corresponds to the Chinese characters “谢谢” (“thank you”). A Pinyin keyboard allows users to conveniently type words on a QWERTY layout and have them automatically “translated” into the target script. Likewise, a transliterated Hindi keyboard allows users to type “daanth” for “दांत” (teeth). Whereas Pinyin is an agreed-upon romanization system, Hindi transliterations are more fuzzy; for example “daant” would be a valid alternative for “दांत”.
Transliterated glide input for Hindi.
Just as we have a transducer mapping from letter sequences to words (a lexicon) and a weighted language model automaton providing probabilities for word sequences, we built weighted transducer mappings between Latin key sequences and target script symbol sequences for 22 Indic languages. Some languages have multiple writing systems (Bodo for example can be written in the Bengali or Devanagari scripts) so between transliterated and native layouts, we built 57 new input methods in just a few months.

The general nature of the FST decoder let us leverage all the work we had done to support completions, predictions, glide typing and many UI features with no extra effort, allowing us to offer a rich experience to our Indian users right from the start.

A More Intelligent Keyboard
All in all, our recent work cut the decoding latency by 50%, reduced the fraction of words users have to manually correct by more than 10%, allowed us to launch transliteration support for the 22 official languages of India, and enabled many new features you may have noticed.

While we hope that these recent changes improve your typing experience, we recognize that on-device typing is by no means solved. Gboard can still make suggestions that seem nonintuitive or of low utility and gestures can still be decoded to words a human would never pick. However, our shift towards powerful machine intelligence algorithms has opened new spaces that we’re actively exploring to make more useful tools and products for our users worldwide.

Acknowledgments
This work was done by Cyril Allauzen, Ouais Alsharif, Lars Hellsten, Tom Ouyang, Brian Roark and David Rybach, with help from Speech Data Operation team. Special thanks go to Johan Schalkwyk and Corinna Cortes for their support.


* The toolbox of relevant algorithms is available in the OpenFst open-source library.

Bringing down the language barriers – making the internet more inclusive

There are currently over 400* million Internet users in India, but with only 20% of the population fluent in English, most Internet users have significant language barriers to getting the full value of the Internet. A speaker of Indian languages like Hindi or Tamil still has trouble finding content to read and or use services that they can use in their own languages.

To build rich and empowering experiences for everyone means first and foremost making things work in the languages people speak. Today, we’re taking a huge step forward by launching new set of products and features that will empower the Internet ecosystem to create more language content and better serve the needs of a billion Indians who’re coming online rapidly.

Neural Machine Translation: The world’s content, in your language
Starting today, when you use Google Translate, you might notice that the translation is more accurate and easier to understand, especially when translating full sentences. That’s because we’ve brought our new Neural Machine Translation technology to translations between English and nine widely used Indian languages — Hindi, Bengali, Marathi, Tamil, Telugu, Gujarati, Punjabi, Malayalam and Kannada.

Neural translation is a lot better than our old phrase-based system, translating full sentences at a time, instead of pieces of a sentence. It uses this broader context to help it figure out the most relevant translation, which it then rearranges and adjusts to be more like a human speaking with proper grammar. This new technique improves the quality of translation more in a single jump than we’ve seen in the last ten years combined.

Just like it’s easier to learn a language when you already know a related language, we’ve discovered that our neural technology speaks each language better when it learns several at a time. For example, we have a whole lot more sample data for Hindi than its relatives Marathi and Bengali, but when we train them all together, the translations for all improve more than if we’d trained each individually.

Screen Shot 2017-04-24 at 12.07.04 PM.png
Phrase based Translation      Neural Machine Translation

You can try, these out on iOS and Android Google Translate apps, at translate.google.co.in and through Google Search.

But how does this make the whole web better for everyone — Chrome has it covered!
That’s where Chrome’s built-in Translate functionality comes into play. Every day, more than 150 million web pages are translated by Chrome users through the magic of machine translations with one click or tap.  The Chrome team and the Google Translate team have worked together to bring the power of Neural Machine Translation to web content, making full-page translations more accurate and easier to read.

Today, we’re extending Neural Machine Translation built into Chrome to and from English for the same nine Indian languages (Bengali, Gujarati, Kannada, Malayalam, Marathi, Punjabi, Tamil Telugu and Hindi). This means higher quality translations of everything from song lyrics to news articles to cricket discussions.

            
Screen Shot 2017-04-24 at 5.10.07 PM.png

Gboard in 22 Indian Languages and more
Being able to type in your language of choice is as important as understanding content on the web. Today, we are ramping up support to include 11 new languages to the list of 11 existing Indian languages —with transliteration support—including Hindi, Bengali, Telugu, Marathi, Tamil, Urdu, and Gujarati.

Gboard has all the things you love about your Google Keyboard — speed and accuracy, Glide Typing and voice typing — plus Google Search built in. It also allows you to search and use Google Translate right in your keyboard (just tap the “G” button to get started). And—as a reminder—Gboard already has a Hinglish language option for those of you who often switch back and forth between Hindi and English.

With today’s update, we’ve also dropped in a new text editing tool that makes it easier to select, copy and paste, plus new options for resizing and repositioning the keyboard so it fits to your hand and texting style. And to top it all off, this Gboard update comes with some under-the-hood improvements including better accuracy and predictions while you type

Like Google Indic Keyboard, Gboard has auto-correction and prediction in these new languages, plus two layouts for each—one in the native language script and one with the QWERTY layout for transliteration, which lets you spell words phonetically using the QWERTY alphabet and get text output in your native language script. For example, type “aapko holi ki hardik shubhkamnay” and get “आपको होली की हार्दिक शुभकामनायें ”.

hindi_translit_fixed.gif                                           

This is available today on Google Play Store, so make sure you’re running the latest version of the app.

Auto-translated local reviews in Maps
The local information across Google Search and Maps helps millions of people, every day, to discover and share great places. Our goal is to build a map tailored to each user and their likes and preferences and make it work for everyone in their local languages. Starting today, we’ll automatically add translations to local reviews on Google Maps, both on mobile and desktop. With this update, millions of these reviews – from restaurants to cafes or hotels – will appear in your own language.

All you need to do is launch Google Maps, open reviews, and they’ll appear in both the original language as well as the language you set on your device. So for instance if you speak Tamil and travel to Kolkata, and you want to see reviews of the popular restaurants in Kolkata, you can now automatically see reviews both in your own language and the original language of the review.

Hindi Dictionary in Search
When you search for the meaning of a word in English, for instance “meaning of nostalgic”, you’ll get a dictionary straight in Google Search. Today, in collaboration with the Oxford University Press, we’re bringing the Rajpal & Sons Hindi dictionary online. This new experience supports transliteration so you don’t even need to switch to a Hindi keyboard. So  the next time when you’d like to know more about a word, say Nirdeshak, you can just type in Nirdeshak ka matlab in Search, and you’ll instantly get to see word meanings and dictionary definitions on the search results page, including English translations.

pasted image 0 (11).png

While all these new products and improvements takes us closer to make the web more useful for Indian Language users.  We realise that we can’t do this alone, we need India’s internet ecosystem to come together to build apps and more content to make India’s Internet that serve its users need. And one way to effectively get the Internet Industry together to solve for local language users is to really understand the users, understand their needs to shape India’s Internet landscape. We have worked with KPMG India to compile an industry report titled “Indian Languages - Defining India’s Internet” - which provides rich insights on what we need to do together as an Industry to bring the Internet alive for every Indian.

Source: *Indian Languages - Defining India’s Internet”  Report

Posted By Barak Turovsky, Group Product Manager, Google Translate