Tag Archives: AI

How AI creates photorealistic images from text

Pictures of puppy in a nest emerging from a cracked egg. Photos overlooking a steampunk city with airships. Picture of two robots having a romantic evening at the movies.

Have you ever seen a puppy in a nest emerging from a cracked egg? What about a photo that’s overlooking a steampunk city with airships? Or a picture of two robots having a romantic evening at the movies? These might sound far-fetched, but a novel type of machine learning technology called text-to-image generation makes them possible. These models can generate high-quality, photorealistic images from a simple text prompt.

Within Google Research, our scientists and engineers have been exploring text-to-image generation using a variety of AI techniques. After a lot of testing we recently announced two new text-to-image models — Imagen and Parti. Both have the ability to generate photorealistic images but use different approaches. We want to share a little more about how these models work and their potential.

How text-to-image models work

With text-to-image models, people provide a text description and the models produce images matching the description as closely as possible. This can be something as simple as “an apple” or “a cat sitting on a couch” to more complex details, interactions and descriptive indicators like “a cute sloth holding a small treasure chest. A bright golden glow is coming from the chest.”

A picture of a cute sloth holding a small treasure chest. A bright golden glow is coming from the chest

In the past few years, ML models have been trained on large image datasets with corresponding textual descriptions, resulting in higher quality images and a broader range of descriptions. This has sparked major breakthroughs in this area, including Open AI’s DALL-E 2.

How Imagen and Parti work

Imagen and Parti build on previous models. Transformer models are able to process words in relationship to one another in a sentence. They are foundational to how we represent text in our text-to-image models. Both models also use a new technique that helps generate images that more closely match the text description. While Imagen and Parti use similar technology, they pursue different, but complementary strategies.

Imagen is a Diffusion model, which learns to convert a pattern of random dots to images. These images first start as low resolution and then progressively increase in resolution. Recently, Diffusion models have seen success in both image and audio tasks like enhancing image resolution, recoloring black and white photos, editing regions of an image, uncropping images, and text-to-speech synthesis.

Parti’s approach first converts a collection of images into a sequence of code entries, similar to puzzle pieces. A given text prompt is then translated into these code entries and a new image is created. This approach takes advantage of existing research and infrastructure for large language models such as PaLM and is critical for handling long, complex text prompts and producing high-quality images.

These models have many limitations. For example, neither can reliably produce specific counts of objects (e.g. “ten apples”), nor place them correctly based on specific spatial descriptions (e.g. “a red sphere to the left of a blue block with a yellow triangle on it”). Also, as prompts become more complex, the models begin to falter, either missing details or introducing details that were not provided in the prompt. These behaviors are a result of several shortcomings, including lack of explicit training material, limited data representation, and lack of 3D awareness. We hope to address these gaps through broader representations and more effective integration into the text-to-image generation process.

Taking a responsible approach to Imagen and Parti

Text-to-image models are exciting tools for inspiration and creativity. They also come with risks related to disinformation, bias and safety. We’re having discussions around Responsible AI practices and the necessary steps to safely pursue this technology. As an initial step, we’re using easily identifiable watermarks to ensure people can always recognize an Imagen- or Parti-generated image. We’re also conducting experiments to better understand biases of the models, like how they represent people and cultures, while exploring possible mitigations. The Imagen and Parti papers provide extensive discussion of these issues.

What’s next for text-to-image models at Google

We will push on new ideas that combine the best of both models, and expand to related tasks such as adding the ability to interactively generate and edit images through text. We’re also continuing to conduct in-depth comparisons and evaluations to align with our Responsible AI Principles. Our goal is to bring user experiences based on these models to the world in a safe, responsible way that will inspire creativity.

Building a more helpful browser with machine learning

At Google we use technologies like machine learning (ML) to build more useful products — from filtering out email spam, to keeping maps up to date, to offering more relevant search results. Chrome is no exception: We use ML to make web images more accessible to people who are blind or have low vision, and we also generate real-time captions for online videos, in service of people in noisy environments, and those who are hard of hearing.

This work in Chrome continues, so we wanted to share some recent and future ML improvements that offer a safer, more accessible and more personalized browsing experience. Importantly: these updates are powered by on-device ML models, which means your data stays private, and never leaves your device.

More peace of mind, less annoying prompts

Safe Browsing in Chrome helps protect billions of devices every day, by showing warnings when people try to navigate to dangerous sites or download dangerous files (see the big red example below). Starting in March of this year, we rolled out a new ML model that identifies 2.5 times more potentially malicious sites and phishing attacks as the previous model – resulting in a safer and more secure web.

To further improve the browsing experience, we’re also evolving how people interact with web notifications. On the one hand, page notifications help deliver updates from sites you care about; on the other hand, notification permission prompts can become a nuisance. To help people browse the web with minimal interruption, Chrome predicts when permission prompts are unlikely to be granted based on how the user previously interacted with similar permission prompts, and silences these undesired prompts. In the next release of Chrome, we’re launching an ML model that makes these predictions entirely on-device.

Two separate images side by side. The first on the left is a smartphone showing a red screen and a warning message about phishing. The image on the right shows a Chrome browser window showing a pop-up message saying “Notifications blocked”.

With the next release of Chrome, this is what you will see if a phishing attempt is detected (Left) and Chrome will show permission requests quietly when the user is unlikely to grant them (Right).

Finding what's important, always in your language

Earlier this year we launched Journeys to help people retrace their steps online. For example: You might spend weeks planning a national park visit – researching attractions, comparing flights and shopping for gear. With ML and Journeys, Chrome brings together the pages you’ve visited about a given topic, and makes it easy to pick up where you left off (vs. scr o o o l l ling through your browser history).

When you return to those hiking boots and camping guides, we’re also using ML to make those websites available in your preferred language. In particular, we’ve launched an updated language identification model to figure out the language of the page, and whether it needs to be translated to match your preferences. As a result, we’re seeing tens of millions more successful translations every day.

A Chrome browser showing Journeys related to travel. The user can see a cluster of recent searches they did related to a trip to Yosemite.

The Journeys feature of Chrome groups together your search history based on topic or intent.

A browser built just for you

Maybe you like to read news articles in the morning – phone in one hand, cereal spoon in the other – so you share lots of links from Chrome. Or maybe voice search is more your thing, as you sneak in a few questions during your transit ride to work. Either way, we want to make sure Chrome is meeting you where you’re at, so in the near future, we’ll be using ML to adjust the toolbar in real-time – highlighting the action that’s most useful in that moment (e.g., share link, voice search, etc.). Of course, you’ll be able to customize it manually as well.

A Chrome browser with a highlighted square around an icon to the right of the address bar. At the top is a share icon, and at the bottom is a microphone icon.

The toolbar in Chrome on Android will adapt based on your needs.

Our goal is to build a browser that’s genuinely and continuously helpful, and we’re excited about the possibilities that ML provides. At the end of the day, though, your experience is what really matters, so please tweet @googlechrome to send us your feedback.

How we build with and for people with disabilities

Editor’s note: Today is Global Accessibility Awareness Day. We’re also sharing how we’re making education more accessibleand launching a newAndroid accessibility feature.

Over the past nine years, my job has focused on building accessible products and supporting Googlers with disabilities. Along the way, I’ve been constantly reminded of how vast and diverse the disability community is, and how important it is to continue working alongside this community to build technology and solutions that are truly helpful.

Before delving into some of the accessibility features our teams have been building, I want to share how we’re working to be more inclusive of people with disabilities to create more accessible tools overall.

Nothing about us, without us

In the disability community, people often say “nothing about us without us.” It’s a sentiment that I find sums up what disability inclusion means. The types of barriers that people with disabilities face in society vary depending on who they are, where they live and what resources they have access to. No one’s experience is universal. That’s why it’s essential to include a wide array of people with disabilities at every stage of the development process for any of our accessibility products, initiatives or programs.

We need to work to make sure our teams at Google are reflective of the people we’re building for. To do so, last year we launched our hiring site geared toward people with disabilities — including our Autism Career Program to further grow and strengthen our autistic community. Most recently, we helped launch the Neurodiversity Career Connector along with other companies to create a job portal that connects neurodiverse candidates to companies that are committed to hiring more inclusively.

Beyond our internal communities, we also must partner with communities outside of Google so we can learn what is truly useful to different groups and parlay that understanding into the improvement of current products or the creation of new ones. Those partnerships have resulted in the creation of Project Relate, a communication tool for people with speech impairments, the development of a completely new TalkBack, Android’s built-in screen reader, and the improvement of Select-to-Speak, a Chromebook tool that lets you hear selected text on your screen spoken out loud.

Equitable experiences for everyone

Engaging and listening to these communities — inside and outside of Google — make it possible to create tools and features like the ones we’re sharing today.

The ability to add alt-text, which is a short description of an image that is read aloud by screen readers, directly to images sent through Gmail starts rolling out today. With this update, people who use screen readers will know what’s being sent to them, whether it’s a GIF celebrating the end of the week or a screenshot of an important graph.

Communication tools that are inclusive of everyone are especially important as teams have shifted to fully virtual or hybrid meetings. Again, everyone experiences these changes differently. We’ve heard from some people who are deaf or hard of hearing, that this shift has made it easier to identify who is speaking — something that is often more difficult in person. But, in the case of people who use ASL, we’ve heard that it can be difficult to be in a virtual meeting and simultaneously see their interpreter and the person speaking to them.

Multi-pin, a new feature in Google Meet, helps solve this. Now you can pin multiple video tiles at once, for example, the presenter’s screen and the interpreter’s screen. And like many accessibility features, the usefulness extends beyond people with disabilities. The next time someone is watching a panel and wants to pin multiple people to the screen, this feature makes that possible.

We've also been working to make video content more accessible to those who are blind or low-vision through audio descriptions that describe verbally what is on the screen visually. All of our English language YouTube Originals content from the past year — and moving forward — will now have English audio descriptions available globally. To turn on the audio description track, at the bottom right of the video player, click on “Settings”, select “Audio track”, and choose “English descriptive”.

For many people with speech impairments, being understood by the technology that powers tools like voice typing or virtual assistants can be difficult. In 2019, we started work to change that through Project Euphonia, a research initiative that works with community organizations and people with speech impairments to create more inclusive speech recognition models. Today, we’re expanding Project Euphonia’s research to include four more languages: French, Hindi, Japanese and Spanish. With this expansion, we can create even more helpful technology for more people — no matter where they are or what language they speak.

I’ve learned so much in my time working in this space and among the things I’ve learned is the absolute importance of building right alongside the very people who will most use these tools in the end. We’ll continue to do that as we work to create a more inclusive and accessible world, both physically and digitally.

Improving skin tone representation across Google

Seeing yourself reflected in the world around you — in real life, media or online — is so important. And we know that challenges with image-based technologies and representation on the web have historically left people of color feeling overlooked and misrepresented. Last year, we announced Real Tone for Pixel, which is just one example of our efforts to improve representation of diverse skin tones across Google products.

Today, we're introducing a next step in our commitment to image equity and improving representation across our products. In partnership with Harvard professor and sociologist Dr. Ellis Monk, we’re releasing a new skin tone scale designed to be more inclusive of the spectrum of skin tones we see in our society. Dr. Monk has been studying how skin tone and colorism affect people’s lives for more than 10 years.

The culmination of Dr. Monk’s research is the Monk Skin Tone (MST) Scale, a 10-shade scale that will be incorporated into various Google products over the coming months. We’re openly releasing the scale so anyone can use it for research and product development. Our goal is for the scale to support inclusive products and research across the industry — we see this as a chance to share, learn and evolve our work with the help of others.

Ten circles in a row, ranging from dark to light.

The 10 shades of the Monk Skin Tone Scale.

This scale was designed to be easy-to-use for development and evaluation of technology while representing a broader range of skin tones. In fact, our research found that amongst participants in the U.S., people found the Monk Skin Tone Scale to be more representative of their skin tones compared to the current tech industry standard. This was especially true for people with darker skin tones.

“In our research, we found that a lot of the time people feel they’re lumped into racial categories, but there’s all this heterogeneity with ethnic and racial categories,” Dr. Monk says. “And many methods of categorization, including past skin tone scales, don’t pay attention to this diversity. That’s where a lack of representation can happen…we need to fine-tune the way we measure things, so people feel represented.”

Using the Monk Skin Tone Scale to improve Google products

Updating our approach to skin tone can help us better understand representation in imagery, as well as evaluate whether a product or feature works well across a range of skin tones. This is especially important for computer vision, a type of AI that allows computers to see and understand images. When not built and tested intentionally to include a broad range of skin-tones, computer vision systems have been found to not perform as well for people with darker skin.

The MST Scale will help us and the tech industry at large build more representative datasets so we can train and evaluate AI models for fairness, resulting in features and products that work better for everyone — of all skin tones. For example, we use the scale to evaluate and improve the models that detect faces in images.

Here are other ways you’ll see this show up in Google products.

Improving skin tone representation in Search

Every day, millions of people search the web expecting to find images that reflect their specific needs. That’s why we’re also introducing new features using the MST Scale to make it easier for people of all backgrounds to find more relevant and helpful results.

For example, now when you search for makeup related queries in Google Images, you'll see an option to further refine your results by skin tone. So if you’re looking for “everyday eyeshadow” or “bridal makeup looks” you’ll more easily find results that work better for your needs.

Animated GIF showing a Google Images search for “bridal makeup looks.” The results include an option to filter by skin tone; the cursor selects a darker skin tone, which adjusts to results that are more relevant to this choice.

Seeing yourself represented in results can be key to finding information that's truly relevant and useful, which is why we’re also rolling out improvements to show a greater range of skin tones in image results for broad searches about people, or ones where people show up in the results. In the future, we’ll incorporate the MST Scale to better detect and rank images to include a broader range of results, so everyone can find what they're looking for.

Creating a more representative Search experience isn’t something we can do alone, though. How content is labeled online is a key factor in how our systems surface relevant results. In the coming months, we'll also be developing a standardized way to label web content. Creators, brands and publishers will be able to use this new inclusive schema to label their content with attributes like skin tone, hair color and hair texture. This will make it possible for content creators or online businesses to label their imagery in a way that search engines and other platforms can easily understand.

A photograph of a Black person looking into the camera. Tags hover over various areas of the photo; one over their skin says “Skin tone” with a circle matching their skin tone. Two additional tags over their hair read “Hair color” and “Hair texture.

Improving skin tone representation in Google Photos

We’ll also be using the MST Scale to improve Google Photos. Last year, we introduced an improvement to our auto enhance feature in partnership with professional image makers. Now we’re launching a new set of Real Tone filters that are designed to work well across skin tones and evaluated using the MST Scale. We worked with a diverse range of renowned image makers, like Kennedi Carter and Joshua Kissi, who are celebrated for beautiful and accurate depictions of their subjects, to evaluate, test and build these filters. These new Real Tone filters allow you to choose from a wider assortment of looks and find one that reflects your style. Real Tone filters will be rolling out on Google Photos across Android, iOS and Web in the coming weeks.

Animated video showing before and after photos of images with the Real Tone Filter.

What’s next?

We’re openly releasing the Monk Skin Tone Scale so that others can use it in their own products, and learn from this work —and so that we can partner with and learn from them. We want to get feedback, drive more interdisciplinary research, and make progress together. We encourage you to share your thoughts here. We’re continuing to collaborate with Dr. Monk to evaluate the MST Scale across different regions and product applications, and we’ll iterate and improve on it to make sure the scale works for people and use cases all over the world. And, we’ll continue our efforts to make Google’s products work even better for every user.

The best part of working on this project is that it isn’t just ours — while we’re committed to making Google products better and more inclusive, we’re also excited about all the possibilities that exist as we work together to build for everyone across the web.

Unlocking Zero-Resource Machine Translation to Support New Languages in Google Translate

Machine translation (MT) technology has made significant advances in recent years, as deep learning has been integrated with natural language processing (NLP). Performance on research benchmarks like WMT have soared, and translation services have improved in quality and expanded to include new languages. Nevertheless, while existing translation services cover languages spoken by the majority of people world wide, they only include around 100 languages in total, just over 1% of those actively spoken globally. Moreover, the languages that are currently represented are overwhelmingly European, largely overlooking regions of high linguistic diversity, like Africa and the Americas.

There are two key bottlenecks towards building functioning translation models for the long tail of languages. The first arises from data scarcity; digitized data for many languages is limited and can be difficult to find on the web due to quality issues with Language Identification (LangID) models. The second challenge arises from modeling limitations. MT models usually train on large amounts of parallel (translated) text, but without such data, models must learn to translate from limited amounts of monolingual text, which is a novel area of research. Both of these challenges need to be addressed for translation models to reach sufficient quality.

In “Building Machine Translation Systems for the Next Thousand Languages”, we describe how to build high-quality monolingual datasets for over a thousand languages that do not have translation datasets available and demonstrate how one can use monolingual data alone to train MT models. As part of this effort, we are expanding Google Translate to include 24 under-resourced languages. For these languages, we created monolingual datasets by developing and using specialized neural language identification models combined with novel filtering approaches. The techniques we introduce supplement massively multilingual models with a self supervised task to enable zero-resource translation. Finally, we highlight how native speakers have helped us realize this accomplishment.

Meet the Data
Automatically gathering usable textual data for under-resourced languages is much more difficult than it may seem. Tasks like LangID, which work well for high-resource languages, are unsuccessful for under-resourced languages, and many publicly available datasets crawled from the web often contain more noise than usable data for the languages they attempt to support. In our early attempts to identify under-resourced languages on the web by training a standard Compact Language Detector v3 (CLD3) LangID model, we too found that the dataset was too noisy to be usable.

As an alternative, we trained a Transformer-based, semi-supervised LangID model on over 1000 languages. This model supplements the LangID task with the MAsked Sequence-to-Sequence (MASS) task to better generalize over noisy web data. MASS simply garbles the input by randomly removing sequences of tokens from it, and trains the model to predict these sequences. We applied the Transformer-based model to a dataset that had been filtered with a CLD3 model and trained to recognize clusters of similar languages.

We then applied the open sourced Term Frequency-Inverse Internet Frequency (TF-IIF) filtering to the resulting dataset to find and discard sentences that were actually in related high-resource languages, and developed a variety of language-specific filters to eliminate specific pathologies. The result of this effort was a dataset with monolingual text in over 1000 languages, of which 400 had over 100,000 sentences. We performed human evaluations on samples of 68 of these languages and found that the majority (>70%) reflected high-quality, in-language content.

The amount of monolingual data per language versus the amount of parallel (translated) data per language. A small number of languages have large amounts of parallel data, but there is a long tail of languages with only monolingual data.

Meet the Models
Once we had a dataset of monolingual text in over 1000 languages, we then developed a simple yet practical approach for zero-resource translation, i.e., translation for languages with no in-language parallel text and no language-specific translation examples. Rather than limiting our model to an artificial scenario with only monolingual text, we also include all available parallel text data with millions of examples for higher resource languages to enable the model to learn the translation task. Simultaneously, we train the model to learn representations of under-resourced languages directly from monolingual text using the MASS task. In order to solve this task, the model is forced to develop a sophisticated representation of the language in question, developing a complex understanding of how words relate to other words in a sentence.

Relying on the benefits of transfer learning in massively multilingual models, we train a single giant translation model on all available data for over 1000 languages. The model trains on monolingual text for all 1138 languages and on parallel text for a subset of 112 of the higher-resourced languages.

At training time, any input the model sees has a special token indicating which language the output should be in, exactly like the standard formulation for multilingual translation. Our additional innovation is to use the same special tokens for both the monolingual MASS task and the translation task. Therefore, the token translate_to_french may indicate that the source is in English and needs to be translated to French (the translation task), or it may mean that the source is in garbled French and needs to be translated to fluent French (the MASS task). By using the same tags for both tasks, a translate_to_french tag takes on the meaning, “Produce a fluent output in French that is semantically close to the input, regardless of whether the input is garbled in the same language or in another language entirely. From the model’s perspective, there is not much difference between the two.

Surprisingly, this simple procedure produces high quality zero-shot translations. The BLEU and ChrF scores for the resulting model are in the 10–40 and 20–60 ranges respectively, indicating mid- to high-quality translation. We observed meaningful translations even for highly inflected languages like Quechua and Kalaallisut, despite these languages being linguistically dissimilar to all other languages in the model. However, we only computed these metrics on the small subset of languages with human-translated evaluation sets. In order to understand the quality of translation for the remaining languages, we developed an evaluation metric based on round-trip translation, which allowed us to see that several hundred languages are reaching high translation quality.

To further improve quality, we use the model to generate large amounts of synthetic parallel data, filter the data based on round-trip translation (comparing a sentence translated into another language and back again), and continue training the model on this filtered synthetic data via back-translation and self-training. Finally, we fine-tune the model on a smaller subset of 30 languages and distill it into a model small enough to be served.

Translation accuracy scores for 638 of the languages supported in our model, using the metric we developed (RTTLangIDChrF), for both the higher-resource supervised languages and the low-resource zero-resource languages.

Contributions from Native Speakers
Regular communication with native speakers of these languages was critical for our research. We collaborated with over 100 people at Google and other institutions who spoke these languages. Some volunteers helped develop specialized filters to remove out-of-language content overlooked by automatic methods, for instance Hindi mixed with Sanskrit. Others helped with transliterating between different scripts used by the languages, for instance between Meetei Mayek and Bengali, for which sufficient tools didn’t exist; and yet others helped with a gamut of tasks related to evaluation. Native speakers were also key for advising in matters of political sensitivity, like the appropriate name for the language, and the appropriate writing system to use for it. And only native speakers could answer the ultimate question: given the current quality of translation, would it be valuable to the community for Google Translate to support this language?

Closing Notes
This advance is an exciting first step toward supporting more language technologies in under-resourced languages. Most importantly, we want to stress that the quality of translations produced by these models still lags far behind that of the higher-resource languages supported by Google Translate. These models are certainly a useful first tool for understanding content in under-resourced languages, but they will make mistakes and exhibit their own biases. As with any ML-driven tool, one should consider the output carefully.

The complete list of new languages added to Google Translate in this update:

We would like to thank Julia Kreutzer, Orhan Firat, Daan van Esch, Aditya Siddhant, Mengmeng Niu, Pallavi Baljekar, Xavier Garcia, Wolfgang Macherey, Theresa Breiner, Vera Axelrod, Jason Riesa, Yuan Cao, Mia Xu Chen, Klaus Macherey, Maxim Krikun, Pidong Wang, Alexander Gutkin, Apurva Shah, Yanping Huang, Zhifeng Chen, Yonghui Wu, and Macduff Hughes for their contributions to the research, engineering, and leadership of this project.

We would also like to extend our deepest gratitude to the following native speakers and members of affected communities, who helped us in a wide variety of ways: Yasser Salah Eddine Bouchareb (Algerian Arabic); Mfoniso Ukwak (Anaang); Bhaskar Borthakur, Kishor Barman, Rasika Saikia, Suraj Bharech (Assamese); Ruben Hilare Quispe (Aymara); Devina Suyanto (Balinese); Allahserix Auguste Tapo, Bakary Diarrassouba, Maimouna Siby (Bambara); Mohammad Jahangir (Baluchi); Subhajit Naskar (Bengali); Animesh Pathak, Ankur Bapna, Anup Mohan, Chaitanya Joshi, Chandan Dubey, Kapil Kumar, Manish Katiyar, Mayank Srivastava, Neeharika, Saumya Pathak, Tanya Sinha, Vikas Singh (Bhojpuri); Bowen Liang, Ellie Chio, Eric Dong, Frank Tang, Jeff Pitman, John Wong, Kenneth Chang, Manish Goregaokar, Mingfei Lau, Ryan Li, Yiwen Luo (Cantonese); Monang Setyawan (Caribbean Javanese); Craig Cornelius (Cherokee); Anton Prokopyev (Chuvash); Rajat Dogra, Sid Dogra (Dogri); Mohamed Kamagate (Dyula); Chris Assigbe, Dan Ameme, Emeafa Doe, Irene Nyavor, Thierry Gnanih, Yvonne Dumor (Ewe); Abdoulaye Barry, Adama Diallo, Fauzia van der Leeuw, Ibrahima Barry (Fulfulde); Isabel Papadimitriou (Greek); Alex Rudnick (Guarani); Mohammad Khdeir (Gulf Arabic); Paul Remollata (Hiligaynon); Ankur Bapna (Hindi); Mfoniso Ukwak (Ibibio); Nze Lawson (Igbo); D.J. Abuy, Miami Cabansay (Ilocano); Archana Koul, Shashwat Razdan, Sujeet Akula (Kashmiri); Jatin Kulkarni, Salil Rajadhyaksha, Sanjeet Hegde Desai, Sharayu Shenoy, Shashank Shanbhag, Shashi Shenoy (Konkani); Ryan Michael, Terrence Taylor (Krio); Bokan Jaff, Medya Ghazizadeh, Roshna Omer Abdulrahman, Saman Vaisipour, Sarchia Khursheed (Kurdish (Sorani));Suphian Tweel (Libyan Arabic); Doudou Kisabaka (Lingala); Colleen Mallahan, John Quinn (Luganda); Cynthia Mboli (Luyia); Abhishek Kumar, Neeraj Mishra, Priyaranjan Jha, Saket Kumar, Snehal Bhilare (Maithili); Lisa Wang (Mandarin Chinese); Cibu Johny (Malayalam); Viresh Ratnakar (Marathi); Abhi Sanoujam, Gautam Thockchom, Pritam Pebam, Sam Chaomai, Shangkar Mayanglambam, Thangjam Hindustani Devi (Meiteilon (Manipuri)); Hala Ajil (Mesopotamian Arabic); Hamdanil Rasyid (Minangkabau); Elizabeth John, Remi Ralte, S Lallienkawl Gangte,Vaiphei Thatsing, Vanlalzami Vanlalzami (Mizo); George Ouais (MSA); Ahmed Kachkach, Hanaa El Azizi (Morrocan Arabic); Ujjwal Rajbhandari (Newari); Ebuka Ufere, Gabriel Fynecontry, Onome Ofoman, Titi Akinsanmi (Nigerian Pidgin); Marwa Khost Jarkas (North Levantine Arabic); Abduselam Shaltu, Ace Patterson, Adel Kassem, Mo Ali, Yonas Hambissa (Oromo); Helvia Taina, Marisol Necochea (Quechua); AbdelKarim Mardini (Saidi Arabic); Ishank Saxena, Manasa Harish, Manish Godara, Mayank Agrawal, Nitin Kashyap, Ranjani Padmanabhan, Ruchi Lohani, Shilpa Jindal, Shreevatsa Rajagopalan, Vaibhav Agarwal, Vinod Krishnan (Sanskrit); Nabil Shahid (Saraiki); Ayanda Mnyakeni (Sesotho, Sepedi); Landis Baker (Seychellois Creole); Taps Matangira (Shona); Ashraf Elsharif (Sudanese Arabic); Sakhile Dlamini (Swati); Hakim Sidahmed (Tamazight); Melvin Johnson (Tamil); Sneha Kudugunta (Telugu); Alexander Tekle, Bserat Ghebremicael, Nami Russom, Naud Ghebre (Tigrinya); Abigail Annkah, Diana Akron, Maame Ofori, Monica Opoku-Geren, Seth Duodu-baah, Yvonne Dumor (Twi); Ousmane Loum (Wolof); and Daniel Virtheim (Yiddish).

Source: Google AI Blog

Google I/O 2022: Advancing knowledge and computing


Nearly 24 years ago, Google started with two graduate students, one product, and a big mission: to organize the world’s information and make it universally accessible and useful. In the decades since, we’ve been developing our technology to deliver on that mission.

The progress we've made is because of our years of investment in advanced technologies, from AI to the technical infrastructure that powers it all. And once a year — on my favorite day of the year :) — we share an update on how it’s going at Google I/O.

Today, I talked about how we’re advancing two fundamental aspects of our mission — knowledge and computing — to create products that are built to help. It’s exciting to build these products; it’s even more exciting to see what people do with them.

Thank you to everyone who helps us do this work, and most especially our Googlers. We are grateful for the opportunity.

- Sundar

Editor’s note: Below is an edited transcript of Sundar Pichai's keynote address during the opening of today's Google I/O Developers Conference.

Hi, everyone, and welcome. Actually, let’s make that welcome back! It’s great to return to Shoreline Amphitheatre after three years away. To the thousands of developers, partners and Googlers here with us, it’s great to see all of you. And to the millions more joining us around the world — we’re so happy you’re here, too.

Last year, we shared how new breakthroughs in some of the most technically challenging areas of computer science are making Google products more helpful in the moments that matter. All this work is in service of our timeless mission: to organize the world's information and make it universally accessible and useful.

I'm excited to show you how we’re driving that mission forward in two key ways: by deepening our understanding of information so that we can turn it into knowledge; and advancing the state of computing, so that knowledge is easier to access, no matter who or where you are.

Today, you'll see how progress on these two parts of our mission ensures Google products are built to help. I’ll start with a few quick examples. Throughout the pandemic, Google has focused on delivering accurate information to help people stay healthy. Over the last year, people used Google Search and Maps to find where they could get a COVID vaccine nearly two billion times.

A visualization of Google’s flood forecasting system, with three 3D maps stacked on top of one another, showing landscapes and weather patterns in green and brown colors. The maps are floating against a gray background.

Google’s flood forecasting technology sent flood alerts to 23 million people in India and Bangladesh last year.

We’ve also expanded our flood forecasting technology to help people stay safe in the face of natural disasters. During last year’s monsoon season, our flood alerts notified more than 23 million people in India and Bangladesh. And we estimate this supported the timely evacuation of hundreds of thousands of people.

In Ukraine, we worked with the government to rapidly deploy air raid alerts. To date, we’ve delivered hundreds of millions of alerts to help people get to safety. In March I was in Poland, where millions of Ukrainians have sought refuge. Warsaw’s population has increased by nearly 20% as families host refugees in their homes, and schools welcome thousands of new students. Nearly every Google employee I spoke with there was hosting someone.

Adding 24 more languages to Google Translate

In countries around the world, Google Translate has been a crucial tool for newcomers and residents trying to communicate with one another. We’re proud of how it’s helping Ukrainians find a bit of hope and connection until they are able to return home again.

Two boxes, one showing a question in English — “What’s the weather like today?” — the other showing its translation in Quechua. There is a microphone symbol below the English question and a loudspeaker symbol below the Quechua answer.

With machine learning advances, we're able to add languages like Quechua to Google Translate.

Real-time translation is a testament to how knowledge and computing come together to make people's lives better. More people are using Google Translate than ever before, but we still have work to do to make it universally accessible. There’s a long tail of languages that are underrepresented on the web today, and translating them is a hard technical problem. That’s because translation models are usually trained with bilingual text — for example, the same phrase in both English and Spanish. However, there's not enough publicly available bilingual text for every language.

So with advances in machine learning, we’ve developed a monolingual approach where the model learns to translate a new language without ever seeing a direct translation of it. By collaborating with native speakers and institutions, we found these translations were of sufficient quality to be useful, and we'll continue to improve them.

A list of the 24 new languages Google Translate now has available.

We’re adding 24 new languages to Google Translate.

Today, I’m excited to announce that we’re adding 24 new languages to Google Translate, including the first indigenous languages of the Americas. Together, these languages are spoken by more than 300 million people. Breakthroughs like this are powering a radical shift in how we access knowledge and use computers.

Taking Google Maps to the next level

So much of what’s knowable about our world goes beyond language — it’s in the physical and geospatial information all around us. For more than 15 years, Google Maps has worked to create rich and useful representations of this information to help us navigate. Advances in AI are taking this work to the next level, whether it’s expanding our coverage to remote areas, or reimagining how to explore the world in more intuitive ways.

An overhead image of a map of a dense urban area, showing gray roads cutting through clusters of buildings outlined in blue.

Advances in AI are helping to map remote and rural areas.

Around the world, we’ve mapped around 1.6 billion buildings and over 60 million kilometers of roads to date. Some remote and rural areas have previously been difficult to map, due to scarcity of high-quality imagery and distinct building types and terrain. To address this, we’re using computer vision and neural networks to detect buildings at scale from satellite images. As a result, we have increased the number of buildings on Google Maps in Africa by 5X since July 2020, from 60 million to nearly 300 million.

We’ve also doubled the number of buildings mapped in India and Indonesia this year. Globally, over 20% of the buildings on Google Maps have been detected using these new techniques. We’ve gone a step further, and made the dataset of buildings in Africa publicly available. International organizations like the United Nations and the World Bank are already using it to better understand population density, and to provide support and emergency assistance.

Immersive view in Google Maps fuses together aerial and street level images.

We’re also bringing new capabilities into Maps. Using advances in 3D mapping and machine learning, we’re fusing billions of aerial and street level images to create a new, high-fidelity representation of a place. These breakthrough technologies are coming together to power a new experience in Maps called immersive view: it allows you to explore a place like never before.

Let’s go to London and take a look. Say you’re planning to visit Westminster with your family. You can get into this immersive view straight from Maps on your phone, and you can pan around the sights… here’s Westminster Abbey. If you’re thinking of heading to Big Ben, you can check if there's traffic, how busy it is, and even see the weather forecast. And if you’re looking to grab a bite during your visit, you can check out restaurants nearby and get a glimpse inside.

What's amazing is that isn't a drone flying in the restaurant — we use neural rendering to create the experience from images alone. And Google Cloud Immersive Stream allows this experience to run on just about any smartphone. This feature will start rolling out in Google Maps for select cities globally later this year.

Another big improvement to Maps is eco-friendly routing. Launched last year, it shows you the most fuel-efficient route, giving you the choice to save money on gas and reduce carbon emissions. Eco-friendly routes have already rolled out in the U.S. and Canada — and people have used them to travel approximately 86 billion miles, helping save an estimated half million metric tons of carbon emissions, the equivalent of taking 100,000 cars off the road.

Still image of eco-friendly routing on Google Maps — a 53-minute driving route in Berlin is pictured, with text below the map showing it will add three minutes but save 18% more fuel.

Eco-friendly routes will expand to Europe later this year.

I’m happy to share that we’re expanding this feature to more places, including Europe later this year. In this Berlin example, you could reduce your fuel consumption by 18% taking a route that’s just three minutes slower. These small decisions have a big impact at scale. With the expansion into Europe and beyond, we estimate carbon emission savings will double by the end of the year.

And we’ve added a similar feature to Google Flights. When you search for flights between two cities, we also show you carbon emission estimates alongside other information like price and schedule, making it easy to choose a greener option. These eco-friendly features in Maps and Flights are part of our goal to empower 1 billion people to make more sustainable choices through our products, and we’re excited about the progress here.

New YouTube features to help people easily access video content

Beyond Maps, video is becoming an even more fundamental part of how we share information, communicate, and learn. Often when you come to YouTube, you are looking for a specific moment in a video and we want to help you get there faster.

Last year we launched auto-generated chapters to make it easier to jump to the part you’re most interested in.

This is also great for creators because it saves them time making chapters. We’re now applying multimodal technology from DeepMind. It simultaneously uses text, audio and video to auto-generate chapters with greater accuracy and speed. With this, we now have a goal to 10X the number of videos with auto-generated chapters, from eight million today, to 80 million over the next year.

Often the fastest way to get a sense of a video’s content is to read its transcript, so we’re also using speech recognition models to transcribe videos. Video transcripts are now available to all Android and iOS users.

Animation showing a video being automatically translated. Then text reads "Now available in sixteen languages."

Auto-translated captions on YouTube.

Next up, we’re bringing auto-translated captions on YouTube to mobile. Which means viewers can now auto-translate video captions in 16 languages, and creators can grow their global audience. We’ll also be expanding auto-translated captions to Ukrainian YouTube content next month, part of our larger effort to increase access to accurate information about the war.

Helping people be more efficient with Google Workspace

Just as we’re using AI to improve features in YouTube, we’re building it into our Workspace products to help people be more efficient. Whether you work for a small business or a large institution, chances are you spend a lot of time reading documents. Maybe you’ve felt that wave of panic when you realize you have a 25-page document to read ahead of a meeting that starts in five minutes.

At Google, whenever I get a long document or email, I look for a TL;DR at the top — TL;DR is short for “Too Long, Didn’t Read.” And it got us thinking, wouldn’t life be better if more things had a TL;DR?

That’s why we’ve introduced automated summarization for Google Docs. Using one of our machine learning models for text summarization, Google Docs will automatically parse the words and pull out the main points.

This marks a big leap forward for natural language processing. Summarization requires understanding of long passages, information compression and language generation, which used to be outside of the capabilities of even the best machine learning models.

And docs are only the beginning. We’re launching summarization for other products in Workspace. It will come to Google Chat in the next few months, providing a helpful digest of chat conversations, so you can jump right into a group chat or look back at the key highlights.

Animation showing summary in Google Chat

We’re bringing summarization to Google Chat in the coming months.

And we’re working to bring transcription and summarization to Google Meet as well so you can catch up on some important meetings you missed.

Visual improvements on Google Meet

Of course there are many moments where you really want to be in a virtual room with someone. And that’s why we continue to improve audio and video quality, inspired by Project Starline. We introduced Project Starline at I/O last year. And we’ve been testing it across Google offices to get feedback and improve the technology for the future. And in the process, we’ve learned some things that we can apply right now to Google Meet.

Starline inspired machine learning-powered image processing to automatically improve your image quality in Google Meet. And it works on all types of devices so you look your best wherever you are.

An animation of a man looking directly at the camera then waving and smiling. A white line sweeps across the screen, adjusting the image quality to make it brighter and clearer.

Machine learning-powered image processing automatically improves image quality in Google Meet.

We’re also bringing studio quality virtual lighting to Meet. You can adjust the light position and brightness, so you’ll still be visible in a dark room or sitting in front of a window. We’re testing this feature to ensure everyone looks like their true selves, continuing the work we’ve done with Real Tone on Pixel phones and the Monk Scale.

These are just some of the ways AI is improving our products: making them more helpful, more accessible, and delivering innovative new features for everyone.

Gif shows a phone camera pointed towards a rack of shelves, generating helpful information about food items. Text on the screen shows the words ‘dark’, ‘nut-free’ and ‘highly-rated’.

Today at I/O Prabhakar Raghavan shared how we’re helping people find helpful information in more intuitive ways on Search.

Making knowledge accessible through computing

We’ve talked about how we’re advancing access to knowledge as part of our mission: from better language translation to improved Search experiences across images and video, to richer explorations of the world using Maps.

Now we’re going to focus on how we make that knowledge even more accessible through computing. The journey we’ve been on with computing is an exciting one. Every shift, from desktop to the web to mobile to wearables and ambient computing has made knowledge more useful in our daily lives.

As helpful as our devices are, we’ve had to work pretty hard to adapt to them. I’ve always thought computers should be adapting to people, not the other way around. We continue to push ourselves to make progress here.

Here’s how we’re making computing more natural and intuitive with the Google Assistant.

Introducing LaMDA 2 and AI Test Kitchen

Animation shows demos of how LaMDA can converse on any topic and how AI Test Kitchen can help create lists.

A demo of LaMDA, our generative language model for dialogue application, and the AI Test Kitchen.

We're continually working to advance our conversational capabilities. Conversation and natural language processing are powerful ways to make computers more accessible to everyone. And large language models are key to this.

Last year, we introduced LaMDA, our generative language model for dialogue applications that can converse on any topic. Today, we are excited to announce LaMDA 2, our most advanced conversational AI yet.

We are at the beginning of a journey to make models like these useful to people, and we feel a deep responsibility to get it right. To make progress, we need people to experience the technology and provide feedback. We opened LaMDA up to thousands of Googlers, who enjoyed testing it and seeing its capabilities. This yielded significant quality improvements, and led to a reduction in inaccurate or offensive responses.

That’s why we’ve made AI Test Kitchen. It’s a new way to explore AI features with a broader audience. Inside the AI Test Kitchen, there are a few different experiences. Each is meant to give you a sense of what it might be like to have LaMDA in your hands and use it for things you care about.

The first is called “Imagine it.” This demo tests if the model can take a creative idea you give it, and generate imaginative and relevant descriptions. These are not products, they are quick sketches that allow us to explore what LaMDA can do with you. The user interfaces are very simple.

Say you’re writing a story and need some inspirational ideas. Maybe one of your characters is exploring the deep ocean. You can ask what that might feel like. Here LaMDA describes a scene in the Mariana Trench. It even generates follow-up questions on the fly. You can ask LaMDA to imagine what kinds of creatures might live there. Remember, we didn’t hand-program the model for specific topics like submarines or bioluminescence. It synthesized these concepts from its training data. That’s why you can ask about almost any topic: Saturn’s rings or even being on a planet made of ice cream.

Staying on topic is a challenge for language models. Say you’re building a learning experience — you want it to be open-ended enough to allow people to explore where curiosity takes them, but stay safely on topic. Our second demo tests how LaMDA does with that.

In this demo, we’ve primed the model to focus on the topic of dogs. It starts by generating a question to spark conversation, “Have you ever wondered why dogs love to play fetch so much?” And if you ask a follow-up question, you get an answer with some relevant details: it’s interesting, it thinks it might have something to do with the sense of smell and treasure hunting.

You can take the conversation anywhere you want. Maybe you’re curious about how smell works and you want to dive deeper. You’ll get a unique response for that too. No matter what you ask, it will try to keep the conversation on the topic of dogs. If I start asking about cricket, which I probably would, the model brings the topic back to dogs in a fun way.

This challenge of staying on-topic is a tricky one, and it’s an important area of research for building useful applications with language models.

These experiences show the potential of language models to one day help us with things like planning, learning about the world, and more.

Of course, there are significant challenges to solve before these models can truly be useful. While we have improved safety, the model might still generate inaccurate, inappropriate, or offensive responses. That’s why we are inviting feedback in the app, so people can help report problems.

We will be doing all of this work in accordance with our AI Principles. Our process will be iterative, opening up access over the coming months, and carefully assessing feedback with a broad range of stakeholders — from AI researchers and social scientists to human rights experts. We’ll incorporate this feedback into future versions of LaMDA, and share our findings as we go.

Over time, we intend to continue adding other emerging areas of AI into AI Test Kitchen. You can learn more at: g.co/AITestKitchen.

Advancing AI language models

LaMDA 2 has incredible conversational capabilities. To explore other aspects of natural language processing and AI, we recently announced a new model. It’s called Pathways Language Model, or PaLM for short. It’s our largest model to date and trained on 540 billion parameters.

PaLM demonstrates breakthrough performance on many natural language processing tasks, such as generating code from text, answering a math word problem, or even explaining a joke.

It achieves this through greater scale. And when we combine that scale with a new technique called chain-of- thought prompting, the results are promising. Chain-of-thought prompting allows us to describe multi-step problems as a series of intermediate steps.

Let’s take an example of a math word problem that requires reasoning. Normally, how you use a model is you prompt it with a question and answer, and then you start asking questions. In this case: How many hours are in the month of May? So you can see, the model didn’t quite get it right.

In chain-of-thought prompting, we give the model a question-answer pair, but this time, an explanation of how the answer was derived. Kind of like when your teacher gives you a step-by-step example to help you understand how to solve a problem. Now, if we ask the model again — how many hours are in the month of May — or other related questions, it actually answers correctly and even shows its work.

There are two boxes below a heading saying ‘chain-of-thought prompting’. A box headed ‘input’ guides the model through answering a question about how many tennis balls a person called Roger has. The output box shows the model correctly reasoning through and answering a separate question (‘how many hours are in the month of May?’)

Chain-of-thought prompting leads to better reasoning and more accurate answers.

Chain-of-thought prompting increases accuracy by a large margin. This leads to state-of-the-art performance across several reasoning benchmarks, including math word problems. And we can do it all without ever changing how the model is trained.

PaLM is highly capable and can do so much more. For example, you might be someone who speaks a language that’s not well-represented on the web today — which makes it hard to find information. Even more frustrating because the answer you are looking for is probably out there. PaLM offers a new approach that holds enormous promise for making knowledge more accessible for everyone.

Let me show you an example in which we can help answer questions in a language like Bengali — spoken by a quarter billion people. Just like before we prompt the model with two examples of questions in Bengali with both Bengali and English answers.

That’s it, now we can start asking questions in Bengali: “What is the national song of Bangladesh?” The answer, by the way, is “Amar Sonar Bangla” — and PaLM got it right, too. This is not that surprising because you would expect that content to exist in Bengali.

You can also try something that is less likely to have related information in Bengali such as: “What are popular pizza toppings in New York City?” The model again answers correctly in Bengali. Though it probably just stirred up a debate amongst New Yorkers about how “correct” that answer really is.

What’s so impressive is that PaLM has never seen parallel sentences between Bengali and English. Nor was it ever explicitly taught to answer questions or translate at all! The model brought all of its capabilities together to answer questions correctly in Bengali. And we can extend the techniques to more languages and other complex tasks.

We're so optimistic about the potential for language models. One day, we hope we can answer questions on more topics in any language you speak, making knowledge even more accessible, in Search and across all of Google.

Introducing the world’s largest, publicly available machine learning hub

The advances we’ve shared today are possible only because of our continued innovation in our infrastructure. Recently we announced plans to invest $9.5 billion in data centers and offices across the U.S.

One of our state-of-the-art data centers is in Mayes County, Oklahoma. I’m excited to announce that, there, we are launching the world’s largest, publicly-available machine learning hub for our Google Cloud customers.

Still image of a data center with Oklahoma map pin on bottom left corner.

One of our state-of-the-art data centers in Mayes County, Oklahoma.

This machine learning hub has eight Cloud TPU v4 pods, custom-built on the same networking infrastructure that powers Google’s largest neural models. They provide nearly nine exaflops of computing power in aggregate — bringing our customers an unprecedented ability to run complex models and workloads. We hope this will fuel innovation across many fields, from medicine to logistics, sustainability and more.

And speaking of sustainability, this machine learning hub is already operating at 90% carbon-free energy. This is helping us make progress on our goal to become the first major company to operate all of our data centers and campuses globally on 24/7 carbon-free energy by 2030.

Even as we invest in our data centers, we are working to innovate on our mobile platforms so more processing can happen locally on device. Google Tensor, our custom system on a chip, was an important step in this direction. It’s already running on Pixel 6 and Pixel 6 Pro, and it brings our AI capabilities — including the best speech recognition we’ve ever deployed — right to your phone. It’s also a big step forward in making those devices more secure. Combined with Android’s Private Compute Core, it can run data-powered features directly on device so that it’s private to you.

People turn to our products every day for help in moments big and small. Core to making this possible is protecting your private information each step of the way. Even as technology grows increasingly complex, we keep more people safe online than anyone else in the world, with products that are secure by default, private by design and that put you in control.

We also spent time today sharing updates to platforms like Android. They’re delivering access, connectivity, and information to billions of people through their smartphones and other connected devices like TVs, cars and watches.

And we shared our new Pixel Portfolio, including the Pixel 6a, Pixel Buds Pro, Google Pixel Watch, Pixel 7, and Pixel tablet all built with ambient computing in mind. We’re excited to share a family of devices that work better together — for you.

The next frontier of computing: augmented reality

Today we talked about all the technologies that are changing how we use computers and access knowledge. We see devices working seamlessly together, exactly when and where you need them and with conversational interfaces that make it easier to get things done.

Looking ahead, there's a new frontier of computing, which has the potential to extend all of this even further, and that is augmented reality. At Google, we have been heavily invested in this area. We’ve been building augmented reality into many Google products, from Google Lens to multisearch, scene exploration, and Live and immersive views in Maps.

These AR capabilities are already useful on phones and the magic will really come alive when you can use them in the real world without the technology getting in the way.

That potential is what gets us most excited about AR: the ability to spend time focusing on what matters in the real world, in our real lives. Because the real world is pretty amazing!

It’s important we design in a way that is built for the real world — and doesn’t take you away from it. And AR gives us new ways to accomplish this.

Let’s take language as an example. Language is just so fundamental to connecting with one another. And yet, understanding someone who speaks a different language, or trying to follow a conversation if you are deaf or hard of hearing can be a real challenge. Let's see what happens when we take our advancements in translation and transcription and deliver them in your line of sight in one of the early prototypes we’ve been testing.

You can see it in their faces: the joy that comes with speaking naturally to someone. That moment of connection. To understand and be understood. That’s what our focus on knowledge and computing is all about. And it’s what we strive for every day, with products that are built to help.

Each year we get a little closer to delivering on our timeless mission. And we still have so much further to go. At Google, we genuinely feel a sense of excitement about that. And we are optimistic that the breakthroughs you just saw will help us get there. Thank you to all of the developers, partners and customers who joined us today. We look forward to building the future with all of you.

Understanding the world through language

Language is at the heart of how people communicate with each other. It’s also proving to be powerful in advancing AI and building helpful experiences for people worldwide.

From the beginning, we set out to connect words in your search to words on a page so we could make the web’s information more accessible and useful. Over 20 years later, as the web changes, and the ways people consume information expand from text to images to videos and more — the one constant is that language remains a surprisingly powerful tool for understanding information.

In recent years, we’ve seen an incredible acceleration in the field of natural language understanding. While our systems still don’t understand language the way people do, they’re increasingly able to spot patterns in information, identify complex concepts and even draw implicit connections between them. We’re even finding that many of our advanced models can understand information across languages or in non-language-based formats like images and videos.

Building the next generation of language models

In 2017, Google researchers developed the Transformer, the neural network that underlies major advancements like MUM and LaMDA. Last year, we shared our thinking on a new architecture called Pathways, which is loosely inspired by the sparse patterns of neural activity in the brain. When you read a blog post like this one, only the critical parts of your brain needed to process this information fire up — not every single neuron. With Pathways, we’re now able to train AI models to be similarly effective.

Using this system, we recently introduced PaLM, a new model that achieves state-of-the-art performance on challenging language modeling tasks. It can solve complex math word problems, and answer questions in new languages with very little additional training data.

PaLM also shows improvements in understanding and expressing logic. This is significant because it allows the model to express its reasoning through words. Remember your algebra problem sets? It wasn’t enough to just get the right answer — you had to explain how you got there. PaLM is able to prompt a “Chain of Thought” to explain its thought process, step-by-step. This emerging capability helps improve accuracy and our understanding of how a model arrives at answers.

Flow chart for the difference between "Standard Prompting" and "Chain of Thought Prompting"

Translating the languages of the world

Pathways-related models are enabling us to break down language barriers in a way never before possible. Nowhere is this clearer than in our recently added support for 24 new languages in Google Translate, spoken by over 300 million people worldwide — including the first indigenous languages of the Americas. The amazing part is that the neural model did this using only monolingual text with no translation pairs — which allows us to help communities and languages underrepresented by technology. Machine translation at this level helps the world feel a bit smaller, while allowing us to dream bigger.

Unlocking knowledge about the world across modalities

Today, people consume information through webpages, images, videos, and more. Our advanced language and Pathways-related models are learning to make sense of information stemming from these different modalities through language. With these multimodal capabilities, we’re expanding multisearch in the Google app so you can search more naturally than ever before. As the saying goes — “a picture is worth a thousand words” — it turns out, words are really the key to sharing information about the world.

"Scene exploration" GIF of a store shelf demonstrating multisearch

Improving conversational AI

Despite these advancements, human language continues to be one of the most complex undertakings for computers.

In everyday conversation, we all naturally say “um,” pause to find the right words, or correct ourselves — and yet other people have no trouble understanding what we’re saying. That’s because people can react to conversational cues in as little as 200 milliseconds. Moving our speech model from data centers to run on the device made things faster, but we wanted to push the envelope even more.

Computers aren’t there yet — so we’re introducing improvements to responsiveness on the Assistant with unified neural networks, combining many models into smarter ones capable of understanding more — like when someone pauses but is not finished speaking. Getting closer to the fluidity of real-time conversation is finally possible with Google's Tensor chip, which is custom-engineered to handle on-device machine learning tasks super fast.

We’re also investing in building models that are capable of carrying more natural, sensible and specific conversations. Since introducing LaMDA to the world last year, we’ve made great progress, improving the model in key areas of quality, safety and groundedness — areas where we know conversational AI models can struggle. We’ll be releasing the next iteration, LaMDA 2, as a part of the AI Test Kitchen, which we’ll be opening up to small groups of people gradually. Our goal with AI Test Kitchen is to learn, improve, and innovate responsibly on this technology together. It’s still early days for LaMDA, but we want to continue to make progress and do so responsibly with feedback from the community.

GIF showing LaMDA 2 on device

Responsible development of AI models

While language is a remarkably powerful and versatile tool for understanding the world around us, we also know it comes with its limitations and challenges. In 2018, we published our AI Principles as guidelines to help us avoid bias, test rigorously for safety, design with privacy top of mind and make technology accountable to people. We’re investing in research across disciplines to understand the types of harms language models can affect, and to develop the frameworks and methods to ensure we bring in a diversity of perspectives and make meaningful improvements. We also build and use tools that can help us better understand our models (e.g., identifying how different words affect a prediction, tracing an error back to training data and even measuring correlations within a model). And while we work to improve underlying models, we also test rigorously before and after any kind of product deployment.

We’ve come a long way since introducing the world to the Transformer. We’re proud of the tremendous value that it and its predecessors have brought not only to everyday Google products like Search and Translate, but also the breakthroughs they’ve powered in natural language understanding. Our work advancing the future of AI is driven by something as old as time: the power language has to bring people together.

Immersive view coming soon to Maps — plus more updates

Google Maps helps over one billion people navigate and explore. And over the past few years, our investments in AI have supercharged the ability to bring you the most helpful information about the real world, including when a business is open and how crowded your bus is. Today at Google I/O, we announced new ways the latest advancements in AI are transforming Google Maps — helping you explore with an all-new immersive view of the world, find the most fuel-efficient route, and use the magic of Live View in your favorite third-party apps.

A more immersive, intuitive map

Google Maps first launched to help people navigate to their destinations. Since then, it’s evolved to become much more — it’s a handy companion when you need to find the perfect restaurant or get information about a local business. Today — thanks to advances in computer vision and AI that allow us to fuse together billions of Street View and aerial images to create a rich, digital model of the world — we’re introducing a whole new way to explore with Maps. With our new immersive view, you’ll be able to experience what a neighborhood, landmark, restaurant or popular venue is like — and even feel like you’re right there before you ever set foot inside. So whether you’re traveling somewhere new or scoping out hidden local gems, immersive view will help you make the most informed decisions before you go.

Say you’re planning a trip to London and want to figure out the best sights to see and places to eat. With a quick search, you can virtually soar over Westminster to see the neighborhood and stunning architecture of places, like Big Ben, up close. With Google Maps’ helpful information layered on top, you can use the time slider to check out what the area looks like at different times of day and in various weather conditions, and see where the busy spots are. Looking for a spot for lunch? Glide down to street level to explore nearby restaurants and see helpful information, like live busyness and nearby traffic. You can even look inside them to quickly get a feel for the vibe of the place before you book your reservation.

The best part? Immersive view will work on just about any phone and device. It starts rolling out in Los Angeles, London, New York, San Francisco and Tokyo later this year with more cities coming soon.

Immersive view lets you explore and understand the vibe of a place before you go

An update on eco-friendly routing

In addition to making places easier to explore, we want to help you get there more sustainably. We recently launched eco-friendly routing in the U.S. and Canada, which lets you see and choose the most fuel-efficient route when looking for driving directions — helping you save money on gas. Since then, people have used it to travel 86 billion miles, saving more than an estimated half a million metric tons of carbon emissions — equivalent to taking 100,000 cars off the road. We’re on track to double this amount as we expand to more places, like Europe.

Still image of eco-friendly routing on Google Maps

Eco-friendly routing has helped save more than an estimated half a million metric tons of carbon emissions

The magic of Live View — now in your favorite apps

Live View helps you find your way when walking around, using AR to display helpful arrows and directions right on top of your world. It's especially helpful when navigating tricky indoor areas, like airports, malls and train stations. Thanks to our AI-based technology called global localization, Google Maps can point you where you need to go in a matter of seconds. As part of our efforts to bring the helpfulness of Google Maps to more places, we’re now making this technology available to developers at no cost with the new ARCore Geospatial API.

Developers are already using the API to make apps that are even more useful and provide an easy way to interact with both the digital and physical worlds at once. Shared electric vehicle company Lime is piloting the API in London, Paris, Tel Aviv, Madrid, San Diego, and Bordeaux to help riders park their e-bikes and e-scooters responsibly and out of pedestrians’ right of way. Telstra and Accenture are using it to help sports fans and concertgoers find their seats, concession stands and restrooms at Marvel Stadium in Melbourne. DOCOMO and Curiosity are building a new game that lets you fend off virtual dragons with robot companions in front of iconic Tokyo landmarks, like the Tokyo Tower. The new Geospatial API is available now to ARCore developers, wherever Street View is available.

DOCOMO and Curiosity game showing an AR dragon, alien and spaceship interacting on top of a real-world image, powered by the ARCore Geospatial API.

Live View technology is now available to ARCore developers around the world

AI will continue to play a critical role in making Google Maps the most comprehensive and helpful map possible for people everywhere.

Google Translate learns 24 new languages

For years, Google Translate has helped break down language barriers and connect communities all over the world. And we want to make this possible for even more people — especially those whose languages aren’t represented in most technology. So today we’ve added 24 languages to Translate, now supporting a total of 133 used around the globe.

Over 300 million people speak these newly added languages — like Mizo, used by around 800,000 people in the far northeast of India, and Lingala, used by over 45 million people across Central Africa. As part of this update, Indigenous languages of the Americas (Quechua, Guarani and Aymara) and an English dialect (Sierra Leonean Krio) have also been added to Translate for the first time.

The Google Translate bar translates the phrase "Our mission: to enable everyone, everywhere to understand the world and express themselves across languages" into different languages.

Translate's mission translated into some of our newly added languages

Here’s a complete list of the new languages now available in Google Translate:

  • Assamese, used by about 25 million people in Northeast India
  • Aymara, used by about two million people in Bolivia, Chile and Peru
  • Bambara, used by about 14 million people in Mali
  • Bhojpuri, used by about 50 million people in northern India, Nepal and Fiji
  • Dhivehi, used by about 300,000 people in the Maldives
  • Dogri, used by about three million people in northern India
  • Ewe, used by about seven million people in Ghana and Togo
  • Guarani, used by about seven million people in Paraguay and Bolivia, Argentina and Brazil
  • Ilocano, used by about 10 million people in northern Philippines
  • Konkani, used by about two million people in Central India
  • Krio, used by about four million people in Sierra Leone
  • Kurdish (Sorani), used by about eight million people, mostly in Iraq
  • Lingala, used by about 45 million people in the Democratic Republic of the Congo, Republic of the Congo, Central African Republic, Angola and the Republic of South Sudan
  • Luganda, used by about 20 million people in Uganda and Rwanda
  • Maithili, used by about 34 million people in northern India
  • Meiteilon (Manipuri), used by about two million people in Northeast India
  • Mizo, used by about 830,000 people in Northeast India
  • Oromo, used by about 37 million people in Ethiopia and Kenya
  • Quechua, used by about 10 million people in Peru, Bolivia, Ecuador and surrounding countries
  • Sanskrit, used by about 20,000 people in India
  • Sepedi, used by about 14 million people in South Africa
  • Tigrinya, used by about eight million people in Eritrea and Ethiopia
  • Tsonga, used by about seven million people in Eswatini, Mozambique, South Africa and Zimbabwe
  • Twi, used by about 11 million people in Ghana

This is also a technical milestone for Google Translate. These are the first languages we’ve added using Zero-Shot Machine Translation, where a machine learning model only sees monolingual text — meaning, it learns to translate into another language without ever seeing an example. While this technology is impressive, it isn't perfect. And we’ll keep improving these models to deliver the same experience you’re used to with a Spanish or German translation, for example. If you want to dig into the technical details, check out our Google AI blog post and research paper.

We’re grateful to the many native speakers, professors and linguists who worked with us on this latest update and kept us inspired with their passion and enthusiasm. If you want to help us support your language in a future update, contribute evaluations or translations through Translate Contribute.

Coral, Google’s platform for Edge AI, chooses ASUS as OEM partner for global scale

We launched Coral in 2019 with a mission to make edge AI powerful, private, and efficient, and also accessible to a wide variety of customers with affordable tools that reliably go from prototype to production. In these first few years, we’ve seen a strong growth in demand for our products across industries and geographies, and with that, a growing need for worldwide availability and support.

That’s why we're pleased to announce that we have signed an agreement with ASUS IoT, to help scale our manufacturing, distribution and support. With decades of experience in electronics manufacturing at a global scale, ASUS IoT will provide Coral with the resources to meet our growth demands while we continue to develop new products for edge computing.

ASUS IoT is a sub-brand of ASUS dedicated to the creation of solutions in the fields of AI and the internet of things (IoT). Their mission is to become a trusted provider of embedded systems and the wider AI and IoT ecosystem. ASUS IoT strives to deliver best-in-class products and services across diverse vertical markets, and to partner with customers in the development of fully-integrated and rapid-to-market applications that drive efficiency – providing convenient, efficient, and secure living and working environments for people everywhere.

ASUS IoT already has a long-standing history of collaboration with Coral, being the first partner to release a product using the Coral SoM when they launched the Tinker Edge T development board. ASUS IoT has also integrated Coral accelerators into their enterprise class intelligent edge computers and was the first to release a multi Edge TPU device with the award winning AI Accelerator PCIe Card. Because we have this history of collaboration, we know they share our strong commitment to new innovation in edge computing.

ASUS IoT also has an established manufacturing and distribution processes, and a strong reputation in enterprise-level sales and support. So we're excited to work with them to enable scale and long-term availability for Coral products.

With this agreement, the Coral brand and user experience will not change, as Google will maintain ownership of the brand and product portfolio. The Coral team will continue to work with our customers on partnership initiatives and case studies through our Coral Partnership Program. Those interested in joining our partner ecosystem can visit our website to learn more and apply.

Coral.ai will remain the home for all product information and documentation, and in the coming months ASUS IoT will become the primary channel for sales, distribution and support. With this partnership, our customers will gain access to dedicated teams for sales and technical support managed by ASUS IoT.

ASUS IoT will be working to expand the distribution network to make Coral available in more countries. Distributors interested in carrying Coral products will be able to contact ASUS IoT for consideration.

We continue to be impressed by the innovative ways in which our customers use Coral to explore new AI-driven solutions. And now with ASUS IoT bringing expanded sales, support and resources for long-term availability, our Coral team will continue to focus on building the next generation of privacy-preserving features and tools for neural computing at the edge.

We look forward to the continued growth of the Coral platform as it flourishes and we are excited to have ASUS IoT join us on our journey.