SkyWater and Google expand open source program to new 90nm technology

Today, Google is announcing the expansion of our partnership with SkyWater Technology. We are working together to release an open source process design kit (PDK) for SKY90-FD, SkyWater’s commercial 90nm fully depleted silicon-on-insulator (FDSOI) CMOS process technology. SKY90-FD is based on MIT Lincoln Laboratory’s 90nm commercial FDSOI technology, and enables designers to create complex integrated circuits for a diverse range of applications.

Over the last two years, Google and SkyWater Technology have partnered to make building open silicon accessible to all developers, starting with the open source release of the SKY130 PDK and continuing with a series of no-cost manufacturing shuttles for developers in the open source hardware ecosystem. To date, Google has sponsored six shuttles on the Efabless platform, manufacturing 240 designs selected from 364 community submissions. This is the first partnership of its type ever launched, and the results to date have been impressive.
The latest MPW-6 shuttle received 90 submissions from a diverse community across 24 different countries.

Over the coming months, we'll work closely with SkyWater Technology to release their new SKY90-FD PDK under the Apache 2.0 license and organize additional Open MPW shuttles to manufacture open source designs for this new 90nm FDSOI technology, through the Efabless platform.

We believe that having access to different technologies through open source PDKs is critical to grow and strengthen the open silicon ecosystem:
  • Developers can go beyond the constraints of their familiar process nodes and explore different performance, power and area trade offs with existing or new designs.
  • Researchers can reproduce their research on different technologies to produce diverse figures of merit.
  • Tool maintainers can generalize their tools' backends to support more than one process.
  • The community can refine the ways we structure, distribute and maintain these PDKs.
SKY90-FD is a 90nm FDSOI process. Unlike a traditional bulk CMOS process, SKY90-FD features a thin layer of insulating material between the substrate and the upper silicon layer. This buried-oxide construction allows the transistor's active silicon layer to be significantly thinner than in a bulk process, so the device can be “fully depleted,” which simplifies the fabrication process. The extra insulation greatly reduces parasitic leakage current and lowers junction capacitances, providing improved speed and power performance under various environmental conditions.
The SKY90-FD process stack features five thin copper base metal layers for the main interconnect and two extra-thick aluminum (Al) metal layers capable of carrying higher currents.
Google is excited about the new range of applications this open source PDK will enable once it's released later this year, and we can't wait to hear from you and watch the growing stream of innovative project ideas originating from the open silicon community.

In the meantime, make sure to check https://developers.google.com/silicon for resources and pointers to start your open silicon journey!


By Johan Euphrosine and Ethan Mahintorabi – Hardware Toolchains Team

New ways to diversify your games revenue

With more than 3 billion people playing games across platforms, the games industry continues to evolve rapidly. Still, one thing remains unchanged: developers need to grow revenue and profitability from their mobile games for long-term success. This week, at the Think with Google Gaming Day in China, we shared new ways to help developers like you earn more revenue and attract high-value players.

Strengthen your monetization strategies

The right metrics can make a huge difference to your game’s success by enhancing transparency and clarity in your ad performance. AdMob’s updated Ads Activity report contains new measurement dimensions to help you do just that. Easily analyze earnings, including those from third-party ad sources, with dimensions like “hour of delivery,” “app version” or “ad source.” Publishers can also better monitor and understand the impact of privacy changes on revenue with report dimensions indicating publisher and user response to the iOS privacy framework.

The Ads Activity report contains new dimensions to help you understand your ads performance

Along with the Ads Activity report, we announced more features to help you diversify and grow your revenue for the long-term:

  • Google Mobile Ads Software Developer Kit (GMA SDK): Implement the latest GMA SDK version to stay updated on new feature releases such as the same app key that delivers more relevant and personalized ads for your apps on iOS.
  • H5 Games Ads (beta): Grow your earnings by easily showing interstitial and rewarded ads in your HTML5 (H5) games today.
  • New bidding partner: Access demand from Pangle, now available on AdMob in addition to more than 200 demand partners competing in real-time for your inventory.

Drive deeper engagement and revenue performance

To drive sustainable growth for your game, you’ll need more than just a strong monetization strategy. It’s also important to have the right tools to effectively attract quality players. Now, with the ability to add an audience signal to your Android App campaigns, we’re making this even easier. You’ll be able to use your existing knowledge of the types of players your campaigns are most likely to succeed with to guide our models toward similar new players who are more likely to convert. This will be available in beta in the coming months.

Add an audience signal to help you find new players who are more likely to convert

As the industry moves away from individual identifiers like device IDs, measuring your campaign performance accurately — along with acting on your conversion data — is critical. That’s why earlier this year, we introduced on-device conversion measurement. With on-device conversion measurement, user interactions with app ads can be matched to app conversions in a way that prevents user-identifying information from leaving a user's device. This helps you to prioritize privacy standards without compromising performance. Explore our developer guide to learn how you can implement this solution for your iOS App campaigns.

We are also releasing other new features to help you grow engagement and performance:

  • New audience lists: Re-engage high-value players with automatically generated lists of past purchasers based on your apps’ play data. This feature is now generally available through App campaigns for engagement.
  • Creative testing for video: Easily run experiments to understand the impact your video creative has on your App campaign performance. This will be available in beta in the coming months.
  • Target return on ad spend (tROAS) for ad revenue: Acquire players who are more likely to engage with ads shown in-app. In the coming months, all developers can send ad revenue from monetization platforms to Google Analytics to improve tROAS bidding in Google Ads.

Scale your reach to third-party app inventory

Lastly, advertisers now have the opportunity to extend their App campaign reach to more users. Advertisers using Google Ads and Display & Video 360 will be able to participate in real-time bidding integrations with third-party monetization platforms AppLovin (MAX), DT FairBid and Helium by Chartboost.

Also, developers who use third-party platforms will now have easy access to competitive real-time bids from advertisers using Google Ads and Display & Video 360. The program is currently in closed beta, and these buying tools will be available as bidders for approved publishers on these third-party real-time bidding monetization platforms.

Watch the full Ads keynote to hear more about how these solutions can help you drive revenue and profitability for your games business.

What it’s like to have a hybrid internship at Google

After three virtual college semesters, I felt like a fish out of water applying for summer internships. My networking and interviewing skills were rusty, and as a first-generation college student without access to career prep resources, I felt totally unprepared for the job application process. I didn't know what role I wanted, where to apply or how to write my resume. So I joined a professional development program for underrepresented talent, where I spent hours in workshops, interview prep sessions and meetings with my career coach.

Inspired by a lecture on battling imposter syndrome and the power of believing in yourself, I built up the confidence to apply to Google. I trusted the process and put my best foot forward, and before I knew it, I was in my first round of interviews for Google’s communications team. Not long afterward, I was walking through the doors of Google’s New York City campus on my first day as an intern.

This year’s interns are the first to participate in Google’s hybrid work week and the first to go into Google’s offices since early 2020. The hybrid schedule has helped me embrace the best of both worlds — from connecting with my teammates over lunch at the office to focusing on projects in the comfort of my home. Through this hybrid experience, and especially as a member of the communications team, I've learned how important it is to ask questions, stay connected and engage thoughtfully.

A big part of my role at Google is seeking out and sharing stories about our culture, products and people — including my fellow interns. So in celebration of International Intern Day today, I asked a few of them to share more about their hybrid internship experiences and their proudest accomplishments so far. Here’s what I learned.

Innovation success in Middle East, Africa and Turkey

We announced the third GNI Middle East, Turkey and Africa Innovation Challenge in February, as part of our ongoing commitment to spur innovation in news and journalism around the globe and to support the creation of new business models. This year, as in prior years, news innovators stepped forward with fascinating projects that demonstrate new thinking.

WANANCHI Reporting’s new interactive platform will allow Kenya’s unserved and underprivileged communities to become active participants in telling and retelling their stories from diverse, rich perspectives.

In an industry-first initiative, Nigeria-based TheCable intends to create the country’s first disability-inclusive news application, along with assistive technologies that will make it the go-to destination for people with vision impairments, hearing issues and other physical challenges.

South Africa-based Daily Maverick aims to solve the pervasive problem of audience engagement for news publishers by developing a suite of tools to increase engagement rates with high-impact content.

These three projects are among the 34 announced today as part of the third Google News Initiative (GNI) Innovation Challenge for the Middle East, Turkey and Africa.

Success! The team at the Dubawa Centre for Journalism Innovation and Development, who will automate radio fact-checking.

The GNI Innovation Challenges, part of Google’s $300 million commitment to help journalism thrive in the digital age, have seen news innovators across the world step forward with many exciting initiatives demonstrating new thinking.

The third Middle East, Turkey and Africa Innovation Challenge received 425 applications from 42 countries – a 27% increase in overall applications. After a rigorous review, a round of interviews and a final jury selection process, 34 projects were selected from 17 countries to receive $3.2 million in funding.

This Innovation Challenge saw a significant increase in applications from news organizations undertaking fact-checking activities: an increase of 118% compared to previous Innovation Challenges in the region. Proposed projects using artificial intelligence (AI) and machine learning (ML) also showed significant growth (92%), reflecting a trend across the news ecosystem to embrace cutting-edge technologies and data.

The call for applications listed five criteria: impact on the news ecosystem; innovation; diversity, equity and inclusion; inspiration; and feasibility – and the chosen projects clearly demonstrated all five. Here’s a selection of the successful recipients (you can find the full list on our website):
  • Dubawa, Centre for Journalism Innovation and Development from Nigeria, an online-only publisher, will introduce automated radio fact-checking.
  • Majarra from the UAE will apply AI and ML to readers’ data to help them better navigate its website and to be more inclusive of female subscribers.
  • The Bridge Across the Abraham Accords project: In an industry-first initiative, Tel Aviv-based Israel Hayom and Abu Dhabi-based Al-Ittihad will collaborate to give readers of both news organizations the ability to share news and comment in the same multilingual environment.
  • Minority Africa from Uganda is designing and implementing a web distribution application that will make it easier for newsrooms publishing under a Creative Commons license to retain more control of their work.
  • Quote This Woman+ from South Africa will build a tool that provides women+ sources (women, people living with disabilities, LGBTQI+ people, and rural and religious minorities) to newsrooms and journalists to diversify sources in news coverage.
  • Dipnot from Turkey, a TV company, will create COM+: a multi-screen OTT platform for curated news in Turkey.
The successful recipients will embark on their projects later this summer and will share their learnings with the wider news ecosystem.

Posted by Ludovic Blecher, Head of Innovation, Google News Initiative

Our support for the Economic Opportunity Coalition

A healthy economy exists only when opportunities to participate are open to everyone. Google has long worked to make that possible through efforts such as our Google Career Certificates, the Grow with Google Small Business Fund and our commitment to supplier diversity. Core to this work is our belief that progress is best achieved when we partner with others to scale these efforts.

Consistent with that approach, today Google is proud to help launch the Economic Opportunity Coalition, a group dedicated to building an equitable economy. Google intends to work alongside others in the public, private and nonprofit sectors to find ways to help close the racial wealth gap in the United States. The Coalition has identified four focus areas: investing in strengthening community finance organizations, supporting entrepreneurship, improving financial health and addressing infrastructure needs, such as affordable housing.

At Google, we have pioneered solutions to these issues and continue to do so. For example, our $100 million Google Career Certificates Fund focuses on Google’s digital skills training program and introduces a new financing model to provide loans and grants to students through Social Finance, a leading national nonprofit in the field of workforce development. Another example is our funding to Opportunity Finance Network to help Community Development Financial Institutions, which provide capital to underserved small businesses.

Our work in this regard contributes to sustainable economic growth, and the efforts of others in the Coalition will amplify our impact. Building a world in which everyone has access to opportunities will help foster more vibrant economic communities, and we look forward to others joining us in the Economic Opportunity Coalition and contributing to this important work.

Long Term Support Channel Update for ChromeOS

LTS-96 has been updated in the LTS channel to 96.0.4664.215 (Platform Version: 14268.94.0) for most ChromeOS devices. Want to know more about Long-term Support? Click here.


This update includes the following security fixes:

Bug ID   Severity  CVE             Description
1325298  High      CVE-2022-2010   Out of bounds read in compositing
1302959  Medium    CVE-2022-1488   Security: Extension permission escalation
1327241  Medium    CVE-2021-30560  CrOS: Vulnerability reported in dev-libs/libxslt
1324563  Medium    CVE-2022-29824  CrOS: Vulnerability reported in dev-libs/libxml2


Giuliana Pritchard

Google Chrome OS

A unified Gmail, for all the ways you connect

Gmail has changed a lot over the past 18 years, and since the beginning, we’ve aspired to help billions of people around the world stay connected and get things done. Our latest changes bring helpful updates to every Gmail user, including the best of Google Workspace, combined with a fresh new look based on Google’s Material Design 3.

Evolving right along with you

Over the years, we’ve introduced new ways to stay productive, like the tabbed inbox, AI-based innovations like Smart Compose and Smart Reply, and the ability to get your Gmail on the go with native apps for iOS and Android. (Fun fact: Gmail was the first app on the Google Play Store to hit one billion installs!) Often these changes are highly visual, like custom inbox themes, but some really important ones are less visible, like AI-based spam, phishing and malware protections.

Modern communication, modern design

During the pandemic, we’ve seen a further evolution as tens of millions of people around the world started to move between email, messaging, group chat and video calls as a part of managing their daily lives. To help people stay connected, we’re bringing together Gmail, Chat, Spaces and Meet in a single, unified view.

The new integrated Gmail view with Chat, Spaces and Meet

We first announced the new integrated Gmail view as a sneak peek back in January, and we got tons of feedback from users who are excited about the new look and feel, along with suggestions for improvement.

Starting today, the integrated view will begin to roll out for all Gmail users who have turned on Chat. You’ll see a clean, streamlined way to move between apps that you can customize based on what works best for you.

Using Quick Settings, you can select apps you’d like to toggle between on the left side of your window, whether it’s Gmail by itself or a combination of Gmail, Chat, Spaces and Meet. Label lovers will see separate sections for system labels (like Starred, Snoozed and Important) and custom labels you make yourself. And people who love to chat will see conversation bubbles with snippets of incoming messages, along with options to quick reply instead of opening the full message.

Easily select the applications you want to use in Gmail

Over the next few weeks, users can enable the integrated view, using the new visual configuration option in Settings — and anyone who wants to keep their existing Gmail layout will be able to do so. You get the Gmail that best fits your personal style, along with a clean, new look, thanks to our Material 3 design.

The new design of the Gmail inbox.

The new Gmail interface updated with Material 3 look and feel.

More than a pretty (inter)face

Beyond the user interface, we’re continuing to make Gmail more powerful and customizable. For example, we’re making it easier than ever to find the message you’re looking for by bringing search chips to your inbox and improving search results to suggest the best match for your query.

New inbox filters and improved search results

And later this year, we’re delivering an improved experience for tablet users, better emojis, new accessibility features and a whole lot more.

Looking ahead

Now you can optimize Gmail for how you like to stay connected, whether it’s as a standalone email application or a hub for easily moving between Chat, Spaces and video calls in Google Meet. After 18 years of helping people collaborate and get things done, Gmail is more helpful, customizable, and integrated than ever before.

Look and Talk: Natural Conversations with Google Assistant

In natural conversations, we don't say people's names every time we speak to each other. Instead, we rely on contextual signaling mechanisms to initiate conversations, and eye contact is often all it takes. Google Assistant, now available in more than 95 countries and over 29 languages, has primarily relied on a hotword mechanism ("Hey Google" or “OK Google”) to help more than 700 million people every month get things done across Assistant devices. As virtual assistants become an integral part of our everyday lives, we're developing ways to initiate conversations more naturally.

At Google I/O 2022, we announced Look and Talk, a major development in our journey to create natural and intuitive ways to interact with Google Assistant-powered home devices. This is the first multimodal, on-device Assistant feature that simultaneously analyzes audio, video, and text to determine when you are speaking to your Nest Hub Max. Using eight machine learning models together, the algorithm can differentiate intentional interactions from passing glances in order to accurately identify a user's intent to engage with Assistant. Once within 5 ft of the device, the user may simply look at the screen and talk to start interacting with the Assistant.

We developed Look and Talk in alignment with our AI Principles. It meets our strict audio and video processing requirements, and like our other camera sensing features, video never leaves the device. You can always stop, review and delete your Assistant activity at myactivity.google.com. These added layers of protection enable Look and Talk to work just for those who turn it on, while keeping your data safe.

Google Assistant relies on a number of signals to accurately determine when the user is speaking to it. On the right is a list of signals used with indicators showing when each signal is triggered based on the user's proximity to the device and gaze direction.

Modeling Challenges
The journey of this feature began as a technical prototype built on top of models developed for academic research. Deployment at scale, however, required solving real-world challenges unique to this feature. It had to:

  1. Support a range of demographic characteristics (e.g., age, skin tones).
  2. Adapt to the ambient diversity of the real world, including challenging lighting (e.g., backlighting, shadow patterns) and acoustic conditions (e.g., reverberation, background noise).
  3. Deal with unusual camera perspectives, since smart displays are commonly used as countertop devices and look up at the user(s), unlike the frontal faces typically used in research datasets to train models.
  4. Run in real-time to ensure timely responses while processing video on-device.

The evolution of the algorithm involved experiments with approaches ranging from domain adaptation and personalization to domain-specific dataset development, field-testing and feedback, and repeated tuning of the overall algorithm.

Technology Overview
A Look and Talk interaction has three phases. In the first phase, Assistant uses visual signals to detect when a user is demonstrating an intent to engage with it and then “wakes up” to listen to their utterance. The second phase is designed to further validate and understand the user’s intent using visual and acoustic signals. If any signal in the first or second processing phases indicates that it isn't an Assistant query, Assistant returns to standby mode. These two phases are the core Look and Talk functionality, and are discussed below. The third phase of query fulfillment is typical query flow, and is beyond the scope of this blog.
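
To make that flow concrete, here is a minimal sketch of the phase logic in Python. The state names, signal names, and dictionary interface are illustrative assumptions, not Assistant's production implementation; the one behavior taken directly from the description above is that any failed signal returns the system to standby.

```python
# A minimal sketch of the three-phase control flow described above. The state
# and signal names are illustrative assumptions, not Assistant's production
# code; the one behavior taken from the post is that any failed signal in
# phase one or two returns the system to standby.
from enum import Enum, auto

class Phase(Enum):
    STANDBY = auto()      # waiting for visual intent to engage
    LISTENING = auto()    # phase two: validating the utterance as a query
    FULFILLMENT = auto()  # phase three: normal query flow (out of scope here)

def advance(phase: Phase, s: dict) -> Phase:
    """Advance one step given boolean signals from the upstream models
    (Face Match, proximity, gaze, active speaker, Voice Match, intent)."""
    if phase is Phase.STANDBY:
        if s["face_match"] and s["in_range"] and s["gaze_on_device"] and s["active_speaker"]:
            return Phase.LISTENING        # "wake up" and start listening
    elif phase is Phase.LISTENING:
        if not (s["voice_match"] and s["intent_is_query"]):
            return Phase.STANDBY          # not an Assistant query: stand down
        return Phase.FULFILLMENT
    return phase
```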

Phase One: Engaging with Assistant
The first phase of Look and Talk is designed to assess whether an enrolled user is intentionally engaging with Assistant. Look and Talk uses face detection to identify the user’s presence, filters for proximity using the detected face box size to infer distance, and then uses the existing Face Match system to determine whether they are enrolled Look and Talk users.
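
As a rough illustration of the proximity filter, the apparent size of a detected face shrinks roughly in inverse proportion to distance, so the face box height can serve as a cheap range estimate. The pinhole-camera sketch below is an assumption for illustration; the focal length and average face height constants are hypothetical, with only the roughly 5 ft engagement radius taken from this post.

```python
# Hedged sketch of a proximity gate from a face box: by similar triangles in a
# pinhole camera, distance = focal_length * real_height / pixel_height. The
# focal length and average face height are hypothetical constants; only the
# roughly 5 ft (~1.5 m) engagement range comes from the post.
AVG_FACE_HEIGHT_M = 0.24   # assumed typical head height in meters
FOCAL_LENGTH_PX = 600.0    # assumed camera focal length in pixels
MAX_RANGE_M = 1.5          # ~5 ft engagement radius

def estimate_distance_m(face_box_height_px: float) -> float:
    # Similar triangles: distance grows as the face box shrinks.
    return FOCAL_LENGTH_PX * AVG_FACE_HEIGHT_M / face_box_height_px

def within_range(face_box_height_px: float) -> bool:
    return estimate_distance_m(face_box_height_px) <= MAX_RANGE_M
```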

For an enrolled user within range, a custom eye gaze model determines whether they are looking at the device. This model estimates both the gaze angle and a binary gaze-on-camera confidence from image frames, using a multi-tower convolutional neural network architecture with one tower processing the whole face and another processing patches around the eyes. Since the device screen covers a region beneath the camera that is natural for a user to look at, we map the gaze angle and the binary gaze-on-camera prediction to the device screen area. To ensure that the final prediction is resilient to involuntary eye blinks and saccades, we apply a smoothing function to the individual frame-based predictions to remove spurious outputs.

Eye-gaze prediction and post-processing overview.
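
A hedged sketch of one plausible smoothing step is an exponential moving average over the per-frame gaze-on-screen confidences, with hysteresis thresholds so that a single blink or saccade cannot flip the attention decision. The class and all constants below are illustrative assumptions, not the deployed smoothing function.

```python
# Illustrative smoothing of per-frame gaze predictions: an exponential moving
# average plus hysteresis (a stricter threshold to start attending, a looser
# one to keep attending). All constants are assumptions.
class GazeSmoother:
    def __init__(self, alpha=0.3, on_thresh=0.7, off_thresh=0.4):
        self.alpha = alpha            # weight given to the newest frame
        self.on_thresh = on_thresh    # stricter requirement to begin attending...
        self.off_thresh = off_thresh  # ...looser requirement to keep attending
        self.ema = 0.0
        self.attending = False

    def update(self, frame_confidence: float) -> bool:
        self.ema = self.alpha * frame_confidence + (1 - self.alpha) * self.ema
        threshold = self.off_thresh if self.attending else self.on_thresh
        self.attending = self.ema > threshold
        return self.attending
```

The stricter-on, looser-off pair also mirrors the attention requirements described in the next paragraph.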

To minimize false triggers (e.g., when a passing user briefly glances at the device), we enforce stricter attention requirements before informing users that the system is ready for interaction. Once the user looking at the device starts speaking, we relax the attention requirement, allowing them to naturally shift their gaze.
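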

The final signal necessary in this processing phase checks that the Face Matched user is the active speaker. This is provided by a multimodal active speaker detection model that takes as input both video of the user’s face and audio containing speech, and predicts whether they are speaking. A number of augmentation techniques (including RandAugment, SpecAugment, and augmenting with AudioSet sounds) help improve prediction quality for the in-home domain, boosting end-feature performance by over 10%. The final deployed model is a quantized, hardware-accelerated TFLite model that uses five frames of context for the visual input and 0.5 seconds for the audio input.

Active speaker detection model overview: The two-tower audiovisual model provides the “speaking” probability prediction for the face. The visual network auxiliary prediction pushes the visual network to be as good as possible on its own, improving the final multimodal prediction.
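
For readers unfamiliar with deploying quantized models, the sketch below shows what invoking such a TFLite audiovisual model generally looks like with the TensorFlow Lite Python interpreter. The model file name, tensor shapes, input ordering, and 16 kHz sample rate are all assumptions; only the five-frame visual window and 0.5 s audio window come from the description above.

```python
# Hedged sketch of invoking a quantized TFLite audiovisual model like the one
# described: five face-crop frames plus 0.5 s of audio in, one speaking
# probability out. File name, tensor shapes, input ordering, and sample rate
# are assumptions, not the shipped model's interface.
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="active_speaker.tflite")  # hypothetical file
interpreter.allocate_tensors()
video_in, audio_in = interpreter.get_input_details()  # assumes two inputs, video first
(prob_out,) = interpreter.get_output_details()

def speaking_probability(face_frames: np.ndarray, audio_samples: np.ndarray) -> float:
    # face_frames: (1, 5, H, W, 3) crops of the Face Matched user;
    # audio_samples: (1, 8000), i.e., 0.5 s at an assumed 16 kHz.
    interpreter.set_tensor(video_in["index"], face_frames.astype(np.float32))
    interpreter.set_tensor(audio_in["index"], audio_samples.astype(np.float32))
    interpreter.invoke()
    return float(interpreter.get_tensor(prob_out["index"])[0])
```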

Phase Two: Assistant Starts Listening
In phase two, the system starts listening to the content of the user’s query, still entirely on-device, to further assess whether the interaction is intended for Assistant using additional signals. First, Look and Talk uses Voice Match to further ensure that the speaker is enrolled and matches the earlier Face Match signal. Then, it runs a state-of-the-art automatic speech recognition model on-device to transcribe the utterance.

The next critical processing step is the intent understanding algorithm, which predicts whether the user’s utterance was intended to be an Assistant query. This has two parts: 1) a model that analyzes the non-lexical information in the audio (i.e., pitch, speed, hesitation sounds) to determine whether the utterance sounds like an Assistant query, and 2) a text analysis model that determines whether the transcript is an Assistant request. Together, these filter out queries not intended for Assistant. The system also uses contextual visual signals to determine the likelihood that the interaction was intended for Assistant.

Overview of the semantic filtering approach to determine if a user utterance is a query intended for the Assistant.
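
One plausible way to fuse the two filter outputs with the contextual visual signal, sketched below purely as an illustration, is a veto-then-weighted-average rule. The weights, veto cutoff, and threshold are invented for this sketch and are not the production algorithm.

```python
# Illustrative-only fusion of the two semantic-filter parts with the
# contextual visual signal: either model can veto outright; otherwise the
# scores are combined with invented weights.
def is_assistant_query(prosody_score: float,
                       text_score: float,
                       visual_context_score: float,
                       threshold: float = 0.5) -> bool:
    # Hard veto: a confidently negative prosody or text model ends the check.
    if min(prosody_score, text_score) < 0.1:
        return False
    fused = 0.4 * prosody_score + 0.4 * text_score + 0.2 * visual_context_score
    return fused >= threshold
```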

Finally, when the intent understanding model determines that the user utterance was likely meant for Assistant, Look and Talk moves into the fulfillment phase where it communicates with the Assistant server to obtain a response to the user’s intent and query text.

Performance, Personalization and UX
Each model that supports Look and Talk was evaluated and improved in isolation and then tested in the end-to-end Look and Talk system. The huge variety of ambient conditions in which Look and Talk operates necessitates the introduction of personalization parameters for algorithm robustness. By using signals obtained during the user’s hotword-based interactions, the system personalizes parameters to individual users to deliver improvements over the generalized global model. This personalization also runs entirely on-device.
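
As an illustration of what such on-device personalization could look like, the sketch below treats a user's confirmed hotword interactions as known positives and shifts the decision threshold so that most of them would have been accepted, while staying near the global operating point. The function, update rule, and constants are all assumptions, not the deployed mechanism.

```python
# Hedged sketch of per-user threshold personalization: hotword interactions
# are treated as known positives, and the threshold is moved so that roughly
# target_recall of them would have scored above it, clamped to a small band
# around the vetted global operating point. All constants are assumptions.
def personalize_threshold(global_threshold: float,
                          hotword_scores: list[float],
                          target_recall: float = 0.95) -> float:
    if not hotword_scores:
        return global_threshold
    ranked = sorted(hotword_scores)
    cut = ranked[int((1.0 - target_recall) * len(ranked))]
    # Never stray far from the global threshold.
    return max(global_threshold - 0.15, min(global_threshold + 0.15, cut))
```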

Without a predefined hotword as a proxy for user intent, latency was a significant concern for Look and Talk. Often, a strong enough interaction signal does not occur until well after the user has started speaking, which can add hundreds of milliseconds of latency, and existing models for intent understanding add to this because they require complete, not partial, queries. To bridge this gap, Look and Talk completely forgoes streaming audio to the server, performing transcription and intent understanding on-device. The intent understanding models can work from partial utterances. This results in end-to-end latency comparable to current hotword-based systems.
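
The sketch below illustrates the idea of deciding on partial utterances: score each transcript prefix as it streams in and commit as soon as the score is confidently high or low, rather than waiting for the complete query. The function name, thresholds, and the `score_intent` stand-in are hypothetical.

```python
# Hedged sketch of intent scoring on partial transcripts to hide latency.
# `score_intent` is a hypothetical stand-in for the on-device intent model;
# the accept/reject thresholds are invented.
def early_decision(partial_transcripts, score_intent,
                   accept=0.85, reject=0.15):
    score = 0.0
    for prefix in partial_transcripts:   # e.g. "turn", "turn on", "turn on the..."
        score = score_intent(prefix)
        if score >= accept:
            return True                  # confident early accept
        if score <= reject:
            return False                 # confident early reject: stand down
    return score >= 0.5                  # fall back to the final transcript
```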

The UI experience is based on user research to provide well-balanced visual feedback with high learnability. This is illustrated in the figure below.

Left: The spatial interaction diagram of a user engaging with Look and Talk. Right: The User Interface (UI) experience.

We developed a diverse video dataset with over 3,000 participants to test the feature across demographic subgroups. Modeling improvements driven by diversity in our training data improved performance for all subgroups.

Conclusion
Look and Talk represents a significant step toward making user engagement with Google Assistant as natural as possible. While this is a key milestone in our journey, we hope this will be the first of many improvements to our interaction paradigms that will continue to reimagine the Google Assistant experience responsibly. Our goal is to make getting help feel natural and easy, ultimately saving time so users can focus on what matters most.

Acknowledgements
This work involved collaborative efforts from a multidisciplinary team of software engineers, researchers, UX, and cross-functional contributors. Key contributors from Google Assistant include Alexey Galata, Alice Chuang‎, Barbara Wang, Britanie Hall, Gabriel Leblanc, Gloria McGee, Hideaki Matsui, James Zanoni, Joanna (Qiong) Huang, Krunal Shah, Kavitha Kandappan, Pedro Silva, Tanya Sinha, Tuan Nguyen, Vishal Desai, Will Truong‎, Yixing Cai‎, Yunfan Ye; from Research including Hao Wu, Joseph Roth, Sagar Savla, Sourish Chaudhuri, Susanna Ricco. Thanks to Yuan Yuan and Caroline Pantofaru for their leadership, and everyone on the Nest, Assistant, and Research teams who provided invaluable input toward the development of Look and Talk.

Source: Google AI Blog