How AI powers great search results

Do you ever wonder how Google understands what you’re looking for? There’s a lot that goes into delivering helpful search results, and understanding language is one of the most important skills. Thanks to advancements in AI and machine learning, our Search systems are understanding human language better than ever before. And we want to share a behind-the-scenes look at how this translates into relevant results for you.

But first, let's walk down memory lane: In the early days of Search, before we had advanced AI, our systems simply looked for matching words. For example, if you searched for “pziza” — unless there was a page with that particular misspelling, you’d likely have to redo the search with the correct spelling to find a slice near you. And eventually, we learned how to code algorithms to find classes of patterns, like popular misspellings or potential typos from neighboring keys. Now, with advanced machine learning, our systems can more intuitively recognize if a word doesn’t look right and suggest a possible correction.
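To make the spelling example concrete, here is a minimal, purely illustrative Python sketch of the pattern-based idea described above: generate every candidate within one edit of the typed word (a deletion, an adjacent swap, a substitution, or an insertion, which covers typos from neighboring keys) and keep the best-known word. The toy dictionary and scores are invented for illustration and have nothing to do with Google's actual spelling systems.

```python
# Purely illustrative edit-distance spelling correction; not Google's spelling system.
WORDS = {"pizza": 0.9, "plaza": 0.4, "piazza": 0.3}  # toy dictionary with made-up popularity scores
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def one_edit_candidates(word):
    """All strings one edit away: deletions, adjacent swaps, substitutions, insertions."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {left + right[1:] for left, right in splits if right}
    swaps = {left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1}
    subs = {left + c + right[1:] for left, right in splits if right for c in ALPHABET}
    inserts = {left + c + right for left, right in splits for c in ALPHABET}
    return deletes | swaps | subs | inserts

def correct(word):
    """Return the most popular known word within one edit, or the word unchanged."""
    known = one_edit_candidates(word) & WORDS.keys()
    return max(known, key=WORDS.get) if known else word

print(correct("pziza"))  # -> "pizza" (one adjacent-letter swap away)
```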

These kinds of AI improvements to our Search systems mean that they’re constantly getting better at understanding what you’re looking for. And since the world and people’s curiosities are always evolving, it’s really important that Search does, too. In fact, 15% of searches we see every day are entirely new. AI plays a major role in showing you helpful results, even at the outermost edges of your imagination.

How our systems play together

We’ve developed hundreds of algorithms over the years, like our early spelling system, to help deliver relevant search results. When we develop new AI systems, our legacy algorithms and systems don’t just get shelved away. In fact, Search runs on hundreds of algorithms and machine learning models, and we’re able to improve it when our systems — new and old — can play well together. Each algorithm and model has a specialized role, and they trigger at different times and in distinct combinations to help deliver the most helpful results. And some of our more advanced systems play a more prominent role than others. Let’s take a closer look at the major AI systems running in Search today, and what they do.

RankBrain — a smarter ranking system

When we launched RankBrain in 2015, it was the first deep learning system deployed in Search. At the time, it was groundbreaking — not only because it was our first AI system, but because it helped us understand how words relate to concepts. Humans understand this instinctively, but it’s a complex challenge for a computer. RankBrain helps us find information we weren’t able to before by more broadly understanding how words in a search relate to real-world concepts. For example, if you search for “what’s the title of the consumer at the highest level of a food chain,” our systems learn from seeing those words on various pages that the concept of a food chain may have to do with animals, and not human consumers. By understanding and matching these words to their related concepts, RankBrain understands that you’re looking for what’s commonly referred to as an “apex predator.”

Search bar with the query “what’s the title of the consumer at the highest level of a food chain,” and a mobile view of a featured snippet for “apex predator.”

Thanks to this type of understanding, RankBrain (as its name suggests) is used to help rank — or decide the best order for — top search results. Although it was our very first deep learning model, RankBrain continues to be one of the major AI systems powering Search today.

Neural matching — a sophisticated retrieval engine

Neural networks underpin many modern AI systems today. But it wasn’t until 2018, when we introduced neural matching to Search, that we could use them to better understand how queries relate to pages. Neural matching helps us understand fuzzier representations of concepts in queries and pages, and match them to one another. It looks at an entire query or page rather than just keywords, developing a better understanding of the underlying concepts represented in them. Take the search “insights how to manage a green,” for example. If a friend asked you this, you’d probably be stumped. But with neural matching, we’re able to make sense of it. By looking at the broader representations of concepts in the query — management, leadership, personality and more — neural matching can decipher that this searcher is looking for management tips based on a popular, color-based personality guide.

Search bar with the query “insights how to manage a green” with a mobile view of relevant search results.

When our systems understand the broader concepts represented in a query or page, they can more easily match them with one another. This level of understanding helps us cast a wide net when we scan our index for content that may be relevant to your query. This is what makes neural matching such a critical part of how we retrieve relevant documents from a massive and constantly changing information stream.
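As a loose illustration of matching by concepts rather than keywords, the sketch below compares a query and two pages in a shared vector space using cosine similarity. The tiny hand-made vectors stand in for learned neural representations; this is not Google's neural matching system, just the general retrieval-by-embedding idea.

```python
import math

# Toy "embeddings": hand-made vectors standing in for learned neural representations.
# Dimensions loosely mean [management, color-personality, gardening].
EMBEDDINGS = {
    "insights how to manage a green":       [0.8, 0.7, 0.2],
    "managing the green personality type":  [0.9, 0.8, 0.1],
    "how to keep your lawn green":          [0.1, 0.0, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

query = "insights how to manage a green"
for doc in ("managing the green personality type", "how to keep your lawn green"):
    print(doc, round(cosine(EMBEDDINGS[query], EMBEDDINGS[doc]), 3))
# The personality-guide page scores higher than the lawn-care page,
# even though both share the keyword "green".
```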

BERT — a model for understanding meaning and context

Launched in 2019, BERT was a huge step change in natural language understanding, helping us understand how combinations of words express different meanings and intents. Rather than simply searching for content that matches individual words, BERT comprehends how a combination of words expresses a complex idea. BERT understands words in a sequence and how they relate to each other, so it ensures we don’t drop important words from your query — no matter how small they are. For example, if you search for “can you get medicine for someone pharmacy,” BERT understands that you’re trying to figure out if you can pick up medicine for someone else. Before BERT, we took that short preposition for granted, mostly sharing results about how to fill a prescription. Thanks to BERT, we understand that even small words can have big meanings.

Search bar with the query “can you get medicine for someone pharmacy” with a mobile view of a featured snippet highlighting relevant text from an HHS.gov result.

Today, BERT plays a critical role in almost every English query. This is because our BERT systems excel at two of the most important tasks in delivering relevant results — ranking and retrieving. Based on its complex language understanding, BERT can very quickly rank documents for relevance. We’ve also improved legacy systems with BERT training, making them more helpful in retrieving relevant documents for ranking. And while BERT plays a major role in Search, it’s never working alone — like all of our systems, BERT is part of an ensemble of systems that work together to share high-quality results.

MUM — moving from language to information understanding

In May, we introduced our latest AI milestone in Search — Multitask Unified Model, or MUM. A thousand times more powerful than BERT, MUM is capable of both understanding and generating language. It’s trained across 75 languages and many different tasks at once, allowing it to develop a more comprehensive understanding of information and world knowledge. MUM is also multimodal, meaning it can understand information across multiple modalities such as text, images and more in the future.

While we’re still in the early days of tapping into MUM’s potential, we’ve already used it to improve searches for COVID-19 vaccine information, and we’ll offer more intuitive ways to search using a combination of both text and images in Google Lens in the coming months. These are very specialized applications — so MUM is not currently used to help rank and improve the quality of search results like RankBrain, neural matching and BERT systems do.

As we introduce more MUM-powered experiences to Search, we’ll begin to shift from advanced language understanding to a more nuanced understanding of information about the world. And as with all improvements to Search, any MUM application will go through a rigorous evaluation process, with special attention to the responsible application of AI. And when they’re deployed, they’ll join the chorus of systems that run together to make Search helpful.

Control your ad frequency on connected TV

I love everything about the Big Game — the suspense, the halftime performances and, of course, the commercials. Watching the much-anticipated ads on my living room’s connected TV (CTV), with buffalo wings in hand, has become a family tradition. But as much as I like the ads, seeing them over and over again after the game can ruin the fun.

At Google, we’re helping advertisers deliver a better CTV viewing experience. As part of that, we’re launching new CTV frequency management solutions in Display & Video 360 that allow marketers to control the number of times people see their ads across YouTube and other CTV apps. This gives CTV streamers a smoother viewing experience and limits the risk of brand backlash because of ad overexposure.

Early results show that this new functionality also significantly improves media performance for advertisers. On average, brands see a 5% increase in reach per dollar when managing CTV ad frequency across YouTube and other CTV apps together rather than separately. In other words, if your Big Game campaign is scheduled to reach five million CTV viewers, you can now reach a few football stadiums’ worth of new streamers at no extra cost.

Better viewer experience and use of your CTV media dollars

With Display & Video 360, you can already control how many ads CTV viewers see across YouTube and YouTube TV apps. Separately, you can also set a frequency goal for ads running across other CTV apps. Now, for the first time, you can manage CTV ad frequency across both.

Let’s say you set a frequency goal of five ads per week for your CTV campaign. Instead of showing up to five CTV ads on YouTube and five ads on other CTV apps, Display & Video 360 will now aim to show your ad no more than five times total. Viewers won’t see your ad more than they should as they navigate across YouTube, Hulu or any of their other favorite CTV apps. This more user-centric approach lowers your risk of triggering ad fatigue.
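For intuition, here is a purely illustrative sketch of the bookkeeping behind a shared frequency goal: a single per-viewer counter is checked before serving on any CTV channel, instead of one counter per channel. The names and data structures are invented for illustration and are not how Display & Video 360 is implemented.

```python
from collections import defaultdict

# Purely illustrative: one shared weekly cap across channels, instead of one cap per channel.
WEEKLY_CAP = 5  # the frequency goal from the example: five ads per viewer per week

impressions = defaultdict(int)  # viewer_id -> total CTV impressions this week, all channels combined

def can_serve(viewer_id):
    """True if this viewer is still under the shared cross-channel cap."""
    return impressions[viewer_id] < WEEKLY_CAP

def record_impression(viewer_id, channel):
    impressions[viewer_id] += 1
    print(f"served on {channel}; viewer {viewer_id} is at {impressions[viewer_id]}/{WEEKLY_CAP}")

# Three YouTube impressions and two on other CTV apps exhaust the shared cap of five.
for channel in ["YouTube", "YouTube", "other CTV app", "YouTube", "other CTV app", "other CTV app"]:
    if can_serve("viewer-123"):
        record_impression("viewer-123", channel)
    else:
        print(f"cap reached; skipping {channel}")
```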

It can also help you get more bang for your buck. By more evenly distributing CTV ad budgets across viewers, you can get more reach for the same budget. But don’t just take our word for it — to determine the reach benefit of this new feature on your next CTV campaign, you can run a Unique Reach report or use the cross-channel frequency quantification tools in Display & Video 360.

Display & Video 360’s cross-channel CTV frequency management solution works for all formats, exchanges, CTV devices and deal types. This includes content on the YouTube and YouTube TV apps watched on CTV devices. For example, you can set a single ad frequency goal across an ESPN Programmatic Guaranteed deal and your YouTube sports campaign.

Cross-channel management that puts privacy first

To determine the number of times a CTV ad is shown, Display & Video 360 uses Google data on YouTube and the IAB standard Identifier for Advertising on other CTV inventory.

To identify overlaps of people who watch both YouTube CTV and shows on other CTV apps, we use Google’s Unique Reach model. This method — which uses a combination of data sources like census data, panels and surveys — is based on over a decade of understanding deduplication across devices and environments, and is designed to work in a post-cookie world. Once we’ve modeled that duplication of viewers across YouTube and other CTV apps, we can determine the appropriate budget placement to control average ad frequency.

To start using this new feature, just combine YouTube CTV and other CTV strategies under the same campaign or insertion order, and set your frequency goal at that level.

David Archer: Reflections from an Anti-Racist Psychotherapist

Editor's note: This Black History Month, we’re highlighting Black perspectives, and sharing stories from Black Googlers, partners, and culture shapers from across Canada. 

David Archer, MSW, MFT, is an anti-racist psychotherapist from Montreal, Canada (Tiohtià:ke).

Twenty years ago, I was a software engineer. I was fascinated with the ability to transmute lines of code into complex software. Programmers are interested in finding solutions to the endless barrage of error messages that obstruct our everyday apps and platforms. Currently, I am an anti-racist psychotherapist. In this field we also search for logic; within every client’s mind lies a solution that explains the errors they encounter in their lives. The clinician’s job is to elicit solutions. By deciphering the logic of the psyche, we move people to acknowledge their innate gifts and confront the suffering caused by the challenges of living in an imperfect social structure. 

My clinical experiences led me to the following understanding: anti-Black racism is like a trauma response. Much like the trauma survivor who avoids the source of their injury, the racist operates on the basis of a survival response: a fight, flight, or freeze reaction unconsciously activated to deal with a perceived threat to their insecure power structure. 

Within the social structure, racism can never be resolved by attending simple workshops, changing profile pictures on social media, or by corporations providing a superficial interest toward people with darker skin complexions. These kinds of performances only placate political interests rather than eradicating social problems. We require systemic interventions to address our overburdened and defunded health care system. There is an urgent need to transform our society into one that views mental health as a human right; and our leaders must understand the importance of anti-racism to encourage an unremitting conviction towards systemic and substantive change. 

When you hear people say that African-Canadians descended from enslaved people, there is an error in this logic. It refers to the trauma but not to who they were before the traumatization of European colonization. The ancestral Black mother of the human species, “mitochondrial Eve,” lived approximately 200,000 years ago on the African continent. Therefore, the Maafa, the centuries long atrocities of chattel slavery, cannot be seen as the beginning of our story. This is because humanity spent more millennia being melanated in the motherland. 

Thousands of years ago white people descended from the continent of Africa. This provokes an error in the white consciousness so the construct of race is necessary to divide our shared humanity. The only reason why they created a concept of race was to make a group of humans less human. Race was meant to dehumanize groups of people in order to justify genocide, cultural imperialism, and racial capitalism. Stealing the land from Indigenous people, robbing Africans from their own continent, and dissociating from our common origins are ways of reinforcing the deep multigenerational trauma that pervades our society. 

There is a higher chance of being traumatized depending on your social identity. Women face gendered violence at a higher rate and people of colour experience colourism differently from one another. Regardless of our labels, all people can heal. My main approach is called EMDR (Eye Movement Desensitization and Reprocessing) therapy. But anti-racist psychotherapy is not limited to EMDR; there are a range of approaches that are neuro-affective in nature, or even community based, that still utilize therapeutic memory reconsolidation. When we try something different, changes can happen. 

When people begin to reprocess their racial trauma, the goal is not to force them to stop identifying with their race, but to cultivate radical self-acceptance, revolutionary self-love, and a courageous commitment to improving their community. I have helped people to recover even with the odds stacked against them. In recent years, we have improved mental health awareness and have identified improved forms of trauma treatment. In the next few years we must develop the technology to decolonize our psychotherapy, to target higher order problems in our society, and help our families to break the generational cycles that have plagued them. 

I am old enough to remember not having social media and never hearing of an anti-racist psychotherapist. Imagine what the next 20 years will hold. My job is to help people to make changes in their lives. Once a programmer, always a programmer: debugging through error messages is a life’s work for me. But we need to upgrade our technology. There is a collective responsibility to heal from the trauma of our nations, societies, and families. Healing starts from a simple acknowledgment and sometimes the path can reveal itself. As our technology continues to stimulate our minds, let us have the courage to elevate our hearts as well.

February 2022 update to Display & Video 360 API v1

Today we’re releasing an update to the Display & Video 360 API that includes the following features:

In addition to these new features, this update also doubles existing default API request limits for the Display & Video 360 API. The updated quota values can be found on our usage limits documentation.

More detailed information about this update can be found in the Display & Video 360 API release notes.

Before using these new features, make sure to update your client library to the latest version. We have also added a new Use Audiences guide featuring a page on uploading Customer Match audience data using the Display & Video 360 API.

If you run into issues or need help with these new features or samples, please contact us using our support contact form.

Introducing the new Merchant Center Status Dashboard

At Google, we strive to provide the highest level of service possible to our users. Still, from time to time, unexpected service disruptions do occur. When your team experiences an outage or other technical challenge, one of the first things they need to determine is whether the issue lies with a third-party service provider or in-house. As part of our commitment to communicating the status of our products, and any incidents that do occur, with transparency and speed, we’re pleased to roll out the Merchant Center Status Dashboard.

Check the status of a service
A look at the Merchant Center Status Dashboard

The Merchant Center Status Dashboard provides status information for the Content API for Shopping, Merchant Center, and Feeds. If a major incident is identified, we will generally post an outage notice on the dashboard and provide updates when the issue is resolved. Each service has a status indicator: if the dot is green, there are no known issues.

You can check the dashboard to view the current status of any of those services. All incidents are first verified by our support engineers, so there may be a slight delay between when an issue actually occurs and when it appears on the dashboard. More information can be found in the Merchant Center Help Center.

Subscribe to the Status Dashboard RSS or JSON feeds
To get the fastest outage alerts, we recommend subscribing to the Status Dashboard RSS feed:
  1. Go to the Status Dashboard.
  2. At the bottom, click RSS Feed and copy the feed URL.
  3. In your RSS reader, paste the URL to add the Status Dashboard feed.
  4. If you want programmatic access to the Status Dashboard (for example, to integrate it into your monitoring system), click JSON History at the bottom; a minimal polling sketch follows this list.
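If you choose the JSON route, a polling sketch might look like the following. The feed URL is a placeholder (copy the real one from the dashboard), and the field names are assumptions to adjust against the actual JSON History schema.

```python
import json
import urllib.request

# Placeholder URL: copy the real feed location from "JSON History" at the bottom of the dashboard.
HISTORY_URL = "https://example.com/merchant-center-status/incidents.json"

def fetch_incident_history(url=HISTORY_URL):
    """Download the incident history feed and parse it as JSON."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)

def open_incidents(history):
    """Incidents with no end time yet ("end", "service_name", "external_desc" are assumed field names)."""
    return [item for item in history if not item.get("end")]

if __name__ == "__main__":
    for incident in open_incidents(fetch_incident_history()):
        print(incident.get("service_name"), "-", incident.get("external_desc"))
```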
Learn more
More information about the Status Dashboard can be found in the Merchant Center Help Center.

To investigate the most common errors regarding the Content API for Shopping, check this page on our Developers site. For other errors or for general Content API support, visit the forum.

Can Robots Follow Instructions for New Tasks?

People can flexibly maneuver objects in their physical surroundings to accomplish various goals. One of the grand challenges in robotics is to successfully train robots to do the same, i.e., to develop a general-purpose robot capable of performing a multitude of tasks based on arbitrary user commands. Robots that are faced with the real world will also inevitably encounter new user instructions and situations that were not seen during training. Therefore, it is imperative for robots to be trained to perform multiple tasks in a variety of situations and, more importantly, to be capable of solving new tasks as requested by human users, even if the robot was not explicitly trained on those tasks.

Existing robotics research has made strides towards allowing robots to generalize to new objects, task descriptions, and goals. However, enabling robots to complete instructions that describe entirely new tasks has largely remained out of reach. This problem is remarkably difficult since it requires robots to both decipher the novel instructions and identify how to complete the task without any training data for that task. This goal becomes even more difficult when a robot needs to simultaneously handle other axes of generalization, such as variability in the scene and positions of objects. So, we ask the question: How can we confer noteworthy generalization capabilities onto real robots capable of performing complex manipulation tasks from raw pixels? Furthermore, can the generalization capabilities of language models help support better generalization in other domains, such as visuomotor control of a real robot?

In “BC-Z: Zero-Shot Task Generalization with Robotic Imitation Learning”, published at CoRL 2021, we present new research that studies how robots can generalize to new tasks that they were not trained to do. The system, called BC-Z, comprises two key components: (i) the collection of a large-scale demonstration dataset covering 100 different tasks and (ii) a neural network policy conditioned on a language or video instruction of the task. The resulting system can perform at least 24 novel tasks, including ones that require interaction with pairs of objects that were not previously seen together. We are also excited to release the robot demonstration dataset used to train our policies, along with pre-computed task embeddings.

The BC-Z system allows a robot to complete instructions for new tasks that the robot was not explicitly trained to do. It does so by training the policy to take as input a description of the task along with the robot’s camera image and to predict the correct action.

Collecting Data for 100 Tasks
Generalizing to a new task altogether is substantially harder than generalizing to held-out variations in training tasks. Simply put, we want robots to have more generalization all around, which requires that we train them on large amounts of diverse data.

We collect data by teleoperating the robot with a virtual reality headset. This data collection follows a scheme similar to how one might teach an autonomous car to drive. First, the human operator records complete demonstrations of each task. Then, once the robot has learned an initial policy, this policy is deployed under close supervision where, if the robot starts to make a mistake or gets stuck, the operator intervenes and demonstrates a correction before allowing the robot to resume.
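The loop below is a highly simplified sketch, with invented stand-in functions, of that intervention scheme: the current policy acts until it drifts off track, at which point the operator's corrective action is recorded instead. It is meant only to illustrate the data-collection logic, not the actual BC-Z tooling.

```python
import random

# Sketch of demonstration-plus-intervention data collection (not the actual BC-Z tooling):
# the operator takes over whenever the policy goes off track, and the correction is recorded.

def collect_episode(policy_action, operator_action, policy_is_off_track, env_step, obs, horizon=100):
    """Roll out the current policy; record operator corrections when it goes off track."""
    episode = []
    for _ in range(horizon):
        if policy_is_off_track(obs):
            action, source = operator_action(obs), "operator"  # human intervenes with a correction
        else:
            action, source = policy_action(obs), "policy"
        episode.append((obs, action, source))
        obs = env_step(obs, action)
    return episode

# Toy stand-ins so the sketch runs; a real setup would use the robot and the teleoperation rig.
episode = collect_episode(
    policy_action=lambda obs: obs * 0.9,
    operator_action=lambda obs: 0.0,
    policy_is_off_track=lambda obs: abs(obs) > 5,
    env_step=lambda obs, act: obs + random.uniform(-1, 1),
    obs=0.0,
    horizon=20,
)
print(sum(1 for _, _, src in episode if src == "operator"), "operator corrections recorded")
```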

This mixture of demonstrations and interventions has been shown to significantly improve performance by mitigating compounding errors. In our experiments, we see a 2x improvement in performance when using this data collection strategy compared to only using human demonstrations.

Example demonstrations collected for 12 out of the 100 training tasks, visualized from the perspective of the robot and shown at 2x speed.

Training a General-Purpose Policy
For all 100 tasks, we use this data to train a neural network policy to map from camera images to the position and orientation of the robot’s gripper and arm. Crucially, to allow this policy the potential to solve new tasks beyond the 100 training tasks, we also input a description of the task, either in the form of a language command (e.g., “place grapes in red bowl”) or a video of a person doing the task.

To accomplish a variety of tasks, the BC-Z system takes as input either a language command describing the task or a video of a person doing the task, as shown here.

By training the policy on 100 tasks and conditioning the policy on such a description, we unlock the possibility that the neural network will be able to interpret and complete instructions for new tasks. This is a challenge, however, because the neural network needs to correctly interpret the instruction, visually identify relevant objects for that instruction while ignoring other clutter in the scene, and translate the interpreted instruction and perception into the robot’s action space.
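To illustrate what conditioning the policy on a task description means mechanically, here is a toy forward pass in which an image feature vector and a task embedding are concatenated and mapped to a continuous action. The sizes, weights, and architecture are invented for illustration; the real BC-Z model is described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented sizes for illustration only.
IMG_FEAT, TASK_EMB, HIDDEN, ACTION_DIM = 64, 32, 128, 7  # action: e.g., 3D position plus a 4D orientation

# Randomly initialized weights stand in for a trained network.
W1 = rng.normal(0, 0.05, (IMG_FEAT + TASK_EMB, HIDDEN))
W2 = rng.normal(0, 0.05, (HIDDEN, ACTION_DIM))

def policy(image_features, task_embedding):
    """Predict the next arm/gripper action from image features and a task embedding."""
    x = np.concatenate([image_features, task_embedding])  # condition the policy on the task
    h = np.maximum(0.0, x @ W1)                           # ReLU hidden layer
    return h @ W2                                         # continuous action

# The task embedding could come from a language encoder ("place grapes in red bowl")
# or from a video of a person doing the task; here it is just a random vector.
action = policy(rng.normal(size=IMG_FEAT), rng.normal(size=TASK_EMB))
print(action.shape)  # (7,)
```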

Experimental Results
In language models, it is well known that sentence embeddings generalize on compositions of concepts encountered in training data. For instance, if you train a translation model on sentences like “pick up a cup” and “push a bowl”, the model should also translate “push a cup” correctly.

We study the question of whether the compositional generalization capabilities found in language encoders can be transferred to real robots, i.e., being able to compose unseen object-object and task-object pairs.

We test this method by pre-selecting a set of 28 tasks, none of which were among the 100 training tasks. For example, one of these new test tasks is to pick up the grapes and place them into a ceramic bowl, but the training tasks involve doing other things with the grapes and placing other items into the ceramic bowl. The grapes and the ceramic bowl never appeared in the same scene during training.

In our experiments, we see that the robot can complete many tasks that were not included in the training set. Below are a few examples of the robot’s learned policy.

The robot completes three instructions of tasks that were not in its training data, shown at 2x speed.

Quantitatively, we see that the robot can succeed to some degree on a total of 24 out of the 28 held-out tasks, indicating a promising capacity for generalization. Further, we see a notably small gap between the performance on the training tasks and performance on the test tasks. These results indicate that simply improving multi-task visuomotor control could considerably improve performance.

The BC-Z performance on held-out tasks, i.e., tasks that the robot was not trained to perform. The system correctly interprets the language command and translates that into action to complete many of the tasks in our evaluation.

Takeaways
The results of this research show that simple imitation learning approaches can be scaled in a way that enables zero-shot generalization to new tasks. That is, it shows one of the first indications of robots being able to successfully carry out behaviors that were not in the training data. Interestingly, language embeddings pre-trained on ungrounded language corpora make for excellent task conditioners. We demonstrated that natural language models can not only provide a flexible input interface to robots, but that pretrained language representations actually confer new generalization capabilities to the downstream policy, such as composing unseen object pairs together.

In the course of building this system, we confirmed that periodic human interventions are a simple but important technique for achieving good performance. While there is a substantial amount of work to be done in the future, we believe that the zero-shot generalization capabilities of BC-Z are an important advancement towards increasing the generality of robotic learning systems and allowing people to command robots. We have released the teleoperated demonstrations used to train the policy in this paper, which we hope will provide researchers with a valuable resource for future multi-task robotic learning research.

Acknowledgements
We would like to thank the co-authors of this research: Alex Irpan, Mohi Khansari, Daniel Kappler, Frederik Ebert, Corey Lynch, and Sergey Levine. This project was a collaboration between Google Research and Everyday Robots. We would like to give special thanks to Noah Brown, Omar Cortes, Armando Fuentes, Kyle Jeffrey, Linda Luu, Sphurti Kirit More, Jornell Quiambao, Jarek Rettinghouse, Diego Reyes, Rosario Jauregui Ruano, and Clayton Tan for overseeing robot operations and collecting human videos of the tasks, as well as Jeffrey Bingham, Jonathan Weisz, and Kanishka Rao for valuable discussions. We would also like to thank Tom Small for creating animations in this post and Paul Mooney for helping with dataset open-sourcing.

Source: Google AI Blog


Smule Adopts Google’s Oboe to Improve Recording Quality & Completion Rates

Posted by the Smule Engineering team: David Gayle, Chris Manchester, Mark Gills, Trayko Traykov, Randal Leistikow, Mariya Ivanova.

Executive Summary

As the most downloaded singing app of all time, Smule Inc. has been investing in Android to improve overall audio quality and, more specifically, to lower latency, i.e., allowing singers to hear their voices in their headphones as they perform. Smule’s audio and video teams spent a significant part of 2021 making the changes needed to convert the application, used by over ten million Android users, from the OpenSL audio API to the Oboe audio library, enabling roughly a 10%+ increase in recording completion rate.

Introduction

Smule Inc. is a leader in karaoke, with an app that helps millions of people sing their favorite songs and share performances daily. The Smule application goes beyond traditional karaoke by focusing on co-creation, offering users the unique opportunity to share music and collaborate with friends, other singers on the platform, and their favorite music artists. Audio quality is paramount, and, in 2020, the Smule team saw potential to enhance the experience on Android.

Screenshots of Smule karaoke app

Smule’s legacy OpenSL implementation wasn’t well-suited to leverage the blazing-fast hardware of new devices while supporting the diverse devices across its worldwide market. Smule’s development team determined that upgrading the audio system was a necessary and logical improvement.

Oboe Rollout Strategy

Smule was faced with two possible routes for improvement: directly targeting AAudio, a high-performance Android C audio API introduced in Android O and designed for applications that require low latency, or adopting Oboe, which wraps both AAudio and OpenSL internally. After careful evaluation, Smule’s development team opted for Oboe for its easy-to-use code base, broad device compatibility, and robust community support; it achieved the lowest latency and made the best use of the available native audio.

The conversion to Oboe represented a significant architectural and technological evolution. As a result, Smule approached the rollout conservatively, with a planned, gradual release that started with a small selection of device models to validate quality. Week after week, the team enabled more devices (reverting the limited number of devices that exhibited problems with Oboe back to OpenSL). This incremental, methodical approach minimized risk and allowed the engineering team to handle device-specific issues as they occurred.

Improving the Audio Quality Experience

Smule switched to Oboe to help improve the app experience. They hoped to dramatically reduce audio playback crashes, eliminate issues such as echo and crackling during recording, and reduce audio latency. A recent article on the Android Developers Blog shows that the average latency of the twenty most popular devices decreased from 109 ms in 2017 to 39 ms today using Oboe. Whereas a monitoring delay of 109 ms is heard as a distinct echo that interferes with live singing, 39 ms is beneath the acceptable threshold for real-time applications. The latencies of top devices today are all within 22 ms of one another, and this consistency is a big plus.

The lift in recording completion rate Smule has seen using Oboe is likely due to this lower latency, allowing singers to hear their voices in the headset as they perform with Smule’s world-class audio effects applied, but without an echo.

Through a collaborative GitHub repository dedicated to Oboe, the Google team played a significant role in Smule’s Oboe integration, providing key insights and support. Working together, the two teams launched the largest Oboe deployment to date, reaching millions of active users. The Smule team contributed fixes for some Oboe code issues, and the Google team coordinated with certain mobile device makers to further improve Oboe’s compatibility.


Audio quality is of the utmost importance to our community of singers, and we're thankful for our shared commitment to delivering the best possible experience as well as empowering musical creation on Smule. - Eric Dumas, Smule CTO.


Given the massive scale of the operation, it was only natural to face device-specific issues. One notable example was built-in OS functionality that injected echo effects into the raw audio stream, which prevented Smule from correctly applying its own patented DSP algorithms and audio filters. Google’s team came to the rescue, providing lightning-fast updates and patches to the library. The process of reporting Oboe issues was straightforward, well defined, and handled in a timely manner by the Google team.

Together, the teams overcame other device-specific roadblocks, including errors with specific chipsets. For example, when Oboe requested mono microphone input, a few devices provided stereo input mixed down into a fake mono stream. Smule created a ticket in Oboe’s GitHub repository, providing examples and reproducing the issue with the Oboe tester app.

The Google-developed Oboe tester app was a helpful tool for identifying and solving issues throughout the implementation. It proved especially useful for testing many of the features of Oboe, AAudio, and OpenSL ES, as well as testing Android devices, measuring latency and glitches, and much more. The application offers a myriad of features that can simulate almost any audio setup. The Oboe tester can also be used in automated testing by launching it from a shell script with an Android Intent. Smule relied heavily on automated testing, given the large number of devices covered in the integration.

Once Smule was confident that the device-specific issues were resolved and the Oboe audio path was stable, Smule switched to a wider split-testing rollout. In just a few weeks, Smule increased the population using Oboe from 10% to 100% of the validated devices, which was only possible because of the positive feedback and green KPI metrics Oboe received continuously throughout the release.

The results speak for themselves. Smule users on Oboe are singing more; it’s as simple as that. Unique karaoke recordings and performance joins (duets) increased by a whopping 8.07%, unique uploads by 3.84%, and completed song performances by 4.10%. In Q3 and Q4 2021, Smule observed an increase of more than 10% in recording completion rate.

Using the Firebase Crashlytics tool by Google, Smule has seen a decline in audio-related crashes since the full Oboe release, making the app more stable - even on lower-end devices. Smule’s dedicated customer support team was thrilled to report a 33% reduction in audio-related complaints, including issues like (unintended) robot voice and echo.

The decision to switch to Oboe has paid off in spades. The application is functioning better than ever and Smule is well-equipped to face further advancements in audio and hardware with the updated technology. Most importantly, Smule users are happy and making music, which is what it’s all about.

Maps 101: How Google Maps reviews work



When exploring new places, reviews on Google are a treasure trove of local knowledge that can point you to the places and businesses you’ll enjoy most — whether it’s a bakery with the best gluten-free cupcake or a nearby restaurant with live music. 


With millions of reviews posted every day from people around the world, we have around-the-clock support to keep the information on Google relevant and accurate. Much of our work to prevent inappropriate content is done behind the scenes, so we wanted to shed some light on what happens after you hit “post” on a review. 




How we create and enforce our policies 

We’ve created strict content policies to make sure reviews are based on real-world experiences and to keep irrelevant and offensive comments off of Google Business Profiles. 

As the world evolves, so do our policies and protections. This helps us guard places and businesses from violative and off-topic content when there’s potential for them to be targeted for abuse. For instance, when governments and businesses started requiring proof of COVID-19 vaccine before entering certain places, we put extra protections in place to remove Google reviews that criticize a business for its health and safety policies or for complying with a vaccine mandate. 

Once a policy is written, it’s turned into training material — both for our operators and machine learning algorithms — to help our teams catch policy-violating content and ultimately keep Google reviews helpful and authentic. 



Moderating reviews with the help of machine learning 

As soon as someone posts a review, we send it to our moderation system to make sure the review doesn’t violate any of our policies. You can think of our moderation system as a security guard that stops unauthorized people from getting into a building — but instead, our team is stopping bad content from being posted on Google. 


Given the volume of reviews we regularly receive, we’ve found that we need both the nuanced understanding that humans offer and the scale that machines provide to help us moderate contributed content. They have different strengths so we continue to invest tremendously in both. 


Machines are our first line of defense because they’re good at identifying patterns. These patterns often immediately help our machines determine if the content is legitimate, and the vast majority of fake and fraudulent content is removed before anyone actually sees it. 


Our machines look at reviews from multiple angles, such as: 
  • The content of the review: Does it contain offensive or off-topic content? 
  • The account that left the review: Does the Google account have any history of suspicious behavior?
  • The place itself: Has there been uncharacteristic activity — such as an abundance of reviews over a short period of time? Has it recently gotten attention in the news or on social media that would motivate people to leave fraudulent reviews? 

Training a machine on the difference between acceptable and policy-violating content is a delicate balance. For example, sometimes the word “gay” is used as a derogatory term, and that’s not something we tolerate in Google reviews. But if we teach our machine learning models that it’s only used in hate speech, we might erroneously remove reviews that promote a gay business owner or an LGBTQ+ safe space. Our human operators regularly run quality tests and complete additional training to remove bias from the machine learning models. By thoroughly training our models on all the ways certain words or phrases are used, we improve our ability to catch policy-violating content and reduce the chance of inadvertently blocking legitimate reviews from going live. 


If our systems detect no policy violations, then the review can post within a matter of seconds. But our job doesn’t stop once a review goes live. Our systems continue to analyze the contributed content and watch for questionable patterns. These patterns can be anything from a group of people leaving reviews on the same cluster of Business Profiles to a business or place receiving an unusually high number of 1- or 5-star reviews over a short period of time. 
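As a purely illustrative sketch (not Google's moderation system), the snippet below shows how the three kinds of signals described above, covering the content, the account, and the place, might be collected for a single review and used to decide whether it needs a closer look.

```python
# Purely illustrative: a toy combination of the kinds of signals described above,
# not Google's actual moderation system.
BLOCKLIST = {"offensive_term_1", "offensive_term_2"}  # placeholder offensive terms

def review_signals(review_text, account_flag_count, reviews_on_place_last_hour):
    """Collect simple content, account, and place signals for one review."""
    words = set(review_text.lower().split())
    return {
        "contains_blocked_term": bool(words & BLOCKLIST),
        "suspicious_account": account_flag_count > 3,
        "review_burst_on_place": reviews_on_place_last_hour > 50,
    }

def needs_human_review(signals):
    """Route to a human operator if any signal fires; otherwise let the review post."""
    return any(signals.values())

signals = review_signals("Great gluten-free cupcakes!", account_flag_count=0, reviews_on_place_last_hour=2)
print(needs_human_review(signals))  # False: the review posts within seconds
```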


Keeping reviews authentic and reliable 

Like any platform that welcomes contributions from users, we also have to stay vigilant in our efforts to prevent fraud and abuse from appearing on Maps. Part of that is making it easy for people using Google Maps to flag any policy-violating reviews. If you think you see a policy-violating review on Google, we encourage you to report it to our team. Businesses can report reviews on their profiles here, and consumers can report them here.


Our team of human operators works around the clock to review flagged content. When we find reviews that violate our policies, we remove them from Google and, in some cases, suspend the user account or even pursue litigation. 


In addition to reviewing flagged content, our team proactively works to identify potential abuse risks, which reduces the likelihood of successful abuse attacks. For instance, when there’s an upcoming event with a significant following — such as an election — we implement elevated protections to the places associated with the event and other nearby businesses that people might look for on Maps. We continue to monitor these places and businesses until the risk of abuse has subsided to support our mission of only publishing authentic and reliable reviews. Our investment in analyzing and understanding how contributed content can be abused has been critical in keeping us one step ahead of bad actors. 


With more than 1 billion people turning to Google Maps every month to navigate and explore, we want to make sure the information they see — especially reviews — is reliable for everyone. Our work is never done; we’re constantly improving our system and working hard to keep abuse, including fake reviews, off of the map.