Tag Archives: AI

Introducing PlaNet: A Deep Planning Network for Reinforcement Learning



Research into how artificial agents can improve their decisions over time is progressing rapidly via reinforcement learning (RL). For this technique, an agent observes a stream of sensory inputs (e.g. camera images) while choosing actions (e.g. motor commands), and sometimes receives a reward for achieving a specified goal. Model-free approaches to RL aim to directly predict good actions from the sensory observations, enabling DeepMind's DQN to play Atari and other agents to control robots. However, this blackbox approach often requires several weeks of simulated interaction to learn through trial and error, limiting its usefulness in practice.

Model-based RL, in contrast, attempts to have agents learn how the world behaves in general. Instead of directly mapping observations to actions, this allows an agent to explicitly plan ahead, to more carefully select actions by "imagining" their long-term outcomes. Model-based approaches have achieved substantial successes, including AlphaGo, which imagines taking sequences of moves on a fictitious board with the known rules of the game. However, to leverage planning in unknown environments (such as controlling a robot given only pixels as input), the agent must learn the rules or dynamics from experience. Because such dynamics models in principle allow for higher efficiency and natural multi-task learning, creating models that are accurate enough for successful planning is a long-standing goal of RL.

To spur progress on this research challenge and in collaboration with DeepMind, we present the Deep Planning Network (PlaNet) agent, which learns a world model from image inputs only and successfully leverages it for planning. PlaNet solves a variety of image-based control tasks, competing with advanced model-free agents in terms of final performance while being 5000% more data efficient on average. We are additionally releasing the source code for the research community to build upon.
The PlaNet agent learning to solve a variety of continuous control tasks from images in 2000 attempts. Previous agents that do not learn a model of the environment often require 50 times as many attempts to reach comparable performance.
How PlaNet Works
In short, PlaNet learns a dynamics model given image inputs and efficiently plans with it to gather new experience. In contrast to previous methods that plan over images, we rely on a compact sequence of hidden or latent states. This is called a latent dynamics model: instead of directly predicting from one image to the next image, we predict the latent state forward. The image and reward at each step are then generated from the corresponding latent state. By compressing the images in this way, the agent can automatically learn more abstract representations, such as positions and velocities of objects, making it easier to predict forward without having to generate images along the way.
Learned Latent Dynamics Model: In a latent dynamics model, the information of the input images is integrated into the hidden states (green) using the encoder network (grey trapezoids). The hidden state is then projected forward in time to predict future images (blue trapezoids) and rewards (blue rectangle).
To learn an accurate latent dynamics model, we introduce:
  • A Recurrent State Space Model: A latent dynamics model with both deterministic and stochastic components, allowing it to predict a variety of possible futures as needed for robust planning, while remembering information over many time steps. Our experiments indicate that both components are crucial for high planning performance.
  • A Latent Overshooting Objective: We generalize the standard training objective for latent dynamics models to train multi-step predictions, by enforcing consistency between one-step and multi-step predictions in latent space. This yields a fast and effective objective that improves long-term predictions and is compatible with any latent sequence model.
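To make the idea of combining a deterministic and a stochastic component concrete, here is a minimal sketch of a scalar recurrent state-space step. It is only an illustration under stated assumptions: the function names `rssm_step` and `imagine`, the scalar state, and the hand-picked weights are all hypothetical and are not taken from the PlaNet code, where these quantities are learned neural networks over vector states.

```python
import math
import random


def rssm_step(h, s, action, rng):
    """One step of a toy scalar recurrent state-space model.

    h: deterministic component (carries memory across steps)
    s: stochastic component (sampled, so imagined futures can differ)
    All weights below are hand-picked placeholders, not learned values.
    """
    # Deterministic path: recurrent update from the previous state and action.
    h_next = math.tanh(0.9 * h + 0.5 * s + 0.3 * action)
    # Stochastic path: sample s from a Gaussian whose mean depends on h_next.
    mean, std = 0.8 * h_next, 0.1
    s_next = rng.gauss(mean, std)
    return h_next, s_next


def imagine(actions, rng, h=0.0, s=0.0):
    """Roll the model forward open-loop and return the visited (h, s) pairs."""
    trajectory = []
    for action in actions:
        h, s = rssm_step(h, s, action, rng)
        trajectory.append((h, s))
    return trajectory


if __name__ == "__main__":
    actions = [1.0, 1.0, -0.5, 0.0]
    # Different random seeds give different imagined futures for the same actions.
    for seed in range(3):
        final_h, final_s = imagine(actions, random.Random(seed))[-1]
        print(f"seed={seed} h={final_h:.3f} s={final_s:.3f}")
```

Running the same action sequence with different seeds yields different trajectories, which is the sense in which the stochastic part can represent several possible futures, while `h` is passed along deterministically and can retain information over many steps.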
While predicting future images allows us to teach the model, encoding and decoding images (trapezoids in the figure above) requires significant computation, which would slow down planning. However, planning in the compact latent state space is fast since we only need to predict future rewards, and not images, to evaluate an action sequence. For example, the agent can imagine how the position of a ball and its distance to the goal will change for certain actions, without having to visualize the scenario. This allows us to compare 10,000 imagined action sequences with a large batch size every time the agent chooses an action. We then execute the first action of the best sequence found and replan at the next step.
Planning in Latent Space: For planning, we encode past images (gray trapezoid) into the current hidden state (green). From there, we efficiently predict future rewards for multiple action sequences. Note how the expensive image decoder (blue trapezoid) from the previous figure is gone. We then execute the first action of the best sequence found (red box).
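The plan-execute-replan loop described above can be sketched in a few lines. This is a simplified illustration, not the PlaNet implementation: the text here does not say how candidate sequences are searched, so the sketch assumes plain random sampling of action sequences, and `transition`, `reward`, `plan` and `run_episode` are hypothetical names over a hand-written 1-D model. It also uses 1,000 candidates rather than 10,000 to stay fast, and it treats the toy model as if it were the real environment.

```python
import random


def transition(state, action):
    """Hypothetical latent dynamics: a 1-D position shifted by the action."""
    return state + action


def reward(state, goal=1.0):
    """Hypothetical reward predictor: higher the closer the state is to the goal."""
    return -abs(state - goal)


def plan(state, horizon=5, candidates=1000, rng=None):
    """Return the first action of the best of `candidates` random action sequences."""
    rng = rng or random.Random()
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        # Imagine the sequence in latent space, summing predicted rewards only;
        # no images are decoded at any point.
        s, total = state, 0.0
        for a in actions:
            s = transition(s, a)
            total += reward(s)
        if total > best_return:
            best_return, best_first_action = total, actions[0]
    return best_first_action


def run_episode(steps=10, seed=0):
    """Replan at every step and execute only the first action of the best sequence."""
    rng = random.Random(seed)
    state = 0.0
    for _ in range(steps):
        # In this toy setting the "environment" is the same function as the model.
        state = transition(state, plan(state, rng=rng))
    return state


if __name__ == "__main__":
    print(f"final position: {run_episode():.3f} (goal is 1.0)")
```

Because only the first action of each plan is executed, any improvement to the model (here, `transition` and `reward`) changes behavior immediately, with no separate policy to retrain.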
Compared to our preceding work on world models, PlaNet works without a policy network -- it chooses actions purely by planning, so it benefits from model improvements on the spot. For the technical details, check out our online research paper or the PDF version.

PlaNet vs. Model-Free Methods
We evaluate PlaNet on continuous control tasks. The agent is only given image observations and rewards. We consider tasks that pose a variety of different challenges:
  • A cartpole swing-up task, with a fixed camera, so the cart can move out of sight. The agent thus must absorb and remember information over multiple frames.
  • A finger spin task that requires predicting two separate objects, as well as the interactions between them.
  • A cheetah running task that includes contacts with the ground that are difficult to predict precisely, calling for a model that can predict multiple possible futures.
  • A cup task, which only provides a sparse reward signal once a ball is caught. This demands accurate predictions far into the future to plan a precise sequence of actions.
  • A walker task, in which a simulated robot starts off by lying on the ground, and must first learn to stand up and then walk.
PlaNet agents trained on a variety of image-based control tasks. The animation shows the input images as the agent is solving the tasks. The tasks pose different challenges: partial observability, contacts with the ground, sparse rewards for catching a ball, and controlling a challenging bipedal robot.
Our work constitutes one of the first examples where planning with a learned model outperforms model-free methods on image-based tasks. The table below compares PlaNet to the well-known A3C agent and the D4PG agent, which combines recent advances in model-free RL. The numbers for these baselines are taken from the DeepMind Control Suite. PlaNet clearly outperforms A3C on all tasks and reaches final performance close to D4PG while using, on average, 50 times less interaction with the environment.
One Agent for All Tasks
Additionally, we train a single PlaNet agent to solve all six tasks. The agent is randomly placed into different environments without knowing the task, so it needs to infer the task from its image observations. Without changes to the hyperparameters, the multi-task agent achieves the same mean performance as individual agents. While it learns more slowly on the cartpole tasks, it learns substantially faster and reaches a higher final performance on the challenging walker task that requires exploration.
Video predictions of the PlaNet agent trained on multiple tasks. Holdout episodes collected with the trained agent are shown above and open-loop agent hallucinations below. The agent observes the first 5 frames as context to infer the task and state and accurately predicts ahead for 50 steps given a sequence of actions.
Conclusion
Our results showcase the promise of learning dynamics models for building autonomous RL agents. We advocate for further research that focuses on learning accurate dynamics models on tasks of even higher difficulty, such as 3D environments and real-world robotics tasks. A possible ingredient for scaling up is the processing power of TPUs. We are excited about the possibilities that model-based reinforcement learning opens up, including multi-task learning, hierarchical planning and active exploration using uncertainty estimates.

Acknowledgements
This project is a collaboration with Timothy Lillicrap, Ian Fischer, Ruben Villegas, Honglak Lee, David Ha and James Davidson. We further thank everybody who commented on our paper draft and provided feedback at any point throughout the project.




Source: Google AI Blog


Try your hand at the art of shadow puppetry, with help from AI

I grew up in Nantong, a beautiful, small coastal city in China, where the Yangtze River flows into the East China Sea. One of my earliest childhood memories was when my parents, during the long winter nights, would entertain me by narrating stories using hand shadows as puppets.

Later, I would see my first traditional shadow puppetry performance during a Lunar New Year family trip to Wuzhen. Over the years as I made my way around the world—whether in Indonesia, Egypt, or Greece—I found a form of shadow puppetry in local cultures, beautifully combining legends and traditions, music and art, imagination and craftsmanship. And it always made me think about those childhood nights with my family, and about passing down stories, connection, joy and love. 

With technology, I'm hoping to help connect people to this ancient art form. In September last year, we built an interactive installation that used AI to help people explore shadow puppetry. Though it’s an ancient art, people connected with shadow puppetry in a new way, and after the conference, we decided to bring it online so that everyone could play. So today, we’re making it available as a new AI experiment, Shadow Art.

Shadow Art is a web browser-based game that lets you experience AI and shadow puppetry in a playful way. To bring what used to be an offline experience online, we used TensorFlow.js, a TensorFlow library which makes it easy to build and train a machine learning model directly in the browser.

How does Shadow Art work? 

You use your hands to form one of twelve zodiac animals from the lunar cycle in front of your laptop or phone camera, trying to match your hand to the diagram on the screen. The “shadow” of your hands on the screen then transforms into a shadow puppet animal. Sounds easy, right? Here’s the catch: we turned it into an interactive game where you have twenty seconds to form each animal. The goal is to go through the full lunar cycle as fast as possible. 

The new experiment is now available in eleven language varieties, including English, Chinese, Thai, Bahasa Indonesia, Malay, Japanese, Korean, Spanish and Portuguese. In several countries around the world, our annual Lunar New Year Doodle is also celebrating the ancient storytelling art of shadow puppetry.

It’s been great to see the shadow puppets of my childhood come to life in Shadow Art. I hope you’ll have as much fun with it as we did (my personal record for the full Zodiac cycle? 2:23 mins ;)). Happy Lunar New Year to all of you! 

Advancing research on fake audio detection

When you listen to Google Maps driving directions in your car, get answers from your Google Home, or hear a spoken translation in Google Translate, you're using Google's speech synthesis, or text-to-speech (TTS) technology. Speech interfaces not only allow you to interact naturally and conveniently with digital devices, they're a crucial technology for making information universally accessible: TTS opens up the internet to millions of users all over the world who may not be able to read, or who have visual impairments.


Over the last few years, there’s been an explosion of new research using neural networks to simulate a human voice. These models, including many developed at Google, can generate increasingly realistic, human-like speech.


While the progress is exciting, we’re keenly aware of the risks this technology can pose if used with the intent to cause harm. Malicious actors may synthesize speech to try to fool voice authentication systems, or they may create forged audio recordings to defame public figures. Perhaps equally concerning, public awareness of "deep fakes" (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media: as it becomes harder to distinguish real from tampered content, bad actors can more credibly claim that authentic data is fake.


We're taking action. When we launched the Google News Initiative last March, we committed to releasing datasets that would help advance state-of-the-art research on fake audio detection.  Today, we're delivering on that promise: Google AI and Google News Initiative have partnered to create a body of synthetic speech containing thousands of phrases spoken by our deep learning TTS models. These phrases are drawn from English newspaper articles, and are spoken by 68 synthetic "voices" covering a variety of regional accents.  


We're making this dataset available to all participants in the independent, externally-run 2019 ASVspoof challenge. This open challenge invites researchers all over the globe to submit countermeasures against fake (or "spoofed") speech, with the goal of making automatic speaker verification (ASV) systems more secure. By training models on both real and computer-generated speech, ASVspoof participants can develop systems that learn to distinguish between the two. The results will be announced in September at the 2019 Interspeech conference in Graz, Austria.


As we published in our AI Principles last year, we take seriously our responsibility both to engage with the external research community, and to apply strong safety practices to avoid unintended results that create risks of harm. We're also firmly committed to Google News Initiative's charter to help journalism thrive in the digital age, and our support for the ASVspoof challenge is an important step along the way.

When Iowa’s snow piles up, TensorFlow can keep roads safe

Iowa may be heaven, but it’s a snowy one. With an average of around 33 inches of snow every year, keeping roads open and safe is an important challenge. Car accidents tend to spike during the winter months each year in Iowa, as do costly delays. And dangerous commutes can mean hazards for people and commerce alike: the state is one of the country’s largest producers of agricultural output, and much of that is moved on roads.

To improve road safety and efficiency, the Iowa Department of Transportation has teamed up with researchers at Iowa State University to use machine learning, including our TensorFlow framework, to provide insights into traffic behavior. Iowa State’s technology helps analyze the visual data gathered from stationary cameras and cameras mounted on snow plows. They also capture traffic information using radar detectors. Machine learning transforms that data into conclusions about road conditions, like identifying congestion and getting first responders to the scenes of accidents faster.

This is just one recent example of TensorFlow being used to make drivers’ lives easier across the United States. In California, snow may not be an issue, but traffic certainly is, and college students there used TensorFlow to identify potholes and dangerous road cracks in Los Angeles.

Officials in Iowa say machine learning could also be used to predict crash risks and travel speeds, and better understand drivers’ reactions or failures behind the wheel. But that doesn’t mean drivers will be off the hook. Iowa’s transportation and public safety departments constantly spread the same message: when it’s winter, slow down. Add some time onto your daily commute, and don’t use cruise control during a storm. That way, both drivers and state officials can work together to make winter travel less dreary—and a lot safer.

Expanding the Application of Deep Learning to Electronic Health Records



In 2018 we published a paper that showed how machine learning, when applied to medical records, can predict what might happen to patients who are hospitalized: for example, how long they would need to be in the hospital and, if discharged, how likely they would be to come back unexpectedly. Predictive models of various kinds have already been deployed in hospital settings by others, and our work aims to further improve potential clinical benefit by using new models that can make predictions faster, more accurate, and more adaptable for a broader range of clinical contexts.

Any endeavor to demonstrate the promise of machine learning requires intense collaboration between engineers, doctors, and medical researchers to make sure the work benefits patients, physicians, and health systems, and that it is equitable. Google is already fortunate to partner with some of the best academic medical centers in the world and we are now expanding this work to include Intermountain Healthcare, based in Utah.
The initial collaboration will focus on understanding how Google might adapt machine learning predictions to the various Intermountain care settings, from primary care clinics to the TeleHealth critical care unit, which remotely monitors critically ill patients in surrounding hospitals. We see potential in exploring how scalable computing platforms that include predictions might assist clinical teams in providing the best possible care.

As with our previous research, we will begin with jointly testing the performance of machine learning models on historical records, following strict policies to ensure that all data privacy and security measures are followed.

We are excited to explore how scalable computing platforms that include predictions might assist clinical teams in providing the best possible care in these settings. We additionally hope to further validate that our approach to predictions can work across health systems and improve care for patients.

Source: Google AI Blog


How machine learning can drive change in traffic-packed L.A.

There's nothing quite like driving through Los Angeles on a perfectly sunny day. But for drivers, the beauty of Southern California’s great weather and scenery is ruined by one thing: traffic.

According to a report by INRIX, my hometown is the worst city in the world for traffic, with a record of 102 hours of congestion during peak hours in 2017. My classmate, Ericson Hernandez, comes from New York City, which is ranked third globally for its traffic woes. Together, we decided to use machine learning to figure out the roots of bad traffic, including elements like road damage from potholes and cracks, and make rides around our beautiful cities enjoyable again.

As Ericson and I started studying electrical engineering at Loyola Marymount University, we began to develop an interest in a relatively new topic to the engineering world: machine learning. Our professor, Dr. Lei Huang, encouraged us to pick a project that we were passionate about, and Ericson and I wanted to use technology to tackle problems in the real world—such as helping the communities around us with road development.

This summer, we looked at previous research projects on detecting road cracks, and pondered how we could improve the algorithm and apply it to Los Angeles communities. We decided to use TensorFlow, Google’s open-source machine learning platform, to train a model that could quickly identify potholes and dangerous road cracks from camera footage of L.A. roads.

Students mount their camera before heading out to collect data.

Construction companies and cities could use this technology to identify which roads need fixing the most. With safer driving conditions and efficient road-work repairs, traffic in major cities could dramatically decrease, allowing for people to travel in a quick, safe and enjoyable manner. 

And that way, driving through Los Angeles can be about enjoying the view, not grumbling at the traffic.

Google AI Impact Challenge: a week to apply, plus research on why you should

In my twelve years at Google, I've seen that big things happen when you don't shy away from big ideas—especially when you pair those ideas with emerging technology. We're trying to encourage more of that kind of thinking with the Google AI Impact Challenge, a call for organizations to use AI to help address social, humanitarian and environmental problems. Before you read on, remember this: there are only seven days left to apply to the Challenge!


Hundreds of nonprofits and research organizations have already applied, and there’s good reason for all the excitement. Recently, we collaborated with McKinsey on research to identify ways AI can drive social change. The resulting report shows that AI projects have the potential to improve all 17 of the United Nations Sustainable Development Goals: end poverty and hunger, promote good health and wellbeing for all, and several more.

What works?

According to the research, AI has the greatest potential for impact in four areas: health and hunger; education; justice; and equality and inclusion. AI can have the largest and most immediate impact through the application of computer vision, giving machines the ability to understand images and videos, and natural language processing, teaching computers to parse and understand human languages.


Computer vision can be used to improve health through better disease detection, our environment through wildlife tracking, and our education through new forms of learning for people with different learning capabilities. You’ve seen natural language processing at work in chatbots, which make the job-seeking process more efficient, or allow for better interaction between people seeking medical help and health providers.

What’s the hold up?

While AI cannot solve every problem, its potential is profound. So why isn’t every nonprofit and social entrepreneur embracing it? Three of the greatest challenges are access to talent, access to relevant data, and the capacity to deploy and sustain an AI project once it’s created. Nonprofits and their funders, the private sector and governments will need to work together to address these challenges.


To solve for talent scarcity, we need to continue to push for more education globally, especially for professionals willing to pursue AI. Private and public sector organizations may be able to open access to subsets of their data that could serve the clear public interest. Tools like Dataset Search are making it easier to discover potentially relevant datasets. Also, nonprofits should look for opportunities to collect and share data most relevant to the problems they are looking to address. Finally, funders should consider how they can best support the ongoing deployment of AI projects and ensure social sector professionals have access to basic AI training.


McKinsey’s findings also show that to be successful, AI tools and techniques must be applied responsibly: clear principles must be established so that the solutions consider potential negative impacts—like the perpetuation of bias—on disadvantaged populations.


So, back to what I told you to remember: applications for the AI Impact Challenge close in seven days, on January 22 (@ 11:59:59 PST, to be exact). I’ll be part of an international panel of expert reviewers that will review all finalists and ultimately decide which ones will receive funds from our $25 million pool as well as other resources. We're excited to see what you come up with.

Looking Back at Google’s Research Efforts in 2018



2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.

Ethical Principles and AI
Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community through the Fairness Module in the Machine Learning Crash Course.

AI for Social Good
The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property.
A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.

We have also seen a huge number of external parties, sometimes in collaboration with Google researchers and engineers, using open source software like TensorFlow to tackle a wide range of scientific and social problems, such as using convolutional neural networks to identify humpback whale calls, detecting new exoplanets, identifying diseased cassava plants and more.
To spur creative activity in this area, we announced the Google AI for Social Impact Challenge in collaboration with Google.org, whereby individuals and organizations can receive grants from a total of $25M of funding, along with mentorship and advice from Google research scientists, engineers and other experts as they work to take a project with large potential social impact from idea to reality.

Assistive Technology
Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, this results in collaborations with various product teams to release the fruits of this research in various product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.

Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing quickly and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).

An important focus in our research is helping to make products like the Google Assistant support more languages and allow better understanding of semantic similarity, even when very different ways of expressing the same concept or idea are used. Underlying new product capabilities like these is research we performed on improving the quality of both speech synthesis and text-to-speech for languages without much training data available.

Quantum computing
Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.
A Bristlecone chip being installed by Research Scientist Marissa Giustina at the Quantum AI Lab in Santa Barbara.
We also released Cirq, an open source programming framework for quantum computers, and explored how quantum computers could be used for neural networks. Finally, we shared our experience and techniques for understanding performance fluctuations in quantum processors, and shared some thoughts on how quantum computers might be useful as a computational substrate for neural networks. We're looking forward to exciting results in the quantum computing space in 2019!

Natural Language Understanding
Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.
BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.
In addition to collaborating with various research teams to enable Smart Compose and Duplex (discussed previously), we worked to make the Google Assistant handle multilingual use cases better, with the goal of making the Assistant naturally conversational for all users.

Perception
Our perception research tackles the hard problems of allowing computers to understand images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality. In 2018, our technology improved Google Photos' ability to organize the content that users most care about, such as people and pets. Google Lens and the Assistant enabled users to learn about the natural world, answer questions in real-time, and do more with Lens in Google Images. A key aspect of the Google AI mission is to empower others to benefit from our technology, and we've made a lot of progress this year in improving capabilities and building blocks that are parts of Google APIs. Examples include improved and new capabilities in vision and video in Cloud ML APIs and face-related on-device building blocks through ML Kit.
Google Lens can help you learn more about the world around you. Here, Lens identifies the breed of this dog. Learn more in this blog post.
In 2018, our contributions to academic research included advances in deep learning for 3D scene understanding, such as stereo magnification, which enables synthesizing novel photorealistic views of a scene. Our ongoing research on better understanding images and video enables users to find, organize, enhance and improve images and video in Google products such as Photos, YouTube, Search and more. In 2018, notable advances included a fast bottom-up model for joint pose estimation and person instance segmentation, a system for visualizing complex motion, a system which models spatio-temporal relations between people and objects and improvements in video action recognition based on distillation and 3D convolutions.

In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listen combines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.

Enabling perception on resource-constrained platforms has become increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.

Computational Photography
The improvements in quality and versatility of cell phone cameras over the last few years have been nothing short of remarkable. A modest part of this is due to improvements in the actual physical sensors used in phones, but a much greater part is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together computationally. Originally, in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.
Motion photos on the Pixel 2 in Google Photos. For more examples, check out this Google Photos album.
Augmented chicken family with Motion Stills AR mode.
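The capture, align and merge idea behind burst photography can be illustrated with a toy sketch. This is not the HDR+ pipeline: it assumes one-dimensional "frames", a brute-force search over whole-pixel shifts, and plain averaging as the merge step. The function names `best_shift` and `merge_burst` are hypothetical and chosen only for this illustration.

```python
def best_shift(ref, frame, max_shift=2):
    """Return the integer shift s that best matches ref[i] to frame[i + s]."""
    best_err, best_s = float("inf"), 0
    for s in range(-max_shift, max_shift + 1):
        # Compare only the samples that overlap under this shift.
        diffs = [abs(ref[i] - frame[i + s])
                 for i in range(len(ref)) if 0 <= i + s < len(frame)]
        err = sum(diffs) / len(diffs)
        if err < best_err:
            best_err, best_s = err, s
    return best_s


def merge_burst(frames, max_shift=2):
    """Align every frame to the first one, then average overlapping samples."""
    ref = frames[0]
    total = [float(v) for v in ref]
    count = [1] * len(ref)
    for frame in frames[1:]:
        s = best_shift(ref, frame, max_shift)
        for i in range(len(ref)):
            j = i + s
            if 0 <= j < len(frame):
                total[i] += frame[j]
                count[i] += 1
    return [t / c for t, c in zip(total, count)]


# Toy burst: the second frame is the first one moved right by one sample.
scene = [0, 0, 10, 50, 10, 0, 0, 0]
shifted = [0, 0, 0, 10, 50, 10, 0, 0]
merged = merge_burst([scene, shifted])
```

With noisy frames, averaging after alignment is what reduces noise; without the alignment step, the same averaging would blur the scene.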
This year, one of our primary efforts in computational photography research was to create a new capability called Night Sight, which enables Pixel phone cameras to "see in the dark", earning praise from both press and users. Of course, Night Sight is just one of the new software-enabled camera features our teams have developed to help you take the perfect photo, including using ML to provide better portrait mode shots, seeing better and further with Super Res Zoom and capturing special moments with Top Shot and Google Clips.
Left: iPhone XS (full resolution image here). Right: Pixel 3 Night Sight (full resolution image here).
Algorithms and Theory
Algorithms are the backbone of Google systems and touch all our products, from routing algorithms behind Google trips to consistent hashing for Google cloud. Over the past year, we continued our research in algorithms and theory covering a wide range of areas from theoretical foundations to applied algorithms, and from graph mining to privacy-preserving computation. Our work in optimization spans areas from studying continuous optimization for machine learning to distributed combinatorial optimization. In the former area, our work on studying convergence of stochastic optimization algorithms for training neural networks (which won an ICLR 2018 Best Paper Award) exhibited issues with popular gradient-based optimization methods (such as some variants of ADAM), but provided a solid foundation for new gradient-based optimization methods.
Performance comparison of ADAM and AMSGRAD on a synthetic example of a simple one dimensional convex problem inspired by our examples of non-convergence. The first two plots (left and center) are for the online setting and the last one (right) is for the stochastic setting.
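The difference between the two update rules compared above can be sketched in a few lines. This is a simplified scalar version under stated assumptions: bias correction is omitted, the hyperparameter defaults are illustrative, and the helper names (`adam_step`, `amsgrad_step`, `new_state`) are hypothetical. The one structural difference is that AMSGRAD divides by the running maximum of the second-moment estimate, so its effective step size cannot grow again after a large gradient has been seen.

```python
import math


def new_state():
    """Fresh optimizer state for a single scalar parameter."""
    return {"m": 0.0, "v": 0.0, "v_hat": 0.0}


def adam_step(x, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """One ADAM-style update (no bias correction) for parameter x and gradient g."""
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    return x - lr * state["m"] / (math.sqrt(state["v"]) + eps)


def amsgrad_step(x, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Same as adam_step, but normalizes by the largest v seen so far."""
    state["m"] = b1 * state["m"] + (1 - b1) * g
    state["v"] = b2 * state["v"] + (1 - b2) * g * g
    state["v_hat"] = max(state["v_hat"], state["v"])
    return x - lr * state["m"] / (math.sqrt(state["v_hat"]) + eps)


# One large gradient followed by small ones: ADAM's v decays quickly,
# while AMSGRAD's v_hat remembers the large gradient.
adam_state, ams_state = new_state(), new_state()
x_adam = x_ams = 0.0
for g in [10.0, 0.1, 0.1]:
    x_adam = adam_step(x_adam, g, adam_state, b2=0.5)
    x_ams = amsgrad_step(x_ams, g, ams_state, b2=0.5)
```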
In distributed optimization, we worked to improve the round and communication complexity of well-studied combinatorial optimization problems such as matchings in graphs via round compression and via core-sets, as well as submodular maximization and k-core decomposition. On the more applied side, we developed algorithmic techniques for solving set cover at scale via sketching and for solving balanced partitioning and hierarchical clustering for graphs with trillions of edges. Our work on online delivery services was nominated for the best paper award at WWW'18. Finally, our open source optimization platform, OR-Tools, won 4 gold medals at the 2018 MiniZinc constraint programming competition.

In algorithmic choice theory, we have proposed new models and investigated the problems of reconstruction and learning a mixture of multinomial logits. We also studied the classes of functions learnable by neural networks and how to use machine-learned oracles to improve classic online algorithms.

Understanding learning techniques with strong privacy guarantees is of great importance for us at Google. In this context, we developed two new means of analyzing how differential privacy can be amplified: by iteration and by shuffling. We also applied differential privacy techniques to design incentive-aware learning methods that are robust against gaming. Such learning techniques have applications in efficient online market design. Our new research in the area of market algorithms also includes techniques to help advertisers test incentive compatibility of ad auctions, and to optimize ad refresh for in-app advertising. We also pushed the boundaries of state-of-the-art dynamic mechanisms for repeated auctions, presented dynamic auctions that are robust against a lack of predictions of the future, against noisy forecasts, or against heterogeneous buyer behaviour, and extended our results to dynamic double auctions. Finally, in the context of robustness in online optimization and online learning, we developed new online allocation algorithms for stochastic input with traffic spikes and new bandit algorithms robust to corrupted data.

Software Systems
A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.
The TF-Ranking library supports multi-item scoring architecture, an extension of traditional single-item scoring.
We also released JAX, an accelerator-backed variant of NumPy that supports automatic differentiation of Python functions to arbitrary order. While JAX is not part of TensorFlow, it leverages some of the same underlying software infrastructure (e.g. XLA), and some of its ideas and algorithms have been helpful to our TensorFlow projects. Finally, we continued our research on the security and privacy of machine learning, and our development of open source frameworks for safety and privacy in AI systems, such as CleverHans and TensorFlow Privacy.

Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.
The Hierarchical Planner's placement of an NMT (4-layer) model. White denotes CPU and the four colors each represent one of the GPUs. Note that every step of every layer is allocated across multiple GPUs. This placement is 53.7% faster than that generated by a human expert.
In 2018 we learned about Spectre and Meltdown, new classes of serious security vulnerabilities in modern computer processors, thanks to Google's Project Zero team in collaboration with others. These and related vulnerabilities will keep computer architecture researchers quite busy. In our continuing efforts to model CPU behavior, our Compiler Research team integrated their tool for measuring machine instruction latency and port pressure into LLVM, making possible better compilation decisions.

Google products, our Cloud offerings and inference for machine learning models depend critically on the ability to provide large-scale, reliable, efficient technical infrastructure for computing, storage and networking. A few research highlights from the past year include the evolution of Google's Software Defined Networking WAN, a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats, in many storage systems (BigTable, Spanner, Google Spreadsheets, etc.) and a report on our extensive use of code review, investigating the motivations behind code review at Google, current practices, and developers' satisfaction and challenges.

Running a large-scale web service, such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guarantees on the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After we made an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.
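A minimal sketch of the general idea of consistent hashing with a cap on per-server load is shown below. It is not the deployed system: it assumes a static set of keys, one ring position per server, SHA-256 as the hash, and a capacity of ceil((1 + eps) * keys / servers). The names `assign_with_bounded_load` and `eps` are hypothetical. Each key walks clockwise from its hash position and lands on the first server that still has spare capacity.

```python
import bisect
import hashlib
import math


def _ring_position(name):
    """Map a string to a point on the hash ring (a 64-bit integer)."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_with_bounded_load(keys, servers, eps=0.25):
    """Assign keys to servers so that no server exceeds the capacity cap."""
    ring = sorted((_ring_position(s), s) for s in servers)
    points = [p for p, _ in ring]
    capacity = math.ceil((1 + eps) * len(keys) / len(servers))
    load = {s: 0 for s in servers}
    placement = {}
    for key in keys:
        # First server clockwise from the key's position (wrapping around).
        i = bisect.bisect_left(points, _ring_position(key)) % len(ring)
        # Skip full servers; total capacity exceeds the number of keys,
        # so a server with room always exists.
        while load[ring[i][1]] >= capacity:
            i = (i + 1) % len(ring)
        server = ring[i][1]
        load[server] += 1
        placement[key] = server
    return placement, load, capacity


servers = [f"server-{i}" for i in range(10)]
keys = [f"object-{i}" for i in range(1000)]
placement, load, capacity = assign_with_bounded_load(keys, servers)
```

The parameter `eps` trades balance against stability: a smaller value gives a tighter cap on the busiest server, while a larger value lets more keys stay on their first-choice server when servers are added or removed.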

AutoML
AutoML, also known as meta-learning, is the use of machine learning to automate some aspects of machine learning. We have been performing research in this space for many years, and the long-term goal is to develop learning systems that can learn to take a new problem and solve it automatically, using insights and capabilities derived from other problems that have been previously solved. Our earlier work in this space has mostly used reinforcement learning, but we are also interested in the use of evolutionary algorithms. Last year we showed how evolutionary algorithms can be used to automatically discover state-of-the-art neural network architectures for a variety of visual tasks. We also explored how reinforcement learning can be applied to problems other than neural network architecture search, showing that it can be used to 1) automatically generate image transformation sequences that improve the accuracy of a wide variety of image models, and 2) find new symbolic optimization expressions that are more effective than the commonly used optimization update rules. Our work on AdaNet showed how to have a fast and flexible AutoML algorithm with learning guarantees.
AdaNet adaptively growing an ensemble of neural networks. At each iteration, it measures the ensemble loss for each candidate, and selects the best one to move onto the next iteration.
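The iteration described in the caption (score each candidate by the loss of the ensemble that would result, then keep the best) can be sketched as a simple greedy loop. This is a toy under stated assumptions, not the AdaNet algorithm: candidates are fixed prediction vectors, the ensemble is a uniform average, the loss is mean squared error, and there is no complexity penalty or learned mixture weighting. The names `grow_ensemble` and `mse` are hypothetical.

```python
def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def grow_ensemble(candidates, targets, max_rounds=5):
    """Greedily add the candidate that most reduces the ensemble loss."""
    members, history = [], []
    best_loss = float("inf")
    for _ in range(max_rounds):
        scored = []
        for name, preds in candidates.items():
            trial = members + [preds]
            # Uniform average of all member predictions, per example.
            averaged = [sum(column) / len(trial) for column in zip(*trial)]
            scored.append((mse(averaged, targets), name))
        loss, name = min(scored)
        if loss >= best_loss:
            break  # no candidate improves the ensemble; stop growing
        members.append(candidates[name])
        history.append((name, loss))
        best_loss = loss
    return history


targets = [1.0, 2.0, 3.0]
candidates = {
    "low": [0.0, 1.0, 2.0],
    "high": [2.0, 3.0, 4.0],
    "flat": [2.0, 2.0, 2.0],
}
history = grow_ensemble(candidates, targets)
```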
Another focus for us was on automatically discovering neural network architectures that are computationally efficient, so that they can run in environments such as mobile phones or autonomous vehicles that have tight constraints on either computational resources or on inference time. For this, we showed that combining the accuracy of a model with its inference computation time in the reward function for a reinforcement learning architecture search can find models that are highly accurate while meeting particular performance constraints. We also explored using ML to learn to automatically compress ML models to have fewer parameters and use fewer computational resources.
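One way to combine accuracy and inference time into a single reward is a multiplicative form, in which accuracy is scaled by the ratio of measured latency to a target latency raised to a small negative power. The sketch below assumes that form for illustration only; the function name `latency_aware_reward`, the exponent value and the example numbers are hypothetical and not taken from this post.

```python
def latency_aware_reward(accuracy, latency_ms, target_ms, exponent=-0.07):
    """Scale accuracy down when latency exceeds the target, up when below it."""
    return accuracy * (latency_ms / target_ms) ** exponent


# Two hypothetical candidate models with equal accuracy but different speed.
on_target = latency_aware_reward(accuracy=0.75, latency_ms=80.0, target_ms=80.0)
too_slow = latency_aware_reward(accuracy=0.75, latency_ms=160.0, target_ms=80.0)
```

Because the penalty is smooth rather than a hard cutoff, the search can still explore models slightly over the latency budget if they are much more accurate.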

TPUs
Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.
An individual TPU v3 device (left) and a portion of a TPU v3 Pod (right). TPU v3 is the latest generation of Google's Tensor Processing Unit (TPU) hardware. Available to external customers as Cloud TPU v3, these systems are liquid-cooled for maximum performance (computer chips + liquid = exciting!), and a full TPU v3 Pod can apply more than 100 petaflops of computational power to the world's largest ML problems.
Open Source Software and Datasets
Releasing open source software and the creation of new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.

We are happy that TensorFlow has the strongest GitHub user retention of the top machine learning and deep learning frameworks. The TensorFlow team is also working to address GitHub issues faster and provide a smooth path for external contributors. In research, we continue to power much of the world's machine learning and deep learning research on a published paper basis according to Google Scholar data. TensorFlow Lite is now on more than 1.5B devices globally after being available for just one year. Additionally, TensorFlow.js is the number one ML framework for JavaScript; in the nine months since launch, it had over 2M Content Delivery Network (CDN) hits, 250K downloads and more than 10,000 stars on GitHub.

In addition to continued work on existing open source ecosystems, in 2018 we introduced a new framework for flexible and reproducible reinforcement learning and new visualization tools to rapidly understand the characteristics of a dataset (without needing to write any code). We added a high-level library for expressing machine learning problems that involve learning-to-rank (the process of ordering a list of items in a way that maximizes the utility of the entire list, applicable across domains that include search engines, recommender systems, machine translation, dialogue systems and even computational biology), released a framework for fast and flexible AutoML solutions with learning guarantees, released a library for doing in-browser real-time t-SNE visualizations using TensorFlow.js, and added FHIR tools and software for working with electronic healthcare data (discussed in the healthcare section of this post).
Real-time evolution of the tSNE embedding for the complete MNIST dataset. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.
Public datasets are often a great source of inspiration that leads to great progress across many fields, since they give the broader community both access to interesting data and problems and a healthy competitive drive to achieve better results on a variety of tasks. This year we were happy to release Google Dataset Search, a new tool for finding public datasets from across the web. Over the years we have also curated and released many new datasets, including everything from millions of general annotated images or videos, to a crowd-sourced Bengali dataset for speech recognition, to robot arm grasping datasets and more. In 2018, we added even more datasets to that list.
Pictures from India & Singapore added to Open Images Extended using the Crowdsource app.
We released Open Images V4, a dataset containing 15.4M bounding-boxes for 600 categories on 1.9M images, as well as 30.1M human-verified image-level labels from 19,794 categories. We also extended this dataset to add more diversity of people and scenes from all over the world, by adding 5.5M generated annotations provided by tens of thousands of users from all over the world using crowdsource.google.com. We released the Atomic Visual Actions (AVA) dataset that provides audiovisual annotations of video for improving the state of the art in understanding human actions and speech in video. We also announced an updated YouTube-8M, and the 2nd YouTube-8M Large-Scale Video Understanding Challenge and Workshop. The HDR+ Burst Photography Dataset aims to enable a wide variety of research in the field of computational photography, and Google-Landmarks was a new dataset and challenge for landmark recognition. And while not a dataset release, we explored techniques that can enable faster creation of visual datasets using Fluid Annotation, an exploratory ML-powered interface for faster image annotation.
Visualization of the fluid annotation interface in action on an image from the COCO dataset. Image credit: gamene, original image.
From time to time, we also help establish new kinds of challenges for the research community, so that we can all work together on solving difficult research problems. Often these are done with the release of a new dataset, but not always. This year, we established several new challenges: the Inclusive Images Challenge, to work towards making more robust models that are free from many kinds of biases; the iNaturalist 2018 Challenge, which aims to enable computers' fine-grained discrimination of visual categories (such as species of plants in an image); a Kaggle "Quick, Draw!" Doodle Recognition Challenge to create a better classifier for the QuickDraw challenge game; and Conceptual Captions, a larger-scale image captioning dataset and challenge aimed at enabling better image captioning model research.

Robotics
In 2018, we made significant progress towards our goal of understanding how ML can teach robots how to act in the world, achieving a new milestone in the ability to teach robots to grasp novel objects (best systems paper at CoRL'18), and using it to learn about objects without human supervision. We've also made progress in learning robot motion by combining ML and sampling-based methods (best paper in service robotics at ICRA'18) and learning robot geometry for faster planning. We've made great strides in our ability to better perceive the structure of the world from autonomous observation. For the first time, we've been able to successfully train deep reinforcement learning models online on real robots, and we are finding new, theoretically grounded ways to learn stable approaches to robot control.
Applications of AI to Other Fields
In 2018, we have applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.

Our Nature Methods paper on high-precision automated reconstruction of neurons proposed a new model that improves the accuracy of automated interpretation of connectomics data by an order of magnitude over previous deep learning techniques.
Our algorithm in action as it traces a single neurite in 3d in a songbird brain.
Some other examples of applying ML to science include:
A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.
Health
For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.

At the end of 2016, we published work showing that a model trained to assess retinal fundus images for signs of diabetic retinopathy was able to perform on-par to slightly-better than U.S. medical-board-certified ophthalmologists at this task in a retrospective study. In 2018, we were able to show that by having the training images labeled by retinal specialists and by using an adjudicated protocol (where multiple retinal specialists convene and have to arrive at a single collective assessment for each fundus image), we could arrive at a model that is on-par with retinal specialists. Later, we published an evaluation that showed how pairing ophthalmologists with this ML model allows them to make more accurate decisions than either alone. We have deployed this diabetic retinopathy detection system in partnership with our Alphabet colleagues at Verily at over 10 sites including Aravind Eye Hospitals in India and at Rajavithi Hospital affiliated with the Ministry of Health in Thailand.
On the left is a retinal fundus image graded as having moderate DR ("Mo") by an adjudication panel of ophthalmologists (ground truth). On the top right is an illustration of the predicted scores ("N" = no DR, "Mi" = Mild DR, "Mo" = Moderate DR) from the model. On the bottom right is the set of scores given by physicians without assistance ("Unassisted") and those who saw the model's predictions ("Grades Only").
In work that medical and eye specialists found quite remarkable, we also published research on a machine learning model that can assess cardiovascular risk from retinal images. This shows early promising signs for a novel, non-invasive biomarker that can help clinicians better understand the health of their patients.

We have also continued our focus on pathology this year, showing how to improve the grading of prostate cancer using ML and detect metastatic breast cancer with deep learning, and developing a prototype for an augmented-reality microscope that can aid pathologists and other scientists by overlaying visual information derived from computer vision models into the visual field of the microscopist in real time.

For the past four years, we have had a significant research effort around using deep learning on electronic health records to make clinically-relevant predictions. In 2018, in collaboration with University of Chicago Medicine, UCSF and Stanford Medicine, we published work in Nature Digital Medicine showing how ML models applied to de-identified electronic medical records can make significantly higher accuracy predictions for a variety of clinically relevant tasks than the current clinical best practice. As part of this work, we developed tools to make it significantly easier to create these models even on quite different tasks and quite different underlying EHR data sets. We have open sourced software related to the Fast Healthcare Interoperability Resources (FHIR) standard that we developed in this work to help make working with medical data easier and more standardized (see this GitHub repository). We also improved the accuracy, speed and utility of our deep learning-based variant caller, DeepVariant. The team has forged ahead with partners and recently published the peer-reviewed paper in Nature Biotechnology.

When applying ML to historically collected data, it's important to understand the populations that have experienced human and structural biases in the past and how those biases have been codified in the data. Machine learning offers an opportunity to detect and address bias and to proactively advance health equity, which we are designing our systems to do.

Research Outreach
We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, as well as providing multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each of the fellowship recipients is assigned one or more Google researchers as a mentor, and we bring together all the fellows for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other PhD Fellows from around the world.
Complementing this fellowship program is the Google AI Residency, a way of allowing people who want to learn to conduct deep learning research to spend a year working alongside and being mentored by researchers at Google. Now in its third year, the program embeds residents in various teams across Google's global offices, where they pursue research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.

Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.

We believe that contributing openly to the broader research community is a critical part of supporting a healthy and productive research ecosystem. In addition to our open source and dataset releases, much of our research is published openly in top conference venues and journals, and we actively participate in the organization and sponsorship of conferences, all across the spectrum of different disciplines. For just a small sample, see our involvement at ICLR 2018, NAACL 2018, ICML 2018, CVPR 2018, NeurIPS 2018, ECCV 2018 and EMNLP 2018. Googlers also participated extensively in ASPLOS, HPCA, ICSE, IEEE Security & Privacy, OSDI, SIGCOMM, and many other conferences in 2018.

New Places, New Faces
In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.

Looking Forward to 2019
This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited by (and proud of!) the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!

Source: Google AI Blog


How we worked to make AI for everyone in 2018

Seeing music. Predicting earthquake aftershocks. Finding emojis in real life. These are just a few examples of how researchers, engineers and user-experience (UX) professionals made imaginative ideas real. They made it happen using tools and techniques developed by Google’s People + AI Research (PAIR) team in 2018.

We founded PAIR in 2017 to conduct research, create design frameworks and build new technologies that help make partnerships between humans and artificial intelligence productive, enjoyable and fair. One of our main goals is to create easy-to-use tools to visualize machine learning (ML) datasets and train ML models (the mathematical equations that represent the steps a machine will complete to make a decision) in browsers. Put simply, this means anyone with an internet connection can now use ML.

Here’s what PAIR has accomplished over the past year—and here’s how engineers and UX teams can put our resources to use in 2019 and beyond.

Creating a design library—and learning how to design for AI

In January, we launched a library of user-experience articles and case studies on Google Design. These show how Google makes decisions to balance our users’ needs for familiarity and trust with new functionality and experiences enabled by AI. The case studies go behind the scenes to show how Google teams developed user experiences for applications, like the fun mobile game Emoji Scavenger Hunt.

In these articles, practicing user-experience designers offer clear how-tos. They address challenges in designing for AI, such as balancing how to design for habits like swiping or scrolling in certain directions, and building personalized experiences for individual users. We know we don’t have all the answers, so we also seek advice from outside experts, like Paola Antonelli, Senior Curator of Architecture and Design at New York’s Museum of Modern Art (MoMA), who answered our team’s questions on how to use AI as a design material itself.

Talking about AI across disciplines

A key part of our process is partnering with domain experts in other fields. For example, this year we worked with Harvard’s Brendan Meade and the University of Connecticut’s Phoebe de Vries on a model for predicting and visualizing earthquake aftershocks. This project led to a state-of-the-art model for aftershock prediction--and, intriguingly, our analysis of the AI suggested new, unexpected directions for human researchers to investigate.

In March, we hosted our first UX symposium in Zurich, featuring external researchers and industry professionals. And in May, we held a panel at I/O, “AI for Everyone,” featuring Google engineering leaders with a spectrum of expertise, from cloud computing to climate science, to discuss fair and inclusive AI in these fields.

We’re also dedicated to translating the complicated language behind AI for everyone who uses it, even if they’re not engineers. Since June, our first PAIR writer-in-residence, tech journalist David Weinberger, has been embedded in PAIR’s Cambridge, Mass. lab. He’s explaining key AI concepts, like classification and confidence levels, and timely topics like fairness in machine learning, for non-technical audiences.

New open-source tools for engineers, UXers and beyond

Seeing Music

Using TensorFlow.js, an open-source JavaScript library created by PAIR, and other software, a group of musicians, designers, engineers and the Google Creative Lab created Seeing Music, which makes it possible to visualize subtle textures in sound.

We believe in applying deep insights to invent, and open-source, new technologies that can be used by engineers, UX professionals, and other stakeholders who may not be experts in ML.

So we started TensorFlow.js, a pure JavaScript library that extends TensorFlow into the browser. Since open-sourcing TensorFlow.js in March, we've seen a variety of applications–including a set of accessible creative tools for drawing, making music and more, designed by Google’s Creative Lab with collaborators from the accessibility community.

Creatability: Exploring ways to make creative tools more accessible for everyone

Our PAIR team also built the What-If Tool, released this fall, so professionals building ML systems don’t have to write a single line of code to answer “what if” questions such as: “What if I changed data points, how would this affect my model’s predictions? Does it perform differently for various groups–for example, historically marginalized people?” Our tool makes it possible to simply click a button to visualize and inspect alternative scenarios.
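For readers who want to see what those two “what if” questions look like underneath, here is a minimal Python sketch. It is not the What-If Tool's code or API: `toy_model`, `what_if`, `accuracy_by_group` and the four example records are all hypothetical, invented purely for illustration. The sketch edits one data point to see whether the prediction flips, then compares accuracy across two groups.

```python
from collections import defaultdict


def toy_model(example):
    """Hypothetical stand-in classifier: predict 1 when income minus debt exceeds 20."""
    return 1 if example["income"] - example["debt"] > 20 else 0


def what_if(model, example, **changes):
    """Return the model's prediction before and after editing one data point."""
    edited = {**example, **changes}  # copy the example, overriding the changed fields
    return model(example), model(edited)


def accuracy_by_group(model, examples, group_key, label_key="label"):
    """Compute accuracy separately for each value found under group_key."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for ex in examples:
        group = ex[group_key]
        total[group] += 1
        correct[group] += int(model(ex) == ex[label_key])
    return {group: correct[group] / total[group] for group in total}


# Made-up records; "group" stands in for any attribute worth slicing on.
data = [
    {"income": 50, "debt": 10, "group": "A", "label": 1},
    {"income": 30, "debt": 15, "group": "A", "label": 0},
    {"income": 45, "debt": 30, "group": "B", "label": 1},
    {"income": 60, "debt": 20, "group": "B", "label": 1},
]

# "What if I changed this data point?" Lower the debt on the third record.
before, after = what_if(toy_model, data[2], debt=10)
print(before, after)  # 0 1

# "Does it perform differently for various groups?"
per_group = accuracy_by_group(toy_model, data, "group")
print(per_group)  # {'A': 1.0, 'B': 0.5}
```

In this toy setup, the edit flips the prediction and the per-group numbers reveal a gap between groups A and B. The What-If Tool is described above as surfacing this kind of comparison without any code at all.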

Also this year, our team developed and open-sourced a new technique for helping people more easily understand the inner workings of neural networks in terms of simple, human-understandable concepts – like showing how AI can recognize images of zebras by their stripes.

In 2019, we’re excited to expand PAIR’s work further with global audiences of engineers and user-experience designers–and everyday users. For more resources, updates and information on our research, head to PAIR’s website.

Google AI Principles updates, six months in

Six months ago we announced Google’s AI Principles, which guide the ethical development and use of AI in our research and products. As a complement to the Principles, we also posted our Responsible AI Practices, a set of quarterly-updated technical recommendations and results to share with the wider AI ecosystem. Since then we’ve put in place additional initiatives and processes to ensure we live up to the Principles in practice.  

First, we want to encourage teams throughout Google to consider how and whether our AI Principles affect their projects. To that end, we’ve established several efforts:

  • Trainings based on the “Ethics in Technology Practice” project developed at the Markkula Center for Applied Ethics at Santa Clara University, with additional materials tailored to the AI Principles. The content is designed to help technical and non-technical Googlers address the multifaceted ethical issues that arise in their work. So far, more than 100 Googlers from different countries have tried out the course and in the future we plan to make it accessible for everyone across the company.
  • AI Ethics Speaker Series with external experts across different countries, regions, and professional disciplines. So far, we’ve had eight sessions with 11 speakers, covering topics from bias in natural language processing (NLP) to the use of AI in criminal justice. 
  • We added a technical module on fairness to our free Machine Learning Crash Course, which is available in 11 languages and has been used to train more than 21,000 Google employees. The fairness module, which is currently available in English with more languages coming soon, explores how bias can crop up in training data, and ways to identify and mitigate it.

Along with these efforts to engage Googlers, we’ve established a formal review structure to assess new projects, products and deals. Thoughtful decisions require a careful and nuanced consideration of how the AI Principles (which are intentionally high-level to allow flexibility as technology and circumstances evolve) should apply, how to make tradeoffs when principles come into conflict, and how to mitigate risks for a given circumstance. The review structure consists of three core groups:

  • A responsible innovation team that handles day-to-day operations and initial assessments. This group includes user researchers, social scientists, ethicists, human rights specialists, policy and privacy advisors, and legal experts on both a full- and part-time basis, which allows for diversity and inclusion of perspectives and disciplines. 
  • A group of senior experts from a range of disciplines across Alphabet who provide technological, functional, and application expertise. 
  • A council of senior executives to handle the most complex and difficult issues, including decisions that affect multiple products and technologies.

We’ve conducted more than 100 reviews so far, assessing the scale, severity, and likelihood of best- and worst-case scenarios for each product and deal. Most of these cases, like the integration of guidelines for creating inclusive machine learning in our Cloud AutoML products, have aligned with the Principles. We’ve modified some efforts, like research in visual speech recognition, to clearly outline assistive benefits as well as model limitations that minimize the potential for misuse. And in a small number of product use-cases—like a general-purpose facial recognition API—we’ve decided to hold off on offering functionality before working through important technology and policy questions.


The variety and scope of the cases considered so far are helping us build a framework for scaling this process across Google products and technologies. This framework will include the creation of an external advisory group, composed of experts from a variety of disciplines, to complement the internal governance and processes outlined above.


We’re committed to promoting thoughtful consideration of these important issues and appreciate the work of the many teams contributing to the review process, as we continue to refine our approach.