Posted by Danijar Hafner, Student Researcher, Google AI
Research into how artificial agents can improve their decisions over time is progressing rapidly via reinforcement learning (RL). For this technique, an agent observes a stream of sensory inputs (e.g. camera images) while choosing actions (e.g. motor commands), and sometimes receives a reward for achieving a specified goal. Model-free approaches to RL aim to directly predict good actions from the sensory observations, enabling DeepMind's DQN to play Atari and other agents to control robots. However, this black-box approach often requires several weeks of simulated interaction to learn through trial and error, limiting its usefulness in practice.
Model-based RL, in contrast, attempts to have agents learn how the world behaves in general. Instead of directly mapping observations to actions, this allows an agent to explicitly plan ahead, to more carefully select actions by "imagining" their long-term outcomes. Model-based approaches have achieved substantial successes, including AlphaGo, which imagines taking sequences of moves on a fictitious board with the known rules of the game. However, to leverage planning in unknown environments (such as controlling a robot given only pixels as input), the agent must learn the rules or dynamics from experience. Because such dynamics models in principle allow for higher efficiency and natural multi-task learning, creating models that are accurate enough for successful planning is a long-standing goal of RL.
To spur progress on this research challenge and in collaboration with DeepMind, we present the Deep Planning Network (PlaNet) agent, which learns a world model from image inputs only and successfully leverages it for planning. PlaNet solves a variety of image-based control tasks, competing with advanced model-free agents in terms of final performance while being 5000% more data efficient on average. We are additionally releasing the source code for the research community to build upon.
The PlaNet agent learning to solve a variety of continuous control tasks from images in 2000 attempts. Previous agents that do not learn a model of the environment often require 50 times as many attempts to reach comparable performance.
How PlaNet Works
In short, PlaNet learns a dynamics model given image inputs and efficiently plans with it to gather new experience. In contrast to previous methods that plan over images, we rely on a compact sequence of hidden or latent states. This is called a latent dynamics model: instead of directly predicting from one image to the next image, we predict the latent state forward. The image and reward at each step are then generated from the corresponding latent state. By compressing the images in this way, the agent can automatically learn more abstract representations, such as positions and velocities of objects, making it easier to predict forward without having to generate images along the way.
Learned Latent Dynamics Model: In a latent dynamics model, the information of the input images is integrated into the hidden states (green) using the encoder network (grey trapezoids). The hidden state is then projected forward in time to predict future images (blue trapezoids) and rewards (blue rectangle).
To learn an accurate latent dynamics model, we introduce:
A Recurrent State Space Model: A latent dynamics model with both deterministic and stochastic components, allowing it to predict a variety of possible futures as needed for robust planning, while remembering information over many time steps. Our experiments indicate both components to be crucial for high planning performance.
A Latent Overshooting Objective: We generalize the standard training objective for latent dynamics models to train multi-step predictions, by enforcing consistency between one-step and multi-step predictions in latent space. This yields a fast and effective objective that improves long-term predictions and is compatible with any latent sequence model.
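To make the first idea a little more concrete, here is a minimal toy sketch of a state that has both a deterministic and a stochastic part. It is not the PlaNet implementation: the function names (`rssm_step`, `imagine`), the scalar state and every constant are assumptions made up for illustration, and nothing here is learned. It only shows why the combination is useful: the deterministic part carries memory forward, while the sampled part lets repeated rollouts from the same starting point produce different futures.

```python
import math
import random


def rssm_step(h, z, action, rng):
    """One toy transition of a state with a deterministic and a stochastic part.

    h is the deterministic memory, z is the stochastic latent (both floats).
    All weights are arbitrary illustrative constants, not learned values.
    """
    # Deterministic path: carries information across many time steps.
    h_next = math.tanh(0.9 * h + 0.5 * z + 0.3 * action)
    # Stochastic path: a Gaussian whose parameters depend on h_next,
    # so rollouts from the same start can diverge.
    mean = 0.8 * h_next
    std = 0.1 + 0.2 * abs(h_next)
    z_next = rng.gauss(mean, std)
    return h_next, z_next


def imagine(h, z, actions, rng):
    """Roll the latent state forward without looking at any images."""
    trajectory = []
    for a in actions:
        h, z = rssm_step(h, z, a, rng)
        trajectory.append((h, z))
    return trajectory


rng = random.Random(0)
actions = [1.0, 1.0, -1.0, 0.5]
# Three imagined futures for the same action sequence.
futures = [imagine(0.0, 0.0, actions, rng) for _ in range(3)]
```

In this toy, the first deterministic value is identical across the three futures because it depends only on the shared start and action, while the sampled values differ from the first step onward.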
While predicting future images allows us to teach the model, encoding and decoding images (trapezoids in the figure above) requires significant computation, which would slow down planning. However, planning in the compact latent state space is fast since we only need to predict future rewards, and not images, to evaluate an action sequence. For example, the agent can imagine how the position of a ball and its distance to the goal will change for certain actions, without having to visualize the scenario. This allows us to compare 10,000 imagined action sequences with a large batch size every time the agent chooses an action. We then execute the first action of the best sequence found and replan at the next step.
Planning in Latent Space: For planning, we encode past images (gray trapezoid) into the current hidden state (green). From there, we efficiently predict future rewards for multiple action sequences. Note how the expensive image decoder (blue trapezoid) from the previous figure is gone. We then execute the first action of the best sequence found (red box).
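The plan, act and replan loop described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the planner used in PlaNet: candidate sequences are drawn uniformly at random (a plain random-shooting search), the hand-written `predict_reward` function stands in for a learned latent model, the state is a single number such as a ball position, and the sketch scores 1,000 candidates rather than 10,000 to keep it quick. All names are hypothetical.

```python
import random


def predict_reward(state, action, goal):
    """Toy stand-in for a learned latent model: returns (next_state, reward).

    A real model would be learned from images; this hand-written rule
    exists only so the planning loop has something to query.
    """
    next_state = state + action
    reward = -abs(goal - next_state)
    return next_state, reward


def plan(state, goal, rng, horizon=5, candidates=1000):
    """Score random action sequences by imagined return; return the best first action."""
    best_return, best_first_action = float("-inf"), 0.0
    for _ in range(candidates):
        sequence = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        s, total = state, 0.0
        for a in sequence:
            # Only rewards are predicted; no image is ever generated.
            s, r = predict_reward(s, a, goal)
            total += r
        if total > best_return:
            best_return, best_first_action = total, sequence[0]
    return best_first_action


rng = random.Random(0)
state, goal = 0.0, 3.0
for step in range(8):
    action = plan(state, goal, rng)
    # Execute only the first action, then replan from the new state.
    state, _ = predict_reward(state, action, goal)
```

Because the agent commits to only one action at a time, a poor candidate sequence costs at most a single step before the next round of planning can correct it.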
Compared to our preceding work on world models, PlaNet works without a policy network -- it chooses actions purely by planning, so it benefits from model improvements on the spot. For the technical details, check out our online research paper or the PDF version.
PlaNet vs. Model-Free Methods
We evaluate PlaNet on continuous control tasks. The agent is only given image observations and rewards. We consider tasks that pose a variety of different challenges:
A cartpole swing-up task, with a fixed camera, so the cart can move out of sight. The agent thus must absorb and remember information over multiple frames.
A finger spin task that requires predicting two separate objects, as well as the interactions between them.
A cheetah running task that includes contacts with the ground that are difficult to predict precisely, calling for a model that can predict multiple possible futures.
A cup task, which only provides a sparse reward signal once a ball is caught. This demands accurate predictions far into the future to plan a precise sequence of actions.
A walker task, in which a simulated robot starts off by lying on the ground, and must first learn to stand up and then walk.
PlaNet agents trained on a variety of image-based control tasks. The animation shows the input images as the agent is solving the tasks. The tasks pose different challenges: partial observability, contacts with the ground, sparse rewards for catching a ball, and controlling a challenging bipedal robot.
Our work constitutes one of the first examples where planning with a learned model outperforms model-free methods on image-based tasks. The table below compares PlaNet to the well-known A3C agent and the D4PG agent, which combines recent advances in model-free RL. The numbers for these baselines are taken from the DeepMind Control Suite. PlaNet clearly outperforms A3C on all tasks and reaches final performance close to D4PG while using 50 times less interaction with the environment on average.
One Agent for All Tasks
Additionally, we train a single PlaNet agent to solve all six tasks. The agent is randomly placed into different environments without knowing the task, so it needs to infer the task from its image observations. Without changes to the hyperparameters, the multi-task agent achieves the same mean performance as individual agents. While it learns more slowly on the cartpole tasks, it learns substantially faster and reaches a higher final performance on the challenging walker task that requires exploration.
Video predictions of the PlaNet agent trained on multiple tasks. Holdout episodes collected with the trained agent are shown above and open-loop agent hallucinations below. The agent observes the first 5 frames as context to infer the task and state and accurately predicts ahead for 50 steps given a sequence of actions.
Conclusion
Our results showcase the promise of learning dynamics models for building autonomous RL agents. We advocate for further research that focuses on learning accurate dynamics models on tasks of even higher difficulty, such as 3D environments and real-world robotics tasks. A possible ingredient for scaling up is the processing power of TPUs. We are excited about the possibilities that model-based reinforcement learning opens up, including multi-task learning, hierarchical planning and active exploration using uncertainty estimates.
Acknowledgements
This project is a collaboration with Timothy Lillicrap, Ian Fischer, Ruben Villegas, Honglak Lee, David Ha and James Davidson. We further thank everybody who commented on our paper draft and provided feedback at any point throughout the project.
I grew up in Nantong, a beautiful, small coastal city in China, where the Yangtze River flows into the East China Sea. One of my earliest childhood memories was when my parents, during the long winter nights, would entertain me by narrating stories using hand shadows as puppets.
Later, I would see my first traditional shadow puppetry performance during a Lunar New Year family trip to Wuzhen. Over the years as I made my way around the world—whether in Indonesia, Egypt, or Greece—I found a form of shadow puppetry in local cultures, beautifully combining legends and traditions, music and art, imagination and craftsmanship. And it always made me think about those childhood nights with my family, and about passing down stories, connection, joy and love.
With technology, I'm hoping to help connect people to this ancient art form. In September last year, we built an interactive installation that used AI to help people explore shadow puppetry. Though it’s an ancient art, people connected with shadow puppetry in a new way, and after the conference, we decided to bring it online so that everyone could play. So today, we’re making it available as a new AI experiment, Shadow Art.
Shadow Art is a web browser-based game that lets you experience AI and shadow puppetry in a playful way. To bring what used to be an offline experience online, we used TensorFlow.js, a TensorFlow library which makes it easy to build and train a machine learning model directly in the browser.
How does Shadow Art work?
You use your hands to form one of twelve zodiac animals from the lunar cycle in front of your laptop or phone camera, trying to match your hand to the diagram on the screen. The “shadow” of your hands on the screen then transforms into a shadow puppet animal. Sounds easy, right? Here’s the catch: we turned it into an interactive game where you have twenty seconds to form each animal. The goal is to go through the full lunar cycle as fast as possible.
The new experiment is now available in eleven language varieties, including English, Chinese, Thai, Bahasa Indonesia, Malay, Japanese, Korean, Spanish and Portuguese. In several countries around the world, our annual Lunar New Year Doodle is also celebrating the ancient storytelling art of shadow puppetry.
It’s been great to see the shadow puppets of my childhood come to life in Shadow Art. I hope you’ll have as much fun with it as we did (my personal record for the full Zodiac cycle? 2:23 mins ;)). Happy Lunar New Year to all of you!
When you listen to Google Maps driving directions in your car, get answers from your Google Home, or hear a spoken translation in Google Translate, you're using Google's speech synthesis, or text-to-speech (TTS) technology. Speech interfaces not only allow you to interact naturally and conveniently with digital devices, they're a crucial technology for making information universally accessible: TTS opens up the internet to millions of users all over the world who may not be able to read, or who have visual impairments.
Over the last few years, there’s been an explosion of new research using neural networks to simulate a human voice. These models, including many developed at Google, can generate increasingly realistic, human-like speech.
While the progress is exciting, we’re keenly aware of the risks this technology can pose if used with the intent to cause harm. Malicious actors may synthesize speech to try to fool voice authentication systems, or they may create forged audio recordings to defame public figures. Perhaps equally concerning, public awareness of "deep fakes" (audio or video clips generated by deep learning models) can be exploited to manipulate trust in media: as it becomes harder to distinguish real from tampered content, bad actors can more credibly claim that authentic data is fake.
We're taking action. When we launched the Google News Initiative last March, we committed to releasing datasets that would help advance state-of-the-art research on fake audio detection. Today, we're delivering on that promise: Google AI and Google News Initiative have partnered to create a body of synthetic speech containing thousands of phrases spoken by our deep learning TTS models. These phrases are drawn from English newspaper articles, and are spoken by 68 synthetic "voices" covering a variety of regional accents.
We're making this dataset available to all participants in the independent, externally-run 2019 ASVspoof challenge. This open challenge invites researchers all over the globe to submit countermeasures against fake (or "spoofed") speech, with the goal of making automatic speaker verification (ASV) systems more secure. By training models on both real and computer-generated speech, ASVspoof participants can develop systems that learn to distinguish between the two. The results will be announced in September at the 2019 Interspeech conference in Graz, Austria.
As we published in our AI Principles last year, we take seriously our responsibility both to engage with the external research community, and to apply strong safety practices to avoid unintended results that create risks of harm. We're also firmly committed to Google News Initiative's charter to help journalism thrive in the digital age, and our support for the ASVspoof challenge is an important step along the way.
Iowa may be heaven, but it’s a snowy one. With an average of around 33 inches of snow every year, keeping roads open and safe is an important challenge. Car accidents tend to spike during the winter months each year in Iowa, as do costly delays. And dangerous commutes can mean hazards for people and commerce alike: the state is one of the country’s largest producers of agricultural output, and much of that is moved on roads.
To improve road safety and efficiency, the Iowa Department of Transportation has teamed up with researchers at Iowa State University to use machine learning, including our TensorFlow framework, to provide insights into traffic behavior. Iowa State’s technology helps analyze the visual data gathered from stationary cameras and cameras mounted on snow plows. They also capture traffic information using radar detectors. Machine learning transforms that data into conclusions about road conditions, like identifying congestion and getting first responders to the scenes of accidents faster.
This is just one recent example of TensorFlow being used to make drivers’ lives easier across the United States. In California, snow may not be an issue, but traffic certainly is, and college students there used TensorFlow to identify potholes and dangerous road cracks in Los Angeles.
Officials in Iowa say machine learning could also be used to predict crash risks and travel speeds, and better understand drivers’ reactions or failures behind the wheel. But that doesn’t mean drivers will be off the hook. Iowa’s transportation and public safety departments constantly spread the same message: when it’s winter, slow down. Add some time onto your daily commute, and don’t use cruise control during a storm. That way, both drivers and state officials can work together to make winter travel less dreary—and a lot safer.
Posted by Alvin Rajkomar, MD and Eyal Oren, PhD, Google AI, Healthcare
In 2018 we published a paper that showed how machine learning, when applied to medical records, can predict what might happen to patients who are hospitalized: for example, how long they would need to be in the hospital and, if discharged, how likely they would be to come back unexpectedly. Predictive models of various kinds have already been deployed in hospital settings by others, and our work aims to further improve potential clinical benefit by using new models that can make predictions faster, more accurate, and more adaptable for a broader range of clinical contexts.
Any endeavor to demonstrate the promise of machine learning requires intense collaboration between engineers, doctors, and medical researchers to make sure the work benefits patients, physicians, and health systems, and that it is equitable. Google is already fortunate to partner with some of the best academic medical centers in the world and we are now expanding this work to include Intermountain Healthcare, based in Utah.
The initial collaboration will focus on understanding how Google might adapt machine learning predictions to the various Intermountain care settings, from primary care clinics to the TeleHealth critical care unit, which remotely monitors critically ill patients in surrounding hospitals. We see potential in exploring how scalable computing platforms that include predictions might assist clinical teams in providing the best possible care.
As with our previous research, we will begin with jointly testing the performance of machine learning models on historical records, following strict policies to ensure that all data privacy and security measures are followed.
We are excited to explore how scalable computing platforms that include predictions might assist clinical teams in providing the best possible care in these settings. We additionally hope to further validate that our approach to predictions can work across health systems and improve care for patients.
There's nothing quite like driving through Los Angeles on a perfectly sunny day. But for drivers, the beauty of Southern California’s great weather and scenery is ruined by one thing: traffic.
According to a report by INRIX, my hometown is the worst city in the world for traffic, with a record of 102 hours of congestion during peak hours in 2017. My classmate, Ericson Hernandez, comes from New York City, which is ranked third globally for its traffic woes. Together, we decided to use machine learning to figure out the roots of bad traffic, including elements like road damage from potholes and cracks, and make rides around our beautiful cities enjoyable again.
As Ericson and I started studying electrical engineering at Loyola Marymount University, we began to develop an interest in a relatively new topic to the engineering world: machine learning. Our professor, Dr. Lei Huang, encouraged us to pick a project that we were passionate about, and Ericson and I wanted to use technology to tackle problems in the real world—such as helping the communities around us with road development.
This summer, we looked at previous research projects on detecting road cracks, and pondered how we could improve the algorithm and apply it to Los Angeles communities. We decided to use TensorFlow, Google’s open-source machine learning platform, to train a model that could quickly identify potholes and dangerous road cracks from camera footage of L.A. roads.
Construction companies and cities could use this technology to identify which roads need fixing the most. With safer driving conditions and efficient road-work repairs, traffic in major cities could dramatically decrease, allowing for people to travel in a quick, safe and enjoyable manner.
And that way, driving through Los Angeles can be about enjoying the view, not grumbling at the traffic.
In my twelve years at Google, I've seen that big things happen when you don't shy away from big ideas—especially when you pair those ideas with emerging technology. We're trying to encourage more of that kind of thinking with the Google AI Impact Challenge, a call for organizations to use AI to help address social, humanitarian and environmental problems. Before you read on, remember this: there are only seven days left to apply to the Challenge!
Hundreds of nonprofits and research organizations have already applied, and there’s good reason for all the excitement. Recently, we collaborated with McKinsey on research to identify ways AI can drive social change. The resulting report shows that AI projects have the potential to improve all 17 of the United Nations Sustainable Development Goals: end poverty and hunger, promote good health and wellbeing for all, and several more.
According to the research, AI has the greatest potential for impact in four areas: health and hunger, education, justice, and equality and inclusion. AI can have the largest and most immediate impact through the application of computer vision, giving machines the ability to understand images and videos, and natural language processing, teaching computers to parse and understand human languages.
Computer vision can be used to improve health through better disease detection, our environment through wildlife tracking, and our education through new forms of learning for people with different learning capabilities. You’ve seen natural language processing at work in chatbots, which make the job-seeking process more efficient, or allow for better interaction between people seeking medical help and health providers.
What’s the hold up?
While AI cannot solve every problem, its potential is profound. So why isn’t every nonprofit and social entrepreneur embracing it? Three of the greatest challenges are access to talent, access to relevant data, and the capacity to deploy and sustain an AI project once it’s created. Nonprofits and their funders, the private sector and governments will need to work together to address these challenges.
To solve for talent scarcity, we need to continue to push for more education globally, especially for professionals willing to pursue AI. Private and public sector organizations may be able to open access to subsets of their data that could serve the clear public interest. Tools like Dataset Search are making it easier to discover potentially relevant datasets. Nonprofits should also look for opportunities to collect and share data most relevant to the problems they are looking to address. Finally, funders should consider how they can best support the ongoing deployment of AI projects and ensure social sector professionals have access to basic AI training.
McKinsey’s findings also show that to be successful, AI tools and techniques must be applied responsibly: clear principles must be established so that the solutions consider potential negative impacts—like the perpetuation of bias—on disadvantaged populations.
So, back to what I told you to remember: applications for the AI Impact Challenge close in seven days, on January 22 (@ 11:59:59 PST, to be exact). I’ll be part of an international panel of expert reviewers that will review all finalists and ultimately decide which ones will receive funds from our $25 million pool as well as other resources. We're excited to see what you come up with.
Posted by Jeff Dean, Senior Fellow and Google AI Lead, on behalf of the entire Google Research Community
2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.
Ethical Principles and AI
Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community through the Fairness Module in the Machine Learning Crash Course.
AI for Social Good
The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property. A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.
Assistive Technology
Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, this results in collaborations with various product teams to release the fruits of this research in various product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.
Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing quickly and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).
Quantum computing
Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.
A Bristlecone chip being installed by Research Scientist Marissa Giustina at the Quantum AI Lab in Santa Barbara.
Natural Language Understanding
Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.
BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.
In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listen combines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.
Enabling perception on resource-constrained platforms has become increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.
Computational Photography
The improvements in quality and versatility of cell phone cameras over the last few years have been nothing short of remarkable. A modest part of this is improvements in the actual physical sensors used in phones, but a much greater part of it is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together with computational software. Originally in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.
Motion photos on the Pixel 2 in Google Photos. For more examples, check out this Google Photos album.
Augmented chicken family with Motion Stills AR mode.
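The capture, align and merge idea behind burst photography can be illustrated with a deliberately tiny example. This is only a sketch under loud assumptions, not how HDR+ works internally: frames are one-dimensional lists instead of images, camera motion is a whole-sample circular shift, alignment is a brute-force search over a few shifts, and merging is a plain average. The helper names (`shift`, `align`, `merge`, `mse`) are hypothetical.

```python
import random


def shift(signal, offset):
    """Circularly shift a 1-D list to the right by an integer offset."""
    n = len(signal)
    return [signal[(i - offset) % n] for i in range(n)]


def align(frame, reference, max_offset=4):
    """Return the shifted copy of frame that best matches the reference."""
    def error(candidate):
        return sum((c - r) ** 2 for c, r in zip(candidate, reference))
    candidates = [shift(frame, k) for k in range(-max_offset, max_offset + 1)]
    return min(candidates, key=error)


def merge(frames):
    """Average aligned frames sample by sample to suppress independent noise."""
    return [sum(values) / len(values) for values in zip(*frames)]


def mse(a, b):
    """Mean squared error between two equally long lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


rng = random.Random(0)
scene = [0.0] * 10 + [1.0] * 10 + [0.0] * 12  # a step-shaped toy scene

# Simulate a burst: each frame is slightly displaced and noisy.
offsets = [rng.randint(-2, 2) for _ in range(8)]
burst = [[v + rng.gauss(0.0, 0.1) for v in shift(scene, o)] for o in offsets]

reference = burst[0]
aligned = [align(frame, reference) for frame in burst]
merged = merge(aligned)
```

Averaging only helps once the frames line up, which is why the alignment step comes first: merging unaligned frames would blur the edge of the step instead of cleaning it.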
This year, one of our primary efforts in computational photography research was to create a new capability called Night Sight, which enables Pixel phone cameras to "see in the dark", earning praise by both press and users. Of course, Night Sight is just one of the new software-enabled camera features our teams have developed to help you take the perfect photo, including using ML to provide better portrait mode shots, seeing better and further with Super Res Zoom and capturing special moments with Top Shot and Google Clips.
Performance comparison of ADAM and AMSGRAD on a synthetic example of a simple one-dimensional convex problem inspired by our examples of non-convergence. The first two plots (left and center) are for the online setting and the last one (right) is for the stochastic setting.
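For readers who want to try this kind of comparison themselves, the sketch below runs both update rules on a one-dimensional online problem. Everything specific here is an illustrative assumption rather than the setup behind the plots: the loss sequence (slope 1010 on every 101st step and slope -10 otherwise, restricted to the interval [-1, 1]), the step size, and the beta values. Bias correction is left out for brevity. The only difference between the two rules is that AMSGrad divides by the running maximum of the second-moment estimate, so its denominator never shrinks.

```python
import math


def simulate(use_amsgrad, steps=2000, lr=0.1, beta1=0.9, beta2=0.99, eps=1e-8):
    """Run Adam or AMSGrad on a 1-D online problem constrained to [-1, 1].

    Returns the iterates x_t and the second-moment value used in the
    denominator at each step.
    """
    x, m, v, v_hat = 0.0, 0.0, 0.0, 0.0
    xs, denominators = [], []
    for t in range(1, steps + 1):
        # Gradient of the assumed loss f_t: a rare large positive slope,
        # otherwise a small negative one.
        grad = 1010.0 if t % 101 == 1 else -10.0
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        # AMSGrad keeps the running maximum of v; Adam uses v directly.
        v_hat = max(v_hat, v) if use_amsgrad else v
        x -= (lr / math.sqrt(t)) * m / (math.sqrt(v_hat) + eps)
        x = min(1.0, max(-1.0, x))  # project back onto the feasible interval
        xs.append(x)
        denominators.append(v_hat)
    return xs, denominators


adam_xs, adam_v = simulate(use_amsgrad=False)
ams_xs, ams_v = simulate(use_amsgrad=True)
print("final x (Adam):   ", adam_xs[-1])
print("final x (AMSGrad):", ams_xs[-1])
```

Plotting `adam_xs` against `ams_xs` gives a rough analogue of the online-setting panels. The exact trajectories depend on the assumed hyperparameters.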
Software Systems A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.
The TF-Ranking library supports multi-item scoring architecture, an extension of traditional single-item scoring.
We also released JAX, an accelerator-backed variant of NumPy that supports automatic differentiation of Python functions to arbitrary order. While JAX is not part of TensorFlow, it leverages some of the same underlying software infrastructure (e.g. XLA), and some of its ideas and algorithms have been helpful to our TensorFlow projects. Finally, we continued our research on the security and privacy of machine learning, and our development of open source frameworks for safety and privacy in AI systems, such as CleverHans and TensorFlow Privacy.
Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.
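As a rough illustration of what "learned index" means, the sketch below replaces a tree lookup with a tiny model that predicts where a key sits in a sorted array, followed by a short search inside the model's worst-case error window. It assumes the simplest possible model, a straight line through the smallest and largest key. The class `LearnedIndex` and its methods are hypothetical names for this example, and published learned-index designs use richer models than this.

```python
import bisect


class LearnedIndex:
    """A linear model over sorted keys plus a bounded local search."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("need at least one key")
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Model: position ~ slope * key + intercept, fit through the end points.
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Record the worst prediction error so lookups know how far to search.
        self.max_error = max(
            abs(self._predict(k) - i) for i, k in enumerate(self.keys)
        )

    def _predict(self, key):
        """Predict the array position of `key`, clamped to valid indices."""
        pos = int(round(self.slope * key + self.intercept))
        return min(max(pos, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of `key` in the sorted array, or None if absent."""
        guess = self._predict(key)
        lo = max(guess - self.max_error, 0)
        hi = min(guess + self.max_error + 1, len(self.keys))
        # Binary search only inside the error window around the prediction.
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


index = LearnedIndex([3, 8, 15, 16, 23, 42, 108])
position = index.lookup(23)   # found inside the error window
missing = index.lookup(4)     # not a stored key
```

The better the model fits the key distribution, the smaller `max_error` becomes and the less searching each lookup needs, which is the trade-off that makes the idea interesting for storage systems.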
The Hierarchical Planner's placement of an NMT (4-layer) model. White denotes CPU and the four colors each represent one of the GPUs. Note that every step of every layer is allocated across multiple GPUs. This placement is 53.7% faster than that generated by a human expert.
In 2018 we learned about Spectre and Meltdown, new classes of serious security vulnerabilities in modern computer processors, thanks to Google's Project Zero team in collaboration with others. These and related vulnerabilities will keep computer architecture researchers quite busy. In our continuing efforts to model CPU behavior, our Compiler Research team integrated their tool for measuring machine instruction latency and port pressure into LLVM, making possible better compilation decisions.
Running a large-scale web service, such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guarantees on the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After we made an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.
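The core idea of capping each server's load on a hash ring can be sketched in a few lines. This is a simplified illustration and not the deployed system or the haproxy implementation: the function name `assign_with_bounded_loads`, the slack parameter `c`, and the capacity rule `ceil(c * keys / servers)` are assumptions made for this example, and the sketch handles a fixed batch of keys rather than servers and keys arriving and leaving over time.

```python
import bisect
import hashlib
import math


def _ring_position(value):
    """Map a string to a point on the ring (a 64-bit integer)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign_with_bounded_loads(keys, servers, c=1.25):
    """Assign each key to a server while capping every server's load.

    Each server holds at most ceil(c * len(keys) / len(servers)) keys.
    A key starts at its hash position and walks clockwise around the ring
    until it finds a server that still has spare capacity.
    """
    if not servers:
        raise ValueError("need at least one server")
    if c < 1:
        raise ValueError("c must be at least 1 so that every key fits")
    capacity = math.ceil(c * len(keys) / len(servers))
    ring = sorted((_ring_position(s), s) for s in servers)
    points = [point for point, _ in ring]
    load = {s: 0 for s in servers}
    assignment = {}
    for key in keys:
        start = bisect.bisect_left(points, _ring_position(key)) % len(ring)
        for step in range(len(ring)):
            server = ring[(start + step) % len(ring)][1]
            if load[server] < capacity:
                load[server] += 1
                assignment[key] = server
                break
    return assignment, capacity


keys = [f"video-{i}" for i in range(1000)]
servers = [f"cache-{i}" for i in range(8)]
assignment, capacity = assign_with_bounded_loads(keys, servers, c=1.25)
```

Because the total capacity is at least the number of keys, every key finds a home, and no server ever exceeds the cap. Plain consistent hashing corresponds to removing the capacity check, which keeps assignments stable but lets unlucky servers become hot spots.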
TPUs Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.
An individual TPU v3 device (left) and a portion of a TPU v3 Pod (right). TPU v3 is the latest generation of Google's Tensor Processing Unit (TPU) hardware. Available to external customers as Cloud TPU v3, these systems are liquid-cooled for maximum performance (computer chips + liquid = exciting!), and a full TPU v3 Pod can apply more than 100 petaflops of computational power to the world's largest ML problems.
Open Source Software and Datasets Releasing open source software and creating new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.
Real-time evolution of the tSNE embedding for the complete MNIST dataset. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.
Public datasets are often a great source of inspiration that leads to great progress across many fields, since they give the broader community both access to interesting data and problems and a healthy competitive drive to achieve better results on a variety of tasks. This year we were happy to release Google Dataset Search, a new tool for finding public datasets from across the web. Over the years we have also curated and released many new datasets, including everything from millions of general annotated images or videos, to a crowd-sourced Bengali dataset for speech recognition, to robot arm grasping datasets and more. In 2018, we added even more datasets to that list.
Visualization of the fluid annotation interface in action on an image from the COCO dataset. Image credit: gamene, original image.
From time to time, we also help establish new kinds of challenges for the research community, so that we can all work together on solving difficult research problems. Often these are done with the release of a new dataset, but not always. This year, we established new challenges around the Inclusive Images Challenge, to work towards making more robust models that are free from many kinds of biases, the iNaturalist 2018 Challenge, which aims to enable computers' fine-grained discrimination of visual categories (such as species of plants in an image), a Kaggle "Quick, Draw!" Doodle Recognition Challenge to create a better classifier for the QuickDraw challenge game, and Conceptual Captions, a larger-scale image captioning dataset and challenge aimed at enabling better image captioning model research.
Applications of AI to Other Fields In 2018, we applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.
A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.
Health For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.
On the left is a retinal fundus image graded as having moderate DR ("Mo") by an adjudication panel of ophthalmologists (ground truth). On the top right is an illustration of the predicted scores ("N" = no DR, "Mi" = Mild DR, "Mo" = Moderate DR) from the model. On the bottom right is the set of scores given by physicians without assistance ("Unassisted") and those who saw the model's predictions ("Grades Only").
When applying ML to historically-collected data, it's important to understand the populations that have experienced human and structural biases in the past and how those biases have been codified in the data. Machine learning offers an opportunity to detect and address bias and to proactively advance health equity, which we are designing our systems to do.
Research Outreach We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, as well as providing multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each of the fellowship recipients is assigned one or more Google researchers as a mentor, and we bring together all the fellows for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other Ph.D. Fellows from around the world. Complementing this fellowship program is the Google AI Residency, a way of allowing people who want to learn to conduct deep learning research to spend a year working alongside and being mentored by researchers at Google. Now in its third year, the program embeds residents in various teams across Google's global offices, where they pursue research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.
Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.
New Places, New Faces In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.
Looking Forward to 2019 This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited by (and proud of!) the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!
Seeing music. Predicting earthquake aftershocks. Finding emojis in real life. These are just a few examples of how researchers, engineers and user-experience (UX) professionals made imaginative ideas real. They made it happen using tools and techniques developed by Google’s People + AI Research (PAIR) team in 2018.
Here’s what PAIR has accomplished over the past year—and here’s how engineers and UX teams can put our resources to use in 2019 and beyond.
Creating a design library—and learning how to design for AI
In January, we launched a library of user-experience articles and case studies on Google Design. These show how Google makes decisions to balance our users’ needs for familiarity and trust with new functionality and experiences enabled by AI. The case studies go behind the scenes to show how Google teams developed user experiences for applications, like the fun mobile game Emoji Scavenger Hunt.
In these articles, practicing user-experience designers offer clear how-tos. They address challenges in designing for AI, such as balancing how to design for habits like swiping or scrolling in certain directions, and building personalized experiences for individual users. We know we don’t have all the answers, so we also seek advice from outside experts, like Paola Antonelli, Senior Curator of Architecture and Design at New York’s Museum of Modern Art (MoMA), who answered our team’s questions on how to use AI as a design material itself.
Talking about AI across disciplines
A key part of our process is partnering with domain experts in other fields. For example, this year we worked with Harvard’s Brendan Meade and the University of Connecticut’s Phoebe de Vries on a model for predicting and visualizing earthquake aftershocks. This project led to a state-of-the-art model for aftershock prediction and, intriguingly, our analysis of the AI suggested new, unexpected directions for human researchers to investigate.
In March, we hosted our first UX symposium in Zurich, featuring external researchers and industry professionals. And in May, we held a panel at I/O, “AI for Everyone,” featuring Google engineering leaders with a spectrum of expertise, from cloud computing to climate science, to discuss fair and inclusive AI in these fields.
We’re also dedicated to translating the complicated language behind AI for everyone who uses it, even if they’re not engineers. Since June, our first PAIR writer-in-residence, tech journalist David Weinberger, has been embedded in PAIR’s Cambridge, Mass. lab. He’s explaining key AI concepts, like classification and confidence levels, and timely topics like fairness in machine learning, for non-technical audiences.
New open-source tools for engineers, UXers and beyond
We believe in applying deep insights to invent, and open-source, new technologies that can be used by engineers, UX professionals, and other stakeholders who may not be experts in ML.
Our PAIR team also built the What-If Tool, released this fall, so professionals building ML systems don’t have to write a single line of code to answer “what if” questions such as: “What if I changed data points, how would this affect my model’s predictions? Does it perform differently for various groups–for example, historically marginalized people?" Our tool makes it possible to simply click a button to visualize and inspect alternative scenarios.
Also this year, our team developed and open-sourced a new technique for helping people more easily understand the inner workings of neural networks in terms of simple, human-understandable concepts – like showing how AI can recognize images of zebras by their stripes.
In 2019, we’re excited to expand PAIR’s work further with global audiences of engineers and user-experience designers–and everyday users. For more resources, updates and information on our research, head to PAIR’s website.
Six months ago we announced Google’s AI Principles, which guide the ethical development and use of AI in our research and products. As a complement to the Principles, we also posted our Responsible AI Practices, a set of quarterly-updated technical recommendations and results to share with the wider AI ecosystem. Since then we’ve put in place additional initiatives and processes to ensure we live up to the Principles in practice.
First, we want to encourage teams throughout Google to consider how and whether our AI Principles affect their projects. To that end, we’ve established several efforts:
Trainings based on the “Ethics in Technology Practice” project developed at the Markkula Center for Applied Ethics at Santa Clara University, with additional materials tailored to the AI Principles. The content is designed to help technical and non-technical Googlers address the multifaceted ethical issues that arise in their work. So far, more than 100 Googlers from different countries have tried out the course and in the future we plan to make it accessible for everyone across the company.
AI Ethics Speaker Series with external experts across different countries, regions, and professional disciplines. So far, we’ve had eight sessions with 11 speakers, covering topics from bias in natural language processing (NLP) to the use of AI in criminal justice.
We added a technical module on fairness to our free Machine Learning Crash Course, which is available in 11 languages and has been used to train more than 21,000 Google employees. The fairness module, which is currently available in English with more languages coming soon, explores how bias can crop up in training data, and ways to identify and mitigate it.
Along with these efforts to engage Googlers, we’ve established a formal review structure to assess new projects, products and deals. Thoughtful decisions require a careful and nuanced consideration of how the AI Principles (which are intentionally high-level to allow flexibility as technology and circumstances evolve) should apply, how to make tradeoffs when principles come into conflict, and how to mitigate risks for a given circumstance. The review structure consists of three core groups:
A responsible innovation team that handles day-to-day operations and initial assessments. This group includes user researchers, social scientists, ethicists, human rights specialists, policy and privacy advisors, and legal experts on both a full- and part-time basis, which allows for diversity and inclusion of perspectives and disciplines.
A group of senior experts from a range of disciplines across Alphabet who provide technological, functional, and application expertise.
A council of senior executives to handle the most complex and difficult issues, including decisions that affect multiple products and technologies.
We’ve conducted more than 100 reviews so far, assessing the scale, severity, and likelihood of best- and worst-case scenarios for each product and deal. Most of these cases, like the integration of guidelines for creating inclusive machine learning in our Cloud AutoML products, have aligned with the Principles. We’ve modified some efforts, like research in visual speech recognition, to clearly outline assistive benefits as well as model limitations that minimize the potential for misuse. And in a small number of product use-cases—like a general-purpose facial recognition API—we’ve decided to hold off on offering functionality before working through important technology and policy questions.
The variety and scope of the cases considered so far are helping us build a framework for scaling this process across Google products and technologies. This framework will include the creation of an external advisory group, comprised of experts from a variety of disciplines, to complement the internal governance and processes outlined above.
We’re committed to promoting thoughtful consideration of these important issues and appreciate the work of the many teams contributing to the review process, as we continue to refine our approach.