Tag Archives: Year in Review

Google Research: Themes from 2021 and Beyond

Posted by Jeff Dean, Senior Fellow and SVP of Google Research, on behalf of the entire Google Research community

Over the last several decades, I've witnessed a lot of change in the fields of machine learning (ML) and computer science. Early approaches, which often fell short, eventually gave rise to modern approaches that have been very successful. Following that long-arc pattern of progress, I think we'll see a number of exciting advances over the next several years, advances that will ultimately benefit the lives of billions of people with greater impact than ever before. In this post, I’ll highlight five areas where ML is poised to have such impact. For each, I’ll discuss related research (mostly from 2021) and the directions and progress we’ll likely see in the next few years.

	· Trend 1: More Capable, General-Purpose ML Models
	· Trend 2: Continued Efficiency Improvements for ML
	· Trend 3: ML Is Becoming More Personally and Communally Beneficial
	· Trend 4: Growing Benefits of ML in Science, Health and Sustainability
	· Trend 5: Deeper and Broader Understanding of ML

Trend 1: More Capable, General-Purpose ML Models
Researchers are training larger, more capable machine learning models than ever before. For example, just in the last couple of years models in the language domain have grown from billions of parameters trained on tens of billions of tokens of data (e.g., the 11B parameter T5 model), to hundreds of billions or trillions of parameters trained on trillions of tokens of data (e.g., dense models such as OpenAI’s 175B parameter GPT-3 model and DeepMind’s 280B parameter Gopher model, and sparse models such as Google’s 600B parameter GShard model and 1.2T parameter GLaM model). These increases in dataset and model size have led to significant increases in accuracy for a wide variety of language tasks, as shown by across-the-board improvements on standard natural language processing (NLP) benchmark tasks (as predicted by work on neural scaling laws for language models and machine translation models).

Many of these advanced models are focused on the single but important modality of written language and have shown state-of-the-art results in language understanding benchmarks and open-ended conversational abilities, even across multiple tasks in a domain. They have also shown exciting capabilities to generalize to new language tasks with relatively little training data, in some cases, with few to no training examples for a new task. A couple of examples include improved long-form question answering, zero-label learning in NLP, and our LaMDA model, which demonstrates a sophisticated ability to carry on open-ended conversations that maintain significant context across multiple turns of dialog.

A dialog with LaMDA mimicking a Weddell seal with the preset grounding prompt, “Hi I’m a weddell seal. Do you have any questions for me?” The model largely holds down a dialog in character.
(Weddell Seal image cropped from Wikimedia CC licensed image.)

Transformer models are also having a major impact in image, video, and speech models, all of which also benefit significantly from scale, as predicted by work on scaling laws for visual transformer models. Transformers for image recognition and for video classification are achieving state-of-the-art results on many benchmarks, and we’ve also demonstrated that co-training models on both image data and video data can improve performance on video tasks compared with video data alone. We’ve developed sparse, axial attention mechanisms for image and video transformers that use computation more efficiently, found better ways of tokenizing images for visual transformer models, and improved our understanding of visual transformer methods by examining how they operate compared with convolutional neural networks. Combining transformer models with convolutional operations has shown significant benefits in visual as well as speech recognition tasks.

The outputs of generative models are also substantially improving. This is most apparent in generative models for images, which have made significant strides over the last few years. For example, recent models have demonstrated the ability to create realistic images given just a category (e.g., "irish setter" or "streetcar", if you desire), can "fill in" a low-resolution image to create a natural-looking high-resolution counterpart ("computer, enhance!"), and can even create natural-looking aerial nature scenes of arbitrary length. As another example, images can be converted to a sequence of discrete tokens that can then be synthesized at high fidelity with an autoregressive generative model.

Example of a cascade diffusion models that generate novel images from a given category and then use those as the seed to create high-resolution examples: the first model generates a low resolution image, and the rest perform upsampling to the final high resolution image.

The SR3 super-resolution diffusion model takes as input a low-resolution image, and builds a corresponding high resolution image from pure noise.

Because these are powerful capabilities that come with great responsibility, we carefully vet potential applications of these sorts of models against our AI Principles.

Beyond advanced single-modality models, we are also starting to see large-scale multi-modal models. These are some of the most advanced models to date because they can accept multiple different input modalities (e.g., language, images, speech, video) and, in some cases, produce different output modalities, for example, generating images from descriptive sentences or paragraphs, or describing the visual content of images in human languages. This is an exciting direction because like the real world, some things are easier to learn in data that is multimodal (e.g., reading about something and seeing a demonstration is more useful than just reading about it). As such, pairing images and text can help with multi-lingual retrieval tasks, and better understanding of how to pair text and image inputs can yield improved results for image captioning tasks. Similarly, jointly training on visual and textual data can also help improve accuracy and robustness on visual classification tasks, while co-training on image, video, and audio tasks improves generalization performance for all modalities. There are also tantalizing hints that natural language can be used as an input for image manipulation, telling robots how to interact with the world and controlling other software systems, portending potential changes to how user interfaces are developed. Modalities handled by these models will include speech, sounds, images, video, and languages, and may even extend to structured data, knowledge graphs, and time series data.

Example of a vision-based robotic manipulation system that is able to generalize to novel tasks. Left: The robot is performing a task described in natural language to the robot as “place grapes in ceramic bowl”, without the model being trained on that specific task. Right: As on the left, but with the novel task description of “place bottle in tray”.

Often these models are trained using self-supervised learning approaches, where the model learns from observations of “raw” data that has not been curated or labeled, e.g., language models used in GPT-3 and GLaM, the self-supervised speech model BigSSL, the visual contrastive learning model SimCLR, and the multimodal contrastive model VATT. Self-supervised learning allows a large speech recognition model to match the previous Voice Search automatic speech recognition (ASR) benchmark accuracy while using only 3% of the annotated training data. These trends are exciting because they can substantially reduce the effort required to enable ML for a particular task, and because they make it easier (though by no means trivial) to train models on more representative data that better reflects different subpopulations, regions, languages, or other important dimensions of representation.

All of these trends are pointing in the direction of training highly capable general-purpose models that can handle multiple modalities of data and solve thousands or millions of tasks. By building in sparsity, so that the only parts of a model that are activated for a given task are those that have been optimized for it, these multimodal models can be made highly efficient. Over the next few years, we are pursuing this vision in a next-generation architecture and umbrella effort called Pathways. We expect to see substantial progress in this area, as we combine together many ideas that to date have been pursued relatively independently.

Pathways: a depiction of a single model we are working towards that can generalize across millions of tasks.

Top

Trend 2: Continued Efficiency Improvements for ML
Improvements in efficiency — arising from advances in computer hardware design as well as ML algorithms and meta-learning research — are driving greater capabilities in ML models. Many aspects of the ML pipeline, from the hardware on which a model is trained and executed to individual components of the ML architecture, can be optimized for efficiency while maintaining or improving on state-of-the-art performance overall. Each of these different threads can improve efficiency by a significant multiplicative factor, and taken together, can reduce computational costs, including CO₂ equivalent emissions (CO2e), by orders of magnitude compared to just a few years ago. This greater efficiency has enabled a number of critical advances that will continue to dramatically improve the efficiency of machine learning, enabling larger, higher quality ML models to be developed cost effectively and further democratizing access. I’m very excited about these directions of research!

Continued Improvements in ML Accelerator Performance

Each generation of ML accelerator improves on previous generations, enabling faster performance per chip, and often increasing the scale of the overall systems. Last year, we announced our TPUv4 systems, the fourth generation of Google’s Tensor Processing Unit, which demonstrated a 2.7x improvement over comparable TPUv3 results in the MLPerf benchmarks. Each TPUv4 chip has ~2x the peak performance per chip versus the TPUv3 chip, and the scale of each TPUv4 pod is 4096 chips (4x that of TPUv3 pods), yielding a performance of approximately 1.1 exaflops per pod (versus ~100 petaflops per TPUv3 pod). Having pods with larger numbers of chips that are connected together with high speed networks improves efficiency for larger models.
ML capabilities on mobile devices are also increasing significantly. The Pixel 6 phone features a brand new Google Tensor processor that integrates a powerful ML accelerator to better support important on-device features.
Left: TPUv4 board; Center: Part of a TPUv4 pod; Right: Google Tensor chip found in Pixel 6 phones.
Our use of ML to accelerate the design of computer chips of all kinds (more on this below) is also paying dividends, particularly to produce better ML accelerators.

Continued Improvements in ML Compilation and Optimization of ML Workloads

Even when the hardware is unchanged, improvements in compilers and other optimizations in system software for machine learning accelerators can lead to significant improvements in efficiency. For example, “A Flexible Approach to Autotuning Multi-pass Machine Learning Compilers” shows how to use machine learning to perform auto-tuning of compilation settings to get across-the-board performance improvements of 5-15% (and sometimes as much as 2.4x improvement) for a suite of ML programs on the same underlying hardware. GSPMD describes an automatic parallelization system based on the XLA compiler that is capable of scaling most deep learning network architectures beyond the memory capacity of an accelerator and has been applied to many large models, such as GShard-M4, LaMDA, BigSSL, ViT, MetNet-2, and GLaM, leading to state-of-the-art results across several domains.
End-to-end model speedups from using ML-based compiler autotuning on 150 ML models. Included are models that achieve improvements of 5% or more. Bar colors represent relative improvement from optimizing different model components.

Human-Creativity–Driven Discovery of More Efficient Model Architectures

Continued improvements in model architectures give substantial reductions in the amount of computation needed to achieve a given level of accuracy for many problems. For example, the Transformer architecture, which we developed in 2017, was able to improve the state of the art on several NLP and translation benchmarks while simultaneously using 10x to 100x less computation to achieve these results than a variety of other prevalent methods, such as LSTMs and other recurrent architectures. Similarly, the Vision Transformer was able to show improved state-of-the-art results on a number of different image classification tasks despite using 4x to 10x less computation than convolutional neural networks.

Machine-Driven Discovery of More Efficient Model Architectures

Neural architecture search (NAS) can automatically discover new ML architectures that are more efficient for a given problem domain. A primary advantage of NAS is that it can greatly reduce the effort needed for algorithm development, because NAS requires only a one-time effort per search space and problem domain combination. In addition, while the initial effort to perform NAS can be computationally expensive, the resulting models can greatly reduce computation in downstream research and production settings, resulting in greatly reduced resource requirements overall. For example, the one-time search to discover the Evolved Transformer generated only 3.2 tons of CO2e (much less than the 284t CO₂e reported elsewhere; see Appendix C and D in this joint Google/UC Berkeley preprint), but yielded a model for use by anyone in the NLP community that is 15-20% more efficient than the plain Transformer model. A more recent use of NAS discovered an even more efficient architecture called Primer (that has also been open-sourced), which reduces training costs by 4x compared to a plain Transformer model. In this way, the discovery costs of NAS searches are often recouped from the use of the more-efficient model architectures that are discovered, even if they are applied to only a handful of downstream uses (and many NAS results are reused thousands of times).
The Primer architecture discovered by NAS is 4x as efficient compared with a plain Transformer model. This image shows (in red) the two main modifications that give Primer most of its gains: depthwise convolution added to attention multi-head projections and squared ReLU activations (blue indicates portions of the original Transformer).
NAS has also been used to discover more efficient models in the vision domain. The EfficientNetV2 model architecture is the result of a neural architecture search that jointly optimizes for model accuracy, model size, and training speed. On the ImageNet benchmark, EfficientNetV2 improves training speed by 5–11x while substantially reducing model size over previous state-of-the-art models. The CoAtNet model architecture was created with an architecture search that uses ideas from the Vision Transformer and convolutional networks to create a hybrid model architecture that trains 4x faster than the Vision Transformer and achieves a new ImageNet state of the art.
EfficientNetV2 achieves much better training efficiency than prior models for ImageNet classification.
The broad use of search to help improve ML model architectures and algorithms, including the use of reinforcement learning and evolutionary techniques, has inspired other researchers to apply this approach to different domains. To aid others in creating their own model searches, we have open-sourced Model Search, a platform that enables others to explore model search for their domains of interest. In addition to model architectures, automated search can also be used to find new, more efficient reinforcement learning algorithms, building on the earlier AutoML-Zero work that demonstrated this approach for automating supervised learning algorithm discovery.

Use of Sparsity

Sparsity, where a model has a very large capacity, but only some parts of the model are activated for a given task, example or token, is another important algorithmic advance that can greatly improve efficiency. In 2017, we introduced the sparsely-gated mixture-of-experts layer, which demonstrated better results on a variety of translation benchmarks while using 10x less computation than previous state-of-the-art dense LSTM models. More recently, Switch Transformers, which pair a mixture-of-experts–style architecture with the Transformer model architecture, demonstrated a 7x speedup in training time and efficiency over the dense T5-Base Transformer model. The GLaM model showed that transformers and mixture-of-expert–style layers can be combined to produce a model that exceeds the accuracy of the GPT-3 model on average across 29 benchmarks using 3x less energy for training and 2x less computation for inference. The notion of sparsity can also be applied to reduce the cost of the attention mechanism in the core Transformer architecture.
The BigBird sparse attention model consists of global tokens that attend to all parts of an input sequence, local tokens, and a set of random tokens. Theoretically, this can be interpreted as adding a few global tokens on a Watts-Strogatz graph.
The use of sparsity in models is clearly an approach with very high potential payoff in terms of computational efficiency, and we are only scratching the surface in terms of research ideas to be tried in this direction.
Each of these approaches for improved efficiency can be combined together so that equivalent-accuracy language models trained today in efficient data centers are ~100 times more energy efficient and produce ~650 times less CO₂e emissions, compared to a baseline Transformer model trained using P100 GPUs in an average U.S. datacenter using an average U.S. energy mix. And this doesn’t even account for Google’s carbon-neutral, 100% renewable energy offsets. We’ll have a more detailed blog post analyzing the carbon emissions trends of NLP models soon.

Top

Trend 3: ML Is Becoming More Personally and Communally Beneficial
A host of new experiences are made possible as innovation in ML and silicon hardware (like the Google Tensor processor on the Pixel 6) enable mobile devices to be more capable of continuously and efficiently sensing their surrounding context and environment. These advances have improved accessibility and ease of use, while also boosting computational power, which is critical for popular features like mobile photography, live translation and more. Remarkably, recent technological advances also provide users with a more customized experience while strengthening privacy safeguards.

More people than ever rely on their phone cameras to record their daily lives and for artistic expression. The clever application of ML to computational photography has continued to advance the capabilities of phone cameras, making them easier to use, improving performance, and resulting in higher-quality images. Advances, such as improved HDR+, the ability to take pictures in very low light, better handling of portraits, and efforts to make cameras more inclusive so they work for all skin tones, yield better photos that are more true to the photographer’s vision and to their subjects. Such photos can be further improved using the powerful ML-based tools now available in Google Photos, like cinematic photos, noise and blur reduction, and the Magic Eraser.

HDR+ starts from a burst of full-resolution raw images, each underexposed by the same amount (left). The merged image has reduced noise and increased dynamic range, leading to a higher quality final result (right).

In addition to using their phones for creative expression, many people rely on them to help communicate with others across languages and modalities in real-time using Live Translate in messaging apps and Live Caption for phone calls. Speech recognition accuracy has continued to make substantial improvements thanks to techniques like self-supervised learning and noisy student training, with marked improvements for accented speech, noisy conditions or environments with overlapping speech, and across many languages. Building on advances in text-to-speech synthesis, people can listen to web pages and articles using our Read Aloud technology on a growing number of platforms, making information more available across barriers of modality and languages. Live speech translations in the Google Translate app have become significantly better by stabilizing the translations that are generated on-the-fly, and high quality, robust and responsible direct speech-to-speech translation provides a much better user experience in communicating with people speaking a different language. New work on combining ML with traditional codec approaches in the Lyra speech codec and the more general SoundStream audio codec enables higher fidelity speech, music, and other sounds to be communicated reliably at much lower bitrate.

Everyday interactions are becoming much more natural with features like automatic call screening and ML agents that will wait on hold for you, thanks to advances in Duplex. Even short tasks that users may perform frequently have been improved with tools such as Smart Text Selection, which automatically selects entities like phone numbers or addresses for easy copy and pasting, and grammar correction as you type on Pixel 6 phones. In addition, Screen Attention prevents the phone screen from dimming when you are looking at it and improvements in gaze recognition are opening up new use cases for accessibility and for improved wellness and health. ML is also enabling new methods for ensuring the safety of people and communities. For example, Suspicious Message Alerts warn against possible phishing attacks and Safer Routing detects hard-braking events to suggest alternate routes.

Recent work demonstrates the ability of gaze recognition as an important biomarker of mental fatigue.

Given the potentially sensitive nature of the data that underlies these new capabilities, it is essential that they are designed to be private by default. Many of them run inside of Android's Private Compute Core — an open source, secure environment isolated from the rest of the operating system. Android ensures that data processed in the Private Compute Core is not shared to any apps without the user taking an action. Android also prevents any feature inside the Private Compute Core from having direct access to the network. Instead, features communicate over a small set of open-source APIs to Private Compute Services, which strips out identifying information and makes use of privacy technologies, including federated learning, federated analytics, and private information retrieval, enabling learning while simultaneously ensuring privacy.

Federated Reconstruction is a novel partially local federated learning technique in which models are partitioned into global and local parameters. For each round of Federated Reconstruction training: (1) The server sends the current global parameters g to each user i; (2) Each user i freezes g and reconstructs their local parameters l_i; (3) Each user i freezes l_i and updates g to produce g_i; (4) Users’ g_i are averaged to produce the global parameters for the next round.

These technologies are critical to evolving next-generation computation and interaction paradigms, whereby personal or communal devices can both learn from and contribute to training a collective model of the world without compromising privacy. A federated unsupervised approach to privately learn the kinds of aforementioned general-purpose models with fine-tuning for a given task or context could unlock increasingly intelligent systems that are far more intuitive to interact with — more like a social entity than a machine. Broad and equitable access to these intelligent interfaces will only be possible with deep changes to our technology stacks, from the edge to the datacenter, so that they properly support neural computing.

Top

Trend 4: Growing Impact of ML in Science, Health and Sustainability
In recent years, we have seen an increasing impact of ML in the basic sciences, from physics to biology, with a number of exciting practical applications in related realms, such as renewable energy and medicine. Computer vision models have been deployed to address problems at both personal and global scales. They can assist physicians in their regular work, expand our understanding of neural physiology, and also provide better weather forecasts and streamline disaster relief efforts. Other types of ML models are proving critical in addressing climate change by discovering ways to reduce emissions and improving the output of alternative energy sources. Such models can even be leveraged as creative tools for artists! As ML becomes more robust, well-developed, and widely accessible, its potential for high-impact applications in a broad array of real-world domains continues to expand, helping to solve some of our most challenging problems.

Large-Scale Application of Computer Vision for New Insights

The advances in computer vision over the past decade have enabled computers to be used for a wide variety of tasks across different scientific domains. In neuroscience, automated reconstruction techniques can recover the neural connective structure of brain tissues from high resolution electron microscopy images of thin slices of brain tissue. In previous years, we have collaborated to create such resources for fruit fly, mouse, and songbird brains, but last year, we collaborated with the Lichtman Lab at Harvard University to analyze the largest sample of brain tissue imaged and reconstructed in this level of detail, in any species, and produced the first large-scale study of synaptic connectivity in the human cortex that spans multiple cell types across all layers of the cortex. The goal of this work is to produce a novel resource to assist neuroscientists in studying the stunning complexity of the human brain. The image below, for example, shows six neurons out of about 86 billion neurons in an adult human brain.
A single human chandelier neuron from our human cortex reconstruction, along with some of the pyramidal neurons that make a connection with that cell. Here’s an interactive version and a gallery of other interactive examples.
Computer vision technology also provides powerful tools to address challenges at much larger, even global, scales. A deep-learning–based approach to weather forecasting that uses satellite and radar imagery as inputs, combined with other atmospheric data, produces weather and precipitation forecasts that are more accurate than traditional physics-based models at forecasting times up to 12 hours. They can also produce updated forecasts much more quickly than traditional methods, which can be critical in times of extreme weather.
Comparison of 0.2 mm/hr precipitation on March 30, 2020 over Denver, Colorado. Left: Ground truth, source MRMS. Center: Probability map as predicted by MetNet-2. Right: Probability map as predicted by the physics-based HREF model. MetNet-2 is able to predict the onset of the storm earlier in the forecast than HREF as well as the storm’s starting location, whereas HREF misses the initiation location, but captures its growth phase well.
Having an accurate record of building footprints is essential for a range of applications, from population estimation and urban planning to humanitarian response and environmental science. In many parts of the world, including much of Africa, this information wasn’t previously available, but new work shows that using computer vision techniques applied to satellite imagery can help identify building boundaries at continental scales. The results of this approach have been released in the Open Buildings dataset, a new open-access data resource that contains the locations and footprints of 516 million buildings with coverage across most of the African continent. We’ve also been able to use this unique dataset in our collaboration with the World Food Programme to provide fast damage assessment after natural disasters through application of ML.
Example of segmenting buildings in satellite imagery. Left: Source image; Center: Semantic segmentation, with each pixel assigned a confidence score that it is a building vs. non-building; Right: Instance segmentation, obtained by thresholding and grouping together connected components.
A common theme across each of these cases is that ML models are able to perform specialized tasks efficiently and accurately based on analysis of available visual data, supporting high impact downstream tasks.

Automated Design Space Exploration

Another approach that has yielded excellent results across many fields is to allow an ML algorithm to explore and evaluate a problem’s design space for possible solutions in an automated way. In one application, a Transformer-based variational autoencoder learns to create aesthetically-pleasing and useful document layouts, and the same approach can be extended to explore possible furniture layouts. Another ML-driven approach automates the exploration of the huge design space of tweaks for computer game rules to improve playability and other attributes of a game, enabling human game designers to create enjoyable games more quickly.
A visualization of the Variational Transformer Network (VTN) model, which is able to extract meaningful relationships between the layout elements (paragraphs, tables, images, etc.) in order to generate realistic synthetic documents (e.g., with better alignment and margins).
Other ML algorithms have been used to evaluate the design space of computer architectural decisions for ML accelerator chips themselves. We’ve also shown that ML can be used to quickly create chip placements for ASIC designs that are better than layouts generated by human experts and can be generated in a matter of hours instead of weeks. This reduces the fixed engineering costs of chips and lowers the barrier to quickly creating specialized hardware for different applications. We’ve successfully used this automated placement approach in the design of our upcoming TPU-v5 chip.
Such exploratory ML approaches have also been applied to materials discovery. In a collaboration between Google Research and Caltech, several ML models, combined with a modified inkjet printer and a custom-built microscope, were able to rapidly search over hundreds of thousands of possible materials to hone in on 51 previously uncharacterized three-metal oxide materials with promising properties for applications in areas like battery technology and electrolysis of water.
These automated design space exploration approaches can help accelerate many scientific fields, especially when the entire experimental loop of generating the experiment and evaluating the result can all be done in an automated or mostly-automated manner. I expect to see this approach applied to good effect in many more areas in the coming years.

Application to Health

In addition to advancing basic science, ML can also drive advances in medicine and human health more broadly. The idea of leveraging advances in computer science in health is nothing new — in fact some of my own early experiences were in developing software to help analyze epidemiological data. But ML opens new doors, raises new opportunities, and yes, poses new challenges.
Take for example the field of genomics. Computing has been important to genomics since its inception, but ML adds new capabilities and disrupts old paradigms. When Google researchers began working in this area, the idea of using deep learning to help infer genetic variants from sequencer output was considered far-fetched by many experts. Today, this ML approach is considered state-of-the-art. But the future holds an even more important role for ML — genomics companies are developing new sequencing instruments that are more accurate and faster, but also present new inference challenges. Our release of open-source software DeepConsensus and, in collaboration with UCSC, PEPPER-DeepVariant, supports these new instruments with cutting-edge informatics. We hope that more rapid sequencing can lead to near term applicability with impact for real patients.
A schematic of the Transformer architecture for DeepConsensus, which corrects sequencing errors to improve yield and correctness.
There are other opportunities to use ML to accelerate our use of genomic information for personalized health outside of processing the sequencer data. Large biobanks of extensively phenotyped and sequenced individuals can revolutionize how we understand and manage genetic predisposition to disease. Our ML-based phenotyping method improves the scalability of converting large imaging and text datasets into phenotypes usable for genetic association studies, and our DeepNull method better leverages large phenotypic data for genetic discovery. We are happy to release both as open-source methods for the scientific community.
The process for generating large-scale quantification of anatomical and disease traits for combination with genomic data in Biobanks.
Just as ML helps us see hidden characteristics of genomics data, it can help us discover new information and glean new insights from other health data types as well. Diagnosis of disease is often about identifying a pattern, quantifying a correlation, or recognizing a new instance of a larger class — all tasks at which ML excels. Google researchers have used ML to tackle a wide range of such problems, but perhaps none of these has progressed farther than the applications of ML to medical imaging.
In fact, Google’s 2016 paper describing the application of deep learning to the screening for diabetic retinopathy, was selected by the editors of the Journal of the American Medical Association (JAMA) as one of the top 10 most influential papers of the decade — not just the most influential papers on ML and health, the most influential JAMA papers of the decade overall. But the strength of our research doesn’t end at contributions to the literature, but extends to our ability to build systems operating in the real world. Through our global network of deployment partners, this same program has helped screen tens of thousands of patients in India, Thailand, Germany and France who might otherwise have been untested for this vision-threatening disease.
We expect to see this same pattern of assistive ML systems deployed to improve breast cancer screening, detect lung cancer, accelerate radiotherapy treatments for cancer, flag abnormal X-rays, and stage prostate cancer biopsies. Each domain presents new opportunities to be helpful. ML-assisted colonoscopy procedures are a particularly interesting example of going beyond the basics. Colonoscopies are not just used to diagnose colon cancer — the removal of polyps during the procedure are the front line of halting disease progression and preventing serious illness. In this domain we’ve demonstrated that ML can help ensure doctors don’t miss polyps, can help detect elusive polyps, and can add new dimensions of quality assurance, like coverage mapping through the application of simultaneous localization and mapping techniques. In collaboration with Shaare Zedek Medical Center in Jerusalem, we’ve shown these systems can work in real time, detecting an average of one polyp per procedure that would have otherwise been missed, with fewer than four false alarms per procedure.
Sample chest X-rays (CXR) of true and false positives, and true and false negatives for (A) general abnormalities, (B) tuberculosis, and (C) COVID-19. On each CXR, red outlines indicate areas on which the model focused to identify abnormalities (i.e., the class activation map), and yellow outlines refer to regions of interest identified by a radiologist.
Another ambitious healthcare initiative, Care Studio, uses state-of-the-art ML and advanced NLP techniques to analyze structured data and medical notes, presenting clinicians with the most relevant information at the right time — ultimately helping them deliver more proactive and accurate care.
As important as ML may be to expanding access and improving accuracy in the clinical setting, we see a new equally important trend emerging: ML applied to help people in their daily health and well-being. Our everyday devices have powerful sensors that can help democratize health metrics and information so people can make more informed decisions about their health. We’ve already seen launches that enable a smartphone camera to assess heart rate and respiratory rate to help users without additional hardware, and Nest Hub devices that support contactless sleep sensing and allow users to better understand their nighttime wellness. We’ve seen that we can, on the one hand, significantly improve speech recognition quality for disordered speech in our own ASR systems, and on the other, use ML to help recreate the voice of those with speech impairments, empowering them to communicate in their own voice. ML enabled smartphones that help people better research emerging skin conditions or help those with limited vision go for a jog, seem to be just around the corner. These opportunities offer a future too bright to ignore.
The custom ML model for contactless sleep sensing efficiently processes a continuous stream of 3D radar tensors (summarizing activity over a range of distances, frequencies, and time) to automatically compute probabilities for the likelihood of user presence and wakefulness (awake or asleep).

ML Applications for the Climate Crisis

Another realm of paramount importance is climate change, which is an incredibly urgent threat for humanity. We need to all work together to bend the curve of harmful emissions to ensure a safe and prosperous future. Better information about the climate impact of different choices can help us tackle this challenge in a number of different ways.
To this end, we recently rolled out eco-friendly routing in Google Maps, which we estimate will save about 1 million tons of CO₂ emissions per year (the equivalent of removing more than 200,000 cars from the road). A recent case study shows that using Google Maps directions in Salt Lake City results in both faster and more emissions-friendly routing, which saves 1.7% of CO₂ emissions and 6.5% travel time. In addition, making our Maps products smarter about electric vehicles can help alleviate range anxiety, encouraging people to switch to emissions-free vehicles. We are also working with multiple municipalities around the world to use aggregated historical traffic data to help suggest improved traffic light timing settings, with an early pilot study in Israel and Brazil showing a 10-20% reduction in fuel consumption and delay time at the examined intersections.
With eco-friendly routing, Google Maps will show you the fastest route and the one that’s most fuel-efficient — so you can choose whichever one works best for you.
On a longer time scale, fusion holds promise as a game-changing renewable energy source. In a long-standing collaboration with TAE Technologies, we have used ML to help maintain stable plasmas in their fusion reactor by suggesting settings of the more than 1000 relevant control parameters. With our collaboration, TAE achieved their major goals for their Norman reactor, which brings us a step closer to the goal of breakeven fusion. The machine maintains a stable plasma at 30 million Kelvin (don’t touch!) for 30 milliseconds, which is the extent of available power to its systems. They have completed a design for an even more powerful machine, which they hope will demonstrate the conditions necessary for breakeven fusion before the end of the decade.
We’re also expanding our efforts to address wildfires and floods, which are becoming more common (like millions of Californians, I’m having to adapt to having a regular “fire season”). Last year, we launched a wildfire boundary map powered by satellite data to help people in the U.S. easily understand the approximate size and location of a fire — right from their device. Building on this, we’re now bringing all of Google’s wildfire information together and launching it globally with a new layer on Google Maps. We have been applying graph optimization algorithms to help optimize fire evacuation routes to help keep people safe in the presence of rapidly advancing fires. In 2021, our Flood Forecasting Initiative expanded its operational warning systems to cover 360 million people, and sent more than 115 million notifications directly to the mobile devices of people at risk from flooding, more than triple our outreach in the previous year. We also deployed our LSTM-based forecast models and the new Manifold inundation model in real-world systems for the first time, and shared a detailed description of all components of our systems.
The wildfire layer in Google Maps provides people with critical, up-to-date information in an emergency.
We’re also working hard on our own set of sustainability initiatives. Google was the first major company to become carbon neutral in 2007. We were also the first major company to match our energy use with 100 percent renewable energy in 2017. We operate the cleanest global cloud in the industry, and we’re the world’s largest corporate purchaser of renewable energy. Further, in 2020 we became the first major company to make a commitment to operate on 24/7 carbon-free energy in all our data centers and campuses worldwide. This is far more challenging than the traditional approach of matching energy usage with renewable energy, but we’re working to get this done by 2030. Carbon emission from ML model training is a concern for the ML community, and we have shown that making good choices about model architecture, datacenter, and ML accelerator type can reduce the carbon footprint of training by ~100-1000x.

Top

Trend 5: Deeper and Broader Understanding of ML
As ML is used more broadly across technology products and society more generally, it is imperative that we continue to develop new techniques to ensure that it is applied fairly and equitably, and that it benefits all people and not just select subsets. This is a major focus for our Responsible AI and Human-Centered Technology research group and an area in which we conduct research on a variety of responsibility-related topics.

One area of focus is recommendation systems that are based on user activity in online products. Because these recommendation systems are often composed of multiple distinct components, understanding their fairness properties often requires insight into individual components as well as how the individual components behave when combined together. Recent work has helped to better understand these relationships, revealing ways to improve the fairness of both individual components and the overall recommendation system. In addition, when learning from implicit user activity, it is also important for recommendation systems to learn in an unbiased manner, since the straightforward approach of learning from items that were shown to previous users exhibits well-known forms of bias. Without correcting for such biases, for example, items that were shown in more prominent positions to users tend to get recommended to future users more often.

As in recommendation systems, surrounding context is important in machine translation. Because most machine translation systems translate individual sentences in isolation, without additional surrounding context, they can often reinforce biases related to gender, age or other areas. In an effort to address some of these issues, we have a long-standing line of research on reducing gender bias in our translation systems, and to help the entire translation community, last year we released a dataset to study gender bias in translation based on translations of Wikipedia biographies.

Another common problem in deploying machine learning models is distributional shift: if the statistical distribution of data on which the model was trained is not the same as that of the data the model is given as input, the model’s behavior can sometimes be unpredictable. In recent work, we employ the Deep Bootstrap framework to compare the real world, where there is finite training data, to an "ideal world", where there is infinite data. Better understanding of how a model behaves in these two regimes (real vs. ideal) can help us develop models that generalize better to new settings and exhibit less bias towards fixed training datasets.

Although work on ML algorithms and model development gets significant attention, data collection and dataset curation often gets less. But this is an important area, because the data on which an ML model is trained can be a potential source of bias and fairness issues in downstream applications. Analyzing such data cascades in ML can help identify the many places in the lifecycle of an ML project that can have substantial influence on the outcomes. This research on data cascades has led to evidence-backed guidelines for data collection and evaluation in the revised PAIR Guidebook, aimed at ML developers and designers.

Arrows of different color indicate various types of data cascades, each of which typically originate upstream, compound over the ML development process, and manifest downstream.

The general goal of better understanding data is an important part of ML research. One thing that can help is finding and investigating anomalous data. We have developed methods to better understand the influence that particular training examples can have on an ML model, since mislabeled data or other similar issues can have outsized impact on the overall model behavior. We have also built the Know Your Data tool to help ML researchers and practitioners better understand properties of their datasets, and last year we created a case study of how to use the Know Your Data tool to explore issues like gender bias and age bias in a dataset.

A screenshot from Know Your Data showing the relationship between words that describe attractiveness and gendered words. For example, “attractive” and “male/man/boy” co-occur 12 times, but we expect ~60 times by chance (the ratio is 0.2x). On the other hand, “attractive” and “female/woman/girl” co-occur 2.62 times more than chance.

Understanding dynamics of benchmark dataset usage is also important, given the central role they play in the organization of ML as a field. Although studies of individual datasets have become increasingly common, the dynamics of dataset usage across the field have remained underexplored. In recent work, we published the first large scale empirical analysis of dynamics of dataset creation, adoption, and reuse. This work offers insights into pathways to enable more rigorous evaluations, as well as more equitable and socially informed research.

Creating public datasets that are more inclusive and less biased is an important way to help improve the field of ML for everyone. In 2016, we released the Open Images dataset, a collection of ~9 million images annotated with image labels spanning thousands of object categories and bounding box annotations for 600 classes. Last year, we introduced the More Inclusive Annotations for People (MIAP) dataset in the Open Images Extended collection. The collection contains more complete bounding box annotations for the person class hierarchy, and each annotation is labeled with fairness-related attributes, including perceived gender presentation and perceived age range. With the increasing focus on reducing unfair bias as part of responsible AI research, we hope these annotations will encourage researchers already leveraging the Open Images dataset to incorporate fairness analysis in their research.

Because we also know that our teams are not the only ones creating datasets that can improve machine learning, we have built Dataset Search to help users discover new and useful datasets, wherever they might be on the Web.

Tackling various forms of abusive behavior online, such as toxic language, hate speech, and misinformation, is a core priority for Google. Being able to detect such forms of abuse reliably, efficiently, and at scale is of critical importance both to ensure that our platforms are safe and also to avoid the risk of reproducing such negative traits through language technologies that learn from online discourse in an unsupervised fashion. Google has pioneered work in this space through the Perspective API tool, but the nuances involved in detecting toxicity at scale remains a complex problem. In recent work, in collaboration with various academic partners, we introduced a comprehensive taxonomy to reason about the changing landscape of online hate and harassment. We also investigated how to detect covert forms of toxicity, such as microaggressions, that are often ignored in online abuse interventions, studied how conventional approaches to deal with disagreements in data annotations of such subjective concepts might marginalize minority perspectives, and proposed a new disaggregated modeling approach that uses a multi-task framework to tackle this issue. Furthermore, through qualitative research and network-level content analysis, Google’s Jigsaw team, in collaboration with researchers at George Washington University, studied how hate clusters spread disinformation across social media platforms.

Another potential concern is that ML language understanding and generation models can sometimes also produce results that are not properly supported by evidence. To confront this problem in question answering, summarization, and dialog, we developed a new framework for measuring whether results can be attributed to specific sources. We released annotation guidelines and demonstrated that they can be reliably used in evaluating candidate models.

Interactive analysis and debugging of models remains key to responsible use of ML. We have updated our Language Interpretability Tool with new capabilities and techniques to advance this line of work, including support for image and tabular data, a variety of features carried over from our previous work on the What-If Tool, and built-in support for fairness analysis through the technique of Testing with Concept Activation Vectors. Interpretability and explainability of ML systems more generally is also a key part of our Responsible AI vision; in collaboration with DeepMind, we made headway in understanding the acquisition of human chess concepts in the self-trained AlphaZero chess system.

Explore what AlphaZero might have learned about playing chess using this online tool.

We are also working hard to broaden the perspective of Responsible AI beyond western contexts. Our recent research examines how various assumptions of conventional algorithmic fairness frameworks based on Western institutions and infrastructures may fail in non-Western contexts and offers a pathway for recontextualizing fairness research in India along several directions. We are actively conducting survey research across several continents to better understand perceptions of and preferences regarding AI. Western framing of algorithmic fairness research tends to focus on only a handful of attributes, thus biases concerning non-Western contexts are largely ignored and empirically under-studied. To address this gap, in collaboration with the University of Michigan, we developed a weakly supervised method to robustly detect lexical biases in broader geo-cultural contexts in NLP models that reflect human judgments of offensive and inoffensive language in those geographic contexts.

Furthermore, we have explored applications of ML to contexts valued in the Global South, including developing a proposal for farmer-centered ML research. Through this work, we hope to encourage the field to be thoughtful about how to bring ML-enabled solutions to smallholder farmers in ways that will improve their lives and their communities.

Involving community stakeholders at all stages of the ML pipeline is key to our efforts to develop and deploy ML responsibly and keep us focused on tackling the problems that matter most. In this vein, we held a Health Equity Research Summit among external faculty, non-profit organization leads, government and NGO representatives, and other subject matter experts to discuss how to bring more equity into the entire ML ecosystem, from the way we approach problem-solving to how we assess the impact of our efforts.

Community-based research methods have also informed our approach to designing for digital wellbeing and addressing racial equity issues in ML systems, including improving our understanding of the experience of Black Americans using ASR systems. We are also listening to the public more broadly to learn how sociotechnical ML systems could help during major life events, such as by supporting family caregiving.

As ML models become more capable and have impact in many domains, the protection of the private information used in ML continues to be an important focus for research. Along these lines, some of our recent work addresses privacy in large models, both highlighting that training data can sometimes be extracted from large models and pointing to how privacy can be achieved in large models, e.g., as in differentially private BERT. In addition to the work on federated learning and analytics, mentioned above, we have also been enhancing our toolbox with other principled and practical ML techniques for ensuring differential privacy, for example private clustering, private personalization, private matrix completion, private weighted sampling, private quantiles, private robust learning of halfspaces, and in general, sample-efficient private PAC learning. Moreover, we have been expanding the set of privacy notions that can be tailored to different applications and threat models, including label privacy and user versus item level privacy.

A visual illustration of the differentially private clustering algorithm.

Top

Datasets
Recognizing the value of open datasets to the general advancement of ML and related fields of research, we continue to grow our collection of open source datasets and resources and expand our global index of open datasets in Google Dataset Search. This year, we have released a number of datasets and tools across a range of research areas:

Datasets & Tools	Description
AIST++	3D keypoints with corresponding images for dance motions covering 10 dance genres
AutoFlow	40k image pairs with ground truth optical flow
C4_200M	A 200 million sentence synthetic dataset for grammatical error correction
CIFAR-5M	Dataset of ~6 million synthetic CIFAR-10–like images (RGB 32 x 32 pix)
Crisscrossed Captions	Set of semantic similarity ratings for the MS-COCO dataset
Disfl-QA	Dataset of contextual disfluencies for information seeking
Distilled Datasets	Distilled datasets from CIFAR-10, CIFAR-100, MNIST, Fashion-MNIST, and SVHN
EvolvingRL	1000 top performing RL algorithms discovered through algorithm evolution
GoEmotions	A human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories
H01 Dataset	1.4 petabyte browsable reconstruction of the human cortex
Know Your Data	Tool for understanding biases in a dataset
Lens Flare	5000 high-quality RGB images of typical lens flare
More Inclusive Annotations for People (MIAP)	Improved bounding box annotations for a subset of the person class in the Open Images dataset
Mostly Basic Python Problems	1000 Python programming problems, incl. task description, code solution & test cases
NIH ChestX-ray14 dataset labels	Expert labels for a subset of the NIH ChestX-ray14 dataset
Open Buildings	Locations and footprints of 516 million buildings with coverage across most of Africa
Optical Polarization from Curie	5GB of optical polarization data from the Curie submarine cable
Readability Scroll	Scroll interactions of ~600 participants reading texts from the OneStopEnglish corpus
RLDS	Tools to store, retrieve & manipulate episodic data for reinforcement learning
Room-Across-Room (RxR)	Multilingual dataset for vision-and-language navigation in English, Hindi and Telugu
Soft Attributes	~6k sets of movie titles annotated with single English soft attributes
TimeDial	Dataset of multiple choice span-filling tasks for temporal commonsense reasoning in dialog
ToTTo	English table-to-text generation dataset with a controlled text generation task
Translated Wikipedia Biographies	Dataset for analysis of common gender errors in NMT for English, Spanish and German
UI Understanding Data for UIBert	Datasets for two UI understanding tasks, AppSim & RefExp
WikiFact	Wikipedia & WikiData–based dataset to train relationship classifiers and fact extraction models
WIT	Wikipedia-based Image Text dataset for multimodal multilingual ML

Research Community Interaction
To realize our goal for a more robust and comprehensive understanding of ML and related technologies, we actively engage with the broader research community. In 2021, we published over 750 papers, nearly 600 of which were presented at leading research conferences. Google Research sponsored over 150 conferences, and Google researchers contributed directly by serving on program committees and organizing workshops, tutorials and numerous other activities aimed at collectively advancing the field. To learn more about our contributions to some of the larger research conferences this year, please see our recent conference blog posts. In addition, we hosted 19 virtual workshops (like the 2021 Quantum Summer Symposium), which allowed us to further engage with the academic community by generating new ideas and directions for the research field and advancing research initiatives.

In 2021, Google Research also directly supported external research with $59M in funding, including $23M through Research programs to faculty and students, and $20M in university partnerships and outreach. This past year, we introduced new funding and collaboration programs that support academics all over the world who are doing high impact research. We funded 86 early career faculty through our Research Scholar Program to support general advancements in science, and funded 34 faculty through our Award for Inclusion Research Program who are doing research in areas like accessibility, algorithmic fairness, higher education and collaboration, and participatory ML. In addition to the research we are funding, we welcomed 85 faculty and post-docs, globally, through our Visiting Researcher program, to come to Google and partner with us on exciting ideas and shared research challenges. We also selected a group of 74 incredibly talented PhD student researchers to receive Google PhD Fellowships and mentorship as they conduct their research.

As part of our ongoing racial equity commitments, making computer science (CS) research more inclusive continues to be a top priority for us. In 2021, we continued expanding our efforts to increase the diversity of Ph.D. graduates in computing. For example, the CS Research Mentorship Program (CSRMP), an initiative by Google Research to support students from historically marginalized groups (HMGs) in computing research pathways, graduated 590 mentees, 83% of whom self-identified as part of an HMG, who were supported by 194 Google mentors — our largest group to date! In October, we welcomed 35 institutions globally leading the way to engage 3,400+ students in computing research as part of the 2021 exploreCSR cohort. Since 2018, this program has provided faculty with funding, community, evaluation and connections to Google researchers in order to introduce students from HMGs to the world of CS research. We are excited to expand this program to more international locations in 2022.

We also continued our efforts to fund and partner with organizations to develop and support new pathways and approaches to broadening participation in computing research at scale. From working with alliances like the Computing Alliance of Hispanic-Serving Institutions (CAHSI) and CMD-IT Diversifying LEAdership in the Professoriate (LEAP) Alliance to partnering with university initiatives like UMBC’s Meyerhoff Scholars, Cornell University’s CSMore, Northeastern University’s Center for Inclusive Computing, and MIT’s MEnTorEd Opportunities in Research (METEOR), we are taking a community-based approach to materially increase the representation of marginalized groups in computing research.

Other Work
In writing these retrospectives, I try to focus on new research work that has happened (mostly) in the past year while also looking ahead. In past years’ retrospectives, I’ve tried to be more comprehensive, but this time I thought it could be more interesting to focus on just a few themes. We’ve also done great work in many other research areas that don’t fit neatly into these themes. If you’re interested, I encourage you to check out our research publications by area below or by year (and if you’re interested in quantum computing, our Quantum team recently wrote a retrospective of their work in 2021):

Algorithms and Theory		Machine Perception
Data Management		Machine Translation
Data Mining		Mobile Systems
Distributed Systems & Parallel Computing		Natural Language Processing
Economics & Electronic Commerce		Networking
Education Innovation		Quantum Computing
General Science		Responsible AI
Health and Bioscience		Robotics
Hardware and Architecture		Security, Privacy and Abuse Prevention
Human-Computer Interaction and Visualization		Software Engineering
Information Retrieval and the Web		Software Systems
Machine Intelligence		Speech Processing

Conclusion
Research is often a multi-year journey to real-world impact. Early stage research work that happened a few years ago is now having a dramatic impact on Google’s products and across the world. Investments in ML hardware accelerators like TPUs and in software frameworks like TensorFlow and JAX have borne fruit. ML models are increasingly prevalent in many different products and features at Google because their power and ease of expression streamline experimentation and productionization of ML models in performance-critical environments. Research into model architectures to create Seq2Seq, Inception, EfficientNet, and Transformer or algorithmic research like batch normalization and distillation is driving progress in the fields of language understanding, vision, speech, and others. Basic capabilities like better language and visual understanding and speech recognition can be transformational, and as a result, these sorts of models are widely deployed for a wide variety of problems in many of our products including Search, Assistant, Ads, Cloud, Gmail, Maps, YouTube, Workspace, Android, Pixel, Nest, and Translate.

These are truly exciting times in machine learning and computer science. Continued improvement in computers’ ability to understand and interact with the world around them through language, vision, and sound opens up entire new frontiers of how computers can help people accomplish things in the world. The many examples of progress along the five themes outlined in this post are waypoints in a long-term journey!

Acknowledgements
Thanks to Alison Carroll, Alison Lentz, Andrew Carroll, Andrew Tomkins, Avinatan Hassidim, Azalia Mirhoseini, Barak Turovsky, Been Kim, Blaise Aguera y Arcas, Brennan Saeta, Brian Rakowski, Charina Chou, Christian Howard, Claire Cui, Corinna Cortes, Courtney Heldreth, David Patterson, Dipanjan Das, Ed Chi, Eli Collins, Emily Denton, Fernando Pereira, Genevieve Park, Greg Corrado, Ian Tenney, Iz Conroy, James Wexler, Jason Freidenfelds, John Platt, Katherine Chou, Kathy Meier-Hellstern, Kyle Vandenberg, Lauren Wilcox, Lizzie Dorfman, Marian Croak, Martin Abadi, Matthew Flegal, Meredith Morris, Natasha Noy, Negar Saei, Neha Arora, Paul Muret, Paul Natsev, Quoc Le, Ravi Kumar, Rina Panigrahy, Sanjiv Kumar, Sella Nevo, Slav Petrov, Sreenivas Gollapudi, Tom Duerig, Tom Small, Vidhya Navalpakkam, Vincent Vanhoucke, Vinodkumar Prabhakaran, Viren Jain, Yonghui Wu, Yossi Matias, and Zoubin Ghahramani for helpful feedback and contributions to this post, and to the entire Research and Health communities at Google for everyone’s contributions towards this work.

Source: Google AI Blog

Google Research: Looking Back at 2020, and Forward to 2021

Posted by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community

When I joined Google over 20 years ago, we were just figuring out how to really start on the journey of making a high quality and comprehensive search service for information on the web, using lots of curiously wired computers. Fast forward to today, and while we’re taking on a much broader array of technical challenges, it’s still with the same overarching goal of organizing the world's information and making it universally accessible and useful. In 2020, as the world has been reshaped by COVID-19, we saw the ways research-developed technologies could help billions of people better communicate, understand the world, and get things done. I’m proud of what we’ve accomplished, and excited about new possibilities on the horizon.

The goal of Google Research is to work on long-term, ambitious problems across a wide range of important topics — from predicting the spread of COVID-19, to designing algorithms, to learning to translate more and more languages automatically, to mitigating bias in ML models. In the spirit of our annual reviews for 2019, 2018, and more narrowly focused reviews of some work in 2017 and 2016, this post covers key Google Research highlights from this unusual year. This is a long post, but grouped into many different sections. Hopefully, there’s something interesting in here for everyone! For a more comprehensive look, please see our >750 research publications in 2020.

COVID-19 and Health
As the impact of COVID-19 took a tremendous toll on people’s lives, researchers and developers around the world rallied together to develop tools and technologies to help public health officials and policymakers understand and respond to the pandemic. Apple and Google partnered in 2020 to develop the Exposure Notifications System (ENS), a Bluetooth-enabled privacy-preserving technology that allows people to be notified if they have been exposed to others who have tested positive for COVID-19. ENS supplements traditional contact tracing efforts and has been deployed by public health authorities in more than 50 countries, states and regions to help curb the spread of infection.

In the early days of the pandemic, public health officials signalled their need for more comprehensive data to combat the virus’ rapid spread. Our Community Mobility Reports, which provide anonymized insights into movement trends, are helping researchers not only understand the impact of policies like stay-at-home directives and social distancing, and also conduct economic forecasting.

Community Mobility Reports: Navigate and download a report for regions of interest.

Our own researchers have also explored using this anonymized data to forecast COVID-19 spread using graph neural networks instead of traditional time series-based models.

Although the research community knew little about this disease and secondary effects initially, we’re learning more every day. Our COVID-19 Search Trends symptoms allows researchers to explore temporal or symptomatic associations, such as anosmia — the loss of smell that is sometimes a symptom of the virus. To further support the broader research community, we launched Google Health Studies app to provide the public ways to participate in research studies.

Our COVID-19 Search Trends are helping researchers study the link between the disease’s spread and symptom-related searches.

Teams across Google are contributing tools and resources to the broader scientific community, which is working to address the health and economic impacts of the virus.

A spatio-temporal graph for modelling COVID-19 Spread.

Accurate information is critical in dealing with public health threats. We collaborated with many product teams at Google in order to improve information quality about COVID-19 in Google News and Search through supporting fact checking efforts, as well as similar efforts in YouTube.

We helped multilingual communities get equal access to critical COVID-19 information by sponsoring localization of Nextstrain.org’s weekly Situation Reports and developing a COVID-19 open source parallel dataset in collaboration with Translators Without Borders.

Modelling a complex global event is particularly challenging and requires more comprehensive epidemiological datasets, the development of novel interpretable models and agent-based simulators to inform the public health response. Machine learning techniques have also helped in other ways from deploying natural language understanding to helping researchers quickly navigate the mountains of COVID-19 scientific literature, applying anonymization technology to protect privacy while making useful datasets available, and exploring whether public health can conduct faster screening with fewer tests via Bayesian group testing.

These are only a sample of the many pieces of work that happened across Google to help users and public health authorities respond to COVID-19. For more, see using technology to help take on COVID-19.

Research in Machine Learning for Medical Diagnostics
We continue to make headway helping clinicians harness the power of ML to deliver better care for more patients. This year we have described notable advances in applying computer vision to aid doctors in the diagnosis and management of cancer, including helping to make sure that doctors don’t miss potentially cancerous polyps during colonoscopies, and showing that an ML system can achieve substantially higher accuracy than pathologists in Gleason grading of prostate tissue, enabling radiologists to achieve significant reductions in both false negative and false positive results when examining X-rays for signs of breast cancer.

To determine the aggressiveness of prostate cancers, pathologists examine a biopsy and assign it a Gleason grade. In published research, our system was able to grade with higher accuracy than a cohort of pathologists who have not had specialist training in prostate cancer. The first stage of the deep learning system assigns a Gleason grade to every region in a biopsy. In this biopsy, green indicates Gleason pattern 3, while yellow indicates Gleason pattern 4.

We’ve also been working on systems to help identify skin disease, help detect age-related macular degeneration (the leading cause of blindness in the U.S. and U.K., and the third-largest cause of blindness worldwide), and on potential novel non-invasive diagnostics (e.g., being able to detect signs of anemia from retinal images).

Our study examines how a deep learning model can quantify hemoglobin levels — a measure doctors use to detect anemia — from retinal images.

This year has also brought exciting demonstrations of how these same technologies can peer into the human genome. Google’s open-source tool, DeepVariant, identifies genomic variants in sequencing data using a convolutional neural network, and this year won the FDA Challenge for best accuracy in 3 out of 4 categories. Using this same tool, a study led by the Dana-Farber Cancer Institute improved diagnostic yield by 14% for genetic variants that lead to prostate cancer and melanoma in a cohort of 2,367 cancer patients.

Research doesn’t end at measurement of experimental accuracy. Ultimately, truly helping patients receive better care requires understanding how ML tools will affect people in the real world. This year we began work with Mayo Clinic to develop a machine learning system to assist in radiotherapy planning and to better understand how this technology could be deployed into clinical practice. With our partners in Thailand, we’ve used diabetic eye disease screening as a test case in how we can build systems with people at the center, and recognize the fundamental role of diversity, equity, and inclusion in building tools for a healthier world.

Weather, Environment and Climate Change
Machine learning can help us better understand the environment and make useful predictions to help people in both their everyday life as well as in disaster situations. For weather and precipitation forecasting, computationally intensive physics-based models like NOAA’s HRRR have long reigned supreme. We have been able to show, though, that ML-based forecasting systems can predict current precipitation with much better spatial resolution (“Is it raining in my local park in Seattle?” and not just “Is it raining in Seattle?”) and can produce short-term forecasts of up to eight hours that are considerably more accurate than HRRR, and can compute the forecast more quickly, yet with higher temporal and spatial resolution.

A visualization of predictions made over the course of roughly one day. Left: The 1-hour HRRR prediction made at the top of each hour, the limit to how often HRRR provides predictions. Center: The ground truth, i.e., what we are trying to predict. Right: The predictions made by our model. Our predictions are every 2 minutes (displayed here every 15 minutes) at roughly 10 times the spatial resolution made by HRRR. Notice that we capture the general motion and general shape of the storm.

We’ve also developed an improved technique called HydroNets, which uses a network of neural networks to model the actual river systems in the world to more accurately understand the interactions of upstream water levels to downstream inundation, resulting in more accurate water-level predictions and flood forecasting. Using these techniques, we've expanded our coverage of flood alerts by 20x in India and Bangladesh, helping to better protect more than 200 million people in 250,000 square kilometers.

An illustration of the HydroNets architecture.

Better analysis of satellite imagery data can also give Google users a better understanding of the impact and extent of wildfires (which caused devastating effects in California and Australia this year). We showed that automated analysis of satellite imagery can help with rapid assessment of damage after natural disasters even with limited prior satellite imagery. It can also aid urban tree-planting efforts by helping cities assess their current tree canopy coverage and where they should focus on planting new trees. We’ve also shown how machine learning techniques that leverage temporal context can help improve ecological and wildlife monitoring.

Based on this work, we’re excited to partner with NOAA on using AI and ML to amplify NOAA’s environmental monitoring, weather forecasting and climate research using Google Cloud’s infrastructure.

Accessibility
Machine learning continues to provide amazing opportunities for improving accessibility, because it can learn to transfer one kind of sensory input into others. As one example, we released Lookout, an Android application that can help visually impaired users by identifying packaged foods, both in a grocery store and also in their kitchen cupboard at home. The machine learning system behind Lookout demonstrates that a powerful-but-compact machine learning model can accomplish this in real-time on a phone for nearly 2 million products.

Similarly, people who communicate with sign language find it difficult to use video conferencing systems because even if they are signing, they are not detected as actively speaking by audio-based speaker detection systems. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing presents a real-time sign language detection model and demonstrates how it can be used to provide video conferencing systems with a mechanism to identify the person signing as the active speaker.

We also enabled useful Android accessibility capabilities such as Voice Access and Sound Notifications for important household sounds.

Live Caption was expanded to support calls on the Pixel phone with the ability to caption phone calls and video calls. This came out of the Live Relay research project, which enables deaf and hard of hearing people to make calls without assistance.

Applications of ML to Other Fields
Machine learning continues to prove vital in helping us make progress across many fields of science. In 2020, in collaboration with the FlyEM team at HHMI Janelia Research Campus, we released the drosophila hemibrain connectome, the large synapse-resolution map of brain connectivity, reconstructed using large-scale machine learning models applied to high-resolution electron microscope imaging of brain tissue. This connectome information will aid neuroscientists in a wide variety of inquiries, helping us all better understand how brains function. Be sure to check out the very fly interactive 3-D UI!

The application of ML to problems in systems biology is also on the rise. Our Google Accelerated Science team, in collaboration with our colleagues at Calico, have been applying machine learning to yeast, to get a better understanding of how genes work together as a whole system. We’ve also been exploring how to use model-based reinforcement learning in order to design biological sequences like DNA or proteins that have desirable properties for medical or industrial uses. Model-based RL is used to improve sample efficiency. At each round of experimentation the policy is trained offline using a simulator fit on functional measurements from prior rounds. On various tasks like designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structures, we find that model-based RL is an attractive alternative to existing methods.

In partnership with X-Chem Pharmaceuticals and ZebiAI, we have also been developing ML techniques to do “virtual screening” of promising molecular compounds computationally. Previous work in this area has tended to focus on relatively small sets of related compounds, but in this work, we are trying to use DNA-encoded small molecule libraries in order to be able to generalize to find “hits” across a wide swath of chemical space, reducing the need for slower, physical-based lab work in order to progress from idea to working pharmaceutical.

We’ve also seen success applying machine learning to core computer science and computer systems problems, a growing trend that is spawning entire new conferences like MLSys. In Learning-based Memory Allocation for C++ Server Workloads, a neural network-based language model predicts context-sensitive per-allocation site object lifetime information, and then uses this to organize the heap so as to reduce fragmentation. It is able to reduce fragmentation by up to 78% while only using huge pages (which are better for TLB behavior). End-to-End, Transferable Deep RL for Graph Optimization described an end-to-end transferable deep reinforcement learning method for computational graph optimization that shows 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization, with 15x faster convergence over prior computation graph optimization methods.

Overview of GO: An end-to-end graph policy network that combines graph embedding and sequential attention.

As described in Chip Design with Deep Reinforcement Learning, we have also been applying reinforcement learning to the problem of place-and-route in computer chip design. This is normally a very time-consuming, labor-intensive process, and is a major reason that going from an idea for a chip to actually having a fully designed and fabricated chip takes so long. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. The system is able to generate placements that usually outperform those of human chip design experts, and we have been using this system (running on TPUs) to do placement and layout for major portions of future generations of TPUs. Menger is a recent infrastructure we’ve built for large-scale distributed reinforcement learning that is yielding promising performance for difficult RL tasks such as chip design.

Macro placements of Ariane, an open-source RISC-V processor, as training progresses. On the left, the policy is being trained from scratch, and on the right, a pre-trained policy is being fine-tuned for this chip. Each rectangle represents an individual macro placement. Notice how the cavity that is occupied by non-macro logic cells that is discovered by the from-scratch policy is already present from the outset in the pre-trained policy’s placement.

Responsible AI
The Google AI Principles guide our development of advanced technologies. We continue to invest in responsible AI research and tools, update our recommended technical practices in this area, and share regular updates — including a 2020 blog post and report — on our progress in implementation.

To help better understand the behavior of language models, we developed the Language Interpretability Tool (LIT), a toolkit for better interpretability of language models, enabling interactive exploration and analysis of their decisions. We developed techniques for measuring gendered correlations in pre-trained language models and scalable techniques for reducing gender bias in Google Translate. We used the kernel trick to propose a simple method to estimate the influence of a training data example on an individual prediction. To help non-specialists interpret machine learning results, we extended the TCAV technique introduced in 2019 to now provide a complete and sufficient set of concepts. With the original TCAV work, we were able to say that ‘fur’ and ‘long ears’ are important concepts for ‘rabbit’ prediction. With this work, we can also say that these two concepts are enough to fully explain the prediction; you don’t need any other concepts. Concept bottleneck models are a technique to make models more interpretable by training them so that one of the layers is aligned with pre-defined expert concepts (e.g., “bone spurs present”, or “wing color”, as shown below) before making a final prediction for a task, so that we can not only interpret but also turn on/off these concepts on the fly.

Aligning predictions to pre-identified concepts can make models more interpretable, as described in Concept Bottleneck Models.

In collaboration with many other institutions, we also looked into memorization effects of language models, showing that training data extraction attacks are realistic threats on state-of-the-art large language models. This finding along with a result that embedding models can leak information can have significant privacy implications (especially for models trained on private data). In Thieves of Sesame Street: Model Extraction on BERT-based APIs, we demonstrated that attackers with only API access to a language model could create models whose outputs had very high correlation with the original model, even with relatively few API queries to the original model. Subsequent work demonstrated that attackers can extract smaller models with arbitrary accuracy. On the AI Principle of safety we demonstrated that thirteen published defenses to adversarial examples can be circumvented despite attempting to perform evaluations using adaptive attacks. Our work focuses on laying out the methodology and the approach necessary to perform an adaptive attack, and thus will allow the community to make further progress in building more robust models.

Examining the way in which machine learning systems themselves are examined is also an important area of exploration. In collaboration with the Partnership on AI, we defined a framework for how to audit the use of machine learning in software product settings, drawing on lessons from the aerospace, medical devices, and finance industries and their best practices. In joint work with University of Toronto and MIT, we identified several ethical concerns that can arise when auditing the performance of facial recognition systems. In joint work with the University of Washington, we identified some important considerations related to diversity and inclusion when choosing subsets for evaluating algorithmic fairness. As an initial step in making responsible AI work for the next billion users and to help understand if notions of fairness were consistent in different parts of the world, we analyzed and created a framework for algorithmic fairness in India, accounting for datasets, fairness optimizations, infrastructures, and ecosystems

The Model Cards work that was introduced in collaboration with the University of Toronto in 2019 has been growing in influence. Indeed, many well-known models like OpenAI’s GPT-2 and GPT-3, many of Google’s MediaPipe models and various Google Cloud APIs have all adopted Model Cards as a way of giving users of a machine learning model more information about the model’s development and the observed behavior of the model under different conditions. To make this easier for others to adopt for their own machine learning models, we also introduced the Model Card Toolkit for easier model transparency reporting. In order to increase transparency in ML development practices, we demonstrate the applicability of a range of best practices throughout the dataset development lifecycle, including data requirements specification and data acceptance testing.

In collaboration with the U.S. National Science Foundation (NSF), we announced and helped to fund a National AI Research Institute for Human-AI Interaction and Collaboration. We also released the MinDiff framework, a new regularization technique available in the TF Model Remediation library for effectively and efficiently mitigating unfair biases when training ML models, along with ML-fairness gym for building simple simulations that explore potential long-run impacts of deploying machine learning-based decision systems in social environments.

In addition to developing frameworks for fairness, we developed approaches for identifying and improving the health and quality of experiences with Recommender Systems, including using reinforcement learning to introduce safer trajectories. We also continue to work on improving the reliability of our machine learning systems, where we’ve seen that approaches such as generating adversarial examples can improve robustness and that robustness approaches can improve fairness.

Differential privacy is a way to formally quantify privacy protections and requires a rethinking of the most basic algorithms to operate in a way that they do not leak information about any particular individual. In particular, differential privacy can help in addressing memorization effects and information leakage of the kinds mentioned above. In 2020 there were a number of exciting developments, from more efficient ways of computing private empirical risk minimizers to private clustering methods with tight approximation guarantees and private sketching algorithms. We also open sourced the differential privacy libraries that lie at the core of our internal tools, taking extra care to protect against leakage caused by the floating point representation of real numbers. These are the exact same tools that we use to produce differentially private COVID-19 mobility reports that have been a valuable source of anonymous data for researchers and policymakers.

To help developers assess the privacy properties of their classification models we released an ML privacy testing library in Tensorflow. We hope this library will be the starting point of a robust privacy testing suite that can be used by any machine learning developer around the world.

Membership inference attack on models for CIFAR10. The x-axis is the test accuracy of the model, and y-axis is vulnerability score (lower means more private). Vulnerability grows while test accuracy remains the same — better generalization could prevent privacy leakage.

In addition to pushing the state of the art in developing private algorithms, I am excited about the advances we made in weaving privacy into the fabric of our products. One of the best examples is Chrome’s Privacy Sandbox, which changes the underpinnings of the advertising ecosystem and helps systematically protect individuals’ privacy. As part of the project, we proposed and evaluated a number of different APIs, including federated learning of cohorts (FLoC) for interest based targeting, and aggregate APIs for differentially private measurement.

Launched in 2017, federated learning is now a complete research field unto itself, with over 3000 publications on federated learning appearing in 2020 alone. Our cross-institutional Advances and Open Problems in Federated Learning survey paper published in 2019 has been cited 367 times in the past year, and an updated version will soon be published in the Foundations & Trends in Machine Learning series. In July, we hosted a Workshop on Federated Learning and Analytics, and made all research talks and a TensorFlow Federated tutorial publicly available.

The lifecycle of an FL-trained model and the various actors in a federated learning system.

We continue to push the state of the art in federated learning, including the development of new federated optimization algorithms including adaptive learning algorithms, posterior averaging algorithms, and techniques for mimicking centralized algorithms in federated settings, substantial improvements in complimentary cryptographic protocols, and more. We announced and deployed federated analytics, enabling data science over raw data that is stored locally on users’ devices. New uses of federated learning in Google products include contextual emoji suggestions in Gboard, and pioneering privacy-preserving medical research with Google Health Studies. Furthermore, in Privacy Amplification via Random Check-Ins we presented the first privacy accounting mechanism for Federated Learning.

Security for our users is also an area of considerable interest for us. In 2020, we continued to improve protections for Gmail users, by deploying a new ML-based document scanner that provides protection against malicious documents, which increased malicious office document detection by 10% on a daily basis. Thanks to its ability to generalize, this tool has been very effective at blocking some adversarial malware campaigns that elude other detection mechanisms and increased our detection rate by 150% in some cases.

On the account protection side, we released a fully open-source security key firmware to help advance state of art in the two factor authentication space, staying focused on security keys as the best way to protect accounts against phishing.

Natural Language Understanding
Better understanding of language is an area where we saw considerable progress this year. Much of the work in this space from Google and elsewhere now relies on Transformers, a particular style of neural network model originally developed for language problems (but with a growing body of evidence that they are also useful for images, videos, speech, protein folding, and a wide variety of other domains).

One area of excitement is in dialog systems that can chat with a user about something of interest, often encompassing multiple turns of interaction. While successful work in this area to date has involved creating systems that are specialized around particular topics (e.g., Duplex) these systems cannot carry on general conversations. In pursuit of the general research goal of creating systems capable of much more open-ended dialog, in 2020 we described Meena, a learned conversational agent that aspirationally can chat about anything. Meena achieves high scores on a dialog system metric called SSA, which measures both sensibility and specificity of responses. We’ve seen that as we scale up the model size of Meena, it is able to achieve lower perplexity and, as shown in the paper, lower perplexity correlates extremely closely with improved SSA.

A chat between Meena (left) and a person (right).

One well-known issue with generative language models and dialog systems is that when discussing factual data, the model’s capacity may not be large enough to remember every specific detail about a topic, so they generate language that is plausible but incorrect. (This is not unique to machines — people can commit these errors too.) To address this in dialog systems, we are exploring ways to augment a conversational agent by giving it access to external information sources (e.g., a large corpus of documents or a search engine API), and developing learning techniques to use this as an additional resource in order to generate language that is consistent with the retrieved text. Work in this area includes integrating retrieval into language representation models (and a key underlying technology for this to work well is something like ScaNN, an efficient vector similarity search, to efficiently match the desired information to information in the corpus of text). Once appropriate content is found, it can be better understood with approaches like using neural networks to find answers in tables and extracting structured data from templatic documents. Our work on PEGASUS, a state-of-the-art model for abstractive text summarization can also help to create automatic summaries from any piece of text, a general technique useful in conversations, retrieval systems, and many other places.

Efficiency of NLP models has also been a significant focus for our work in 2020. Techniques like transfer learning and multi-task learning can dramatically help with making general NLP models usable for new tasks with modest amounts of computation. Work in this vein includes transfer learning explorations in T5, sparse activation of models (as in our GShard work mentioned below), and more efficient model pre-training with ELECTRA. Several threads of work also look to improve on the basic Transformer architecture, including Reformer, which uses locality-sensitive hashing and reversible computation to more efficiently support much larger attention windows, Performers, which use an approach for attention that scales linearly rather than quadratically (and discusses its use in the context of protein modeling), and ETC and BigBird, which utilize global and sparse random connections, to enable linear scaling for larger and structured sequences. We also explored techniques for creating very lightweight NLP models that are 100x smaller than a larger BERT model, but perform nearly as well for some tasks, making them very suitable for on-device NLP. In Encode, Tag and Realize, we also explored new approaches for generative text models that use edit operations rather than fully general text generation, which can have advantages in computation requirements for generation, more control over the generated text, and require less training data.

Language Translation
Effective language translation helps bring the world closer together by enabling us to all communicate, despite speaking different languages. To date, over a billion people around the world use Google Translate, and last year we added support for five new languages (Kinyarwanda, Odia, Tatar, Turkmen and Uyghur, collectively spoken by 75 million people). Translation quality continues to improve, showing an average +5 BLEU point gain across more than 100 languages from May 2019 to May 2020, through a wide variety of techniques like improved model architectures and training, better handling of noise in datasets, multilingual transfer and multi-task learning, and better use of monolingual data to improve low-resource languages (those without much written public content on the web), directly in line with our goals of improving ML fairness of machine learning systems to provide benefits to the greatest number of people possible.

We strongly believe that continued scaling of multilingual translation models will bring further quality improvements, especially to the billions of speakers of low-resource languages around the world. In GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Google researchers showed that training sparsely-activated multilingual translation models of up to 600 billion parameters leads to major improvements in translation quality for 100 languages as measured by BLEU score improvement over a baseline of a separate 400M parameter monolingual baseline model for each language. Three trends stood out in this work, illustrated by Figure 6 in the paper, reproduced below (see the paper for complete discussion):

The BLEU score improvements from multilingual training are high for all languages but are even higher for low-resource languages (right hand side of graph is higher than the left) whose speakers represent billions of people in some of the world’s most marginalized communities. Each rectangle on the figure represents languages with 1B speakers.
The larger and deeper the model, the larger the BLEU score improvements were across all languages (the lines hardly ever cross).
Large, sparse models also show a ~10x to 100x improvement in computational efficiency for model training over training a large, dense model, while simultaneously matching or significantly exceeding the BLEU scores of the large, dense model (computational efficiency discussed in paper).

An illustration of the significant gains in translation quality across 100 languages for large, sparsely-activated language models described in GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.

We’re actively working on bringing the benefits demonstrated in this GShard research work to Google Translate, as well as training single models that cover 1000 languages, including languages like Dhivehi and Sudanese Arabic (while sharing some challenges that needed solving along the way).

We also developed techniques to create language-agnostic representations of sentences for BERT models, which can help with developing better translation models. To more effectively evaluate translation quality, we introduced BLEURT, a new metric for evaluating language generation for tasks like translation that considers the semantics of the generated text, rather than just the amount of word overlap with ground-truth data, illustrated in the table below.

Machine Learning Algorithms
We continue to develop new machine learning algorithms and approaches for training that enable systems to learn more quickly and from less supervised data. By replaying intermediate results during training of neural networks, we find that we can fill idle time on ML accelerators and therefore can train neural networks faster. By changing the connectivity of neurons dynamically during training, we can find better solutions compared with statically-connected neural networks. We also developed SimCLR, a new self-supervised and semi-supervised learning technique that simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images. This approach significantly improves on the best self-supervised learning techniques.

ImageNet top-1 accuracy of linear classifiers trained on representations learned with different self-supervised methods (pretrained on ImageNet). Gray cross indicates supervised ResNet-50.

We also extended the idea of contrastive learning to the supervised regime, resulting in a loss function that significantly improves over cross-entropy for supervised classification problems.

Reinforcement Learning
Reinforcement learning (RL), which learns to make good long-term decisions from limited experience, has been an important focus area for us. An important challenge in RL is to learn to make decisions from few data points, and we’ve improved RL algorithm efficiency through learning from fixed datasets, learning from other agents, and improving exploration.

A major focus area this year has been around offline RL, which relies solely on fixed, previously collected datasets (for example, from previous experiments or human demonstrations), extending RL to the applications that can’t collect training data on-the-fly. We’ve introduced a duality approach to RL, developed improved algorithms for off-policy evaluation, estimating confidence intervals, and offline policy optimization. In addition, we’re collaborating with the broader community to tackle these problems by releasing open-source benchmark datasets, and DQN dataset for Atari.

Offline RL on Atari games using the DQN Replay Dataset.

Another line of research improved sample efficiency by learning from other agents through apprenticeship learning. We developed methods to learn from informed agents, matching other agent’s distribution, or learning from adversarial examples. To improve the exploration in RL, we explored bonus-based exploration methods including imitation techniques able to mimic structured exploration arising in agents having prior knowledge about their environment.

We’ve also made significant advances in the mathematical theory of reinforcement learning. One of our main areas of research was studying reinforcement learning as an optimization process. We found connections to the Frank-Wolfe algorithm, momentum methods, KL divergence regularization, operator theory, and convergence analysis; some of these insights led to an algorithm that achieves state-of-the-art performance in challenging RL benchmarks and discovery that polynomial transfer functions avoid convergence problems associated with softmax, both in RL and supervised learning. We’ve made some exciting progress on the topic of safe reinforcement learning, where one seeks to discover optimal control rules while respecting important experimental constraints. This includes a framework for safe policy optimization. We studied efficient RL-based algorithms for solving a class of problems known as mean field games, which model systems with a large number of decision-makers, from mobile networks to electric grids.

We’ve made breakthroughs toward generalization to new tasks and environments, an important challenge for scaling up RL to complex real-world problems. A 2020 focus area was population-based learning-to-learn methods, where another RL or evolutionary agent trained a population of RL agents to create a curriculum of emergent complexity, and discover new state-of-the-art RL algorithms. Learning to estimate the importance of data points in the training set and parts of visual input with selective attention resulted in significantly more skillful RL agents.

Overview of our method and illustration of data processing flow in AttentionAgent. Top: Input transformation — A sliding window segments an input image into smaller patches, and then “flattens” them for future processing. Middle: Patch election — The modified self-attention module holds votes between patches to generate a patch importance vector. Bottom: Action generation — AttentionAgent picks the patches of the highest importance, extracts corresponding features and makes decisions based on them.

Further, we made progress in model-based RL by showing that learning predictive behavior models accelerates RL learning, and enables decentralized cooperative multi-agent tasks in diverse teams, and learning long-term behavior models. Observing that skills bring predictable changes in the environment, we discover skills without supervision. Better representations stabilize RL learning, while hierarchical latent spaces and value-improvement paths yield better performance.

We shared open source tools for scaling up and productionizing RL. To expand the scope and problems tackled by users, we’ve introduced SEED, a massively parallel RL agent, released a library for measuring the RL algorithm reliability, and a new version of TF-Agents that includes distributed RL, TPU support, and a full set of bandit algorithms. In addition, we performed a large empirical study of RL algorithms to improve hyperparameter selection and algorithm design.

Finally, in collaboration with Loon, we trained and deployed RL to more efficiently control stratospheric balloons, improving both power usage and their ability to navigate.

AutoML
Using learning algorithms to develop new machine learning techniques and solutions, or meta-learning, is a very active and exciting area of research. In much of our previous work in this area, we’ve created search spaces that look at how to find ways to combine sophisticated hand-designed components together in interesting ways. In AutoML-Zero: Evolving Code that Learns, we took a different approach, by giving an evolutionary algorithm a search space consisting of very primitive operations (like addition, subtraction, variable assignment, and matrix multiplication) in order to see if it was possible to evolve modern ML algorithms from scratch. The presence of useful learning algorithms in this space is incredibly sparse, so it is remarkable that the system was able to progressively evolve more and more sophisticated ML algorithms. As shown in the figure below, the system reinvents many of the most important ML discoveries over the past 30 years, such as linear models, gradient descent, rectified linear units, effective learning rate settings and weight initializations, and gradient normalization.

We also used meta-learning to discover a variety of new efficient architectures for object detection in both still images and videos. Last year’s work on EfficientNet for efficient image classification architectures showed significant accuracy improvements and computational cost reductions for image classification. In follow-on work this year, EfficientDet: Towards Scalable and Efficient Object Detection builds on top of the EfficientNet work to derive new efficient architectures for object detection and localization, showing remarkable improvements in both highest absolute accuracy, as well as computational cost reductions of 13-42x over previous approaches to achieve a given level of accuracy.

EfficientDet achieves state-of-the-art 52.2 mAP, up 1.5 points from the prior state of the art (not shown since it is at 3045B FLOPs) on COCO test-dev under the same setting. Under the same accuracy constraint, EfficientDet models are 4x-9x smaller and use 13x-42x less computation than previous detectors.

Our work on SpineNet describes a meta-learned architecture that can retain spatial information more effectively, allowing detection to be done at finer resolution. We also focused on learning effective architectures for a variety of video classification problems. AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures, AssembleNet++: Assembling Modality Representations via Attention Connections, and AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification demonstrate how to use evolutionary algorithms to create novel state-of-the-art video processing machine learning architectures.

This approach can also be used to develop effective model architectures for time series forecasting. Using AutoML for Time Series Forecasting describes the system that discovers new forecasting models via an automated search over a search space involving many interesting kinds of low-level building blocks, and its effectiveness was demonstrated in the Kaggle M5 Forecasting Competition, by generating an algorithm and system that placed 138th out of 5558 participants (top 2.5%). While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.

Better Understanding of ML Algorithms and Models
Deeper understanding of machine learning algorithms and models is crucial for designing and training more effective models, as well as understanding when models may fail. Last year, we focused on fundamental questions around representation power, optimization, model generalization, and label noise, among others. As mentioned earlier in this post, Transformer networks have had a huge impact on modeling language, speech and vision problems, but what is the class of functions represented by these models? Recently we showed that transformers are universal approximators for sequence-to-sequence functions. Furthermore, sparse transformers also remain universal approximators even when they use just a linear number of interactions among the tokens. We have been developing new optimization techniques based on layerwise adaptive learning rates to improve the convergence speed of transformers, e.g., Large batch optimization for deep learning (LAMB): Training BERT in 76 minutes.

As neural networks are made wider and deeper, they often train faster and generalize better. This is a core mystery in deep learning since classical learning theory suggests that large networks should overfit more. We are working to understand neural networks in this overparameterized regime. In the limit of infinite width, neural networks take on a surprisingly simple form, and are described by a Neural Network Gaussian Process (NNGP) or Neural Tangent Kernel (NTK). We studied this phenomenon theoretically and experimentally, and released Neural Tangents, an open-source software library written in JAX that allows researchers to build and train infinite-width neural networks.

Left: A schematic showing how deep neural networks induce simple input / output maps as they become infinitely wide. Right: As the width of a neural network increases, we see that the distribution of outputs over different random instantiations of the network becomes Gaussian.

As finite width networks are made larger, they also demonstrate peculiar double descent phenomena — where they generalize better, then worse, then better again with increasing width. We have shown that this phenomenon can be explained by a novel bias-variance decomposition, and further that it can sometimes manifest as triple descent.

Lastly, in real-world problems, one often needs to deal with significant label noise. For instance, in large scale learning scenarios, weakly labeled data is available in abundance with large label noise. We have developed new techniques for distilling effective supervision from severe label noise leading to state-of-the-art results. We have further analyzed the effects of training neural networks with random labels, and shown that it leads to alignment between network parameters and input data, enabling faster downstream training than initializing from scratch. We have also explored questions such as whether label smoothing or gradient clipping can mitigate label noise, leading to new insights for developing robust training techniques with noisy labels.

Algorithmic Foundations and Theory
2020 was a productive year for our work in algorithmic foundations and theory, with several impactful research publications and notable results. On the optimization front, our paper on edge-weighted online bipartite matching develops a new technique for online competitive algorithms and solves a thirty-year old open problem for the edge-weighted variant with applications in efficient online ad allocation. Along with this work in online allocation, we developed dual mirror descent techniques that generalize to a variety of models with additional diversity and fairness constraints, and published a sequence of papers on the topic of online optimization with ML advice in online scheduling, online learning and online linear optimization. Another research result gave the first improvement in 50 years on the classic bipartite matching in dense graphs. Finally, another paper solves a long-standing open problem about chasing convex bodies online — using an algorithm from The Book, no less.

We also continued our work in scalable graph mining and graph-based learning and hosted the Graph Mining & Learning at Scale Workshop at NeurIPS’20, which covered work on scalable graph algorithms including graph clustering, graph embedding, causal inference, and graph neural networks. As part of the workshop, we showed how to solve several fundamental graph problems faster, both in theory and practice, by augmenting standard synchronous computation frameworks like MapReduce with a distributed hash-table similar to a BigTable. Our extensive empirical study validates the practical relevance of the AMPC model inspired by our use of distributed hash tables in massive parallel algorithms for hierarchical clustering and connected components, and our theoretical results show how to solve many of these problems in constant distributed rounds, greatly improving upon our previous results. We also achieved exponential speedup for computing PageRank and random walks. On the graph-based learning side, we presented Grale, our framework for designing graphs for use in machine learning. Furthermore, we presented our work on more scalable graph neural network models, where we show that PageRank can be used to greatly accelerate inference in GNNs.

In market algorithms, an area at the intersection of computer science and economics, we continued our research in designing improved online marketplaces, such as measuring incentive properties of ad auctions, two-sided markets, and optimizing order statistics in ad selection. In the area of repeated auctions, we developed frameworks to make dynamic mechanisms robust against lack of forecasting or estimation errors of the current market and/or the future market, leading to provably tight low-regret dynamic mechanisms. Later, we characterized when it is possible to achieve the asymptotically optimal objective through geometry-based criteria. We also compared the equilibrium outcome of a range of budget management strategies used in practice, showed their impact on the tradeoff between revenue and buyers' utility and shed light on their incentive properties. Additionally, we continued our research in learning optimal auction parameters, and settled the complexity of batch-learning with revenue loss. We designed the optimal regret and studied combinatorial optimization for contextual auction pricing, and developed a new active learning framework for auctions and improved the approximation for posted-price auctions. Finally, motivated by the importance of incentives in ad auctions, and in the hope to help advertisers study the impact of incentives in auctions, we introduce a data-driven metric to quantify how much a mechanism deviates from incentive compatibility.

Machine Perception
Perceiving the world around us — understanding, modeling and acting on visual, auditory and multimodal input — continues to be a research area with tremendous potential to be beneficial in our everyday lives.

In 2020, deep learning powered new approaches that bring 3D computer vision and computer graphics closer together. CvxNet, deep implicit functions for 3D shapes, neural voxel rendering and CoReNet are a few examples of this direction. Furthermore, our research on representing scenes as neural radiance fields (aka NeRF, see also this blog post) is a good example of how Google Research's academic collaborations stimulate rapid progress in the area of neural volume rendering.

In Learning to Factorize and Relight a City, a collaboration with UC Berkeley, we proposed a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. This gives the ability to change lighting effects and scene geometry for any Street View panorama, or even turn it into a full-day timelapse video.

Our work on generative human shape and articulated pose models introduces a statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework. Such models enable 3D human pose and shape reconstruction of people from a single photo to better understand the scene.

Overview of end-to-end statistical 3D articulated human shape model construction in GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models.

The growing area of media compression using neural networks continued to make strong progress in 2020, not only on learned image compression, but also in deep approaches to video compression, volume compression and nice results in deep distortion-agnostic image watermarking.

Samples of encoded and cover images for Distortion Agnostic Deep Watermarking. First row: Cover image with no embedded message. Second row: Encoded image from HiDDeN combined distortion model. Third row: Encoded images from our model. Fourth row: Normalized difference of the encoded image and cover image for the HiDDeN combined model. Fifth row: Normalized difference for our model

Additional important themes in perceptual research included:

Making better use of data (e.g., self-training with noisy students, learning from simulated data, learning with noisy labels, contrastive learning)
Reasoning across modalities (e.g., exploiting cross-modal supervision, audiovisual speech enhancement, language grounding, Open Images (V6) update featuring localized narratives — multimodal annotations connecting vision and language)
Developing approaches for efficient perception, particularly on edge devices (e.g., fast sparse convnets, structured multi-hashing for model compression)
Improving the ability to represent and reason about objects and scenes (e.g., detecting 3D objects and predicting 3D shapes, 3D scene reconstruction from a single RGB image, leveraging temporal context for object detection, learning to see transparent objects and estimating their pose from stereo)
AI enabling human creativity (e.g., automatic creation of videos from web pages, intelligent video reframing, using GANs to create fantastical creatures, relighting portraits)

Engaging with the broader research community through open sourcing of solutions and datasets is another important aspect of furthering perceptual research. In 2020, we open sourced multiple new perceptual inference capabilities and solutions in MediaPipe, such as on-device face, hand and pose prediction, real-time body pose tracking, real-time iris tracking and depth estimation, and real-time 3D object detection.

We continued to make strides to improve experiences and promote helpfulness on mobile devices through ML-based solutions. Our ability to run sophisticated natural language processing on-device, enabling more natural conversational features, continues to improve. In 2020, we expanded Call Screen and launched Hold for Me to allow users to save time when performing mundane tasks, and we also launched language-based actions and language navigability of our Recorder app to aid productivity.

We have used Google's Duplex technology to make calls to businesses and confirm things like temporary closures. This has enabled us to make 3 million updates to business information globally, that have been seen over 20 billion times on Maps and Search. We also used text to speech technology for easier access to web pages, by enabling Google Assistant to read it aloud, supporting 42 languages.

We also continued to make meaningful improvements to imaging applications. We made it easier to capture precious moments on Pixel with innovative controls and new ways to relight, edit, enhance and relive them again in Google Photos. For the Pixel camera, beginning with Pixel 4 and 4a, we added Live HDR+, which uses machine learning to approximate the vibrance and balanced exposure and appearance of HDR+ burst photography in real time in the viewfinder. We also created dual exposure controls, which allow the brightness of shadows and highlights in a scene to be adjusted independently — live in the viewfinder.

More recently, we introduced Portrait Light, a new post-capture feature for the Pixel Camera and Google Photos apps that adds a simulated directional light source to portraits. This feature is again one that is powered by machine learning, having been trained on 70 different people, photographed one light at a time, in our pretty cool 331-LED Light Stage computational illumination system.

In the past year, Google researchers were excited to contribute to many new (and timely) ways of using Google products. Here are a few examples

Enhancing learning by making it easier to get help with homework or explore concepts in 3D via augmented reality
Improving virtual meetings via in-browser background blur & replace in Google Meet.
Powering new ways to virtually try products on at home.
Helping you get to the most relevant content faster via key moments in video.
Helping you find that tune that was stuck in your head by humming it.
Helped YouTube identify potentially harmful content for human review.
Helping YouTube creators make better videos by automatically enhancing their voices and reducing background noise.

Robotics
In the area of robotics research, we’ve made tremendous progress in our ability to learn more and more complex, safe and robust robot behaviors with less and less data, using many of the RL techniques described earlier in the post.

Transporter Networks are a novel approach to learning how to represent robotic tasks as spatial displacements. Representing relations between objects and the robot end-effectors, as opposed to absolute positions in the environment, makes learning robust transformations of the workspace very efficient.

In Grounding Language in Play, we demonstrated how a robot can be taught to follow natural language instructions (in many languages!). This required a scalable approach to collecting paired data of natural language instructions and robot behaviors. One key insight is that this can be accomplished by asking robot operators to simply play with the robot, and label after-the-fact what instructions would have led to the robot accomplishing the same task.

We also explored doing away with robots altogether (by having humans use a camera-equipped grasping stick) for even more scalable data collection, and how to efficiently transfer visual representations across robotic tasks.

We investigated how to learn very agile strategies for robot locomotion, by taking inspiration from nature, using evolutionary meta-learning strategies, human demonstrations, and various approaches to training data-efficient controllers using deep reinforcement learning.

One increased emphasis this year has been on safety: how do we deploy safe delivery drones in the real world? How do we explore the world in a way that always allows the robot to recover from its mistakes? How do we certify the stability of learned behaviors? This is a critical area of research on which we expect to see increased focus in the future.

Quantum Computing
Our Quantum AI team continued its work to establish practical uses of quantum computing. We ran experimental algorithms on our Sycamore processors to simulate systems relevant to chemistry and physics. These simulations are approaching a scale at which they can not be performed on classical computers anymore, making good on Feynman’s original idea of using quantum computers as an efficient means to simulate systems in which quantum effects are important. We published new quantum algorithms, for instance to perform precise processor calibration, to show an advantage for quantum machine learning or to test quantum enhanced optimization. We also worked on programming models to make it easier to express quantum algorithms. We released qsim, an efficient simulation tool to develop and test quantum algorithms with up to 40 qubits on Google Cloud.

We continued to follow our roadmap towards building a universal error-corrected quantum computer. Our next milestone is the demonstration that quantum error correction can work in practice. To achieve this, we will show that a larger grid of qubits can hold logical information exponentially longer than a smaller grid, even though individual components such as qubits, couplers or I/O devices have imperfections. We are also particularly excited that we now have our own cleanroom which should significantly increase the speed and quality of our processor fabrication.

Supporting the Broader Developer and Researcher Community
This year marked TensorFlow’s 5th birthday, passing 160M downloads. The TensorFlow community continued its impressive growth with new special interest groups, TensorFlow User Groups, TensorFlow Certificates, AI Service partners, and inspiring demos #TFCommunitySpotlight. We significantly improved TF 2.x with seamless TPU support, out of the box performance (and best-in-class performance on MLPerf 0.7), data preprocessing, distribution strategy and a new NumPy API.

We also added many more capabilities to the TensorFlow Ecosystem to help developers and researchers in their workflows: Sounds of India demonstrated going from research to production in under 90 days, using TFX for training and TF.js for deployment in the browser. With Mesh TensorFlow, we pushed the boundaries of model parallelism to provide ultra-high image resolution image analysis. We open-sourced the new TF runtime, TF Profiler for model performance debugging, and tools for Responsible AI, such as the Model Card Toolkit for model transparency and a privacy testing library. With TensorBoard.dev we made it possible to easily host, track, and share your ML experiments for free.

In addition, we redoubled our investment in JAX, an open-source, research-focused ML system that has been actively developed over the past two years. Researchers at Google and beyond are now using JAX in a wide range of fields, including differential privacy, neural rendering, physics-informed networks, fast attention, molecular dynamics, tensor networks, neural tangent kernels, and neural ODEs. JAX accelerates research at DeepMind, powering a growing ecosystem of libraries and work on GANs, meta-gradients, reinforcement learning, and more. We also used JAX and the Flax neural network library to build record-setting MLPerf benchmark submissions, which we demonstrated live at NeurIPS on a large TPU Pod slice with a next-generation Cloud TPU user experience (slides, video, sign-up form). Finally, we’re ensuring that JAX works seamlessly with TF ecosystem tooling, from TF.data for data preprocessing and TensorBoard for experiment visualization to the TF Profiler for performance debugging, with more to come in 2021.

Many recent research breakthroughs have been enabled by increased computing power, and we make more than 500 petaflops of Cloud TPU computing power available for free to researchers around the world via the TFRC program to help broaden access to the machine learning research frontier. More than 120 TFRC-supported papers have been published to date, many of which would not have been possible without the computing resources that the program provides. For example, TFRC researchers have recently developed simulations of wildfire spread, helped analyze COVID-19 content and vaccine sentiment changes on social media networks, and advanced our collective understanding of the lottery ticket hypothesis and neural network pruning. Members of the TFRC community have also published experiments with Persian poetry, won a Kaggle contest on fine-grained fashion image segmentation, and shared tutorials and open-source tools as starting points for others. In 2021, we will change the name of the TFRC program to the TPU Research Cloud program to be more inclusive now that Cloud TPUs support JAX and PyTorch in addition to TensorFlow.

Finally, this was a huge year for Colab. Usage doubled, and we launched productivity features to help people do their work more efficiently, including improved Drive integration and access to the Colab VM via the terminal. And we launched Colab Pro to enable users to access faster GPUs, longer runtimes and more memory.

Open Datasets and Dataset Search
Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it's important to create new datasets for the community to explore and to develop new techniques, while ensuring that we share open data responsibly. This year, in addition to open datasets to help address the COVID crisis, we released a number of open datasets across many different areas:

An Analysis of Online Datasets Using Dataset Search (Published, in Part, as a Dataset): a meta dataset about datasets!
Google Compute Cluster Trace Data: in 2011, Google published a trace of 29 days of compute activity on one of our compute clusters, which has proven very useful for the computer systems community to explore job scheduling policies, better understand utilization in these clusters, etc. This year we published a larger and more extensive version of this data, covering eight of our compute clusters with much finer-grained information.
Announcing the Objectron Dataset: a collection of 15,000 short, object-centric video clips annotated with 3-D bounding boxes, capturing a larger set of common objects from different angles, as well as 4M annotated images collected from a geo-diverse sample (covering 10 countries across five continents).
Open Images V6 — Now Featuring Localized Narratives: in addition to the 9M images annotated with 36M image-level labels, 15.8M bounding boxes, 2.8M instance segmentations, and 391k visual relationships found in version 5, this new release adds localized narratives, a completely new form of multimodal annotations that consist of synchronized voice, text, and mouse traces over the objects being described. In Open Images V6, these localized narratives are available for 500k of its images. Additionally, in order to facilitate comparison to previous works, we also release localized narratives annotations for the full 123k images of the COCO dataset.
A Challenge and Workshop in Efficient Open-Domain Question Answering, created in collaboration with researchers from University of Washington and Princeton, aims to challenge researchers to create systems capable of answering arbitrary questions. A technical report describes the competition and workshop that happened at NeurIPS 2020.
TyDi QA: A Multilingual Question Answering Benchmark discusses a new benchmark for measuring effectiveness of multilingual question answering (since many benchmarks in this area are English-only or otherwise monolingual, and we feel answering questions in any language is important).
Wiki-40B: Multilingual Language Model Dataset is a new multilingual language model benchmark that is composed of 40+ languages spanning several scripts and linguistic families. With around 40 billion characters, we hope this new resource will accelerate the research of multilingual modeling. We also released high quality trained language models trained on this dataset, enabling easy comparison of different techniques with these baselines.
XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization can help assess progress on cross-lingual generalizations in a multi-task setting.
How to Ask Better Questions? A Large-Scale Multi-Domain Dataset for Rewriting Ill-Formed Questions presents a dataset of 427,719 question pairs across 303 domains that can be used to train models to rewrite poorly-formed questions into better formulations.
Open-Sourcing Big Transfer (BiT): Exploring Large-Scale Pre-training for Computer Vision describes an open-sourced pre-trained model that can be used as a starting point for a wide variety of image-related tasks.
The 2020 Image Matching Benchmark and Challenge is a dataset and benchmark challenge for the problem of capturing 3D structure from motion (either from videos, or from many images of a scene from many different angles), created in collaboration with University of Victoria, Czech Technical University, and EPFL.
Meta-Dataset: A Dataset of Datasets for Few-Shot Learning is a dataset of datasets. One of the long-term goals in ML is to build systems that can generalize not from one example to another within the same task, but can generalize even across tasks to solve new problems with little or no training. This meta-dataset can allow us to measure progress towards this ultimate goal.
Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval introduces the Google Landmarks Dataset v2 (GLDv2), a new benchmark for large-scale, fine-grained instance recognition and image retrieval in the domain of human-made and natural landmarks. GLDv2 is the largest such dataset to date by a large margin, including over 5M images and 200k distinct instance labels. Its test set consists of 118k images with ground truth annotations for both the retrieval and recognition tasks.
Enhancing the Research Community’s Access to Street View Panoramas for Language Grounding Tasks describes a new open dataset to allow researchers to compare techniques for language-grounded navigation tasks or other tasks that rely on Street View panoramas, using a well-known set of data so that comparisons between different techniques can be done more easily.

Research Community Interaction
We are proud to enthusiastically support and participate in the broader research community. In 2020, Google researchers presented over 500 papers at leading research conferences, additionally serving on program committees, organizing workshops, tutorials and numerous other activities aimed at collectively progressing the state of the art in the field. To learn more about our contributions to some of the larger research conferences this year, please see our blog posts for ICLR 2020, CVPR 2020, ACL 2020, ICML 2020, ECCV 2020 and NeurIPS 2020.

In 2020 we supported external research with $37M in funding, including $8.5M in COVID research, $8M in research inclusion and equity, and $2M in responsible AI research. In February, we announced the 2019 Google Faculty Research Award Recipients, funding research proposals from 150 faculty members throughout the world. Among this group, 27% self-identified as members of historically underrepresented groups within technology. We also announced a new Research Scholar Program to support early-career professors who are pursuing research in fields relevant to Google via unrestricted gifts. As we have for more than a decade, we selected a group of incredibly talented PhD student researchers to receive Google PhD Fellowships, which provides funding for graduate studies, as well as mentorship as they pursue their research, and opportunities to interact with other Google PhD Fellows.

We are also expanding the ways that we support inclusion and bring new voices into the field of computer science. In 2020, we created a new Award for Inclusion Research program that supports academic research in computing and technology addressing the needs of underrepresented populations. In the inaugural set of awards, we selected 16 proposals for funding with 25 principal investigators, focused on topics around diversity and inclusion, algorithmic bias, education innovation, health tools, accessibility, gender bias, AI for social good, security, and social justice. We additionally partnered with the Computing Alliance of Hispanic-Serving Institutions (CAHSI) and the CMD-IT Diversifying Future Leadership in the Professoriate Alliance (FLIP) to create an award program for doctoral students from traditionally underrepresented backgrounds to support the last year of the completion of the dissertation requirements.

In 2019, Google’s CS Research Mentorship Program (CSRMP) helped provide mentoring to 37 undergraduate students to introduce them to conducting computer science research. Based on the success of the program in 2019/2020, we’re excited to greatly expand this program in 2020/2021 and will have hundreds of Google researchers mentoring hundreds of undergraduate students in order to encourage more people from underrepresented backgrounds to pursue computer science research careers. Finally, in October we provided exploreCSR awards to 50 institutions around the world for the 2020 academic year. These awards fund faculty to host workshops for undergraduates from underrepresented groups in order to encourage them to pursue CS research.

Looking Forward to 2021 and Beyond
I’m excited about what’s to come, from our technical work on next-generation AI models, to the very human work of growing our community of researchers.

We’ll keep ensuring our research is done responsibly and has a positive impact, using our AI Principles as a guiding framework and applying particular scrutiny to topics that can have broad societal impact. This post covers just a few of the many papers on responsible AI that Google published in the past year. While pursuing our research, we’ll focus on:

Promoting research integrity: We’ll make sure Google keeps conducting a wide range of research in an appropriate manner, and provides comprehensive, scientific views on a variety of challenging, interesting topics.
Responsible AI development: Tackling tough topics will remain core to our work, and Google will continue creating new ML algorithms to make machine learning more efficient and accessible, developing approaches to combat unfair bias in language models, devising new techniques for ensuring privacy in learning systems, and much more. And importantly, beyond looking at AI development with a suitably critical eye, we’re eager to see what techniques we and others in the community can develop to mitigate risks and make sure new technologies have equitable, positive impacts on society.
Advancing diversity, equity, and inclusion: We care deeply that the people who are building influential products and computing systems better reflect the people using these products all around the world. Our efforts here are both within Google Research, as well as within the wider research and academic communities — we’ll be calling upon the academic and industry partners we work with to advance these efforts together. On a personal level, I am deeply committed to improving representation in computer science, having spent hundreds of hours working towards these goals over the last few years, as well as supporting universities like Berkeley, CMU, Cornell, Georgia Tech, Howard, UW, and numerous other organizations that work to advance inclusiveness. This is important to me, to Google, and to the broader computer science community.

Finally, looking ahead to the year, I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples. Advances in this area will empower people with dramatically more capable products, bringing better translation, speech recognition, language understanding and creative tools to billions of people all around the world. This kind of exploration and impact is what keeps us excited about our work!

Acknowledgements
Thanks to Martin Abadi, Marc Bellemare, Elie Bursztein, Zhifeng Chen, Ed Chi, Charina Chou, Katherine Chou, Eli Collins, Greg Corrado, Corinna Cortes, Tiffany Deng, Tulsee Doshi, Robin Dua, Kemal El Moujahid, Aleksandra Faust, Orhan Firat, Jen Gennai, Till Hennig, Ben Hutchinson, Alex Ingerman, Tomáš Ižo, Matthew Johnson, Been Kim, Sanjiv Kumar, Yul Kwon, Steve Langdon, James Laudon, Quoc Le, Yossi Matias, Brendan McMahan, Aranyak Mehta, Vahab Mirrokni, Meg Mitchell, Hartmut Neven, Mohammad Norouzi, Timothy Novikoff, Michael Piatek, Florence Poirel, David Salesin, Nithya Sambasivan, Navin Sarma, Tom Small, Jascha Sohl-Dickstein, Zak Stone, Rahul Sukthankar, Mukund Sundararajan, Andreas Terzis, Sergei Vassilvitskii, Vincent Vanhoucke, and Leslie Yeh and others for helpful feedback and for drafting portions of this post, and to the entire Research and Health communities at Google for everyone’s contributions towards this work.

Source: Google AI Blog

Google Research: Looking Back at 2019, and Forward to 2020 and Beyond

Posted by Jeff Dean, Senior Fellow and SVP of Google Research and Health, on behalf of the entire Google Research community

The goal of Google Research is to work on long-term, ambitious problems, with an emphasis on solving ones that will dramatically help people throughout their daily lives. In pursuit of that goal in 2019, we made advances in a broad set of fundamental research areas, applied our research to new and emerging areas such as healthcare and robotics, open sourced a wide variety of code and continued collaborations with Google product teams to build tools and services that are dramatically more helpful for our users.

As we start 2020, it’s useful to take a step back and assess the research work we’ve done over the past year, and also to look forward to what sorts of problems we want to tackle in the upcoming years. In that spirit, this blog post is a survey of some of the research-focused work done by Google researchers and engineers during 2019 (in the spirit of similar reviews for 2018, and more narrowly focused reviews of some work in 2017 and 2016). For a more comprehensive look, please see our research publications in 2019.

Ethical Use of AI
In 2018, we published a set of AI Principles that provide a framework by which we evaluate our own research and applications of technologies such as machine learning in our products. In June 2019, we published a one-year update about how these principles are being put into practice in many different aspects of our research and product development life cycles. Since many of the areas touched on by the principles are active areas of research in the broader AI and machine learning research community (such as bias, safety, fairness, accountability, transparency and privacy in machine learning systems), our goals are to apply the best currently-known techniques in these areas to our work, and also to do research to continue to advance the state of the art in these important areas.

For example, this year we:

Published a research paper about a new transparency tool, which enabled the launch of Model Cards for several of our Cloud AI products. You can see an example model card for the Cloud AI Vision API Object Detection feature.
Showed how Activation Atlases can help explore neural network behavior and can aid with interpretability of machine learning models.
Introduced TensorFlow Privacy, an open-source library to enable training machine learning models with differential privacy guarantees.
Released a beta version of Fairness Indicators, to help ML practitioners identify unjust or unintended impacts of machine learning models.
Clicking on a slice in Fairness Indicators will load all the data points in that slice inside the What-If Tool widget. In this case, all data points with the “female” label are shown.
Published a KDD'19 paper on how pairwise comparisons and regularization is incorporated into a large-scale production recommender system to improve ML Fairness.
Published an AIES'19 paper about a case study on the application of fairness in machine learning research to a production classification system, and described our fairness metric, conditional equality, that takes into account distributional differences in implementing equality of opportunity.
Published an AIES'19 paper about counterfactual fairness in text classification problems that asks the question: "How would the prediction change if the sensitive attribute referenced in the example were different?" and used this approach to improve our production systems that assess the toxicity of online content.

Released a new dataset to help with research to identify deepfakes.

A sample of videos from Google’s contribution to the FaceForensics benchmark. To generate these, pairs of actors were selected randomly, and deep neural networks swapped the face of one actor onto the head of another.

AI for Social Good
There is enormous potential for machine learning to help with many important societal issues. We have been doing work in several such areas, as well as working to enable others to apply their creativity and skills to solving such problems. Floods are the most common and the most deadly natural disaster on the planet, affecting approximately 250 million people each year. We have been using machine learning, computation and better sources of data to make significantly more accurate flood forecasts, and then to deliver actionable alerts to the phones of millions of people in the affected regions. We also hosted a workshop that brought together researchers with expertise in flood forecasting, hydrology and machine learning from Google and the broader research community to discuss ways to collaborate further on this important problem.

In addition to our flood forecasting efforts, we’ve been developing techniques to better understand the world’s wildlife, collaborating with seven wildlife conservation organizations to use machine learning to help analyze wildlife camera data and collaborating with the U.S. NOAA to identify whale species and locations from sounds in underwater recordings. We’ve also created and released a set of tools for enabling new kinds of machine-learning-oriented biodiversity research. As part of helping to organize the 6th Fine-Grained Visual Categorization Workshop, Google researchers in our Accra, Ghana office collaborated with researchers at Makerere University AI & Data Science research group to create and run a Kaggle competition on the classification of cassava plant diseases. As cassava is the second largest source of carbohydrates in Africa, plant health is an important food security issue, and it was great to see more than 100 participants across 87 teams participate in the contest.

In 2019 we updated Google Earth Timelapse, enabling people to effectively and intuitively visualize how the planet has changed over the past 35 years. Further, we’ve been collaborating with academic researchers on new privacy-preserving ways to aggregate data on human mobility, to give urban planners better information about how to design efficient environments with lower levels of carbon emissions.

We’ve also applied machine learning to support childhood learning. According to the United Nations, 617 million children do not have basic literacy, a critical determinant of their quality of life. To help more children learn to read, our Bolo app uses speech-recognition technology that tutors students in real-time. And to increase access, the app works completely offline on low-cost phones. In India, Bolo has already helped 800,000 children read stories and speak half a billion words. Early results are encouraging; a three-month pilot among 200 villages in India showed an improvement in reading proficiency among 64% of pilot participants.

For older students, the Socratic app can help high schoolers with complex problems in math, physics and over 1,000 higher education topics. Based on a photo or verbal question, the app automatically identifies the question’s underlying concepts and links to the most helpful online resources. Like the Socratic method, the app doesn’t directly answer questions, but instead leads students to discover the answer themselves. We’re excited about the broad possibilities of improving educational outcomes around the world through things like Bolo and Socratic.

To expand the reach of our AI for Social Good efforts, in May we announced the grantees of our AI Impact Challenge with $25 million in grants from Google.org. The response was huge: we received over 2,600 thoughtful proposals from 119 countries. Twenty impressive organizations stood out for their potential to solve big social and environmental problems and were our initial set of grantees. A few examples of the work of these organizations:

The Fondation Médecins Sans Frontières (MSF) is creating a free smartphone application that uses image recognition tools to help clinical staff in low-resource settings (currently being piloted in Jordan) to analyze anti-microbial images and advise on the appropriate antibiotics to use for a particular patient’s infection.
Over a billion people live in smallholder farm households. A single pest attack can devastate their crop yields and livelihoods. Wadhwani AI uses image classification models that can identify pests and provide timely advice on what pesticides to spray and when—ultimately improving crop yield.
And deep in tropical rainforests, where illegal deforestation is a major driver of climate change, Rainforest Connection uses deep learning for bioacoustic monitoring and old cell phones to track rainforest health and detect threats.
Our 20 AI Impact Challenge winners. You can learn more about the work of all the grantees here.

Applications of AI to Other Fields
The application of computer science and machine learning to other scientific fields is an area that we are especially excited about and have published a number of papers in, often in multi-organization collaborations. Some highlights from this year include:

In An Interactive, Automated 3D Reconstruction of a Fly Brain, we reported on a collaborative effort that achieved a milestone of mapping the structure of an entire fly brain, using machine learning models that were able to painstakingly trace each individual neuron.

In Learning Better Simulation Methods for Partial Differential Equations (PDEs), we showed how machine learning can be used to accelerate PDE computations, which are at the heart of many fundamental computational problems in climate science, fluid dynamics, electromagnetism, heat conduction and general relativity.

Simulations of Burgers’ equation, a model for shock waves in fluids, solved with either a standard finite volume method (left) or our neural network based method (right). The orange squares represent simulations with each method on low resolution grids. These points are fed back into the model at each time step, which then predicts how they should change. Blue lines show the exact simulations used for training. The neural network solution is much better, even on a 4x coarser grid, as indicated by the orange squares smoothly tracing the blue line.

We gave machine learning models better scents of the world with Learning to Smell: Using Deep Learning to Predict the Olfactory Properties of Molecules. We showed how to leverage graph neural networks (GNNs) to directly predict the odor descriptors for individual molecules, without using any handcrafted rules.

2D snapshot of our embedding space with some example odors highlighted. Left: Each odor is clustered in its own space. Right: The hierarchical nature of the odor descriptor. Shaded and contoured areas are computed with a kernel-density estimate of the embeddings.

In work that combines chemistry and reinforcement learning techniques, we presented a framework for molecule optimization.
Machine learning can also help us in our artistic and creative endeavors. Artists have found ways to collaborate with AI and AR and create interesting new forms, from dancing with a machine to reimagine choreography, to creating new melodies with machine learning tools. ML can be used by novices, too. To honor the birthday of J.S. Bach, we featured a ML-powered Doodle: just create your melody, and the ML tool can create accompanying harmonizations in Bach’s style.

Assistive Technology
On a more personal scale, ML can help us in our daily lives. It’s easy to take for granted our ability to see a beautiful image, to hear a favorite song, or to speak with a loved one. Yet over one billion people aren’t able to access the world in these ways. ML technology can help by turning these signals—vision, hearing, speech—into other signals that can be well-managed by people with accessibility needs, enabling better access to the world around them. A few examples of our assistive technology:

Lookout helps people who are blind or have low vision identify information about their surroundings. It draws upon similar underlying technology as Google Lens, which lets you search and take action on the objects around you, simply by pointing your phone.
Live Transcribe has the potential to give people who are deaf or hard of hearing greater independence in their everyday interactions. You can get real-time transcriptions of conversations that the user is engaged in, even if the speech is in another language.
Project Euphonia performs personalized speech-to-text transcription. For people with ALS and other conditions that produce slurred or non-standard speech, this research improves automatic speech recognition (ASR) over other state-of-the-art ASR models.
Like Project Euphonia, Parrotron uses end-to-end neural networks to help improve communication, but the research focuses on automatic speech-to-speech conversion rather than transcription, presenting a speech interface that may be easier for some to access.
Millions of images online don’t have any text description. Get Image Descriptions from Google helps blind or low vision users understand unlabelled images. When a screen reader encounters an image or graphic without a description, Chrome can now create one automatically.
We developed tools that can read visual text in audio form in Lens for Google Go, greatly helping users who are not fully literate navigate the word-rich world around them.

Making Your Phone More Intelligent
Much of our work serves to enable intelligent, personal devices by giving mobile phones new capabilities through the use of on-device machine learning. By making powerful models that can run on-device, we can ensure that these phone features are highly responsive and always available even in airplane mode or otherwise off the network. We’ve made progress in getting highly accurate speech recognition models, vision models and handwriting recognition models all running on-device, paving the way for powerful new features. Some of this year’s highlights include:

The launch of on-device captioning with Live Caption, giving always-available transcription of any video playing on your device.
The creation of a powerful new transcribing Recorder app, which can help index audio information and make it easily retrievable.
Improvements to Google Translate’s camera translation, so that you can point at text in an unfamiliar language and get it instantly translated in context.
Release of the Augmented Faces API in ARCore, enabling new real-time AR self-expression tools.
A demonstration of on-device, real-time hand tracking, enabling new ways for users to interact with and control their devices with their hands.
Improved, RNN-based on-device handwriting recognition for on-screen mobile keyboards.
The release of a new global localization approach using your smart phone’s camera to help more accurately orient you and help you find your way in the world.

Federated learning (check out the online comic description!) is a powerful machine learning approach invented by Google researchers in 2015, whereby many clients (such as mobile devices or whole organizations) collaboratively train a model, while keeping the training data decentralized. This enables approaches that have superior privacy properties in large-scale learning systems. We are using federated learning in more and more of our products and features, while also working to advance the state of the art in many research problems in this space. In 2019, Google researchers collaborated with authors from 24 (!) academic institutions to produce a survey article on Federated Learning, highlighting advances over the past few years as well describing a number of open research problems in the field.

The field of computational photography has led to great advances in the image quality of phone cameras over the past few years, and this year was no exception. This year, we made it easier to take great selfies, to take professional-looking shallow depth of field images and portraits and to use the Night Sight feature on Pixel Phones to take some stunning astrophotography pictures. More technical details about this work can be found in papers on multi-frame super resolution and mobile photography in very low-light conditions. All of this work helps enable you to take great pictures to remember life’s magical moments as they happen.

Health
In late 2018, we combined the Google Research health team, Deepmind Health and a team from Google’s Hardware division focused on health-related applications to form Google Health. In 2019 we continued the research we’ve been pursuing in this space, publishing research papers and building tools in collaboration with a variety of healthcare partners. Here are a few of the highlights from 2019:

We showed that a deep learning model for mammography can assist physicians in spotting breast cancer, a condition that affects 1 in 8 women in the US during their lifetimes, with greater accuracy than experts, reducing both false positives and false negatives. The model trained on de-identified data from a UK hospital had similar gains in accuracy when used to evaluate patients in a completely different healthcare system in the U.S.
Example of a difficult-to-detect cancer case correctly identified by machine learning.
We showed that a deep learning model for differential diagnoses of skin diseases can give results that are significantly more accurate than primary care physicians and on par with or perhaps slightly better than dermatologists.
Working alongside experts from the US Department of Veterans Affairs (VA), DeepMind Health colleagues who are now part of Google Health showed that a machine learning model can predict the onset of acute kidney injury (AKI), one of the leading causes of avoidable patient harm, up to two days before it happens. In the future, this could give doctors a 48-hour head start in treating this serious condition.
We expanded the application of deep learning to electronic health records with several partner organizations. You can read more about this work in our 2018 blog post.
We showed a promising step forward for predicting lung cancer, where a deep learning model for examining the results of a single CT scan study performed on par or better than trained radiologists at early detection of lung cancer. Early detection of lung cancer dramatically improves survival rates.
We continued to expand and evaluate our deployment of machine learning tools for detection and prevention of eye disease, in collaboration with Verily and our healthcare partners in India and Thailand.
We published a research paper on an augmented reality microscope for cancer diagnosis, whereby a pathologist can get real-time feedback about what parts of a slide are most interesting while examining tissue through a microscope. You can also read more about it in our 2018 blog post here.
We built a human-centric, similar-image search tool for pathologists to help them make more effective diagnoses, by allowing examination of similar cases.

Quantum Computing
In 2019, our quantum computing team demonstrated for the first time a computational task that can be executed exponentially faster on a quantum processor than on the world’s fastest classical computer — just 200 seconds compared to 10,000 years.

Left: Artist's rendition of the Sycamore processor mounted in the cryostat. (Full Res Version; Forest Stearns, Google AI Quantum Artist in Residence) Right: Photograph of the Sycamore processor. (Full Res Version; Erik Lucero, Research Scientist and Lead Production Quantum Hardware)

Using quantum computers may make important problems in domains like materials science, quantum chemistry (early example) and large-scale optimization tractable, but in order to make this a reality, we’ll have to continue to push the field forward. We are now focusing on implementing quantum error correction so that we will be able to run computations for longer. We are also working on making quantum algorithms easier to express, the hardware easier to control and we have found ways to use classical machine learning techniques like deep reinforcement learning to build more reliable quantum processors. The achievements this year are encouraging and are early steps along the way to making practical quantum computing a reality for a wider variety of problems.

You can also read Sundar’s thoughts on what our quantum computing milestone means.

General Algorithms and Theory
In the general areas of algorithms and theory, we continued our research from algorithmic foundations to applications, and also did work in graph mining and market algorithms. A blog post summarizing some of our work in graph learning algorithms gives more details about that work.

We published a paper at VLDB’19 titled "Cache-aware load balancing of data center applications," although an alternative title could be "Increase the serving capacity of your data center by 40% with this one cool trick!". The paper describes how we used balanced partitioning of graphs to specialize the caches in our web search backend serving system, thereby increasing the query throughput of our flash drives by 48%, and helping to enable a 40% increase in the throughput of the entire search backend.

Heatmap of flash IO requests (resulting from cache misses) across web search serving leaves. The three humps represent random leaf selection, load balancing, and cache-aware load balancing (left to right). Lines indicate the 50th, 90th, 95th and 99.9th percentiles. From VLDB’19 paper, "Cache-aware load balancing of data center applications."

In an ICLR’2019 paper titled "A new dog learns old tricks: RL finds classic optimization algorithms," we discovered a new connection between algorithms and machine learning, showing how Reinforcement Learning can effectively find optimal (worst-case, uniform) algorithms for several classic online optimization combinatorial problems such as online matching and allocation.

Our work in scalable algorithms spans both parallel, online and distributed algorithms for big data sets. In a recent FOCS’19 paper, we provided a near-optimal massively parallel computation algorithm for connected components. Another set of our papers improved parallel algorithms for matching (in theory and practice) and for density clustering. And a third line of work concerned adaptively optimizing submodular functions in the black-box model, which has several applications in feature selection and vocabulary compression. In a SODA’19 paper, we presented a submodular maximization algorithm that is nearly optimal in three aspects: approximation factor, round complexity, and query complexity. Also, in another FOCS 2019 paper, we provide the first online multiplicative approximation algorithm for PCA and Column Subset selection.

In other work, we introduce the semi-online model of computation that postulates that the unknown future has a predictable part and an adversarial part. For classical combinatorial problems such as bipartite matching (ITCS’19) and caching (SODA’20), we obtained semi-online algorithms to provide guarantees that smoothly interpolate between the best possible online and offline algorithms.

Our recent research in the area of market algorithms includes new understanding of the interaction between learning and markets, and innovations in experimental design. For example, this NeurIPS’19 oral paper reveals the surprising competitive advantage that a strategic agent has when competing with a learning agent in a general repeated 2-player game. Recent focus on advertising automation has produced increased interest in automated bidding and understanding response behavior of advertisers. In a pair of WINE 2019 papers, we study optimal strategy to maximize conversions on behalf of advertisers and further learn advertiser response behavior for any changes in the auction. Finally, we studied experimental design in the presence of interference where the treatment of one group may affect the outcomes of others. In a KDD'19 paper and a NeurIPS'19 paper, we show how to define units or clusters of units to limit interference while maintaining experimental power.

The clustering algorithm from the KDD’19 paper “Randomized Experimental Design via Geographic Clustering“ applied to user queries from the United States. The algorithm automatically identifies metropolitan areas, correctly predicting, for example, that the Bay Area includes San Francisco, Berkeley, and Palo Alto, but not Sacramento.

Machine Learning Algorithms
In 2019, we conducted research in many different areas of machine learning algorithms and approaches. One major focus was in understanding the properties of training dynamics in neural networks. In the blog post Measuring the Limits of Data Parallel Training for Neural Networks highlighting this paper, Google researchers presented a careful set of experimental results showing when scaling the amount of data parallelism (by making larger batches) is effective for allowing the model to converge faster (using data parallelism).

For all workloads we tested, we observed a universal relationship between batch size and training speed with three distinct regimes: perfect scaling with small batch sizes (following the dashed line), eventually seeing diminishing returns as batch sizes grow (diverging from the dashed line), and maximal data parallelism at the largest batch sizes (where the trend plateaus). The transition points between the regimes vary dramatically between different workloads.

Model parallelism, in contrast to data parallelism, where a model is spread out across multiple computational devices, can be an effective way of scaling models. GPipe is a library that enables model parallelism to be more effective, in an approach similar to that used by pipelined CPU processors: when one part of the whole model is working on some of the data, other parts can be working on their part of the computation on different data. The results of this pipeline approach can be combined together to simulate a larger effective batch size.

Machine learning models are effective when they’re able to take raw input data and learn “disentangled” higher-level representations that separate different kinds of examples by properties that we want the model to be able to distinguish (cat vs. truck vs. wildebeest, cancerous tissue vs. normal tissue, etc.). Much of the focus on advancing machine learning algorithms is to encourage the learning of better representations that generalize better to new examples, problems or domains. This year, we looked at this problem in a number of different contexts:

In Evaluating the Unsupervised Learning of Disentangled Representations, we examined what properties affect the representations that are learned from unsupervised data, in order to better understand what makes for good representations and effective learning.
In Predicting the Generalization Gap in Deep Neural Networks, we showed that it is possible to predict the generalization gap (the gap between a model’s performance on data from the training distribution versus data drawn from a different distribution) using statistics of the margin distribution, helping us better understand which models generalize most effectively. We also did some research on Improving Out-of-Distribution Detection in Machine Learning Models, to better understand when a model is starting to encounter kinds of data it has never seen before. We also looked at Off-Policy Classification in the context of reinforcement learning, to better understand which models are likely to generalize the best.

In Learning to Generalize from Sparse and Underspecified Rewards, we also examined ways of specifying reward functions for reinforcement learning that enable learning systems to more directly learn from true objectives and be less distracted with longer, less-desirable sequences of actions that happen to achieve desired goals by accident.

In this instruction-following task, the action trajectories a₁, a₂ and a₃ reach the goal, but the sequences a₂ and a₃ do not follow the instructions. This illustrates the issue of underspecified rewards.

AutoML
We continued our work on AutoML this year, an approach whereby algorithms that learn how to learn can automate many aspects of machine learning and often can achieve substantially better results than the best human machine learning experts for certain kinds of machine learning meta-decisions. In particular:

In EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling, we showed how to use neural architecture search techniques to achieve substantially better results on computer vision problems, including a new state-of-the-art result of 84.4% top-1 accuracy on ImageNet while having 8X fewer parameters than the previous best model.

Model Size vs. Accuracy Comparison. EfficientNet-B0 is the baseline network developed by AutoML MNAS, while Efficient-B1 to B7 are obtained by scaling up the baseline network. In particular, our EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy, while being 8.4x smaller than the best existing CNN.

In EfficientNet-EdgeTPU: Creating Accelerator-Optimized Neural Networks with AutoML, we showed how a neural architecture search approach can find efficient models that are tailored to particular hardware accelerators, resulting in high accuracy, low-computational models for running on mobile devices.

In Video Architecture Search, we describe how we extended our AutoML work to the domain of video models, finding architectures that achieve state-of-the-art results, and also lightweight architectures that match the performance of hand-crafted models while using 50x less computation.

TinyVideoNet (TVN) architectures evolved to maximize the recognition performance while keeping its computation time within the desired limit. For instance, TVN-1 (top) runs at 37 ms on a CPU and 10ms on a GPU. TVN-2 (bottom) runs at 65ms on a CPU and 13ms on a GPU.

We developed AutoML techniques for tabular data, unlocking an important domain where many companies and organizations have interesting data in relational databases, and often want to develop machine learning models on this data. We collaborated to release this technology as a new Google Cloud AutoML Tables product, and also discussed how well this system did in a new Kaggle competition in An End-to-End AutoML Solution for Tabular Data at KaggleDays (spoiler: AutoML Tables finished second out of 74 teams of expert data scientists).
In Exploring Weight Agnostic Neural Networks, we showed how it is possible to find interesting neural network architectures without any training steps to update the weights of the evaluated models. This can make architecture search much more computationally efficient.

A weight-agnostic neural network performing a Cartpole Swing-up task at various different weight parameters, and also using fine-tuned weight parameters.

Applying AutoML to Transformer Architectures explored finding architectures for natural language processing tasks that significantly outperform vanilla Transformer models at substantially reduced computational costs.

Comparison between the Evolved Transformer and the original Transformer on WMT’14 En-De at varying sizes. The biggest gains in performance occur at smaller sizes, while ET also shows strength at larger sizes, outperforming the largest Transformer with 37.6% less parameters (models to compare are circled in green). See Table 3 in our paper for the exact numbers.

In SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition, we showed that the approach of automatically learning data augmentation methods can be extended to speech recognition models, with the learned augmentation approaches achieving significantly higher accuracy with less data than existing human ML-expert driven data augmentation approaches.
We launched our first speech application for keyword spotting and spoken language identification using AutoML. In our experiments we found better models (both more efficient and better performance) than the human designed models that have been in this setting for some time.

Natural Language Understanding
The past few years have seen remarkable advances in models for natural language understanding, translation, natural dialog, speech recognition and related tasks. This year, one theme in our work was advancing the state of the art by combining modalities or tasks, to train more powerful and capable models. A few examples:

In Exploring Massively Multilingual, Massive Neural Machine Translation, we showed significant gains in translation quality by training a single model to translate between 100 languages, rather than having 100 separate models.

Left: Language pairs with larger amounts of training data generally have higher translation quality. Right: Multilingual training, where we train a single model for all language pairs rather than separate models for each language pair, results in substantial improvements in BLEU score (a measure of translation quality) for language pairs without much data.

In Large-Scale Multilingual Speech Recognition with a Streaming End-to-End Model, we showed how combining speech recognition and language models together and training the system on many languages, can significantly improve speech recognition accuracy.

Left: A traditional monolingual speech recognizer comprised of Acoustic, Pronunciation and Language Models for each language. Middle: A traditional multilingual speech recognizer where the Acoustic and Pronunciation model is multilingual, while the Language model is language-specific. Right: An E2E multilingual speech recognizer where the Acoustic, Pronunciation and Language Model is combined into a single multilingual model.

In Translatotron: An End-to-End Speech-to-Speech Translation Model, we showed that it is possible to train a joint model to accomplish the (normally separate) tasks of speech recognition, translation and text-to-speech generation with nice benefits, like preserving the sound of the speaker’s voice in the generated translated audio, as well as a simpler overall learning system.
In Multilingual Universal Sentence Encoder for Semantic Retrieval, we showed how to combine many different objectives to yield models that are significantly better at semantic retrieval (versus simpler word matching techniques). For example, in Google Talk to Books, the query “What fragrance brings back memories?” yields the result, “And for me, the smell of jasmine along with the pan bagnat, it brings back my entire carefree childhood.”

In Robust Neural Machine Translation, we showed how to use an adversarial training procedure to significantly improve the quality and robustness of language translations.

Left: The Transformer model is applied to an input sentence (lower left) and, in conjunction with the target output sentence (above right) and target input sentence (middle right; beginning with the placeholder “<sos>”), the translation loss is calculated. The AdvGen function then takes the source sentence, word selection distribution, word candidates and the translation loss as inputs to construct an adversarial source example. Right: In the defense stage, the adversarial source example serves as input to the Transformer model and the translation loss is calculated. AdvGen then uses the same method as above to generate an adversarial target example from the target input.

As our language understanding capabilities have improved, based on fundamental research advances like seq2seq, Transformer, BERT, Transformer-XL and ALBERT models, we have seen increased use of these sorts of models in many of our core products and features like Google Translate, Gmail’s Smart Compose, and Google Search. This year, the launch of BERT in our core search and ranking algorithms led to the biggest improvement in search quality in the last five years (and one of the biggest ever), through better understanding of the subtle meanings of query and document words and phrases.

Machine Perception
Models for better understanding of still images have made remarkable progress in the last decade. Among the next major frontiers are models and approaches for understanding the dynamic world in fine-grained detail. This includes deeper and more nuanced understanding of images and video, as well as live and situated perception: understanding the audiovisual world at interactive rates and with a shared spatial grounding with the user. This year, we explored many aspects of advances in this area, including:

Finer-grained visual understanding in Lens, enabling even more powerful visual search.
Helpful smart camera features such as Quick Gestures, Face Match and smart video call framing on the Nest Hub Max.
Technology for live and spatially-aware perception for helpfully augmenting the world around us through Lens.
Better models for depth prediction from videos.

Better representations for fine-grained temporal understanding of videos using temporal cycle-consistency learning.

Right: Input videos of people performing a squat exercise. The video on the top left is the reference. The other videos show nearest neighbor frames (in the TCC embedding space) from other videos of people doing squats. Left: The corresponding frame embeddings move as the action is performed.

Learning representations across text, speech and video that are temporally consistent from unlabeled videos.

Qualitative results from VideoBERT, pretrained on cooking videos. Top: Given some recipe text, we generate a sequence of visual tokens. Bottom: Given a visual token, we show the top three future tokens forecast by VideoBERT at different time scales. In this case, the model predicts that a bowl of flour and cocoa powder may be baked in an oven and may become a brownie or cupcake. We visualize the visual tokens using the images from the training set closest to the tokens in feature space.

Being able to predict future visual inputs from observations of the past.
Models that can better understand action sequences in videos, enabling you to better recall special video moments like “blowing out candles” or “sliding down a slide” in Google Photos.
Architecture for temporal action localization.

We’re quite excited about the prospects of continued improvements in the understanding of the sensory world around us.

Robotics
The application of machine learning to robotic control is a significant research area for us. We believe this is a vital tool for enabling robots to operate effectively in complex, real-world environments like everyday homes and businesses. Some of the work we did this year includes:

In Long-Range Robotic Navigation via Automated Reinforcement Learning, we showed how to combine reinforcement learning with long-range planning to enable robots to more effectively navigate complex environments (like our Google office buildings).
In PlaNet: A Deep Planning Network for Reinforcement Learning, we showed how to effectively learn a world model purely from the pixels of images, and how to leverage this model of how the world behaves in order to accomplish tasks with many fewer learning episodes.
In Unifying Physics and Deep Learning with TossingBot, we showed how robots can learn “intuitive” physics from experimentation in an environment, rather than being pre-programmed with physics models about the environment in which they are operating.
In Soft Actor-Critic: Deep Reinforcement Learning for Robotics, we showed that training a reinforcement learning algorithm to both maximize the expected reward (which is the standard RL objective) and to maximize the policy's entropy (so that learning favors policies that are more random), can help robots learn faster and be more robust to changes in their environment.
In Learning to Assemble and to Generalize from Self-Supervised Disassembly, we showed how robots can learn to assemble by first learning to disassemble things in a self-supervised way. Kids learn from taking things apart, and it appears that robots can as well!
We introduced ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, an open-source platform of cost-effective robots and curated benchmarks designed to facilitate research and development on physical robotics hardware in the real world.

Helping Advance the Broader Developer and Researcher Community
Open source is about more than code: it's about the community of contributors. It’s been an exciting year to be part of the open source community. We launched TensorFlow 2.0—the biggest TensorFlow release to date—which makes building ML systems and applications easier than ever. We added support for fast mobile GPU inference to TensorFlow Lite. We also launched Teachable Machine 2.0, a fast, easy web-based tool which can train a machine learning model with the click of a button, no coding required. We announced MLIR, open source machine learning compiler infrastructure that addresses the complexity of growing software and hardware fragmentation and makes it easier to build AI applications.

We saw the first year of JAX, a new system for high-performance machine learning research. At NeurIPS 2019, Googlers and the broader open-source community presented work using JAX ranging from neural tangent kernels to Bayesian inference to molecular dynamics, and we launched a preview of JAX on Cloud TPUs.

We open-sourced MediaPipe, a framework for building perceptual and multimodal applied ML pipelines, and XNNPACK, a library of efficient floating-point neural network inference operators. As of the end of 2019, we had enabled more than 1,500 researchers around the world to access Cloud TPUs for free via the TensorFlow Research Cloud. Our Intro To TensorFlow at Coursera crossed 100,000 students. And we engaged with thousands of users while taking TensorFlow on the road to 11 different countries, hosted our first ever TensorFlow World and more.

With the help of TensorFlow, one college student discovered two new planets and built a method to help others find more. A data scientist originally from Nigeria trained a GAN to generate images reminiscent of African masks. A developer in Uganda used TensorFlow to create the Farmers Companion, an app that local farmers can use to fight a crop-destroying caterpillar. In snowy Iowa, researchers and state officials used TensorFlow to determine safe road conditions based on traffic behavior, visuals and other data. In sunny California, college students used TensorFlow to identify pot holes and dangerous road cracks in Los Angeles. And in France, a coder used TensorFlow to build a simple algorithm that learns how to add color to black-and-white photos.

Open Datasets
Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it's important to create new datasets for the community to explore and to develop new techniques, and to ensure we share open data responsibly. This year, we additionally released a number of open datasets across many different areas:

Open Images V5: An update to the popular Open Images dataset that includes segmentation masks for 2.8 million objects in 350 categories (so that it now has ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships).
Natural questions: the first dataset to use naturally occurring queries and find answers by reading an entire page, rather than extracting answers from a short paragraph.
Data for deepfake detection: we contributed a large dataset of visual deepfakes to the FaceForensics benchmark (mentioned above).
Google Research Football: a novel reinforcement learning environment where agents aim to master the world’s most popular sport—football (or, if you’re American, soccer). It’s important for reinforcement learning agents to have GOOOAAALLLSS!
Google-Landmarks-v2: over 5 million images (2x that of the first release) of more than 200 thousand different landmarks.
YouTube-8M Segments: A large-scale classification and temporal localization dataset that includes human-verified labels at the 5-second segment level of YouTube-8M videos.
Atomic Visual Actions (AVA) Spoken Activity: A multimodal audio+visual video dataset for perception of conversations. In addition, academic challenges were run for AVA action recognition and AVA: Spoken Activity
PAWS and PAWS-X: To help with paraphrase identification, both datasets contain well-formed sentence pairs with high lexical overlap, in which around half of pairs are paraphrase and half are not.
Natural language dialog datasets: CCPE and Taskmaster-1 both use a Wizard-of-Oz platform that pairs two people who engage in spoken conversations, to mimic a human-level conversation with a digital assistant.
The Visual Task Adaptation Benchmark: VTAB follows similar guidelines to ImageNet and GLUE but is based on one principle—a better representation is one that yields better performance on unseen tasks, with limited in-domain data.
Schema-Guided Dialogue Dataset: the largest publicly available corpus of task-oriented dialogues, with over 18,000 dialogues spanning 17 domains.

Research Community Interaction
Finally, we’ve been busy within the broader academic and research community. In 2019 Google researchers presented hundreds of papers, participated in numerous conferences and received many awards and other accolades. We had a strong presence at:

CVPR: ~250 Googlers presented 40+ papers, talks, posters, workshops and more.
ICML: ~200 Googlers presented 100+ papers, talks, posters, workshops and more.
ICLR: ~200 Googlers presented 60+ papers, talks, posters, workshops and more.
ACL: ~100 Googlers presented 40+ papers, workshops and tutorials.
Interspeech: Over 100 Googlers presented 30+ papers.
ICCV: ~200 Googlers presented 40+ papers, and several Googlers also won three prestigious ICCV awards.
NeurIPS: ~500 Googlers co-authored more than 120 accepted papers and engaged in various workshops and more.

We also brought together hundreds of Google researchers and faculty from across the globe to 15 separate research workshops hosted at Google locations. These workshops were on topics ranging from improving flood forecasting globally, to how to use machine learning to build systems that can better serve people with disabilities, to accelerating the development of algorithms, applications and tools for noisy-intermediate scale quantum (NISQ) processors.

Supporting academia and research communities outside of Google, we supported over 50 PhD students globally through our annual PhD Fellowship Program, we funded 158 projects as part of our Google Faculty Research Awards 2018, and we held our third cohort of the Google AI Residency Program. We also mentored AI-focused startups.

New Places, New Faces
We’ve made lots of headway in 2019, but there’s so much more we can do. To continue growing our impact around the world, we opened a Research office in Bangalore, and we’re expanding in other offices. If you’re excited about working on these sorts of problems, we’re hiring!

Looking Forward to 2020 and Beyond
The past decade has seen remarkable advances in the fields of machine learning and computer science, where we now have given computers the ability to see, hear and understand language better than ever before (see a nice overview of important advances of the last decade). In our pockets, we now have sophisticated computing devices that can use these capabilities to better help us accomplish a multitude of tasks in our daily lives. We have substantially redesigned our computing platforms around these machine learning approaches by developing specialized hardware, giving us the ability to tackle ever larger problems. This has changed how we think about computing devices both in data centers (such as the inference-focused TPUv1 and the training-and-inference focused TPUv2 and TPUv3), as well as in low-power mobile environments (such as Edge TPUs). The deep learning revolution will continue to reshape how we think about computing and computers.

At the same time, there are a huge number of unanswered questions and unsolved problems. Some directions and questions that we are excited about tackling in 2020 and beyond are:

How can we build machine learning systems that can handle millions of tasks, and that can learn to successfully accomplish new tasks automatically? Currently, we’re mostly training separate machine models for each new task, starting from scratch, or at best, from a model trained on one or a few highly related tasks. As such, the models we train are really good at one or a few things, but not good at anything else. However, what we truly want are models that are good at leveraging their expertise at doing many things, so that they are able to learn to do a new thing with relatively little training data and computation. This is a true grand challenge which will require expertise and advances in many areas spanning solid-state circuit design, computer architecture, ML-focused compilers, distributed systems, machine learning algorithms and domain experts across many other fields in order to build systems that can generalize to solve new tasks independently across a full range of application areas.
How can we advance the state-of-the-art in important areas of artificial intelligence research like avoiding bias, increasing interpretability & understandability, improving privacy and ensuring safety? Advances in these areas are going to be critical as we use machine learning in more and more ways in society.
How can we apply computation and machine learning to make advances in important new areas of science? There are important advances to be had by collaborating with experts in other fields in areas like climate science, healthcare, bioinformatics and many other areas.
How can we ensure that the ideas and directions pursued by the machine learning and computer science research communities are put forth and explored by a diverse group of researchers? The work that the computer science and machine learning research communities are pursuing has broad implications for billions of people, and we want the set of researchers doing this work to represent the experiences, perspectives, concerns and creative enthusiasm of all the people of the world. How can we best support new researchers from diverse backgrounds entering the field?

Overall, 2019 was a very exciting year for research at Google and in the broader research community. We’re excited about tackling the research challenges ahead of us in 2020 and beyond, and we look forward to sharing our progress with you!

Source: Google AI Blog

Google Research: Looking Back at 2019, and Forward to 2020 and Beyond

Published a research paper about a new transparency tool, which enabled the launch of Model Cards for several of our Cloud AI products. You can see an example model card for the Cloud AI Vision API Object Detection feature.
Showed how Activation Atlases can help explore neural network behavior and can aid with interpretability of machine learning models.
Introduced TensorFlow Privacy, an open-source library to enable training machine learning models with differential privacy guarantees.
Released a beta version of Fairness Indicators, to help ML practitioners identify unjust or unintended impacts of machine learning models.
Clicking on a slice in Fairness Indicators will load all the data points in that slice inside the What-If Tool widget. In this case, all data points with the “female” label are shown.
Published a KDD'19 paper on how pairwise comparisons and regularization is incorporated into a large-scale production recommender system to improve ML Fairness.
Published an AIES'19 paper about a case study on the application of fairness in machine learning research to a production classification system, and described our fairness metric, conditional equality, that takes into account distributional differences in implementing equality of opportunity.
Published an AIES'19 paper about counterfactual fairness in text classification problems that asks the question: "How would the prediction change if the sensitive attribute referenced in the example were different?" and used this approach to improve our production systems that assess the toxicity of online content.

Released a new dataset to help with research to identify deepfakes.

The Fondation Médecins Sans Frontières (MSF) is creating a free smartphone application that uses image recognition tools to help clinical staff in low-resource settings (currently being piloted in Jordan) to analyze anti-microbial images and advise on the appropriate antibiotics to use for a particular patient’s infection.
Over a billion people live in smallholder farm households. A single pest attack can devastate their crop yields and livelihoods. Wadhwani AI uses image classification models that can identify pests and provide timely advice on what pesticides to spray and when—ultimately improving crop yield.
And deep in tropical rainforests, where illegal deforestation is a major driver of climate change, Rainforest Connection uses deep learning for bioacoustic monitoring and old cell phones to track rainforest health and detect threats.
Our 20 AI Impact Challenge winners. You can learn more about the work of all the grantees here.

In An Interactive, Automated 3D Reconstruction of a Fly Brain, we reported on a collaborative effort that achieved a milestone of mapping the structure of an entire fly brain, using machine learning models that were able to painstakingly trace each individual neuron.

We gave machine learning models better scents of the world with Learning to Smell: Using Deep Learning to Predict the Olfactory Properties of Molecules. We showed how to leverage graph neural networks (GNNs) to directly predict the odor descriptors for individual molecules, without using any handcrafted rules.

In work that combines chemistry and reinforcement learning techniques, we presented a framework for molecule optimization.
Machine learning can also help us in our artistic and creative endeavors. Artists have found ways to collaborate with AI and AR and create interesting new forms, from dancing with a machine to reimagine choreography, to creating new melodies with machine learning tools. ML can be used by novices, too. To honor the birthday of J.S. Bach, we featured a ML-powered Doodle: just create your melody, and the ML tool can create accompanying harmonizations in Bach’s style.

Lookout helps people who are blind or have low vision identify information about their surroundings. It draws upon similar underlying technology as Google Lens, which lets you search and take action on the objects around you, simply by pointing your phone.
Live Transcribe has the potential to give people who are deaf or hard of hearing greater independence in their everyday interactions. You can get real-time transcriptions of conversations that the user is engaged in, even if the speech is in another language.
Project Euphonia performs personalized speech-to-text transcription. For people with ALS and other conditions that produce slurred or non-standard speech, this research improves automatic speech recognition (ASR) over other state-of-the-art ASR models.
Like Project Euphonia, Parrotron uses end-to-end neural networks to help improve communication, but the research focuses on automatic speech-to-speech conversion rather than transcription, presenting a speech interface that may be easier for some to access.
Millions of images online don’t have any text description. Get Image Descriptions from Google helps blind or low vision users understand unlabelled images. When a screen reader encounters an image or graphic without a description, Chrome can now create one automatically.
We developed tools that can read visual text in audio form in Lens for Google Go, greatly helping users who are not fully literate navigate the word-rich world around them.

The launch of on-device captioning with Live Caption, giving always-available transcription of any video playing on your device.
The creation of a powerful new transcribing Recorder app, which can help index audio information and make it easily retrievable.
Improvements to Google Translate’s camera translation, so that you can point at text in an unfamiliar language and get it instantly translated in context.
Release of the Augmented Faces API in ARCore, enabling new real-time AR self-expression tools.
A demonstration of on-device, real-time hand tracking, enabling new ways for users to interact with and control their devices with their hands.
Improved, RNN-based on-device handwriting recognition for on-screen mobile keyboards.
The release of a new global localization approach using your smart phone’s camera to help more accurately orient you and help you find your way in the world.

We showed that a deep learning model for mammography can assist physicians in spotting breast cancer, a condition that affects 1 in 8 women in the US during their lifetimes, with greater accuracy than experts, reducing both false positives and false negatives. The model trained on de-identified data from a UK hospital had similar gains in accuracy when used to evaluate patients in a completely different healthcare system in the U.S.
Example of a difficult-to-detect cancer case correctly identified by machine learning.
We showed that a deep learning model for differential diagnoses of skin diseases can give results that are significantly more accurate than primary care physicians and on par with or perhaps slightly better than dermatologists.
Working alongside experts from the US Department of Veterans Affairs (VA), DeepMind Health colleagues who are now part of Google Health showed that a machine learning model can predict the onset of acute kidney injury (AKI), one of the leading causes of avoidable patient harm, up to two days before it happens. In the future, this could give doctors a 48-hour head start in treating this serious condition.
We expanded the application of deep learning to electronic health records with several partner organizations. You can read more about this work in our 2018 blog post.
We showed a promising step forward for predicting lung cancer, where a deep learning model for examining the results of a single CT scan study performed on par or better than trained radiologists at early detection of lung cancer. Early detection of lung cancer dramatically improves survival rates.
We continued to expand and evaluate our deployment of machine learning tools for detection and prevention of eye disease, in collaboration with Verily and our healthcare partners in India and Thailand.
We published a research paper on an augmented reality microscope for cancer diagnosis, whereby a pathologist can get real-time feedback about what parts of a slide are most interesting while examining tissue through a microscope. You can also read more about it in our 2018 blog post here.
We built a human-centric, similar-image search tool for pathologists to help them make more effective diagnoses, by allowing examination of similar cases.

In Evaluating the Unsupervised Learning of Disentangled Representations, we examined what properties affect the representations that are learned from unsupervised data, in order to better understand what makes for good representations and effective learning.
In Predicting the Generalization Gap in Deep Neural Networks, we showed that it is possible to predict the generalization gap (the gap between a model’s performance on data from the training distribution versus data drawn from a different distribution) using statistics of the margin distribution, helping us better understand which models generalize most effectively. We also did some research on Improving Out-of-Distribution Detection in Machine Learning Models, to better understand when a model is starting to encounter kinds of data it has never seen before. We also looked at Off-Policy Classification in the context of reinforcement learning, to better understand which models are likely to generalize the best.

In EfficientNet-EdgeTPU: Creating Accelerator-Optimized Neural Networks with AutoML, we showed how a neural architecture search approach can find efficient models that are tailored to particular hardware accelerators, resulting in high accuracy, low-computational models for running on mobile devices.

We developed AutoML techniques for tabular data, unlocking an important domain where many companies and organizations have interesting data in relational databases, and often want to develop machine learning models on this data. We collaborated to release this technology as a new Google Cloud AutoML Tables product, and also discussed how well this system did in a new Kaggle competition in An End-to-End AutoML Solution for Tabular Data at KaggleDays (spoiler: AutoML Tables finished second out of 74 teams of expert data scientists).
In Exploring Weight Agnostic Neural Networks, we showed how it is possible to find interesting neural network architectures without any training steps to update the weights of the evaluated models. This can make architecture search much more computationally efficient.

A weight-agnostic neural network performing a Cartpole Swing-up task at various different weight parameters, and also using fine-tuned weight parameters.

In SpecAugment: A New Data Augmentation Method for Automatic Speech Recognition, we showed that the approach of automatically learning data augmentation methods can be extended to speech recognition models, with the learned augmentation approaches achieving significantly higher accuracy with less data than existing human ML-expert driven data augmentation approaches.
We launched our first speech application for keyword spotting and spoken language identification using AutoML. In our experiments we found better models (both more efficient and better performance) than the human designed models that have been in this setting for some time.

In Translatotron: An End-to-End Speech-to-Speech Translation Model, we showed that it is possible to train a joint model to accomplish the (normally separate) tasks of speech recognition, translation and text-to-speech generation with nice benefits, like preserving the sound of the speaker’s voice in the generated translated audio, as well as a simpler overall learning system.
In Multilingual Universal Sentence Encoder for Semantic Retrieval, we showed how to combine many different objectives to yield models that are significantly better at semantic retrieval (versus simpler word matching techniques). For example, in Google Talk to Books, the query “What fragrance brings back memories?” yields the result, “And for me, the smell of jasmine along with the pan bagnat, it brings back my entire carefree childhood.”

In Robust Neural Machine Translation, we showed how to use an adversarial training procedure to significantly improve the quality and robustness of language translations.

Finer-grained visual understanding in Lens, enabling even more powerful visual search.
Helpful smart camera features such as Quick Gestures, Face Match and smart video call framing on the Nest Hub Max.
Technology for live and spatially-aware perception for helpfully augmenting the world around us through Lens.
Better models for depth prediction from videos.

Better representations for fine-grained temporal understanding of videos using temporal cycle-consistency learning.

Learning representations across text, speech and video that are temporally consistent from unlabeled videos.

Being able to predict future visual inputs from observations of the past.
Models that can better understand action sequences in videos, enabling you to better recall special video moments like “blowing out candles” or “sliding down a slide” in Google Photos.
Architecture for temporal action localization.

In Long-Range Robotic Navigation via Automated Reinforcement Learning, we showed how to combine reinforcement learning with long-range planning to enable robots to more effectively navigate complex environments (like our Google office buildings).
In PlaNet: A Deep Planning Network for Reinforcement Learning, we showed how to effectively learn a world model purely from the pixels of images, and how to leverage this model of how the world behaves in order to accomplish tasks with many fewer learning episodes.
In Unifying Physics and Deep Learning with TossingBot, we showed how robots can learn “intuitive” physics from experimentation in an environment, rather than being pre-programmed with physics models about the environment in which they are operating.
In Soft Actor-Critic: Deep Reinforcement Learning for Robotics, we showed that training a reinforcement learning algorithm to both maximize the expected reward (which is the standard RL objective) and to maximize the policy's entropy (so that learning favors policies that are more random), can help robots learn faster and be more robust to changes in their environment.
In Learning to Assemble and to Generalize from Self-Supervised Disassembly, we showed how robots can learn to assemble by first learning to disassemble things in a self-supervised way. Kids learn from taking things apart, and it appears that robots can as well!
We introduced ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots, an open-source platform of cost-effective robots and curated benchmarks designed to facilitate research and development on physical robotics hardware in the real world.

Open Images V5: An update to the popular Open Images dataset that includes segmentation masks for 2.8 million objects in 350 categories (so that it now has ~9M images annotated with image-level labels, object bounding boxes, object segmentation masks, and visual relationships).
Natural questions: the first dataset to use naturally occurring queries and find answers by reading an entire page, rather than extracting answers from a short paragraph.
Data for deepfake detection: we contributed a large dataset of visual deepfakes to the FaceForensics benchmark (mentioned above).
Google Research Football: a novel reinforcement learning environment where agents aim to master the world’s most popular sport—football (or, if you’re American, soccer). It’s important for reinforcement learning agents to have GOOOAAALLLSS!
Google-Landmarks-v2: over 5 million images (2x that of the first release) of more than 200 thousand different landmarks.
YouTube-8M Segments: A large-scale classification and temporal localization dataset that includes human-verified labels at the 5-second segment level of YouTube-8M videos.
Atomic Visual Actions (AVA) Spoken Activity: A multimodal audio+visual video dataset for perception of conversations. In addition, academic challenges were run for AVA action recognition and AVA: Spoken Activity
PAWS and PAWS-X: To help with paraphrase identification, both datasets contain well-formed sentence pairs with high lexical overlap, in which around half of pairs are paraphrase and half are not.
Natural language dialog datasets: CCPE and Taskmaster-1 both use a Wizard-of-Oz platform that pairs two people who engage in spoken conversations, to mimic a human-level conversation with a digital assistant.
The Visual Task Adaptation Benchmark: VTAB follows similar guidelines to ImageNet and GLUE but is based on one principle—a better representation is one that yields better performance on unseen tasks, with limited in-domain data.
Schema-Guided Dialogue Dataset: the largest publicly available corpus of task-oriented dialogues, with over 18,000 dialogues spanning 17 domains.

CVPR: ~250 Googlers presented 40+ papers, talks, posters, workshops and more.
ICML: ~200 Googlers presented 100+ papers, talks, posters, workshops and more.
ICLR: ~200 Googlers presented 60+ papers, talks, posters, workshops and more.
ACL: ~100 Googlers presented 40+ papers, workshops and tutorials.
Interspeech: Over 100 Googlers presented 30+ papers.
ICCV: ~200 Googlers presented 40+ papers, and several Googlers also won three prestigious ICCV awards.
NeurIPS: ~500 Googlers co-authored more than 120 accepted papers and engaged in various workshops and more.

How can we build machine learning systems that can handle millions of tasks, and that can learn to successfully accomplish new tasks automatically? Currently, we’re mostly training separate machine models for each new task, starting from scratch, or at best, from a model trained on one or a few highly related tasks. As such, the models we train are really good at one or a few things, but not good at anything else. However, what we truly want are models that are good at leveraging their expertise at doing many things, so that they are able to learn to do a new thing with relatively little training data and computation. This is a true grand challenge which will require expertise and advances in many areas spanning solid-state circuit design, computer architecture, ML-focused compilers, distributed systems, machine learning algorithms and domain experts across many other fields in order to build systems that can generalize to solve new tasks independently across a full range of application areas.
How can we advance the state-of-the-art in important areas of artificial intelligence research like avoiding bias, increasing interpretability & understandability, improving privacy and ensuring safety? Advances in these areas are going to be critical as we use machine learning in more and more ways in society.
How can we apply computation and machine learning to make advances in important new areas of science? There are important advances to be had by collaborating with experts in other fields in areas like climate science, healthcare, bioinformatics and many other areas.
How can we ensure that the ideas and directions pursued by the machine learning and computer science research communities are put forth and explored by a diverse group of researchers? The work that the computer science and machine learning research communities are pursuing has broad implications for billions of people, and we want the set of researchers doing this work to represent the experiences, perspectives, concerns and creative enthusiasm of all the people of the world. How can we best support new researchers from diverse backgrounds entering the field?

Source: Google AI Blog

Looking Back at Google’s Research Efforts in 2018

Posted by Jeff Dean, Senior Fellow and Google AI Lead, on behalf of the entire Google Research Community

2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.

Ethical Principles and AI
Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community with the Fairness Module in the Machine Learning Crash Course.

AI for Social Good
The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property.

A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.

We have also seen a huge number of external parties, sometimes in collaboration with Google researchers and engineers, using open source software like TensorFlow to tackle a wide range of scientific and social problems, such as using convolutional neural networks to identify humpback whale calls, detecting new exoplanets, identifying diseased cassava plants and more.

To spur creative activity in this area, we announced the Google AI for Social Impact Challenge in collaboration with Google.org, whereby individuals and organizations can receive grants from a total of $25M of funding, along with mentorship and advice from Google research scientists, engineers and other experts as they work to take a project with large potential social impact from idea to reality.

Assistive Technology
Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, these results in collaborations with various product teams to release the fruits of this research in various product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.

Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing fast and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).

An important focus in our research is helping to make products like the Google Assistant support more languages and allow better understanding of semantic similarity, even when very different ways of expressing the same concept or idea are used. Underlying new product capabilities like these is research we performed on improving the quality of both speech synthesis and text-to-speech for languages without much training data available.

Quantum computing
Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.

A Bristlecone chip being installed by Research Scientist Marissa Giustina at the Quantum AI Lab in Santa Barbara.

We also released Cirq, an open source programming framework for quantum computers, and explored how quantum computers could be used for neural networks. Finally, we shared our experience and techniques for understanding performance fluctuations in quantum processors, and shared some thoughts on how quantum computers might be useful as a computational substrate for neural networks. We're looking forward to exciting results in the quantum computing space in 2019!

Natural Language Understanding
Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.

BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.

In addition to collaborating with various research teams to enable Smart Compose and Duplex (discussed previously), we worked to make the Google Assistant handle multilingual use cases better, with the goal of making the Assistant naturally conversational for all users.

Perception
Our perception research tackles the hard problems of allowing computers to understand images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality. In 2018, our technology improved Google Photos' ability to organize the content that users most care about, such as people and pets. Google Lens and the Assistant enabled users to learn about the natural world, answer questions in real-time, and do more with Lens in Google Images. A key aspect of the Google AI mission is to empower others to benefit from our technology, and we've made a lot of progress this year in improving capabilities and building blocks that are parts of Google APIs. Examples include improved and new capabilities in vision and video in Cloud ML APIs and face-related on-device building blocks through ML Kit.

Google Lens can help you learn more about the world around you. Here, Lens identifies the breed of this dog. Learn more in this blog post.

In 2018, our contributions to academic research included advances in deep learning for 3D scene understanding, such as stereo magnification, which enables synthesizing novel photorealistic views of a scene. Our ongoing research on better understanding images and video enables users to find, organize, enhance and improve images and video in Google products such as Photos, YouTube, Search and more. In 2018, notable advances included a fast bottom-up model for joint pose estimation and person instance segmentation, a system for visualizing complex motion, a system which models spatio-temporal relations between people and objects and improvements in video action recognition based on distillation and 3D convolutions.

In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listen combines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.

Enabling perception on resource-constrained platforms has becoming increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.

Computational Photography
The improvements in quality and versatility of cell phone cameras over the last few years has been nothing short of remarkable. A modest part of this is improvements in the actual physical sensors used in phones, but a much greater part of it is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together with computational software. Originally in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.

Motion photos on the Pixel 2 in Google Photos. For more examples, check out this Google Photos album.

Augmented chicken family with Motion Stills AR mode.

This year, one of our primary efforts in computational photography research was to create a new capability called Night Sight, which enables Pixel phone cameras to "see in the dark", earning praise by both press and users. Of course, Night Sight is just one of the new software-enabled camera features our teams have developed to help you take the perfect photo, including using ML to provide better portrait mode shots, seeing better and further with Super Res Zoom and capturing special moments with Top Shot and Google Clips.

Left: iPhone XS (full resolution image here). Right: Pixel 3 Night Sight (full resolution image here).

Algorithms and Theory
Algorithms are the backbone of Google systems and touch all our products, from routing algorithms behind Google trips to consistent hashing for Google cloud. Over the past year, we continued our research in algorithms and theory covering a wide range of areas from theoretical foundations to applied algorithms, and from graph mining to privacy-preserving computation. Our work in optimization spans areas from studying continuous optimization for machine learning to distributed combinatorial optimization. In the former area, our work on studying convergence of stochastic optimization algorithms for training neural networks (which won an ICLR 2018 Best Paper Award) exhibited issues with popular gradient-based optimization methods (such as some variants of ADAM), but provided a solid foundation for new gradient-based optimization methods.

Performance comparison of ADAM and AMSGRAD on a synthetic example of a simple one dimensional convex problem inspired by our examples of non-convergence. The first two plots (left and center) are for the online setting and the the last one (right) is for the stochastic setting.

In distributed optimization, we worked to improve the round and communication complexity of well-studied combinatorial optimization problems such as matchings in graphs via round compression and via core-sets, as well as submodular maximization, and k-core decomposition. On the more applied side, we developed algorithmic techniques for solving set cover at scale via sketching and for solving balanced partitioning and hierarchical clustering for graphs with trillions of edges. Our work on online delivery services was nominated for the best paper award at WWW'18. Finally, our open source optimization OR-tools platform won 4 gold medals at the 2018 Minizinc constraint programming competition.

In algorithmic choice theory, we have proposed new models and investigated the problems of reconstruction and learning a mixture of multinomial logits. We also studied the classes of functions learnable by neural networks and how to use machine-learned oracles to improve classic online algorithms.

Understanding learning techniques with strong privacy guarantees is of great importance for us at Google. In this context, we developed two new means of analyzing how differential privacy can be amplified by iteration and by shuffling. We also applied differential privacy techniques to design incentive-aware learning methods that are robust against gaming. Such learning techniques have applications in efficient online market design. Our new research in the area of market algorithms include also techniques to help advertisers test incentive compatibility of ad auctions, and optimizing ad refresh for in-app advertising. We also pushed the boundaries of state-of-the-art dynamic mechanisms for repeated auctions, and presented dynamic auctions that are robust against lack of prediction of future, against noisy forecasts, or against heterogenous buyer behaviour, and extend our results to dynamic double auctions. Finally, in the context of robustness in online optimization and online learning, we developed new online allocation algorithms for stochastic input with traffic spikes and new bandit algorithms robust to corrupted data.

Software Systems
A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.

The TF-Ranking library supports multi-item scoring architecture, an extension of traditional single-item scoring.

We also released JAX, an accelerator-backed variant of NumPy that supports automatic differentiation of Python functions to arbitrary order. While JAX is not part of TensorFlow, it leverages some of the same underlying software infrastructure (e.g. XLA), and some of its ideas and algorithms have been helpful to our TensorFlow projects. Finally, we continued our research on the security and privacy of machine learning, and our development of open source frameworks for safety and privacy in AI systems, such as CleverHans and TensorFlow Privacy.

Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.

The Hierarchical Planner's placement of a NMT (4-layer) model. White denotes CPU and the four colors each represent one of the GPUs. Note that every step of every layer is allocated across multiple GPUs. This placement is 53.7% faster than that generated by a human expert.

In 2018 we learned about Spectre and Meltdown, new classes of serious security vulnerabilities in modern computer processors, thanks to Google's Project Zero team in collaboration with others. These and related vulnerabilities will keep computer architecture researchers quite busy. In our continuing efforts to model CPU behavior, our Compiler Research team integrated their tool for measuring machine instruction latency and port pressure into LLVM, making possible better compilation decisions.

Google products, our Cloud offerings and inference for machine learning models depend critically on the ability to provide large-scale, reliable, efficient technical infrastructure for computing, storage and networking. A few research highlights from the past year include the evolution of Google's Software Defined Networking WAN, a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats, in many storage systems (BigTable, Spanner, Google Spreadsheets, etc.) and a report on our extensive use of code review, investigating the motivations behind code review at Google, current practices, and developers' satisfaction and challenges.

Running a large-scale web service such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guarantees on the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After making an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.

AutoML
AutoML, also known as meta-learning, is the use of machine learning to automate some aspects of machine learning. We have been performing research in this space for many years, and the long-term goal is to develop learning systems that can learn to take a new problem and solve it automatically, using insights and capabilities derived from other problems that have been previously solved. Our earlier work in this space has mostly used reinforcement learning, but we are also interested in the use of evolutionary algorithms. Last year we showed how evolutionary algorithms can be used to automatically discover state-of-the-art neural network architectures for a variety of visual tasks. We also explored how reinforcement learning can be applied to other problems than just neural network architecture search, showing that it can be used to 1) automatically generate image transformation sequences that improve the accuracy of a wide variety of image models, and 2) find new symbolic optimization expressions that are more effective than the commonly used optimization update rules. Our work on AdaNet showed how to have a fast and flexible AutoML algorithm with learning guarantees.

AdaNet adaptively growing an ensemble of neural networks. At each iteration, it measures the ensemble loss for each candidate, and selects the best one to move onto the next iteration.

Another focus for us was on automatically discovering neural network architectures that are computationally efficient, so that they can run in environments such as mobile phones or autonomous vehicles that have tight constraints on either computational resources or on inference time. For this, we showed that combining the accuracy of a model with its inference computation time in the reward function for a reinforcement learning architecture search can find models that are highly accurate while meeting particular performance constraints. We also explored using ML to learn to automatically compress ML models to have fewer parameters and use less computational resources.

TPUs
Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.

An individual TPU v3 device (left) and a portion of a TPU v3 Pod (right). TPU v3 is the latest generation of Google's Tensor Processing Unit (TPU) hardware. Available to external customers as Cloud TPU v3, these systems are liquid-cooled for maximum performance (computer chips + liquid = exciting!), and a full TPU v3 Pod can apply more than 100 petaflops of computational power to the world's largest ML problems.

Open Source Software and Datasets
Releasing open source software and the creation of new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.

We are happy that TensorFlow has the strongest Github user retention of the top machine learning and deep learning frameworks. The TensorFlow team is also working to address Github issues faster and provide a smooth path for external contributors. In research, we continue to power much of the world's machine learning and deep learning research on a published paper basis according to Google Scholar data. TensorFlow Lite is now on more than 1.5B devices globally after being available for just one year. Additionally, TensorFlow.js is the number one ML framework for JavaScript; in the nine months since launch, it had over 2M Content Delivery Network (CDN) hits, 250K downloads and more than 10,000 stars on Github.

In addition to continued work on existing open source ecosystems, in 2018 we introduced a new framework for flexible and reproducible reinforcement learning, new visualization tools to rapidly understand the characteristics of a dataset (without needing to write any code), added a high-level library for expressing machine learning problems that involve learning-to-rank (the process of ordering a list of items in a way that maximizes the utility of the entire list, applicable across domains that include search engines, recommender systems, machine translation, dialogue systems and even computational biology), released a framework for fast and flexible AutoML solutions with learning guarantees, a library for doing in-browser realtime t-SNE visualizations using TensorFlow.js and added FHIR tools and software for working with electronic healthcare data (discussed in the healthcare section of this post).

Real-time evolution of the tSNE embedding for the complete MNIST dataset. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.

Public datasets are often a great source of inspiration that lead to great progress across many fields, since they give the broader community both access to interesting data and problems as well as a healthy competitive drive to achieve better results on a variety of tasks. This year we were happy to release Google Dataset Search, a new tool for finding public datasets from all of the web. Over the years we have also curated and released many new, novel datasets, including everything from millions of general annotated images or videos, to a crowd-source Bengali dataset for speech recognition to robot arm grasping datasets and more. In 2018, we added even more datasets to that list.

Pictures from India & Singapore added to Open Images Extended using the Crowdsource app.

We released Open Images V4, a dataset containing 15.4M bounding-boxes for 600 categories on 1.9M images, as well as 30.1M human-verified image-level labels from 19,794 categories. We also extended this dataset to add more diversity of people and scenes from all over the world, by adding 5.5M generated annotations provided by tens of thousands of users from all over the world using crowdsource.google.com. We released the Atomic Visual Actions (AVA) dataset that provides audiovisual annotations of video for improving the state of the art in understanding human actions and speech in video. We also announced an updated YouTube-8M, and the 2nd YouTube-8M Large-Scale Video Understanding Challenge and Workshop. The HDR+ Burst Photography Dataset aims to enable a wide variety of research in the field of computational photography, and Google-Landmarks was a new dataset and challenge for landmark recognition. And while not a dataset release, we explored techniques that can enable faster creation of visual datasets using Fluid Annotation, an exploratory ML-powered interface for faster image annotation.

Visualization of the fluid annotation interface in action on image from COCO dataset. Image credit: gamene, original image.

From time-to-time, we also help establish new kinds of challenges for the research community, so that we can all work together on solving difficult research problems. Often these are done with the release of a new dataset, but not always. This year, we established new challenges around the Inclusive Images Challenge, to work towards making more robust models that are free from many kinds of biases, the iNaturalist 2018 Challenge which aims to enable computers' fine-grained discrimination of visual categories (such as species of plants in an image), a Kaggle "Quick, Draw!" Doodle Recognition Challenge to create a better classifier for the QuickDraw challenge game, and Conceptual Captions, a larger-scale image captioning dataset and challenge aimed at enabling better image captioning model research.

Robotics
In 2018, we made significant progress towards our goal of understanding how ML can teach robots how to act in the world, achieving a new milestone in the ability to teach robots to grasp novel objects (best systems paper at CoRL'18), and using it to learn about objects without human supervision. We've also made progress in learning robot motion by combining ML and sampling-based methods (best paper in service robotics at ICRA'18) and learning robot geometry for faster planning. We've made great strides in our ability to better perceive the structure of the world from autonomous observation. For the first time, we've been able to successfully train deep reinforcement learning models online on real robots, and are finding new, theoretically grounded ways, to learn stable approaches to robot control.

Applications of AI to Other Fields
In 2018, we have applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.

Our Nature Methods paper on high-precision automated reconstruction of neurons proposed a new model that improves the accuracy of automated interpretation of connectomics data by an order of magnitude over previous deep learning techniques.

Our algorithm in action as it traces a single neurite in 3d in a songbird brain.

Some other examples of applying ML to science include:

A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.

Health
For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.

At the end of 2016, we published work showing that a model trained to assess retinal fundus images for signs of diabetic retinopathy was able to perform on-par to slightly-better than U.S. medical-board-certified ophthalmologists at this task in a retrospective study. In 2018, we were able to show that by having the training images labeled by retinal specialists and by using an adjudicated protocol (where multiple retinal specialists convene and have to arrive at a single collective assessment for each fundus image), we could arrive at a model that is on-par with retinal specialists. Later, we published an evaluation that showed how pairing ophthalmologists and this ML model allow them to make more accurate decisions than either alone. We have deployed this diabetic retinopathy detection system in partnership with our Alphabet colleagues at Verily at over 10 sites including Aravind Eye Hospitals in India and at Rajavithi Hospital affiliated with the Ministry of Health in Thailand.

On the left is a retinal fundus image graded as having moderate DR ("Mo") by an adjudication panel of ophthalmologists (ground truth). On the top right is an illustration of the predicted scores ("N" = no DR, "Mi" = Mild DR, "Mo" = Moderate DR) from the model. On the bottom right is the set of scores given by physicians without assistance ("Unassisted") and those who saw the model's predictions ("Grades Only").

In work that medical and eye specialists found quite remarkable, we also published research on a machine learning model that can assess cardiovascular risk from retinal images. This shows early promising signs for a novel, non-invasive biomarker that can help clinicians better understand the health of their patients.

We have also continued our focus on pathology this year, showing how to improve the grading of prostate cancer using ML, detect metastatic breast cancer with deep learning, and developed a prototype for an augmented-reality microscope that can aid pathologists and other scientists by overlaying visual information derived from computer vision models into the visual field of the microscopist in real time.

For the past four years, we have had a significant research effort around using deep learning on electronic health records to make clinically-relevant predictions. In 2018, in collaboration with University of Chicago Medicine, UCSF and Stanford Medicine, we published work in Nature Digital Medicine showing how ML models applied to de-identified electronic medical records can make significantly higher accuracy predictions for a variety of clinically relevant tasks than the current clinical best practice. As part of this work, we developed tools to make it significantly easier to create these models even on quite different tasks and quite different underlying EHR data sets. We have open sourced software related to the Fast Healthcare Interoperability Resources (FHIR) standard that we developed in this work to help make working with medical data easier and more standardized (see this GitHub repository). We also improved the accuracy, speed and utility of our deep learning-based variant caller, DeepVariant. The team has forged ahead with partners and recently published the peer-reviewed paper in Nature Biotechnology.

When applying ML to historically-collected data, it's important to understand the populations that have experienced human and structural biases in the past and how those biases have been codified in the data. Machine-learning offers an opportunity to detect and address bias and to proactively advance health equity, which we are designing our systems to do.

Research Outreach
We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, as well as providing multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each of the fellowship recipients is assigned one or more Google researchers as a mentor, and we bring together all the fellows for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other PhD Fellows from around the world.

Complementing this fellowship program is the Google AI Residency, a way of allowing people who want to learn to conduct deep learning research to spend a year working alongside and being mentored by researchers at Google. Now in its third year, residents are embedded in various teams across Google's global offices, pursuing research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.

Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.

We believe that contributing openly to the broader research community is a critical part of supporting a healthy and productive research ecosystem. In addition to our open source and dataset releases, much of our research is published openly in top conference venues and journals, and we actively participate in the organization and sponsorship of conferences, all across the spectrum of different disciplines. For just a small sample, see our involvement at ICLR 2018, NAACL 2018, ICML 2018, CVPR 2018, NeurIPS 2018, ECCV 2018 and EMNLP 2018. Googlers also participated extensively in ASPLOS, HPCA, ICSE, IEEE Security & Privacy, OSDI, SIGCOMM, and many other conferences in 2018.

New Places, New Faces
In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.

Looking Forward to 2019
This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited (and proud!) of the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!

googblogs.com

All Google blogs and Press in one site

Tag Archives: Year in Review

Google Research: Themes from 2021 and Beyond

Source: Google AI Blog

Google Research: Looking Back at 2020, and Forward to 2021

Source: Google AI Blog

Google Research: Looking Back at 2019, and Forward to 2020 and Beyond

Source: Google AI Blog

Google Research: Looking Back at 2019, and Forward to 2020 and Beyond

Source: Google AI Blog

Looking Back at Google’s Research Efforts in 2018

Source: Google AI Blog