2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.
Ethical Principles and AI
Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community with the Fairness Module in the Machine Learning Crash Course.
AI for Social Good
The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property.
A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.
We have also seen a huge number of external parties, sometimes in collaboration with Google researchers and engineers, using open source software like TensorFlow to tackle a wide range of scientific and social problems, such as using convolutional neural networks to identify humpback whale calls, detecting new exoplanets, identifying diseased cassava plants and more.
To spur creative activity in this area, we announced the Google AI for Social Impact Challenge in collaboration with Google.org, whereby individuals and organizations can receive grants from a total of $25M of funding, along with mentorship and advice from Google research scientists, engineers and other experts as they work to take a project with large potential social impact from idea to reality.
Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, these results in collaborations with various product teams to release the fruits of this research in various product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.
Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing fast and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).
An important focus in our research is helping to make products like the Google Assistant support more languages and allow better understanding of semantic similarity, even when very different ways of expressing the same concept or idea are used. Underlying new product capabilities like these is research we performed on improving the quality of both speech synthesis and text-to-speech for languages without much training data available.
Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.
|A Bristlecone chip being installed by Research Scientist Marissa Giustina at the Quantum AI Lab in Santa Barbara.|
Natural Language Understanding
Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.
|BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.|
Our perception research tackles the hard problems of allowing computers to understand images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality. In 2018, our technology improved Google Photos' ability to organize the content that users most care about, such as people and pets. Google Lens and the Assistant enabled users to learn about the natural world, answer questions in real-time, and do more with Lens in Google Images. A key aspect of the Google AI mission is to empower others to benefit from our technology, and we've made a lot of progress this year in improving capabilities and building blocks that are parts of Google APIs. Examples include improved and new capabilities in vision and video in Cloud ML APIs and face-related on-device building blocks through ML Kit.
|Google Lens can help you learn more about the world around you. Here, Lens identifies the breed of this dog. Learn more in this blog post.|
In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listen combines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.
Enabling perception on resource-constrained platforms has becoming increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.
The improvements in quality and versatility of cell phone cameras over the last few years has been nothing short of remarkable. A modest part of this is improvements in the actual physical sensors used in phones, but a much greater part of it is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together with computational software. Originally in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.
|Motion photos on the Pixel 2 in Google Photos. For more examples, check out this Google Photos album.|
|Augmented chicken family with Motion Stills AR mode.|
|Left: iPhone XS (full resolution image here). Right: Pixel 3 Night Sight (full resolution image here).|
Algorithms are the backbone of Google systems and touch all our products, from routing algorithms behind Google trips to consistent hashing for Google cloud. Over the past year, we continued our research in algorithms and theory covering a wide range of areas from theoretical foundations to applied algorithms, and from graph mining to privacy-preserving computation. Our work in optimization spans areas from studying continuous optimization for machine learning to distributed combinatorial optimization. In the former area, our work on studying convergence of stochastic optimization algorithms for training neural networks (which won an ICLR 2018 Best Paper Award) exhibited issues with popular gradient-based optimization methods (such as some variants of ADAM), but provided a solid foundation for new gradient-based optimization methods.
distributed optimization, we worked to improve the round and communication complexity of well-studied combinatorial optimization problems such as matchings in graphs via round compression and via core-sets, as well as submodular maximization, and k-core decomposition. On the more applied side, we developed algorithmic techniques for solving set cover at scale via sketching and for solving balanced partitioning and hierarchical clustering for graphs with trillions of edges. Our work on online delivery services was nominated for the best paper award at WWW'18. Finally, our open source optimization OR-tools platform won 4 gold medals at the 2018 Minizinc constraint programming competition.
In algorithmic choice theory, we have proposed new models and investigated the problems of reconstruction and learning a mixture of multinomial logits. We also studied the classes of functions learnable by neural networks and how to use machine-learned oracles to improve classic online algorithms.
Understanding learning techniques with strong privacy guarantees is of great importance for us at Google. In this context, we developed two new means of analyzing how differential privacy can be amplified by iteration and by shuffling. We also applied differential privacy techniques to design incentive-aware learning methods that are robust against gaming. Such learning techniques have applications in efficient online market design. Our new research in the area of market algorithms include also techniques to help advertisers test incentive compatibility of ad auctions, and optimizing ad refresh for in-app advertising. We also pushed the boundaries of state-of-the-art dynamic mechanisms for repeated auctions, and presented dynamic auctions that are robust against lack of prediction of future, against noisy forecasts, or against heterogenous buyer behaviour, and extend our results to dynamic double auctions. Finally, in the context of robustness in online optimization and online learning, we developed new online allocation algorithms for stochastic input with traffic spikes and new bandit algorithms robust to corrupted data.
A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.
|The TF-Ranking library supports multi-item scoring architecture, an extension of traditional single-item scoring.|
Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.
|The Hierarchical Planner's placement of a NMT (4-layer) model. White denotes CPU and the four colors each represent one of the GPUs. Note that every step of every layer is allocated across multiple GPUs. This placement is 53.7% faster than that generated by a human expert.|
Google products, our Cloud offerings and inference for machine learning models depend critically on the ability to provide large-scale, reliable, efficient technical infrastructure for computing, storage and networking. A few research highlights from the past year include the evolution of Google's Software Defined Networking WAN, a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats, in many storage systems (BigTable, Spanner, Google Spreadsheets, etc.) and a report on our extensive use of code review, investigating the motivations behind code review at Google, current practices, and developers' satisfaction and challenges.
Running a large-scale web service such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guarantees on the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After making an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.
AutoML, also known as meta-learning, is the use of machine learning to automate some aspects of machine learning. We have been performing research in this space for many years, and the long-term goal is to develop learning systems that can learn to take a new problem and solve it automatically, using insights and capabilities derived from other problems that have been previously solved. Our earlier work in this space has mostly used reinforcement learning, but we are also interested in the use of evolutionary algorithms. Last year we showed how evolutionary algorithms can be used to automatically discover state-of-the-art neural network architectures for a variety of visual tasks. We also explored how reinforcement learning can be applied to other problems than just neural network architecture search, showing that it can be used to 1) automatically generate image transformation sequences that improve the accuracy of a wide variety of image models, and 2) find new symbolic optimization expressions that are more effective than the commonly used optimization update rules. Our work on AdaNet showed how to have a fast and flexible AutoML algorithm with learning guarantees.
|AdaNet adaptively growing an ensemble of neural networks. At each iteration, it measures the ensemble loss for each candidate, and selects the best one to move onto the next iteration.|
Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.
|An individual TPU v3 device (left) and a portion of a TPU v3 Pod (right). TPU v3 is the latest generation of Google's Tensor Processing Unit (TPU) hardware. Available to external customers as Cloud TPU v3, these systems are liquid-cooled for maximum performance (computer chips + liquid = exciting!), and a full TPU v3 Pod can apply more than 100 petaflops of computational power to the world's largest ML problems.|
Releasing open source software and the creation of new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.
In addition to continued work on existing open source ecosystems, in 2018 we introduced a new framework for flexible and reproducible reinforcement learning, new visualization tools to rapidly understand the characteristics of a dataset (without needing to write any code), added a high-level library for expressing machine learning problems that involve learning-to-rank (the process of ordering a list of items in a way that maximizes the utility of the entire list, applicable across domains that include search engines, recommender systems, machine translation, dialogue systems and even computational biology), released a framework for fast and flexible AutoML solutions with learning guarantees, a library for doing in-browser realtime t-SNE visualizations using TensorFlow.js and added FHIR tools and software for working with electronic healthcare data (discussed in the healthcare section of this post).
|Real-time evolution of the tSNE embedding for the complete MNIST dataset. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.|
|Pictures from India & Singapore added to Open Images Extended using the Crowdsource app.|
|Visualization of the fluid annotation interface in action on image from COCO dataset. Image credit: gamene, original image.|
In 2018, we made significant progress towards our goal of understanding how ML can teach robots how to act in the world, achieving a new milestone in the ability to teach robots to grasp novel objects (best systems paper at CoRL'18), and using it to learn about objects without human supervision. We've also made progress in learning robot motion by combining ML and sampling-based methods (best paper in service robotics at ICRA'18) and learning robot geometry for faster planning. We've made great strides in our ability to better perceive the structure of the world from autonomous observation. For the first time, we've been able to successfully train deep reinforcement learning models online on real robots, and are finding new, theoretically grounded ways, to learn stable approaches to robot control.
In 2018, we have applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.
Our Nature Methods paper on high-precision automated reconstruction of neurons proposed a new model that improves the accuracy of automated interpretation of connectomics data by an order of magnitude over previous deep learning techniques.
|Our algorithm in action as it traces a single neurite in 3d in a songbird brain.|
- Finding new planets outside our solar system by data mining light curves of stars
- Recognizing the origin or function of short DNA sequences
- Automatically detecting out-of-focus microscope images
- Digitally creating images of the same cells with multiple stains
- Automatically mapping mass spectrometry output to peptide chains
|A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.|
For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.
At the end of 2016, we published work showing that a model trained to assess retinal fundus images for signs of diabetic retinopathy was able to perform on-par to slightly-better than U.S. medical-board-certified ophthalmologists at this task in a retrospective study. In 2018, we were able to show that by having the training images labeled by retinal specialists and by using an adjudicated protocol (where multiple retinal specialists convene and have to arrive at a single collective assessment for each fundus image), we could arrive at a model that is on-par with retinal specialists. Later, we published an evaluation that showed how pairing ophthalmologists and this ML model allow them to make more accurate decisions than either alone. We have deployed this diabetic retinopathy detection system in partnership with our Alphabet colleagues at Verily at over 10 sites including Aravind Eye Hospitals in India and at Rajavithi Hospital affiliated with the Ministry of Health in Thailand.
medical and eye specialists found quite remarkable, we also published research on a machine learning model that can assess cardiovascular risk from retinal images. This shows early promising signs for a novel, non-invasive biomarker that can help clinicians better understand the health of their patients.
We have also continued our focus on pathology this year, showing how to improve the grading of prostate cancer using ML, detect metastatic breast cancer with deep learning, and developed a prototype for an augmented-reality microscope that can aid pathologists and other scientists by overlaying visual information derived from computer vision models into the visual field of the microscopist in real time.
For the past four years, we have had a significant research effort around using deep learning on electronic health records to make clinically-relevant predictions. In 2018, in collaboration with University of Chicago Medicine, UCSF and Stanford Medicine, we published work in Nature Digital Medicine showing how ML models applied to de-identified electronic medical records can make significantly higher accuracy predictions for a variety of clinically relevant tasks than the current clinical best practice. As part of this work, we developed tools to make it significantly easier to create these models even on quite different tasks and quite different underlying EHR data sets. We have open sourced software related to the Fast Healthcare Interoperability Resources (FHIR) standard that we developed in this work to help make working with medical data easier and more standardized (see this GitHub repository). We also improved the accuracy, speed and utility of our deep learning-based variant caller, DeepVariant. The team has forged ahead with partners and recently published the peer-reviewed paper in Nature Biotechnology.
When applying ML to historically-collected data, it's important to understand the populations that have experienced human and structural biases in the past and how those biases have been codified in the data. Machine-learning offers an opportunity to detect and address bias and to proactively advance health equity, which we are designing our systems to do.
We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, as well as providing multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each of the fellowship recipients is assigned one or more Google researchers as a mentor, and we bring together all the fellows for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other PhD Fellows from around the world.
Complementing this fellowship program is the Google AI Residency, a way of allowing people who want to learn to conduct deep learning research to spend a year working alongside and being mentored by researchers at Google. Now in its third year, residents are embedded in various teams across Google's global offices, pursuing research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.
Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.
We believe that contributing openly to the broader research community is a critical part of supporting a healthy and productive research ecosystem. In addition to our open source and dataset releases, much of our research is published openly in top conference venues and journals, and we actively participate in the organization and sponsorship of conferences, all across the spectrum of different disciplines. For just a small sample, see our involvement at ICLR 2018, NAACL 2018, ICML 2018, CVPR 2018, NeurIPS 2018, ECCV 2018 and EMNLP 2018. Googlers also participated extensively in ASPLOS, HPCA, ICSE, IEEE Security & Privacy, OSDI, SIGCOMM, and many other conferences in 2018.
New Places, New Faces
In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.
Looking Forward to 2019
This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited (and proud!) of the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!