Tag Archives: Publications

Natural Questions: a New Corpus and Challenge for Question Answering Research

Open-domain question answering (QA) is a benchmark task in natural language understanding (NLU) that aims to emulate how people look for information, finding answers to questions by reading and understanding entire documents. Given a question expressed in natural language ("Why is the sky blue?"), a QA system should be able to read the web (such as this Wikipedia page) and return the correct answer, even if the answer is somewhat complicated and long. However, there are currently no large, publicly available sources of naturally occurring questions (i.e. questions asked by a person seeking information) and answers that can be used to train and evaluate QA models. This is because assembling a high-quality dataset for question answering requires a large source of real questions and significant human effort in finding correct answers.

To help spur research advances in QA, we are excited to announce Natural Questions (NQ), a new, large-scale corpus for training and evaluating open-domain question answering systems, and the first to replicate the end-to-end process in which people find answers to questions. NQ is large, consisting of 300,000 naturally occurring questions, along with human annotated answers from Wikipedia pages, to be used in training QA systems. We have additionally included 16,000 examples where answers (to the same questions) are provided by 5 different annotators, useful for evaluating the performance of the learned QA systems. Since answering the questions in NQ requires much deeper understanding than is needed to answer trivia questions — which are already quite easy for computers to solve — we are also announcing a challenge based on this data to help advance natural language understanding in computers.

The Data
NQ is the first dataset to use naturally occurring queries and focus on finding answers by reading an entire page, rather than extracting answers from a short paragraph. To create NQ, we started with real, anonymized, aggregated queries that users have posed to Google's search engine. We then ask annotators to find answers by reading through an entire Wikipedia page as they would if the question had been theirs. Annotators look for both long answers that cover all of the information required to infer the answer, and short answers that answer the question succinctly with the names of one or more entities. The quality of the annotations in the NQ corpus has been measured at 90% accuracy.

Our paper "Natural Questions: a Benchmark for Question Answering Research", which has been accepted for publication in Transactions of the Association for Computational Linguistics, has a full description of the data collection process. To see some more examples from the dataset, please check out the NQ website.

The Challenge
NQ is aimed at enabling QA systems to read and comprehend an entire Wikipedia article that may or may not contain the answer to the question. Systems will need to first decide whether the question is sufficiently well defined to be answerable — many questions make false assumptions or are just too ambiguous to be answered concisely. Then they will need to decide whether there is any part of the Wikipedia page that contains all of the information needed to infer the answer. We believe that the long answer identification task — finding all of the information required to infer an answer — requires a deeper level of language understanding than finding short answers once the long answers are known.

It is our hope that the release of NQ, and the associated challenge, will help spur the development of more effective and robust QA systems. We encourage the NLU community to participate and to help close the large gap between the performance of current state-of-the-art approaches and a human upper bound. Please visit the challenge website to view the leaderboard and learn more.

Source: Google AI Blog

Looking Back at Google’s Research Efforts in 2018

2018 was an exciting year for Google's research teams, with our work advancing technology in many ways, including fundamental computer science research results and publications, the application of our research to emerging areas new to Google (such as healthcare and robotics), open source software contributions and strong collaborations with Google product teams, all aimed at providing useful tools and services. Below, we highlight just some of our efforts from 2018, and we look forward to what will come in the new year. For a more comprehensive look, please see our publications in 2018.

Ethical Principles and AI
Over the past few years, we have observed major advances in AI and the positive impact it can have on our products and the everyday lives of our billions of users. For those of us working in this field, we care deeply that AI is a force for good in the world, and that it is applied ethically, and to problems that are beneficial to society. This year we published the Google AI Principles, supported with a set of responsible AI practices outlining technical recommendations for implementation. In combination they provide a framework for us to evaluate our own development of AI, and we hope that other organizations can also use these principles to help shape their own thinking. It's important to note that because this field is evolving quite rapidly, best practices in some of the principles noted, such as "Avoid creating or reinforcing unfair bias" or "Be accountable to people", are also changing and improving as we and others conduct new research in areas like ML fairness and model interpretability. This research in turn leads to advances in our products to make them more inclusive and less biased, such as our work on reducing gender biases in Google Translate, and allows the exploration and release of more inclusive image datasets and models that enable computer vision to work for the diversity of global cultures. Furthermore, this work allows us to share best practices with the broader research community with the Fairness Module in the Machine Learning Crash Course.

AI for Social Good
The potential of AI to make dramatic impacts on many areas of social and societal importance is clear. One example of how AI can be applied to real-world problems is our work on flood prediction. In collaboration with many teams across Google, this research aims to provide accurate and timely fine-grained information about the likely extent and scope of flooding, enabling those in flood-prone regions to make better decisions about how best to protect themselves and their property.
A second example is our work on earthquake aftershock prediction, where we showed that a machine learning (ML) model can predict aftershock locations much more accurately than traditional physics-based models. Perhaps more importantly, because the ML model was designed to be interpretable, scientists have been able to make new discoveries about the behavior of aftershocks, leading to not only more accurate predictions, but also new levels of understanding.

We have also seen a huge number of external parties, sometimes in collaboration with Google researchers and engineers, using open source software like TensorFlow to tackle a wide range of scientific and social problems, such as using convolutional neural networks to identify humpback whale calls, detecting new exoplanets, identifying diseased cassava plants and more.
To spur creative activity in this area, we announced the Google AI for Social Impact Challenge in collaboration with Google.org, whereby individuals and organizations can receive grants from a total of $25M of funding, along with mentorship and advice from Google research scientists, engineers and other experts as they work to take a project with large potential social impact from idea to reality.

Assistive Technology
Much of our research centered on using ML and computer science to help our users accomplish things faster and more effectively. Often, these results in collaborations with various product teams to release the fruits of this research in various product features and settings. One example is Google Duplex, a system that requires research in natural language and dialogue understanding, speech recognition, text-to-speech, user understanding and effective UI design to all come together to enable an experience whereby a user can say "Can you book me a haircut at 4 PM today?", and a virtual agent will interact on your behalf over the telephone to handle the necessary details.

Other examples include Smart Compose, a tool that uses predictive models to give relevant suggestions about how to compose emails, making the process of email composition faster and easier, and Sound Search, a technology built on the Now Playing feature that enables you to discover what song is playing fast and accurately. Additionally, Smart Linkify in Android shows how we can use an on-device ML model to make many different kinds of text that appear on the screen of your phone more useful by understanding the kind of text you're selecting (e.g. knowing that something is an address, so we can offer a shortcut to a maps or direction link).

An important focus in our research is helping to make products like the Google Assistant support more languages and allow better understanding of semantic similarity, even when very different ways of expressing the same concept or idea are used. Underlying new product capabilities like these is research we performed on improving the quality of both speech synthesis and text-to-speech for languages without much training data available.

Quantum computing
Quantum computing is an emerging paradigm for computing that promises the ability to solve challenging problems that no classical computer can solve. We have been actively pursuing research in this area for the past several years, and we believe the field is on the cusp of demonstrating this capability for at least one problem (so-called quantum supremacy), which will be a watershed event for the field. Over the last year we produced a number of exciting new results, including the development of Bristlecone, a new 72-qubit quantum computing device, which scales the size of problems that can be tackled in quantum computers in the run-up towards quantum supremacy.
A Bristlecone chip being installed by Research Scientist Marissa Giustina at the Quantum AI Lab in Santa Barbara.
We also released Cirq, an open source programming framework for quantum computers, and explored how quantum computers could be used for neural networks. Finally, we shared our experience and techniques for understanding performance fluctuations in quantum processors, and shared some thoughts on how quantum computers might be useful as a computational substrate for neural networks. We're looking forward to exciting results in the quantum computing space in 2019!

Natural Language Understanding
Natural language research at Google had an exciting 2018, with a mix of basic research as well as product-focused collaborations. We developed improvements to our Transformer work from 2017, resulting in a new parallel-in-time version of the model called the Universal Transformer that shows strong gains across a number of natural language tasks including translation and linguistic reasoning. We also developed BERT, the first deeply bidirectional, unsupervised language representation, pre-trained using only a plain text corpus, that can then be fine-tuned on a wide variety of natural language tasks using transfer learning. BERT shows significant improvements over previous state-of-the-art results on 11 natural language tasks.
BERT also improves the state-of-the-art by 7.6% absolute on the very challenging GLUE benchmark, a set of 9 diverse Natural Language Understanding (NLU) tasks.
In addition to collaborating with various research teams to enable Smart Compose and Duplex (discussed previously), we worked to make the Google Assistant handle multilingual use cases better, with the goal of making the Assistant naturally conversational for all users.

Our perception research tackles the hard problems of allowing computers to understand images, sounds, music and video, as well as providing more powerful tools for image capture, compression, processing, creative expression, and augmented reality. In 2018, our technology improved Google Photos' ability to organize the content that users most care about, such as people and pets. Google Lens and the Assistant enabled users to learn about the natural world, answer questions in real-time, and do more with Lens in Google Images. A key aspect of the Google AI mission is to empower others to benefit from our technology, and we've made a lot of progress this year in improving capabilities and building blocks that are parts of Google APIs. Examples include improved and new capabilities in vision and video in Cloud ML APIs and face-related on-device building blocks through ML Kit.
Google Lens can help you learn more about the world around you. Here, Lens identifies the breed of this dog. Learn more in this blog post.
In 2018, our contributions to academic research included advances in deep learning for 3D scene understanding, such as stereo magnification, which enables synthesizing novel photorealistic views of a scene. Our ongoing research on better understanding images and video enables users to find, organize, enhance and improve images and video in Google products such as Photos, YouTube, Search and more. In 2018, notable advances included a fast bottom-up model for joint pose estimation and person instance segmentation, a system for visualizing complex motion, a system which models spatio-temporal relations between people and objects and improvements in video action recognition based on distillation and 3D convolutions.

In the audio domain, we proposed a method for unsupervised learning of semantic audio representations as well as significant improvements to expressive and human-like speech synthesis. Multimodal perception is an increasingly important research topic. Looking to Listen combines visual and auditory cues in an input video to isolate and enhance the speech of desired speakers in a video. This technology could support a range of applications, from speech enhancement and recognition in videos, through video conferencing, to improved hearing aids, especially in situations where multiple people are speaking.

Enabling perception on resource-constrained platforms has becoming increasingly important. MobileNetV2 is Google's next-generation mobile computer vision model and our MobileNets are used widely across academia and industry. MorphNet proposes an efficient method for learning the structure of deep networks that results in across-the-board performance improvements on image and audio models while respecting computational resource constraints, and more recent work on automatic generation of mobile network architectures demonstrates that even higher performance is possible.

Computational Photography
The improvements in quality and versatility of cell phone cameras over the last few years has been nothing short of remarkable. A modest part of this is improvements in the actual physical sensors used in phones, but a much greater part of it is due to advances in the scientific field of computational photography. Our research teams publish their new research techniques, and work closely with the Android and Consumer Hardware teams at Google to deliver this research into your hands in the latest Pixel and Android phones and other devices. In 2014, we introduced HDR+, a technique whereby the camera captures a burst of frames, aligns the frames in software, and merges them together with computational software. Originally in the HDR+ work, this was to enable pictures to have higher dynamic range than was possible with a single exposure. However, capturing a burst of frames and then performing computational analysis of these frames is a general approach that has enabled many advances in cameras in 2018. For example, it allowed the development of Motion Photos in Pixel 2 and the Augmented Reality mode in Motion Stills.
Motion photos on the Pixel 2 in Google Photos. For more examples, check out this Google Photos album.
Augmented chicken family with Motion Stills AR mode.
This year, one of our primary efforts in computational photography research was to create a new capability called Night Sight, which enables Pixel phone cameras to "see in the dark", earning praise by both press and users. Of course, Night Sight is just one of the new software-enabled camera features our teams have developed to help you take the perfect photo, including using ML to provide better portrait mode shots, seeing better and further with Super Res Zoom and capturing special moments with Top Shot and Google Clips.
Left: iPhone XS (full resolution image here). Right: Pixel 3 Night Sight (full resolution image here).
Algorithms and Theory
Algorithms are the backbone of Google systems and touch all our products, from routing algorithms behind Google trips to consistent hashing for Google cloud. Over the past year, we continued our research in algorithms and theory covering a wide range of areas from theoretical foundations to applied algorithms, and from graph mining to privacy-preserving computation. Our work in optimization spans areas from studying continuous optimization for machine learning to distributed combinatorial optimization. In the former area, our work on studying convergence of stochastic optimization algorithms for training neural networks (which won an ICLR 2018 Best Paper Award) exhibited issues with popular gradient-based optimization methods (such as some variants of ADAM), but provided a solid foundation for new gradient-based optimization methods.
Performance comparison of ADAM and AMSGRAD on a synthetic example of a simple one dimensional convex problem inspired by our examples of non-convergence. The first two plots (left and center) are for the online setting and the the last one (right) is for the stochastic setting.
In distributed optimization, we worked to improve the round and communication complexity of well-studied combinatorial optimization problems such as matchings in graphs via round compression and via core-sets, as well as submodular maximization, and k-core decomposition. On the more applied side, we developed algorithmic techniques for solving set cover at scale via sketching and for solving balanced partitioning and hierarchical clustering for graphs with trillions of edges. Our work on online delivery services was nominated for the best paper award at WWW'18. Finally, our open source optimization OR-tools platform won 4 gold medals at the 2018 Minizinc constraint programming competition.

In algorithmic choice theory, we have proposed new models and investigated the problems of reconstruction and learning a mixture of multinomial logits. We also studied the classes of functions learnable by neural networks and how to use machine-learned oracles to improve classic online algorithms.

Understanding learning techniques with strong privacy guarantees is of great importance for us at Google. In this context, we developed two new means of analyzing how differential privacy can be amplified by iteration and by shuffling. We also applied differential privacy techniques to design incentive-aware learning methods that are robust against gaming. Such learning techniques have applications in efficient online market design. Our new research in the area of market algorithms include also techniques to help advertisers test incentive compatibility of ad auctions, and optimizing ad refresh for in-app advertising. We also pushed the boundaries of state-of-the-art dynamic mechanisms for repeated auctions, and presented dynamic auctions that are robust against lack of prediction of future, against noisy forecasts, or against heterogenous buyer behaviour, and extend our results to dynamic double auctions. Finally, in the context of robustness in online optimization and online learning, we developed new online allocation algorithms for stochastic input with traffic spikes and new bandit algorithms robust to corrupted data.

Software Systems
A large part of our research on software systems continues to relate to building machine-learning models and to TensorFlow in particular. For example, we published on the design and implementation of dynamic control flow for TensorFlow 1.0. Some of our newer research introduces a system that we call Mesh TensorFlow, which makes it easy to specify large-scale distributed computations with model parallelism, sometimes with billions of parameters. As another example, we released a library for scalable deep neural ranking using TensorFlow.
The TF-Ranking library supports multi-item scoring architecture, an extension of traditional single-item scoring.
We also released JAX, an accelerator-backed variant of NumPy that supports automatic differentiation of Python functions to arbitrary order. While JAX is not part of TensorFlow, it leverages some of the same underlying software infrastructure (e.g. XLA), and some of its ideas and algorithms have been helpful to our TensorFlow projects. Finally, we continued our research on the security and privacy of machine learning, and our development of open source frameworks for safety and privacy in AI systems, such as CleverHans and TensorFlow Privacy.

Another important research direction for us is the application of ML to software systems, at many levels of the stack. For instance, we continued work on placement of computations onto devices, with a hierarchical model, and we contributed to learning memory access patterns. We also continued to explore how learned indices could be used to replace traditional index structures in database systems and storage systems. As I wrote last year, we believe that we are just scratching the surface in terms of the use of machine learning in computer systems.
The Hierarchical Planner's placement of a NMT (4-layer) model. White denotes CPU and the four colors each represent one of the GPUs. Note that every step of every layer is allocated across multiple GPUs. This placement is 53.7% faster than that generated by a human expert.
In 2018 we learned about Spectre and Meltdown, new classes of serious security vulnerabilities in modern computer processors, thanks to Google's Project Zero team in collaboration with others. These and related vulnerabilities will keep computer architecture researchers quite busy. In our continuing efforts to model CPU behavior, our Compiler Research team integrated their tool for measuring machine instruction latency and port pressure into LLVM, making possible better compilation decisions.

Google products, our Cloud offerings and inference for machine learning models depend critically on the ability to provide large-scale, reliable, efficient technical infrastructure for computing, storage and networking. A few research highlights from the past year include the evolution of Google's Software Defined Networking WAN, a stand-alone, federated query processing platform that executes SQL queries against data stored in different file-based formats, in many storage systems (BigTable, Spanner, Google Spreadsheets, etc.) and a report on our extensive use of code review, investigating the motivations behind code review at Google, current practices, and developers' satisfaction and challenges.

Running a large-scale web service such as content hosting, requires load balancing with stability in a dynamic environment. We developed a consistent hashing scheme with tight provable guarantees on the maximum load of each server, and deployed it for our cloud customers in Google Cloud Pub/Sub. After making an earlier version of our paper available, engineers at Vimeo found the paper, implemented and open sourced it in haproxy, and used it for their load balancing project at Vimeo. The results were dramatic: applying these algorithmic ideas helped them decrease the cache bandwidth by a factor of almost 8, eliminating a scaling bottleneck.

AutoML, also known as meta-learning, is the use of machine learning to automate some aspects of machine learning. We have been performing research in this space for many years, and the long-term goal is to develop learning systems that can learn to take a new problem and solve it automatically, using insights and capabilities derived from other problems that have been previously solved. Our earlier work in this space has mostly used reinforcement learning, but we are also interested in the use of evolutionary algorithms. Last year we showed how evolutionary algorithms can be used to automatically discover state-of-the-art neural network architectures for a variety of visual tasks. We also explored how reinforcement learning can be applied to other problems than just neural network architecture search, showing that it can be used to 1) automatically generate image transformation sequences that improve the accuracy of a wide variety of image models, and 2) find new symbolic optimization expressions that are more effective than the commonly used optimization update rules. Our work on AdaNet showed how to have a fast and flexible AutoML algorithm with learning guarantees.
AdaNet adaptively growing an ensemble of neural networks. At each iteration, it measures the ensemble loss for each candidate, and selects the best one to move onto the next iteration.
Another focus for us was on automatically discovering neural network architectures that are computationally efficient, so that they can run in environments such as mobile phones or autonomous vehicles that have tight constraints on either computational resources or on inference time. For this, we showed that combining the accuracy of a model with its inference computation time in the reward function for a reinforcement learning architecture search can find models that are highly accurate while meeting particular performance constraints. We also explored using ML to learn to automatically compress ML models to have fewer parameters and use less computational resources.

Tensor Processing Units (TPUs) are Google's internally-developed ML hardware accelerators, designed from the ground up to power both training and inference at scale. TPUs have enabled Google research breakthroughs such as BERT (discussed previously), and they also allow researchers around the world to build on Google research via open source and to pursue new breakthroughs of their own. For example, anyone can fine-tune BERT on TPUs for free via Colab, and the TensorFlow Research Cloud has given thousands of researchers the opportunity to benefit from even larger amounts of free Cloud TPU computing power. We've also made multiple generations of TPU hardware commercially available as Cloud TPUs, including ML supercomputers called Cloud TPU Pods that make large-scale ML training much more accessible. Internally, in addition to enabling faster advances in ML research, TPUs have driven major improvements across Google's core products, including Search, YouTube, Gmail, Google Assistant, Google Translate, and many others. We look forward to seeing ML teams both here at Google and elsewhere achieve even more with ML via the unprecedented computing scale that TPUs provide.
An individual TPU v3 device (left) and a portion of a TPU v3 Pod (right). TPU v3 is the latest generation of Google's Tensor Processing Unit (TPU) hardware. Available to external customers as Cloud TPU v3, these systems are liquid-cooled for maximum performance (computer chips + liquid = exciting!), and a full TPU v3 Pod can apply more than 100 petaflops of computational power to the world's largest ML problems.
Open Source Software and Datasets
Releasing open source software and the creation of new public datasets are two major ways that we contribute to the research and software engineering communities. One of our largest efforts in this space is TensorFlow, a widely popular system for ML computations that we released in November 2015. We celebrated TensorFlow's third birthday in 2018, and during this time, TensorFlow has been downloaded more than 30M times, with over 1700 contributors adding 45,000 commits. In 2018, TensorFlow had eight major releases and added major capabilities such as eager execution and distribution strategies. We launched public design reviews engaging the community in the development process, and we engaged contributors via special interest groups. With the launches of associated products such as TensorFlow Lite, TensorFlow.js and TensorFlow Probability, the TensorFlow ecosystem grew dramatically in 2018.

We are happy that TensorFlow has the strongest Github user retention of the top machine learning and deep learning frameworks. The TensorFlow team is also working to address Github issues faster and provide a smooth path for external contributors. In research, we continue to power much of the world's machine learning and deep learning research on a published paper basis according to Google Scholar data. TensorFlow Lite is now on more than 1.5B devices globally after being available for just one year. Additionally, TensorFlow.js is the number one ML framework for JavaScript; in the nine months since launch, it had over 2M Content Delivery Network (CDN) hits, 250K downloads and more than 10,000 stars on Github.

In addition to continued work on existing open source ecosystems, in 2018 we introduced a new framework for flexible and reproducible reinforcement learning, new visualization tools to rapidly understand the characteristics of a dataset (without needing to write any code), added a high-level library for expressing machine learning problems that involve learning-to-rank (the process of ordering a list of items in a way that maximizes the utility of the entire list, applicable across domains that include search engines, recommender systems, machine translation, dialogue systems and even computational biology), released a framework for fast and flexible AutoML solutions with learning guarantees, a library for doing in-browser realtime t-SNE visualizations using TensorFlow.js and added FHIR tools and software for working with electronic healthcare data (discussed in the healthcare section of this post).
Real-time evolution of the tSNE embedding for the complete MNIST dataset. The dataset contains images of 60,000 handwritten digits. You can find a live demo here.
Public datasets are often a great source of inspiration that lead to great progress across many fields, since they give the broader community both access to interesting data and problems as well as a healthy competitive drive to achieve better results on a variety of tasks. This year we were happy to release Google Dataset Search, a new tool for finding public datasets from all of the web. Over the years we have also curated and released many new, novel datasets, including everything from millions of general annotated images or videos, to a crowd-source Bengali dataset for speech recognition to robot arm grasping datasets and more. In 2018, we added even more datasets to that list.
Pictures from India & Singapore added to Open Images Extended using the Crowdsource app.
We released Open Images V4, a dataset containing 15.4M bounding-boxes for 600 categories on 1.9M images, as well as 30.1M human-verified image-level labels from 19,794 categories. We also extended this dataset to add more diversity of people and scenes from all over the world, by adding 5.5M generated annotations provided by tens of thousands of users from all over the world using crowdsource.google.com. We released the Atomic Visual Actions (AVA) dataset that provides audiovisual annotations of video for improving the state of the art in understanding human actions and speech in video. We also announced an updated YouTube-8M, and the 2nd YouTube-8M Large-Scale Video Understanding Challenge and Workshop. The HDR+ Burst Photography Dataset aims to enable a wide variety of research in the field of computational photography, and Google-Landmarks was a new dataset and challenge for landmark recognition. And while not a dataset release, we explored techniques that can enable faster creation of visual datasets using Fluid Annotation, an exploratory ML-powered interface for faster image annotation.
Visualization of the fluid annotation interface in action on image from COCO dataset. Image credit: gamene, original image.
From time-to-time, we also help establish new kinds of challenges for the research community, so that we can all work together on solving difficult research problems. Often these are done with the release of a new dataset, but not always. This year, we established new challenges around the Inclusive Images Challenge, to work towards making more robust models that are free from many kinds of biases, the iNaturalist 2018 Challenge which aims to enable computers' fine-grained discrimination of visual categories (such as species of plants in an image), a Kaggle "Quick, Draw!" Doodle Recognition Challenge to create a better classifier for the QuickDraw challenge game, and Conceptual Captions, a larger-scale image captioning dataset and challenge aimed at enabling better image captioning model research.

In 2018, we made significant progress towards our goal of understanding how ML can teach robots how to act in the world, achieving a new milestone in the ability to teach robots to grasp novel objects (best systems paper at CoRL'18), and using it to learn about objects without human supervision. We've also made progress in learning robot motion by combining ML and sampling-based methods (best paper in service robotics at ICRA'18) and learning robot geometry for faster planning. We've made great strides in our ability to better perceive the structure of the world from autonomous observation. For the first time, we've been able to successfully train deep reinforcement learning models online on real robots, and are finding new, theoretically grounded ways, to learn stable approaches to robot control.
Applications of AI to Other Fields
In 2018, we have applied ML to a wide variety of problems in the physical and biological sciences. Using ML, we can supply scientists with the equivalent of hundreds or thousands of research assistants digging through data, which then frees the scientists to become more creative and productive.

Our Nature Methods paper on high-precision automated reconstruction of neurons proposed a new model that improves the accuracy of automated interpretation of connectomics data by an order of magnitude over previous deep learning techniques.
Our algorithm in action as it traces a single neurite in 3d in a songbird brain.
Some other examples of applying ML to science include:
A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.
For the past several years, we have been applying ML to health, an area that affects every one of us, and is also one where we believe ML can make a tremendous difference by augmenting the intuitions and experience of healthcare professionals. Our general approach in this space is to collaborate with healthcare organizations to tackle basic research problems (using feedback from clinical experts to make our results more robust), and then publish the results in well-respected, peer-reviewed scientific and clinical journals. Once the research has been clinically and scientifically validated, we then conduct user and HCI research to understand how we can deploy this in real-world clinical settings. In 2018, we expanded our efforts across the broad space of computer-aided diagnostics to clinical task predictions as well.

At the end of 2016, we published work showing that a model trained to assess retinal fundus images for signs of diabetic retinopathy was able to perform on-par to slightly-better than U.S. medical-board-certified ophthalmologists at this task in a retrospective study. In 2018, we were able to show that by having the training images labeled by retinal specialists and by using an adjudicated protocol (where multiple retinal specialists convene and have to arrive at a single collective assessment for each fundus image), we could arrive at a model that is on-par with retinal specialists. Later, we published an evaluation that showed how pairing ophthalmologists and this ML model allow them to make more accurate decisions than either alone. We have deployed this diabetic retinopathy detection system in partnership with our Alphabet colleagues at Verily at over 10 sites including Aravind Eye Hospitals in India and at Rajavithi Hospital affiliated with the Ministry of Health in Thailand.
On the left is a retinal fundus image graded as having moderate DR ("Mo") by an adjudication panel of ophthalmologists (ground truth). On the top right is an illustration of the predicted scores ("N" = no DR, "Mi" = Mild DR, "Mo" = Moderate DR) from the model. On the bottom right is the set of scores given by physicians without assistance ("Unassisted") and those who saw the model's predictions ("Grades Only").
In work that medical and eye specialists found quite remarkable, we also published research on a machine learning model that can assess cardiovascular risk from retinal images. This shows early promising signs for a novel, non-invasive biomarker that can help clinicians better understand the health of their patients.

We have also continued our focus on pathology this year, showing how to improve the grading of prostate cancer using ML, detect metastatic breast cancer with deep learning, and developed a prototype for an augmented-reality microscope that can aid pathologists and other scientists by overlaying visual information derived from computer vision models into the visual field of the microscopist in real time.

For the past four years, we have had a significant research effort around using deep learning on electronic health records to make clinically-relevant predictions. In 2018, in collaboration with University of Chicago Medicine, UCSF and Stanford Medicine, we published work in Nature Digital Medicine showing how ML models applied to de-identified electronic medical records can make significantly higher accuracy predictions for a variety of clinically relevant tasks than the current clinical best practice. As part of this work, we developed tools to make it significantly easier to create these models even on quite different tasks and quite different underlying EHR data sets. We have open sourced software related to the Fast Healthcare Interoperability Resources (FHIR) standard that we developed in this work to help make working with medical data easier and more standardized (see this GitHub repository). We also improved the accuracy, speed and utility of our deep learning-based variant caller, DeepVariant. The team has forged ahead with partners and recently published the peer-reviewed paper in Nature Biotechnology.

When applying ML to historically-collected data, it's important to understand the populations that have experienced human and structural biases in the past and how those biases have been codified in the data. Machine-learning offers an opportunity to detect and address bias and to proactively advance health equity, which we are designing our systems to do.

Research Outreach
We interact with the external research community in many different ways, including faculty engagement and student support. We are proud to host hundreds of undergraduate, M.S. and Ph.D. students as interns during the academic year, as well as providing multi-year Ph.D. fellowships to students throughout North America, Europe, and the Middle East. In addition to financial support, each of the fellowship recipients is assigned one or more Google researchers as a mentor, and we bring together all the fellows for an annual Google Ph.D. Fellowship Summit, where they are exposed to state-of-the-art research being pursued at Google and given the opportunity to network with Google's researchers as well as other PhD Fellows from around the world.
Complementing this fellowship program is the Google AI Residency, a way of allowing people who want to learn to conduct deep learning research to spend a year working alongside and being mentored by researchers at Google. Now in its third year, residents are embedded in various teams across Google's global offices, pursuing research in areas such as machine learning, perception, algorithms and optimization, language understanding, healthcare and much more. With applications having just closed for the fourth year of this program, we are excited to see the research the new cohort of residents will pursue in 2019.

Each year, we also support a number of faculty members and students on research projects through our Google Faculty Research Awards program. In 2018, we also continued to host workshops at Google locations for faculty and graduate students in particular areas, including a workshop on AI/ML Research and Practice hosted in our Bangalore, India office, an Algorithms & Optimization Workshop hosted in our Zürich office, a workshop on healthcare applications of ML hosted in Sunnyvale and a workshop on Fairness and Bias in ML hosted in our Cambridge, MA office.

We believe that contributing openly to the broader research community is a critical part of supporting a healthy and productive research ecosystem. In addition to our open source and dataset releases, much of our research is published openly in top conference venues and journals, and we actively participate in the organization and sponsorship of conferences, all across the spectrum of different disciplines. For just a small sample, see our involvement at ICLR 2018, NAACL 2018, ICML 2018, CVPR 2018, NeurIPS 2018, ECCV 2018 and EMNLP 2018. Googlers also participated extensively in ASPLOS, HPCA, ICSE, IEEE Security & Privacy, OSDI, SIGCOMM, and many other conferences in 2018.

New Places, New Faces
In 2018, we were excited to welcome many new people with a wide range of backgrounds into our research organization. We announced our first AI research office in Africa, located in Accra, Ghana. We expanded our AI research presence in Paris, Tokyo and Amsterdam, and opened a research lab in Princeton. We continue to hire talented people into our offices all over the world, and you can learn more about joining our research efforts here.

Looking Forward to 2019
This blog post summarizes just a small fraction of the research performed in 2018. As we look back on 2018, we're excited (and proud!) of the breadth and depth of what we have accomplished. In 2019, we look forward to having even more impact on Google's direction and products, as well as on the broader research and engineering community!

Source: Google AI Blog

Exploring Quantum Neural Networks

Since its inception, the Google AI Quantum team has pushed to understand the role of quantum computing in machine learning. The existence of algorithms with provable advantages for global optimization suggest that quantum computers may be useful for training existing models within machine learning more quickly, and we are building experimental quantum computers to investigate how intricate quantum systems can carry out these computations. While this may prove invaluable, it does not yet touch on the tantalizing idea that quantum computers might be able to provide a way to learn more about complex patterns in physical systems that conventional computers cannot in any reasonable amount of time.

Today we talk about two recent papers from the Google AI Quantum team that make progress towards understanding the power of quantum computers for learning tasks. The first constructs a quantum model of neural networks to investigate how a popular classification task might be carried out on quantum processors. In the second paper, we show how peculiar features of quantum geometry change the strategies for training these networks in comparison to their classical counterparts, and offer guidance towards more robust training of these networks.

In “Classification with Quantum Neural Networks on Near Term Processors”, we construct a model of quantum neural networks (QNNs) that is specifically designed to work on quantum processors that are expected to be available in the near term. While the current work is primarily theoretical, their structure facilitates implementation and testing on quantum computers in the immediate future. These QNNs can be adapted through supervised learning of labeled data, and we show that it is possible to train a QNN to classify images in the famous MNIST dataset. Follow up work in this area with larger quantum devices may pit the ability of quantum networks to learn patterns against popular classical networks.
Quantum Neural Network for classification. Here we depict a sample quantum neural network, where in contrast to hidden layers in classical deep neural networks, the boxes represent entangling actions, or “quantum gates”, on qubits. In a superconducting qubit setup this could be enacted through a microwave control pulse corresponding to each box.
In “Barren Plateaus in Quantum Neural Network Training Landscapes”, we focus on the training of quantum neural networks, and probe questions related to a key difficulty in classical neural networks, which is the problem of vanishing or exploding gradients. In conventional neural networks, a good unbiased initial guess for the neuron weights often involves randomization, although there can be some difficulties as well. Our paper shows that peculiar features of quantum geometry unequivocally prevent this from being a good strategy in the quantum case, instead taking you to barren plateaus. The implications of this work may guide future strategies for initializing and training quantum neural networks.
QNN vanishing gradient: concentration of measure in high dimensional spaces. In very high dimensional spaces, such as those explored by quantum computers, the vast majority of states counterintuitively sit near the equator of the hypersphere (left). This means that any smooth function on this space will tend to take a value very close to its mean with overwhelming probability when selected at random (right).
This research sets the stage for improvements in both the construction and training of quantum neural networks. In particular, experimental realizations of quantum neural networks using hardware at Google will enable rapid exploration of quantum neural networks in the near term. We hope that the insights from the geometry of these states will lead to new algorithms to train these networks that will be essential to unlocking their full potential.

Source: Google AI Blog

The NeurIPS 2018 Test of Time Award: The Trade-Offs of Large Scale Learning

Progress in machine learning (ML) is happening so rapidly, that it can sometimes feel like any idea or algorithm more than 2 years old is already outdated or superseded by something better. However, old ideas sometimes remain relevant even when a large fraction of the scientific community has turned away from them. This is often a question of context: an idea which may seem to be a dead end in a particular context may become wildly successful in a different one. In the specific case of deep learning (DL), the growth of both the availability of data and computing power renewed interest in the area and significantly influenced research directions.

The NIPS 2008 paper “The Trade-Offs of Large Scale Learning” by Léon Bottou (then at NEC Labs, now at Facebook AI Research) and Olivier Bousquet (Google AI, Zürich) is a good example of this phenomenon. As the recent recipient of the NeurIPS 2018 Test of Time Award, this seminal work investigated the interplay between data and computation in ML, showing that if one is limited by computing power but can make use of a large dataset, it is more efficient to perform a small amount of computation on many individual training examples rather than to perform extensive computation on a subset of the data. This demonstrated the power of an old algorithm, stochastic gradient descent, which is nowadays used in pretty much all applications of DL.

Optimization and the Challenge of Scale
Many ML algorithms can be thought of as the combination of two main ingredients:
  • A model, which is a set of possible functions that will be used to fit the data.
  • An optimization algorithm which specifies how to find the best function in that set.
Back in the 90’s the datasets used in ML were much smaller than the ones in use today, and while artificial neural networks had already led to some successes, they were considered hard to train. In the early 2000’s, with the introduction of Kernel Machines (SVMs in particular), neural networks went out of fashion. Simultaneously, the attention shifted away from the optimization algorithms that had been used to train neural networks (stochastic gradient descent) to focus on those used for kernel machines (quadratic programming). One important difference being that in the former case, training examples are used one at a time to perform gradient steps (this is called “stochastic”), while in the latter case, all training examples are used at each iteration (this is called “batch”).

As the size of the training sets increased, the efficiency of optimization algorithms to handle large amounts of data became a bottleneck. For example, in the case of quadratic programming, running time scales at least quadratically in the number of examples. In other words, if you double your training set size, your training will take at least 4 times longer. Hence, lots of effort went into trying to make these algorithms scale to larger training sets (see for example Large Scale Kernel Machines).

People who had experience with training neural networks knew that stochastic gradient descent was comparably easier to scale to large datasets, but unfortunately its convergence is very slow (it takes lots of iterations to reach an accuracy comparable to that of a batch algorithm), so it wasn’t clear that this would be a solution to the scaling problem.

Stochastic Algorithms Scale Better
In the context of ML, the number of iterations needed to optimize the cost function is actually not the main concern: there is no point in perfectly tuning your model since you will essentially “overfit” to the training data. So why not reduce the computational effort that you put into tuning the model and instead spend the effort processing more data?

The work of Léon and Olivier provided a formal study of this phenomenon: by considering access to a large amount of data and assuming the limiting factor is computation, they showed that it is better to perform a minimal amount of computation on each individual training example (thus processing more of them) rather than performing extensive computation on a smaller amount of data.

In doing so, they also demonstrated that among various possible optimization algorithms, stochastic gradient descent is the best. This was confirmed by many experiments and led to a renewed interest in online optimization algorithms which are now in extensive use in ML.

Mysteries Remain
In the following years, many variants of stochastic gradient descent were developed both in the convex case and in the non-convex one (particularly relevant for DL). The most common variant now is the so-called “mini-batch” SGD where one considers a small number (~10-100) of training examples at each iteration, and performs several passes over the training set, with a couple of clever tricks to scale the gradient appropriately. Most ML libraries provide a default implementation of such an algorithm and it is arguably one of the pillars of DL.

While this analysis provided a solid foundation for understanding the properties of this algorithm, the amazing and sometimes surprising successes of DL continue to raise many more questions for the scientific community. In particular, the role of this algorithm in the generalization properties of deep networks has been repeatedly demonstrated but is still poorly understood. This means that a lot of fascinating questions are yet to be explored which could lead to a better understanding of the algorithms currently in use and the development of even more efficient algorithms in the future.

The perspective proposed by Léon and Olivier in their collaboration 10 years ago provided a significant boost to the development of the algorithm that is nowadays the workhorse of ML systems that benefit our lives daily, and we offer our sincere congratulations to both authors on this well-deserved award.

Source: Google AI Blog

Google at NeurIPS 2018

This week, Montréal hosts the 32nd annual Conference on Neural Information Processing Systems (NeurIPS 2018), the biggest machine learning conference of the year. The conference includes invited talks, demonstrations and presentations of some of the latest in machine learning research. Google will have a strong presence at NeurIPS 2018, with more than 400 Googlers attending in order to contribute to, and learn from, the broader academic research community via talks, posters, workshops, competitions and tutorials. We will be presenting work that pushes the boundaries of what is possible in language understanding, translation, speech recognition and visual & audio perception, with Googlers co-authoring nearly 100 accepted papers (see below).

At the forefront of machine learning, Google is actively exploring virtually all aspects of the field spanning both theory and applications. This research is often inspired by real product needs but increasingly more often driven by scientific curiosity. Given the range of research projects that we pursue, we have found it useful to define a new framework which helps crystalize the goals of projects and allows us to measure progress and success in appropriate ways. Our contributions to NeurIPS and to the broader research community in general are integral to our research mission.

If you are attending NeurIPS 2018, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving the world's most challenging research problems, and to see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented in the list below (Googlers highlighted in blue).

Google is a Platinum Sponsor of NeurIPS 2018.

NeurIPS Foundation Board
Corinna Cortes, John C. Platt, Fernando Pereira

NeurIPS Organizing Committee
General Chair: Samy Bengio
Program Co-Chair: Hugo Larochelle
Party Chair: Douglas Eck
Diversity and Inclusion Co-Chair: Katherine A. Heller

NeurIPS Program Committee
Senior Area Chairs include:Angela Yu, Claudio Gentile, Cordelia Schmid, Corinna Cortes, Csaba Szepesvari, Dale Schuurmans, Elad Hazan, Mehryar Mohri, Raia Hadsell, Satyen Kale, Yishay Mansour, Afshin Rostamizadeh, Alex Kulesza

Area Chairs include: Amin Karbasi, Amir Globerson, Amit Daniely, Andras Gyorgy, Andriy Mnih, Been Kim, Branislav Kveton, Ce Liu, D Sculley, Danilo Rezende, Danny TarlowDavid Balduzzi, Denny Zhou, Dilan Gorur, Dumitru Erhan, George Dahl, Graham Taylor, Ian Goodfellow, Jasper Snoek, Jean-Philippe Vert, Jia Deng, Jon Shlens, Karen Simonyan, Kevin Swersky, Kun Zhang, Lihong Li, Marc G. Bellemare, Marco Cuturi, Maya Gupta, Michael BowlingMichalis Titsias, Mohammad Norouzi, Mouhamadou Moustapha Cisse, Nicolas Le Roux, Remi Munos, Sanjiv Kumar, Sanmi Koyejo, Sergey Levine, Silvia Chiappa, Slav PetrovSurya Ganguli, Timnit Gebru, Timothy Lillicrap, Viren Jain, Vitaly Feldman, Vitaly Kuznetsov

Workshops Program Committee includes: Mehryar Mohri, Sergey Levine

Accepted Papers
3D-Aware Scene Manipulation via Inverse Graphics
Shunyu Yao, Tzu Ming Harry Hsu, Jun-Yan Zhu, Jiajun Wu, Antonio Torralba, William T. Freeman, Joshua B. Tenenbaum

A Retrieve-and-Edit Framework for Predicting Structured Outputs
Tatsunori Hashimoto, Kelvin Guu, Yonatan Oren, Percy Liang

Adversarial Attacks on Stochastic Bandits
Kwang-Sung Jun, Lihong Li, Yuzhe Ma, Xiaojin Zhu

Adversarial Examples that Fool both Computer Vision and Time-Limited Humans
Gamaleldin F. Elsayed, Shreya Shankar, Brian Cheung, Nicolas Papernot, Alex Kurakin, Ian Goodfellow, Jascha Sohl-Dickstein

Adversarially Robust Generalization Requires More Data
Ludwig Schmidt, Shibani Santurkar, Dimitris Tsipras, Kunal Talwar, Aleksander Madry

Are GANs Created Equal? A Large-Scale Study
Mario Lucic, Karol Kurach, Marcin Michalski, Olivier Bousquet, Sylvain Gelly

Collaborative Learning for Deep Neural Networks
Guocong Song, Wei Chai

Completing State Representations using Spectral Learning
Nan Jiang, Alex Kulesza, Santinder Singh

Content Preserving Text Generation with Attribute Controls
Lajanugen Logeswaran, Honglak Lee, Samy Bengio

Context-aware Synthesis and Placement of Object Instances
Donghoon Lee, Sifei Liu, Jinwei Gu, Ming-Yu Liu, Ming-Hsuan Yang, Jan Kautz

Co-regularized Alignment for Unsupervised Domain Adaptation
Abhishek Kumar, Prasanna Sattigeri, Kahini Wadhawan, Leonid Karlinsky, Rogerlo Feris, William T. Freeman, Gregory Wornell

cpSGD: Communication-efficient and differentially-private distributed SGD
Naman Agarwal, Ananda Theertha Suresh, Felix Yu, Sanjiv Kumar, H. Brendan Mcmahan

Data Center Cooling Using Model-Predictive Control
Nevena Lazic, Craig Boutilier, Tyler Lu, Eehern Wong, Binz Roy, MK Ryu, Greg Imwalle

Data-Efficient Hierarchical Reinforcement Learning
Ofir Nachum, Shixiang Gu, Honglak Lee, Sergey Levine

Deep Attentive Tracking via Reciprocative Learning
Shi Pu, Yibing Song, Chao Ma, Honggang Zhang, Ming-Hsuan Yang

Generalizing Point Embeddings Using the Wasserstein Space of Elliptical Distributions
Boris Muzellec, Marco Cuturi

GLoMo: Unsupervised Learning of Transferable Relational Graphs
Zhilin Yang, Jake (Junbo) Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun

GroupReduce: Block-Wise Low-Rank Approximation for Neural Language Model Shrinking
Patrick Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh

Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections
Xin Zhang, Armando Solar-Lezama, Rishabh Singh

Learning Hierarchical Semantic Image Manipulation through Structured Representations
Seunghoon Hong, Xinchen Yan, Thomas Huang, Honglak Lee

Learning Temporal Point Processes via Reinforcement Learning
Shuang Li, Shuai Xiao, Shixiang Zhu, Nan Du, Yao Xie, Le Song

Learning Towards Minimum Hyperspherical Energy
Weiyang Liu, Rongmei Lin, Zhen Liu, Lixin Liu, Zhiding Yu, Bo Dai, Le Song

Mesh-TensorFlow: Deep Learning for Supercomputers
Noam Shazeer, Youlong Cheng, Niki Parmar, Dustin Tran, Ashish Vaswani, Penporn Koanantakool, Peter Hawkins, HyoukJoong Lee, Mingsheng Hong, Cliff Young, Ryan Sepassi, Blake Hechtman

MiME: Multilevel Medical Embedding of Electronic Health Records for Predictive Healthcare
Edward Choi, Cao Xiao, Walter F. Stewart, Jimeng Sun

Searching for Efficient Multi-Scale Architectures for Dense Image Prediction
Liang-Chieh Chen, Maxwell D. Collins, Yukun Zhu, George Papandreou, Barret Zoph, Florian Schroff, Hartwig Adam, Jonathon Shlens

SplineNets: Continuous Neural Decision Graphs
Cem Keskin, Shahram Izadi

Task-Driven Convolutional Recurrent Models of the Visual System
Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins

To Trust or Not to Trust a Classifier
Heinrich Jiang, Been Kim, Melody Guan, Maya Gupta

Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu

Algorithms and Theory for Multiple-Source Adaptation
Judy Hoffman, Mehryar Mohri, Ningshan Zhang

A Lyapunov-based Approach to Safe Reinforcement Learning
Yinlam Chow, Ofir Nachum, Edgar Duenez-Guzman, Mohammad Ghavamzadeh

Adaptive Methods for Nonconvex Optimization
Manzil Zaheer, Sashank Reddi, Devendra Sachan, Satyen Kale, Sanjiv Kumar

Assessing Generative Models via Precision and Recall
Mehdi S. M. Sajjadi, Olivier Bachem, Mario Lucic, Olivier Bousquet, Sylvain Gelly

A Loss Framework for Calibrated Anomaly Detection
Aditya Menon, Robert Williamson

Blockwise Parallel Decoding for Deep Autoregressive Models
Mitchell Stern, Noam Shazeer, Jakob Uszkoreit

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation
Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

Contextual Pricing for Lipschitz Buyers
Jieming Mao, Renato Leme, Jon Schneider

Coupled Variational Bayes via Optimization Embedding
Bo Dai, Hanjun Dai, Niao He, Weiyang Liu, Zhen Liu, Jianshu Chen, Lin Xiao, Le Song

Data Amplification: A Unified and Competitive Approach to Property Estimation
Yi HAO, Alon Orlitsky, Ananda Theertha Suresh, Yihong Wu

Deep Network for the Integrated 3D Sensing of Multiple People in Natural Images
Elisabeta Marinoiu, Mihai Zanfir, Alin-Ionut Popa, Cristian Sminchisescu

Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation
Wenqi Ren, Jiawei Zhang, Lin Ma, Jinshan Pan, Xiaochun Cao, Wei Liu, Ming-Hsuan Yang

Diminishing Returns Shape Constraints for Interpretability and Regularization
Maya Gupta, Dara Bahri, Andrew Cotter, Kevin Canini

DropBlock: A Regularization Method for Convolutional Networks
Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le

Generalization Bounds for Uniformly Stable Algorithms
Vitaly Feldman, Jan Vondrak

Geometrically Coupled Monte Carlo Sampling
Mark Rowland, Krzysztof Choromanski, Francois Chalus, Aldo Pacchiano, Tamas Sarlos, Richard E. Turner, Adrian Weller

GILBO: One Metric to Measure Them All
Alexander A. Alemi, Ian Fischer

Insights on Representational Similarity in Neural Networks with Canonical Correlation
Ari S. Morcos, Maithra Raghu, Samy Bengio

Improving Online Algorithms via ML Predictions
Manish Purohit, Zoya Svitkina, Ravi Kumar

Learning to Exploit Stability for 3D Scene Parsing
Yilun Du, Zhijan Liu, Hector Basevi, Ales Leonardis, William T. Freeman, Josh Tenembaum, Jiajun Wu

Maximizing Induced Cardinality Under a Determinantal Point Process
Jennifer Gillenwater, Alex Kulesza, Sergei Vassilvitskii, Zelda Mariet

Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing
Chen Liang, Mohammad Norouzi, Jonathan Berant, Quoc V. Le, Ni Lao

PCA of High Dimensional Random Walks with Comparison to Neural Network Training
Joseph M. Antognini, Jascha Sohl-Dickstein

Predictive Approximate Bayesian Computation via Saddle Points
Yingxiang Yang, Bo Dai, Negar Kiyavash, Niao He

Recurrent World Models Facilitate Policy Evolution
David Ha, Jürgen Schmidhuber

Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, Michael Muelly, Ian Goodfellow, Moritz Hardt, Been Kim

Simple, Distributed, and Accelerated Probabilistic Programming
Dustin Tran, Matthew Hoffman, Dave Moore, Christopher Suter, Srinivas Vasudevan, Alexey Radul, Matthew Johnson, Rif A. Saurous

Tangent: Automatic Differentiation Using Source-Code Transformation for Dynamically Typed Array Programming
Bart van Merriënboer, Dan Moldovan, Alex Wiltschko

The Emergence of Multiple Retinal Cell Types Through Efficient Coding of Natural Movies
Samuel A. Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny

The Everlasting Database: Statistical Validity at a Fair Price
Blake Woodworth, Vitaly Feldman, Saharon Rosset, Nathan Srebro

The Spectrum of the Fisher Information Matrix of a Single-Hidden-Layer Neural Network
Jeffrey Pennington, Pratik Worah

A Simple Unified Framework for Detecting Out-of-Distribution Samples and Adversarial Attacks
Kimin Lee, Kibok Lee, Honglak Lee, Jinwoo Shin

Autoconj: Recognizing and Exploiting Conjugacy Without a Domain-Specific Language
Matthew D. Hoffman, Matthew Johnson, Dustin Tran

A Bayesian Nonparametric View on Count-Min Sketch
Diana Cai, Michael Mitzenmacher, Ryan Adams (no longer at Google)

Automatic Differentiation in ML: Where We are and Where We Should be Going
Bart van Merriënboer, Olivier Breuleux, Arnaud Bergeron, Pascal Lamblin

Assessing the Scalability of Biologically-Motivated Deep Learning Algorithms and Architectures
Sergey Bartunov, Adam Santoro, Blake A. Richards, Geoffrey E. Hinton, Timothy P. Lillicrap

Deep Generative Models for Distribution-Preserving Lossy Compression
Michael Tschannen, Eirikur Agustsson, Mario Lucic

Deep Structured Prediction with Nonlinear Output Transformations
Colin Graber, Ofer Meshi, Alexander Schwing

Discovery of Latent 3D Keypoints via End-to-end Geometric Reasoning
Supasorn Suwajanakorn, Noah Snavely, Jonathan Tompson, Mohammad Norouzi

Transfer Learning with Neural AutoML
Catherine Wong, Neil Houlsby, Yifeng Lu, Andrea Gesmundo

Efficient Gradient Computation for Structured Output Learning with Rational and Tropical Losses
Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Dmitry Storcheus, Scott Yang

Cooperative neural networks (CoNN): Exploiting prior independence structure for improved classification
Harsh Shrivastava, Eugene Bart, Bob Price, Hanjun Dai, Bo Dai, Srinivas Aluru

Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization
Blake Woodworth, Jialei Wang, Brendan McMahan, Nathan Srebro

Hierarchical Reinforcement Learning for Zero-shot Generalization with Subtask Dependencies
Sungryull Sohn, Junhyuk Oh, Honglak Lee

Human-in-the-Loop Interpretability Prior
Isaac Lage, Andrew Slavin Ross, Been Kim, Samuel J. Gershman, Finale Doshi-Velez

Joint Autoregressive and Hierarchical Priors for Learned Image Compression
David Minnen, Johannes Ballé, George D Toderici

Large-Scale Computation of Means and Clusters for Persistence Diagrams Using Optimal Transport
Théo Lacombe, Steve Oudot, Marco Cuturi

Learning to Reconstruct Shapes from Unseen Classes
Xiuming Zhang, Zhoutong Zhang, Chengkai Zhang, Joshua B. Tenenbaum, William T. Freeman, Jiajun Wu

Large Margin Deep Networks for Classification
Gamaleldin Fathy Elsayed, Dilip Krishnan, Hossein Mobahi, Kevin Regan, Samy Bengio

Mallows Models for Top-k Lists
Flavio Chierichetti, Anirban Dasgupta, Shahrzad Haddadan, Ravi Kumar, Silvio Lattanzi

Meta-Learning MCMC Proposals
Tongzhou Wang, YI WU, Dave Moore, Stuart Russell

Non-delusional Q-Learning and Value-Iteration
Tyler Lu, Dale Schuurmans, Craig Boutilier

Online Learning of Quantum States
Scott Aaronson, Xinyi Chen, Elad Hazan, Satyen Kale, Ashwin Nayak

Online Reciprocal Recommendation with Theoretical Performance Guarantees
Fabio Vitale, Nikos Parotsidis, Claudio Gentile

Optimal Algorithms for Continuous Non-monotone Submodular and DR-Submodular Maximization
Rad Niazadeh, Tim Roughgarden, Joshua R. Wang

Policy Regret in Repeated Games
Raman Arora, Michael Dinitz, Teodor Vanislavov Marinov, Mehryar Mohri

Provable Variational Inference for Constrained Log-Submodular Models
Josip Djolonga, Stefanie Jegelka, Andreas Krause

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms
Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, Ian J. Goodfellow

Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion
Jacob Buckman, Danijar Hafner, George Tucker, Eugene Brevdo, Honglak Lee

Visual Object Networks: Image Generation with Disentangled 3D Representations
JunYan Zhu, Zhoutong Zhang, Chengkai Zhang, Jiajun Wu, Antonio Torralba, Josh Tenenbaum, William T. Freeman

Watch Your Step: Learning Node Embeddings via Graph Attention
Sami Abu-El-Haija, Bryan Perozzi, Rami AlRfou, Alexander Alemi

2nd Workshop on Machine Learning on the Phone and Other Consumer Devices
Co-Chairs include: Sujith Ravi, Wei Chai, Hrishikesh Aradhye

Bayesian Deep Learning
Workshop Organizers include: Kevin Murphy

Continual Learning
Workshop Organizers include: Marc Pickett

The Second Conversational AI Workshop – Today's Practice and Tomorrow's Potential
Workshop Organizers include: Dilek Hakkani-Tur

Visually Grounded Interaction and Language
Workshop Organizers include: Olivier Pietquin

Workshop on Ethical, Social and Governance Issues in AI
Workshop Organizers include: D. Sculley

AI for Social Good
Workshop Program Committee includes: Samuel Greydanus

Black in AI
Workshop Organizers: Mouhamadou Moustapha Cisse, Timnit Gebru
Program Committee: Irwan Bello, Samy Bengio, Ian Goodfellow, Hugo Larochelle, Margaret Mitchell

Interpretability and Robustness in Audio, Speech, and Language
Workshop Organizers include: Ehsan Variani, Bhuvana Ramabhadran

LatinX in AI
Workshop Organizers includes: Pablo Samuel Castro
Program Committee includes: Sergio Guadarrama

Machine Learning for Systems
Workshop Organizers include: Anna Goldie, Azalia Mirhoseini, Kevin Swersky, Milad Hashemi
Program Committee includes: Simon Kornblith, Nicholas Frosst, Amir Yazdanbakhsh, Azade Nazi, James Bradbury, Sharan Narang, Martin Maas, Carlos Villavieja

Queer in AI
Workshop Organizers include: Raphael Gontijo Lopes

Second Workshop on Machine Learning for Creativity and Design
Workshop Organizers include: Jesse Engel, Adam Roberts

Workshop on Security in Machine Learning
Workshop Organizers include: Nicolas Papernot

Visualization for Machine Learning
Fernanda Viégas, Martin Wattenberg

Source: Google AI Blog

Google at EMNLP 2018

This week, the annual conference on Empirical Methods in Natural Language Processing (EMNLP 2018) will be held in Brussels, Belgium. Google will have a strong presence at EMNLP with several of our researchers presenting their research on a diverse set of topics, including language identification, segmentation, semantic parsing and question answering, additionally serving in various levels of organization in the conference. Googlers will also be presenting their papers and participating in the co-located Conference on Computational Natural Language Learning (CoNLL 2018) shared task on multilingual parsing.

In addition to this involvement, we are sharing several new datasets with the academic community that are released with papers published at EMNLP, with the goal of accelerating progress in empirical natural language processing (NLP). These releases are designed to help account for mismatches between the datasets a machine learning model is trained and tested on, and the inputs an NLP system would be asked to handle “in the wild”. All of the datasets we are releasing include realistic, naturally occurring text, and fall into two main categories: 1) challenge sets for well-studied core NLP tasks (part-of-speech tagging, coreference) and 2) datasets to encourage new directions of research on meaning preservation under rephrasings/edits (query well-formedness, split-and-rephrase, atomic edits):
  • Noun-Verb Ambiguity in POS Tagging Dataset: English part-of-speech taggers regularly make egregious errors related to noun-verb ambiguity, despite high accuracies on standard datasets. For example: in “Mark which area you want to distress” several state-of-the-art taggers annotate “Mark” as a noun instead of a verb. We release a new dataset of over 30,000 naturally occurring non-trivial annotated examples of noun-verb ambiguity. Taggers previously indistinguishable from each other have accuracies ranging from 57% to 75% accuracy on this challenge set.
  • Query Wellformedness Dataset: Web search queries are usually “word-salad” style queries with little resemblance to natural language questions (“barack obama height” as opposed to “What is the height of Barack Obama?”). Differentiating a natural language question from a query is of importance to several applications include dialogue. We annotate and release 25,100 queries from the open-source Paralex corpus with ratings on how close they are to well-formed natural language questions.
  • WikiSplit: Split and Rephrase Dataset Extracted from Wikipedia Edits: We extract examples of sentence splits from Wikipedia edits where one sentence gets split into two sentences that together preserve the original meaning of the sentence (E.g. “Street Rod is the first in a series of two games released for the PC and Commodore 64 in 1989.” is split into “Street Rod is the first in a series of two games.” and “It was released for the PC and Commodore 64 in 1989.”) The released corpus contains one million sentence splits with a vocabulary of more than 600,000 words. 
  • WikiAtomicEdits: A Multilingual Corpus of Atomic Wikipedia Edits: Information about how people edit language in Wikipedia can be used to understand the structure of language itself. We pay particular attention to two atomic edits: insertions and deletions that consist of a single contiguous span of text. We extract around 43 million such edits in 8 languages and show that they provide valuable information about entailment and discourse. For example, insertion of “in 1949” adds a prepositional phrase to the sentence “She died there after a long illness” resulting in “She died there in 1949 after a long illness”.
These datasets join the others that Google has recently released, such as Conceptual Captions and GAP Coreference Resolution in addition to our past contributions.

Below is a full list of Google’s involvement and publications being presented at EMNLP and CoNLL (Googlers highlighted in blue). We are particularly happy to announce that the paper “Linguistically-Informed Self-Attention for Semantic Role Labeling” was awarded one of the two Best Long Paper awards. This work was done by our 2017 intern Emma Strubell, Googlers Daniel Andor, David Weiss and Google Faculty Advisor Andrew McCallum. We congratulate these authors, and all other researchers who are presenting their work at the conference.

Area Chairs Include:
Ming-Wei Chang, Marius Pasca, Slav Petrov, Emily Pitler, Meg Mitchell, Taro Watanabe

EMNLP Publications
A Challenge Set and Methods for Noun-Verb Ambiguity
Ali Elkahky, Kellie Webster, Daniel Andor, Emily Pitler

A Fast, Compact, Accurate Model for Language Identification of Codemixed Text
Yuan Zhang, Jason Riesa, Daniel Gillick, Anton Bakalov, Jason Baldridge, David Weiss

AirDialogue: An Environment for Goal-Oriented Dialogue Research
Wei Wei, Quoc Le, Andrew Dai, Jia Li

Content Explorer: Recommending Novel Entities for a Document Writer
Michal Lukasik, Richard Zens

Deep Relevance Ranking using Enhanced Document-Query Interactions
Ryan McDonald, George Brokos, Ion Androutsopoulos

HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William Cohen, Ruslan Salakhutdinov, Christopher D. Manning

Identifying Well-formed Natural Language Questions
Manaal Faruqui, Dipanjan Das

Learning To Split and Rephrase From Wikipedia Edit History
Jan A. Botha, Manaal Faruqui, John Alex, Jason Baldridge, Dipanjan Das

Linguistically-Informed Self-Attention for Semantic Role Labeling
Emma Strubell, Patrick Verga, Daniel Andor, David Weiss, Andrew McCallum

Open Domain Question Answering Using Early Fusion of Knowledge Bases and Text
Haitian Sun, Bhuwan Dhingra, Manzil Zaheer, Kathryn Mazaitis, Ruslan Salakhutdinov, William Cohen

Noise Contrastive Estimation for Conditional Models: Consistency and Statistical Efficiency
Zhuang Ma, Michael Collins

Part-of-Speech Tagging for Code-Switched, Transliterated Texts without Explicit Language Identification
Kelsey Ball, Dan Garrette

Phrase-Indexed Question Answering: A New Challenge for Scalable Document Comprehension
Minjoon Seo, Tom Kwiatkowski, Ankur P. Parikh, Ali Farhadi, Hannaneh Hajishirzi

Policy Shaping and Generalized Update Equations for Semantic Parsing from Denotations
Dipendra Misra, Ming-Wei Chang, Xiaodong He, Wen-tau Yih

Revisiting Character-Based Neural Machine Translation with Capacity and Compression
Colin Cherry, George Foster, Ankur Bapna, Orhan Firat, Wolfgang Macherey

Self-governing neural networks for on-device short text classification
Sujith Ravi, Zornitsa Kozareva

Semi-Supervised Sequence Modeling with Cross-View Training
Kevin Clark, Minh-Thang Luong, Christopher D. Manning, Quoc Le

State-of-the-art Chinese Word Segmentation with Bi-LSTMs
Ji Ma, Kuzman Ganchev, David Weiss

Subgoal Discovery for Hierarchical Dialogue Policy Learning
Da Tang, Xiujun Li, Jianfeng Gao, Chong Wang, Lihong Li, Tony Jebara

SwitchOut: an Efficient Data Augmentation Algorithm for Neural Machine Translation
Xinyi Wang, Hieu Pham, Zihang Dai, Graham Neubig

The Importance of Generation Order in Language Modeling
Nicolas Ford, Daniel Duckworth, Mohammad Norouzi, George Dahl

Training Deeper Neural Machine Translation Models with Transparent Attention
Ankur Bapna, Mia Chen, Orhan Firat, Yuan Cao, Yonghui Wu

Understanding Back-Translation at Scale
Sergey Edunov, Myle Ott, Michael Auli, David Grangier

Unsupervised Natural Language Generation with Denoising Autoencoders
Markus Freitag, Scott Roy

WikiAtomicEdits: A Multilingual Corpus of Wikipedia Edits for Modeling Language and Discourse
Manaal Faruqui, Ellie Pavlick, Ian Tenney, Dipanjan Das

WikiConv: A Corpus of the Complete Conversational History of a Large Online Collaborative Community
Yiqing Hua, Cristian Danescu-Niculescu-Mizil, Dario Taraborelli, Nithum Thain, Jeffery Sorensen, Lucas Dixon

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
Taku Kudo, John Richardson

Universal Sentence Encoder for English
Daniel Cer, Yinfei Yang, Sheng-yi Kong, Nan Hua, Nicole Limtiaco, Rhomni St. John, Noah Constant, Mario Guajardo-Cespedes, Steve Yuan, Chris Tar, Brian Strope, Ray Kurzweil

CoNLL Shared Task
Multilingual Parsing from Raw Text to Universal Dependencies
Slav Petrov, co-organizer

Universal Dependency Parsing with Multi-Treebank Models
Aaron Smith, Bernd Bohnet, Miryam de Lhoneux, Joakim Nivre, Yan Shao, Sara Stymne
(Winner of the Universal POS Tagging and Morphological Tagging subtasks, using the open-sourced Meta-BiLSTM tagger)

CoNLL Publication
Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!
Katharina Kann, Sascha Rothe, Katja Filippova

Source: Google AI Blog

Understanding Performance Fluctuations in Quantum Processors

One area of research the Google AI Quantum team pursues is building quantum processors from superconducting electrical circuits, which are attractive candidates for implementing quantum bits (qubits). While superconducting circuits have demonstrated state-of-the-art performance and extensibility to modest processor sizes comprising tens of qubits, an outstanding challenge is stabilizing their performance, which can fluctuate unpredictably. Although performance fluctuations have been observed in numerous superconducting qubit architectures, their origin isn’t well understood, impeding progress in stabilizing processor performance.

In “Fluctuations of Energy-Relaxation Times in Superconducting Qubits” published in this week’s Physical Review Letters, we use qubits as probes of their environment to show that performance fluctuations are dominated by material defects. This was done by investigating qubits’ energy relaxation times (T1) — a popular performance metric that gives the length of time that it takes for a qubit to undergo energy-relaxation from its excited to ground state — as a function of operating frequency and time.

In measuring T1, we found that some qubit operating frequencies are significantly worse than others, forming energy-relaxation hot-spots (see figure below). Our research suggests that these hot spots are due to material defects, which are themselves quantum systems that can extract energy from qubits when their frequencies overlap (i.e. are “resonant”). Surprisingly, we found that the energy-relaxation hot spots are not static, but “move” on timescales ranging from minutes to hours. From these observations, we concluded that the dynamics of defects’ frequencies into and out of resonance with qubits drives the most significant performance fluctuations.
Left: A quantum processor similar to the one that was used to investigate qubit performance fluctuations. One qubit is highlighted in blue. Right: One qubit’s energy-relaxation time “T1” plotted as a function of it’s operating frequency and time. We see energy-relaxation hotspots, which our data suggest are due to material defects (black arrowheads). The motion of these hotspots into and out-of resonance with the qubit are responsible for the most significant energy-relaxation fluctuations. Note that these data were taken over a frequency band with an above-average density of defects.
These defects — which are typically referred to as two-level-systems (TLS) — are commonly believed to exist at the material interfaces of superconducting circuits. However, even after decades of research, their microscopic origin still puzzles researchers. In addition to clarifying the origin of qubit performance fluctuations, our data shed light on the physics governing defect dynamics, which is an important piece of this puzzle. Interestingly, from thermodynamics arguments we would not expect the defects that we see to exhibit any dynamics at all. Their energies are about one order of magnitude higher than the thermal energy available in our quantum processor, and so they should be “frozen out.” The fact that they are not frozen out suggests their dynamics may be driven by interactions with other defects that have much lower energies and can thus be thermally activated.

The fact that qubits can be used to investigate individual material defects - which are believed to have atomic dimensions, millions of times smaller than our qubits - demonstrates that they are powerful metrological tools. While it’s clear that defect research could help address outstanding problems in materials physics, it’s perhaps surprising that it has direct implications on improving the performance of today’s quantum processors. In fact, defect metrology already informs our processor design and fabrication, and even the mathematical algorithms that we use to avoid defects during quantum processor runtime. We hope this research motivates further work into understanding material defects in superconducting circuits.

Source: Google AI Blog

Improving Connectomics by an Order of Magnitude

The field of connectomics aims to comprehensively map the structure of the neuronal networks that are found in the nervous system, in order to better understand how the brain works. This process requires imaging brain tissue in 3D at nanometer resolution (typically using electron microscopy), and then analyzing the resulting image data to trace the brain’s neurites and identify individual synaptic connections. Due to the high resolution of the imaging, even a cubic millimeter of brain tissue can generate over 1,000 terabytes of data! When combined with the fact that the structures in these images can be extraordinarily subtle and complex, the primary bottleneck in brain mapping has been automating the interpretation of these data, rather than acquisition of the data itself.

Today, in collaboration with colleagues at the Max Planck Institute of Neurobiology, we published “High-Precision Automated Reconstruction of Neurons with Flood-Filling Networks” in Nature Methods, which shows how a new type of recurrent neural network can improve the accuracy of automated interpretation of connectomics data by an order of magnitude over previous deep learning techniques. An open-access version of this work is also available from biorXiv (2017).

3D Image Segmentation with Flood-Filling Networks
Tracing neurites in large-scale electron microscopy data is an example of an image segmentation problem. Traditional algorithms have divided the process into at least two steps: finding boundaries between neurites using an edge detector or a machine-learning classifier, and then grouping together image pixels that are not separated by a boundary using an algorithm like watershed or graph cut. In 2015, we began experimenting with an alternative approach based on recurrent neural networks that unifies these two steps. The algorithm is seeded at a specific pixel location and then iteratively “fills” a region using a recurrent convolutional neural network that predicts which pixels are part of the same object as the seed. Since 2015, we have been working to apply this new approach to large-scale connectomics datasets and rigorously quantify its accuracy.
A flood-filling network segmenting an object in 2d. The yellow dot is the center of the current area of focus; the algorithm expands the segmented region (blue) as it iteratively examines more of the overall image.
Measuring Accuracy via Expected Run Length
Working with our partners at the Max Planck Institute, we devised a metric we call “expected run length” (ERL) that measures the following: given a random point within a random neuron in a 3d image of a brain, how far can we trace the neuron before making some kind of mistake? This is an example of a mean-time-between-failure metric, except that in this case we measure the amount of space between failures rather than the amount of time. For engineers, the appeal of ERL is that it relates a linear, physical path length to the frequency of individual mistakes that are made by an algorithm, and that it can be computed in a straightforward way. For biologists, the appeal is that a particular numerical value of ERL can be related to biologically relevant quantities, such as the average path length of neurons in different parts of the nervous system.
Progress in expected run length (blue line) leading up to the results shared today in Nature Methods. The red line shows progress in the “merge rate,” which measures the frequency with which two separate neurites were erroneously traced as a single object; achieving a very low merge rate is important for enabling efficient strategies for manual identification and correction of the remaining errors in the reconstruction.
Songbird Connectomics
We used ERL to measure our progress on a ground-truth set of neurons within a 1-million cubic micron zebra finch song-bird brain imaged by our collaborators using serial block-face scanning electron microscopy and found that our approach performed much better than previous deep learning pipelines applied to the same dataset.
Our algorithm in action as it traces a single neurite in 3d in a songbird brain.
We segmented every neuron in a small portion of a zebra finch song-bird brain using the new flood-filling network approach, as depicted here:
Reconstruction of a portion of zebra finch brain. Colors denote distinct objects in the segmentation that was automatically generated using a flood-filling network. Gold spheres represent synaptic locations automatically identified using a previously published approach.
By combining these automated results with a small amount of additional human effort required to fix the remaining errors, our collaborators at the Max Planck Institute are now able to study the songbird connectome to derive new insights into how zebra finch birds sing their song and test theories related to how they learn their song.

Next Steps
We will continue to improve connectomics reconstruction technology, with the aim of fully automating synapse-resolution connectomics and contributing to ongoing connectomics projects at the Max Planck Institute and elsewhere. In order to help support the larger research community in developing connectomics techniques, we have also open-sourced the TensorFlow code for the flood-filling network approach, along with WebGL visualization software for 3d datasets that we developed to help us understand and improve our reconstruction results.

We would like to acknowledge core contributions from Tim Blakely, Peter Li, Larry Lindsey, Jeremy Maitin-Shepard, Art Pope and Mike Tyka (Google), as well as Joergen Kornfeld and Winfried Denk (Max Planck Institute).

Source: Google AI Blog

Google at ICML 2018

Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in machine learning research, Google is proud to be a Platinum Sponsor of the thirty-fifth International Conference on Machine Learning (ICML 2018), a premier annual event supported by the International Machine Learning Society taking place this week in Stockholm, Sweden. With over 130 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger ML research community.

If you're attending ICML 2018, we hope you'll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges. Our researchers will also be available to talk about TensorFlow Hub, the latest work from the Magenta project, a Q&A session on the Google AI Residency program and much more. You can also learn more about our research being presented at ICML 2018 in the list below (Googlers highlighted in blue).

ICML 2018 Committees
Board Members include: Andrew McCallumCorinna CortesHugo LarochelleWilliam Cohen
Sponsorship Co-Chair: Ryan Adams

Accepted Publications
Predict and Constrain: Modeling Cardinality in Deep Structured Prediction
Nataly Brukhim, Amir Globerson

Quickshift++: Provably Good Initializations for Sample-Based Mean Shift
Heinrich Jiang, Jennifer Jang, Samory Kpotufe

Learning a Mixture of Two Multinomial Logits
Flavio Chierichetti, Ravi KumarAndrew Tomkins

Structured Evolution with Compact Architectures for Scalable Policy Optimization
Krzysztof Choromanski, Mark Rowland, Vikas Sindhwani, Richard E Turner, Adrian Weller

Fixing a Broken ELBO
Alexander Alemi, Ben Poole, Ian FischerJoshua DillonRif SaurousKevin Murphy

Hierarchical Long-term Video Prediction without Supervision
Nevan Wichers, Ruben Villegas, Dumitru ErhanHonglak Lee

Self-Consistent Trajectory Autoencoder: Hierarchical Reinforcement Learning with Trajectory Embeddings
John Co-Reyes, Yu Xuan Liu, Abhishek Gupta, Benjamin Eysenbach, Pieter Abbeel, Sergey Levine

Well Tempered Lasso
Yuanzhi Li, Yoram Singer

Programmatically Interpretable Reinforcement Learning
Abhinav Verma, Vijayaraghavan Murali, Rishabh Singh, Pushmeet Kohli, Swarat Chaudhuri

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao XiaoYasaman BahriJascha Sohl-DicksteinSamuel SchoenholzJeffrey Pennington

On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Sanjeev Arora, Nadav Cohen, Elad Hazan

Scalable Deletion-Robust Submodular Maximization: Data Summarization with Privacy and Fairness Constraints
Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

Data Summarization at Scale: A Two-Stage Submodular Approach
Marko Mitrovic, Ehsan Kazemi, Morteza Zadimoghaddam, Amin Karbasi

Machine Theory of Mind
Neil Rabinowitz, Frank Perbet, Francis Song, Chiyuan Zhang, S. M. Ali Eslami, Matthew Botvinick

Learning to Optimize Combinatorial Functions
Nir Rosenfeld, Eric Balkanski, Amir Globerson, Yaron Singer

Proportional Allocation: Simple, Distributed, and Diverse Matching with High Entropy
Shipra Agarwal, Morteza ZadimoghaddamVahab Mirrokni

Path Consistency Learning in Tsallis Entropy Regularized MDPs
Yinlam Chow, Ofir NachumMohammad Ghavamzadeh

Efficient Neural Architecture Search via Parameters Sharing
Hieu Pham, Melody Guan, Barret ZophQuoc LeJeff Dean

Adafactor: Adaptive Learning Rates with Sublinear Memory Cost
Noam Shazeer, Mitchell Stern

Learning Memory Access Patterns
Milad HashemiKevin SwerskyJamie Smith, Grant Ayers, Heiner Litz, Jichuan Chang, Christos Kozyrakis, Parthasarathy Ranganathan

SBEED: Convergent Reinforcement Learning with Nonlinear Function Approximation
Bo Dai, Albert Shaw, Lihong Li, Lin Xiao, Niao He, Zhen Liu, Jianshu Chen, Le Song

Scalable Bilinear Pi Learning Using State and Action Features
Yichen Chen, Lihong Li, Mengdi Wang

Distributed Asynchronous Optimization with Unbounded Delays: How Slow Can You Go?
Zhengyuan Zhou, Panayotis Mertikopoulos, Nicholas Bambos, Peter Glynn, Yinyu Ye, Li-Jia Li, Li Fei-Fei

Shampoo: Preconditioned Stochastic Tensor Optimization
Vineet Gupta, Tomer Koren, Yoram Singer

Parallel and Streaming Algorithms for K-Core Decomposition
Hossein Esfandiari, Silvio LattanziVahab Mirrokni

Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Maithra RaghuAlexander Irpan, Jacob Andreas, Bobby Kleinberg, Quoc Le, Jon Kleinberg

Is Generator Conditioning Causally Related to GAN Performance?
Augustus OdenaJacob BuckmanCatherine OlssonTom BrownChristopher OlahColin RaffelIan Goodfellow

The Mirage of Action-Dependent Baselines in Reinforcement Learning
George TuckerSurya Bhupatiraju, Shixiang Gu, Richard E Turner, Zoubin Ghahramani, Sergey Levine

MentorNet: Learning Data-Driven Curriculum for Very Deep Neural Networks on Corrupted Labels
Lu Jiang, Zhengyuan Zhou, Thomas Leung, Li-Jia LiLi Fei-Fei

Loss Decomposition for Fast Learning in Large Output Spaces
En-Hsu Yen, Satyen KaleFelix Xinnan YuDaniel Holtmann-RiceSanjiv Kumar, Pradeep Ravikumar

A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
Adam RobertsJesse EngelColin RaffelCurtis HawthorneDouglas Eck

Smoothed Action Value Functions for Learning Gaussian Policies
Ofir NachumMohammad NorouziGeorge TuckerDale Schuurmans

Fast Decoding in Sequence Models Using Discrete Latent Variables
Lukasz KaiserSamy BengioAurko RoyAshish VaswaniNiki ParmarJakob UszkoreitNoam Shazeer

Accelerating Greedy Coordinate Descent Methods
Haihao Lu, Robert Freund, Vahab Mirrokni

Approximate Leave-One-Out for Fast Parameter Tuning in High Dimensions
Shuaiwen Wang, Wenda Zhou, Haihao Lu, Arian Maleki, Vahab Mirrokni

Image Transformer
Niki Parmar, Ashish Vaswani, Jakob Uszkoreit, Lukasz Kaiser, Noam Shazeer, Alexander Ku, Dustin Tran

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan, Eric Battenberg, Ying Xiao, Yuxuan Wang, Daisy Stanton, Joel Shor, Ron Weiss, Robert Clark, Rif Saurous

Dynamical Isometry and a Mean Field Theory of RNNs: Gating Enables Signal Propagation in Recurrent Neural Networks
Minmin Chen, Jeffrey Pennington,, Samuel Schoenholz

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel ShorYing Xiao, Ye Jia, Fei Ren, Rif Saurous

Constrained Interacting Submodular Groupings
Andrew CotterMahdi Milani FardSeungil YouMaya Gupta, Jeff Bilmes

Reinforcing Adversarial Robustness using Model Confidence Induced by Adversarial Training
Xi Wu, Uyeong Jang, Jiefeng Chen, Lingjiao Chen, Somesh Jha

Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, Rory Sayres

Online Learning with Abstention
Corinna CortesGiulia DeSalvoClaudio GentileMehryar Mohri, Scott Yang

Online Linear Quadratic Control
Alon CohenAvinatan HasidimTomer KorenNevena LazicYishay MansourKunal Talwar

Competitive Caching with Machine Learned Advice
Thodoris Lykouris, Sergei Vassilvitskii

Efficient Neural Audio Synthesis
Nal Kalchbrenner, Erich Elsen, Karen Simonyan, Seb Noury, Norman Casagrande, Edward Lockhart, Florian Stimberg, Aäron van den Oord, Sander Dieleman, Koray Kavukcuoglu

Gradient Descent with Identity Initialization Efficiently Learns Positive Definite Linear Transformations by Deep Residual Networks
Peter Bartlett, Dave Helmbold, Phil Long

Understanding and Simplifying One-Shot Architecture Search
Gabriel BenderPieter-Jan KindermansBarret ZophVijay VasudevanQuoc Le

Approximation Algorithms for Cascading Prediction Models
Matthew Streeter

Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Trieu TrinhAndrew DaiThang LuongQuoc Le

Self-Imitation Learning
Junhyuk Oh, Yijie Guo, Satinder Singh, Honglak Lee

Adaptive Sampled Softmax with Kernel Based Sampling
Guy Blanc, Steffen Rendle

2018 Workshop on Human Interpretability in Machine Learning (WHI)
Organizers: Been Kim, Kush Varshney, Adrian Weller
Invited Speakers include: Fernanda ViégasMartin Wattenberg

Exploration in Reinforcement Learning
Organizers: Ben EysenbachSurya BhupatirajuShane Gu, Junhyuk Oh, Vincent Vanhoucke, Oriol Vinyals, Doina Precup

Theoretical Foundations and Applications of Deep Generative Models
Invited speakers include: Honglak Lee

Source: Google AI Blog

Teaching Uncalibrated Robots to Visually Self-Adapt

People are remarkably proficient at manipulating objects without needing to adjust their viewpoint to a fixed or specific pose. This capability (referred to as visual motor integration) is learned during childhood from manipulating objects in various situations, and governed by a self-adaptation and mistake correction mechanism that uses rich sensory cues and vision as feedback. However, this capability is quite difficult for vision-based controllers in robotics, which until now have been built on a rigid setup for reading visual input data from a fixed mounted camera which should not be moved or repositioned at train and test time. The ability to quickly acquire visual motor control skills under large viewpoint variation would have substantial implications for autonomous robotic systems — for example, this capability would be particularly desirable for robots that can help rescue efforts in emergency or disaster zones.

In “Sim2Real Viewpoint Invariant Visual Servoing by Recurrent Control” presented at CVPR 2018 this week, we study a novel deep network architecture (consisting of two fully convolutional networks and a long short-term memory unit) that learns from a past history of actions and observations to self-calibrate. Using diverse simulated data consisting of demonstrated trajectories and reinforcement learning objectives, our visually-adaptive network is able to control a robotic arm to reach a diverse set of visually-indicated goals, from various viewpoints and independent of camera calibration.
Viewpoint invariant manipulation for visually indicated goal reaching with a physical robotic arm. We learn a single policy that can reach diverse goals from sensory input captured from drastically different camera viewpoints. First row shows the visually indicated goals.

The Challenge
Discovering how the controllable degrees of freedom (DoF) affect visual motion can be ambiguous and underspecified from a single image captured from an unknown viewpoint. Identifying the effect of actions on image-space motion and successfully performing the desired task requires a robust perception system augmented with the ability to maintain a memory of past actions. To be able to tackle this challenging problem, we had to address the following essential questions:
  • How can we make it feasible to provide the right amount of experience for the robot to learn the self-adaptation behavior based on pure visual observations that simulate a lifelong learning paradigm?
  • How can we design a model that integrates robust perception and self-adaptive control such that it can quickly transfer to unseen environments?
To do so, we devised a new manipulation task where a seven-DoF robot arm is provided with an image of an object and is directed to reach that particular goal amongst a set of distractor objects, while viewpoints change drastically from one trial to another. In doing so, we were able to simulate both the learning of complex behaviors and the transfer to unseen environments.
Visually indicated goal reaching task with a physical robotic arm and diverse camera viewpoints.
Harnessing Simulation to Learn Complex Behaviors
Collecting robot experience data is difficult and time-consuming. In a previous post, we showed how to scale up learning skills by distributing the data collection and trials to multiple robots. Although this approach expedited learning, it is still not feasibly extendable to learning complex behaviors such as visual self-calibration, where we need to expose robots to a huge space of various viewpoints. Instead, we opt to learn such complex behavior in simulation where we can collect unlimited robot trials and easily move the camera to various random viewpoints. In addition to fast data collection in simulation, we can also surpass hardware limitations requiring the installation of multiple cameras around a robot.
We use domain randomization technique to learn generalizable policies in simulation.
To learn visually robust features to transfer to unseen environments, we used a technique known as domain randomization (a.k.a. simulation randomization) introduced by Sadeghi & Levine (2017), that enables robots to learn vision-based policies entirely in simulation such that they can generalize to the real world. This technique was shown to work well for various robotic tasks such as indoor navigation, object localization, pick and placing, etc. In addition, to learn complex behaviors like self-calibration, we harnessed the simulation capabilities to generate synthetic demonstrations and combined reinforcement learning objectives to learn a robust controller for the robotic arm.
Viewpoint invariant manipulation for visually indicated goal reaching with a simulated seven-DoF robotic arm. We learn a single policy that can reach diverse goals from sensory input captured from dramatically different camera viewpoints.

Disentangling Perception from Control
To enable fast transfer to unseen environments, we devised a deep neural network that combines perception and control trained end-to-end simultaneously, while also allowing each to be learned independently if needed. This disentanglement between perception and control eases transfer to unseen environments, and makes the model both flexible and efficient in that each of its parts (i.e. 'perception' or 'control') can be independently adapted to new environments with small amounts of data. Additionally, while the control portion of the network was entirely trained by the simulated data, the perception part of our network was complemented by collecting a small amount of static images with object bounding boxes without needing to collect the whole action sequence trajectory with a physical robot. In practice, we fine-tuned the perception part of our network with only 76 object bounding boxes coming from 22 images.
Real-world robot and moving camera setup. First row shows the scene arrangements and the second row shows the visual sensory input to the robot.
Early Results
We tested the visually-adapted version of our network on a physical robot and on real objects with drastically different appearances than the ones used in simulation. Experiments were performed with both one or two objects on a table — “seen objects” (as labeled in the figure below) were used for visual adaptation using small collection of real static images, while “unseen objects” had not been seen during visual adaptation. During the test, the robot arm was directed to reach a visually indicated object from various viewpoints. For the two object experiments the second object was to "fool" the robotic arm. While the simulation-only network has good generalization capability (due to being trained with domain randomization technique), the very small amount of static visual data to visually adapt the controller boosted the performance, due to the flexible architecture of our network.
After adapting the visual features with the small amount of real images, performance was boosted by more than 10%. All used real objects are drastically different from the objects seen in simulation.
We believe that learning online visual self-adaptation is an important and yet challenging problem with the goal of learning generalizable policies for robots that can act in diverse and unstructured real world setup. Our approach can be extended to any sort of automatic self-calibration. See the video below for more information on this work.
This research was conducted by Fereshteh Sadeghi, Alexander Toshev, Eric Jang and Sergey Levine. We would also like to thank Erwin Coumans and Yunfei Bai for providing pybullet, and Vincent Vanhoucke for insightful discussions.

Source: Google AI Blog