Tag Archives: Publications

Toward Generalized Sim-to-Real Transfer for Robot Learning

Reinforcement and imitation learning methods in robotics research can enable autonomous environmental navigation and efficient object manipulation, which in turn opens up a breadth of useful real-life applications. Previous work has demonstrated how robots that learn end-to-end using deep neural networks can reliably and safely interact with the unstructured world around us by comprehending camera observations to take actions and solve tasks. However, while end-to-end learning methods can generalize and scale for complicated robot manipulation tasks, they require hundreds of thousands real world robot training episodes, which can be difficult to obtain. One can attempt to alleviate this constraint by using a simulation of the environment that allows virtual robots to learn more quickly and at scale, but the simulations’ inability to exactly match the real world presents a challenge c ommonly referred to as the sim-to-real gap. One important source of the gap comes from discrepancies between the images rendered in simulation and the real robot camera observations, which then causes the robot to perform poorly in the real world.

To-date, work on bridging this gap has employed a technique called pixel-level domain adaptation, which translates synthetic images to realistic ones at the pixel level. One example of this technique is GraspGAN, which employs a generative adversarial network (GAN), a framework that has been very effective at image generation, to model this transformation between simulated and real images given datasets of each domain. These pseudo-real images correct some sim-to-real gap, so policies learned with simulation execute more successfully on real robots. A limitation for their use in sim-to-real transfer, however, is that because GANs translate images at the pixel-level, multi-pixel features or structures that are necessary for robot task learning may be arbitrarily modified or even removed.

To address the above limitation, and in collaboration with the Everyday Robot Project at X, we introduce two works, RL-CycleGAN and RetinaGAN, that train GANs with robot-specific consistencies — so that they do not arbitrarily modify visual features that are specifically necessary for robot task learning — and thus bridge the visual discrepancy between sim and real. We demonstrate how these consistencies preserve features critical to policy learning, eliminating the need for hand-engineered, task-specific tuning, which in turn allows for this sim-to-real methodology to work flexibly across tasks, domains, and learning algorithms. With RL-CycleGAN, we describe our sim-to-real transfer methodology and demonstrate state-of-the-art performance on real world grasping tasks trained with RL. With RetinaGAN, we extend our approach to include imitation learning with a door opening task.

RL-CycleGAN
In “RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real”, we leverage a variation of CycleGAN for sim-to-real adaptation by ensuring consistency of task-relevant features between real and simulated images. CycleGAN encourages preservation of image contents by ensuring an adapted image transformed back to the original domain is identical to the original image, which is called cycle consistency. To further encourage the adapted images to be useful for robotics, the CycleGAN is jointly trained with a reinforcement learning (RL) robot agent that ensures the robot’s actions are the same given both the original images and those after GAN-adaptation. That is, task-specific features like robot arm or graspable object locations are unaltered, but the GAN may still alter lighting or textural differences between domains that do not affect task-level decisions.

Evaluating RL-CycleGAN
We evaluated RL-CycleGAN on a robotic indiscriminate grasping task. Trained on 580,000 real trials and simulations adapted with RL-CycleGAN, the robot grasps objects with 94% success, surpassing the 89% success rate of the prior state-of-the-art sim-to-real method GraspGAN and the 87% mark using real-only data without simulation. With only 28,000 trials, the RL-CycleGAN method reaches 86%, comparable to the previous baselines with 20x the data. Some examples of the RL-CycleGAN output alongside the simulation images are shown below.

Comparison between simulation images of robot grasping before (left) and after RL-CycleGAN translation (right).

RetinaGAN
While RL-CycleGAN reliably transfers from sim-to-real for the RL domain using task awareness, a natural question arises: can we develop a more flexible sim-to-real transfer technique that applies broadly to different tasks and robot learning techniques?

In “RetinaGAN: An Object-Aware Approach to Sim-to-Real Transfer”, presented at ICRA 2021, we develop such a task-decoupled, algorithm-decoupled GAN approach to sim-to-real transfer by instead focusing on robots’ perception of objects. RetinaGAN enforces strong object-semantic awareness through perception consistency via object detection to predict bounding box locations for all objects on all images. In an ideal sim-to-real model, we expect the object detector to predict the same box locations before and after GAN translation, as objects should not change structurally. RetinaGAN is trained toward this ideal by backpropagation, such that there is consistency in perception of objects both when a) simulated images are transformed from simulation to real and then back to simulation and b) when real images are transformed from real to simulation and then back to real. We find this object-based consistency to be more widely applicable than the task-specific consistency required by RL-CycleGAN.

Diagram of RetinaGAN stages. The simulated image (top left) is transformed by the sim-to-real generator and subsequently by the real-to-sim generator. The real image (bottom left) undergoes the transformation in reverse order. Having separate pipelines that start with the simulated and real images improves the GAN’s performance.

Evaluating RetinaGAN on a Real Robot
Given the goal of building a more flexible sim-to-real transfer technique, we evaluate RetinaGAN in multiple ways to understand for which tasks and under what conditions it accomplishes sim-to-real transfer.

We first apply RetinaGAN to a grasping task. As demonstrated visually below, RetinaGAN emphasizes the translation of realistic object textures, shadows, and lighting, while maintaining the visual quality and saliency of the graspable objects. We couple a pre-trained RetinaGAN model with the distributed reinforcement learning method Q2-Opt to train a vision-based task model for instance grasping. On real robots, this policy grasps object instances with 80% success when trained on a hundred thousand episodes — outperforming prior adaptation methods RL-CycleGAN and CycleGAN (both achieving ~68%) and training without domain adaptation (grey bars below: 19% with sim data, 22% with real data, and 54% with mixed data). This gives us confidence that perception consistency is a valuable strategy for sim-to-real transfer. Further, with just 10,000 training episodes (8% of the data), the RL policy with RetinaGAN grasps with 66% success, demonstrating performance of prior methods with significantly less data.

Evaluation performance of RL policies on instance grasping, trained with various datasets and sim-to-real methods. Low-Data RetinaGAN uses 8% of the real dataset.
The simulated grasping environment (left) is translated to a realistic image (right) using RetinaGAN.

Next, we pair RetinaGAN with a different learning method, behavioral cloning, to open conference room doors given demonstrations by human operators. Using images from both simulated and real demonstrations, we train RetinaGAN to translate the synthetic images to look realistic, bridging the sim-to-real gap. We then train a behavior cloning model to imitate the task-solving actions of the human operators within real and RetinaGAN-adapted sim demonstrations. When evaluating this model by predicting actions to take, the robot enters real conference rooms over 93% of the time, surpassing baselines of 75% and below.

Both of the above images show the same simulation, but RetinaGAN translates simulated door opening images (left) to look more like real robot sensor data (right).
Three examples of the real robot successfully opening conference room doors using the RetinaGAN-trained behavior cloning policy.

Conclusion
This work has demonstrated how additional constraints on GANs may address the visual sim-to-real gap without requiring task-specific tuning; these approaches reach higher real robot success rates with less data collection. RL-CycleGAN translates synthetic images to realistic ones with an RL-consistency loss that automatically preserves task-relevant features. RetinaGAN is an object-aware sim-to-real adaptation technique that transfers robustly across environments and tasks, agnostic to the task learning method. Since RetinaGAN is not trained with any task-specific knowledge, we show how it can be reused for a novel object pushing task. We hope that work on the sim-to-real gap further generalizes toward solving task-agnostic robotic manipulation in unstructured environments.

Acknowledgements
Research into RL-CycleGAN was conducted by Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, and Mohi Khansari. Research into RetinaGAN was conducted by Daniel Ho, Kanishka Rao, Zhuo Xu, Eric Jang, Mohi Khansari, and Yunfei Bai. We’d also like to give special thanks to Ivonne Fajardo, Noah Brown, Benjamin Swanson, Christopher Paguyo, Armando Fuentes, and Sphurti More for overseeing the robot operations. We thank Paul Wohlhart, Konstantinos Bousmalis, Daniel Kappler, Alexander Herzog, Anthony Brohan, Yao Lu, Chad Richards, Vincent Vanhoucke, and Mrinal Kalakrishnan, Max Braun and others in the Robotics at Google team and the Everyday Robot Project for valuable discussions and help.

Source: Google AI Blog


Google at ICLR 2021

The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more.

As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).

Officers and Board Members
Includes: Hugo Larochelle, Tara Sainath

Organizing Committee
Includes: Sanmi Koyejo, Chelsea Finn

Area Chairs
Includes: Abhishek Kumar, Aditya Menon, Aleksandra Faust, Alexey Dosovitskiy, Andrew Cotter, Andrew Dai, Augustus Odena, Been Kim, Behnam Neyshabur, Ben Poole, Bo Dai, Bo Li, Branislav Kveton, Ce Liu, Claudio Gentile, Colin Raffel, Danny Tarlow, David Ha, Dengyong Zhou, Dumitru Erhan, Dustin Tran, Felix Hill, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jascha Sohl-Dickstein, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Kevin Swersky, Marco Cuturi, Mario Lucic, Marlos C. Machado, Mathieu Blondel, Matt Johnson, Matthieu Geist, Mohammad Norouzi, Naman Agarwal, Navdeep Jaitly, Nicolas Le Roux, Niki Parmar, Olivier Bachem, Olivier Pietquin, Philip Long, Quentin Berthet, Razvan Pascanu, Rodolphe Jenatton, Samy Bengio*, Sebastian Nowozin, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Suman Ravuri, Tim Salimans, Vitaly Kuznetsov, William Cohen, Yann Dauphin, Yujia Li

Publications
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (see the blog post)
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Evolving Reinforcement Learning Algorithms (see the blog post)
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

Sharpness-aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models Zirui Wang*, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Mathematical Reasoning via Self-supervised Skip-tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

LambdaNetworks: Modeling Long-Range Interactions without Attention
Irwan Bello

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

LEAF: A Learnable Frontend for Audio Classification (see the blog post)
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Batch Reinforcement Learning Through Continuation Method
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli*, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado*, Pablo Samuel Castro, Marc G Bellemare

Scaling Symbolic Methods Using Gradients for Neural Model Explanation
Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

Primal Wasserstein Imitation Learning (see the blog post)
Robert Dadashi, Leonard Hussenot, Matthieu Geist, Olivier Pietquin

Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Teaching Temporal Logics to Neural Networks
Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner

Shape-Texture Debiased Neural Network Training
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

Rethinking Embedding Coupling in Pre-trained Language Models
Hyung Won Chung, Thibault Fevry*, Henry Tsai, Melvin Johnson, Sebastian Ruder

Overparameterisation and Worst-Case Generalisation: Friend or Foe?
Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Single-Photon Image Classification
Thomas Fischbacher, Luciano Sbaiz

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis*, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

Adaptive Federated Optimization
Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

Open Question Answering over Tables and Text
Wenhu Chen*, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

A Universal Representation Transformer Layer for Few-Shot Image Classification
Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

Tradeoffs in Data Augmentation: An Empirical Study
Raphael Gontijo-Lopes, Sylvia Smullin, Ekin Dogus Cubuk, Ethan Dyer

Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

Rethinking Attention with Performers (see the blog post)
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, Adrian Weller

Teaching with Commentaries
Aniruddh Raghu*, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
Vinay Venkatesh Ramasesh, Ethan Dyer, Maithra Raghu

Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold

The Geometry of Integration in Text Classification RNNs
Kyle Aitken*, Vinay Venkatesh Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L Smith, Benoit Dherin, David Barrett, Soham De

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (see the blog post)
Preetum Nakkiran*, Behnam Neyshabur, Hanie Sedghi

Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma

Latent Skill Planning for Exploration and Transfer
Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
Yuliang Zou*, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen*, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, William Chan

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
Atish Agarwala, Abhimanyu Das, Brendan Juba*, Rina Panigrahy, Vatsal Sharan*, Xin Wang, Qiuyi Zhang

Long Range Arena : A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus Robert Muller

Net-DNF: Effective Deep Modeling of Tabular Data
Liran Katzir, Gal Elidan, Ran El-Yaniv

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
Lucio M. Dery, Yann Dauphin, David Grangier

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Average-Case Acceleration for Bilinear Games and Normal Matrices
Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay*, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Training Independent Subnetworks for Robust Prediction
Marton Havasi*, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran

Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine

TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
Martin Trimmel, Henning Petzka, Cristian Sminchisescu

Mastering Atari with Discrete World Models (see the blog post)
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang*, Manzil Zaheer, Yuan Wang, Amr Ahmed

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

HyperGrid Transformers: Towards A Single Model for Multiple Tasks
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Maruan Al-Shedivat*, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith

A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun*, Shankar Krishnan, Hossein Mobahi

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine

Mathematical Reasoning via Self-Supervised Skip-Tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Lipschitz Recurrent Neural Networks
N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael R Zhang*, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, ziyu wang, Mohammad Norouzi

The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G Bellemare

Monotonic Kronecker-Factored Lattice
William Taylor Bakst, Nobuyuki Morioka, Erez Louidor

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Adversarially Guided Actor-Critic
Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee*, Seunghoon Hong

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N Sainath, Yonghui Wu, Ruoming Pang

Implicit Gradient Regularization
David Barrett, Benoit Dherin

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

Deconstructing the Regularization of BatchNorm
Yann Dauphin, Ekin Dogus Cubuk

C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Evolving Reinforcement Learning Algorithms
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

Control-Aware Representations for Model-based Reinforcement Learning
Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

Evaluations and Methods for Explanation through Robustness Analysis
Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

Learning and Evaluating Representations for Deep One-Class Classification
Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister

No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models
Will Sussman Grathwohl, Jacob Jin Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Neural Thompson Sampling
Weitong ZHANG, Dongruo Zhou, Lihong Li, Quanquan Gu

A Design Space Study for LISTA and Beyond
Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang

i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

Calibration of Neural Networks using Splines
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

Molecule Optimization by Explainable Evolution
Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song

Combining Ensembles and Data Augmentation Can Harm Your Calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Workshops
Science and Engineering of Deep Learning
Speakers and Panelists include: Alex Hanna
Moderator and Advisors include: Emily Denton
Organizers include: Negar Rostemzadeh, Samy Bengio*

Synthetic Data Generation: Quality, Privacy, Bias
Speakers include: Jinsung Yoon, Emily Denton
Program Committee includes: Syed Ashrafulla

Enormous Language Models: Perspectives and Benchmarks
Speakers and Panelists include: Noam Shazeer, Natalie Schluter
Organizers include: Colin Raffel, Adam Roberts, Jascha Sohl-Dickstein, Katherine Lee, William Fedus, Aitor Lewkowycz

The Role of Mathematical Reasoning in General Artificial Intelligence
Speakers and Panelists include: Markus Rabe, Christian Szegedy

Weakly Supervised Learning
Invited Speakers include: Lu Jiang

Learning to Learn
Organizers include: Yevgen Chebotar

Embodied Multimodal Learning (EML)
Invited Speakers includes: Sergey Levine

Distributed and Private Machine Learning
Program Committee includes: Peter Kairouz, Ananda Theertha Suresh

S2D-OLAD: From Shallow to Deep, Overcoming Limited and Adverse Data
Invited Speakers include: Alex Hanna, Hugo Larochelle
Organizers include: Vincent Dumoulin

Responsible AI (RAI)
Speakers include: Been Kim

Energy-Based Models: Current Perspectives, Challenges, and Opportunities
Organizers include: Adji Bousso Dieng, Igor Mordatch

A Roadmap to Never-Ending RL
Invited Session Panelists include: Aleksandra Faust
Program Committee includes: Coline Devin, Karol Hausman, Ben Eysenbach, Ofir Nachum, Ryan Julian, Tianhe Yu, Dumitru Erhan, Marc Pickett, Shixiang Gu

2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios
Program Committee includes: Pablo Samuel Castro

Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
Speakers include: David Ha, Hugo Larochelle
Organizers include: Sara Hooker


* Indicates work done while at Google

Source: Google AI Blog


Google at ICLR 2021

The 9th International Conference on Learning Representations (ICLR 2021), a virtual conference focused on deep learning, kicked off this week, offering conference and workshop tracks that present some of the latest research in deep learning and its applications to areas such as computer vision, computational biology, speech recognition, text understanding, and more.

As a Platinum Sponsor of ICLR 2021, Google will have a strong presence with over 100 accepted publications and participation on organizing committees and in workshops. If you have registered for ICLR 2021, we hope you’ll watch our talks and learn about the work at Google that goes into solving interesting problems for billions of people. Learn more about our research being presented in the list below (Googlers in bold).

Officers and Board Members
Includes: Hugo Larochelle, Tara Sainath

Organizing Committee
Includes: Sanmi Koyejo, Chelsea Finn

Area Chairs
Includes: Abhishek Kumar, Aditya Menon, Aleksandra Faust, Alexey Dosovitskiy, Andrew Cotter, Andrew Dai, Augustus Odena, Been Kim, Behnam Neyshabur, Ben Poole, Bo Dai, Bo Li, Branislav Kveton, Ce Liu, Claudio Gentile, Colin Raffel, Danny Tarlow, David Ha, Dengyong Zhou, Dumitru Erhan, Dustin Tran, Felix Hill, George Tucker, Hanie Sedghi, Heinrich Jiang, Hossein Mobahi, Izhak Shafran, Jascha Sohl-Dickstein, Jasper Snoek, Jean-Philippe Vert, Jeffrey Pennington, Justin Gilmer, Kevin Swersky, Marco Cuturi, Mario Lucic, Marlos C. Machado, Mathieu Blondel, Matt Johnson, Matthieu Geist, Mohammad Norouzi, Naman Agarwal, Navdeep Jaitly, Nicolas Le Roux, Niki Parmar, Olivier Bachem, Olivier Pietquin, Philip Long, Quentin Berthet, Razvan Pascanu, Rodolphe Jenatton, Samy Bengio*, Sebastian Nowozin, Silvio Lattanzi, Slav Petrov, Srinadh Bhojanapalli, Suman Ravuri, Tim Salimans, Vitaly Kuznetsov, William Cohen, Yann Dauphin, Yujia Li

Publications
Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (see the blog post)
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, Neil Houlsby

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Evolving Reinforcement Learning Algorithms (see the blog post)
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

When Do Curricula Work?
Xiaoxia Wu, Ethan Dyer, Behnam Neyshabur

Sharpness-aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models Zirui Wang*, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Mathematical Reasoning via Self-supervised Skip-tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

LambdaNetworks: Modeling Long-Range Interactions without Attention
Irwan Bello

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

LEAF: A Learnable Frontend for Audio Classification (see the blog post)
Neil Zeghidour, Olivier Teboul, Félix de Chaumont Quitry, Marco Tagliasacchi

Batch Reinforcement Learning Through Continuation Method
Yijie Guo, Shengyu Feng, Nicolas Le Roux, Ed Chi, Honglak Lee, Minmin Chen

Scalable Transfer Learning with Expert Models
Joan Puigcerver, Carlos Riquelme Ruiz, Basil Mustafa, Cedric Renggli*, André Susano Pinto, Sylvain Gelly, Daniel Keysers, Neil Houlsby

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado*, Pablo Samuel Castro, Marc G Bellemare

Scaling Symbolic Methods Using Gradients for Neural Model Explanation
Subham Sekhar Sahoo, Subhashini Venugopalan, Li Li, Rishabh Singh, Patrick Riley

Primal Wasserstein Imitation Learning (see the blog post)
Robert Dadashi, Leonard Hussenot, Matthieu Geist, Olivier Pietquin

Reset-Free Lifelong Learning with Skill-Space Planning
Kevin Lu, Aditya Grover, Pieter Abbeel, Igor Mordatch

Teaching Temporal Logics to Neural Networks
Christopher Hahn, Frederik Schmitt, Jens U. Kreber, Markus Norman Rabe, Bernd Finkbeiner

Shape-Texture Debiased Neural Network Training
Yingwei Li, Qihang Yu, Mingxing Tan, Jieru Mei, Peng Tang, Wei Shen, Alan Yuille, Cihang Xie

Rethinking Embedding Coupling in Pre-trained Language Models
Hyung Won Chung, Thibault Fevry*, Henry Tsai, Melvin Johnson, Sebastian Ruder

Overparameterisation and Worst-Case Generalisation: Friend or Foe?
Aditya Krishna Menon, Ankit Singh Rawat, Sanjiv Kumar

Single-Photon Image Classification
Thomas Fischbacher, Luciano Sbaiz

Into the Wild with AudioScope: Unsupervised Audio-Visual Separation of On-Screen Sounds
Efthymios Tzinis*, Scott Wisdom, Aren Jansen, Shawn Hershey, Tal Remez, Daniel P. W. Ellis, John R. Hershey

Adaptive Federated Optimization
Sashank J. Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečný, Sanjiv Kumar, Hugh Brendan McMahan

Share or Not? Learning to Schedule Language-Specific Capacity for Multilingual Translation
Biao Zhang*, Ankur Bapna, Rico Sennrich, Orhan Firat

Off-Dynamics Reinforcement Learning: Training for Transfer with Domain Classifiers
Benjamin Eysenbach, Shreyas Chaudhari, Swapnil Asawa, Sergey Levine, Ruslan Salakhutdinov

Open Question Answering over Tables and Text
Wenhu Chen*, Ming-Wei Chang, Eva Schlinger, William Yang Wang, William W. Cohen

Practical Real Time Recurrent Learning with a Sparse Approximation
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

IDF++: Analyzing and Improving Integer Discrete Flows for Lossless Compression
Rianne van den Berg, Alexey A. Gritsenko, Mostafa Dehghani, Casper Kaae Sønderby, Tim Salimans

A Universal Representation Transformer Layer for Few-Shot Image Classification
Lu Liu, William L. Hamilton, Guodong Long, Jing Jiang, Hugo Larochelle

Tradeoffs in Data Augmentation: An Empirical Study
Raphael Gontijo-Lopes, Sylvia Smullin, Ekin Dogus Cubuk, Ethan Dyer

Coping with Label Shift via Distributionally Robust Optimisation
Jingzhao Zhang, Aditya Krishna Menon, Andreas Veit, Srinadh Bhojanapalli, Sanjiv Kumar, Suvrit Sra

Rethinking Attention with Performers (see the blog post)
Krzysztof Marcin Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Quincy Davis, Afroz Mohiuddin, Lukasz Kaiser, David Benjamin Belanger, Lucy J Colwell, Adrian Weller

Teaching with Commentaries
Aniruddh Raghu*, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton

Anatomy of Catastrophic Forgetting: Hidden Representations and Task Semantics
Vinay Venkatesh Ramasesh, Ethan Dyer, Maithra Raghu

Model-Based Offline Planning
Arthur Argenson, Gabriel Dulac-Arnold

The Geometry of Integration in Text Classification RNNs
Kyle Aitken*, Vinay Venkatesh Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

On the Origin of Implicit Regularization in Stochastic Gradient Descent
Samuel L Smith, Benoit Dherin, David Barrett, Soham De

Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song*, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, Ben Poole

The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers (see the blog post)
Preetum Nakkiran*, Behnam Neyshabur, Hanie Sedghi

Learning Energy-Based Models by Diffusion Recovery Likelihood
Ruiqi Gao, Yang Song, Ben Poole, Ying Nian Wu, Diederik P Kingma

Latent Skill Planning for Exploration and Transfer
Kevin Xie, Homanga Bharadhwaj, Danijar Hafner, Animesh Garg, Florian Shkurti

PseudoSeg: Designing Pseudo Labels for Semantic Segmentation
Yuliang Zou*, Zizhao Zhang, Han Zhang, Chun-Liang Li, Xiao Bian, Jia-Bin Huang, Tomas Pfister

WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen*, Yu Zhang, Heiga Zen, Ron J Weiss, Mohammad Norouzi, William Chan

One Network Fits All? Modular versus Monolithic Task Formulations in Neural Networks
Atish Agarwala, Abhimanyu Das, Brendan Juba*, Rina Panigrahy, Vatsal Sharan*, Xin Wang, Qiuyi Zhang

Long Range Arena : A Benchmark for Efficient Transformers
Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler

Explainable Deep One-Class Classification
Philipp Liznerski, Lukas Ruff, Robert A. Vandermeulen, Billy Joe Franks, Marius Kloft, Klaus Robert Muller

Net-DNF: Effective Deep Modeling of Tabular Data
Liran Katzir, Gal Elidan, Ran El-Yaniv

Deployment-Efficient Reinforcement Learning via Model-Based Offline Optimization
Tatsuya Matsushima, Hiroki Furuta, Yutaka Matsuo, Ofir Nachum, Shixiang Gu

Auxiliary Task Update Decomposition: The Good, the Bad and the Neutral
Lucio M. Dery, Yann Dauphin, David Grangier

Long-Tail Learning via Logit Adjustment
Aditya Krishna Menon, Sadeep Jayasumana, Ankit Singh Rawat, Himanshu Jain, Andreas Veit, Sanjiv Kumar

Average-Case Acceleration for Bilinear Games and Normal Matrices
Carles Domingo-Enrich, Fabian Pedregosa, Damien Scieur

OPAL: Offline Primitive Discovery for Accelerating Offline Reinforcement Learning
Anurag Ajay*, Aviral Kumar, Pulkit Agrawal, Sergey Levine, Ofir Nachum

Training Independent Subnetworks for Robust Prediction
Marton Havasi*, Rodolphe Jenatton, Stanislav Fort, Jeremiah Zhe Liu, Jasper Snoek, Balaji Lakshminarayanan, Andrew Mingbo Dai, Dustin Tran

Benchmarks for Deep Off-Policy Evaluation
Justin Fu, Mohammad Norouzi, Ofir Nachum, George Tucker, Ziyu Wang, Alexander Novikov, Mengjiao Yang, Michael R Zhang, Yutian Chen, Aviral Kumar, Cosmin Paduraru, Sergey Levine, Thomas Paine

TropEx: An Algorithm for Extracting Linear Terms in Deep Neural Networks
Martin Trimmel, Henning Petzka, Cristian Sminchisescu

Mastering Atari with Discrete World Models (see the blog post)
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit
Danijar Hafner, Timothy P Lillicrap, Mohammad Norouzi, Jimmy Ba

Graph Traversal with Tensor Functionals: A Meta-Algorithm for Scalable Learning
Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

Anchor & Transform: Learning Sparse Embeddings for Large Vocabularies
Paul Pu Liang*, Manzil Zaheer, Yuan Wang, Amr Ahmed

Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret*, Ariel Kleiner, Hossein Mobahi, Behnam Neyshabur

HyperGrid Transformers: Towards A Single Model for Multiple Tasks
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan

Federated Learning via Posterior Averaging: A New Perspective and Practical Algorithms
Maruan Al-Shedivat*, Jennifer Gillenwater, Eric Xing, Afshin Rostamizadeh

BUSTLE: Bottom-Up Program Synthesis Through Learning-Guided Exploration
Augustus Odena, Kensen Shi, David Bieber, Rishabh Singh, Charles Sutton, Hanjun Dai

Are Neural Rankers Still Outperformed by Gradient Boosted Decision Trees?
Zhen Qin, Le Yan, Honglei Zhuang, Yi Tay, Rama Kumar Pasumarthi, Xuanhui Wang, Michael Bendersky, Marc Najork

Do Wide and Deep Networks Learn the Same Things? Uncovering How Neural Network Representations Vary with Width and Depth
Thao Nguyen, Maithra Raghu, Simon Kornblith

A Unifying View on Implicit Bias in Training Linear Neural Networks
Chulhee Yun*, Shankar Krishnan, Hossein Mobahi

Implicit Under-Parameterization Inhibits Data-Efficient Deep Reinforcement Learning
Aviral Kumar, Rishabh Agarwal, Dibya Ghosh, Sergey Levine

Mathematical Reasoning via Self-Supervised Skip-Tree Training
Markus Norman Rabe, Dennis Lee, Kshitij Bansal, Christian Szegedy

Lipschitz Recurrent Neural Networks
N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

Autoregressive Dynamics Models for Offline Policy Evaluation and Optimization
Michael R Zhang*, Thomas Paine, Ofir Nachum, Cosmin Paduraru, George Tucker, ziyu wang, Mohammad Norouzi

The Importance of Pessimism in Fixed-Dataset Policy Optimization
Jacob Buckman, Carles Gelada, Marc G Bellemare

Monotonic Kronecker-Factored Lattice
William Taylor Bakst, Nobuyuki Morioka, Erez Louidor

What Matters for On-Policy Deep Actor-Critic Methods? A Large-Scale Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, Raphaël Marinier, Leonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem

Adversarially Guided Actor-Critic
Yannis Flet-Berliac, Johan Ferret, Olivier Pietquin, Philippe Preux, Matthieu Geist

Scalable Learning and MAP Inference for Nonsymmetric Determinantal Point Processes
Mike Gartrell, Insu Han, Elvis Dohmatob, Jennifer Gillenwater, Victor-Emmanuel Brunel

GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, Zhifeng Chen

Revisiting Hierarchical Approach for Persistent Long-Term Video Prediction
Wonkwang Lee, Whie Jung, Han Zhang, Ting Chen, Jing Yu Koh, Thomas Huang, Hyungsuk Yoon, Honglak Lee*, Seunghoon Hong

Gradient Vaccine: Investigating and Improving Multi-task Optimization in Massively Multilingual Models
Zirui Wang, Yulia Tsvetkov, Orhan Firat, Yuan Cao

Dataset Meta-Learning from Kernel Ridge-Regression
Timothy Nguyen, Zhourong Chen, Jaehoon Lee

Dual-Mode ASR: Unify and Improve Streaming ASR with Full-Context Modeling
Jiahui Yu, Wei Han, Anmol Gulati, Chung-Cheng Chiu, Bo Li, Tara N Sainath, Yonghui Wu, Ruoming Pang

Implicit Gradient Regularization
David Barrett, Benoit Dherin

Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning
Rishabh Agarwal, Marlos C. Machado, Pablo Samuel Castro, Marc G Bellemare

Deconstructing the Regularization of BatchNorm
Yann Dauphin, Ekin Dogus Cubuk

C-Learning: Learning to Achieve Goals via Recursive Classification
Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Evolving Reinforcement Learning Algorithms
John D Co-Reyes, Yingjie Miao, Daiyi Peng, Esteban Real, Quoc V Le, Sergey Levine, Honglak Lee, Aleksandra Faust

Colorization Transformer
Manoj Kumar, Dirk Weissenborn, Nal Kalchbrenner

Control-Aware Representations for Model-based Reinforcement Learning
Brandon Cui, Yinlam Chow, Mohammad Ghavamzadeh

Evaluations and Methods for Explanation through Robustness Analysis
Cheng-Yu Hsieh, Chih-Kuan Yeh, Xuanqing Liu, Pradeep Kumar Ravikumar, Seungyeon Kim, Sanjiv Kumar, Cho-Jui Hsieh

Learning and Evaluating Representations for Deep One-Class Classification
Kihyuk Sohn, Chun-Liang Li, Jinsung Yoon, Minho Jin, Tomas Pfister

No MCMC for Me: Amortized Sampling for Fast and Stable Training of Energy-Based Models
Will Sussman Grathwohl, Jacob Jin Kelly, Milad Hashemi, Mohammad Norouzi, Kevin Swersky, David Duvenaud

Neural Thompson Sampling
Weitong ZHANG, Dongruo Zhou, Lihong Li, Quanquan Gu

A Design Space Study for LISTA and Beyond
Tianjian Meng, Xiaohan Chen, Yifan Jiang, Zhangyang Wang

i-Mix: A Domain-Agnostic Strategy for Contrastive Representation Learning
Kibok Lee, Yian Zhu, Kihyuk Sohn, Chun-Liang Li, Jinwoo Shin, Honglak Lee

Factorizing Declarative and Procedural Knowledge in Structured, Dynamical Environments
Anirudh Goyal, Alex Lamb, Phanideep Gampa, Philippe Beaudoin, Charles Blundell, Sergey Levine, Yoshua Bengio, Michael Curtis Mozer

Calibration of Neural Networks using Splines
Kartik Gupta, Amir Rahimi, Thalaiyasingam Ajanthan, Thomas Mensink, Cristian Sminchisescu, Richard Hartley

Extreme Memorization via Scale of Initialization
Harsh Mehta, Ashok Cutkosky, Behnam Neyshabur

Molecule Optimization by Explainable Evolution
Binghong Chen, Tianzhe Wang, Chengtao Li, Hanjun Dai, Le Song

Combining Ensembles and Data Augmentation Can Harm Your Calibration
Yeming Wen, Ghassen Jerfel, Rafael Muller, Michael W Dusenberry, Jasper Snoek, Balaji Lakshminarayanan, Dustin Tran

Workshops
Science and Engineering of Deep Learning
Speakers and Panelists include: Alex Hanna
Moderator and Advisors include: Emily Denton
Organizers include: Negar Rostemzadeh, Samy Bengio*

Synthetic Data Generation: Quality, Privacy, Bias
Speakers include: Jinsung Yoon, Emily Denton
Program Committee includes: Syed Ashrafulla

Enormous Language Models: Perspectives and Benchmarks
Speakers and Panelists include: Noam Shazeer, Natalie Schluter
Organizers include: Colin Raffel, Adam Roberts, Jascha Sohl-Dickstein, Katherine Lee, William Fedus, Aitor Lewkowycz

The Role of Mathematical Reasoning in General Artificial Intelligence
Speakers and Panelists include: Markus Rabe, Christian Szegedy

Weakly Supervised Learning
Invited Speakers include: Lu Jiang

Learning to Learn
Organizers include: Yevgen Chebotar

Embodied Multimodal Learning (EML)
Invited Speakers includes: Sergey Levine

Distributed and Private Machine Learning
Program Committee includes: Peter Kairouz, Ananda Theertha Suresh

S2D-OLAD: From Shallow to Deep, Overcoming Limited and Adverse Data
Invited Speakers include: Alex Hanna, Hugo Larochelle
Organizers include: Vincent Dumoulin

Responsible AI (RAI)
Speakers include: Been Kim

Energy-Based Models: Current Perspectives, Challenges, and Opportunities
Organizers include: Adji Bousso Dieng, Igor Mordatch

A Roadmap to Never-Ending RL
Invited Session Panelists include: Aleksandra Faust
Program Committee includes: Coline Devin, Karol Hausman, Ben Eysenbach, Ofir Nachum, Ryan Julian, Tianhe Yu, Dumitru Erhan, Marc Pickett, Shixiang Gu

2nd Workshop on Practical ML for Developing Countries: Learning Under Limited/low Resource Scenarios
Program Committee includes: Pablo Samuel Castro

Beyond Static Papers: Rethinking How We Share Scientific Understanding in ML
Speakers include: David Ha, Hugo Larochelle
Organizers include: Sara Hooker


* Indicates work done while at Google

Source: Google AI Blog


Monster Mash: A Sketch-Based Tool for Casual 3D Modeling and Animation

3D computer animation is a time-consuming and highly technical medium — to complete even a single animated scene requires numerous steps, like modeling, rigging and animating, each of which is itself a sub-discipline that can take years to master. Because of its complexity, 3D animation is generally practiced by teams of skilled specialists and is inaccessible to almost everyone else, despite decades of advances in technology and tools. With the recent development of tools that facilitate game character creation and game balance, a natural question arises: is it possible to democratize the 3D animation process so it’s accessible to everyone?

To explore this concept, we start with the observation that most forms of artistic expression have a casual mode: a classical guitarist might jam without any written music, a trained actor could ad-lib a line or two while rehearsing, and an oil painter can jot down a quick gesture drawing. What these casual modes have in common is that they allow an artist to express a complete thought quickly and intuitively without fear of making a mistake. This turns out to be essential to the creative process — when each sketch is nearly effortless, it is possible to iteratively explore the space of possibilities far more effectively.

In this post, we describe Monster Mash, an open source tool presented at SIGGRAPH Asia 2020 that allows experts and amateurs alike to create rich, expressive, deformable 3D models from scratch — and to animate them — all in a casual mode, without ever having to leave the 2D plane. With Monster Mash, the user sketches out a character, and the software automatically converts it to a soft, deformable 3D model that the user can immediately animate by grabbing parts of it and moving them around in real time. There is also an online demo, where you can try it out for yourself.



Creating a walk cycle using Monster Mash. Step 1: Draw a character. Step 2: Animate it.

Creating a 2D Sketch
The insight that makes this casual sketching approach possible is that many 3D models, particularly those of organic forms, can be described by an ordered set of overlapping 2D regions. This abstraction makes the complex task of 3D modeling much easier: the user creates 2D regions by drawing their outlines, then the algorithm creates a 3D model by stitching the regions together and inflating them. The result is a simple and intuitive user interface for sketching 3D figures.

For example, suppose the user wants to create a 3D model of an elephant. The first step is to draw the body as a closed stroke (a). Then the user adds strokes to depict other body parts such as legs (b). Drawing those additional strokes as open curves provides a hint to the system that they are meant to be smoothly connected with the regions they overlap. The user can also specify that some new parts should go behind the existing ones by drawing them with the right mouse button (c), and mark other parts as symmetrical by double-clicking on them (d). The result is an ordered list of 2D regions.

Steps in creating a 2D sketch of an elephant.

Stitching and Inflation
To understand how a 3D model is created from these 2D regions, let’s look more closely at one part of the elephant. First, the system identifies where the leg must be connected to the body (a) by finding the segment (red) that completes the open curve. The system cuts the body’s front surface along that segment, and then stitches the front of the leg together with the body (b). It then inflates the model into 3D by solving a modified form of Poisson’s equation to produce a surface with a rounded cross-section (c). The resulting model (d) is smooth and well-shaped, but because all of the 3D parts are rooted in the drawing plane, they may intersect each other, resulting in a somewhat odd-looking “elephant”. These intersections will be resolved by the deformation system.

Illustration of the details of the stitching and inflation process. The schematic illustrations (b, c) are cross-sections viewed from the elephant’s front.

Layered Deformation
At this point we just have a static model — we need to give the user an easy way to pose the model, and also separate the intersecting parts somehow. Monster Mash’s layered deformation system, based on the well-known smooth deformation method as-rigid-as-possible (ARAP), solves both of these problems at once. What’s novel about our layered “ARAP-L” approach is that it combines deformation and other constraints into a single optimization framework, allowing these processes to run in parallel at interactive speed, so that the user can manipulate the model in real time.

The framework incorporates a set of layering and equality constraints, which move body parts along the z axis to prevent them from visibly intersecting each other. These constraints are applied only at the silhouettes of overlapping parts, and are dynamically updated each frame.

In steps (d) through (h) above, ARAP-L transforms a model from one with intersecting 3D parts to one with the depth ordering specified by the user. The layering constraints force the leg’s silhouette to stay in front of the body (green), and the body’s silhouette to stay behind the leg (yellow). Equality constraints (red) seal together the loose boundaries between the leg and the body.

Meanwhile, in a separate thread of the framework, we satisfy point constraints to make the model follow user-defined control points (described in the section below) in the xy-plane. This ARAP-L method allows us to combine modeling, rigging, deformation, and animation all into a single process that is much more approachable to the non-specialist user.

The model deforms to match the point constraints (red dots) while the layering constraints prevent the parts from visibly intersecting.

Animation
To pose the model, the user can create control points anywhere on the model’s surface and move them. The deformation system converges over multiple frames, which gives the model’s movement a soft and floppy quality, allowing the user to intuitively grasp its dynamic properties — an essential prerequisite for kinesthetic learning.

Because the effect of deformations converges over multiple frames, our system lends 3D models a soft and dynamic quality.

To create animation, the system records the user’s movements in real time. The user can animate one control point, then play back that movement while recording additional control points. In this way, the user can build up a complex action like a walk by layering animation, one body part at a time. At every stage of the animation process, the only task required of the user is to move points around in 2D, a low-risk workflow meant to encourage experimentation and play.

Conclusion
We believe this new way of creating animation is intuitive and can thus help democratize the field of computer animation, encouraging novices who would normally be unable to try it on their own as well as experts who often require fast iteration under tight deadlines. Here you can see a few of the animated characters that have been created using Monster Mash. Most of these were created in a matter of minutes.

A selection of animated characters created using Monster Mash. The original hand-drawn outline used to create each 3D model is visible as an inset above each character.

All of the code for Monster Mash is available as open source, and you can watch our presentation and read our paper from SIGGRAPH Asia 2020 to learn more. We hope this software will make creating 3D animations more broadly accessible. Try out the online demo and see for yourself!

Acknowledgements
Monster Mash is the result of a collaboration between Google Research, Czech Technical University in Prague, ETH Zürich, and the University of Washington. Key contributors include Marek Dvorožňák, Daniel Sýkora, Cassidy Curtis, Brian Curless, Olga Sorkine-Hornung, and David Salesin. We are also grateful to Hélène Leroux, Neth Nom, David Murphy, Samuel Leather, Pavla Sýkorová, and Jakub Javora for participating in the early interactive sessions.

Source: Google AI Blog


Google Research: Looking Back at 2020, and Forward to 2021

When I joined Google over 20 years ago, we were just figuring out how to really start on the journey of making a high quality and comprehensive search service for information on the web, using lots of curiously wired computers. Fast forward to today, and while we’re taking on a much broader array of technical challenges, it’s still with the same overarching goal of organizing the world's information and making it universally accessible and useful. In 2020, as the world has been reshaped by COVID-19, we saw the ways research-developed technologies could help billions of people better communicate, understand the world, and get things done. I’m proud of what we’ve accomplished, and excited about new possibilities on the horizon.

The goal of Google Research is to work on long-term, ambitious problems across a wide range of important topics — from predicting the spread of COVID-19, to designing algorithms, to learning to translate more and more languages automatically, to mitigating bias in ML models. In the spirit of our annual reviews for 2019, 2018, and more narrowly focused reviews of some work in 2017 and 2016, this post covers key Google Research highlights from this unusual year. This is a long post, but grouped into many different sections. Hopefully, there’s something interesting in here for everyone! For a more comprehensive look, please see our >750 research publications in 2020.

COVID-19 and Health
As the impact of COVID-19 took a tremendous toll on people’s lives, researchers and developers around the world rallied together to develop tools and technologies to help public health officials and policymakers understand and respond to the pandemic. Apple and Google partnered in 2020 to develop the Exposure Notifications System (ENS), a Bluetooth-enabled privacy-preserving technology that allows people to be notified if they have been exposed to others who have tested positive for COVID-19. ENS supplements traditional contact tracing efforts and has been deployed by public health authorities in more than 50 countries, states and regions to help curb the spread of infection.

In the early days of the pandemic, public health officials signalled their need for more comprehensive data to combat the virus’ rapid spread. Our Community Mobility Reports, which provide anonymized insights into movement trends, are helping researchers not only understand the impact of policies like stay-at-home directives and social distancing, and also conduct economic forecasting.

Community Mobility Reports: Navigate and download a report for regions of interest.

Our own researchers have also explored using this anonymized data to forecast COVID-19 spread using graph neural networks instead of traditional time series-based models.

Although the research community knew little about this disease and secondary effects initially, we’re learning more every day. Our COVID-19 Search Trends symptoms allows researchers to explore temporal or symptomatic associations, such as anosmia — the loss of smell that is sometimes a symptom of the virus. To further support the broader research community, we launched Google Health Studies app to provide the public ways to participate in research studies.

Our COVID-19 Search Trends are helping researchers study the link between the disease’s spread and symptom-related searches.

Teams across Google are contributing tools and resources to the broader scientific community, which is working to address the health and economic impacts of the virus.

A spatio-temporal graph for modelling COVID-19 Spread.

Accurate information is critical in dealing with public health threats. We collaborated with many product teams at Google in order to improve information quality about COVID-19 in Google News and Search through supporting fact checking efforts, as well as similar efforts in YouTube.

We helped multilingual communities get equal access to critical COVID-19 information by sponsoring localization of Nextstrain.org’s weekly Situation Reports and developing a COVID-19 open source parallel dataset in collaboration with Translators Without Borders.

Modelling a complex global event is particularly challenging and requires more comprehensive epidemiological datasets, the development of novel interpretable models and agent-based simulators to inform the public health response. Machine learning techniques have also helped in other ways from deploying natural language understanding to helping researchers quickly navigate the mountains of COVID-19 scientific literature, applying anonymization technology to protect privacy while making useful datasets available, and exploring whether public health can conduct faster screening with fewer tests via Bayesian group testing.

These are only a sample of the many pieces of work that happened across Google to help users and public health authorities respond to COVID-19. For more, see using technology to help take on COVID-19.

Research in Machine Learning for Medical Diagnostics
We continue to make headway helping clinicians harness the power of ML to deliver better care for more patients. This year we have described notable advances in applying computer vision to aid doctors in the diagnosis and management of cancer, including helping to make sure that doctors don’t miss potentially cancerous polyps during colonoscopies, and showing that an ML system can achieve substantially higher accuracy than pathologists in Gleason grading of prostate tissue, enabling radiologists to achieve significant reductions in both false negative and false positive results when examining X-rays for signs of breast cancer.

To determine the aggressiveness of prostate cancers, pathologists examine a biopsy and assign it a Gleason grade. In published research, our system was able to grade with higher accuracy than a cohort of pathologists who have not had specialist training in prostate cancer. The first stage of the deep learning system assigns a Gleason grade to every region in a biopsy. In this biopsy, green indicates Gleason pattern 3, while yellow indicates Gleason pattern 4.

We’ve also been working on systems to help identify skin disease, help detect age-related macular degeneration (the leading cause of blindness in the U.S. and U.K., and the third-largest cause of blindness worldwide), and on potential novel non-invasive diagnostics (e.g., being able to detect signs of anemia from retinal images).

Our study examines how a deep learning model can quantify hemoglobin levels — a measure doctors use to detect anemia — from retinal images.

This year has also brought exciting demonstrations of how these same technologies can peer into the human genome. Google’s open-source tool, DeepVariant, identifies genomic variants in sequencing data using a convolutional neural network, and this year won the FDA Challenge for best accuracy in 3 out of 4 categories. Using this same tool, a study led by the Dana-Farber Cancer Institute improved diagnostic yield by 14% for genetic variants that lead to prostate cancer and melanoma in a cohort of 2,367 cancer patients.

Research doesn’t end at measurement of experimental accuracy. Ultimately, truly helping patients receive better care requires understanding how ML tools will affect people in the real world. This year we began work with Mayo Clinic to develop a machine learning system to assist in radiotherapy planning and to better understand how this technology could be deployed into clinical practice. With our partners in Thailand, we’ve used diabetic eye disease screening as a test case in how we can build systems with people at the center, and recognize the fundamental role of diversity, equity, and inclusion in building tools for a healthier world.

Weather, Environment and Climate Change
Machine learning can help us better understand the environment and make useful predictions to help people in both their everyday life as well as in disaster situations. For weather and precipitation forecasting, computationally intensive physics-based models like NOAA’s HRRR have long reigned supreme. We have been able to show, though, that ML-based forecasting systems can predict current precipitation with much better spatial resolution (“Is it raining in my local park in Seattle?” and not just “Is it raining in Seattle?”) and can produce short-term forecasts of up to eight hours that are considerably more accurate than HRRR, and can compute the forecast more quickly, yet with higher temporal and spatial resolution.

A visualization of predictions made over the course of roughly one day. Left: The 1-hour HRRR prediction made at the top of each hour, the limit to how often HRRR provides predictions. Center: The ground truth, i.e., what we are trying to predict. Right: The predictions made by our model. Our predictions are every 2 minutes (displayed here every 15 minutes) at roughly 10 times the spatial resolution made by HRRR. Notice that we capture the general motion and general shape of the storm.

We’ve also developed an improved technique called HydroNets, which uses a network of neural networks to model the actual river systems in the world to more accurately understand the interactions of upstream water levels to downstream inundation, resulting in more accurate water-level predictions and flood forecasting. Using these techniques, we've expanded our coverage of flood alerts by 20x in India and Bangladesh, helping to better protect more than 200 million people in 250,000 square kilometers.

An illustration of the HydroNets architecture.

Better analysis of satellite imagery data can also give Google users a better understanding of the impact and extent of wildfires (which caused devastating effects in California and Australia this year). We showed that automated analysis of satellite imagery can help with rapid assessment of damage after natural disasters even with limited prior satellite imagery. It can also aid urban tree-planting efforts by helping cities assess their current tree canopy coverage and where they should focus on planting new trees. We’ve also shown how machine learning techniques that leverage temporal context can help improve ecological and wildlife monitoring.

Based on this work, we’re excited to partner with NOAA on using AI and ML to amplify NOAA’s environmental monitoring, weather forecasting and climate research using Google Cloud’s infrastructure.

Accessibility
Machine learning continues to provide amazing opportunities for improving accessibility, because it can learn to transfer one kind of sensory input into others. As one example, we released Lookout, an Android application that can help visually impaired users by identifying packaged foods, both in a grocery store and also in their kitchen cupboard at home. The machine learning system behind Lookout demonstrates that a powerful-but-compact machine learning model can accomplish this in real-time on a phone for nearly 2 million products.


Similarly, people who communicate with sign language find it difficult to use video conferencing systems because even if they are signing, they are not detected as actively speaking by audio-based speaker detection systems. Developing Real-Time, Automatic Sign Language Detection for Video Conferencing presents a real-time sign language detection model and demonstrates how it can be used to provide video conferencing systems with a mechanism to identify the person signing as the active speaker.

We also enabled useful Android accessibility capabilities such as Voice Access and Sound Notifications for important household sounds.

Live Caption was expanded to support calls on the Pixel phone with the ability to caption phone calls and video calls. This came out of the Live Relay research project, which enables deaf and hard of hearing people to make calls without assistance.

Applications of ML to Other Fields
Machine learning continues to prove vital in helping us make progress across many fields of science. In 2020, in collaboration with the FlyEM team at HHMI Janelia Research Campus, we released the drosophila hemibrain connectome, the large synapse-resolution map of brain connectivity, reconstructed using large-scale machine learning models applied to high-resolution electron microscope imaging of brain tissue. This connectome information will aid neuroscientists in a wide variety of inquiries, helping us all better understand how brains function. Be sure to check out the very fly interactive 3-D UI!

The application of ML to problems in systems biology is also on the rise. Our Google Accelerated Science team, in collaboration with our colleagues at Calico, have been applying machine learning to yeast, to get a better understanding of how genes work together as a whole system. We’ve also been exploring how to use model-based reinforcement learning in order to design biological sequences like DNA or proteins that have desirable properties for medical or industrial uses. Model-based RL is used to improve sample efficiency. At each round of experimentation the policy is trained offline using a simulator fit on functional measurements from prior rounds. On various tasks like designing DNA transcription factor binding sites, designing antimicrobial proteins, and optimizing the energy of Ising models based on protein structures, we find that model-based RL is an attractive alternative to existing methods.

In partnership with X-Chem Pharmaceuticals and ZebiAI, we have also been developing ML techniques to do “virtual screening” of promising molecular compounds computationally. Previous work in this area has tended to focus on relatively small sets of related compounds, but in this work, we are trying to use DNA-encoded small molecule libraries in order to be able to generalize to find “hits” across a wide swath of chemical space, reducing the need for slower, physical-based lab work in order to progress from idea to working pharmaceutical.

We’ve also seen success applying machine learning to core computer science and computer systems problems, a growing trend that is spawning entire new conferences like MLSys. In Learning-based Memory Allocation for C++ Server Workloads, a neural network-based language model predicts context-sensitive per-allocation site object lifetime information, and then uses this to organize the heap so as to reduce fragmentation. It is able to reduce fragmentation by up to 78% while only using huge pages (which are better for TLB behavior). End-to-End, Transferable Deep RL for Graph Optimization described an end-to-end transferable deep reinforcement learning method for computational graph optimization that shows 33%-60% speedup on three graph optimization tasks compared to TensorFlow default optimization, with 15x faster convergence over prior computation graph optimization methods.

Overview of GO: An end-to-end graph policy network that combines graph embedding and sequential attention.

As described in Chip Design with Deep Reinforcement Learning, we have also been applying reinforcement learning to the problem of place-and-route in computer chip design. This is normally a very time-consuming, labor-intensive process, and is a major reason that going from an idea for a chip to actually having a fully designed and fabricated chip takes so long. Unlike prior methods, our approach has the ability to learn from past experience and improve over time. In particular, as we train over a greater number of chip blocks, our method becomes better at rapidly generating optimized placements for previously unseen chip blocks. The system is able to generate placements that usually outperform those of human chip design experts, and we have been using this system (running on TPUs) to do placement and layout for major portions of future generations of TPUs. Menger is a recent infrastructure we’ve built for large-scale distributed reinforcement learning that is yielding promising performance for difficult RL tasks such as chip design.

Macro placements of Ariane, an open-source RISC-V processor, as training progresses. On the left, the policy is being trained from scratch, and on the right, a pre-trained policy is being fine-tuned for this chip. Each rectangle represents an individual macro placement. Notice how the cavity that is occupied by non-macro logic cells that is discovered by the from-scratch policy is already present from the outset in the pre-trained policy’s placement.

Responsible AI
The Google AI Principles guide our development of advanced technologies. We continue to invest in responsible AI research and tools, update our recommended technical practices in this area, and share regular updates — including a 2020 blog post and report — on our progress in implementation.

To help better understand the behavior of language models, we developed the Language Interpretability Tool (LIT), a toolkit for better interpretability of language models, enabling interactive exploration and analysis of their decisions. We developed techniques for measuring gendered correlations in pre-trained language models and scalable techniques for reducing gender bias in Google Translate. We used the kernel trick to propose a simple method to estimate the influence of a training data example on an individual prediction. To help non-specialists interpret machine learning results, we extended the TCAV technique introduced in 2019 to now provide a complete and sufficient set of concepts. With the original TCAV work, we were able to say that ‘fur’ and ‘long ears’ are important concepts for ‘rabbit’ prediction. With this work, we can also say that these two concepts are enough to fully explain the prediction; you don’t need any other concepts. Concept bottleneck models are a technique to make models more interpretable by training them so that one of the layers is aligned with pre-defined expert concepts (e.g., “bone spurs present”, or “wing color”, as shown below) before making a final prediction for a task, so that we can not only interpret but also turn on/off these concepts on the fly.

Aligning predictions to pre-identified concepts can make models more interpretable, as described in Concept Bottleneck Models.

In collaboration with many other institutions, we also looked into memorization effects of language models, showing that training data extraction attacks are realistic threats on state-of-the-art large language models. This finding along with a result that embedding models can leak information can have significant privacy implications (especially for models trained on private data). In Thieves of Sesame Street: Model Extraction on BERT-based APIs, we demonstrated that attackers with only API access to a language model could create models whose outputs had very high correlation with the original model, even with relatively few API queries to the original model. Subsequent work demonstrated that attackers can extract smaller models with arbitrary accuracy. On the AI Principle of safety we demonstrated that thirteen published defenses to adversarial examples can be circumvented despite attempting to perform evaluations using adaptive attacks. Our work focuses on laying out the methodology and the approach necessary to perform an adaptive attack, and thus will allow the community to make further progress in building more robust models.

Examining the way in which machine learning systems themselves are examined is also an important area of exploration. In collaboration with the Partnership on AI, we defined a framework for how to audit the use of machine learning in software product settings, drawing on lessons from the aerospace, medical devices, and finance industries and their best practices. In joint work with University of Toronto and MIT, we identified several ethical concerns that can arise when auditing the performance of facial recognition systems. In joint work with the University of Washington, we identified some important considerations related to diversity and inclusion when choosing subsets for evaluating algorithmic fairness. As an initial step in making responsible AI work for the next billion users and to help understand if notions of fairness were consistent in different parts of the world, we analyzed and created a framework for algorithmic fairness in India, accounting for datasets, fairness optimizations, infrastructures, and ecosystems

The Model Cards work that was introduced in collaboration with the University of Toronto in 2019 has been growing in influence. Indeed, many well-known models like OpenAI’s GPT-2 and GPT-3, many of Google’s MediaPipe models and various Google Cloud APIs have all adopted Model Cards as a way of giving users of a machine learning model more information about the model’s development and the observed behavior of the model under different conditions. To make this easier for others to adopt for their own machine learning models, we also introduced the Model Card Toolkit for easier model transparency reporting. In order to increase transparency in ML development practices, we demonstrate the applicability of a range of best practices throughout the dataset development lifecycle, including data requirements specification and data acceptance testing.

In collaboration with the U.S. National Science Foundation (NSF), we announced and helped to fund a National AI Research Institute for Human-AI Interaction and Collaboration. We also released the MinDiff framework, a new regularization technique available in the TF Model Remediation library for effectively and efficiently mitigating unfair biases when training ML models, along with ML-fairness gym for building simple simulations that explore potential long-run impacts of deploying machine learning-based decision systems in social environments.

In addition to developing frameworks for fairness, we developed approaches for identifying and improving the health and quality of experiences with Recommender Systems, including using reinforcement learning to introduce safer trajectories. We also continue to work on improving the reliability of our machine learning systems, where we’ve seen that approaches such as generating adversarial examples can improve robustness and that robustness approaches can improve fairness.

Differential privacy is a way to formally quantify privacy protections and requires a rethinking of the most basic algorithms to operate in a way that they do not leak information about any particular individual. In particular, differential privacy can help in addressing memorization effects and information leakage of the kinds mentioned above. In 2020 there were a number of exciting developments, from more efficient ways of computing private empirical risk minimizers to private clustering methods with tight approximation guarantees and private sketching algorithms. We also open sourced the differential privacy libraries that lie at the core of our internal tools, taking extra care to protect against leakage caused by the floating point representation of real numbers. These are the exact same tools that we use to produce differentially private COVID-19 mobility reports that have been a valuable source of anonymous data for researchers and policymakers.

To help developers assess the privacy properties of their classification models we released an ML privacy testing library in Tensorflow. We hope this library will be the starting point of a robust privacy testing suite that can be used by any machine learning developer around the world.

Membership inference attack on models for CIFAR10. The x-axis is the test accuracy of the model, and y-axis is vulnerability score (lower means more private). Vulnerability grows while test accuracy remains the same — better generalization could prevent privacy leakage.

In addition to pushing the state of the art in developing private algorithms, I am excited about the advances we made in weaving privacy into the fabric of our products. One of the best examples is Chrome’s Privacy Sandbox, which changes the underpinnings of the advertising ecosystem and helps systematically protect individuals’ privacy. As part of the project, we proposed and evaluated a number of different APIs, including federated learning of cohorts (FLoC) for interest based targeting, and aggregate APIs for differentially private measurement.

Launched in 2017, federated learning is now a complete research field unto itself, with over 3000 publications on federated learning appearing in 2020 alone. Our cross-institutional Advances and Open Problems in Federated Learning survey paper published in 2019 has been cited 367 times in the past year, and an updated version will soon be published in the Foundations & Trends in Machine Learning series. In July, we hosted a Workshop on Federated Learning and Analytics, and made all research talks and a TensorFlow Federated tutorial publicly available.

The lifecycle of an FL-trained model and the various actors in a federated learning system.

We continue to push the state of the art in federated learning, including the development of new federated optimization algorithms including adaptive learning algorithms, posterior averaging algorithms, and techniques for mimicking centralized algorithms in federated settings, substantial improvements in complimentary cryptographic protocols, and more. We announced and deployed federated analytics, enabling data science over raw data that is stored locally on users’ devices. New uses of federated learning in Google products include contextual emoji suggestions in Gboard, and pioneering privacy-preserving medical research with Google Health Studies. Furthermore, in Privacy Amplification via Random Check-Ins we presented the first privacy accounting mechanism for Federated Learning.

Security for our users is also an area of considerable interest for us. In 2020, we continued to improve protections for Gmail users, by deploying a new ML-based document scanner that provides protection against malicious documents, which increased malicious office document detection by 10% on a daily basis. Thanks to its ability to generalize, this tool has been very effective at blocking some adversarial malware campaigns that elude other detection mechanisms and increased our detection rate by 150% in some cases.

On the account protection side, we released a fully open-source security key firmware to help advance state of art in the two factor authentication space, staying focused on security keys as the best way to protect accounts against phishing.

Natural Language Understanding
Better understanding of language is an area where we saw considerable progress this year. Much of the work in this space from Google and elsewhere now relies on Transformers, a particular style of neural network model originally developed for language problems (but with a growing body of evidence that they are also useful for images, videos, speech, protein folding, and a wide variety of other domains).

One area of excitement is in dialog systems that can chat with a user about something of interest, often encompassing multiple turns of interaction. While successful work in this area to date has involved creating systems that are specialized around particular topics (e.g., Duplex) these systems cannot carry on general conversations. In pursuit of the general research goal of creating systems capable of much more open-ended dialog, in 2020 we described Meena, a learned conversational agent that aspirationally can chat about anything. Meena achieves high scores on a dialog system metric called SSA, which measures both sensibility and specificity of responses. We’ve seen that as we scale up the model size of Meena, it is able to achieve lower perplexity and, as shown in the paper, lower perplexity correlates extremely closely with improved SSA.

A chat between Meena (left) and a person (right).

One well-known issue with generative language models and dialog systems is that when discussing factual data, the model’s capacity may not be large enough to remember every specific detail about a topic, so they generate language that is plausible but incorrect. (This is not unique to machines — people can commit these errors too.) To address this in dialog systems, we are exploring ways to augment a conversational agent by giving it access to external information sources (e.g., a large corpus of documents or a search engine API), and developing learning techniques to use this as an additional resource in order to generate language that is consistent with the retrieved text. Work in this area includes integrating retrieval into language representation models (and a key underlying technology for this to work well is something like ScaNN, an efficient vector similarity search, to efficiently match the desired information to information in the corpus of text). Once appropriate content is found, it can be better understood with approaches like using neural networks to find answers in tables and extracting structured data from templatic documents. Our work on PEGASUS, a state-of-the-art model for abstractive text summarization can also help to create automatic summaries from any piece of text, a general technique useful in conversations, retrieval systems, and many other places.

Efficiency of NLP models has also been a significant focus for our work in 2020. Techniques like transfer learning and multi-task learning can dramatically help with making general NLP models usable for new tasks with modest amounts of computation. Work in this vein includes transfer learning explorations in T5, sparse activation of models (as in our GShard work mentioned below), and more efficient model pre-training with ELECTRA. Several threads of work also look to improve on the basic Transformer architecture, including Reformer, which uses locality-sensitive hashing and reversible computation to more efficiently support much larger attention windows, Performers, which use an approach for attention that scales linearly rather than quadratically (and discusses its use in the context of protein modeling), and ETC and BigBird, which utilize global and sparse random connections, to enable linear scaling for larger and structured sequences. We also explored techniques for creating very lightweight NLP models that are 100x smaller than a larger BERT model, but perform nearly as well for some tasks, making them very suitable for on-device NLP. In Encode, Tag and Realize, we also explored new approaches for generative text models that use edit operations rather than fully general text generation, which can have advantages in computation requirements for generation, more control over the generated text, and require less training data.

Language Translation
Effective language translation helps bring the world closer together by enabling us to all communicate, despite speaking different languages. To date, over a billion people around the world use Google Translate, and last year we added support for five new languages (Kinyarwanda, Odia, Tatar, Turkmen and Uyghur, collectively spoken by 75 million people). Translation quality continues to improve, showing an average +5 BLEU point gain across more than 100 languages from May 2019 to May 2020, through a wide variety of techniques like improved model architectures and training, better handling of noise in datasets, multilingual transfer and multi-task learning, and better use of monolingual data to improve low-resource languages (those without much written public content on the web), directly in line with our goals of improving ML fairness of machine learning systems to provide benefits to the greatest number of people possible.

We strongly believe that continued scaling of multilingual translation models will bring further quality improvements, especially to the billions of speakers of low-resource languages around the world. In GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding, Google researchers showed that training sparsely-activated multilingual translation models of up to 600 billion parameters leads to major improvements in translation quality for 100 languages as measured by BLEU score improvement over a baseline of a separate 400M parameter monolingual baseline model for each language. Three trends stood out in this work, illustrated by Figure 6 in the paper, reproduced below (see the paper for complete discussion):

  • The BLEU score improvements from multilingual training are high for all languages but are even higher for low-resource languages (right hand side of graph is higher than the left) whose speakers represent billions of people in some of the world’s most marginalized communities. Each rectangle on the figure represents languages with 1B speakers.
  • The larger and deeper the model, the larger the BLEU score improvements were across all languages (the lines hardly ever cross).
  • Large, sparse models also show a ~10x to 100x improvement in computational efficiency for model training over training a large, dense model, while simultaneously matching or significantly exceeding the BLEU scores of the large, dense model (computational efficiency discussed in paper).
An illustration of the significant gains in translation quality across 100 languages for large, sparsely-activated language models described in GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding.

We’re actively working on bringing the benefits demonstrated in this GShard research work to Google Translate, as well as training single models that cover 1000 languages, including languages like Dhivehi and Sudanese Arabic (while sharing some challenges that needed solving along the way).

We also developed techniques to create language-agnostic representations of sentences for BERT models, which can help with developing better translation models. To more effectively evaluate translation quality, we introduced BLEURT, a new metric for evaluating language generation for tasks like translation that considers the semantics of the generated text, rather than just the amount of word overlap with ground-truth data, illustrated in the table below.

Machine Learning Algorithms
We continue to develop new machine learning algorithms and approaches for training that enable systems to learn more quickly and from less supervised data. By replaying intermediate results during training of neural networks, we find that we can fill idle time on ML accelerators and therefore can train neural networks faster. By changing the connectivity of neurons dynamically during training, we can find better solutions compared with statically-connected neural networks. We also developed SimCLR, a new self-supervised and semi-supervised learning technique that simultaneously maximizes agreement between differently transformed views of the same image and minimizes agreement between transformed views of different images. This approach significantly improves on the best self-supervised learning techniques.

ImageNet top-1 accuracy of linear classifiers trained on representations learned with different self-supervised methods (pretrained on ImageNet). Gray cross indicates supervised ResNet-50.

We also extended the idea of contrastive learning to the supervised regime, resulting in a loss function that significantly improves over cross-entropy for supervised classification problems.

Reinforcement Learning
Reinforcement learning (RL), which learns to make good long-term decisions from limited experience, has been an important focus area for us. An important challenge in RL is to learn to make decisions from few data points, and we’ve improved RL algorithm efficiency through learning from fixed datasets, learning from other agents, and improving exploration.

A major focus area this year has been around offline RL, which relies solely on fixed, previously collected datasets (for example, from previous experiments or human demonstrations), extending RL to the applications that can’t collect training data on-the-fly. We’ve introduced a duality approach to RL, developed improved algorithms for off-policy evaluation, estimating confidence intervals, and offline policy optimization. In addition, we’re collaborating with the broader community to tackle these problems by releasing open-source benchmark datasets, and DQN dataset for Atari.

Offline RL on Atari games using the DQN Replay Dataset.

Another line of research improved sample efficiency by learning from other agents through apprenticeship learning. We developed methods to learn from informed agents, matching other agent’s distribution, or learning from adversarial examples. To improve the exploration in RL, we explored bonus-based exploration methods including imitation techniques able to mimic structured exploration arising in agents having prior knowledge about their environment.

We’ve also made significant advances in the mathematical theory of reinforcement learning. One of our main areas of research was studying reinforcement learning as an optimization process. We found connections to the Frank-Wolfe algorithm, momentum methods, KL divergence regularization, operator theory, and convergence analysis; some of these insights led to an algorithm that achieves state-of-the-art performance in challenging RL benchmarks and discovery that polynomial transfer functions avoid convergence problems associated with softmax, both in RL and supervised learning. We’ve made some exciting progress on the topic of safe reinforcement learning, where one seeks to discover optimal control rules while respecting important experimental constraints. This includes a framework for safe policy optimization. We studied efficient RL-based algorithms for solving a class of problems known as mean field games, which model systems with a large number of decision-makers, from mobile networks to electric grids.

We’ve made breakthroughs toward generalization to new tasks and environments, an important challenge for scaling up RL to complex real-world problems. A 2020 focus area was population-based learning-to-learn methods, where another RL or evolutionary agent trained a population of RL agents to create a curriculum of emergent complexity, and discover new state-of-the-art RL algorithms. Learning to estimate the importance of data points in the training set and parts of visual input with selective attention resulted in significantly more skillful RL agents.

Overview of our method and illustration of data processing flow in AttentionAgent. Top: Input transformation — A sliding window segments an input image into smaller patches, and then “flattens” them for future processing. Middle: Patch election — The modified self-attention module holds votes between patches to generate a patch importance vector. Bottom: Action generation — AttentionAgent picks the patches of the highest importance, extracts corresponding features and makes decisions based on them.

Further, we made progress in model-based RL by showing that learning predictive behavior models accelerates RL learning, and enables decentralized cooperative multi-agent tasks in diverse teams, and learning long-term behavior models. Observing that skills bring predictable changes in the environment, we discover skills without supervision. Better representations stabilize RL learning, while hierarchical latent spaces and value-improvement paths yield better performance.

We shared open source tools for scaling up and productionizing RL. To expand the scope and problems tackled by users, we’ve introduced SEED, a massively parallel RL agent, released a library for measuring the RL algorithm reliability, and a new version of TF-Agents that includes distributed RL, TPU support, and a full set of bandit algorithms. In addition, we performed a large empirical study of RL algorithms to improve hyperparameter selection and algorithm design.

Finally, in collaboration with Loon, we trained and deployed RL to more efficiently control stratospheric balloons, improving both power usage and their ability to navigate.

AutoML
Using learning algorithms to develop new machine learning techniques and solutions, or meta-learning, is a very active and exciting area of research. In much of our previous work in this area, we’ve created search spaces that look at how to find ways to combine sophisticated hand-designed components together in interesting ways. In AutoML-Zero: Evolving Code that Learns, we took a different approach, by giving an evolutionary algorithm a search space consisting of very primitive operations (like addition, subtraction, variable assignment, and matrix multiplication) in order to see if it was possible to evolve modern ML algorithms from scratch. The presence of useful learning algorithms in this space is incredibly sparse, so it is remarkable that the system was able to progressively evolve more and more sophisticated ML algorithms. As shown in the figure below, the system reinvents many of the most important ML discoveries over the past 30 years, such as linear models, gradient descent, rectified linear units, effective learning rate settings and weight initializations, and gradient normalization.

We also used meta-learning to discover a variety of new efficient architectures for object detection in both still images and videos. Last year’s work on EfficientNet for efficient image classification architectures showed significant accuracy improvements and computational cost reductions for image classification. In follow-on work this year, EfficientDet: Towards Scalable and Efficient Object Detection builds on top of the EfficientNet work to derive new efficient architectures for object detection and localization, showing remarkable improvements in both highest absolute accuracy, as well as computational cost reductions of 13-42x over previous approaches to achieve a given level of accuracy.

EfficientDet achieves state-of-the-art 52.2 mAP, up 1.5 points from the prior state of the art (not shown since it is at 3045B FLOPs) on COCO test-dev under the same setting. Under the same accuracy constraint, EfficientDet models are 4x-9x smaller and use 13x-42x less computation than previous detectors.

Our work on SpineNet describes a meta-learned architecture that can retain spatial information more effectively, allowing detection to be done at finer resolution. We also focused on learning effective architectures for a variety of video classification problems. AssembleNet: Searching for Multi-Stream Neural Connectivity in Video Architectures, AssembleNet++: Assembling Modality Representations via Attention Connections, and AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification demonstrate how to use evolutionary algorithms to create novel state-of-the-art video processing machine learning architectures.

This approach can also be used to develop effective model architectures for time series forecasting. Using AutoML for Time Series Forecasting describes the system that discovers new forecasting models via an automated search over a search space involving many interesting kinds of low-level building blocks, and its effectiveness was demonstrated in the Kaggle M5 Forecasting Competition, by generating an algorithm and system that placed 138th out of 5558 participants (top 2.5%). While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.

Better Understanding of ML Algorithms and Models
Deeper understanding of machine learning algorithms and models is crucial for designing and training more effective models, as well as understanding when models may fail. Last year, we focused on fundamental questions around representation power, optimization, model generalization, and label noise, among others. As mentioned earlier in this post, Transformer networks have had a huge impact on modeling language, speech and vision problems, but what is the class of functions represented by these models? Recently we showed that transformers are universal approximators for sequence-to-sequence functions. Furthermore, sparse transformers also remain universal approximators even when they use just a linear number of interactions among the tokens. We have been developing new optimization techniques based on layerwise adaptive learning rates to improve the convergence speed of transformers, e.g., Large batch optimization for deep learning (LAMB): Training BERT in 76 minutes.

As neural networks are made wider and deeper, they often train faster and generalize better. This is a core mystery in deep learning since classical learning theory suggests that large networks should overfit more. We are working to understand neural networks in this overparameterized regime. In the limit of infinite width, neural networks take on a surprisingly simple form, and are described by a Neural Network Gaussian Process (NNGP) or Neural Tangent Kernel (NTK). We studied this phenomenon theoretically and experimentally, and released Neural Tangents, an open-source software library written in JAX that allows researchers to build and train infinite-width neural networks.

Left: A schematic showing how deep neural networks induce simple input / output maps as they become infinitely wide. Right: As the width of a neural network increases, we see that the distribution of outputs over different random instantiations of the network becomes Gaussian.

As finite width networks are made larger, they also demonstrate peculiar double descent phenomena — where they generalize better, then worse, then better again with increasing width. We have shown that this phenomenon can be explained by a novel bias-variance decomposition, and further that it can sometimes manifest as triple descent.

Lastly, in real-world problems, one often needs to deal with significant label noise. For instance, in large scale learning scenarios, weakly labeled data is available in abundance with large label noise. We have developed new techniques for distilling effective supervision from severe label noise leading to state-of-the-art results. We have further analyzed the effects of training neural networks with random labels, and shown that it leads to alignment between network parameters and input data, enabling faster downstream training than initializing from scratch. We have also explored questions such as whether label smoothing or gradient clipping can mitigate label noise, leading to new insights for developing robust training techniques with noisy labels.

Algorithmic Foundations and Theory
2020 was a productive year for our work in algorithmic foundations and theory, with several impactful research publications and notable results. On the optimization front, our paper on edge-weighted online bipartite matching develops a new technique for online competitive algorithms and solves a thirty-year old open problem for the edge-weighted variant with applications in efficient online ad allocation. Along with this work in online allocation, we developed dual mirror descent techniques that generalize to a variety of models with additional diversity and fairness constraints, and published a sequence of papers on the topic of online optimization with ML advice in online scheduling, online learning and online linear optimization. Another research result gave the first improvement in 50 years on the classic bipartite matching in dense graphs. Finally, another paper solves a long-standing open problem about chasing convex bodies online — using an algorithm from The Book, no less.

We also continued our work in scalable graph mining and graph-based learning and hosted the Graph Mining & Learning at Scale Workshop at NeurIPS’20, which covered work on scalable graph algorithms including graph clustering, graph embedding, causal inference, and graph neural networks. As part of the workshop, we showed how to solve several fundamental graph problems faster, both in theory and practice, by augmenting standard synchronous computation frameworks like MapReduce with a distributed hash-table similar to a BigTable. Our extensive empirical study validates the practical relevance of the AMPC model inspired by our use of distributed hash tables in massive parallel algorithms for hierarchical clustering and connected components, and our theoretical results show how to solve many of these problems in constant distributed rounds, greatly improving upon our previous results. We also achieved exponential speedup for computing PageRank and random walks. On the graph-based learning side, we presented Grale, our framework for designing graphs for use in machine learning. Furthermore, we presented our work on more scalable graph neural network models, where we show that PageRank can be used to greatly accelerate inference in GNNs.

In market algorithms, an area at the intersection of computer science and economics, we continued our research in designing improved online marketplaces, such as measuring incentive properties of ad auctions, two-sided markets, and optimizing order statistics in ad selection. In the area of repeated auctions, we developed frameworks to make dynamic mechanisms robust against lack of forecasting or estimation errors of the current market and/or the future market, leading to provably tight low-regret dynamic mechanisms. Later, we characterized when it is possible to achieve the asymptotically optimal objective through geometry-based criteria. We also compared the equilibrium outcome of a range of budget management strategies used in practice, showed their impact on the tradeoff between revenue and buyers' utility and shed light on their incentive properties. Additionally, we continued our research in learning optimal auction parameters, and settled the complexity of batch-learning with revenue loss. We designed the optimal regret and studied combinatorial optimization for contextual auction pricing, and developed a new active learning framework for auctions and improved the approximation for posted-price auctions. Finally, motivated by the importance of incentives in ad auctions, and in the hope to help advertisers study the impact of incentives in auctions, we introduce a data-driven metric to quantify how much a mechanism deviates from incentive compatibility.

Machine Perception
Perceiving the world around us — understanding, modeling and acting on visual, auditory and multimodal input — continues to be a research area with tremendous potential to be beneficial in our everyday lives.

In 2020, deep learning powered new approaches that bring 3D computer vision and computer graphics closer together. CvxNet, deep implicit functions for 3D shapes, neural voxel rendering and CoReNet are a few examples of this direction. Furthermore, our research on representing scenes as neural radiance fields (aka NeRF, see also this blog post) is a good example of how Google Research's academic collaborations stimulate rapid progress in the area of neural volume rendering.

In Learning to Factorize and Relight a City, a collaboration with UC Berkeley, we proposed a learning-based framework for disentangling outdoor scenes into temporally-varying illumination and permanent scene factors. This gives the ability to change lighting effects and scene geometry for any Street View panorama, or even turn it into a full-day timelapse video.

Our work on generative human shape and articulated pose models introduces a statistical, articulated 3D human shape modeling pipeline, within a fully trainable, modular, deep learning framework. Such models enable 3D human pose and shape reconstruction of people from a single photo to better understand the scene.

Overview of end-to-end statistical 3D articulated human shape model construction in GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models.

The growing area of media compression using neural networks continued to make strong progress in 2020, not only on learned image compression, but also in deep approaches to video compression, volume compression and nice results in deep distortion-agnostic image watermarking.

Samples of encoded and cover images for Distortion Agnostic Deep Watermarking. First row: Cover image with no embedded message. Second row: Encoded image from HiDDeN combined distortion model. Third row: Encoded images from our model. Fourth row: Normalized difference of the encoded image and cover image for the HiDDeN combined model. Fifth row: Normalized difference for our model

Additional important themes in perceptual research included:

Engaging with the broader research community through open sourcing of solutions and datasets is another important aspect of furthering perceptual research. In 2020, we open sourced multiple new perceptual inference capabilities and solutions in MediaPipe, such as on-device face, hand and pose prediction, real-time body pose tracking, real-time iris tracking and depth estimation, and real-time 3D object detection.

We continued to make strides to improve experiences and promote helpfulness on mobile devices through ML-based solutions. Our ability to run sophisticated natural language processing on-device, enabling more natural conversational features, continues to improve. In 2020, we expanded Call Screen and launched Hold for Me to allow users to save time when performing mundane tasks, and we also launched language-based actions and language navigability of our Recorder app to aid productivity.

We have used Google's Duplex technology to make calls to businesses and confirm things like temporary closures. This has enabled us to make 3 million updates to business information globally, that have been seen over 20 billion times on Maps and Search. We also used text to speech technology for easier access to web pages, by enabling Google Assistant to read it aloud, supporting 42 languages.

We also continued to make meaningful improvements to imaging applications. We made it easier to capture precious moments on Pixel with innovative controls and new ways to relight, edit, enhance and relive them again in Google Photos. For the Pixel camera, beginning with Pixel 4 and 4a, we added Live HDR+, which uses machine learning to approximate the vibrance and balanced exposure and appearance of HDR+ burst photography in real time in the viewfinder. We also created dual exposure controls, which allow the brightness of shadows and highlights in a scene to be adjusted independently — live in the viewfinder.

More recently, we introduced Portrait Light, a new post-capture feature for the Pixel Camera and Google Photos apps that adds a simulated directional light source to portraits. This feature is again one that is powered by machine learning, having been trained on 70 different people, photographed one light at a time, in our pretty cool 331-LED Light Stage computational illumination system.

In the past year, Google researchers were excited to contribute to many new (and timely) ways of using Google products. Here are a few examples

Robotics
In the area of robotics research, we’ve made tremendous progress in our ability to learn more and more complex, safe and robust robot behaviors with less and less data, using many of the RL techniques described earlier in the post.

Transporter Networks are a novel approach to learning how to represent robotic tasks as spatial displacements. Representing relations between objects and the robot end-effectors, as opposed to absolute positions in the environment, makes learning robust transformations of the workspace very efficient.

In Grounding Language in Play, we demonstrated how a robot can be taught to follow natural language instructions (in many languages!). This required a scalable approach to collecting paired data of natural language instructions and robot behaviors. One key insight is that this can be accomplished by asking robot operators to simply play with the robot, and label after-the-fact what instructions would have led to the robot accomplishing the same task.

We also explored doing away with robots altogether (by having humans use a camera-equipped grasping stick) for even more scalable data collection, and how to efficiently transfer visual representations across robotic tasks.

We investigated how to learn very agile strategies for robot locomotion, by taking inspiration from nature, using evolutionary meta-learning strategies, human demonstrations, and various approaches to training data-efficient controllers using deep reinforcement learning.

One increased emphasis this year has been on safety: how do we deploy safe delivery drones in the real world? How do we explore the world in a way that always allows the robot to recover from its mistakes? How do we certify the stability of learned behaviors? This is a critical area of research on which we expect to see increased focus in the future.

Quantum Computing
Our Quantum AI team continued its work to establish practical uses of quantum computing. We ran experimental algorithms on our Sycamore processors to simulate systems relevant to chemistry and physics. These simulations are approaching a scale at which they can not be performed on classical computers anymore, making good on Feynman’s original idea of using quantum computers as an efficient means to simulate systems in which quantum effects are important. We published new quantum algorithms, for instance to perform precise processor calibration, to show an advantage for quantum machine learning or to test quantum enhanced optimization. We also worked on programming models to make it easier to express quantum algorithms. We released qsim, an efficient simulation tool to develop and test quantum algorithms with up to 40 qubits on Google Cloud.

We continued to follow our roadmap towards building a universal error-corrected quantum computer. Our next milestone is the demonstration that quantum error correction can work in practice. To achieve this, we will show that a larger grid of qubits can hold logical information exponentially longer than a smaller grid, even though individual components such as qubits, couplers or I/O devices have imperfections. We are also particularly excited that we now have our own cleanroom which should significantly increase the speed and quality of our processor fabrication.

Supporting the Broader Developer and Researcher Community
This year marked TensorFlow’s 5th birthday, passing 160M downloads. The TensorFlow community continued its impressive growth with new special interest groups, TensorFlow User Groups, TensorFlow Certificates, AI Service partners, and inspiring demos #TFCommunitySpotlight. We significantly improved TF 2.x with seamless TPU support, out of the box performance (and best-in-class performance on MLPerf 0.7), data preprocessing, distribution strategy and a new NumPy API.

We also added many more capabilities to the TensorFlow Ecosystem to help developers and researchers in their workflows: Sounds of India demonstrated going from research to production in under 90 days, using TFX for training and TF.js for deployment in the browser. With Mesh TensorFlow, we pushed the boundaries of model parallelism to provide ultra-high image resolution image analysis. We open-sourced the new TF runtime, TF Profiler for model performance debugging, and tools for Responsible AI, such as the Model Card Toolkit for model transparency and a privacy testing library. With TensorBoard.dev we made it possible to easily host, track, and share your ML experiments for free.

In addition, we redoubled our investment in JAX, an open-source, research-focused ML system that has been actively developed over the past two years. Researchers at Google and beyond are now using JAX in a wide range of fields, including differential privacy, neural rendering, physics-informed networks, fast attention, molecular dynamics, tensor networks, neural tangent kernels, and neural ODEs. JAX accelerates research at DeepMind, powering a growing ecosystem of libraries and work on GANs, meta-gradients, reinforcement learning, and more. We also used JAX and the Flax neural network library to build record-setting MLPerf benchmark submissions, which we demonstrated live at NeurIPS on a large TPU Pod slice with a next-generation Cloud TPU user experience (slides, video, sign-up form). Finally, we’re ensuring that JAX works seamlessly with TF ecosystem tooling, from TF.data for data preprocessing and TensorBoard for experiment visualization to the TF Profiler for performance debugging, with more to come in 2021.

Many recent research breakthroughs have been enabled by increased computing power, and we make more than 500 petaflops of Cloud TPU computing power available for free to researchers around the world via the TFRC program to help broaden access to the machine learning research frontier. More than 120 TFRC-supported papers have been published to date, many of which would not have been possible without the computing resources that the program provides. For example, TFRC researchers have recently developed simulations of wildfire spread, helped analyze COVID-19 content and vaccine sentiment changes on social media networks, and advanced our collective understanding of the lottery ticket hypothesis and neural network pruning. Members of the TFRC community have also published experiments with Persian poetry, won a Kaggle contest on fine-grained fashion image segmentation, and shared tutorials and open-source tools as starting points for others. In 2021, we will change the name of the TFRC program to the TPU Research Cloud program to be more inclusive now that Cloud TPUs support JAX and PyTorch in addition to TensorFlow.

Finally, this was a huge year for Colab. Usage doubled, and we launched productivity features to help people do their work more efficiently, including improved Drive integration and access to the Colab VM via the terminal. And we launched Colab Pro to enable users to access faster GPUs, longer runtimes and more memory.

Open Datasets and Dataset Search
Open datasets with clear and measurable goals are often very helpful in driving forward the field of machine learning. To help the research community find interesting datasets, we continue to index a wide variety of open datasets sourced from many different organizations with Google Dataset Search. We also think it's important to create new datasets for the community to explore and to develop new techniques, while ensuring that we share open data responsibly. This year, in addition to open datasets to help address the COVID crisis, we released a number of open datasets across many different areas:

Research Community Interaction
We are proud to enthusiastically support and participate in the broader research community. In 2020, Google researchers presented over 500 papers at leading research conferences, additionally serving on program committees, organizing workshops, tutorials and numerous other activities aimed at collectively progressing the state of the art in the field. To learn more about our contributions to some of the larger research conferences this year, please see our blog posts for ICLR 2020, CVPR 2020, ACL 2020, ICML 2020, ECCV 2020 and NeurIPS 2020.

In 2020 we supported external research with $37M in funding, including $8.5M in COVID research, $8M in research inclusion and equity, and $2M in responsible AI research. In February, we announced the 2019 Google Faculty Research Award Recipients, funding research proposals from 150 faculty members throughout the world. Among this group, 27% self-identified as members of historically underrepresented groups within technology. We also announced a new Research Scholar Program to support early-career professors who are pursuing research in fields relevant to Google via unrestricted gifts. As we have for more than a decade, we selected a group of incredibly talented PhD student researchers to receive Google PhD Fellowships, which provides funding for graduate studies, as well as mentorship as they pursue their research, and opportunities to interact with other Google PhD Fellows.

We are also expanding the ways that we support inclusion and bring new voices into the field of computer science. In 2020, we created a new Award for Inclusion Research program that supports academic research in computing and technology addressing the needs of underrepresented populations. In the inaugural set of awards, we selected 16 proposals for funding with 25 principal investigators, focused on topics around diversity and inclusion, algorithmic bias, education innovation, health tools, accessibility, gender bias, AI for social good, security, and social justice. We additionally partnered with the Computing Alliance of Hispanic-Serving Institutions (CAHSI) and the CMD-IT Diversifying Future Leadership in the Professoriate Alliance (FLIP) to create an award program for doctoral students from traditionally underrepresented backgrounds to support the last year of the completion of the dissertation requirements.

In 2019, Google’s CS Research Mentorship Program (CSRMP) helped provide mentoring to 37 undergraduate students to introduce them to conducting computer science research. Based on the success of the program in 2019/2020, we’re excited to greatly expand this program in 2020/2021 and will have hundreds of Google researchers mentoring hundreds of undergraduate students in order to encourage more people from underrepresented backgrounds to pursue computer science research careers. Finally, in October we provided exploreCSR awards to 50 institutions around the world for the 2020 academic year. These awards fund faculty to host workshops for undergraduates from underrepresented groups in order to encourage them to pursue CS research.

Looking Forward to 2021 and Beyond
I’m excited about what’s to come, from our technical work on next-generation AI models, to the very human work of growing our community of researchers.

We’ll keep ensuring our research is done responsibly and has a positive impact, using our AI Principles as a guiding framework and applying particular scrutiny to topics that can have broad societal impact. This post covers just a few of the many papers on responsible AI that Google published in the past year. While pursuing our research, we’ll focus on:

  • Promoting research integrity: We’ll make sure Google keeps conducting a wide range of research in an appropriate manner, and provides comprehensive, scientific views on a variety of challenging, interesting topics.
  • Responsible AI development: Tackling tough topics will remain core to our work, and Google will continue creating new ML algorithms to make machine learning more efficient and accessible, developing approaches to combat unfair bias in language models, devising new techniques for ensuring privacy in learning systems, and much more. And importantly, beyond looking at AI development with a suitably critical eye, we’re eager to see what techniques we and others in the community can develop to mitigate risks and make sure new technologies have equitable, positive impacts on society.
  • Advancing diversity, equity, and inclusion: We care deeply that the people who are building influential products and computing systems better reflect the people using these products all around the world. Our efforts here are both within Google Research, as well as within the wider research and academic communities — we’ll be calling upon the academic and industry partners we work with to advance these efforts together. On a personal level, I am deeply committed to improving representation in computer science, having spent hundreds of hours working towards these goals over the last few years, as well as supporting universities like Berkeley, CMU, Cornell, Georgia Tech, Howard, UW, and numerous other organizations that work to advance inclusiveness. This is important to me, to Google, and to the broader computer science community.

Finally, looking ahead to the year, I’m particularly enthusiastic about the possibilities of building more general-purpose machine learning models that can handle a variety of modalities and that can automatically learn to accomplish new tasks with very few training examples. Advances in this area will empower people with dramatically more capable products, bringing better translation, speech recognition, language understanding and creative tools to billions of people all around the world. This kind of exploration and impact is what keeps us excited about our work!

Acknowledgements
Thanks to Martin Abadi, Marc Bellemare, Elie Bursztein, Zhifeng Chen, Ed Chi, Charina Chou, Katherine Chou, Eli Collins, Greg Corrado, Corinna Cortes, Tiffany Deng, Tulsee Doshi, Robin Dua, Kemal El Moujahid, Aleksandra Faust, Orhan Firat, Jen Gennai, Till Hennig, Ben Hutchinson, Alex Ingerman, Tomáš Ižo, Matthew Johnson, Been Kim, Sanjiv Kumar, Yul Kwon, Steve Langdon, James Laudon, Quoc Le, Yossi Matias, Brendan McMahan, Aranyak Mehta, Vahab Mirrokni, Meg Mitchell, Hartmut Neven, Mohammad Norouzi, Timothy Novikoff, Michael Piatek, Florence Poirel, David Salesin, Nithya Sambasivan, Navin Sarma, Tom Small, Jascha Sohl-Dickstein, Zak Stone, Rahul Sukthankar, Mukund Sundararajan, Andreas Terzis, Sergei Vassilvitskii, Vincent Vanhoucke, and Leslie Yeh and others for helpful feedback and for drafting portions of this post, and to the entire Research and Health communities at Google for everyone’s contributions towards this work.

Source: Google AI Blog


Google at ECCV 2020

This week, the 16th European Conference on Computer Vision (ECCV2020) begins, a premier forum for the dissemination of research in computer vision and related fields. Being held virtually for the first time this year, Google is proud to be an ECCV2020 Platinum Partner and is excited to share our research with the community with nearly 50 accepted publications, alongside several tutorials and workshops.

If you are registered for ECCV this year, please visit our virtual booth in the Platinum Exhibition Hall to learn more about the research we’re presenting at ECCV 2020, including some demos and opportunities to connect with our researchers. You can also learn more about our contributions below (Google affiliations in bold).

Organizing Committee
General Chairs: Vittorio Ferrari, Bob Fisher, Cordelia Schmid, Emanuele TrucoAcademic Demonstrations Chair: Thomas Mensink

Accepted Publications
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (Honorable Mention Award)
Ben Mildenhall, Pratul Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng

Quaternion Equivariant Capsule Networks for 3D Point Clouds
Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, Federico Tombari

SoftpoolNet: Shape Descriptor for Point Cloud Completion and Classification
Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction
Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll

CoReNet: Coherent 3D scene reconstruction from a single RGB image
Stefan Popov, Pablo Bauszat, Vittorio Ferrari

Adversarial Generative Grammars for Human Activity Prediction
AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

Self6D: Self-Supervised Monocular 6D Object Pose Estimation
Gu Wang, Fabian Manhardt, Jianzhun Shao, Xiangyang Ji, Nassir Navab, Federico Tombari

Du2Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels
Yinda Zhang, Neal Wadhwa, Sergio Orts-Escolano, Christian Häne, Sean Fanello, Rahul Garg

What Matters in Unsupervised Optical Flow
Rico Jonschkowski, Austin Stone, Jonathan T. Barron, Ariel Gordon, Kurt Konolige, Anelia Angelova

Appearance Consensus Driven Self-Supervised Human Mesh Recovery
Jogendra N. Kundu, Mugalodi Rakesh, Varun Jampani, Rahul M. Venkatesh, R. Venkatesh Babu

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie

PointMixup: Augmentation for Point Clouds
Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes1, Pengwan Yang, Cees Snoek

Connecting Vision and Language with Localized Narratives (see our blog post)
Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari

Big Transfer (BiT): General Visual Representation Learning (see our blog post)
Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby

View-Invariant Probabilistic Embedding for Human Pose
Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

A Generalization of Otsu's Method and Minimum Error Thresholding
Jonathan T. Barron

Learning to Factorize and Relight a City
Andrew Liu, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, Noah Snavely

Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows
Andrei Zanfir, Eduard Gabriel Bazavan, Hongyi Xu, Bill Freeman, Rahul Sukthankar, Cristian Sminchisescu

Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

Generative Latent Textured Proxies for Category-Level Object Modeling
Ricardo Martin Brualla, Sofien Bouaziz, Matthew Brown, Rohit Pandey, Dan B Goldman

Neural Design Network: Graphic Layout Generation with Constraints
Hsin-Ying Lee*, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang

Neural Articulated Shape Approximation
Boyang Deng, Gerard Pons-Moll, Timothy Jeruzalski, JP Lewis, Geoffrey Hinton, Mohammad Norouzi, Andrea Tagliasacchi

Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
Anurag Arnab, Arsha Nagrani, Chen Sun, Cordelia Schmid

Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
Johanna Wald, Torsten Sattler, Stuart Golodetz, Tommaso Cavallari, Federico Tombari

Consistency Guided Scene Flow Estimation
Yuhua Chen, Luc Van Gool, Cordelia Schmid, Cristian Sminchisescu

Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections
Theodora Kontogianni*, Michael Gygli, Jasper Uijlings, Vittorio Ferrari

SimPose: Effectively Learning DensePose and Surface Normal of People from Simulated Data
Tyler Lixuan Zhu, Per Karlsson, Christoph Bregler

Learning Data Augmentation Strategies for Object Detection
Barret Zoph, Ekin Dogus Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V Le

Streaming Object Detection for 3-D Point Clouds
Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

Improving 3D Object Detection through Progressive Population Based Augmentation
Shuyang Cheng, Zhaoqi Leng, Ekin Dogus Cubuk, Barret Zoph, Chunyan Bai, Jiquan Ngiam, Yang Song, Benjamin Caine, Vijay Vasudevan, Congcong Li, Quoc V. Le, Jonathon Shlens, Dragomir Anguelov

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A Ross, Thomas Funkhouser, Alireza Fathi

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le

Memory-Efficient Incremental Learning Through Feature Adaptation
Ahmet Iscen, Jeffrey Zhang, Svetlana Lazebnik, Cordelia Schmid

Virtual Multi-view Fusion for 3D Semantic Segmentation
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David A Ross, Brian E Brewington, Thomas Funkhouser, Caroline Pantofaru

Efficient Scale-permuted Backbone with Learned Resource Distribution
Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Yin Cui, Mingxing Tan, Quoc V Le, Xiaodan Song

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
Hung-Yu Tseng*, Hsin-Ying Lee*, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Graph convolutional networks for learning with few clean and many noisy labels
Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondrej Chum, Cordelia Schmid

Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis
Ruixuan Yu, Xin Wei, Federico Tombari, Jian Sun

Federated Visual Classification with Real-World Data Distribution
Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown

Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer
Xide Xia, Meng Zhang, Tianfan Xue, Zheng Sun, Hui Fang, Brian Kulis, Jiawen Chen

AssembleNet++: Assembling Modality Representations via Attention Connections
Michael S. Ryoo, AJ Piergiovanni, Juhana Kangaspunta, Anelia Angelova

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
Liang-Chieh Chen, Raphael Gontijo-Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua

Unifying Deep Local and Global Features for Image Search
Bingyi Cao, Andre Araujo, Jack Sim

Pillar-based Object Detection for Autonomous Driving
Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Tom Funkhouser, Justin Solomon

Improving Object Detection with Selective Self-supervised Self-training
Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong

Environment-agnostic Multitask Learning for Natural Language Grounded NavigationXin Eric Wang*, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction
Junwei Liang, Lu Jiang, Alex Hauptmann

Tutorials
New Frontiers for Learning with Limited Labels or Data
Organizers: Shalini De Mello, Sifei Liu, Zhiding Yu, Pavlo Molchanov, Varun Jampani, Arash Vahdat, Animashree Anandkumar, Jan Kautz

Weakly Supervised Learning in Computer Vision
Organizers: Seong Joon Oh, Rodrigo Benenson, Hakan Bilen

Workshops
Joint COCO and LVIS Recognition Challenge
Organizers: Alexander Kirillov, Tsung-Yi Lin, Yin Cui, Matteo Ruggero Ronchi, Agrim Gupta, Ross Girshick, Piotr Dollar

4D Vision
Organizers: Anelia Angelova, Vincent Casser, Jürgen Sturm, Noah Snavely, Rahul Sukthankar

GigaVision: When Gigapixel Videography Meets Computer Vision
Organizers: Lu Fang, Shengjin Wang, David J. Brady, Feng Yang

Advances in Image Manipulation Workshop and Challenges
Organizers: Radu Timofte, Andrey Ignatov, Luc Van Gool, Wangmeng Zuo, Ming-Hsuan Yang, Kyoung Mu Lee, Liang Lin, Eli Shechtman, Kai Zhang, Dario Fuoli, Zhiwu Huang, Martin Danelljan, Shuhang Gu, Ming-Yu Liu, Seungjun Nah, Sanghyun Son, Jaerin Lee, Andres Romero, ETH Zurich, Hannan Lu, Ruofan Zhou, Majed El Helou, Sabine Süsstrunk, Roey Mechrez, BeyondMinds & Technion, Pengxu Wei, Evangelos Ntavelis, Siavash Bigdeli

Robust Vision Challenge 2020
Organizers:Oliver Zendel, Hassan Abu Alhaija, Rodrigo Benenson, Marius Cordts, Angela Dai, Xavier Puig Fernandez, Andreas Geiger, Niklas Hanselmann, Nicolas Jourdan, Vladlen Koltun, Peter Kontschider, Alina Kuznetsova, Yubin Kang, Tsung-Yi Lin, Claudio Michaelis, Gerhard Neuhold, Matthias Niessner, Marc Pollefeys, Rene Ranftl, Carsten Rother, Torsten Sattler, Daniel Scharstein, Hendrik Schilling, Nick Schneider, Jonas Uhrig, Xiu-Shen Wei, Jonas Wulff, Bolei Zhou

“Deep Internal Learning”: Training with no prior examples
Organizers: Michal Irani,Tomer Michaeli, Tali Dekel, Assaf Shocher, Tamar Rott Shaham

Instance-Level Recognition
Organizers: Andre Araujo, Cam Askew, Bingyi Cao, Ondrej Chum, Bohyung Han, Torsten Sattler, Jack Sim, Giorgos Tolias, Tobias Weyand, Xu Zhang

Women in Computer Vision Workshop (WiCV) (Platinum Sponsor)
Panel Participation: Dina Damen, Sanja Fiddler, Zeynep Akata, Grady Booch, Rahul Sukthankar

*Work performed while at Google

Source: Google AI Blog


Google at ECCV 2020

This week, the 16th European Conference on Computer Vision (ECCV2020) begins, a premier forum for the dissemination of research in computer vision and related fields. Being held virtually for the first time this year, Google is proud to be an ECCV2020 Platinum Partner and is excited to share our research with the community with nearly 50 accepted publications, alongside several tutorials and workshops.

If you are registered for ECCV this year, please visit our virtual booth in the Platinum Exhibition Hall to learn more about the research we’re presenting at ECCV 2020, including some demos and opportunities to connect with our researchers. You can also learn more about our contributions below (Google affiliations in bold).

Organizing Committee
General Chairs: Vittorio Ferrari, Bob Fisher, Cordelia Schmid, Emanuele TrucoAcademic Demonstrations Chair: Thomas Mensink

Accepted Publications
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (Honorable Mention Award)
Ben Mildenhall, Pratul Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng

Quaternion Equivariant Capsule Networks for 3D Point Clouds
Yongheng Zhao, Tolga Birdal, Jan Eric Lenssen, Emanuele Menegatti, Leonidas Guibas, Federico Tombari

SoftpoolNet: Shape Descriptor for Point Cloud Completion and Classification
Yida Wang, David Joseph Tan, Nassir Navab, Federico Tombari

Combining Implicit Function Learning and Parametric Models for 3D Human Reconstruction
Bharat Lal Bhatnagar, Cristian Sminchisescu, Christian Theobalt, Gerard Pons-Moll

CoReNet: Coherent 3D scene reconstruction from a single RGB image
Stefan Popov, Pablo Bauszat, Vittorio Ferrari

Adversarial Generative Grammars for Human Activity Prediction
AJ Piergiovanni, Anelia Angelova, Alexander Toshev, Michael S. Ryoo

Self6D: Self-Supervised Monocular 6D Object Pose Estimation
Gu Wang, Fabian Manhardt, Jianzhun Shao, Xiangyang Ji, Nassir Navab, Federico Tombari

Du2Net: Learning Depth Estimation from Dual-Cameras and Dual-Pixels
Yinda Zhang, Neal Wadhwa, Sergio Orts-Escolano, Christian Häne, Sean Fanello, Rahul Garg

What Matters in Unsupervised Optical Flow
Rico Jonschkowski, Austin Stone, Jonathan T. Barron, Ariel Gordon, Kurt Konolige, Anelia Angelova

Appearance Consensus Driven Self-Supervised Human Mesh Recovery
Jogendra N. Kundu, Mugalodi Rakesh, Varun Jampani, Rahul M. Venkatesh, R. Venkatesh Babu

Fashionpedia: Ontology, Segmentation, and an Attribute Localization Dataset
Menglin Jia, Mengyun Shi, Mikhail Sirotenko, Yin Cui, Claire Cardie, Bharath Hariharan, Hartwig Adam, Serge Belongie

PointMixup: Augmentation for Point Clouds
Yunlu Chen, Vincent Tao Hu, Efstratios Gavves, Thomas Mensink, Pascal Mettes1, Pengwan Yang, Cees Snoek

Connecting Vision and Language with Localized Narratives (see our blog post)
Jordi Pont-Tuset, Jasper Uijlings, Soravit Changpinyo, Radu Soricut, Vittorio Ferrari

Big Transfer (BiT): General Visual Representation Learning (see our blog post)
Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, Joan Puigcerver, Jessica Yung, Sylvain Gelly, Neil Houlsby

View-Invariant Probabilistic Embedding for Human Pose
Jennifer J. Sun, Jiaping Zhao, Liang-Chieh Chen, Florian Schroff, Hartwig Adam, Ting Liu

Axial-DeepLab: Stand-Alone Axial-Attention for Panoptic Segmentation
Huiyu Wang, Yukun Zhu, Bradley Green, Hartwig Adam, Alan Yuille, Liang-Chieh Chen

Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai

A Generalization of Otsu's Method and Minimum Error Thresholding
Jonathan T. Barron

Learning to Factorize and Relight a City
Andrew Liu, Shiry Ginosar, Tinghui Zhou, Alexei A. Efros, Noah Snavely

Weakly Supervised 3D Human Pose and Shape Reconstruction with Normalizing Flows
Andrei Zanfir, Eduard Gabriel Bazavan, Hongyi Xu, Bill Freeman, Rahul Sukthankar, Cristian Sminchisescu

Multi-modal Transformer for Video Retrieval
Valentin Gabeur, Chen Sun, Karteek Alahari, Cordelia Schmid

Generative Latent Textured Proxies for Category-Level Object Modeling
Ricardo Martin Brualla, Sofien Bouaziz, Matthew Brown, Rohit Pandey, Dan B Goldman

Neural Design Network: Graphic Layout Generation with Constraints
Hsin-Ying Lee*, Lu Jiang, Irfan Essa, Phuong B Le, Haifeng Gong, Ming-Hsuan Yang, Weilong Yang

Neural Articulated Shape Approximation
Boyang Deng, Gerard Pons-Moll, Timothy Jeruzalski, JP Lewis, Geoffrey Hinton, Mohammad Norouzi, Andrea Tagliasacchi

Uncertainty-Aware Weakly Supervised Action Detection from Untrimmed Videos
Anurag Arnab, Arsha Nagrani, Chen Sun, Cordelia Schmid

Beyond Controlled Environments: 3D Camera Re-Localization in Changing Indoor Scenes
Johanna Wald, Torsten Sattler, Stuart Golodetz, Tommaso Cavallari, Federico Tombari

Consistency Guided Scene Flow Estimation
Yuhua Chen, Luc Van Gool, Cordelia Schmid, Cristian Sminchisescu

Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections
Theodora Kontogianni*, Michael Gygli, Jasper Uijlings, Vittorio Ferrari

SimPose: Effectively Learning DensePose and Surface Normal of People from Simulated Data
Tyler Lixuan Zhu, Per Karlsson, Christoph Bregler

Learning Data Augmentation Strategies for Object Detection
Barret Zoph, Ekin Dogus Cubuk, Golnaz Ghiasi, Tsung-Yi Lin, Jonathon Shlens, Quoc V Le

Streaming Object Detection for 3-D Point Clouds
Wei Han, Zhengdong Zhang, Benjamin Caine, Brandon Yang, Christoph Sprunk, Ouais Alsharif, Jiquan Ngiam, Vijay Vasudevan, Jonathon Shlens, Zhifeng Chen

Improving 3D Object Detection through Progressive Population Based Augmentation
Shuyang Cheng, Zhaoqi Leng, Ekin Dogus Cubuk, Barret Zoph, Chunyan Bai, Jiquan Ngiam, Yang Song, Benjamin Caine, Vijay Vasudevan, Congcong Li, Quoc V. Le, Jonathon Shlens, Dragomir Anguelov

An LSTM Approach to Temporal 3D Object Detection in LiDAR Point Clouds
Rui Huang, Wanyue Zhang, Abhijit Kundu, Caroline Pantofaru, David A Ross, Thomas Funkhouser, Alireza Fathi

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models
Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le

Memory-Efficient Incremental Learning Through Feature Adaptation
Ahmet Iscen, Jeffrey Zhang, Svetlana Lazebnik, Cordelia Schmid

Virtual Multi-view Fusion for 3D Semantic Segmentation
Abhijit Kundu, Xiaoqi Yin, Alireza Fathi, David A Ross, Brian E Brewington, Thomas Funkhouser, Caroline Pantofaru

Efficient Scale-permuted Backbone with Learned Resource Distribution
Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Yin Cui, Mingxing Tan, Quoc V Le, Xiaodan Song

RetrieveGAN: Image Synthesis via Differentiable Patch Retrieval
Hung-Yu Tseng*, Hsin-Ying Lee*, Lu Jiang, Ming-Hsuan Yang, Weilong Yang

Graph convolutional networks for learning with few clean and many noisy labels
Ahmet Iscen, Giorgos Tolias, Yannis Avrithis, Ondrej Chum, Cordelia Schmid

Deep Positional and Relational Feature Learning for Rotation-Invariant Point Cloud Analysis
Ruixuan Yu, Xin Wei, Federico Tombari, Jian Sun

Federated Visual Classification with Real-World Data Distribution
Tzu-Ming Harry Hsu, Hang Qi, Matthew Brown

Joint Bilateral Learning for Real-time Universal Photorealistic Style Transfer
Xide Xia, Meng Zhang, Tianfan Xue, Zheng Sun, Hui Fang, Brian Kulis, Jiawen Chen

AssembleNet++: Assembling Modality Representations via Attention Connections
Michael S. Ryoo, AJ Piergiovanni, Juhana Kangaspunta, Anelia Angelova

Naive-Student: Leveraging Semi-Supervised Learning in Video Sequences for Urban Scene Segmentation
Liang-Chieh Chen, Raphael Gontijo-Lopes, Bowen Cheng, Maxwell D. Collins, Ekin D. Cubuk, Barret Zoph, Hartwig Adam, Jonathon Shlens

AttentionNAS: Spatiotemporal Attention Cell Search for Video Classification
Xiaofang Wang, Xuehan Xiong, Maxim Neumann, AJ Piergiovanni, Michael S. Ryoo, Anelia Angelova, Kris M. Kitani, Wei Hua

Unifying Deep Local and Global Features for Image Search
Bingyi Cao, Andre Araujo, Jack Sim

Pillar-based Object Detection for Autonomous Driving
Yue Wang, Alireza Fathi, Abhijit Kundu, David Ross, Caroline Pantofaru, Tom Funkhouser, Justin Solomon

Improving Object Detection with Selective Self-supervised Self-training
Yandong Li, Di Huang, Danfeng Qin, Liqiang Wang, Boqing Gong

Environment-agnostic Multitask Learning for Natural Language Grounded NavigationXin Eric Wang*, Vihan Jain, Eugene Ie, William Yang Wang, Zornitsa Kozareva, Sujith Ravi

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction
Junwei Liang, Lu Jiang, Alex Hauptmann

Tutorials
New Frontiers for Learning with Limited Labels or Data
Organizers: Shalini De Mello, Sifei Liu, Zhiding Yu, Pavlo Molchanov, Varun Jampani, Arash Vahdat, Animashree Anandkumar, Jan Kautz

Weakly Supervised Learning in Computer Vision
Organizers: Seong Joon Oh, Rodrigo Benenson, Hakan Bilen

Workshops
Joint COCO and LVIS Recognition Challenge
Organizers: Alexander Kirillov, Tsung-Yi Lin, Yin Cui, Matteo Ruggero Ronchi, Agrim Gupta, Ross Girshick, Piotr Dollar

4D Vision
Organizers: Anelia Angelova, Vincent Casser, Jürgen Sturm, Noah Snavely, Rahul Sukthankar

GigaVision: When Gigapixel Videography Meets Computer Vision
Organizers: Lu Fang, Shengjin Wang, David J. Brady, Feng Yang

Advances in Image Manipulation Workshop and Challenges
Organizers: Radu Timofte, Andrey Ignatov, Luc Van Gool, Wangmeng Zuo, Ming-Hsuan Yang, Kyoung Mu Lee, Liang Lin, Eli Shechtman, Kai Zhang, Dario Fuoli, Zhiwu Huang, Martin Danelljan, Shuhang Gu, Ming-Yu Liu, Seungjun Nah, Sanghyun Son, Jaerin Lee, Andres Romero, ETH Zurich, Hannan Lu, Ruofan Zhou, Majed El Helou, Sabine Süsstrunk, Roey Mechrez, BeyondMinds & Technion, Pengxu Wei, Evangelos Ntavelis, Siavash Bigdeli

Robust Vision Challenge 2020
Organizers:Oliver Zendel, Hassan Abu Alhaija, Rodrigo Benenson, Marius Cordts, Angela Dai, Xavier Puig Fernandez, Andreas Geiger, Niklas Hanselmann, Nicolas Jourdan, Vladlen Koltun, Peter Kontschider, Alina Kuznetsova, Yubin Kang, Tsung-Yi Lin, Claudio Michaelis, Gerhard Neuhold, Matthias Niessner, Marc Pollefeys, Rene Ranftl, Carsten Rother, Torsten Sattler, Daniel Scharstein, Hendrik Schilling, Nick Schneider, Jonas Uhrig, Xiu-Shen Wei, Jonas Wulff, Bolei Zhou

“Deep Internal Learning”: Training with no prior examples
Organizers: Michal Irani,Tomer Michaeli, Tali Dekel, Assaf Shocher, Tamar Rott Shaham

Instance-Level Recognition
Organizers: Andre Araujo, Cam Askew, Bingyi Cao, Ondrej Chum, Bohyung Han, Torsten Sattler, Jack Sim, Giorgos Tolias, Tobias Weyand, Xu Zhang

Women in Computer Vision Workshop (WiCV) (Platinum Sponsor)
Panel Participation: Dina Damen, Sanja Fiddler, Zeynep Akata, Grady Booch, Rahul Sukthankar

*Work performed while at Google

Source: Google AI Blog


Google at ICML 2020



Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in machine learning research, Google is proud to be a Platinum Sponsor of the thirty-seventh International Conference on Machine Learning (ICML 2020), a premier annual event taking place virtually this week. With over 100 accepted publications and Googlers participating in workshops, we look forward to our continued collaboration with the larger machine learning research community.

If you're registered for ICML 2020, we hope you'll visit the Google virtual booth to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges. You can also learn more about the Google research being presented at ICML 2020 in the list below (Google affiliations bolded).

ICML Expo
Google Dataset Search: Building an Open Ecosystem for Dataset Discovery
Natasha Noy

End-to-end Bayesian inference workflows in TensorFlow Probability
Colin Carroll

Publications
Population-Based Black-Box Optimization for Biological Sequence Design
Christof Angermueller, David Belanger, Andreea Gane, Zelda Mariet, David Dohan, Kevin Murphy, Lucy Colwell, D Sculley

Predictive Coding for Locally-Linear Control
Rui Shu, Tung Nguyen, Yinlam Chow, Tuan Pham, Khoat Than, Mohammad Ghavamzadeh, Stefano Ermon, Hung Bui

FedBoost: A Communication-Efficient Algorithm for Federated Learning
Jenny Hamer, Mehryar Mohri, Ananda Theertha Suresh

Faster Graph Embeddings via Coarsening
Matthew Fahrbach, Gramoz Goranci, Richard Peng, Sushant Sachdeva, Chi Wang

Revisiting Fundamentals of Experience Replay
William Fedus, Prajit Ramachandran, Rishabh Agarwal, Yoshua Bengio, Hugo Larochelle, Mark Rowland, Will Dabney

Boosting for Control of Dynamical Systems
Naman Agarwal, Nataly Brukhim, Elad Hazan, Zhou Lu

Neural Clustering Processes
Ari Pakman, Yueqi Wang, Catalin Mitelut, JinHyung Lee, Liam Paninski

The Tree Ensemble Layer: Differentiability Meets Conditional Computation
Hussein Hazimeh, Natalia Ponomareva, Petros Mol, Zhenyu Tan, Rahul Mazumder

Representations for Stable Off-Policy Reinforcement Learning
Dibya Ghosh, Marc Bellemare

REALM: Retrieval-Augmented Language Model Pre-Training
Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang

Context Aware Local Differential Privacy
Jayadev Acharya, Keith Bonawitz, Peter Kairouz, Daniel Ramage, Ziteng Sun

Scalable Deep Generative Modeling for Sparse Graphs
Hanjun Dai, Azade Nazi, Yujia Li, Bo Dai, Dale Schuurmans

Deep k-NN for Noisy Labels
Dara Bahri, Heinrich Jiang, Maya Gupta

Revisiting Spatial Invariance with Low-Rank Local Connectivity
Gamaleldin F. Elsayed, Prajit Ramachandran, Jonathon Shlens, Simon Kornblith

SCAFFOLD: Stochastic Controlled Averaging for Federated Learning
Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J. Reddi, Sebastian U. Stich, Ananda Theertha Suresh

Incremental Sampling Without Replacement for Sequence Models
Kensen Shi, David Bieber, Charles Sutton

SoftSort: A Continuous Relaxation for the argsort Operator
Sebastian Prillo, Julian Martin Eisenschlos

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalisation (see blog post)
Junjie Hu, Sebastian Ruder, Aditya Siddhant, Graham Neubig, Orhan Firat, Melvin Johnson

Learning to Stop While Learning to Predict
Xinshi Chen, Hanjun Dai, Yu Li, Xin Gao, Le Song

Bandits with Adversarial Scaling
Thodoris Lykouris, Vahab Mirrokni, Renato Paes Leme

SimGANs: Simulator-Based Generative Adversarial Networks for ECG Synthesis to Improve Deep ECG Classification
Tomer Golany, Daniel Freedman, Kira Radinsky

Stochastic Frank-Wolfe for Constrained Finite-Sum Minimization
Geoffrey Negiar, Gideon Dresdner, Alicia Yi-Ting Tsai, Laurent El Ghaoui, Francesco Locatello, Robert M. Freund, Fabian Pedregosa

Implicit differentiation of Lasso-type models for hyperparameter optimization
Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon

Infinite attention: NNGP and NTK for deep attention networks
Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
Asaf Cassel, Alon Cohen, Tomer Koren

Adversarial Learning Guarantees for Linear Hypotheses and Neural Networks
Pranjal Awasthi, Natalie Frank, Mehryar Mohri

Random Hypervolume Scalarizations for Provable Multi-Objective Black Box Optimization
Daniel Golovin, Qiuyi (Richard) Zhang

Generating Programmatic Referring Expressions via Program Synthesis
Jiani Huang, Calvin Smith, Osbert Bastani, Rishabh Singh, Aws Albarghouthi, Mayur Naik

Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach
Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, Craig Boutilier

AutoML-Zero: Evolving Machine Learning Algorithms From Scratch (see blog post)
Esteban Real, Chen Liang, David R. So, Quoc V. Le

How Good is the Bayes Posterior in Deep Neural Networks Really?
Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Swiatkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Which Tasks Should Be Learned Together in Multi-task Learning?
Trevor Standley, Amir R. Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, Silvio Savarese

Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
Tong Yu, Branislav Kveton, Zheng Wen, Ruiyi Zhang, Ole J. Mengshoel

Disentangling Trainability and Generalization in Deep Neural Networks
Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz

The Many Shapley Values for Model Explanation
Mukund Sundararajan, Amir Najmi

Neural Contextual Bandits with UCB-based Exploration
Dongruo Zhou, Lihong Li, Quanquan Gu

Automatic Shortcut Removal for Self-Supervised Representation Learning
Matthias Minderer, Olivier Bachem, Neil Houlsby, Michael Tschannen

Federated Learning with Only Positive Labels
Felix X. Yu, Ankit Singh Rawat, Aditya Krishna Menon, Sanjiv Kumar

How Recurrent Networks Implement Contextual Processing in Sentiment Analysis
Niru Maheswaranathan, David Sussillo

Supervised Learning: No Loss No Cry
Richard Nock, Aditya Krishna Menon

Ready Policy One: World Building Through Active Learning
Philip Ball, Jack Parker-Holder, Aldo Pacchiano, Krzysztof Choromanski, Stephen Roberts

Weakly-Supervised Disentanglement Without Compromises
Francesco Locatello, Ben Poole, Gunnar Raetsch, Bernhard Schölkopf, Olivier Bachem, Michael Tschannen

Fast Differentiable Sorting and Ranking
Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga

Debiased Sinkhorn barycenters
Hicham Janati, Marco Cuturi, Alexandre Gramfort

Interpretable, Multidimensional, Multimodal Anomaly Detection with Negative Sampling for Detection of Device Failure
John Sipple

Accelerating Large-Scale Inference with Anisotropic Vector Quantization
Ruiqi Guo, Philip Sun, Erik Lindgren, Quan Geng, David Simcha, Felix Chern, Sanjiv Kumar

An Optimistic Perspective on Offline Reinforcement Learning (see blog post)
Rishabh Agarwal, Dale Schuurmans, Mohammad Norouzi

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization
Ben Adlam, Jeffrey Pennington

Private Query Release Assisted by Public Data
Raef Bassily, Albert Cheu, Shay Moran, Aleksandar Nikolov, Jonathan Ullman, Zhiwei Steven Wu

Learning and Evaluating Contextual Embedding of Source Code
Aditya Kanade, Petros Maniatis, Gogul Balakrishnan, Kensen Shi

Evaluating Machine Accuracy on ImageNet
Vaishaal Shankar, Rebecca Roelofs, Horia Mania, Alex Fang, Benjamin Recht, Ludwig Schmidt

Imputer: Sequence Modelling via Imputation and Dynamic Programming
William Chan, Chitwan Saharia, Geoffrey Hinton, Mohammad Norouzi, Navdeep Jaitly

Domain Aggregation Networks for Multi-Source Domain Adaptation
Junfeng Wen, Russell Greiner, Dale Schuurmans

Planning to Explore via Self-Supervised World Models
Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Context-Aware Dynamics Model for Generalization in Model-Based Reinforcement Learning
Kimin Lee, Younggyo Seo, Seunghyun Lee, Honglak Lee, Jinwoo Shin

Retro*: Learning Retrosynthetic Planning with Neural Guided A* Search
Binghong Chen, Chengtao Li, Hanjun Dai, Le Song

On the Consistency of Top-k Surrogate Losses
Forest Yang, Sanmi Koyejo

Dual Mirror Descent for Online Allocation Problems
Haihao Lu, Santiago Balseiro, Vahab Mirrokni

Efficient and Scalable Bayesian Neural Nets with Rank-1 Factors
Michael W. Dusenberry, Ghassen Jerfel, Yeming Wen, Yi-An Ma, Jasper Snoek, Katherine Heller, Balaji Lakshminarayanan, Dustin Tran

Batch Stationary Distribution Estimation
Junfeng Wen, Bo Dai, Lihong Li, Dale Schuurmans

Small-GAN: Speeding Up GAN Training Using Core-Sets
Samarth Sinha, Han Zhang, Anirudh Goyal, Yoshua Bengio, Hugo Larochelle, Augustus Odena

Data Valuation Using Reinforcement Learning
Jinsung Yoon, Sercan ‎Ö. Arik, Tomas Pfister

A Game Theoretic Perspective on Model-Based Reinforcement Learning
Aravind Rajeswaran, Igor Mordatch, Vikash Kumar

Encoding Musical Style with Transformer Autoencoders
Kristy Choi, Curtis Hawthorne, Ian Simon, Monica Dinculescu, Jesse Engel

The Shapley Taylor Interaction Index
Kedar Dhamdhere, Mukund Sundararajan, Ashish Agarwal

Multidimensional Shape Constraints
Maya Gupta, Erez Louidor, Olexander Mangylov, Nobu Morioka, Taman Narayan, Sen Zhao

Private Counting from Anonymous Messages: Near-Optimal Accuracy with Vanishing Communication Overhead
Badih Ghazi, Ravi Kumar, Pasin Manurangsi, Rasmus Pagh

Learning to Score Behaviors for Guided Policy Optimization
Aldo Pacchiano, Jack Parker-Holder, Yunhao Tang, Anna Choromanska, Krzysztof Choromanski, Michael I. Jordan

Fundamental Tradeoffs between Invariance and Sensitivity to Adversarial Perturbations
Florian Tramèr, Jens Behrmann, Nicholas Carlini, Nicolas Papernot, Jörn-Henrik Jacobsen

Optimizing Black-Box Metrics with Adaptive Surrogates
Qijia Jiang, Olaoluwa Adigun, Harikrishna Narasimhan, Mahdi Milani Fard, Maya Gupta

Circuit-Based Intrinsic Methods to Detect Overfitting
Sat Chatterjee, Alan Mishchenko

Automatic Reparameterisation of Probabilistic Programs
Maria I. Gorinova, Dave Moore, Matthew D. Hoffman

Stochastic Flows and Geometric Optimization on the Orthogonal Group
Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

Black-Box Variational Inference as a Parametric Approximation to Langevin Dynamics
Matthew Hoffman, Yi-An Ma

Concise Explanations of Neural Networks Using Adversarial Training
Prasad Chalasani, Jiefeng Chen, Amrita Roy Chowdhury, Somesh Jha, Xi Wu

p-Norm Flow Diffusion for Local Graph Clustering
Shenghao Yang, Di Wang, Kimon Fountoulakis

Empirical Study of the Benefits of Overparameterization in Learning Latent Variable Models
Rares-Darius Buhai, Yoni Halpern, Yoon Kim, Andrej Risteski, David Sontag

Robust Pricing in Dynamic Mechanism Design
Yuan Deng, Sébastien Lahaie, Vahab Mirrokni

Differentiable Product Quantization for Learning Compact Embedding Layers
Ting Chen, Lala Li, Yizhou Sun

Adaptive Region-Based Active Learning
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang

Countering Language Drift with Seeded Iterated Learning
Yuchen Lu, Soumye Singhal, Florian Strub, Olivier Pietquin, Aaron Courville

Does Label Smoothing Mitigate Label Noise?
Michal Lukasik, Srinadh Bhojanapalli, Aditya Krishna Menon, Sanjiv Kumar

Acceleration Through Spectral Density Estimation
Fabian Pedregosa, Damien Scieur

Momentum Improves Normalized SGD
Ashok Cutkosky, Harsh Mehta

ConQUR: Mitigating Delusional Bias in Deep Q-Learning
Andy Su, Jayden Ooi, Tyler Lu, Dale Schuurmans, Craig Boutilier

Online Learning with Imperfect Hints
Aditya Bhaskara, Ashok Cutkosky, Ravi Kumar, Manish Purohit

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks
Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

On Implicit Regularization in β-VAEs
Abhishek Kumar, Ben Poole

Is Local SGD Better than Minibatch SGD?
Blake Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. Brendan McMahan, Ohad Shamir, Nathan Sreb

A Simple Framework for Contrastive Learning of Visual Representations
Ting Chen, Simon Kornblith, Mohammad Norouzi, Geoffrey Hinton

Universal Average-Case Optimality of Polyak Momentum
Damien Scieur, Fabian Pedregosa

An Imitation Learning Approach for Cache Replacement
Evan Zheran Liu, Milad Hashemi, Kevin Swersky, Parthasarathy Ranganathan, Junwhan Ahn

Collapsed Amortized Variational Inference for Switching Nonlinear Dynamical Systems
Zhe Dong, Bryan A. Seybold, Kevin P. Murphy, Hung H. Bui

Beyond Synthetic Noise: Deep Learning on Controlled Noisy Labels
Lu Jiang, Di Huang, Mason Liu, Weilong Yang

Optimizing Data Usage via Differentiable Rewards
Xinyi Wang, Hieu Pham, Paul Michel, Antonios Anastasopoulos, Jaime Carbonell, Graham Neubig

Sparse Sinkhorn Attention
Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan

One Policy to Control Them All: Shared Modular Policies for Agent-Agnostic Control
Wenlong Huang, Igor Mordatch, Deepak Pathak

On Thompson Sampling with Langevin Algorithms
Eric Mazumdar, Aldo Pacchiano, Yi-An Ma, Peter L. Bartlett, Michael I. Jordan

Good Subnetworks Provably Exist: Pruning via Greedy Forward Selection
Mao Ye, Chengyue Gong, Lizhen Nie, Denny Zhou, Adam Klivans, Qiang Liu

On the Global Convergence Rates of Softmax Policy Gradient Methods
Jincheng Mei, Chenjun Xiao, Csaba Szepesvari, Dale Schuurmans

Concept Bottleneck Models
Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, Percy Liang

Supervised Quantile Normalization for Low-Rank Matrix Approximation
Marco Cuturi, Olivier Teboul, Jonathan Niles-Weed, Jean-Philippe Vert

Missing Data Imputation Using Optimal Transport
Boris Muzellec, Julie Josse, Claire Boyer, Marco Cuturi

Learning to Combine Top-Down and Bottom-Up Signals in Recurrent Neural Networks with Attention Over Modules
Sarthak Mittal, Alex Lamb, Anirudh Goyal, Vikram Voleti, Murray Shanahan, Guillaume Lajoie, Michael Mozer, Yoshua Bengio

Stochastic Optimization for Regularized Wasserstein Estimators
Marin Ballu, Quentin Berthet, Francis Bach

Low-Rank Bottleneck in Multi-head Attention Models
Srinadh Bhojanapalli, Chulhee Yun, Ankit Singh Rawat, Sashank Jakkam Reddi, Sanjiv Kumar

Rigging the Lottery: Making All Tickets Winners
Utku Evci, Trevor Gale, Jacob Menick, Pablo Samuel Castro, Erich Elsen

Online Learning with Dependent Stochastic Feedback Graphs
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Ningshan Zhang

Calibration, Entropy Rates, and Memory in Language Models
Mark Braverman, Xinyi Chen, Sham Kakade, Karthik Narasimhan, Cyril Zhang, Yi Zhang

Composable Sketches for Functions of Frequencies: Beyond the Worst Case
Edith Cohen, Ofir Geri, Rasmus Pagh

Energy-Based Processes for Exchangeable Data
Mengjiao Yang, Bo Dai, Hanjun Dai, Dale Schuurmans

Near-Optimal Regret Bounds for Stochastic Shortest Path
Alon Cohen, Haim Kaplan, Yishay Mansour, Aviv Rosenberg

PEGASUS: Pre-training with Extracted Gap-sentences for Abstractive Summarization (see blog post)
Jingqing Zhang, Yao Zhao, Mohammad Saleh, Peter J. Liu

The Complexity of Finding Stationary Points with Stochastic Gradient Descent
Yoel Drori, Ohad Shamir

The k-tied Normal Distribution: A Compact Parameterization of Gaussian Mean Field Posteriors in Bayesian Neural Networks
Jakub Swiatkowski, Kevin Roth, Bas Veeling, Linh Tran, Josh Dillon, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin

Regularized Optimal Transport is Ground Cost Adversarial
François-Pierre Paty, Marco Cuturi

Workshops
New In ML
Invited Speaker: Nicolas Le Roux
Organizers: Zhen Xu, Sparkle Russell-Puleri, Zhengying Liu, Sinead A Williamson, Matthias W Seeger, Wei-Wei Tu, Samy Bengio, Isabelle Guyon

LatinX in AI
Workshop Advisor: Pablo Samuel Castro

Women in Machine Learning Un-Workshop
Invited Speaker: Doina Precup
Sponsor Expo Speaker: Jennifer Wei

Queer in AI
Invited Speaker: Shakir Mohamed

Workshop on Continual Learning
Organizers: Haytham Fayek, Arslan Chaudhry, David Lopez-Paz, Eugene Belilovsky, Jonathan Schwarz, Marc Pickett, Rahaf Aljundi, Sayna Ebrahimi, Razvan Pascanu, Puneet Dokania

5th ICML Workshop on Human Interpretability in Machine Learning (WHI)
Organizers: Kush Varshney, Adrian Weller, Alice Xiang, Amit Dhurandhar, Been Kim, Dennis Wei, Umang Bhatt

Self-supervision in Audio and Speech
Organizers: Mirco Ravanelli, Dmitriy Serdyuk, R Devon Hjelm, Bhuvana Ramabhadran, Titouan Parcollet

Workshop on eXtreme Classification: Theory and Applications
Invited Speakers: Sanjiv Kumar

Healthcare Systems, Population Health, and the Role of Health-tech
Organizers: Krzysztof Choromanski, David Cheikhi, Jared Davis, Valerii Likhosherstov, Achille Nazaret, Achraf Bahamou, Xingyou Song, Mrugank Akarte, Jack Parker-Holder, Jacob Bergquist, Yuan Gao, Aldo Pacchiano, Tamas Sarlos, Adrian Weller, Vikas Sindhwani

Theoretical Foundations of Reinforcement Learning
Program Committee: Alon Cohen, Chris Dann

Uncertainty and Robustness in Deep Learning Workshop (UDL)
Invited Speaker: Justin Gilmer

Organizers: Sharon Li, Balaji Lakshminarayanan, Dan Hendrycks, Thomas Dietterich, Jasper Snoek
Program Committee: Jeremiah Liu, Jie Ren, Rodolphe Jenatton, Zack Nado, Alexander Alemi, Florian Wenzel, Mike Dusenberry, Raphael Lopes

Beyond First Order Methods in Machine Learning Systems
Industry Panel: Jonathan Hseu

Object-Oriented Learning: Perception, Representation, and Reasoning
Invited Speakers: Thomas Kipf, Igor Mordatch

Graph Representation Learning and Beyond (GRL+)
Organizers: Michael Bronstein, Andreea Deac, William L. Hamilton, Jessica B. Hamrick, Milad Hashemi, Stefanie Jegelka, Jure Leskovec, Renjie Liao, Federico Monti, Yizhou Sun, Kevin Swersky, Petar Veličković, Rex Ying, Marinka Žitnik
Speakers: Thomas Kipf
Program Committee: Bryan Perozzi, Kevin Swersky, Milad Hashemi, Thomas Kipf, Ting Cheng

ML Interpretability for Scientific Discovery
Organizers: Subhashini Venugopalan, Michael Brenner, Scott Linderman, Been Kim
Program Committee: Akinori Mitani, Arunachalam Narayanaswamy, Avinash Varadarajan, Awa Dieng, Benjamin Sanchez-Lengeling, Bo Dai, Stephan Hoyer, Subham Sekhar Sahoo, Suhani Vora
Steering Committee: John Platt, Mukund Sundararajan, Jon Kleinberg

Negative Dependence and Submodularity for Machine Learning
Organizers: Zelda Mariet, Mike Gartrell, Michal Derezinski

7th ICML Workshop on Automated Machine Learning (AutoML)
Organizers: Charles Weill, Katharina Eggensperger, Matthias Feurer, Frank Hutter, Marius Lindauer, Joaquin Vanschoren

Federated Learning for User Privacy and Data Confidentiality
Keynote: Brendan McMahan
Program Committee: Peter Kairouz, Jakub Konecný

MLRetrospectives: A Venue for Self-Reflection in ML Research
Speaker: Margaret Mitchell

Machine Learning for Media Discovery
Speaker: Ed Chi

INNF+: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models
Organizers: Chin-Wei Huang, David Krueger, Rianne van den Berg, George Papamakarios, Chris Cremer, Ricky Chen, Danilo Rezende

4th Lifelong Learning Workshop
Program Committee: George Tucker, Marlos C. Machado

2nd ICML Workshop on Human in the Loop Learning (HILL)
Organizers: Shanghang Zhang, Xin Wang, Fisher Yu, Jiajun Wu, Trevor Darrell

Machine Learning for Global Health
Organizers: Danielle Belgrave, Danielle Belgrave, Stephanie Hyland, Charles Onu, Nicholas Furnham, Ernest Mwebaze, Neil Lawrence

Committee
Social Chair: Adam White

Work performed while at Google

Source: Google AI Blog


Google at ACL 2020



This week, the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), a premier conference covering a broad spectrum of research areas that are concerned with computational approaches to natural language, takes place online.

As a leader in natural language processing and understanding, and a Diamond Level sponsor of ACL 2020, Google will showcase the latest research in the field with over 30 publications, and the organization of and participation in a variety of workshops and tutorials.

If you’re registered for ACL 2020, we hope that you’ll visit the Google virtual booth to learn more about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about the Google research being presented at ACL 2020 below (Google affiliations bolded).

Committees
Diversity & Inclusion (D&I) Chair: Vinodkumar Prabhakaran
Accessibility Chair: Sushant Kafle
Local Sponsorship Chair: Kristina Toutanova
Virtual Infrastructure Committee: Yi Luan
Area Chairs: Anders Søgaard, Ankur Parikh, Annie Louis, Bhuvana Ramabhadran, Christo Kirov, Daniel Cer, Dipanjan Das, Diyi Yang, Emily Pitler, Eunsol Choi, George Foster, Idan Szpektor, Jacob Eisenstein, Jason Baldridge, Jun Suzuki, Kenton Lee, Luheng He, Marius Pasca, Ming-Wei Chang, Sebastian Gehrmann, Shashi Narayan, Slav Petrov, Vinodkumar Prabhakaran, Waleed Ammar, William Cohen

Long Papers
Cross-modal Language Generation using Pivot Stabilization for Web-scale Language Coverage
Ashish V. Thapliyal, Radu Soricut

Automatic Detection of Generated Text is Easiest when Humans are Fooled
Daphne Ippolito, Daniel Duckworth, Chris Callison-Burch, Douglas Eck

On Faithfulness and Factuality in Abstractive Summarization
Joshua Maynez, Shashi Narayan, Bernd Bohnet, Ryan McDonald

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou

BabyWalk: Going Farther in Vision-and-Language Navigation by Taking Baby Steps
Wang Zhu, Hexiang Hu, Jiacheng Chen, Zhiwei Deng, Vihan Jain, Eugene Ie, Fei Sha

Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
Xuanli He, Gholamreza Haffari, Mohammad Norouzi

GoEmotions: A Dataset of Fine-Grained Emotions
Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, Gaurav Nemade, Sujith Ravi

TaPas: Weakly Supervised Table Parsing via Pre-training (see blog post)
Jonathan Herzig, Pawel Krzysztof Nowak, Thomas Müller, Francesco Piccinno, Julian Eisenschlos

Toxicity Detection: Does Context Really Matter?
John Pavlopoulos, Jeffrey Sorensen, Lucas Dixon, Nithum Thain, Ion Androutsopoulos

(Re)construing Meaning in NLP
Sean Trott, Tiago Timponi Torrent, Nancy Chang, Nathan Schneider

Pretraining with Contrastive Sentence Objectives Improves Discourse Performance of Language Models
Dan Iter, Kelvin Guu, Larry Lansing, Dan Jurafsky

Probabilistic Assumptions Matter: Improved Models for Distantly-Supervised Document-Level Question Answering
Hao Cheng, Ming-Wei Chang, Kenton Lee, Kristina Toutanova

AdvAug: Robust Adversarial Augmentation for Neural Machine Translation
Yong Cheng, Lu Jiang, Wolfgang Macherey, Jacob Eisenstein

Named Entity Recognition as Dependency Parsing
Juntao Yu, Bernd Bohnet, Massimo Poesio

Cross-modal Coherence Modeling for Caption Generation
Malihe Alikhani, Piyush Sharma, Shengjie Li, Radu Soricut, Matthew Stone

Representation Learning for Information Extraction from Form-like Documents (see blog post)
Bodhisattwa Prasad Majumder, Navneet Potti, Sandeep Tata, James Bradley Wendt, Qi Zhao, Marc Najork

Low-Dimensional Hyperbolic Knowledge Graph Embeddings
Ines Chami, Adva Wolf, Da-Cheng Juan, Frederic Sala, Sujith Ravi, Christopher Ré

What Question Answering can Learn from Trivia Nerds
Jordan Boyd-Graber, Benjamin Börschinger

Learning a Multi-Domain Curriculum for Neural Machine Translation
Wei Wang, Ye Tian, Jiquan Ngiam, Yinfei Yang, Isaac Caswell, Zarana Parekh

Translationese as a Language in "Multilingual" NMT
Parker Riley, Isaac Caswell, Markus Freitag, David Grangier

Mapping Natural Language Instructions to Mobile UI Action Sequences
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, Jason Baldridge

BLEURT: Learning Robust Metrics for Text Generation (see blog post)
Thibault Sellam, Dipanjan Das, Ankur Parikh

Exploring Unexplored Generalization Challenges for Cross-Database Semantic Parsing
Alane Suhr, Ming-Wei Chang, Peter Shaw, Kenton Lee

Frugal Paradigm Completion
Alexander Erdmann, Tom Kenter, Markus Becker, Christian Schallhart

Short Papers
Reverse Engineering Configurations of Neural Text Generation Models
Yi Tay, Dara Bahri, Che Zheng, Clifford Brunk, Donald Metzler, Andrew Tomkins

Syntactic Data Augmentation Increases Robustness to Inference Heuristics
Junghyun Min, R. Thomas McCoy, Dipanjan Das, Emily Pitler, Tal Linzen

Leveraging Monolingual Data with Self-Supervision for Multilingual Neural Machine Translation
Aditya Siddhant, Ankur Bapna, Yuan Cao, Orhan Firat, Mia Chen, Sneha Kudugunta, Naveen Arivazhagan, Yonghui Wu

Social Biases in NLP Models as Barriers for Persons with Disabilities
Ben Hutchinson, Vinodkumar Prabhakaran, Emily Denton, Kellie Webster, Yu Zhong, Stephen Denuyl

Toward Better Storylines with Sentence-Level Language Models
Daphne Ippolito, David Grangier, Douglas Eck, Chris Callison-Burch

TACL Papers
TYDI QA: A Benchmark for Information-Seeking Question Answering in Typologically Diverse Languages (see blog post)
Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, Jennimaria Palomaki

Phonotactic Complexity and Its Trade-offs
Tiago Pimentel, Brian Roark, Ryan Cotterell

Demos
Multilingual Universal Sentence Encoder for Semantic Retrieval (see blog post)
Yinfei Yang, Daniel Cer, Amin Ahmad, Mandy Guo, Jax Law, Noah Constant, Gustavo Hernandez Abrego, Steve Yuan, Chris Tar, Yun-Hsuan Sung, Brian Strope, Ray Kurzweil

Workshops
IWPT - The 16th International Conference on Parsing Technologies
Yuji Matsumoto, Stephan Oepen, Kenji Sagae, Anders Søgaard, Weiwei Sun and Reut Tsarfaty

ALVR - Workshop on Advances in Language and Vision Research
Xin Wang, Jesse Thomason, Ronghang Hu, Xinlei Chen, Peter Anderson, Qi Wu, Asli Celikyilmaz, Jason Baldridge and William Yang Wang

WNGT - The 4th Workshop on Neural Generation and Translation
Alexandra Birch, Graham Neubig, Andrew Finch, Hiroaki Hayashi, Kenneth Heafield, Ioannis Konstas, Yusuke Oda and Xian Li

NLPMC - NLP for Medical Conversations
Parminder Bhatia, Chaitanya Shivade, Mona Diab, Byron Wallace, Rashmi Gangadharaiah, Nan Du, Izhak Shafran and Steven Lin

AutoSimTrans - The 1st Workshop on Automatic Simultaneous Translation
Hua Wu, Colin Cherry, James Cross, Liang Huang, Zhongjun He, Mark Liberman and Yang Liu

Tutorials
Interpretability and Analysis in Neural NLP (cutting-edge)
Yonatan Belinkov, Sebastian Gehrmann, Ellie Pavlick

Commonsense Reasoning for Natural Language Processing (Introductory)
Maarten Sap, Vered Shwartz, Antoine Bosselut, Yejin Choi, Dan Roth

Source: Google AI Blog


Google at CVPR 2020



This week marks the start of the fully virtual 2020 Conference on Computer Vision and Pattern Recognition (CVPR 2020), the premier annual computer vision event consisting of the main conference, workshops and tutorials. As a leader in computer vision research and a Supporter Level Virtual Sponsor, Google will have a strong presence at CVPR 2020, with nearly 70 publications accepted, along with the organization of, and participation in, multiple workshops/tutorials.

If you are participating in CVPR this year, please visit our virtual booth to learn about what Google is actively pursuing for the next generation of intelligent systems that utilize the latest machine learning techniques applied to various areas of machine perception.

You can also learn more about our research being presented at CVPR 2020 in the list below (Google affiliations are bolded).

Organizing Committee

General Chairs: Terry Boult, Gerard Medioni, Ramin Zabih
Program Chairs: Ce Liu, Greg Mori, Kate Saenko, Silvio Savarese
Workshop Chairs: Tal Hassner, Tali Dekel
Website Chairs: Tianfan Xue, Tian Lan
Technical Chair: Daniel Vlasic
Area Chairs include: Alexander Toshev, Alexey Dosovitskiy, Boqing Gong, Caroline Pantofaru, Chen Sun, Deqing Sun, Dilip Krishnan, Feng Yang, Liang-Chieh Chen, Michael Rubinstein, Rodrigo Benenson, Timnit Gebru, Thomas Funkhouser, Varun Jampani, Vittorio Ferrari, William Freeman

Oral Presentations

Evolving Losses for Unsupervised Video Representation Learning
AJ Piergiovanni, Anelia Angelova, Michael Ryoo

CvxNet: Learnable Convex Decomposition
Boyang Deng, Kyle Genova, Soroosh Yazdani, Sofien Bouaziz, Geoffrey Hinton, Andrea Tagliasacchi

Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise
Xuanqing Liu, Tesi Xiao, Si Si, Qin Cao, Sanjiv Kumar, Cho-Jui Hsieh

Scalability in Perception for Autonomous Driving: Waymo Open Dataset
Pei Sun, Henrik Kretzschmar, Xerxes Dotiwalla‎, Aurélien Chouard, Vijaysai Patnaik, Paul Tsui, James Guo, Yin Zhou, Yuning Chai, Benjamin Caine, Vijay Vasudevan, Wei Han, Jiquan Ngiam, Hang Zhao, Aleksei Timofeev‎, Scott Ettinger, Maxim Krivokon, Amy Gao, Aditya Joshi‎, Sheng Zhao, Shuyang Chen, Yu Zhang, Jon Shlens, Zhifeng Chen, Dragomir Anguelov

Deep Implicit Volume Compression
Saurabh Singh, Danhang Tang, Cem Keskin, Philip Chou, Christian Haene, Mingsong Dou, Sean Fanello, Jonathan Taylor, Andrea Tagliasacchi, Philip Davidson, Yinda Zhang, Onur Guleryuz, Shahram Izadi, Sofien Bouaziz

Neural Networks Are More Productive Teachers Than Human Raters: Active Mixup for Data-Efficient Knowledge Distillation from a Blackbox Model
Dongdong Wan, Yandong Li, Liqiang Wang, and Boqing Gong

Google Landmarks Dataset v2 - A Large-Scale Benchmark for Instance-Level Recognition and Retrieval (see the blog post)
Tobias Weyand, Andre Araujo, Jack Sim, Bingyi Cao

CycleISP: Real Image Restoration via Improved Data Synthesis
Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang, Ling Shao

Dynamic Graph Message Passing Networks
Li Zhang, Dan Xu, Anurag Arnab, Philip Torr

Local Deep Implicit Functions for 3D Shape
Kyle Genova, Forrester Cole, Avneesh Sud, Aaron Sarna, Thomas Funkhouser

GHUM & GHUML: Generative 3D Human Shape and Articulated Pose Models
Hongyi Xu, Eduard Gabriel Bazavan, Andrei Zanfir, William Freeman, Rahul Sukthankar, Cristian Sminchisescu

Search to Distill: Pearls are Everywhere but not the Eyes
Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang

Semantic Pyramid for Image Generation
Assaf Shocher, Yossi Gandelsman, Inbar Mosseri, Michal Yarom, Michal Irani, William Freeman, Tali Dekel

Flow Contrastive Estimation of Energy-Based Models
Ruiqi Gao, Erik Nijkamp, Diederik Kingma, Zhen Xu, Andrew Dai, Ying Nian Wu

Rethinking Class-Balanced Methods for Long-Tailed Visual Recognition from A Domain Adaptation Perspective
Muhammad Abdullah Jamal, Matthew Brown, Ming-Hsuan Yang, Liqiang Wang, Boqing Gong

Category-Level Articulated Object Pose Estimation
Xiaolong Li, He Wang, Li Yi, Leonidas Guibas, Amos Abbott, Shuran Song

AdaCoSeg: Adaptive Shape Co-Segmentation with Group Consistency Loss
Chenyang Zhu, Kai Xu, Siddhartha Chaudhuri, Li Yi, Leonidas Guibas, Hao Zhang

SpeedNet: Learning the Speediness in Videos
Sagie Benaim, Ariel Ephrat, Oran Lang, Inbar Mosseri, William Freeman, Michael Rubinstein, Michal Irani, Tali Dekel

BSP-Net: Generating Compact Meshes via Binary Space Partitioning
Zhiqin Chen, Andrea Tagliasacchi, Hao Zhang

SAPIEN: A SimulAted Part-based Interactive ENvironment
Fanbo Xiang, Yuzhe Qin, Kaichun Mo, Yikuan Xia, Hao Zhu, Fangchen Liu, Minghua Liu, Hanxiao Jiang, Yifu Yuan, He Wang, Li Yi, Angel Chang, Leonidas Guibas, Hao Su

SurfelGAN: Synthesizing Realistic Sensor Data for Autonomous Driving
Zhenpei Yang, Yuning Chai, Dragomir Anguelov, Yin Zhou, Pei Sun, Dumitru Erhan, Sean Rafferty, Henrik Kretzschmar

Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks
Saurabh Singh, Shankar Krishnan

RL-CycleGAN: Reinforcement Learning Aware Simulation-To-Real
Kanishka Rao, Chris Harris, Alex Irpan, Sergey Levine, Julian Ibarz, Mohi Khansari

Open Compound Domain Adaptation
Ziwei Liu, Zhongqi Miao, Xingang Pan, Xiaohang Zhan, Dahua Lin, Stella X.Yu, and Boqing Gong

Posters
Single-view view synthesis with multiplane images
Richard Tucker, Noah Snavely

Adversarial Examples Improve Image Recognition
Cihang Xie, Mingxing Tan, Boqing Gong, Jiang Wang, Alan Yuille, Quoc V. Le

Adversarial Texture Optimization from RGB-D Scans
Jingwei Huang, Justus Thies, Angela Dai, Abhijit Kundu, Chiyu “Max” Jiang,Leonidas Guibas, Matthias Niessner, Thomas Funkhouser

Single-Image HDR Reconstruction by Learning to Reverse the Camera Pipeline
Yu-Lun Liu, Wei-Sheng Lai, Yu-Sheng Chen, Yi-Lung Kao, Ming-Hsuan Yang,Yung-Yu Chuang, Jia-Bin Huang

Collaborative Distillation for Ultra-Resolution Universal Style Transfer
Huan Wang, Yijun Li, Yuehai Wang, Haoji Hu, Ming-Hsuan Yang

Learning to Autofocus
Charles Herrmann, Richard Strong Bowen, Neal Wadhwa, Rahul Garg, Qiurui He, Jonathan T. Barron, Ramin Zabih

Multi-Scale Boosted Dehazing Network with Dense Feature Fusion
Hang Dong, Jinshan Pan, Lei Xiang, Zhe Hu, Xinyi Zhang, Fei Wang, Ming-Hsuan Yang

Composing Good Shots by Exploiting Mutual Relations
Debang Li, Junge Zhang, Kaiqi Huang, Ming-Hsuan Yang

PatchVAE: Learning Local Latent Codes for Recognition
Kamal Gupta, Saurabh Singh, Abhinav Shrivastava

Neural Voxel Renderer: Learning an Accurate and Controllable Rendering Tool
Konstantinos Rematas, Vittorio Ferrari

Local Implicit Grid Representations for 3D Scenes
Chiyu “Max” Jiang, Avneesh Sud, Ameesh Makadia, Jingwei Huang, Matthias Niessner, Thomas Funkhouser

Large Scale Video Representation Learning via Relational Graph Clustering
Hyodong Lee, Joonseok Lee, Joe Yue-Hei Ng, Apostol (Paul) Natsev

Deep Homography Estimation for Dynamic Scenes
Hoang Le, Feng Liu, Shu Zhang, Aseem Agarwala

C-Flow: Conditional Generative Flow Models for Images and 3D Point Clouds
Albert Pumarola, Stefan Popov, Francesc Moreno-Noguer, Vittorio Ferrari

Lighthouse: Predicting Lighting Volumes for Spatially-Coherent Illumination
Pratul Srinivasan, Ben Mildenhall, Matthew Tancik, Jonathan T. Barron, Richard Tucker, Noah Snavely

Scale-space flow for end-to-end optimized video compression
Eirikur Agustsson, David Minnen, Nick Johnston, Johannes Ballé, Sung Jin Hwang, George Toderici

StructEdit: Learning Structural Shape Variations
Kaichun Mo, Paul Guerrero, Li Yi, Hao Su, Peter Wonka, Niloy Mitra, Leonidas Guibas

3D-MPA: Multi Proposal Aggregation for 3D Semantic Instance Segmentation
Francis Engelmann, Martin Bokeloh, Alireza Fathi, Bastian Leibe, Matthias Niessner

Sequential mastery of multiple tasks: Networks naturally learn to learn and forget to forget
Guy Davidson, Michael C. Mozer

Distilling Effective Supervision from Severe Label Noise
Zizhao Zhang, Han Zhang, Sercan Ö. Arik, Honglak Lee, Tomas Pfister

ViewAL: Active Learning With Viewpoint Entropy for Semantic Segmentation
Yawar Siddiqui, Julien Valentin, Matthias Niessner

Attribution in Scale and Space
Shawn Xu, Subhashini Venugopalan, Mukund Sundararajan

Weakly-Supervised Semantic Segmentation via Sub-category Exploration
Yu-Ting Chang, Qiaosong Wang, Wei-Chih Hung, Robinson Piramuthu, Yi-Hsuan Tsai, Ming-Hsuan Yang

Speech2Action: Cross-modal Supervision for Action Recognition
Arsha Nagrani, Chen Sun, David Ross, Rahul Sukthankar, Cordelia Schmid, Andrew Zisserman

Counting Out Time: Class Agnostic Video Repetition Counting in the Wild
Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman

The Garden of Forking Paths: Towards Multi-Future Trajectory Prediction
Junwei Liang, Lu Jiang, Kevin Murphy, Ting Yu, Alexander Hauptmann

Self-training with Noisy Student improves ImageNet classification
Qizhe Xie, Minh-Thang Luong, Eduard Hovy, Quoc V. Le

EfficientDet: Scalable and Efficient Object Detection (see the blog post)
Mingxing Tan, Ruoming Pang, Quoc Le

ACNe: Attentive Context Normalization for Robust Permutation-Equivariant Learning
Weiwei Sun, Wei Jiang, Eduard Trulls, Andrea Tagliasacchi, Kwang Moo Yi

VectorNet: Encoding HD Maps and Agent Dynamics from Vectorized Representation
Jiyang Gao, Chen Sun, Hang Zhao, Yi Shen, Dragomir Anguelov, Cordelia Schmid, Congcong Li

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization
Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc Le, Xiaodan Song

KeyPose: Multi-View 3D Labeling and Keypoint Estimation for Transparent Objects
Xingyu Liu, Rico Jonschkowski, Anelia Angelova, Kurt Konolige

Structured Multi-Hashing for Model Compression
Elad Eban, Yair Movshovitz-Attias, Hao Wu, Mark Sandler, Andrew Poon, Yerlan Idelbayev, Miguel A. Carreira-Perpinan

DOPS: Learning to Detect 3D Objects and Predict their 3D Shapes
Mahyar Najibi, Guangda Lai, Abhijit Kundu, Zhichao Lu, Vivek Rathod, Tom Funkhouser, Caroline Pantofaru, David Ross, Larry Davis, Alireza Fathi

Panoptic-DeepLab: A Simple, Strong, and Fast Baseline for Bottom-Up Panoptic Segmentation
Bowen Cheng, Maxwell Collins, Yukun Zhu, Ting Liu, Thomas S. Huang, Hartwig Adam, Liang-Chieh Chen

Context R-CNN: Long Term Temporal Context for Per-Camera Object Detection
Sara Beery, Guanhang Wu, Vivek Rathod, Ronny Votel, Jonathan Huang

Distortion Agnostic Deep Watermarking
Xiyang Luo, Ruohan Zhan, Huiwen Chang, Feng Yang, Peyman Milanfar

Can weight sharing outperform random architecture search? An investigation with TuNAS
Gabriel Bender, Hanxiao Liu, Bo Chen, Grace Chu, Shuyang Cheng, Pieter-Jan Kindermans, Quoc Le

GIFnets: Differentiable GIF Encoding Framework
Innfarn Yoo, Xiyang Luo, Yilin Wang, Feng Yang, Peyman Milanfar

Your Local GAN: Designing Two Dimensional Local Attention Mechanisms for Generative Models
Giannis Daras, Augustus Odena, Han Zhang, Alex Dimakis

Fast Sparse ConvNets
Erich Elsen, Marat Dukhan, Trevor Gale, Karen Simonyan

RetinaTrack: Online Single Stage Joint Detection and Tracking
Zhichao Lu, Vivek Rathod, Ronny Votel, Jonathan Huang

Learning to See Through Obstructions
Yu-Lun Liu, Wei-Sheng Lai, Ming-Hsuan Yang,Yung-Yu Chuang, Jia-Bin Huang

Self-Supervised Learning of Video-Induced Visual Invariances
Michael Tschannen, Josip Djolonga, Marvin Ritter, Aravindh Mahendran, Neil Houlsby, Sylvain Gelly, Mario Lucic

Workshops

3rd Workshop and Challenge on Learned Image Compression
Organizers include: George Toderici, Eirikur Agustsson, Lucas Theis, Johannes Ballé, Nick Johnston

CLVISION 1st Workshop on Continual Learning in Computer Vision
Organizers include: Zhiyuan (Brett) Chen, Marc Pickett

Embodied AI
Organizers include: Alexander Toshev, Jie Tan, Aleksandra Faust, Anelia Angelova

The 1st International Workshop and Prize Challenge on Agriculture-Vision: Challenges & Opportunities for Computer Vision in Agriculture
Organizers include: Zhen Li, Jim Yuan

Embodied AI
Organizers include: Alexander Toshev, Jie Tan, Aleksandra Faust, Anelia Angelova

New Trends in Image Restoration and Enhancement workshop and challenges on image and video restoration and enhancement (NTIRE)
Talk: “Sky Optimization: Semantically aware image processing of skies in low-light photography”
Orly Liba, Longqi Cai, Yun-Ta Tsai, Elad Eban, Yair Movshovitz-Attias, Yael Pritch, Huizhong Chen, Jonathan Barron

The End-of-End-to-End A Video Understanding Pentathlon
Organizers include: Rahul Sukthankar

4th Workshop on Media Forensics
Organizers include: Christoph Bregler

4th Workshop on Visual Understanding by Learning from Web Data
Organizers include: Jesse Berent, Rahul Sukthankar

AI for Content Creation
Organizers include: Deqing Sun, Lu Jiang, Weilong Yang

Fourth Workshop on Computer Vision for AR/VR
Organizers include: Sofien Bouaziz

Low-Power Computer Vision Competition (LPCVC)
Organizers include: Bo Chen, Andrew Howard, Jaeyoun Kim

Sight and Sound
Organizers include: William Freeman

Workshop on Efficient Deep Learning for Computer Vision
Organizers include: Pete Warden

Extreme classification in computer vision
Organizers include: Ramin Zabih, Zhen Li

Image Matching: Local Features and Beyond (see the blog post)
Organizers include: Eduard Trulls

The DAVIS Challenge on Video Object Segmentation
Organizers include: Alberto Montes, Jordi Pont-Tuset, Kevis-Kokitsi Maninis

2nd Workshop on Precognition: Seeing through the Future
Organizers include: Utsav Prabhu

Computational Cameras and Displays (CCD)
Talk: Orly Liba

2nd Workshop on Learning from Unlabeled Videos (LUV)
Organizers include:Honglak Lee, Rahul Sukthankar

7th Workshop on Fine Grained Visual Categorization (FGVC7) (see the blog post)
Organizers include: Christine Kaeser-Chen, Serge Belongie

Language & Vision with applications to Video Understanding
Organizers include: Lu Jiang

Neural Architecture Search and Beyond for Representation Learning
Organizers include: Barret Zoph

Tutorials

Disentangled 3D Representations for Relightable Performance Capture of Humans
Organizers include: Sean Fanello, Christoph Rhemann, Jonathan Taylor, Sofien Bouaziz, Adarsh Kowdle, Rohit Pandey, Sergio Orts-Escolano, Paul Debevec, Shahram Izadi

Learning Representations via Graph-Structured Networks
Organizers include:Chen Sun, Ming-Hsuan Yang

Novel View Synthesis: From Depth-Based Warping to Multi-Plane Images and Beyond
Organizers include:Varun Jampani

How to Write a Good Review
Talks by:Vittorio Ferrari, Bill Freeman, Jordi Pont-Tuset

Neural Rendering
Organizers include:Ricardo Martin-Brualla, Rohit K. Pandey, Sean Fanello,Maneesh Agrawala, Dan B. Goldman

Fairness Accountability Transparency and Ethics and Computer Vision
Organizers: Timnit Gebru, Emily Denton

Source: Google AI Blog