Tag Archives: ICML

Recursive Sketches for Modular Deep Learning



Much of classical machine learning (ML) focuses on utilizing available data to make more accurate predictions. More recently, researchers have considered other important objectives, such as how to design algorithms to be small, efficient, and robust. With these goals in mind, a natural research objective is the design of a system on top of neural networks that efficiently stores the information encoded within; in other words, a mechanism to compute a succinct summary (a “sketch”) of how a complex deep network processes its inputs. Sketching is a rich field of study that dates back to the foundational work of Alon, Matias, and Szegedy, and its techniques can enable neural networks to efficiently summarize information about their inputs.

For example, imagine stepping into a room and briefly viewing the objects within. Modern machine learning is excellent at answering immediate questions, known at training time, about this scene: “Is there a cat? How big is said cat?” Now, suppose we view this room every day over the course of a year. People can reminisce about the times they saw the room: “How often did the room contain a cat? Was it usually morning or night when we saw the room?” However, can one design systems that are also capable of efficiently answering such memory-based questions even if they are unknown at training time?

In “Recursive Sketches for Modular Deep Learning”, recently presented at ICML 2019, we explore how to succinctly summarize how a machine learning model understands its input. We do this by augmenting an existing (already trained) machine learning model with “sketches” of its computation, using them to efficiently answer memory-based questions—for example, image-to-image-similarity and summary statistics—despite the fact that they take up much less memory than storing the entire original computation.

Basic Sketching Algorithms
In general, sketching algorithms take a vector x and produce an output sketch vector that behaves like x but whose storage cost is much smaller. The smaller storage cost allows one to succinctly store information about the network, which is critical for efficiently answering memory-based questions. In the simplest case, a linear sketch of x is given by the matrix-vector product Ax, where A is a wide matrix, i.e., the number of columns is equal to the original dimension of x and the number of rows is equal to the new reduced dimension. Such methods have led to a variety of efficient algorithms for basic tasks on massive datasets, such as estimating fundamental statistics (e.g., histogram, quantiles and interquartile range), finding popular items (known as frequent elements), as well as estimating the number of distinct elements (known as support size) and the related tasks of norms and entropy estimation.
A simple method to sketch the vector x is to multiply it by a wide matrix A to produce a lower-dimensional vector y.
This basic approach works well in the relatively simple case of linear regression, where it is possible to identify important data dimensions simply by the magnitude of weights (under the common assumption that they have uniform variance). However, many modern machine learning models are deep neural networks based on high-dimensional embeddings (such as Word2Vec, Image Embeddings, GloVe, DeepWalk and BERT), which makes the task of summarizing the operation of the model on the input much more difficult. Fortunately, a large subset of these more complex networks are modular, allowing us to generate accurate sketches of their behavior in spite of their complexity.
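To make the basic linear sketch concrete, here is a minimal illustration of computing y = Ax with a wide random matrix. It assumes a random sign matrix scaled by 1/sqrt(rows), which is one common choice and not necessarily the one used in the work described here; the function names and the dimensions (2000 and 200) are hypothetical.

```python
import math
import random


def make_sketch_matrix(rows, cols, seed=0):
    """Wide random sign matrix A (rows << cols), scaled so that
    squared norms are preserved in expectation. Assumed construction."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(rows)
    return [[rng.choice((-scale, scale)) for _ in range(cols)]
            for _ in range(rows)]


def sketch(A, x):
    """Linear sketch y = A x."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def norm(v):
    return math.sqrt(sum(t * t for t in v))


rng = random.Random(1)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # original vector
A = make_sketch_matrix(rows=200, cols=2000)     # 200 x 2000, i.e. wide
y = sketch(A, x)                                # ten times fewer coordinates

print(len(x), len(y))       # 2000 200
print(norm(y) / norm(x))    # typically close to 1
```

Because the sketch is linear, the sketch of a sum of vectors is the sum of their sketches, which is what makes this kind of summary convenient to update incrementally. In practice the matrix would usually be regenerated from a seed or hash function instead of being stored.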

Neural Network Modularity
A modular deep network consists of several independent neural networks (modules) that only communicate via one’s output serving as another’s input. This concept has inspired several practical architectures, including Neural Modular Networks, Capsule Neural Networks and PathNet. It is also possible to split other canonical architectures to view them as modular networks and apply our approach. For example, convolutional neural networks (CNNs) are traditionally understood to behave in a modular fashion; they detect basic concepts and attributes in their lower layers and build up to detecting more complex objects in their higher layers. In this view, the convolution kernels correspond to modules. A cartoon depiction of a modular network is given below.
This is a cartoon depiction of a modular network for image processing. Data flows from the bottom of the figure to the top through the modules represented with blue boxes. Note that modules in the lower layers correspond to basic objects, such as edges in an image, while modules in upper layers correspond to more complex objects, like humans or cats. Also notice that in this imaginary modular network, the output of the face module is generic enough to be used by both the human and cat modules.
Sketch Requirements
To optimize our approach for these modular networks, we identified several desired properties that a network sketch should satisfy:
  • Sketch-to-Sketch Similarity: The sketches of two unrelated network operations (either in terms of the present modules or in terms of the attribute vectors) should be very different; on the other hand, the sketches of two similar network operations should be very close.
  • Attribute Recovery: The attribute vector, e.g., the activations of any node of the graph, can be approximately recovered from the top-level sketch.
  • Summary Statistics: If there are multiple similar objects, we can recover summary statistics about them. For example, if an image has multiple cats, we can count how many there are. Note that we want to do this without knowing the questions ahead of time.
  • Graceful Erasure: Erasing a suffix of the top-level sketch maintains the above properties (but would smoothly increase the error).
  • Network Recovery: Given sufficiently many (input, sketch) pairs, the wiring of the edges of the network as well as the sketch function can be approximately recovered.
This is a 2D cartoon depiction of the sketch-to-sketch similarity property. Each vector represents a sketch and related sketches are more likely to cluster together.
The Sketching Mechanism
The sketching mechanism we propose can be applied to a pre-trained modular network. It produces a single top-level sketch summarizing the operation of this network, simultaneously satisfying all of the desired properties above. To understand how it does this, it helps to first consider a one-layer network. In this case, we ensure that all the information pertaining to a specific node is “packed” into two separate subspaces, one corresponding to the node itself and one corresponding to its associated module. Using suitable projections, the first subspace lets us recover the attributes of the node whereas the second subspace facilitates quick estimates of summary statistics. Both subspaces help enforce the aforementioned sketch-to-sketch similarity property. We demonstrate that these properties hold if all the involved subspaces are chosen independently at random.
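The one-layer idea can be illustrated with a heavily simplified toy. This is not the construction from the paper: it assumes each node gets a single independent random sign projection (standing in for a random subspace), superposes the projected attribute vectors into one sketch, and recovers a node's attributes by projecting back. The node names, attribute values, and dimensions are hypothetical. The `keep` argument mimics erasing a suffix of the sketch.

```python
import math
import random


def random_projection(sketch_dim, attr_dim, rng):
    """Random sign matrix (sketch_dim x attr_dim) with nearly orthonormal columns."""
    scale = 1.0 / math.sqrt(sketch_dim)
    return [[rng.choice((-scale, scale)) for _ in range(attr_dim)]
            for _ in range(sketch_dim)]


def make_sketch(attributes, projections, sketch_dim):
    """Superpose every node's projected attribute vector into one sketch."""
    sketch = [0.0] * sketch_dim
    for name, x in attributes.items():
        proj = projections[name]
        for r in range(sketch_dim):
            sketch[r] += sum(p * v for p, v in zip(proj[r], x))
    return sketch


def recover(sketch, proj, keep=None):
    """Approximately recover one node's attributes by projecting back.
    If keep is given, only the first `keep` sketch coordinates are used."""
    keep = len(sketch) if keep is None else keep
    rescale = len(sketch) / keep
    return [rescale * sum(proj[r][c] * sketch[r] for r in range(keep))
            for c in range(len(proj[0]))]


def relative_error(estimate, truth):
    diff = math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)))
    return diff / math.sqrt(sum(t * t for t in truth))


rng = random.Random(0)
SKETCH_DIM, ATTR_DIM = 1024, 4
attributes = {                      # hypothetical attribute vectors
    "edge": [1.0, 0.5, -0.5, 2.0],
    "face": [0.6, -1.2, 0.8, 1.1],
    "cat": [1.5, 0.2, -0.7, 0.9],
}
projections = {name: random_projection(SKETCH_DIM, ATTR_DIM, rng)
               for name in attributes}
top_sketch = make_sketch(attributes, projections, SKETCH_DIM)

full = recover(top_sketch, projections["face"])
partial = recover(top_sketch, projections["face"], keep=256)
print(relative_error(full, attributes["face"]))     # small
print(relative_error(partial, attributes["face"]))  # usually larger
```

The recovery works because independently chosen random projections are nearly orthogonal to one another, so the contributions of the other nodes show up only as low-level noise. Using a shorter prefix of the sketch raises that noise level gradually instead of breaking recovery outright.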

Of course, extra care has to be taken when extending this idea to networks with more than one layer—which leads to our recursive sketching mechanism. Due to their recursive nature, these sketches can be “unrolled” to identify sub-components, capturing even complicated network structures. Finally, we utilize a dictionary learning algorithm tailored to our setup to prove that the random subspaces making up the sketching mechanism together with the network architecture can be recovered from a sufficiently large number of (input, sketch) pairs.

Future Directions
The question of succinctly summarizing the operation of a network seems to be closely related to that of model interpretability. It would be interesting to investigate whether ideas from the sketching literature can be applied to this domain. Our sketches could also be organized in a repository to implicitly form a “knowledge graph”, allowing patterns to be identified and quickly retrieved. Moreover, our sketching mechanism allows for seamlessly adding new modules to the sketch repository—it would be interesting to explore whether this feature can have applications to architecture search and evolving network topologies. Finally, our sketches can be viewed as a way of organizing previously encountered information in memory, e.g., images that share the same modules or attributes would share subcomponents of their sketches. This, on a very high level, is similar to the way humans use prior knowledge to recognize objects and generalize to unencountered situations.

Acknowledgements
This work was the joint effort of Badih Ghazi, Rina Panigrahy and Joshua R. Wang.

Source: Google AI Blog


Applying AutoML to Transformer Architectures



Since it was introduced a few years ago, Google’s Transformer architecture has been applied to challenges ranging from generating fantasy fiction to writing musical harmonies. Importantly, the Transformer’s high performance has demonstrated that feed forward neural networks can be as effective as recurrent neural networks when applied to sequence tasks, such as language modeling and translation. While the Transformer and other feed forward models used for sequence problems are rising in popularity, their architectures are almost exclusively manually designed, in contrast to the computer vision domain where AutoML approaches have found state-of-the-art models that outperform those that are designed by hand. Naturally, we wondered if the application of AutoML in the sequence domain could be equally successful.

After conducting an evolution-based neural architecture search (NAS), using translation as a proxy for sequence tasks in general, we found the Evolved Transformer, a new Transformer architecture that demonstrates promising improvements on a variety of natural language processing (NLP) tasks. Not only does the Evolved Transformer achieve state-of-the-art translation results, but it also demonstrates improved performance on language modeling when compared to the original Transformer. We are releasing this new model as part of Tensor2Tensor, where it can be used for any sequence problem.

Developing the Techniques
To begin the evolutionary NAS, we needed to develop new techniques, because the task used to evaluate the “fitness” of each architecture, WMT’14 English-German translation, is computationally expensive. This makes the searches more expensive than similar searches executed in the vision domain, which can leverage smaller datasets, like CIFAR-10. The first of these techniques is warm starting: seeding the initial evolution population with the Transformer architecture instead of random models. This helps ground the search in an area of the search space we know is strong, thereby allowing it to find better models faster.

The second technique is a new method we developed called Progressive Dynamic Hurdles (PDH), an algorithm that augments the evolutionary search to allocate more resources to the strongest candidates, in contrast to previous works, where each candidate model of the NAS is allocated the same amount of resources when it is being evaluated. PDH allows us to terminate the evaluation of a model early if it is flagrantly bad, allowing promising architectures to be awarded more resources.
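The early-termination idea behind PDH can be sketched as follows. This is a simplified illustration under stated assumptions, not the released implementation: the hurdle rule (mean fitness of the current population), the step schedule, and the toy fitness curve are all hypothetical choices made for the example.

```python
def mean_fitness_hurdle(population_fitnesses):
    """Assumed rule: a hurdle is the mean fitness of the current population."""
    return sum(population_fitnesses) / len(population_fitnesses)


def evaluate_with_hurdles(train_and_score, hurdles, step_schedule):
    """Train a candidate in stages, stopping early if it misses a hurdle.

    train_and_score(total_steps) returns the fitness after total_steps
    training steps (hypothetical callback). Returns (fitness, steps_used).
    """
    total_steps = 0
    fitness = float("-inf")
    for stage, extra_steps in enumerate(step_schedule):
        total_steps += extra_steps
        fitness = train_and_score(total_steps)
        # Only candidates that clear the hurdle earn the next block of steps.
        if stage < len(hurdles) and fitness < hurdles[stage]:
            break
    return fitness, total_steps


def toy_candidate(quality):
    """Toy fitness curve that saturates at `quality` as training continues."""
    return lambda total_steps: quality * total_steps / (total_steps + 100)


hurdles = [
    mean_fitness_hurdle([0.2, 0.3, 0.4]),  # about 0.3
    mean_fitness_hurdle([0.5, 0.6, 0.7]),  # about 0.6
]
schedule = [100, 200, 400]                 # extra steps granted per stage

strong = evaluate_with_hurdles(toy_candidate(1.0), hurdles, schedule)
weak = evaluate_with_hurdles(toy_candidate(0.5), hurdles, schedule)
print(strong)  # (0.875, 700)
print(weak)    # (0.25, 100)
```

In this toy run the weak candidate is dropped after the first 100 steps, while the strong one receives the full 700, which is the resource reallocation the paragraph above describes.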

The Evolved Transformer
Using these methods, we conducted a large-scale NAS on our translation task and discovered the Evolved Transformer (ET). Like most sequence to sequence (seq2seq) neural network architectures, it has an encoder that encodes the input sequence into embeddings and a decoder that uses those embeddings to construct an output sequence; in the case of translation, the input sequence is the sentence to be translated and the output sequence is the translation.

The most interesting feature of the Evolved Transformer is the convolutional layers at the bottom of both its encoder and decoder modules that were added in a similar branching pattern in both places (i.e. the inputs run through two separate convolutional layers before being added together).
A comparison between the Evolved Transformer and the original Transformer encoder architectures. Notice the branched convolution structure at the bottom of the module, which formed in both the encoder and decoder independently. See our paper for a description of the decoder.
This is particularly interesting because the encoder and decoder architectures are not shared during the NAS, so this architecture was independently discovered as being useful in both the encoder and decoder, speaking to the strength of this design. Whereas the original Transformer relied solely on self-attention, the Evolved Transformer is a hybrid, leveraging the strengths of both self-attention and wide convolution.
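The branching pattern itself (one input, two separate convolutions, outputs added together) can be shown with a tiny one-dimensional example. The kernel widths and values below are hypothetical, and the sketch leaves out everything else in the Evolved Transformer's layers, such as activations, normalization, and attention.

```python
def conv1d_same(signal, kernel):
    """1D cross-correlation with zero padding; output length equals input
    length. Assumes an odd kernel length."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(signal))]


def branched_conv(signal, left_kernel, right_kernel):
    """Run the same input through two separate convolutions and add the results."""
    left = conv1d_same(signal, left_kernel)
    right = conv1d_same(signal, right_kernel)
    return [a + b for a, b in zip(left, right)]


x = [1, 2, 3, 4]
out = branched_conv(x, left_kernel=[1, 0, -1], right_kernel=[2])
print(out)  # [0.0, 2, 4, 11.0]
```

Using two kernels of different widths lets each branch look at a different amount of context before the results are merged, which is one plausible reading of why such a branch can be useful.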

Evaluation of the Evolved Transformer
To test the effectiveness of this new architecture, we first compared it to the original Transformer on the English-German translation task we used during the search. We found that the Evolved Transformer had better BLEU and perplexity performance at all parameter sizes, with the biggest gain at the size compatible with mobile devices (~7 million parameters), demonstrating an efficient use of parameters. At a larger size, the Evolved Transformer reaches state-of-the-art performance on WMT’14 En-De with a BLEU score of 29.8 and a SacreBLEU score of 29.2.
Comparison between the Evolved Transformer and the original Transformer on WMT’14 En-De at varying sizes. The biggest gains in performance occur at smaller sizes, while ET also shows strength at larger sizes, outperforming the largest Transformer with 37.6% fewer parameters (models to compare are circled in green). See Table 3 in our paper for the exact numbers.
To test generalizability, we also compared ET to the Transformer on additional NLP tasks. First, we looked at translation using different language pairs and found that ET demonstrated improved performance, with margins similar to those seen on English-German; again, due to its efficient use of parameters, the biggest improvements were observed for medium-sized models. We also compared the decoders of both models on language modeling using LM1B and saw a performance improvement of nearly 2 perplexity points.
Future Work
These results are the first step in exploring the application of architecture search to feed forward sequence models. The Evolved Transformer is being open sourced as part of Tensor2Tensor, where it can be used for any sequence problem. To promote reproducibility, we are also open sourcing the search space we used for our search and a Colab with an implementation of Progressive Dynamic Hurdles. We look forward to seeing what the research community does with the new model and hope that others are able to build off of these new search techniques!

Source: Google AI Blog


Google at ICML 2019



Machine learning is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in machine learning research, Google is proud to be a Sapphire Sponsor of the thirty-sixth International Conference on Machine Learning (ICML 2019), a premier annual event supported by the International Machine Learning Society taking place this week in Long Beach, CA. With nearly 200 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger machine learning research community.

If you're attending ICML 2019, we hope you'll visit the Google booth to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges, with researchers on hand to talk about Google Research Football Environment, AdaNet, Robotics at Google and much more. You can learn more about the Google research being presented at ICML 2019 in the list below.

ICML 2019 Committees
Board Members include: Andrew McCallum, Corinna Cortes, Hugo Larochelle, William Cohen (Emeritus)

Senior Area Chairs include: Charles Sutton, Claudio Gentile, Corinna Cortes, Kevin Murphy, Mehryar Mohri, Nati Srebro, Samy Bengio, Surya Ganguli

Area Chairs include: Jacob Abernethy, William Cohen, Dumitru Erhan, Cho-Jui Hsieh, Chelsea Finn, Sergey Levine, Manzil Zaheer, Sergei Vassilvitskii, Boqing Gong, Been Kim, Dale Schuurmans, Danny Tarlow, Dustin Tran, Hanie Sedghi, Honglak Lee, Jasper Snoek, Lihong Li, Minmin Chen, Mohammad Norouzi, Nicolas Le Roux, Phil Long, Sanmi Koyejo, Timnit Gebru, Vitaly Feldman, Satyen Kale, Katherine Heller, Hossein Mobahi, Amir Globerson, Ilya Tolstikhin, Marco Cuturi, Sebastian Nowozin, Amin Karbasi, Ohad Shamir, Graham Taylor

Accepted Publications
Learning to Groove with Inverse Sequence Transformations
Jon Gillick, Adam Roberts, Jesse Engel, Douglas Eck, David Bamman

Metric-Optimized Example Weights
Sen Zhao, Mahdi Milani Fard, Harikrishna Narasimhan, Maya Gupta

HOList: An Environment for Machine Learning of Higher Order Logic Theorem Proving
Kshitij Bansal, Sarah Loos, Markus Rabe, Christian Szegedy, Stewart Wilcox

Learning to Clear the Market
Weiran Shen, Sebastien Lahaie, Renato Paes Leme

Shape Constraints for Set Functions
Andrew Cotter, Maya Gupta, Heinrich Jiang, Erez Louidor, James Muller, Tamann Narayan, Serena Wang, Tao Zhu

Self-Attention Generative Adversarial Networks
Han Zhang, Ian Goodfellow, Dimitris Metaxas, Augustus Odena

High-Fidelity Image Generation With Fewer Labels
Mario Lučić, Michael Tschannen, Marvin Ritter, Xiaohua Zhai, Olivier Bachem, Sylvain Gelly

Learning Optimal Linear Regularizers
Matthew Streeter

DeepMDP: Learning Continuous Latent Space Models for Representation Learning
Carles Gelada, Saurabh Kumar, Jacob Buckman, Ofir Nachum, Marc G. Bellemare

kernelPSI: a Post-Selection Inference Framework for Nonlinear Variable Selection
Lotfi Slim, Clément Chatelain, Chloe-Agathe Azencott, Jean-Philippe Vert

Learning from a Learner
Alexis Jacq, Matthieu Geist, Ana Paiva, Olivier Pietquin

Rate Distortion For Model Compression: From Theory To Practice
Weihao Gao, Yu-Han Liu, Chong Wang, Sewoong Oh

An Investigation into Neural Net Optimization via Hessian Eigenvalue Density
Behrooz Ghorbani, Shankar Krishnan, Ying Xiao

Graph Matching Networks for Learning the Similarity of Graph Structured Objects
Yujia Li, Chenjie Gu, Thomas Dullien, Oriol Vinyals, Pushmeet Kohli

Subspace Robust Wasserstein Distances
François-Pierre Paty, Marco Cuturi

Training Well-Generalizing Classifiers for Fairness Metrics and Other Data-Dependent Constraints
Andrew Cotter, Maya Gupta, Heinrich Jiang, Nathan Srebro, Karthik Sridharan, Serena Wang, Blake Woodworth, Seungil You

The Effect of Network Width on Stochastic Gradient Descent and Generalization: an Empirical Study
Daniel Park, Jascha Sohl-Dickstein, Quoc Le, Samuel Smith

A Theory of Regularized Markov Decision Processes
Matthieu Geist, Bruno Scherrer, Olivier Pietquin

Area Attention
Yang Li, Łukasz Kaiser, Samy Bengio, Si Si

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan, Quoc Le

Static Automatic Batching In TensorFlow
Ashish Agarwal

The Evolved Transformer
David So, Quoc Le, Chen Liang

Policy Certificates: Towards Accountable Reinforcement Learning
Christoph Dann, Lihong Li, Wei Wei, Emma Brunskill

Self-similar Epochs: Value in Arrangement
Eliav Buchnik, Edith Cohen, Avinatan Hasidim, Yossi Matias

The Value Function Polytope in Reinforcement Learning
Robert Dadashi, Marc G. Bellemare, Adrien Ali Taiga, Nicolas Le Roux, Dale Schuurmans

Adversarial Examples Are a Natural Consequence of Test Error in Noise
Justin Gilmer, Nicolas Ford, Nicholas Carlini, Ekin Cubuk

SOLAR: Deep Structured Representations for Model-Based Reinforcement Learning
Marvin Zhang, Sharad Vikram, Laura Smith, Pieter Abbeel, Matthew Johnson, Sergey Levine

Garbage In, Reward Out: Bootstrapping Exploration in Multi-Armed Bandits
Branislav Kveton, Csaba Szepesvari, Sharan Vaswani, Zheng Wen, Tor Lattimore, Mohammad Ghavamzadeh

Imperceptible, Robust, and Targeted Adversarial Examples for Automatic Speech Recognition
Yao Qin, Nicholas Carlini, Garrison Cottrell, Ian Goodfellow, Colin Raffel

Direct Uncertainty Prediction for Medical Second Opinions
Maithra Raghu, Katy Blumer, Rory Sayres, Ziad Obermeyer, Bobby Kleinberg, Sendhil Mullainathan, Jon Kleinberg

A Large-Scale Study on Regularization and Normalization in GANs
Karol Kurach, Mario Lučić, Xiaohua Zhai, Marcin Michalski, Sylvain Gelly

Learning a Compressed Sensing Measurement Matrix via Gradient Unrolling
Shanshan Wu, Alex Dimakis, Sujay Sanghavi, Felix Yu, Daniel Holtmann-Rice, Dmitry Storcheus, Afshin Rostamizadeh, Sanjiv Kumar

NATTACK: Learning the Distributions of Adversarial Examples for an Improved Black-Box Attack on Deep Neural Networks
Yandong Li, Lijun Li, Liqiang Wang, Tong Zhang, Boqing Gong

Distributed Weighted Matching via Randomized Composable Coresets
Sepehr Assadi, Mohammad Hossein Bateni, Vahab Mirrokni

Monge blunts Bayes: Hardness Results for Adversarial Training
Zac Cranko, Aditya Menon, Richard Nock, Cheng Soon Ong, Zhan Shi, Christian Walder

Generalized Majorization-Minimization
Sobhan Naderi Parizi, Kun He, Reza Aghajani, Stan Sclaroff, Pedro Felzenszwalb

NAS-Bench-101: Towards Reproducible Neural Architecture Search
Chris Ying, Aaron Klein, Eric Christiansen, Esteban Real, Kevin Murphy, Frank Hutter

Variational Russian Roulette for Deep Bayesian Nonparametrics
Kai Xu, Akash Srivastava, Charles Sutton

Surrogate Losses for Online Learning of Stepsizes in Stochastic Non-Convex Optimization
Zhenxun Zhuang, Ashok Cutkosky, Francesco Orabona

Improved Parallel Algorithms for Density-Based Network Clustering
Mohsen Ghaffari, Silvio Lattanzi, Slobodan Mitrović

The Advantages of Multiple Classes for Reducing Overfitting from Test Set Reuse
Vitaly Feldman, Roy Frostig, Moritz Hardt

Submodular Streaming in All Its Glory: Tight Approximation, Minimum Memory and Low Adaptive Complexity
Ehsan Kazemi, Marko Mitrovic, Morteza Zadimoghaddam, Silvio Lattanzi, Amin Karbasi

Hiring Under Uncertainty
Manish Purohit, Sreenivas Gollapudi, Manish Raghavan

A Tree-Based Method for Fast Repeated Sampling of Determinantal Point Processes
Jennifer Gillenwater, Alex Kulesza, Zelda Mariet, Sergei Vassilvitskii

Statistics and Samples in Distributional Reinforcement Learning
Mark Rowland, Robert Dadashi, Saurabh Kumar, Remi Munos, Marc G. Bellemare, Will Dabney

Provably Efficient Maximum Entropy Exploration
Elad Hazan, Sham Kakade, Karan Singh, Abby Van Soest

Active Learning with Disagreement Graphs
Corinna Cortes, Giulia DeSalvo, Mehryar Mohri, Ningshan Zhang, Claudio Gentile

MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing
Sami Abu-El-Haija, Bryan Perozzi, Amol Kapoor, Nazanin Alipourfard, Kristina Lerman, Hrayr Harutyunyan, Greg Ver Steeg, Aram Galstyan

Understanding the Impact of Entropy on Policy Optimization
Zafarali Ahmed, Nicolas Le Roux, Mohammad Norouzi, Dale Schuurmans

Matrix-Free Preconditioning in Online Learning
Ashok Cutkosky, Tamas Sarlos

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations
Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio, Michael Mozer

Online Convex Optimization in Adversarial Markov Decision Processes
Aviv Rosenberg, Yishay Mansour

Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
Kareem Amin, Alex Kulesza, Andres Munoz Medina, Sergei Vassilvitskii

Complementary-Label Learning for Arbitrary Losses and Models
Takashi Ishida, Gang Niu, Aditya Menon, Masashi Sugiyama

Learning Latent Dynamics for Planning from Pixels
Danijar Hafner, Timothy Lillicrap, Ian Fischer, Ruben Villegas, David Ha, Honglak Lee, James Davidson

Unifying Orthogonal Monte Carlo Methods
Krzysztof Choromanski, Mark Rowland, Wenyu Chen, Adrian Weller

Differentially Private Learning of Geometric Concepts
Haim Kaplan, Yishay Mansour, Yossi Matias, Uri Stemmer

Online Learning with Sleeping Experts and Feedback Graphs
Corinna Cortes, Giulia DeSalvo, Claudio Gentile, Mehryar Mohri, Scott Yang

Adaptive Scale-Invariant Online Algorithms for Learning Linear Models
Michal Kempka, Wojciech Kotlowski, Manfred K. Warmuth

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing
Augustus Odena, Catherine Olsson, David Andersen, Ian Goodfellow

Online Control with Adversarial Disturbances
Naman Agarwal, Brian Bullins, Elad Hazan, Sham Kakade, Karan Singh

Adversarial Online Learning with Noise
Alon Resler, Yishay Mansour

Escaping Saddle Points with Adaptive Gradient Methods
Matthew Staib, Sashank Reddi, Satyen Kale, Sanjiv Kumar, Suvrit Sra

Fairness Risk Measures
Robert Williamson, Aditya Menon

DBSCAN++: Towards Fast and Scalable Density Clustering
Jennifer Jang, Heinrich Jiang

Learning Linear-Quadratic Regulators Efficiently with only √T Regret
Alon Cohen, Tomer Koren, Yishay Mansour

Understanding and correcting pathologies in the training of learned optimizers
Luke Metz, Niru Maheswaranathan, Jeremy Nixon, Daniel Freeman, Jascha Sohl-Dickstein

Parameter-Efficient Transfer Learning for NLP
Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin De Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly

Efficient Full-Matrix Adaptive Regularization
Naman Agarwal, Brian Bullins, Xinyi Chen, Elad Hazan, Karan Singh, Cyril Zhang, Yi Zhang

Efficient On-Device Models Using Neural Projections
Sujith Ravi

Flexibly Fair Representation Learning by Disentanglement
Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, Richard Zemel

Recursive Sketches for Modular Deep Learning
Badih Ghazi, Rina Panigrahy, Joshua Wang

POLITEX: Regret Bounds for Policy Iteration Using Expert Prediction
Yasin Abbasi-Yadkori, Peter L. Bartlett, Kush Bhatia, Nevena Lazić, Csaba Szepesvári, Gellért Weisz

Anytime Online-to-Batch, Optimism and Acceleration
Ashok Cutkosky

Insertion Transformer: Flexible Sequence Generation via Insertion Operations
Mitchell Stern, William Chan, Jamie Kiros, Jakob Uszkoreit

Robust Inference via Generative Classifiers for Handling Noisy Labels
Kimin Lee, Sukmin Yun, Kibok Lee, Honglak Lee, Bo Li, Jinwoo Shin

A Better k-means++ Algorithm via Local Search
Silvio Lattanzi, Christian Sohler

Analyzing and Improving Representations with the Soft Nearest Neighbor Loss
Nicholas Frosst, Nicolas Papernot, Geoffrey Hinton

Learning to Generalize from Sparse and Underspecified Rewards
Rishabh Agarwal, Chen Liang, Dale Schuurmans, Mohammad Norouzi

MeanSum: A Neural Model for Unsupervised Multi-Document Abstractive Summarization
Eric Chu, Peter Liu

CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
Tom Kenter, Vincent Wan, Chun-An Chan, Rob Clark, Jakub Vit

Similarity of Neural Network Representations Revisited
Simon Kornblith, Mohammad Norouzi, Honglak Lee, Geoffrey Hinton

Online Algorithms for Rent-Or-Buy with Expert Advice
Sreenivas Gollapudi, Debmalya Panigrahi

Breaking the Softmax Bottleneck via Learnable Monotonic Pointwise Non-linearities
Octavian Ganea, Sylvain Gelly, Gary Becigneul, Aliaksei Severyn

Non-monotone Submodular Maximization with Nearly Optimal Adaptivity and Query Complexity
Matthew Fahrbach, Vahab Mirrokni, Morteza Zadimoghaddam

Agnostic Federated Learning
Mehryar Mohri, Gary Sivek, Ananda Theertha Suresh

Categorical Feature Compression via Submodular Optimization
Mohammad Hossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab Mirrokni, Afshin Rostamizadeh

Cross-Domain 3D Equivariant Image Embeddings
Carlos Esteves, Avneesh Sud, Zhengyi Luo, Kostas Daniilidis, Ameesh Makadia

Faster Algorithms for Binary Matrix Factorization
Ravi Kumar, Rina Panigrahy, Ali Rahimi, David Woodruff

On Variational Bounds of Mutual Information
Ben Poole, Sherjil Ozair, Aaron Van Den Oord, Alex Alemi, George Tucker

Guided Evolutionary Strategies: Augmenting Random Search with Surrogate Gradients
Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

Semi-Cyclic Stochastic Gradient Descent
Hubert Eichner, Tomer Koren, Brendan McMahan, Nathan Srebro, Kunal Talwar

Workshops
1st Workshop on Understanding and Improving Generalization in Deep Learning
Organizers Include: Dilip Krishnan, Hossein Mobahi
Invited Speaker: Chelsea Finn

Climate Change: How Can AI Help?
Invited Speaker: John Platt

Generative Modeling and Model-Based Reasoning for Robotics and AI
Organizers Include: Dumitru Erhan, Sergey Levine, Kimberly Stachenfeld
Invited Speaker: Chelsea Finn

Human In the Loop Learning (HILL)
Organizers Include: Been Kim

ICML 2019 Time Series Workshop
Organizers Include: Vitaly Kuznetsov

Joint Workshop on On-Device Machine Learning & Compact Deep Neural Network Representations (ODML-CDNNR)
Organizers Include: Sujith Ravi, Zornitsa Kozareva

Negative Dependence: Theory and Applications in Machine Learning
Organizers Include: Jennifer Gillenwater, Alex Kulesza

Reinforcement Learning for Real Life
Organizers Include: Lihong Li
Invited Speaker: Craig Boutilier

Uncertainty and Robustness in Deep Learning
Organizers Include: Justin Gilmer

Theoretical Physics for Deep Learning
Organizers Include: Jaehoon Lee, Jeffrey Pennington, Yasaman Bahri

Workshop on the Security and Privacy of Machine Learning
Organizers Include: Nicolas Papernot
Invited Speaker: Been Kim

Exploration in Reinforcement Learning Workshop
Organizers Include: Benjamin Eysenbach, Surya Bhupatiraju, Shixiang Gu

ICML Workshop on Imitation, Intent, and Interaction (I3)
Organizers Include: Sergey Levine, Chelsea Finn
Invited Speaker: Pierre Sermanet

Identifying and Understanding Deep Learning Phenomena
Organizers Include: Hanie Sedghi, Samy Bengio, Kenji Hata, Maithra Raghu, Ali Rahimi, Ying Xiao

Workshop on Multi-Task and Lifelong Reinforcement Learning
Organizers Include: Sarath Chandar, Chelsea Finn
Invited Speakers: Karol Hausman, Sergey Levine

Workshop on Self-Supervised Learning
Organizers Include: Pierre Sermanet

Invertible Neural Networks and Normalizing Flows
Organizers Include: Rianne Van den Berg, Danilo J. Rezende
Invited Speakers: Eric Jang, Laurent Dinh

Source: Google AI Blog


Seminal Ideas from 2007



It is not every day that we have the chance to pause and think about how previous work has led to current successes, how it influenced other advances, and how to reinterpret it in today’s context. That’s what the ICML Test-of-Time Award is meant to achieve, and this year it was given to Sylvain Gelly, now a researcher on the Google Brain team in our Zurich office, and David Silver, now at DeepMind and lead researcher on AlphaGo, for their 2007 paper Combining Online and Offline Knowledge in UCT. This paper presented new approaches to incorporate knowledge, learned offline or created online on the fly, into a search algorithm to augment its effectiveness.

The Game of Go is an ancient Chinese board game that is tremendously popular, with millions of players worldwide. Since the success of Deep Blue in the game of Chess in the late 90’s, Go has been considered the next benchmark for machine learning and games. Indeed, it has simple rules, can be efficiently simulated, and progress can be measured objectively. However, due to the vast search space of possible moves, making an ML system capable of playing Go well represented a considerable challenge. Over the last two years, DeepMind’s AlphaGo has pushed the limit of what is possible with machine learning in games, bringing many innovations and technological advances in order to successfully defeat some of the best players in the world [1], [2], [3].

A little more than 10 years before the success of AlphaGo, the classical tree search techniques that were so successful in Chess were reigning in computer Go programs, but only reaching weak amateur level for human Go players. Thanks to Monte-Carlo Tree Search — a (then) new type of search algorithm based on sampling possible outcomes of the game from a position, and incrementally improving the search tree from the results of those simulations — computers were able to search much deeper in the game. This is important because it made it possible to incorporate less human knowledge in the programs — a task which is very hard to do right. Indeed, any missing knowledge that a human expert either cannot express or did not think about may create errors in the computer evaluation of the game position, and lead to blunders*.
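
To make the sampling idea concrete, here is a minimal Python sketch of flat Monte Carlo move evaluation. It is an illustration under stated assumptions, not MoGo's algorithm: it uses a toy single-pile Nim game (take 1 to 3 stones, whoever takes the last stone wins) instead of Go, it scores each candidate move by the win rate of uniformly random playouts, and it leaves out the incremental tree-building step that Monte-Carlo Tree Search adds on top. All function names are hypothetical.

```python
import random


def legal_moves(stones):
    """Toy Nim rules (an assumption for this sketch): take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def random_playout(stones, rng):
    """Play uniformly random moves until the pile is empty.

    Returns True if the player who moves first in this playout takes the
    last stone (and therefore wins), False otherwise.
    """
    first_player_to_move = True
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return first_player_to_move
        first_player_to_move = not first_player_to_move
    return False  # empty pile at the start: the player to move has already lost


def monte_carlo_move(stones, n_playouts=2000, seed=0):
    """Estimate each move's win rate by sampling random continuations."""
    rng = random.Random(seed)
    win_rate = {}
    for move in legal_moves(stones):
        remaining = stones - move
        # After our move the opponent is first to move in the playout,
        # so we win exactly when the playout's first player loses.
        wins = sum(not random_playout(remaining, rng) for _ in range(n_playouts))
        win_rate[move] = wins / n_playouts
    best = max(win_rate, key=win_rate.get)
    return best, win_rate


if __name__ == "__main__":
    best, rates = monte_carlo_move(5)
    print("best move:", best, "estimated win rates:", rates)
```

With 5 stones, taking 1 stone scores highest under random playouts (a win rate near 2/3, against roughly 1/2 for the other two moves), which matches the best move in this toy game. No expert knowledge about Nim was encoded; the estimate comes only from simulated outcomes.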

In 2007, Sylvain and David augmented the Monte Carlo Tree Search techniques by exploring two types of knowledge incorporation: (i) online, where the decision for the next move is taken from the current position, using compute resources at the time when the next move is needed, and (ii) offline, where the learning process happens entirely before the game starts, and is summarized into a model that can be applied to all possible positions of a game (even though not all possible positions have been seen during the learning process). This ultimately led to the computer program MoGo, which showed an improvement in performance over previous Go algorithms.


For the online part, they adapted the simple idea that some actions don’t necessarily depend on each other. For example, if you need to book a vacation, the choice of the hotel, flight and car rental is obviously dependent on the choice of your destination. However, once given a destination, these things can be chosen (mostly) independently of each other. The same idea can be applied to Go, where some moves can be estimated partially independently of each other to get a very quick, albeit imprecise, estimate. Of course, when time is available, the exact dependencies are also analysed.
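
One heuristic in this spirit is "all moves as first" (AMAF): after a simulated game, every move the player made at any point is credited with the result, as if it had been played first. The sketch below is an illustration of that general idea, not a reproduction of the paper's method. It assumes tic-tac-toe with uniformly random play in place of Go, and the function names are hypothetical.

```python
import random

# The eight winning lines of a 3x3 board, cells numbered 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if a line is complete, else None."""
    for a, b, c in LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(rng):
    """Play one uniformly random game from the empty board.

    Returns (winner or None, list of cells X played, in order).
    """
    board = [None] * 9
    order = list(range(9))
    rng.shuffle(order)  # a random permutation is a random move sequence
    x_moves = []
    for turn, cell in enumerate(order):
        player = "X" if turn % 2 == 0 else "O"
        board[cell] = player
        if player == "X":
            x_moves.append(cell)
        result = winner(board)
        if result is not None:
            return result, x_moves
    return None, x_moves


def collect_statistics(n_playouts=3000, seed=0):
    """Gather [wins, visits] per cell for X, in two ways."""
    rng = random.Random(seed)
    direct = {c: [0, 0] for c in range(9)}  # only X's actual first move
    amaf = {c: [0, 0] for c in range(9)}    # every move X made in the playout
    for _ in range(n_playouts):
        result, x_moves = random_playout(rng)
        x_won = int(result == "X")
        first = x_moves[0]
        direct[first][0] += x_won
        direct[first][1] += 1
        for cell in x_moves:
            amaf[cell][0] += x_won
            amaf[cell][1] += 1
    return direct, amaf


if __name__ == "__main__":
    direct, amaf = collect_statistics()
    for cell in range(9):
        print(cell, "direct visits:", direct[cell][1], "AMAF visits:", amaf[cell][1])
```

Each playout updates one entry in the direct table but at least three entries in the AMAF table, so the AMAF estimates accumulate samples several times faster. The price is bias, because a move's value is treated as independent of when it was played, which is the "quick, albeit imprecise" trade-off described above.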

For offline knowledge incorporation, they explored the impact of learning an approximation of the position value with the computer playing against itself using reinforcement learning, adding that knowledge in the tree search algorithm. They also looked at how expert play patterns, based on human knowledge of the game, can be used in a similar way. That offline knowledge was used in two places; first, it helped focus the program on moves that looked similar to good moves it learned offline. Second, it helped simulate more realistic games when the program tried to estimate a given position value.
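
A simple way to picture how an offline estimate can steer the search is to treat it as a number of "virtual" playouts that are already recorded for a move before any real simulation runs. The Python sketch below shows that arithmetic only. It is an assumption made for illustration, the names `prior_value` and `prior_weight` are hypothetical, and it is not claimed to be the exact scheme used in the paper.

```python
def blended_value(prior_value, prior_weight, wins, visits):
    """Win-rate estimate that mixes an offline prior with simulation results.

    prior_value  : offline estimate of the move's win rate, in [0, 1]
    prior_weight : how many virtual playouts the prior is worth (assumed > 0)
    wins, visits : results of the real playouts seen so far
    """
    if prior_weight + visits <= 0:
        raise ValueError("need a positive prior weight or at least one visit")
    return (prior_weight * prior_value + wins) / (prior_weight + visits)


if __name__ == "__main__":
    # Before any playout the estimate equals the offline prior.
    print(blended_value(0.8, 10, wins=0, visits=0))
    # After many playouts the simulation results dominate.
    print(blended_value(0.8, 10, wins=500, visits=1000))
```

Early in the search the prior focuses attention on moves that looked good offline. As real playouts accumulate, their weight grows and a misleading prior is gradually overridden.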

These improvements led to good success on the smaller version of the game of Go (9x9), even beating one professional player in an exhibition game, and also reaching a stronger amateur level on the full game (19x19). And in the years since 2007, we’ve seen many rapid advances (almost on a monthly basis) from researchers all over the world that have allowed the development of algorithms culminating in AlphaGo (which itself introduced many innovations).

Importantly, these algorithms and techniques are not limited to games, but also enable improvements in many other domains. The contributions introduced by David and Sylvain in their collaboration 10 years ago were an important piece of many of the improvements and advancements in machine learning that benefit our lives daily, and we offer our sincere congratulations to both authors on this well-deserved award.


* As a side note, that’s why machine learning as a whole is such a powerful tool: replacing expert knowledge with algorithms that can more fully explore potential outcomes.

Google at ICML 2017



Machine learning (ML) is a key strategic focus at Google, with highly active groups pursuing research in virtually all aspects of the field, including deep learning and more classical algorithms, exploring theory as well as application. We utilize scalable tools and architectures to build machine learning systems that enable us to solve deep scientific and engineering challenges in areas of language, speech, translation, music, visual processing and more.

As a leader in ML research, Google is proud to be a Platinum Sponsor of the thirty-fourth International Conference on Machine Learning (ICML 2017), a premier annual event supported by the International Machine Learning Society taking place this week in Sydney, Australia. With over 130 Googlers attending the conference to present publications and host workshops, we look forward to our continued collaboration with the larger ML research community.

If you're attending ICML 2017, we hope you'll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving some of the field's most interesting challenges. Our researchers will also be available to talk about and demo several recent efforts, including the technology behind Facets, neural audio synthesis with NSynth, a Q&A session on the Google Brain Residency program and much more. You can also learn more about our research being presented at ICML 2017 in the list below.

ICML 2017 Committees
Senior Program Committee includes: Alex Kulesza, Amr Ahmed, Andrew Dai, Corinna Cortes, George Dahl, Hugo Larochelle, Matthew Hoffman, Maya Gupta, Moritz Hardt, Quoc Le

Sponsorship Co-Chair: Ryan Adams

Publications
Robust Adversarial Reinforcement Learning
Lerrel Pinto, James Davidson, Rahul Sukthankar, Abhinav Gupta

Tight Bounds for Approximate Carathéodory and Beyond
Vahab Mirrokni, Renato Leme, Adrian Vladu, Sam Wong

Sharp Minima Can Generalize For Deep Nets
Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio

Geometry of Neural Network Loss Surfaces via Random Matrix Theory
Jeffrey Pennington, Yasaman Bahri

Conditional Image Synthesis with Auxiliary Classifier GANs
Augustus Odena, Christopher Olah, Jon Shlens

Learning Deep Latent Gaussian Models with Markov Chain Monte Carlo
Matthew Hoffman

On the Expressive Power of Deep Neural Networks
Maithra Raghu, Ben Poole, Surya Ganguli, Jon Kleinberg, Jascha Sohl-Dickstein

AdaNet: Adaptive Structural Learning of Artificial Neural Networks
Corinna Cortes, Xavi Gonzalvo, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang

Learned Optimizers that Scale and Generalize
Olga Wichrowska, Niru Maheswaranathan, Matthew Hoffman, Sergio Gomez, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

Adaptive Feature Selection: Computationally Efficient Online Sparse Linear Regression under RIP
Satyen Kale, Zohar Karnin, Tengyuan Liang, David Pal

Algorithms for ℓp Low-Rank Approximation
Flavio Chierichetti, Sreenivas Gollapudi, Ravi Kumar, Silvio Lattanzi, Rina Panigrahy, David Woodruff

Consistent k-Clustering
Silvio Lattanzi, Sergei Vassilvitskii

Input Switched Affine Networks: An RNN Architecture Designed for Interpretability
Jakob Foerster, Justin Gilmer, Jan Chorowski, Jascha Sohl-Dickstein, David Sussillo

Online and Linear-Time Attention by Enforcing Monotonic Alignments
Colin Raffel, Thang Luong, Peter Liu, Ron Weiss, Douglas Eck

Gradient Boosted Decision Trees for High Dimensional Sparse Output
Si Si, Huan Zhang, Sathiya Keerthi, Dhruv Mahajan, Inderjit Dhillon, Cho-Jui Hsieh

Sequence Tutor: Conservative fine-tuning of sequence generation models with KL-control
Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, Jose Hernandez-Lobato, Richard E Turner, Douglas Eck

Uniform Convergence Rates for Kernel Density Estimation
Heinrich Jiang

Density Level Set Estimation on Manifolds with DBSCAN
Heinrich Jiang

Maximum Selection and Ranking under Noisy Comparisons
Moein Falahatgar, Alon Orlitsky, Venkatadheeraj Pichapati, Ananda Suresh

Neural Audio Synthesis of Musical Notes with WaveNet Autoencoders
Cinjon Resnick, Adam Roberts, Jesse Engel, Douglas Eck, Sander Dieleman, Karen Simonyan, Mohammad Norouzi

Distributed Mean Estimation with Limited Communication
Ananda Suresh, Felix Yu, Sanjiv Kumar, Brendan McMahan

Learning to Generate Long-term Future via Hierarchical Prediction
Ruben Villegas, Jimei Yang, Yuliang Zou, Sungryull Sohn, Xunyu Lin, Honglak Lee

Variational Boosting: Iteratively Refining Posterior Approximations
Andrew Miller, Nicholas J Foti, Ryan Adams

RobustFill: Neural Program Learning under Noisy I/O
Jacob Devlin, Jonathan Uesato, Surya Bhupatiraju, Rishabh Singh, Abdel-rahman Mohamed, Pushmeet Kohli

A Unified Maximum Likelihood Approach for Estimating Symmetric Properties of Discrete Distributions
Jayadev Acharya, Hirakendu Das, Alon Orlitsky, Ananda Suresh

Axiomatic Attribution for Deep Networks
Ankur Taly, Qiqi Yan, Mukund Sundararajan

Differentiable Programs with Neural Libraries
Alex L Gaunt, Marc Brockschmidt, Nate Kushman, Daniel Tarlow

Latent LSTM Allocation: Joint Clustering and Non-Linear Dynamic Modeling of Sequence Data
Manzil Zaheer, Amr Ahmed, Alex Smola

Device Placement Optimization with Reinforcement Learning
Azalia Mirhoseini, Hieu Pham, Quoc Le, Benoit Steiner, Mohammad Norouzi, Rasmus Larsen, Yuefeng Zhou, Naveen Kumar, Samy Bengio, Jeff Dean

Canopy — Fast Sampling with Cover Trees
Manzil Zaheer, Satwik Kottur, Amr Ahmed, Jose Moura, Alex Smola

Zero-Shot Task Generalization with Multi-Task Deep Reinforcement Learning
Junhyuk Oh, Satinder Singh, Honglak Lee, Pushmeet Kohli

Probabilistic Submodular Maximization in Sub-Linear Time
Serban Stan, Morteza Zadimoghaddam, Andreas Krause, Amin Karbasi

Deep Value Networks Learn to Evaluate and Iteratively Refine Structured Outputs
Michael Gygli, Mohammad Norouzi, Anelia Angelova

Stochastic Generative Hashing
Bo Dai, Ruiqi Guo, Sanjiv Kumar, Niao He, Le Song

Accelerating Eulerian Fluid Simulation With Convolutional Networks
Jonathan Tompson, Kristofer D Schlachter, Pablo Sprechmann, Ken Perlin

Large-Scale Evolution of Image Classifiers
Esteban Real, Sherry Moore, Andrew Selle, Saurabh Saxena, Yutaka Leon Suematsu, Jie Tan, Quoc Le, Alexey Kurakin

Neural Message Passing for Quantum Chemistry
Justin Gilmer, Samuel Schoenholz, Patrick Riley, Oriol Vinyals, George Dahl

Neural Optimizer Search with Reinforcement Learning
Irwan Bello, Barret Zoph, Vijay Vasudevan, Quoc Le

Workshops
Implicit Generative Models
Organizers include: Ian Goodfellow

Learning to Generate Natural Language
Accepted Papers include:
Generating High-Quality and Informative Conversation Responses with Sequence-to-Sequence Models
Louis Shao, Stephan Gouws, Denny Britz, Anna Goldie, Brian Strope, Ray Kurzweil

Lifelong Learning: A Reinforcement Learning Approach
Accepted Papers include:
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

Principled Approaches to Deep Learning
Organizers include: Robert Gens
Program Committee includes: Jascha Sohl-Dickstein

Workshop on Human Interpretability in Machine Learning (WHI)
Organizers include: Been Kim

ICML Workshop on TinyML: ML on a Test-time Budget for IoT, Mobiles, and Other Applications
Invited speakers include: Sujith Ravi

Deep Structured Prediction
Organizers include: Gal Chechik, Ofer Meshi
Program Committee includes: Vitaly Kuznetsov, Kevin Murphy
Invited Speakers include: Ryan Adams
Accepted Papers include:
Filtering Variational Objectives
Chris J Maddison, Dieterich Lawson, George Tucker, Mohammad Norouzi, Nicolas Heess, Arnaud Doucet, Andriy Mnih, Yee Whye Teh
REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
George Tucker, Andriy Mnih, Chris J Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

Machine Learning in Speech and Language Processing
Organizers include: Tara Sainath
Invited speakers include: Ron Weiss

Picky Learners: Choosing Alternative Ways to Process Data
Invited speakers include: Tomer Koren
Organizers include: Corinna Cortes, Mehryar Mohri

Private and Secure Machine Learning
Keynote Speakers include: Ilya Mironov

Reproducibility in Machine Learning Research
Invited Speakers include: Hugo Larochelle, Francois Chollet
Organizers include: Samy Bengio

Time Series Workshop
Organizers include: Vitaly Kuznetsov

Tutorial
Interpretable Machine Learning
Presenters include: Been Kim


ICML 2016 & Research at Google



This week, New York hosts the 2016 International Conference on Machine Learning (ICML 2016), a premier annual Machine Learning event supported by the International Machine Learning Society (IMLS). Machine Learning is a key focus area at Google, with highly active research groups exploring virtually all aspects of the field, including deep learning and more classical algorithms.

We work on an extremely wide variety of machine learning problems that arise from a broad range of applications at Google. One particularly important setting is that of large-scale learning, where we utilize scalable tools and architectures to build machine learning systems that work with large volumes of data that often preclude the use of standard single-machine training algorithms. In doing so, we are able to solve deep scientific problems and engineering challenges, exploring theory as well as application, in areas of language, speech, translation, music, visual processing and more.

As a Gold Sponsor, Google has a strong presence at ICML 2016 with many Googlers publishing their research and hosting workshops. If you’re attending, we hope you’ll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving interesting ML problems that impact millions of people. You can also learn more about our research being presented at ICML 2016 in the list below.

ICML 2016 Organizing Committee
Area Chairs include: Corinna Cortes, John Blitzer, Maya Gupta, Moritz Hardt, Samy Bengio

IMLS
Board Members include: Corinna Cortes

Accepted Papers
ADIOS: Architectures Deep In Output Space
Moustapha Cisse, Maruan Al-Shedivat, Samy Bengio

Associative Long Short-Term Memory
Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves

Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

Binary embeddings with structured hashed projections
Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann LeCun

Discrete Distribution Estimation Under Local Privacy
Peter Kairouz, Keith Bonawitz, Daniel Ramage

Dueling Network Architectures for Deep Reinforcement Learning (Best Paper Award recipient)
Ziyu Wang, Nando de Freitas, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot

Exploiting Cyclic Symmetry in Convolutional Neural Networks
Sander Dieleman, Jeffrey De Fauw, Koray Kavukcuoglu

Fast Constrained Submodular Maximization: Personalized Data Summarization
Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi

Greedy Column Subset Selection: New Bounds and Distributed Algorithms
Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam

Horizontally Scalable Submodular Maximization
Mario Lucic, Olivier Bachem, Morteza Zadimoghaddam, Andreas Krause

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

Meta-Learning with Memory-Augmented Neural Networks
Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap

One-Shot Generalization in Deep Generative Models
Danilo Rezende, Shakir Mohamed, Daan Wierstra

Pixel Recurrent Neural Networks (Best Paper Award recipient)
Aaron Van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu

Pricing a low-regret seller
Hoda Heidari, Mohammad Mahdian, Umar Syed, Sergei Vassilvitskii, Sadra Yazdanbod

Primal-Dual Rates and Certificates
Celestine Dünner, Simone Forte, Martin Takac, Martin Jaggi

Recommendations as Treatments: Debiasing Learning and Evaluation
Tobias Schnabel, Thorsten Joachims, Adith Swaminathan, Ashudeep Singh, Navin Chandak

Recycling Randomness with Structure for Sublinear Time Kernel Expansions
Krzysztof Choromanski, Vikas Sindhwani

Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt, Ben Recht, Yoram Singer

Variational Inference for Monte Carlo Objectives
Andriy Mnih, Danilo Rezende

Workshops
Abstraction in Reinforcement Learning
Organizing Committee: Daniel Mankowitz, Timothy Mann, Shie Mannor
Invited Speaker: David Silver

Deep Learning Workshop
Organizers: Antoine Bordes, Kyunghyun Cho, Emily Denton, Nando de Freitas, Rob Fergus
Invited Speaker: Raia Hadsell

Neural Networks Back To The Future
Organizers: Léon Bottou, David Grangier, Tomas Mikolov, John Platt

Data-Efficient Machine Learning
Organizers: Marc Deisenroth, Shakir Mohamed, Finale Doshi-Velez, Andreas Krause, Max Welling

On-Device Intelligence
Organizers: Vikas Sindhwani, Daniel Ramage, Keith Bonawitz, Suyog Gupta, Sachin Talathi
Invited Speakers: Hartwig Adam, H. Brendan McMahan

Online Advertising Systems
Organizing Committee: Sharat Chikkerur, Hossein Azari, Edoardo Airoldi
Opening Remarks: Hossein Azari
Invited Speakers: Martin Pál, Todd Phillips

Anomaly Detection 2016
Organizing Committee: Nico Goernitz, Marius Kloft, Vitaly Kuznetsov

Tutorials
Deep Reinforcement Learning
David Silver

Rigorous Data Dredging: Theory and Tools for Adaptive Data Analysis
Moritz Hardt, Aaron Roth