Author Archives: Research Blog

Smart Compose: Using Neural Networks to Help Write Emails



Last week at Google I/O, we introduced Smart Compose, a new feature in Gmail that uses machine learning to interactively offer sentence completion suggestions as you type, allowing you to draft emails faster. Building upon technology developed for Smart Reply, Smart Compose offers a new way to help you compose messages — whether you are responding to an incoming email or drafting a new one from scratch.
In developing Smart Compose, there were a number of key challenges to face, including:
  • Latency: Since Smart Compose provides predictions on a per-keystroke basis, it must respond ideally within 100ms for the user not to notice any delays. Balancing model complexity and inference speed was a critical issue.
  • Scale: Gmail is used by more than 1.4 billion diverse users. In order to provide auto completions that are useful for all Gmail users, the model has to have enough modeling capacity so that it is able to make tailored suggestions in subtly different contexts.
  • Fairness and Privacy: In developing Smart Compose, we needed to address sources of potential bias in the training process, and had to adhere to the same rigorous user privacy standards as Smart Reply, making sure that our models never expose user’s private information. Furthermore, researchers had no access to emails, which meant they had to develop and train a machine learning system to work on a dataset that they themselves cannot read.
Finding the Right Model
Typical language generation models, such as ngramneural bag-of-words (BoW) and RNN language (RNN-LM) models, learn to predict the next word conditioned on the prefix word sequence. In an email, however, the words a user has typed in the current email composing session is only one “signal” a model can use to predict the next word. In order to incorporate more context about what the user wants to say, our model is also conditioned on the email subject and the previous email body (if the user is replying to an incoming email).

One approach to include this additional context is to cast the problem as a sequence-to-sequence (seq2seq) machine translation task, where the source sequence is the concatenation of the subject and the previous email body (if there is one), and the target sequence is the current email the user is composing. While this approach worked well in terms of prediction quality, it failed to meet our strict latency constraints by orders of magnitude.

To improve on this, we combined a BoW model with an RNN-LM, which is faster than the seq2seq models with only a slight sacrifice to model prediction quality. In this hybrid approach, we encode the subject and previous email by averaging the word embeddings in each field. We then join those averaged embeddings, and feed them to the target sequence RNN-LM at every decoding step, as the model diagram below shows.
Smart Compose RNN-LM model architecture. Subject and previous email message are encoded by averaging the word embeddings in each field. The averaged embeddings are then fed to the RNN-LM at each decoding step.
Accelerated Model Training & Serving
Of course, once we decided on this modeling approach we still had to tune various model hyperparameters and train the models over billions of examples, all of which can be very time-intensive. To speed things up, we used a full TPUv2 Pod to perform experiments. In doing so, we’re able to train a model to convergence in less than a day.

Even after training our faster hybrid model, our initial version of Smart Compose running on a standard CPU had an average serving latency of hundreds of milliseconds, which is still unacceptable for a feature that is trying to save users' time. Fortunately, TPUs can also be used at inference time to greatly speed up the user experience. By offloading the bulk of the computation onto TPUs, we improved the average latency to tens of milliseconds while also greatly increasing the number of requests that can be served by a single machine.

Fairness and Privacy
Fairness in machine learning is very important, as language understanding models can reflect human cognitive biases resulting in unwanted word associations and sentence completions. As Caliskan et al. point out in their recent paper “Semantics derived automatically from language corpora contain human-like biases”, these associations are deeply entangled in natural language data, which presents a considerable challenge to building any language model. We are actively researching ways to continue to reduce potential biases in our training procedures. Also, since Smart Compose is trained on billions of phrases and sentences, similar to the way spam machine learning models are trained, we have done extensive testing to make sure that only common phrases used by multiple users are memorized by our model, using findings from this paper.

Future work
We are constantly working on improving the suggestion quality of the language generation model by following state-of-the-art architectures (e.g., Transformer, RNMT+, etc.) and experimenting with most recent and advanced training techniques. We will deploy those more advanced models to production once our strict latency constraints can be met. We are also working on incorporating personal language models, designed to more accurately emulate an individual’s style of writing into our system.

Acknowledgements
Smart Compose language generation model was developed by Benjamin Lee, Mia Chen, Gagan Bansal, Justin Lu, Jackie Tsay, Kaushik Roy, Tobias Bosch, Yinan Wang, Matthew Dierker, Katherine Evans, Thomas Jablin, Dehao Chen, Vinu Rajashekhar, Akshay Agrawal, Yuan Cao, Shuyuan Zhang, Xiaobing Liu, Noam Shazeer, Andrew Dai, Zhifeng Chen, Rami Al-Rfou, DK Choe, Yunhsuan Sung, Brian Strope, Timothy Sohn, Yonghui Wu, and many others.

Source: Google AI Blog


Automatic Photography with Google Clips



To me, photography is the simultaneous recognition, in a fraction of a second, of the significance of an event as well as of a precise organization of forms which give that event its proper expression.
Henri Cartier-Bresson

The last few years have witnessed a Cambrian-like explosion in AI, with deep learning methods enabling computer vision algorithms to recognize many of the elements of a good photograph: people, smiles, pets, sunsets, famous landmarks and more. But, despite these recent advancements, automatic photography remains a very challenging problem. Can a camera capture a great moment automatically?

Recently, we released Google Clips, a new, hands-free camera that automatically captures interesting moments in your life. We designed Google Clips around three important principles:
  • We wanted all computations to be performed on-device. In addition to extending battery life and reducing latency, on-device processing means that none of your clips leave the device unless you decide to save or share them, which is a key privacy control.
  • We wanted the device to capture short videos, rather than single photographs. Moments with motion can be more poignant and true-to-memory, and it is often easier to shoot a video around a compelling moment than it is to capture a perfect, single instant in time.
  • We wanted to focus on capturing candid moments of people and pets, rather than the more abstract and subjective problem of capturing artistic images. That is, we did not attempt to teach Clips to think about composition, color balance, light, etc.; instead, Clips focuses on selecting ranges of time containing people and animals doing interesting activities.
Learning to Recognize Great Moments
How could we train an algorithm to recognize interesting moments? As with most machine learning problems, we started with a dataset. We created a dataset of thousands of videos in diverse scenarios where we imagined Clips being used. We also made sure our dataset represented a wide range of ethnicities, genders, and ages. We then hired expert photographers and video editors to pore over this footage to select the best short video segments. These early curations gave us examples for our algorithms to emulate. However, it is challenging to train an algorithm solely from the subjective selection of the curators — one needs a smooth gradient of labels to teach an algorithm to recognize the quality of content, ranging from "perfect" to "terrible."

To address this problem, we took a second data-collection approach, with the goal of creating a continuous quality score across the length of a video. We split each video into short segments (similar to the content Clips captures), randomly selected pairs of segments, and asked human raters to select the one they prefer.
We took this pairwise comparison approach, instead of having raters score videos directly, because it is much easier to choose the better of a pair than it is to specify a number. We found that raters were very consistent in pairwise comparisons, and less so when scoring directly. Given enough pairwise comparisons for any given video, we were able to compute a continuous quality score over the entire length. In this process, we collected over 50,000,000 pairwise comparisons on clips sampled from over 1,000 videos. That’s a lot of human effort!
Training a Clips Quality Model
Given this quality score training data, our next step was to train a neural network model to estimate the quality of any photograph captured by the device. We started with the basic assumption that knowing what’s in the photograph (e.g., people, dogs, trees, etc.) will help determine “interestingness”. If this assumption is correct, we could learn a function that uses the recognized content of the photograph to predict its quality score derived above from human comparisons.

To identify content labels in our training data, we leveraged the same Google machine learning technology that powers Google image search and Google Photos, which can recognize over 27,000 different labels describing objects, concepts, and actions. We certainly didn’t need all these labels, nor could we compute them all on device, so our expert photographers selected the few hundred labels they felt were most relevant to predicting the “interestingness” of a photograph. We also added the labels most highly correlated with the rater-derived quality scores.

Once we had this subset of labels, we then needed to design a compact, efficient model that could predict them for any given image, on-device, within strict power and thermal limits. This presented a challenge, as the deep learning techniques behind computer vision typically require strong desktop GPUs, and algorithms adapted to run on mobile devices lag far behind state-of-the-art techniques on desktop or cloud. To train this on-device model, we first took a large set of photographs and again used Google’s powerful, server-based recognition models to predict label confidence for each of the “interesting” labels described above. We then trained a MobileNet Image Content Model (ICM) to mimic the predictions of the server-based model. This compact model is capable of recognizing the most interesting elements of photographs, while ignoring non-relevant content.

The final step was to predict a single quality score for an input photograph from its content predicted by the ICM, using the 50M pairwise comparisons as training data. This score is computed with a piecewise linear regression model that combines the output of the ICM into a frame quality score. This frame quality score is averaged across the video segment to form a moment score. Given a pairwise comparison, our model should compute a moment score that is higher for the video segment preferred by humans. The model is trained so that its predictions match the human pairwise comparisons as well as possible.
Diagram of the training process for generating frame quality scores. Piecewise linear regression maps from an ICM embedding to a score which, when averaged across a video segment, yields a moment score. The moment score of the preferred segment should be higher.
This process allowed us to train a model that combines the power of Google image recognition technology with the wisdom of human raters–represented by 50 million opinions on what makes interesting content!

While this data-driven score does a great job of identifying interesting (and non-interesting) moments, we also added some bonuses to our overall quality score for phenomena that we know we want Clips to capture, including faces (especially recurring and thus “familiar” ones), smiles, and pets. In our most recent release, we added bonuses for certain activities that customers particularly want to capture, such as hugs, kisses, jumping, and dancing. Recognizing these activities required extensions to the ICM model.

Shot Control
Given this powerful model for predicting the “interestingness” of a scene, the Clips camera can decide which moments to capture in real-time. Its shot control algorithms follow three main principles:
  1. Respect Power & Thermals: We want the Clips battery to last roughly three hours, and we don’t want the device to overheat — the device can’t run at full throttle all the time. Clips spends much of its time in a low-power mode that captures one frame per second. If the quality of that frame exceeds a threshold set by how much Clips has recently shot, it moves into a high-power mode, capturing at 15 fps. Clips then saves a clip at the first quality peak encountered.
  2. Avoid Redundancy: We don’t want Clips to capture all of its moments at once, and ignore the rest of a session. Our algorithms therefore cluster moments into visually similar groups, and limit the number of clips in each cluster.
  3. The Benefit of Hindsight: It’s much easier to determine which clips are the best when you can examine the totality of clips captured. Clips therefore captures more moments than it intends to show to the user. When clips are ready to be transferred to the phone, the Clips device takes a second look at what it has shot, and only transfers the best and least redundant content.
Machine Learning Fairness
In addition to making sure our video dataset represented a diverse population, we also constructed several other tests to assess the fairness of our algorithms. We created controlled datasets by sampling subjects from different genders and skin tones in a balanced manner, while keeping variables like content type, duration, and environmental conditions constant. We then used this dataset to test that our algorithms had similar performance when applied to different groups. To help detect any regressions in fairness that might occur as we improved our moment quality models, we added fairness tests to our automated system. Any change to our software was run across this battery of tests, and was required to pass. It is important to note that this methodology can’t guarantee fairness, as we can’t test for every possible scenario and outcome. However, we believe that these steps are an important part of our long-term work to achieve fairness in ML algorithms.

Conclusion
Most machine learning algorithms are designed to estimate objective qualities – a photo contains a cat, or it doesn’t. In our case, we aim to capture a more elusive and subjective quality – whether a personal photograph is interesting, or not. We therefore combine the objective, semantic content of photographs with subjective human preferences to build the AI behind Google Clips. Also, Clips is designed to work alongside a person, rather than autonomously; to get good results, a person still needs to be conscious of framing, and make sure the camera is pointed at interesting content. We’re happy with how well Google Clips performs, and are excited to continue to improve our algorithms to capture that “perfect” moment!

Acknowledgements
The algorithms described here were conceived and implemented by a large group of Google engineers, research scientists, and others. Figures were made by Lior Shapira. Thanks to Lior and Juston Payne for video content.

Source: Google AI Blog


Custom On-Device ML Models with Learn2Compress



Successful deep learning models often require significant amounts of computational resources, memory and power to train and run, which presents an obstacle if you want them to perform well on mobile and IoT devices. On-device machine learning allows you to run inference directly on the devices, with the benefits of data privacy and access everywhere, regardless of connectivity. On-device ML systems, such as MobileNets and ProjectionNets, address the resource bottlenecks on mobile devices by optimizing for model efficiency. But what if you wanted to train your own customized, on-device models for your personal mobile application?

Yesterday at Google I/O, we announced ML Kit to make machine learning accessible for all mobile developers. One of the core ML Kit capabilities that will be available soon is an automatic model compression service powered by “Learn2Compress” technology developed by our research team. Learn2Compress enables custom on-device deep learning models in TensorFlow Lite that run efficiently on mobile devices, without developers having to worry about optimizing for memory and speed. We are pleased to make Learn2Compress for image classification available soon through ML Kit. Learn2Compress will be initially available to a small number of developers, and will be offered more broadly in the coming months. You can sign up here if you are interested in using this feature for building your own models.

How it Works
Learn2Compress generalizes the learning framework introduced in previous works like ProjectionNet and incorporates several state-of-the-art techniques for compressing neural network models. It takes as input a large pre-trained TensorFlow model provided by the user, performs training and optimization and automatically generates ready-to-use on-device models that are smaller in size, more memory-efficient, more power-efficient and faster at inference with minimal loss in accuracy.
Learn2Compress for automatically generating on-device ML models.
To do this, Learn2Compress uses multiple neural network optimization and compression techniques including:
  • Pruning reduces model size by removing weights or operations that are least useful for predictions (e.g.low-scoring weights). This can be very effective especially for on-device models involving sparse inputs or outputs, which can be reduced up to 2x in size while retaining 97% of the original prediction quality.
  • Quantization techniques are particularly effective when applied during training and can improve inference speed by reducing the number of bits used for model weights and activations. For example, using 8-bit fixed point representation instead of floats can speed up the model inference, reduce power and further reduce size by 4x.
  • Joint training and distillation approaches follow a teacher-student learning strategy — we use a larger teacher network (in this case, user-provided TensorFlow model) to train a compact student network (on-device model) with minimal loss in accuracy.
    Joint training and distillation approach to learn compact student models.
    The teacher network can be fixed (as in distillation) or jointly optimized, and even train multiple student models of different sizes simultaneously. So instead of a single model, Learn2Compress generates multiple on-device models in a single shot, at different sizes and inference speeds, and lets the developer pick one best suited for their application needs.
These and other techniques like transfer learning also make the compression process more efficient and scalable to large-scale datasets.

How well does it work?
To demonstrate the effectiveness of Learn2Compress, we used it to build compact on-device models of several state-of-the-art deep networks used in image and natural language tasks such as MobileNets, NASNet, Inception, ProjectionNet, among others. For a given task and dataset, we can generate multiple on-device models at different inference speeds and model sizes.
Accuracy at various sizes for Learn2Compress models and full-sized baseline networks on CIFAR-10 (left) and ImageNet (right) image classification tasks. Student networks used to produce the compressed variants for CIFAR-10 and ImageNet are modeled using NASNet and MobileNet-inspired architectures, respectively.
For image classification, Learn2Compress can generate small and fast models with good prediction accuracy suited for mobile applications. For example, on ImageNet task, Learn2Compress achieves a model 22x smaller than Inception v3 baseline and 4x smaller than MobileNet v1 baseline with just 4.6-7% drop in accuracy. On CIFAR-10, jointly training multiple Learn2Compress models with shared parameters, takes only 10% more time than training a single Learn2Compress large model, but yields 3 compressed models that are upto 94x smaller in size and upto 27x faster with up to 36x lower cost and good prediction quality (90-95% top-1 accuracy).
Computation cost and average prediction latency (on Pixel phone) for baseline and Learn2Compress models on CIFAR-10 image classification task. Learn2Compress-optimized models use NASNet-style network architecture.
We are also excited to see how well this performs on developer use-cases. For example, Fishbrain, a social platform for fishing enthusiasts, used Learn2Compress to compress their existing image classification cloud model (80MB+ in size and 91.8% top-3 accuracy) to a much smaller on-device model, less than 5MB in size, with similar accuracy. In some cases, we observe that it is possible for the compressed models to even slightly outperform the original large model’s accuracy due to better regularization effects.

We will continue to improve Learn2Compress with future advances in ML and deep learning, and extend to more use-cases beyond image classification. We are excited and looking forward to make this available soon through ML Kit’s compression service on the Cloud. We hope this will make it easy for developers to automatically build and optimize their own on-device ML models so that they can focus on building great apps and cool user experiences involving computer vision, natural language and other machine learning applications.

Acknowledgments
I would like to acknowledge our core contributors Gaurav Menghani, Prabhu Kaliamoorthi and Yicheng Fan along with Wei Chai, Kang Lee, Sheng Xu and Pannag Sanketi. Special thanks to Dave Burke, Brahim Elbouchikhi, Hrishikesh Aradhye, Hugues Vincent, and Arun Venkatesan from the Android team; Sachin Kotwani, Wesley Tarle, Pavel Jbanov and from the Firebase team; Andrei Broder, Andrew Tomkins, Robin Dua, Patrick McGregor, Gaurav Nemade, the Google Expander team and TensorFlow team.


Source: Google AI Blog


Google Duplex: An AI System for Accomplishing Real World Tasks Over the Phone



A long-standing goal of human-computer interaction has been to enable people to have a natural conversation with computers, as they would with each other. In recent years, we have witnessed a revolution in the ability of computers to understand and to generate natural speech, especially with the application of deep neural networks (e.g., Google voice search, WaveNet). Still, even with today’s state of the art systems, it is often frustrating having to talk to stilted computerized voices that don't understand natural language. In particular, automated phone systems are still struggling to recognize simple words and commands. They don’t engage in a conversation flow and force the caller to adjust to the system instead of the system adjusting to the caller.

Today we announce Google Duplex, a new technology for conducting natural conversations to carry out “real world” tasks over the phone. The technology is directed towards completing specific tasks, such as scheduling certain types of appointments. For such tasks, the system makes the conversational experience as natural as possible, allowing people to speak normally, like they would to another person, without having to adapt to a machine.

One of the key research insights was to constrain Duplex to closed domains, which are narrow enough to explore extensively. Duplex can only carry out natural conversations after being deeply trained in such domains. It cannot carry out general conversations.

Here are examples of Duplex making phone calls (using different voices):
Duplex scheduling a hair salon appointment:
Duplex calling a restaurant:

While sounding natural, these and other examples are conversations between a fully automatic computer system and real businesses.

The Google Duplex technology is built to sound natural, to make the conversation experience comfortable. It’s important to us that users and businesses have a good experience with this service, and transparency is a key part of that. We want to be clear about the intent of the call so businesses understand the context. We’ll be experimenting with the right approach over the coming months.

Conducting Natural Conversations
There are several challenges in conducting natural conversations: natural language is hard to understand, natural behavior is tricky to model, latency expectations require fast processing, and generating natural sounding speech, with the appropriate intonations, is difficult.

When people talk to each other, they use more complex sentences than when talking to computers. They often correct themselves mid-sentence, are more verbose than necessary, or omit words and rely on context instead; they also express a wide range of intents, sometimes in the same sentence, e.g., “So umm Tuesday through Thursday we are open 11 to 2, and then reopen 4 to 9, and then Friday, Saturday, Sunday we... or Friday, Saturday we're open 11 to 9 and then Sunday we're open 1 to 9.”
Example of complex statement:

In natural spontaneous speech people talk faster and less clearly than they do when they speak to a machine, so speech recognition is harder and we see higher word error rates. The problem is aggravated during phone calls, which often have loud background noises and sound quality issues.

In longer conversations, the same sentence can have very different meanings depending on context. For example, when booking reservations “Ok for 4” can mean the time of the reservation or the number of people. Often the relevant context might be several sentences back, a problem that gets compounded by the increased word error rate in phone calls.

Deciding what to say is a function of both the task and the state of the conversation. In addition, there are some common practices in natural conversations — implicit protocols that include elaborations (“for next Friday” “for when?” “for Friday next week, the 18th.”), syncs (“can you hear me?”), interruptions (“the number is 212-” “sorry can you start over?”), and pauses (“can you hold? [pause] thank you!” different meaning for a pause of 1 second vs 2 minutes).

Enter Duplex
Google Duplex’s conversations sound natural thanks to advances in understanding, interacting, timing, and speaking.

At the core of Duplex is a recurrent neural network (RNN) designed to cope with these challenges, built using TensorFlow Extended (TFX). To obtain its high precision, we trained Duplex’s RNN on a corpus of anonymized phone conversation data. The network uses the output of Google’s automatic speech recognition (ASR) technology, as well as features from the audio, the history of the conversation, the parameters of the conversation (e.g. the desired service for an appointment, or the current time of day) and more. We trained our understanding model separately for each task, but leveraged the shared corpus across tasks. Finally, we used hyperparameter optimization from TFX to further improve the model.
Incoming sound is processed through an ASR system. This produces text that is analyzed with context data and other inputs to produce a response text that is read aloud through the TTS system.
Duplex handling interruptions:
Duplex elaborating:
Duplex responding to a sync:

Sounding Natural
We use a combination of a concatenative text to speech (TTS) engine and a synthesis TTS engine (using Tacotron and WaveNet) to control intonation depending on the circumstance.

The system also sounds more natural thanks to the incorporation of speech disfluencies (e.g. “hmm”s and “uh”s). These are added when combining widely differing sound units in the concatenative TTS or adding synthetic waits, which allows the system to signal in a natural way that it is still processing. (This is what people often do when they are gathering their thoughts.) In user studies, we found that conversations using these disfluencies sound more familiar and natural.

Also, it’s important for latency to match people’s expectations. For example, after people say something simple, e.g., “hello?”, they expect an instant response, and are more sensitive to latency. When we detect that low latency is required, we use faster, low-confidence models (e.g. speech recognition or endpointing). In extreme cases, we don’t even wait for our RNN, and instead use faster approximations (usually coupled with more hesitant responses, as a person would do if they didn’t fully understand their counterpart). This allows us to have less than 100ms of response latency in these situations. Interestingly, in some situations, we found it was actually helpful to introduce more latency to make the conversation feel more natural — for example, when replying to a really complex sentence.

System Operation
The Google Duplex system is capable of carrying out sophisticated conversations and it completes the majority of its tasks fully autonomously, without human involvement. The system has a self-monitoring capability, which allows it to recognize the tasks it cannot complete autonomously (e.g., scheduling an unusually complex appointment). In these cases, it signals to a human operator, who can complete the task.

To train the system in a new domain, we use real-time supervised training. This is comparable to the training practices of many disciplines, where an instructor supervises a student as they are doing their job, providing guidance as needed, and making sure that the task is performed at the instructor’s level of quality. In the Duplex system, experienced operators act as the instructors. By monitoring the system as it makes phone calls in a new domain, they can affect the behavior of the system in real time as needed. This continues until the system performs at the desired quality level, at which point the supervision stops and the system can make calls autonomously.

Benefits for Businesses and Users
Businesses that rely on appointment bookings supported by Duplex, and are not yet powered by online systems, can benefit from Duplex by allowing customers to book through the Google Assistant without having to change any day-to-day practices or train employees. Using Duplex could also reduce no-shows to appointments by reminding customers about their upcoming appointments in a way that allows easy cancellation or rescheduling.
Duplex calling a restaurant:

In another example, customers often call businesses to inquire about information that is not available online such as hours of operation during a holiday. Duplex can call the business to inquire about open hours and make the information available online with Google, reducing the number of such calls businesses receive, while at the same time, making the information more accessible to everyone. Businesses can operate as they always have, there’s no learning curve or changes to make to benefit from this technology.
Duplex asking for holiday hours:

For users, Google Duplex is making supported tasks easier. Instead of making a phone call, the user simply interacts with the Google Assistant, and the call happens completely in the background without any user involvement.
A user asks the Google Assistant for an appointment, which the Assistant then schedules by having Duplex call the business.
Another benefit for users is that Duplex enables delegated communication with service providers in an asynchronous way, e.g., requesting reservations during off-hours, or with limited connectivity. It can also help address accessibility and language barriers, e.g., allowing hearing-impaired users, or users who don’t speak the local language, to carry out tasks over the phone.

This summer, we’ll start testing the Duplex technology within the Google Assistant, to help users make restaurant reservations, schedule hair salon appointments, and get holiday hours over the phone.
Yaniv Leviathan, Google Duplex lead, and Matan Kalman, engineering manager on the project, enjoying a meal booked through a call from Duplex.
Duplex calling to book the above meal:


Allowing people to interact with technology as naturally as they interact with each other has been a long standing promise. Google Duplex takes a step in this direction, making interaction with technology via natural conversation a reality in specific scenarios. We hope that these technology advances will ultimately contribute to a meaningful improvement in people’s experience in day-to-day interactions with computers.

Source: Google AI Blog


Deep Learning for Electronic Health Records



When patients get admitted to a hospital, they have many questions about what will happen next. When will I be able to go home? Will I get better? Will I have to come back to the hospital? Having precise answers to those questions helps doctors and nurses make care better, safer, and faster — if a patient’s health is deteriorating, doctors could be sent proactively to act before things get worse.

Predicting what will happen next is a natural application of machine learning. We wondered if the same types of machine learning that predict traffic during your commute or the next word in a translation from English to Spanish could be used for clinical predictions. For predictions to be useful in practice they should be, at least:
  1. Scalable: Predictions should be straightforward to create for any important outcome and for different hospital systems. Since healthcare data is very complicated and requires much data wrangling, this requirement is not straightforward to satisfy.
  2. Accurate: Predictions should alert clinicians to problems but not distract them with false alarms. With the widespread adoption of electronic health records, we set out to use that data to create more accurate prediction models.
Together with colleagues at UC San Francisco, Stanford Medicine, and The University of Chicago Medicine, we published “Scalable and Accurate Deep Learning with Electronic Health Records” in Nature Partner Journals: Digital Medicine, which contributes to these two aims.
We used deep learning models to make a broad set of predictions relevant to hospitalized patients using de-identified electronic health records. Importantly, we were able to use the data as-is, without the laborious manual effort typically required to extract, clean, harmonize, and transform relevant variables in those records. Our partners had removed sensitive individual information before we received it, and on our side, we protected the data using state-of-the-art security including logical separation, strict access controls, and encryption of data at rest and in transit.

Scalability
Electronic health records (EHRs) are tremendously complicated. Even a temperature measurement has a different meaning depending on if it’s taken under the tongue, through your eardrum, or on your forehead. And that's just a simple vital sign. Moreover, each health system customizes their EHR system, making the data collected at one hospital look different than data on a similar patient receiving similar care at another hospital. Before we could even apply machine learning, we needed a consistent way to represent patient records, which we built on top of the open Fast Healthcare Interoperability Resources (FHIR) standard as described in an earlier blog post.

Once in a consistent format, we did not have to manually select or harmonize the variables to use. Instead, for each prediction, a deep learning model reads all the data-points from earliest to most recent and then learns which data helps predict the outcome. Since there are thousands of data points involved, we had to develop some new types of deep learning modeling approaches based on recurrent neural networks (RNNs) and feedforward networks.
Data in a patient's record is represented as a timeline. For illustrative purposes, we display various types of clinical data (e.g. encounters, lab tests) by row. Each piece of data, indicated as a little grey dot, is stored in FHIR, an open data standard that can be used by any healthcare institution. A deep learning model analyzed a patient's chart by reading the timeline from left to right, from the beginning of a chart to the current hospitalization, and used this data to make different types of predictions.
Thus we engineered a computer system to render predictions without hand-crafting a new dataset for each task, in a scalable manner. But setting up the data is only one part of the work; the predictions also need to be accurate.

Prediction Accuracy
The most common way to assess accuracy is by a measure called the area-under-the-receiver-operator curve, which measures how well a model distinguishes between a patient who will have a particular future outcome compared to one who will not. In this metric, 1.00 is perfect, and 0.50 is no better than random chance, so higher numbers mean the model is more accurate. By this measure, the models we reported in the paper scored 0.86 in predicting if patients will stay long in the hospital (traditional logistic regression scored 0.76); they scored 0.95 in predicting inpatient mortality (traditional methods were 0.86), and they scored 0.77 in predicting unexpected readmissions after patients are discharged (traditional methods were 0.70). These gains were statistically significant.

We also used these models to identify the conditions for which the patients were being treated. For example, if a doctor prescribed ceftriaxone and doxycycline for a patient with an elevated temperature, fever and cough, the model could identify these as signals that the patient was being treated for pneumonia. We emphasize that the model is not diagnosing patients — it picks up signals about the patient, their treatments and notes written by their clinicians, so the model is more like a good listener than a master diagnostician.

An important focus of our work includes the interpretability of the deep learning models used. An “attention map” of each prediction shows the important data points considered by the models as they make that prediction. We show an example as a proof-of-concept and see this as an important part of what makes predictions useful for clinicians.
A deep learning model was used to render a prediction 24 hours after a patient was admitted to the hospital. The timeline (top of figure) contains months of historical data and the most recent data is shown enlarged in the middle. The model "attended" to information highlighted in red that was in the patient's chart to "explain" its prediction. In this case-study, the model highlighted pieces of information that make sense clinically. Figure from our paper.
What does this mean for patients and clinicians?
The results of this work are early and on retrospective data only. Indeed, this paper represents just the beginning of the work that is needed to test the hypothesis that machine learning can be used to make healthcare better. Doctors are already inundated with alerts and demands on their attention — could models help physicians with tedious, administrative tasks so they can better focus on the patient in front of them or ones that need extra attention? Can we help patients get high-quality care no matter where they seek it? We look forward to collaborating with doctors and patients to figure out the answers to these questions and more.

Source: Google AI Blog


Introducing Google AI



Hmm, have I made a wrong turn? I was looking for Google Research…

For the past several years, we’ve pursued research that reflects our commitment to make AI available for everyone. From computer vision to healthcare research to AutoML, we have increasingly put emphasis on implementing machine learning techniques in nearly everything we do at Google. Our research has been core to the development and integration of these systems into Google products and platforms.
To better reflect this commitment, we’re unifying our efforts under “Google AI”, which encompasses all the state-of-the-art research happening across Google. As part of this, we have expanded the Google AI website, and are renaming our existing Google Research channels, including this blog and the affiliated Twitter and Google+ channels, to Google AI. And if you’re looking for information that existed on research.google.com or the affiliated social channels, don’t fret, it’s all still there. Any links to previous Google Research website content, blog posts or tweets will redirect appropriately.

The Google AI channels will continue to showcase the breadth of Google research, innovation and publications, in addition to a lot more new and exciting content to come. We encourage you to explore! We look forward to continuing to bring you the latest updates and results from Google, in AI and across many other areas of research.

Source: Google AI Blog


The Question of Quantum Supremacy



Quantum computing integrates the two largest technological revolutions of the last half century, information technology and quantum mechanics. If we compute using the rules of quantum mechanics, instead of binary logic, some intractable computational tasks become feasible. An important goal in the pursuit of a universal quantum computer is the determination of the smallest computational task that is prohibitively hard for today’s classical computers. This crossover point is known as the “quantum supremacy” frontier, and is a critical step on the path to more powerful and useful computations.

In “Characterizing quantum supremacy in near-term devices” published in Nature Physics (arXiv here), we present the theoretical foundation for a practical demonstration of quantum supremacy in near-term devices. It proposes the task of sampling bit-strings from the output of random quantum circuits, which can be thought of as the “hello world” program for quantum computers. The upshot of the argument is that the output of random chaotic systems (think butterfly effect) become very quickly harder to predict the longer they run. If one makes a random, chaotic qubit system and examines how long a classical system would take to emulate it, one gets a good measure of when a quantum computer could outperform a classical one. Arguably, this is the strongest theoretical proposal to prove an exponential separation between the computational power of classical and quantum computers.

Determining where exactly the quantum supremacy frontier lies for sampling random quantum circuits has rapidly become an exciting area of research. On one hand, improvements in classical algorithms to simulate quantum circuits aim to increase the size of the quantum circuits required to establish quantum supremacy. This forces an experimental quantum device with a sufficiently large number of qubits and low enough error rates to implement circuits of sufficient depth (i.e the number of layers of gates in the circuit) to achieve supremacy. On the other hand, we now understand better how the particular choice of the quantum gates used to build random quantum circuits affects the simulation cost, leading to improved benchmarks for near-term quantum supremacy (available for download here), which are in some cases quadratically more expensive to simulate classically than the original proposal.

Sampling from random quantum circuits is an excellent calibration benchmark for quantum computers, which we call cross-entropy benchmarking. A successful quantum supremacy experiment with random circuits would demonstrate the basic building blocks for a large-scale fault-tolerant quantum computer. Furthermore, quantum physics has not yet been tested for highly complex quantum states such as this.
Space-time volume of a quantum circuit computation. The computational cost for quantum simulation increases with the volume of the quantum circuit, and in general grows exponentially with the number of qubits and the circuit depth. For asymmetric grids of qubits, the computational space-time volume grows slower with depth than for symmetric grids, and can result in circuits exponentially easier to simulate.
In “A blueprint for demonstrating quantum supremacy with superconducting qubits” (arXiv here), we illustrate a blueprint towards quantum supremacy and experimentally demonstrate a proof-of-principle version for the first time. In the paper, we discuss two key ingredients for quantum supremacy: exponential complexity and accurate computations. We start by running algorithms on subsections of the device ranging from 5 to 9 qubits. We find that the classical simulation cost grows exponentially with the number of qubits. These results are intended to provide a clear example of the exponential power of these devices. Next, we use cross-entropy benchmarking to compare our results against that of an ordinary computer and show that our computations are highly accurate. In fact, the error rate is low enough to achieve quantum supremacy with a larger quantum processor.

Beyond achieving quantum supremacy, a quantum platform should offer clear applications. In our paper, we apply our algorithms towards computational problems in quantum statistical-mechanics using complex multi-qubit gates (as opposed to the two-qubit gates designed for a digital quantum processor with surface code error correction). We show that our devices can be used to study fundamental properties of materials, e.g. microscopic differences between metals and insulators. By extending these results to next-generation devices with ~50 qubits, we hope to answer scientific questions that are beyond the capabilities of any other computing platform.
Photograph of two gmon superconducting qubits and their tunable coupler developed by Charles Neill and Pedram Roushan.
These two publications introduce a realistic proposal for near-term quantum supremacy, and demonstrate a proof-of-principle version for the first time. We will continue to decrease the error rates and increase the number of qubits in quantum processors to reach the quantum supremacy frontier, and to develop quantum algorithms for useful near-term applications.

Source: Google AI Blog


The Question of Quantum Supremacy



Quantum computing integrates the two largest technological revolutions of the last half century, information technology and quantum mechanics. If we compute using the rules of quantum mechanics, instead of binary logic, some intractable computational tasks become feasible. An important goal in the pursuit of a universal quantum computer is the determination of the smallest computational task that is prohibitively hard for today’s classical computers. This crossover point is known as the “quantum supremacy” frontier, and is a critical step on the path to more powerful and useful computations.

In “Characterizing quantum supremacy in near-term devices” published in Nature Physics (arXiv here), we present the theoretical foundation for a practical demonstration of quantum supremacy in near-term devices. It proposes the task of sampling bit-strings from the output of random quantum circuits, which can be thought of as the “hello world” program for quantum computers. The upshot of the argument is that the output of random chaotic systems (think butterfly effect) become very quickly harder to predict the longer they run. If one makes a random, chaotic qubit system and examines how long a classical system would take to emulate it, one gets a good measure of when a quantum computer could outperform a classical one. Arguably, this is the strongest theoretical proposal to prove an exponential separation between the computational power of classical and quantum computers.

Determining where exactly the quantum supremacy frontier lies for sampling random quantum circuits has rapidly become an exciting area of research. On one hand, improvements in classical algorithms to simulate quantum circuits aim to increase the size of the quantum circuits required to establish quantum supremacy. This forces an experimental quantum device with a sufficiently large number of qubits and low enough error rates to implement circuits of sufficient depth (i.e the number of layers of gates in the circuit) to achieve supremacy. On the other hand, we now understand better how the particular choice of the quantum gates used to build random quantum circuits affects the simulation cost, leading to improved benchmarks for near-term quantum supremacy (available for download here), which are in some cases quadratically more expensive to simulate classically than the original proposal.

Sampling from random quantum circuits is an excellent calibration benchmark for quantum computers, which we call cross-entropy benchmarking. A successful quantum supremacy experiment with random circuits would demonstrate the basic building blocks for a large-scale fault-tolerant quantum computer. Furthermore, quantum physics has not yet been tested for highly complex quantum states such as this.
Space-time volume of a quantum circuit computation. The computational cost for quantum simulation increases with the volume of the quantum circuit, and in general grows exponentially with the number of qubits and the circuit depth. For asymmetric grids of qubits, the computational space-time volume grows slower with depth than for symmetric grids, and can result in circuits exponentially easier to simulate.
In “A blueprint for demonstrating quantum supremacy with superconducting qubits” (arXiv here), we illustrate a blueprint towards quantum supremacy and experimentally demonstrate a proof-of-principle version for the first time. In the paper, we discuss two key ingredients for quantum supremacy: exponential complexity and accurate computations. We start by running algorithms on subsections of the device ranging from 5 to 9 qubits. We find that the classical simulation cost grows exponentially with the number of qubits. These results are intended to provide a clear example of the exponential power of these devices. Next, we use cross-entropy benchmarking to compare our results against that of an ordinary computer and show that our computations are highly accurate. In fact, the error rate is low enough to achieve quantum supremacy with a larger quantum processor.

Beyond achieving quantum supremacy, a quantum platform should offer clear applications. In our paper, we apply our algorithms towards computational problems in quantum statistical-mechanics using complex multi-qubit gates (as opposed to the two-qubit gates designed for a digital quantum processor with surface code error correction). We show that our devices can be used to study fundamental properties of materials, e.g. microscopic differences between metals and insulators. By extending these results to next-generation devices with ~50 qubits, we hope to answer scientific questions that are beyond the capabilities of any other computing platform.
Photograph of two gmon superconducting qubits and their tunable coupler developed by Charles Neill and Pedram Roushan.
These two publications introduce a realistic proposal for near-term quantum supremacy, and demonstrate a proof-of-principle version for the first time. We will continue to decrease the error rates and increase the number of qubits in quantum processors to reach the quantum supremacy frontier, and to develop quantum algorithms for useful near-term applications.

Announcing Open Images V4 and the ECCV 2018 Open Images Challenge



In 2016, we introduced Open Images, a collaborative release of ~9 million images annotated with labels spanning thousands of object categories. Since its initial release, we've been hard at work updating and refining the dataset, in order to provide a useful resource for the computer vision community to develop new models

Today, we are happy to announce Open Images V4, containing 15.4M bounding-boxes for 600 categories on 1.9M images, making it the largest existing dataset with object location annotations. The boxes have been largely manually drawn by professional annotators to ensure accuracy and consistency. The images are very diverse and often contain complex scenes with several objects (8 per image on average; visualizer).
Annotated images from the Open Images dataset. Left: Mark Paul Gosselaar plays the guitar by Rhys A. Right: Civilization by Paul Downey. Both images used under CC BY 2.0 license.
In conjunction with this release, we are also introducing the Open Images Challenge, a new object detection challenge to be held at the 2018 European Conference on Computer Vision (ECCV 2018). The Open Images Challenge follows in the tradition of PASCAL VOC, ImageNet and COCO, but at an unprecedented scale.

This challenge is unique in several ways:
  • 12.2M bounding-box annotations for 500 categories on 1.7M training images,
  • A broader range of categories than previous detection challenges, including new objects such as “fedora” and “snowman”.
  • In addition to the object detection main track, the challenge includes a Visual Relationship Detection track, on detecting pairs of objects in particular relations, e.g. “woman playing guitar”.
The training set is available now. A test set of 100k images will be released on July 1st 2018 by Kaggle. Deadline for submission of results is on September 1st 2018. We hope that the very large training set will stimulate research into more sophisticated detection models that will exceed current state-of-the-art performance, and that the 500 categories will enable a more precise assessment of where different detectors perform best. Furthermore, having a large set of images with many objects annotated enables to explore Visual Relationship Detection, which is a hot emerging topic with a growing sub-community.

In addition to the above, Open Images V4 also contains 30.1M human-verified image-level labels for 19,794 categories, which are not part of the Challenge. The dataset includes 5.5M image-level labels generated by tens of thousands of users from all over the world at crowdsource.google.com.

Google at ICLR 2018



This week, Vancouver, Canada hosts the 6th International Conference on Learning Representations (ICLR 2018), a conference focused on how one can learn meaningful and useful representations of data for machine learning. ICLR includes conference and workshop tracks, with invited talks along with oral and poster presentations of some of the latest research on deep learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.

At the forefront of innovation in cutting-edge technology in neural networks and deep learning, Google focuses on both theory and application, developing learning approaches to understand and generalize. As Platinum Sponsor of ICLR 2018, Google will have a strong presence with over 130 researchers attending, contributing to and learning from the broader academic research community by presenting papers and posters, in addition to participating on organizing committees and in workshops.

If you are attending ICLR 2018, we hope you'll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about our research being presented at ICLR 2018 in the list below (Googlers highlighted in blue)

Senior Program Chairs include:
Tara Sainath

Steering Committee includes:
Hugo Larochelle

Oral Contributions
Wasserstein Auto-Encoders
Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, Bernhard Scholkopf

On the Convergence of Adam and Beyond (Best Paper Award)
Sashank J. Reddi, Satyen Kale, Sanjiv Kumar

Ask the Right Questions: Active Question Reformulation with Reinforcement Learning
Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Wojciech Gajewski, Andrea Gesmundo, Neil Houlsby, Wei Wang

Beyond Word Importance: Contextual Decompositions to Extract Interactions from LSTMs
W. James Murdoch, Peter J. Liu, Bin Yu

Conference Posters
Boosting the Actor with Dual Critic
Bo Dai, Albert Shaw, Niao He, Lihong Li, Le Song

MaskGAN: Better Text Generation via Filling in the _______
William Fedus, Ian Goodfellow, Andrew M. Dai

Scalable Private Learning with PATE
Nicolas Papernot, Shuang Song, Ilya Mironov, Ananth Raghunathan, Kunal Talwar, Ulfar Erlingsson

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger Grosse

Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models
Adam Roberts, Jesse Engel, Matt Hoffman

Multi-Mention Learning for Reading Comprehension with Neural Cascades
Swabha Swayamdipta, Ankur P. Parikh, Tom Kwiatkowski

QANet: Combining Local Convolution with Global Self-Attention for Reading Comprehension
Adams Wei Yu, David Dohan, Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le

Sensitivity and Generalization in Neural Networks: An Empirical Study
Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

Action-dependent Control Variates for Policy Optimization via Stein Identity
Hao Liu, Yihao Feng, Yi Mao, Dengyong Zhou, Jian Peng, Qiang Liu

An Efficient Framework for Learning Sentence Representations
Lajanugen Logeswaran, Honglak Lee

Fidelity-Weighted Learning
Mostafa Dehghani, Arash Mehrjou, Stephan Gouws, Jaap Kamps, Bernhard Schölkopf

Generating Wikipedia by Summarizing Long Sequences
Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam Shazeer

Matrix Capsules with EM Routing
Geoffrey Hinton, Sara Sabour, Nicholas Frosst

Temporal Difference Models: Model-Free Deep RL for Model-Based Control
Sergey Levine, Shixiang Gu, Murtaza Dalal, Vitchyr Pong

Deep Neural Networks as Gaussian Processes
Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel L. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

Many Paths to Equilibrium: GANs Do Not Need to Decrease a Divergence at Every Step
William Fedus, Mihaela Rosca, Balaji Lakshminarayanan, Andrew M. Dai, Shakir Mohamed, Ian Goodfellow

Initialization Matters: Orthogonal Predictive State Recurrent Neural Networks
Krzysztof Choromanski, Carlton Downey, Byron Boots

Learning Differentially Private Recurrent Language Models
H. Brendan McMahan, Daniel Ramage, Kunal Talwar, Li Zhang

Learning Latent Permutations with Gumbel-Sinkhorn Networks
Gonzalo Mena, David Belanger, Scott Linderman, Jasper Snoek

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning
Benjamin Eysenbach, Shixiang Gu, Julian IbarzSergey Levine

Meta-Learning for Semi-Supervised Few-Shot Classification
Mengye Ren, Eleni Triantafillou, Sachin Ravi, Jake Snell, Kevin Swersky, Josh Tenenbaum, Hugo Larochelle, Richard Zemel

Thermometer Encoding: One Hot Way to Resist Adversarial Examples
Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

A Hierarchical Model for Device Placement
Azalia Mirhoseini, Anna Goldie, Hieu Pham, Benoit Steiner, Quoc V. LeJeff Dean

Monotonic Chunkwise Attention
Chung-Cheng Chiu, Colin Raffel

Training Confidence-calibrated Classifiers for Detecting Out-of-Distribution Samples
Kimin Lee, Honglak Lee, Kibok Lee, Jinwoo Shin

Trust-PCL: An Off-Policy Trust Region Method for Continuous Control
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

Ensemble Adversarial Training: Attacks and Defenses
Florian Tramèr, Alexey Kurakin, Nicolas Papernot, Ian Goodfellow, Dan Boneh, Patrick McDaniel

Stochastic Variational Video Prediction
Mohammad Babaeizadeh, Chelsea Finn, Dumitru Erhan, Roy Campbell, Sergey Levine

Depthwise Separable Convolutions for Neural Machine Translation
Lukasz Kaiser, Aidan N. Gomez, Francois Chollet

Don’t Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le

Generative Models of Visually Grounded Imagination
Ramakrishna Vedantam, Ian Fischer, Jonathan Huang, Kevin Murphy

Large Scale Distributed Neural Network Training through Online Distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton

Learning a Neural Response Metric for Retinal Prosthesis
Nishal P. Shah, Sasidhar Madugula, Alan Litke, Alexander Sher, EJ Chichilnisky, Yoram Singer, Jonathon Shlens

Neumann Optimizer: A Practical Optimization Algorithm for Deep Neural Networks
Shankar Krishnan, Ying Xiao, Rif A. Saurous

A Neural Representation of Sketch Drawings
David HaDouglas Eck

Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling
Carlos Riquelme, George Tucker, Jasper Snoek

Generalizing Hamiltonian Monte Carlo with Neural Networks
Daniel Levy, Matthew D. HoffmanJascha Sohl-Dickstein

Leveraging Grammar and Reinforcement Learning for Neural Program Synthesis
Rudy Bunel, Matthew Hausknecht, Jacob Devlin, Rishabh Singh, Pushmeet Kohli

On the Discrimination-Generalization Tradeoff in GANs
Pengchuan Zhang, Qiang Liu, Dengyong Zhou, Tao Xu, Xiaodong He

A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le

Learning how to Explain Neural Networks: PatternNet and PatternAttribution
Pieter-Jan Kindermans, Kristof T. Schütt, Maximilian Alber, Klaus-Robert Müller, Dumitru Erhan, Been Kim, Sven Dähne

Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Víctor Campos, Brendan Jou, Xavier Giró-i-Nieto, Jordi Torres, Shih-Fu Chang

Towards Neural Phrase-based Machine Translation
Po-Sen Huang, Chong Wang, Sitao Huang, Dengyong Zhou, Li Deng

Unsupervised Cipher Cracking Using Discrete GANs
Aidan N. Gomez, Sicong Huang, Ivan Zhang, Bryan M. Li, Muhammad Osama, Lukasz Kaiser

Variational Image Compression With A Scale Hyperprior
Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, Nick Johnston

Workshop Posters
Local Explanation Methods for Deep Neural Networks Lack Sensitivity to Parameter Values
Julius Adebayo, Justin Gilmer, Ian Goodfellow, Been Kim

Stoachastic Gradient Langevin Dynamics that Exploit Neural Network Structure
Zachary Nado, Jasper Snoek, Bowen Xu, Roger Grosse, David Duvenaud, James Martens

Towards Mixed-initiative generation of multi-channel sequential structure
Anna Huang, Sherol Chen, Mark J. Nelson, Douglas Eck

Can Deep Reinforcement Learning Solve Erdos-Selfridge-Spencer Games?
Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg

GILBO: One Metric to Measure Them All
Alexander Alemi, Ian Fischer

HoME: a Household Multimodal Environment
Simon Brodeur, Ethan Perez, Ankesh Anand, Florian Golemo, Luca Celotti, Florian Strub, Jean Rouat, Hugo Larochelle, Aaron Courville

Learning to Learn without Labels
Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

Learning via Social Awareness: Improving Sketch Representations with Facial Feedback
Natasha Jaques, Jesse Engel, David Ha, Fred Bertsch, Rosalind Picard, Douglas Eck

Negative Eigenvalues of the Hessian in Deep Neural Networks
Guillaume Alain, Nicolas Le Roux, Pierre-Antoine Manzagol

Realistic Evaluation of Semi-Supervised Learning Algorithms
Avital Oliver, Augustus Odena, Colin Raffel, Ekin Cubuk, lan Goodfellow

Winner's Curse? On Pace, Progress, and Empirical Rigor
D. Sculley, Jasper Snoek, Alex Wiltschko, Ali Rahimi

Meta-Learning for Batch Mode Active Learning
Sachin Ravi, Hugo Larochelle

To Prune, or Not to Prune: Exploring the Efficacy of Pruning for Model Compression
Michael Zhu, Suyog Gupta

Adversarial Spheres
Justin Gilmer, Luke Metz, Fartash Faghri, Sam Schoenholz, Maithra Raghu,,Martin Wattenberg, Ian Goodfellow

Clustering Meets Implicit Generative Models
Francesco Locatello, Damien Vincent, Ilya Tolstikhin, Gunnar Ratsch, Sylvain Gelly, Bernhard Scholkopf

Decoding Decoders: Finding Optimal Representation Spaces for Unsupervised Similarity Tasks
Vitalii Zhelezniak, Dan Busbridge, April Shen, Samuel L. Smith, Nils Y. Hammerla

Learning Longer-term Dependencies in RNNs with Auxiliary Losses
Trieu Trinh, Quoc Le, Andrew Dai, Thang Luong

Graph Partition Neural Networks for Semi-Supervised Classification
Alexander Gaunt, Danny Tarlow, Marc Brockschmidt, Raquel Urtasun, Renjie Liao, Richard Zemel

Searching for Activation Functions
Prajit Ramachandran, Barret Zoph, Quoc Le

Time-Dependent Representation for Neural Event Sequence Prediction
Yang Li, Nan Du, Samy Bengio

Faster Discovery of Neural Architectures by Searching for Paths in a Large Model
Hieu Pham, Melody Guan, Barret Zoph, Quoc V. Le, Jeff Dean

Intriguing Properties of Adversarial Examples
Ekin Dogus Cubuk, Barret Zoph, Sam Schoenholz, Quoc Le

PPP-Net: Platform-aware Progressive Search for Pareto-optimal Neural Architectures
Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, Min Sun

The Mirage of Action-Dependent Baselines in Reinforcement Learning
George Tucker, Surya Bhupatiraju, Shixiang Gu, Richard E. Turner, Zoubin Ghahramani, Sergey Levine

Learning to Organize Knowledge with N-Gram Machines
Fan Yang, Jiazhong Nie, William W. Cohen, Ni Lao

Online variance-reducing optimization
Nicolas Le Roux, Reza Babanezhad, Pierre-Antoine Manzagol