
The Google Brain Team — Looking Back on 2017 (Part 2 of 2)



The Google Brain team works to advance the state of the art in artificial intelligence by research and systems engineering, as one part of the overall Google AI effort. In Part 1 of this blog post, we shared some of our work in 2017 related to our broader research, from designing new machine learning algorithms and techniques to understanding them, as well as sharing data, software, and hardware with the community. In this post, we’ll dive into the research we do in some specific domains such as healthcare, robotics, creativity, fairness and inclusion, as well as share a little more about us.

Healthcare
We feel there is enormous potential for the application of machine learning techniques to healthcare. We are doing work across many different kinds of problems, including assisting pathologists in detecting cancer, understanding medical conversations to assist doctors and patients, and using machine learning to tackle a wide variety of problems in genomics, including an open-source release of a highly accurate variant calling system based on deep learning.
A lymph node biopsy, where our algorithm correctly identifies the tumor and not the benign macrophage.
We have continued our work on early detection of diabetic retinopathy (DR) and macular edema, building on the research paper we published in December 2016 in the Journal of the American Medical Association (JAMA). In 2017, we moved this project from a research project to actual clinical impact. We partnered with Verily (a life sciences company within Alphabet) to guide this work through the regulatory process, and together we are incorporating this technology into Nikon's line of Optos ophthalmology cameras. In addition, we are working to deploy this system in India, where there is a shortage of 127,000 eye doctors and, as a result, almost half of patients are diagnosed too late, after the disease has already caused vision loss. As a part of a pilot, we’ve launched this system to help graders at Aravind Eye Hospitals to better diagnose diabetic eye disease. We are also working with our partners to understand the human factors affecting diabetic eye care, from ethnographic studies of patients and healthcare providers, to investigations on how eye care clinicians interact with the AI-enabled system.
First patient screened (top) and Iniya Paramasivam, a trained grader, viewing the output of the system (bottom).
We have also teamed up with researchers at leading healthcare organizations and medical centers including Stanford, UCSF, and University of Chicago to demonstrate the effectiveness of using machine learning to predict medical outcomes from de-identified medical records (i.e. given the current state of a patient, we believe we can predict the future for a patient by learning from millions of other patients’ journeys, as a way of helping healthcare professionals make better decisions). We’re very excited about this avenue of work and we look forward to telling you more about it in 2018.

Robotics
Our long-term goal in robotics is to design learning algorithms to allow robots to operate in messy, real-world environments and to quickly acquire new skills and capabilities via learning, rather than the carefully-controlled conditions and the small set of hand-programmed tasks that characterize today’s robots. One thrust of our research is on developing techniques for physical robots to use their own experience and those of other robots to build new skills and capabilities, pooling the shared experiences in order to learn collectively. We are also exploring ways in which we can combine computer-based simulations of robotic tasks with physical robotic experience to learn new tasks more rapidly. While the physics of the simulator don’t entirely match up with the real world, we have observed that for robotics, simulated experience plus a small amount of real-world experience gives significantly better results than even large amounts of real-world experience on its own.

In addition to real-world robotic experience and simulated robotic environments, we have developed robotic learning algorithms that can learn by observing human demonstrations of desired behaviors, and believe that this imitation learning approach is a highly promising way of imparting new abilities to robots very quickly, without explicit programming or even explicit specification of the goal of an activity. For example, below is a video of a robot learning to pour from a cup in just 15 minutes of real world experience by observing humans performing this task from different viewpoints and then trying to imitate the behavior. As we might be with our own three-year-old child, we’re encouraged that it only spills a little!

We also co-organized and hosted the first occurrence of the new Conference on Robot Learning (CoRL) in November to bring together researchers working at the intersection of machine learning and robotics. The summary of the event contains more information, and we look forward to next year’s occurrence of the conference in Zürich.

Basic Science
We are also excited about the long-term potential of using machine learning to help solve important problems in science. Last year, we used neural networks to predict molecular properties in quantum chemistry, find new exoplanets in astronomical datasets, and predict earthquake aftershocks, and we used deep learning to guide automated proof systems.
A Message Passing Neural Network predicts quantum properties of an organic molecule
Finding a new exoplanet: observing brightness of stars when planets block their light. 
Creativity
We’re very interested in how to leverage machine learning as a tool to assist people in creative endeavors. This year, we created an AI piano duet tool, helped YouTube musician Andrew Huang create new music (see also the behind the scenes video with Nat & Friends), and showed how to teach machines to draw.
A garden drawn by the SketchRNN model; an interactive demo is available.
We also demonstrated how to control deep generative models running in the browser to create new music. This work won the NIPS 2017 Best Demo Award, making this the second year in a row that members of the Brain team’s Magenta project have won this award, following on our receipt of the NIPS 2016 Best Demo Award for Interactive musical improvisation with Magenta. In the YouTube video below, you can listen to one part of the demo, the MusicVAE variational autoencoder model morphing smoothly from one melody to another.
People + AI Research (PAIR) Initiative
Advances in machine learning offer entirely new possibilities for how people might interact with computers. At the same time, it’s critical to make sure that society can broadly benefit from the technology we’re building. We see these opportunities and challenges as an urgent matter, and teamed up with a number of people throughout Google to create the People + AI Research (PAIR) initiative.

PAIR’s goal is to study and design the most effective ways for people to interact with AI systems. We kicked off the initiative with a public symposium bringing together academics and practitioners across disciplines including computer science, design, and even art. PAIR works on a wide range of topics, some of which we’ve already mentioned: helping researchers understand ML systems through work on interpretability and expanding the community of developers with deeplearn.js. Another example of our human-centered approach to ML engineering is the launch of Facets, a tool for visualizing and understanding training datasets.
Facets provides insights into your training datasets.
Fairness and Inclusion in Machine Learning
As ML plays an increasing role in technology, considerations of inclusivity and fairness grow in importance. The Brain team and PAIR have been working hard to make progress in these areas. We’ve published on how to avoid discrimination in ML systems via causal reasoning, the importance of geodiversity in open datasets, and posted an analysis of an open dataset to understand diversity and cultural differences. We’ve also been working closely with the Partnership on AI, a cross-industry initiative, to help make sure that fairness and inclusion are promoted as goals for all ML practitioners.

Cultural differences can surface in training data even in objects as “universal” as chairs, as observed in these doodle patterns on the left. The chart on the right shows how we uncovered geo-location biases in standard open source data sets such as ImageNet. Undetected or uncorrected, such biases may strongly influence model behavior.
We made this video in collaboration with our colleagues at Google Creative Lab as a non-technical introduction to some of the issues in this area.
Our Culture
One aspect of our group’s research culture is to empower researchers and engineers to tackle the basic research problems that they view as most important. In September, we posted about our general approach to conducting research. Educating and mentoring young researchers is something we do through our research efforts. Our group hosted over 100 interns last year, and roughly 25% of our research publications in 2017 have intern co-authors. In 2016, we started the Google Brain Residency, a program for mentoring people who wanted to learn to do machine learning research. In the inaugural year (June 2016 to May 2017), 27 residents joined our group, and we posted updates about the first year of the program halfway through and just after the end, highlighting the research accomplishments of the residents. Many of the residents in the first year of the program have stayed on in our group as full-time researchers and research engineers, and most of those who did not have gone on to Ph.D. studies at top machine learning graduate programs like Berkeley, CMU, Stanford, NYU and Toronto. In July, 2017, we also welcomed our second cohort of 35 residents, who will be with us until July, 2018, and they’ve already done some exciting research and published at numerous research venues. We’ve now broadened the program to include many other research groups across Google and renamed it the Google AI Residency program (the application deadline for this year's program has just passed; look for information about next year's program at g.co/airesidency/apply).

Our work in 2017 spanned more than we’ve highlighted in this two-part blog post. We believe in publishing our work in top research venues, and last year our group published 140 papers, including more than 60 at ICLR, ICML, and NIPS. To learn more about our work, you can peruse our research papers.

You can also meet some of our team members in this video, or read our responses to our second Ask Me Anything (AMA) post on r/MachineLearning (and check out 2016’s AMA, too).

The Google Brain team is becoming more spread out, with team members across North America and Europe. If the work we’re doing sounds interesting and you’d like to join us, you can see our open positions and apply for internships, the AI Residency program, visiting faculty, or full-time research or engineering roles using the links at the bottom of g.co/brain. You can also follow our work throughout 2018 here on the Google Research blog, or on Twitter at @GoogleResearch. You can also follow my personal account at @JeffDean.

Thanks for reading!

The Google Brain Team — Looking Back on 2017 (Part 1 of 2)



The Google Brain team works to advance the state of the art in artificial intelligence by research and systems engineering, as one part of the overall Google AI effort. Last year we shared a summary of our work in 2016. Since then, we’ve continued to make progress on our long-term research agenda of making machines intelligent, and have collaborated with a number of teams across Google and Alphabet to use the results of our research to improve people’s lives. This first of two posts will highlight some of our work in 2017, including some of our basic research work, as well as updates on open source software, datasets, and new hardware for machine learning. In the second post we’ll dive into the research we do in specific domains where machine learning can have a large impact, such as healthcare, robotics, and some areas of basic science, as well as cover our work on creativity, fairness and inclusion and tell you a bit more about who we are.

Core Research
A significant focus of our team is pursuing research that advances our understanding and improves our ability to solve new problems in the field of machine learning. Below are several themes from our research last year.

AutoML
The goal of automating machine learning is to develop techniques for computers to solve new machine learning problems automatically, without the need for human machine learning experts to intervene on every new problem. If we’re ever going to have truly intelligent systems, this is a fundamental capability that we will need. We developed new approaches for designing neural network architectures using both reinforcement learning and evolutionary algorithms, scaled this work to state-of-the-art results on ImageNet classification and detection, and also showed how to learn new optimization algorithms and effective activation functions automatically. We are actively working with our Cloud AI team to bring this technology into the hands of Google customers, as well as continuing to push the research in many directions.
Convolutional architecture discovered by Neural Architecture Search
Object detection with a network discovered by AutoML
Speech Understanding and Generation
Another theme is developing new techniques that improve the ability of our computing systems to understand and generate human speech, including our collaboration with the speech team at Google to develop a number of improvements for an end-to-end approach to speech recognition, which reduces the relative word error rate over Google’s production speech recognition system by 16%. One nice aspect of this work is that it required many separate threads of research to come together (which you can find on arXiv: 1, 2, 3, 4, 5, 6, 7, 8, 9).
Components of the Listen-Attend-Spell end-to-end model for speech recognition
We also collaborated with our research colleagues on Google’s Machine Perception team to develop a new approach for performing text-to-speech generation (Tacotron 2) that dramatically improves the quality of the generated speech. This model achieves a mean opinion score (MOS) of 4.53 compared to a MOS of 4.58 for professionally recorded speech like you might find in an audiobook, and 4.34 for the previous best computer-generated speech system. You can listen for yourself.
Tacotron 2’s model architecture
New Machine Learning Algorithms and Approaches
We continued to develop novel machine learning algorithms and approaches, including work on capsules (which explicitly look for agreement in activated features as a way of evaluating many different noisy hypotheses when performing visual tasks), sparsely-gated mixtures of experts (which enable very large models that are still computationally efficient), hypernetworks (which use the weights of one model to generate weights for another model), new kinds of multi-modal models (which perform multi-task learning across audio, visual, and textual inputs in the same model), attention-based mechanisms (as an alternative to convolutional and recurrent models), symbolic and non-symbolic learned optimization methods, a technique to back-propagate through discrete variables, and a few new reinforcement learning algorithmic improvements.
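To make the sparsely-gated mixture-of-experts idea a little more concrete, here is a minimal NumPy sketch of top-k gating, in which each example is routed to only k experts so that most expert weights are never touched. It is an illustration under stated assumptions rather than the published method: the function names, the linear experts, the toy shapes, and the choice k=2 are all hypothetical, and details such as noise in the gate or load balancing are left out.

```python
import numpy as np


def top_k_gating(gate_logits: np.ndarray, k: int) -> np.ndarray:
    """Softmax over the k largest logits per row; every other gate is exactly 0."""
    top_idx = np.argsort(gate_logits, axis=-1)[:, -k:]
    # Start from -inf so that non-selected experts get zero weight after exp().
    masked = np.full_like(gate_logits, -np.inf)
    top_vals = np.take_along_axis(gate_logits, top_idx, axis=-1)
    np.put_along_axis(masked, top_idx, top_vals, axis=-1)
    masked -= masked.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


def mixture_of_experts(x, gate_w, expert_ws, k=2):
    """Combine the outputs of the k selected (hypothetical, linear) experts."""
    gates = top_k_gating(x @ gate_w, k)  # shape: (batch, n_experts)
    out = np.zeros((x.shape[0], expert_ws[0].shape[1]))
    for e, w in enumerate(expert_ws):
        rows = np.nonzero(gates[:, e])[0]  # examples routed to expert e
        if rows.size:
            out[rows] += gates[rows, e:e + 1] * (x[rows] @ w)
    return out, gates


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))                       # 4 examples, 8 features
gate_w = rng.standard_normal((8, 5))                  # gate over 5 experts
expert_ws = [rng.standard_normal((8, 3)) for _ in range(5)]
out, gates = mixture_of_experts(x, gate_w, expert_ws, k=2)
```

Because only k experts run per example, the compute per example stays roughly constant as the number of experts (and therefore the parameter count) grows.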

Machine Learning for Computer Systems
The use of machine learning to replace traditional heuristics in computer systems also greatly interests us. We have shown how to use reinforcement learning to make placement decisions for mapping computational graphs onto a set of computational devices that are better than those made by human experts. With other colleagues in Google Research, we have shown in “The Case for Learned Index Structures” that neural networks can be both faster and much smaller than traditional data structures such as B-trees, hash tables, and Bloom filters. We believe that we are just scratching the surface in terms of the use of machine learning in core computer systems, as outlined in a NIPS workshop talk on Machine Learning for Systems and Systems for Machine Learning.
Learned Models as Index Structures
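The intuition behind a learned index can be sketched in a few lines: a model predicts where a key sits in a sorted array, and a short local search corrects the prediction. The sketch below is a deliberately simplified stand-in, assuming unique sorted keys and a single linear model fitted with NumPy where the paper uses learned models such as neural networks; the class name `LinearLearnedIndex` and its methods are hypothetical.

```python
import bisect

import numpy as np


class LinearLearnedIndex:
    """Toy learned index: a linear fit predicts a key's position in a sorted array."""

    def __init__(self, sorted_keys):
        self.keys = np.asarray(sorted_keys, dtype=np.float64)
        positions = np.arange(len(self.keys))
        # Fit position ~ slope * key + intercept.
        self.slope, self.intercept = np.polyfit(self.keys, positions, deg=1)
        predicted = self.slope * self.keys + self.intercept
        # The worst prediction error on the stored keys bounds the search window.
        self.max_err = int(np.ceil(np.max(np.abs(predicted - positions))))

    def lookup(self, key):
        """Return the position of key, or None if it is not stored."""
        guess = int(round(float(self.slope * key + self.intercept)))
        lo = max(0, guess - self.max_err - 1)
        hi = min(len(self.keys), guess + self.max_err + 2)
        # Binary search only inside the small window around the prediction.
        i = lo + bisect.bisect_left(self.keys[lo:hi].tolist(), key)
        if i < len(self.keys) and self.keys[i] == key:
            return i
        return None


keys = [10 * i + (i % 7) for i in range(1000)]  # strictly increasing toy keys
index = LinearLearnedIndex(keys)
```

The only state kept besides the data is two floats and an error bound, which is where the potential size advantage over a tree-structured index comes from. How well this works depends entirely on how closely the model fits the key distribution.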
Privacy and Security
Machine learning and its interactions with security and privacy continue to be major research foci for us. We showed that machine learning techniques can be applied in a way that provides differential privacy guarantees, in a paper that received one of the best paper awards at ICLR 2017. We also continued our investigation into the properties of adversarial examples, including demonstrating adversarial examples in the physical world, and how to harness adversarial examples at scale during the training process to make models more robust to adversarial examples.
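As a small illustration of what an adversarial example is, the sketch below perturbs an input to a logistic-regression classifier in the direction that increases its loss. It uses the fast gradient sign method, a standard technique that is named here as an assumption and is not necessarily the one used in the work described above; the weights, input, and epsilon are made-up toy values.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def logistic_loss(w, b, x, y):
    """Cross-entropy loss of a logistic-regression model on one example."""
    p = sigmoid(x @ w + b)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))


def fgsm_perturb(w, b, x, y, epsilon):
    """Move x by epsilon (per coordinate) in the direction that increases the loss."""
    # For logistic regression, d(loss)/dx = (p - y) * w.
    grad_x = (sigmoid(x @ w + b) - y) * w
    return x + epsilon * np.sign(grad_x)


# Hypothetical toy model and input, for illustration only.
w = np.array([1.5, -2.0, 0.5])
b = 0.1
x = np.array([0.2, -0.4, 1.0])
y = 1.0

x_adv = fgsm_perturb(w, b, x, y, epsilon=0.25)
loss_clean = logistic_loss(w, b, x, y)
loss_adv = logistic_loss(w, b, x_adv, y)
```

Adversarial training, in this simplified picture, amounts to generating such perturbed inputs during training and including them in the loss alongside the clean ones.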

Understanding Machine Learning Systems
While we have seen impressive results with deep learning, it is important to understand why it works, and when it won’t. In another paper that received one of the best paper awards at ICLR 2017, we showed that current machine learning theoretical frameworks fail to explain the impressive results of deep learning approaches. We also showed that the “flatness” of minima found by optimization methods is not as closely linked to good generalization as initially thought. In order to better understand how training proceeds in deep architectures, we published a series of papers analyzing random matrices, as they are the starting point of most training approaches. Another important avenue to understanding deep learning is to better measure the performance of deep learning models. We showed the importance of good experimental design and statistical rigor in a recent study comparing many GAN approaches, which found that many popular enhancements to generative models do not actually improve performance. We hope this study will serve as an example for other researchers to follow in conducting robust experimental studies.

We are developing methods that allow better interpretability of machine learning systems. And in March, in collaboration with OpenAI, DeepMind, YC Research and others, we announced the launch of Distill, a new online open science journal dedicated to supporting human understanding of machine learning. It has gained a reputation for clear exposition of machine learning concepts and for excellent interactive visualization tools in its articles. In its first year, Distill has published many illuminating articles aimed at understanding the inner working of various machine learning techniques, and we look forward to the many more sure to come in 2018.
Feature Visualization
How to Use t-SNE effectively
Open Datasets for Machine Learning Research
Open datasets like MNIST, CIFAR-10, ImageNet, SVHN, and WMT have pushed the field of machine learning forward tremendously. Our team and Google Research as a whole have been active in open-sourcing interesting new datasets for open machine learning research over the past year or so, by providing access to more large labeled datasets including:
Examples from the YouTube-Bounding Boxes dataset: Video segments sampled at 1 frame per second, with bounding boxes successfully identified around the items of interest.
TensorFlow and Open Source Software
A map showing the broad distribution of TensorFlow users (source)
Throughout our team’s history, we have built tools that help us to conduct machine learning research and deploy machine learning systems in Google’s many products. In November 2015, we open-sourced our second-generation machine learning framework, TensorFlow, with the hope of allowing the machine learning community as a whole to benefit from our investment in machine learning software tools. In February, we released TensorFlow 1.0, and in November, we released v1.4 with these significant additions: Eager execution for interactive imperative-style programming, XLA, an optimizing compiler for TensorFlow programs, and TensorFlow Lite, a lightweight solution for mobile and embedded devices. The pre-compiled TensorFlow binaries have now been downloaded more than 10 million times in over 180 countries, and the source code on GitHub now has more than 1,200 contributors.

In February, we hosted the first ever TensorFlow Developer Summit, with over 450 people attending live in Mountain View and more than 6,500 watching on live streams around the world, including at more than 85 local viewing events in 35 countries. All talks were recorded, with topics ranging from new features and techniques for using TensorFlow to detailed looks under the hood at low-level TensorFlow abstractions. We’ll be hosting another TensorFlow Developer Summit on March 30, 2018 in the Bay Area. Sign up now to save the date and stay updated on the latest news.
This rock-paper-scissors science experiment is a novel use of TensorFlow. We’ve been excited by the wide variety of uses of TensorFlow we saw in 2017, including automating cucumber sorting, finding sea cows in aerial imagery, sorting diced potatoes to make safer baby food, identifying skin cancer, helping to interpret bird call recordings in a New Zealand bird sanctuary, and identifying diseased plants in the most popular root crop on Earth in Tanzania!
In November, TensorFlow celebrated its second anniversary as an open-source project. It has been incredibly rewarding to see a vibrant community of TensorFlow developers and users emerge. TensorFlow is the #1 machine learning platform on GitHub and one of the top five repositories on GitHub overall, used by many companies and organizations, big and small, with more than 24,500 distinct repositories on GitHub related to TensorFlow. Many research papers are now published with open-source TensorFlow implementations to accompany the research results, enabling the community to more easily understand the exact methods used and to reproduce or extend the work.

TensorFlow has also benefited from other Google Research teams open-sourcing related work, including TF-GAN, a lightweight library for generative adversarial models in TensorFlow, TensorFlow Lattice, a set of estimators for working with lattice models, as well as the TensorFlow Object Detection API. The TensorFlow model repository continues to grow with an ever-widening set of models.

In addition to TensorFlow, we released deeplearn.js, an open-source hardware-accelerated implementation of deep learning APIs right in the browser (with no need to download or install anything). The deeplearn.js homepage has a number of great examples, including Teachable Machine, a computer vision model you train using your webcam, and Performance RNN, a real-time neural-network based piano composition and performance demonstration. We’ll be working in 2018 to make it possible to deploy TensorFlow models directly into the deeplearn.js environment.

TPUs
Cloud TPUs deliver up to 180 teraflops of machine learning acceleration
About five years ago, we recognized that deep learning would dramatically change the kinds of hardware we would need. Deep learning computations are very computationally intensive, but they have two special properties: they are largely composed of dense linear algebra operations (matrix multiplies, vector operations, etc.), and they are very tolerant of reduced precision. We realized that we could take advantage of these two properties to build specialized hardware that can run neural network computations very efficiently. We provided design input to Google’s Platforms team and they designed and produced our first generation Tensor Processing Unit (TPU): a single-chip ASIC designed to accelerate inference for deep learning models (inference is the use of an already-trained neural network, and is distinct from training). This first-generation TPU has been deployed in our data centers for three years, and it has been used to power deep learning models on every Google Search query, for Google Translate, for understanding images in Google Photos, for the AlphaGo matches against Lee Sedol and Ke Jie, and for many other research and product uses. In June, we published a paper at ISCA 2017, showing that this first-generation TPU was 15X - 30X faster than its contemporary GPU or CPU counterparts, with performance/Watt about 30X - 80X better.
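The tolerance for reduced precision is easy to see on a single dense matrix multiply. The NumPy sketch below rounds the inputs to 16-bit floats and compares the product against a 32-bit reference. The use of float16 inputs with float32 accumulation is an assumption chosen for illustration, not a description of the number formats TPUs use, and the matrix sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal((64, 128)).astype(np.float32)
b = rng.standard_normal((128, 32)).astype(np.float32)

# Reference product in 32-bit floating point.
reference = a @ b

# Round the inputs to 16 bits, then multiply and accumulate in 32 bits.
a_low = a.astype(np.float16).astype(np.float32)
b_low = b.astype(np.float16).astype(np.float32)
approx = a_low @ b_low

# Relative error of the whole result matrix (Frobenius norm).
rel_err = np.linalg.norm(approx - reference) / np.linalg.norm(reference)
print(f"relative error from 16-bit inputs: {rel_err:.2e}")
```

The error introduced by halving the input precision is small compared with the noise that stochastic training already tolerates, which is the property specialized hardware can exploit.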
Cloud TPU Pods deliver up to 11.5 petaflops of machine learning acceleration
Experiments with ResNet-50 training on ImageNet show near-perfect speed-up as the number of TPU devices used increases.
Inference is important, but accelerating the training process is an even more important problem - and also much harder. The faster researchers can try a new idea, the more breakthroughs we can make. Our second-generation TPU, announced at Google I/O in May, is a whole system (custom ASIC chips, board and interconnect) that is designed to accelerate both training and inference, and we showed a single device configuration as well as a multi-rack deep learning supercomputer configuration called a TPU Pod. We announced that these second generation devices will be offered on the Google Cloud Platform as Cloud TPUs. We also unveiled the TensorFlow Research Cloud (TFRC), a program to provide top ML researchers who are committed to sharing their work with the world with free access to a cluster of 1,000 Cloud TPUs. In December, we presented work showing that we can train a ResNet-50 ImageNet model to a high level of accuracy in 22 minutes on a TPU Pod as compared to days or longer on a typical workstation. We think lowering research turnaround times in this fashion will dramatically increase the productivity of machine learning teams here at Google and at all of the organizations that use Cloud TPUs. If you’re interested in Cloud TPUs, TPU Pods, or the TensorFlow Research Cloud, you can sign up to learn more at g.co/tpusignup. We’re excited to enable many more engineers and researchers to use TPUs in 2018!

Thanks for reading!

(In part 2 we’ll discuss our research in the application of machine learning to domains like healthcare, robotics, different fields of science, and creativity, as well as cover our work on fairness and inclusion.)

Tacotron 2: Generating Human-like Speech from Text



Generating very natural sounding speech from text (text-to-speech, TTS) has been a research goal for decades. There has been great progress in TTS research over the last few years and many individual pieces of a complete TTS system have greatly improved. Incorporating ideas from past work such as Tacotron and WaveNet, we added more improvements to end up with our new system, Tacotron 2. Our approach does not use complex linguistic and acoustic features as input. Instead, we generate human-like speech from text using neural networks trained using only speech examples and corresponding text transcripts.

A full description of our new system can be found in our paper “Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions.” In a nutshell it works like this: We use a sequence-to-sequence model optimized for TTS to map a sequence of letters to a sequence of features that encode the audio. These features, an 80-dimensional audio spectrogram with frames computed every 12.5 milliseconds, capture not only pronunciation of words, but also various subtleties of human speech, including volume, speed and intonation. Finally these features are converted to a 24 kHz waveform using a WaveNet-like architecture.
A detailed look at Tacotron 2's model architecture. The lower half of the image describes the sequence-to-sequence model that maps a sequence of letters to a spectrogram. For technical details, please refer to the paper.
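The numbers quoted above fix the sizes of the intermediate representation, which the short calculation below works through. The constants come from the text; the helper names are hypothetical, and the sketch ignores windowing and padding details.

```python
SAMPLE_RATE_HZ = 24_000   # output waveform sample rate
FRAME_SHIFT_MS = 12.5     # one spectrogram frame every 12.5 ms
N_CHANNELS = 80           # dimensionality of each spectrogram frame

# Audio samples the vocoder must generate for each spectrogram frame.
samples_per_frame = int(SAMPLE_RATE_HZ * FRAME_SHIFT_MS / 1000)

# Spectrogram frames per second of speech.
frames_per_second = 1000 / FRAME_SHIFT_MS


def spectrogram_shape(duration_s: float) -> tuple:
    """(frames, channels) predicted for an utterance of the given length."""
    return int(duration_s * frames_per_second), N_CHANNELS


def waveform_samples(duration_s: float) -> int:
    """Number of audio samples in an utterance of the given length."""
    return int(duration_s * SAMPLE_RATE_HZ)


print(samples_per_frame)         # 300 samples per frame
print(spectrogram_shape(3.0))    # (240, 80) for a 3-second utterance
print(waveform_samples(3.0))     # 72000 samples
```

So the sequence-to-sequence model only has to emit 80 frames per second, and the WaveNet-like stage expands each frame into 300 waveform samples.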
You can listen to some of the Tacotron 2 audio samples that demonstrate the results of our state-of-the-art TTS system. In an evaluation where we asked human listeners to rate the naturalness of the generated speech, we obtained a score that was comparable to that of professional recordings.

While our samples sound great, there are still some difficult problems to be tackled. For example, our system has difficulties pronouncing complex words (such as “decorum” and “merlot”), and in extreme cases it can even randomly generate strange noises. Also, our system cannot yet generate audio in realtime. Furthermore, we cannot yet control the generated speech, such as directing it to sound happy or sad. Each of these is an interesting research problem on its own.

Acknowledgements
Jonathan Shen, Ruoming Pang, Ron J. Weiss, Mike Schuster, Navdeep Jaitly, Zongheng Yang, Zhifeng Chen, Yu Zhang, Yuxuan Wang, RJ Skerry-Ryan, Rif A. Saurous, Yannis Agiomyrgiannakis, Yonghui Wu, Sound Understanding team, TTS Research team, and TensorFlow team.

Improving End-to-End Models For Speech Recognition



Traditional automatic speech recognition (ASR) systems, used for a variety of voice search applications at Google, comprise an acoustic model (AM), a pronunciation model (PM) and a language model (LM), all of which are independently trained, and often manually designed, on different datasets [1]. AMs take acoustic features and predict a set of subword units, typically context-dependent or context-independent phonemes. Next, a hand-designed lexicon (the PM) maps a sequence of phonemes produced by the acoustic model to words. Finally, the LM assigns probabilities to word sequences. Training independent components creates added complexities and is suboptimal compared to training all components jointly. Over the last several years, there has been growing interest in developing end-to-end systems, which attempt to learn these separate components jointly as a single system. While these end-to-end models have shown promising results in the literature [2, 3], it is not yet clear if such approaches can improve on current state-of-the-art conventional systems.

Today we are excited to share “State-of-the-art Speech Recognition With Sequence-to-Sequence Models [4],” which describes a new end-to-end model that surpasses the performance of a conventional production system [1]. We show that our end-to-end system achieves a word error rate (WER) of 5.6%, which corresponds to a 16% relative improvement over a strong conventional system which achieves a 6.7% WER. Additionally, the end-to-end model used to output the initial word hypothesis, before any hypothesis rescoring, is 18 times smaller than the conventional model, as it contains no separate LM and PM.
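For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed (word-level edit distance divided by the number of reference words) and checks that the two WER figures quoted above do correspond to a relative improvement of about 16%. The function names and the example sentences are illustrative, and the code assumes a non-empty reference.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One deleted word out of six reference words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Figures from the text: 6.7% (conventional) versus 5.6% (end-to-end).
improvement = relative_improvement(6.7, 5.6)
print(f"{improvement:.1%}")
```

Note that the 16% is relative to the baseline error rate; the absolute difference is 1.1 percentage points.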

Our system builds on the Listen-Attend-Spell (LAS) end-to-end architecture, first presented in [2]. The LAS architecture consists of 3 components. The listener encoder component, which is similar to a standard AM, takes a time-frequency representation of the input speech signal, x, and uses a set of neural network layers to map the input to a higher-level feature representation, h^enc. The output of the encoder is passed to an attender, which uses h^enc to learn an alignment between input features x and predicted subword units {y_n, …, y_0}, where each subword is typically a grapheme or wordpiece. Finally, the output of the attention module is passed to the speller (i.e., decoder), similar to an LM, that produces a probability distribution over a set of hypothesized words.
Components of the LAS End-to-End Model.
All components of the LAS model are trained jointly as a single end-to-end neural network, instead of as separate modules like conventional systems, making it much simpler.
Additionally, because the LAS model is fully neural, there is no need for external, manually designed components such as finite state transducers, a lexicon, or text normalization modules. Finally, unlike conventional models, training end-to-end models does not require bootstrapping from decision trees or time alignments generated from a separate system, and can be trained given pairs of text transcripts and the corresponding acoustics.
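The role of the attender can be sketched in a few lines of NumPy: at each output step, the decoder state is scored against every encoder frame, the scores are normalized into alignment weights, and the weighted sum of encoder features becomes the context passed to the speller. This is a generic sketch, assuming dot-product scoring and random toy inputs; the scoring function actually used in the LAS model is not specified above, and the names `attend`, `h_enc`, and `decoder_state` are illustrative.

```python
import numpy as np


def softmax(v: np.ndarray) -> np.ndarray:
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()


def attend(h_enc: np.ndarray, decoder_state: np.ndarray):
    """Return (context, weights) for one decoder step.

    h_enc:         (T, d) encoder features, one row per input frame
    decoder_state: (d,)   current decoder state
    """
    scores = h_enc @ decoder_state   # (T,) similarity to each input frame
    weights = softmax(scores)        # soft alignment over the input frames
    context = weights @ h_enc        # (d,) weighted summary of the input
    return context, weights


rng = np.random.default_rng(0)
h_enc = rng.standard_normal((50, 16))    # 50 encoder frames, 16 features each
decoder_state = rng.standard_normal(16)
context, weights = attend(h_enc, decoder_state)
```

Because every step here is differentiable, the alignment is learned together with the encoder and decoder, which is what allows the whole model to be trained as a single network.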

In [4], we introduce a variety of novel structural improvements, including improving the attention vectors passed to the decoder and training with longer subword units (i.e., wordpieces). In addition, we introduce numerous optimization improvements for training, including the use of minimum word error rate training [5]. Together, these structural and optimization improvements account for the 16% relative improvement over the conventional model.

Another exciting potential application for this research is multi-dialect and multi-lingual systems, where the simplicity of optimizing a single neural network makes such a model very attractive. Here data for all dialects/languages can be combined to train one network, without the need for a separate AM, PM and LM for each dialect/language. We find that these models work well on 7 English dialects [6] and 9 Indian languages [7], while outperforming a model trained separately on each individual language/dialect.

While we are excited by our results, our work is not done. Currently, these models cannot process speech in real time [8, 9], which is a strong requirement for latency-sensitive applications such as voice search. In addition, these models still compare negatively to production when evaluated on live production data. Furthermore, our end-to-end model is learned on 22,000 audio-text pair utterances compared to a conventional system that is typically trained on significantly larger corpora. In addition, our proposed model is not able to learn proper spellings for rarely used words such as proper nouns, which is normally performed with a hand-designed PM. Our ongoing efforts are focused now on addressing these challenges.

Acknowledgements
This work was done as a strong collaborative effort between Google Brain and Speech teams. Contributors include Tara Sainath, Rohit Prabhavalkar, Bo Li, Kanishka Rao, Shankar Kumar, Shubham Toshniwal, Michiel Bacchiani and Johan Schalkwyk from the Speech team; as well as Yonghui Wu, Patrick Nguyen, Zhifeng Chen, Chung-cheng Chiu, Anjuli Kannan, Ron Weiss and Navdeep Jaitly from the Google Brain team. The work is described in more detail in papers [4-11].

References
[1] G. Pundak and T. N. Sainath, “Lower Frame Rate Neural Network Acoustic Models,” in Proc. Interspeech, 2016.

[2] W. Chan, N. Jaitly, Q. V. Le, and O. Vinyals, “Listen, attend and spell,” CoRR, vol. abs/1508.01211, 2015.

[3] R. Prabhavalkar, K. Rao, T. N. Sainath, B. Li, L. Johnson, and N. Jaitly, “A Comparison of Sequence-to-sequence Models for Speech Recognition,” in Proc. Interspeech, 2017.

[4] C.C. Chiu, T.N. Sainath, Y. Wu, R. Prabhavalkar, P. Nguyen, Z. Chen, A. Kannan, R.J. Weiss, K. Rao, K. Gonina, N. Jaitly, B. Li, J. Chorowski and M. Bacchiani, “State-of-the-art Speech Recognition With Sequence-to-Sequence Models,” submitted to ICASSP 2018.

[5] R. Prabhavalkar, T.N. Sainath, Y. Wu, P. Nguyen, Z. Chen, C.C. Chiu and A. Kannan, “Minimum Word Error Rate Training for Attention-based Sequence-to-Sequence Models,” submitted to ICASSP 2018.

[6] B. Li, T.N. Sainath, K. Sim, M. Bacchiani, E. Weinstein, P. Nguyen, Z. Chen, Y. Wu and K. Rao, “Multi-Dialect Speech Recognition With a Single Sequence-to-Sequence Model,” submitted to ICASSP 2018.

[7] S. Toshniwal, T.N. Sainath, R.J. Weiss, B. Li, P. Moreno, E. Weinstein and K. Rao, “End-to-End Multilingual Speech Recognition using Encoder-Decoder Models,” submitted to ICASSP 2018.

[8] T.N. Sainath, C.C. Chiu, R. Prabhavalkar, A. Kannan, Y. Wu, P. Nguyen and Z. Chen, “Improving the Performance of Online Neural Transducer Models,” submitted to ICASSP 2018.

[9] D. Lawson*, C.C. Chiu*, G. Tucker*, C. Raffel, K. Swersky and N. Jaitly, “Learning Hard Alignments with Variational Inference,” submitted to ICASSP 2018.

[10] T.N. Sainath, R. Prabhavalkar, S. Kumar, S. Lee, A. Kannan, D. Rybach, V. Schogol, P. Nguyen, B. Li, Y. Wu, Z. Chen and C.C. Chiu, “No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models,” submitted to ICASSP 2018.

[11] A. Kannan, Y. Wu, P. Nguyen, T.N. Sainath, Z. Chen and R. Prabhavalkar. “An Analysis of Incorporating an External Language Model into a Sequence-to-Sequence Model,” submitted to ICASSP 2018.

DeepVariant: Highly Accurate Genomes With Deep Neural Networks

Crossposted on the Google Research Blog

Across many scientific disciplines, but in particular in the field of genomics, major breakthroughs have often resulted from new technologies. From Sanger sequencing, which made it possible to sequence the human genome, to the microarray technologies that enabled the first large-scale genome-wide experiments, new instruments and tools have allowed us to look ever more deeply into the genome and apply the results broadly to health, agriculture and ecology.

One of the most transformative new technologies in genomics was high-throughput sequencing (HTS), which first became commercially available in the early 2000s. HTS allowed scientists and clinicians to produce sequencing data quickly, cheaply, and at scale. However, the output of HTS instruments is not the genome sequence for the individual being analyzed, which for humans is 3 billion paired bases (guanine, cytosine, adenine and thymine) organized into 23 pairs of chromosomes. Instead, these instruments generate ~1 billion short sequences, known as reads. Each read represents just 100 of the 3 billion bases, and per-base error rates range from 0.1% to 10%. Processing the HTS output into a single, accurate and complete genome sequence is a major outstanding challenge. The importance of this problem, for biomedical applications in particular, has motivated efforts such as the Genome in a Bottle Consortium (GIAB), which produces high-confidence human reference genomes that can be used for validation and benchmarking, as well as the precisionFDA community challenges, which are designed to foster innovation that will improve the quality and accuracy of HTS-based genomic tests.
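
The figures in the paragraph above imply a rough average coverage, which helps explain why telling variants from errors is hard. The sketch below is back-of-the-envelope arithmetic using only the round numbers quoted in the text; the function names are assumptions for illustration, and real coverage varies considerably across the genome.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Average number of reads overlapping any one genome position."""
    return num_reads * read_length / genome_size


def expected_errors_per_position(coverage: float, per_base_error_rate: float) -> float:
    """Expected number of erroneous base calls stacked on a single position."""
    return coverage * per_base_error_rate


# ~1 billion reads of ~100 bases over a ~3 billion base genome.
coverage = mean_coverage(1_000_000_000, 100, 3_000_000_000)
print(round(coverage, 1))  # 33.3

# The two ends of the per-base error range quoted in the text.
for rate in (0.001, 0.10):
    print(rate, round(expected_errors_per_position(coverage, rate), 2))
```

At the high end of the error range, several of the roughly 33 reads covering a position can be expected to disagree with the reference purely by chance, which is why a simple mismatch is not enough to call a variant.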

CAPTION: For any given location in the genome, there are multiple reads among the ~1 billion that include a base at that position. Each read is aligned to a reference, and then each of the bases in the read is compared to the base of the reference at that location. When a read includes a base that differs from the reference, it may indicate a variant (a difference in the true sequence), or it may be an error.
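
The comparison described in the caption above can be sketched in a few lines. This is a toy illustration, assuming reads are already aligned without gaps and represented as hypothetical `(start, sequence)` pairs; real pipelines work from alignment files and handle insertions, deletions and base qualities.

```python
from collections import Counter


def pileup_counts(reads, position):
    """Count the bases that aligned reads report at one reference position.

    reads: list of (start, sequence) pairs, where start is the reference
    coordinate of the read's first base (gapless alignment assumed).
    """
    counts = Counter()
    for start, sequence in reads:
        offset = position - start
        if 0 <= offset < len(sequence):
            counts[sequence[offset]] += 1
    return counts


def mismatches(reference, reads, position):
    """Bases at `position` that differ from the reference, with read counts."""
    ref_base = reference[position]
    return {base: n for base, n in pileup_counts(reads, position).items()
            if base != ref_base}


# Hypothetical 8-base reference and four short reads.
reference = "ACGTACGT"
reads = [(0, "ACGTA"), (2, "GTTCG"), (3, "TTCGT"), (1, "CGTAC")]

print(mismatches(reference, reads, 4))  # {'T': 2}
print(mismatches(reference, reads, 5))  # {}
```

At position 4, two of the four covering reads report T instead of the reference A. Whether that reflects a variant on one chromosome or a pair of sequencing errors is exactly the question a variant caller has to answer.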

Today, we announce the open source release of DeepVariant, a deep learning technology to reconstruct the true genome sequence from HTS sequencer data with significantly greater accuracy than previous classical methods. This work is the product of more than two years of research by the Google Brain team, in collaboration with Verily Life Sciences. DeepVariant transforms the task of variant calling, as this reconstruction problem is known in genomics, into an image classification problem well-suited to Google's existing technology and expertise.

CAPTION: Each of the four images above is a visualization of actual sequencer reads aligned to a reference genome. A key question is how to use the reads to determine whether there is a variant on both chromosomes, on just one chromosome, or on neither chromosome. There is more than one type of variant, with SNPs and insertions/deletions being the most common. A: a true SNP on one chromosome pair, B: a deletion on one chromosome, C: a deletion on both chromosomes, D: a false variant caused by errors. It's easy to see that these look quite distinct when visualized in this manner.

We started with GIAB reference genomes, for which there is high-quality ground truth (or the closest approximation currently possible). Using multiple replicates of these genomes, we produced tens of millions of training examples in the form of multi-channel tensors encoding the HTS instrument data, and then trained a TensorFlow-based image classification model to identify the true genome sequence from the experimental data produced by the instruments. Although the resulting deep learning model, DeepVariant, had no specialized knowledge about genomics or HTS, within a year it had won the highest SNP accuracy award at the precisionFDA Truth Challenge, outperforming state-of-the-art methods. Since then, we've further reduced the error rate by more than 50%.
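
To give a feel for what a "multi-channel tensor encoding" of aligned reads might look like, here is a deliberately simplified sketch. It is not DeepVariant's actual encoding: the three channels (base identity, scaled base quality, agreement with the reference), the scaling constants, and the `(start, sequence, qualities)` read format are all assumptions chosen for illustration.

```python
# Hypothetical numeric code for each base (channel 0).
BASE_CODE = {"A": 0.25, "C": 0.5, "G": 0.75, "T": 1.0}
# Hypothetical upper bound used to scale base qualities into [0, 1] (channel 1).
MAX_QUALITY = 60.0


def encode_window(reference, reads, start, width):
    """Encode reads over reference[start:start + width] as nested lists.

    reads: list of (read_start, sequence, qualities) with gapless alignment.
    Returns a tensor indexed as [read][position][channel]; positions a read
    does not cover are filled with zeros.
    """
    tensor = []
    for read_start, sequence, qualities in reads:
        row = []
        for pos in range(start, start + width):
            offset = pos - read_start
            if 0 <= offset < len(sequence):
                base = sequence[offset]
                row.append([
                    BASE_CODE[base],                        # channel 0: base
                    qualities[offset] / MAX_QUALITY,        # channel 1: quality
                    1.0 if base == reference[pos] else 0.0  # channel 2: matches ref
                ])
            else:
                row.append([0.0, 0.0, 0.0])
        tensor.append(row)
    return tensor


reference = "ACGTACGT"
reads = [(2, "GTTCG", [30, 30, 30, 30, 30]), (0, "ACGTA", [40, 40, 40, 40, 20])]
tensor = encode_window(reference, reads, start=3, width=3)
print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 2 3 3
print(tensor[0][1])  # [1.0, 0.5, 0.0]
```

A tensor of this shape can be fed to an image classifier in the same way as a small multi-channel image, which is the reframing the paragraph above describes.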


DeepVariant is being released as open source software to encourage collaboration and to accelerate the use of this technology to solve real-world problems. To further this goal, we partnered with Google Cloud Platform (GCP) to deploy DeepVariant workflows on GCP, available today, in configurations optimized for low cost and fast turnaround using scalable GCP technologies like the Pipelines API. This paired set of releases provides a smooth ramp for users to explore and evaluate the capabilities of DeepVariant in their current compute environment while providing a scalable, cloud-based solution to satisfy the needs of even the largest genomics datasets.

DeepVariant is the first of what we hope will be many contributions that leverage Google's computing infrastructure and ML expertise to both better understand the genome and to provide deep learning-based genomics tools to the community. This is all part of a broader goal to apply Google technologies to healthcare and other scientific applications, and to make the results of these efforts broadly accessible.

By Mark DePristo and Ryan Poplin, Google Brain Team

Google at NIPS 2017



This week, Long Beach, California hosts the 31st annual Conference on Neural Information Processing Systems (NIPS 2017), a machine learning and computational neuroscience conference that includes invited talks, demonstrations and presentations of some of the latest in machine learning research. Google will have a strong presence at NIPS 2017, with over 450 Googlers attending to contribute to, and learn from, the broader academic research community via technical talks and posters, workshops, competitions and tutorials.

Google is at the forefront of machine learning, actively exploring virtually all aspects of the field from classical algorithms to deep learning and more. Focusing on both theory and application, much of our work on language understanding, speech, translation, visual processing, and prediction relies on state-of-the-art techniques that push the boundaries of what is possible. In all of those tasks and many others, we develop learning approaches to understand and generalize, providing us with new ways of looking at old problems and helping transform how we work and live.

If you are attending NIPS 2017, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people, and to see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented in the list below.

Google is a Platinum Sponsor of NIPS 2017.

Organizing Committee
Program Chair: Samy Bengio
Senior Area Chairs include: Corinna Cortes, Dale Schuurmans, Hugo Larochelle
Area Chairs include: Afshin Rostamizadeh, Amir Globerson, Been Kim, D. Sculley, Dumitru Erhan, Gal Chechik, Hartmut Neven, Honglak Lee, Ian Goodfellow, Jasper Snoek, John Wright, Jon Shlens, Kun Zhang, Lihong Li, Maya Gupta, Moritz Hardt, Navdeep Jaitly, Ryan Adams, Sally Goldman, Sanjiv Kumar, Surya Ganguli, Tara Sainath, Umar Syed, Viren Jain, Vitaly Kuznetsov

Invited Talk
Powering the next 100 years
John Platt

Accepted Papers
A Meta-Learning Perspective on Cold-Start Recommendations for Items
Manasi Vartak, Hugo Larochelle, Arvind Thiagarajan

AdaGAN: Boosting Generative Models
Ilya Tolstikhin, Sylvain Gelly, Olivier Bousquet, Carl-Johann Simon-Gabriel, Bernhard Schölkopf

Deep Lattice Networks and Partial Monotonic Functions
Seungil You, David Ding, Kevin Canini, Jan Pfeifer, Maya Gupta

From which world is your graph
Cheng Li, Varun Kanade, Felix MF Wong, Zhenming Liu

Hiding Images in Plain Sight: Deep Steganography
Shumeet Baluja

Improved Graph Laplacian via Geometric Self-Consistency
Dominique Joncas, Marina Meila, James McQueen

Model-Powered Conditional Independence Test
Rajat Sen, Ananda Theertha Suresh, Karthikeyan Shanmugam, Alexandros Dimakis, Sanjay Shakkottai

Nonlinear random matrix theory for deep learning
Jeffrey Pennington, Pratik Worah

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice
Jeffrey Pennington, Samuel Schoenholz, Surya Ganguli

SGD Learns the Conjugate Kernel Class of the Network
Amit Daniely

SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability
Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein

Learning Hierarchical Information Flow with Recurrent Neural Modules
Danijar Hafner, Alexander Irpan, James Davidson, Nicolas Heess

Online Learning with Transductive Regret
Scott Yang, Mehryar Mohri

Acceleration and Averaging in Stochastic Descent Dynamics
Walid Krichene, Peter Bartlett

Parameter-Free Online Learning via Model Selection
Dylan J Foster, Satyen Kale, Mehryar Mohri, Karthik Sridharan

Dynamic Routing Between Capsules
Sara Sabour, Nicholas Frosst, Geoffrey E Hinton

Modulating early visual processing by language
Harm de Vries, Florian Strub, Jeremie Mary, Hugo Larochelle, Olivier Pietquin, Aaron C Courville

MarrNet: 3D Shape Reconstruction via 2.5D Sketches
Jiajun Wu, Yifan Wang, Tianfan Xue, Xingyuan Sun, Bill Freeman, Josh Tenenbaum

Affinity Clustering: Hierarchical Clustering at Scale
Mahsa Derakhshan, Soheil Behnezhad, Mohammadhossein Bateni, Vahab Mirrokni, MohammadTaghi Hajiaghayi, Silvio Lattanzi, Raimondas Kiveris

Asynchronous Parallel Coordinate Minimization for MAP Inference
Ofer Meshi, Alexander Schwing

Cold-Start Reinforcement Learning with Softmax Policy Gradient
Nan Ding, Radu Soricut

Filtering Variational Objectives
Chris J Maddison, Dieterich Lawson, George Tucker, Mohammad Norouzi, Nicolas Heess, Andriy Mnih, Yee Whye Teh, Arnaud Doucet

Multi-Armed Bandits with Metric Movement Costs
Tomer Koren, Roi Livni, Yishay Mansour

Multiscale Quantization for Fast Similarity Search
Xiang Wu, Ruiqi Guo, Ananda Theertha Suresh, Sanjiv Kumar, Daniel Holtmann-Rice, David Simcha, Felix Yu

Reducing Reparameterization Gradient Variance
Andrew Miller, Nicholas Foti, Alexander D'Amour, Ryan Adams

Statistical Cost Sharing
Eric Balkanski, Umar Syed, Sergei Vassilvitskii

The Unreasonable Effectiveness of Structured Random Orthogonal Embeddings
Krzysztof Choromanski, Mark Rowland, Adrian Weller

Value Prediction Network
Junhyuk Oh, Satinder Singh, Honglak Lee

REBAR: Low-variance, unbiased gradient estimates for discrete latent variable models
George Tucker, Andriy Mnih, Chris J Maddison, Dieterich Lawson, Jascha Sohl-Dickstein

Approximation and Convergence Properties of Generative Adversarial Learning
Shuang Liu, Olivier Bousquet, Kamalika Chaudhuri

Attention is All you Need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, Illia Polosukhin

PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Jonathan Huggins, Ryan Adams, Tamara Broderick

Repeated Inverse Reinforcement Learning
Kareem Amin, Nan Jiang, Satinder Singh

Fair Clustering Through Fairlets
Flavio Chierichetti, Ravi Kumar, Silvio Lattanzi, Sergei Vassilvitskii

Affine-Invariant Online Optimization and the Low-rank Experts Problem
Tomer Koren, Roi Livni

Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models
Sergey Ioffe

Bridging the Gap Between Value and Policy Based Reinforcement Learning
Ofir Nachum, Mohammad Norouzi, Kelvin Xu, Dale Schuurmans

Discriminative State Space Models
Vitaly Kuznetsov, Mehryar Mohri

Dynamic Revenue Sharing
Santiago Balseiro, Max Lin, Vahab Mirrokni, Renato Leme, Song Zuo

Multi-view Matrix Factorization for Linear Dynamical System Estimation
Mahdi Karami, Martha White, Dale Schuurmans, Csaba Szepesvari

On Blackbox Backpropagation and Jacobian Sensing
Krzysztof Choromanski, Vikas Sindhwani

On the Consistency of Quick Shift
Heinrich Jiang

Revenue Optimization with Approximate Bid Predictions
Andres Munoz, Sergei Vassilvitskii

Shape and Material from Sound
Zhoutong Zhang, Qiujia Li, Zhengjia Huang, Jiajun Wu, Josh Tenenbaum, Bill Freeman

Learning to See Physics via Visual De-animation
Jiajun Wu, Erika Lu, Pushmeet Kohli, Bill Freeman, Josh Tenenbaum

Conference Demos
Electronic Screen Protector with Efficient and Robust Mobile Vision
Hee Jung Ryu, Florian Schroff

Magenta and deeplearn.js: Real-time Control of Deep Generative Music Models in the Browser
Curtis Hawthorne, Ian Simon, Adam Roberts, Jesse Engel, Daniel Smilkov, Nikhil Thorat, Douglas Eck

Workshops
6th Workshop on Automated Knowledge Base Construction (AKBC) 2017
Program Committee includes: Arvind Neelakantan
Authors include: Jiazhong Nie, Ni Lao

Acting and Interacting in the Real World: Challenges in Robot Learning
Invited Speakers include: Pierre Sermanet

Advances in Approximate Bayesian Inference
Panel moderator: Matthew D. Hoffman

Conversational AI - Today's Practice and Tomorrow's Potential
Invited Speakers include: Matthew Henderson, Dilek Hakkani-Tur
Organizers include: Larry Heck

Extreme Classification: Multi-class and Multi-label Learning in Extremely Large Label Spaces
Invited Speakers include: Ed Chi, Mehryar Mohri

Learning in the Presence of Strategic Behavior
Invited Speakers include: Mehryar Mohri
Presenters include: Andres Munoz Medina, Sebastien Lahaie, Sergei Vassilvitskii, Balasubramanian Sivan

Learning on Distributions, Functions, Graphs and Groups
Invited speakers include: Corinna Cortes

Machine Deception
Organizers include: Ian Goodfellow
Invited Speakers include: Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

Machine Learning and Computer Security
Invited Speakers include: Ian Goodfellow
Organizers include: Nicolas Papernot
Authors include: Jacob Buckman, Aurko Roy, Colin Raffel, Ian Goodfellow

Machine Learning for Creativity and Design
Keynote Speakers include: Ian Goodfellow
Organizers include: Doug Eck, David Ha

Machine Learning for Audio Signal Processing (ML4Audio)
Authors include: Aren Jansen, Manoj Plakal, Dan Ellis, Shawn Hershey, Channing Moore, Rif A. Saurous, Yuxuan Wang, RJ Skerry-Ryan, Ying Xiao, Daisy Stanton, Joel Shor, Eric Battenberg, Rob Clark

Machine Learning for Health (ML4H)
Organizers include: Jasper Snoek, Alex Wiltschko
Keynote: Fei-Fei Li

NIPS Time Series Workshop 2017
Organizers include: Vitaly Kuznetsov
Authors include: Brendan Jou

OPT 2017: Optimization for Machine Learning
Organizers include: Sashank Reddi

ML Systems Workshop
Invited Speakers include: Rajat Monga, Alexander Mordvintsev, Chris Olah, Jeff Dean
Authors include: Alex Beutel, Tim Kraska, Ed H. Chi, D. Sculley, Michael Terry

Aligned Artificial Intelligence
Invited Speakers include: Ian Goodfellow

Bayesian Deep Learning
Organizers include: Kevin Murphy
Invited speakers include: Nal Kalchbrenner, Matthew D. Hoffman

BigNeuro 2017
Invited speakers include: Viren Jain

Cognitively Informed Artificial Intelligence: Insights From Natural Intelligence
Authors include: Jiazhong Nie, Ni Lao

Deep Learning At Supercomputer Scale
Organizers include: Erich Elsen, Zak Stone, Brennan Saeta, Danijar Hafner

Deep Learning: Bridging Theory and Practice
Invited Speakers include: Ian Goodfellow

Interpreting, Explaining and Visualizing Deep Learning
Invited Speakers include: Been Kim, Honglak Lee
Authors include: Pieter-Jan Kindermans, Sara Hooker, Dumitru Erhan, Been Kim

Learning Disentangled Features: from Perception to Control
Organizers include: Honglak Lee
Authors include: Jasmine Hsu, Arkanath Pathak, Abhinav Gupta, James Davidson, Honglak Lee

Learning with Limited Labeled Data: Weak Supervision and Beyond
Invited Speakers include: Ian Goodfellow

Machine Learning on the Phone and other Consumer Devices
Invited Speakers include: Rajat Monga
Organizers include: Hrishikesh Aradhye
Authors include: Suyog Gupta, Sujith Ravi

Optimal Transport and Machine Learning
Organizers include: Olivier Bousquet

The future of gradient-based machine learning software & techniques
Organizers include: Alex Wiltschko, Bart van Merriënboer

Workshop on Meta-Learning
Organizers include: Hugo Larochelle
Panelists include: Samy Bengio
Authors include: Aliaksei Severyn, Sascha Rothe

Symposiums
Deep Reinforcement Learning Symposium
Authors include: Benjamin Eysenbach, Shane Gu, Julian Ibarz, Sergey Levine

Interpretable Machine Learning
Authors include: Minmin Chen

Metalearning
Organizers include: Quoc V Le

Competitions
Adversarial Attacks and Defences
Organizers include: Alexey Kurakin, Ian Goodfellow, Samy Bengio

Competition IV: Classifying Clinically Actionable Genetic Mutations
Organizers include: Wendy Kan

Tutorial
Fairness in Machine Learning
Solon Barocas, Moritz Hardt


Tangent: Source-to-Source Debuggable Derivatives

Crossposted on the Google Research Blog

Tangent is a new, free, and open source Python library for automatic differentiation. In contrast to existing machine learning libraries, Tangent is a source-to-source system, consuming a Python function f and emitting a new Python function that computes the gradient of f. This allows much better user visibility into gradient computations, as well as easy user-level editing and debugging of gradients. Tangent comes with many more features for debugging and designing machine learning models.
This post gives an overview of the Tangent API. It covers how to use Tangent to generate gradient code in Python that is easy to interpret, debug and modify.

Neural networks (NNs) have led to great advances in machine learning models for images, video, audio, and text. The fundamental abstraction that lets us train NNs to perform well at these tasks is a 30-year-old idea called reverse-mode automatic differentiation (also known as backpropagation), which comprises two passes through the NN. First, we run a “forward pass” to calculate the output value of each node. Then we run a “backward pass” to calculate a series of derivatives to determine how to update the weights to increase the model’s accuracy.
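
To make the two passes concrete, here is a minimal hand-written sketch for a single-neuron model with a squared-error loss. The model, the function names `forward` and `backward`, and the numbers are illustrative assumptions, not part of Tangent or any other library.

```python
import math

def forward(w, b, x, target):
    """Forward pass: compute the loss and keep the intermediate values."""
    z = w * x + b              # pre-activation
    a = math.tanh(z)           # activation
    loss = (a - target) ** 2   # squared error
    return loss, (z, a)

def backward(w, b, x, target, cache):
    """Backward pass: apply the chain rule from the loss back to w and b."""
    z, a = cache
    dloss_da = 2.0 * (a - target)
    da_dz = 1.0 - a * a        # derivative of tanh(z)
    dloss_dz = dloss_da * da_dz
    dloss_dw = dloss_dz * x
    dloss_db = dloss_dz
    return dloss_dw, dloss_db

w, b, x, target = 0.5, 0.1, 2.0, 1.0
loss, cache = forward(w, b, x, target)
dw, db = backward(w, b, x, target, cache)

# One small gradient-descent step using the derivatives from the backward pass.
lr = 0.1
new_loss, _ = forward(w - lr * dw, b - lr * db, x, target)
```

With this small step size, the loss after the update (`new_loss`) is lower than the loss before it, which is what the backward pass is for.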

Training NNs, and doing research on novel architectures, requires us to compute these derivatives correctly, efficiently, and easily. We also need to be able to debug these derivatives when our model isn’t training well, or when we’re trying to build something new that we do not yet understand. Automatic differentiation, or just “autodiff,” is a technique to calculate the derivatives of computer programs that denote some mathematical function, and nearly every machine learning library implements it.
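
One common way to debug a derivative, whichever tool produced it, is to compare it against a central finite-difference estimate. The sketch below is a generic check and not a Tangent feature; `numerical_grad`, `check_grad`, `f`, and `df` are hypothetical names, and `df` is derived by hand to stand in for an automatically generated derivative.

```python
import math

def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference estimate of df/dx at a scalar x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)

def f(x):
    return x * math.sin(x)

def df(x):
    # Hand-derived with the product rule; stands in for a derivative
    # produced by an autodiff tool.
    return math.sin(x) + x * math.cos(x)

def check_grad(f, df, points, tol=1e-5):
    """Return the points where the analytic and numerical derivatives disagree."""
    return [x for x in points if abs(df(x) - numerical_grad(f, x)) > tol]

mismatches = check_grad(f, df, [-2.0, -0.5, 0.0, 1.0, 3.0])
```

An empty `mismatches` list means the two derivatives agree within the tolerance at every test point. A wrong derivative, such as one that drops the `math.sin(x)` term, shows up as a non-empty list.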

Existing libraries implement automatic differentiation by tracing a program’s execution (at runtime, like TF Eager, PyTorch and Autograd) or by building a data-flow graph and then differentiating the graph (ahead-of-time, like TensorFlow). In contrast, Tangent performs ahead-of-time autodiff on the Python source code itself, and produces Python source code as its output.
As a result, you can finally read your automatic derivative code just like the rest of your program. Tangent is useful to researchers and students who not only want to write their models in Python, but also read and debug automatically-generated derivative code without sacrificing speed and flexibility.

You can easily inspect and debug your models written in Tangent, without special tools or indirection. Tangent works on a large and growing subset of Python, provides extra autodiff features other Python ML libraries don’t have, is high-performance, and is compatible with TensorFlow and NumPy.

Automatic differentiation of Python code

How do we automatically generate derivatives of plain Python code? Math functions like tf.exp or tf.log have derivatives, which we can compose to build the backward pass. Similarly, pieces of syntax, such as subroutines, conditionals, and loops, also have backward-pass versions. Tangent contains recipes for generating derivative code for each piece of Python syntax, along with many NumPy and TensorFlow function calls.
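
As an illustration of a piece of syntax having a backward-pass version, here is a hand-written sketch of a loop and its reverse. The forward loop saves each intermediate value on a stack, and the backward loop pops them in the opposite order while applying the chain rule. This is written by hand for illustration and is not code generated by Tangent; it uses the standard math module instead of TensorFlow to stay dependency-free, and the function `f` is an arbitrary example.

```python
import math

def f(x, n):
    """Apply y = sin(y) * x a total of n times, starting from y = x."""
    y = x
    for _ in range(n):
        y = math.sin(y) * x
    return y

def df(x, n):
    """Derivative of f with respect to x, written as a forward and a backward loop."""
    # Forward pass: rerun the loop, saving what the backward pass will need.
    stack = []
    y = x
    for _ in range(n):
        stack.append(y)
        y = math.sin(y) * x

    # Backward pass: visit the iterations in reverse order.
    dy = 1.0   # gradient of the output with respect to itself
    dx = 0.0
    for _ in range(n):
        y_prev = stack.pop()
        # Forward statement was: y = sin(y_prev) * x
        dx += dy * math.sin(y_prev)       # direct dependence on x
        dy = dy * math.cos(y_prev) * x    # dependence through y_prev
    # The initial assignment y = x passes the remaining gradient to x.
    dx += dy
    return dx
```

The backward loop runs the same number of iterations as the forward loop, but consumes the saved values last-in, first-out.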

Tangent has a one-function API:
import tangent
df = tangent.grad(f)
[Animated graphic: what happens when we call tangent.grad on a Python function.]
If you want to print out your derivatives, you can run
import tangent
df = tangent.grad(f, verbose=1)
Under the hood, tangent.grad first grabs the source code of the Python function you pass it. Tangent has a large library of recipes for the derivatives of Python syntax, as well as TensorFlow Eager functions. The function tangent.grad then walks your code in reverse order, looks up the matching backward-pass recipe, and adds it to the end of the derivative function. This reverse-order processing gives the technique its name: reverse-mode automatic differentiation.
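
The steps described above can be sketched as a toy source-to-source differentiator built on Python's standard ast module. This is not Tangent's implementation: `grad_source` and the two recipe tables are hypothetical names, the input is passed as a source string rather than a function object, and the sketch assumes straight-line code in which every variable is assigned exactly once and every operand is a plain variable name. It requires Python 3.9 or later for ast.unparse.

```python
import ast
import math

# Backward-pass recipes: given the target and operand names of one forward
# statement, return the source lines that accumulate the operand gradients.
def _mul(t, a, b):
    return [f"d{a} += d{t} * {b}", f"d{b} += d{t} * {a}"]

def _add(t, a, b):
    return [f"d{a} += d{t}", f"d{b} += d{t}"]

BINOP_RECIPES = {ast.Mult: _mul, ast.Add: _add}
CALL_RECIPES = {
    "sin": lambda t, a: [f"d{a} += d{t} * math.cos({a})"],
    "exp": lambda t, a: [f"d{a} += d{t} * {t}"],
}

def grad_source(src):
    """Return source for a function computing d(output)/d(first argument)."""
    fn = ast.parse(src).body[0]
    args = [a.arg for a in fn.args.args]
    *assigns, ret = fn.body
    out_name = ret.value.id

    # Forward pass: copy the original statements unchanged.
    forward = [ast.unparse(s) for s in assigns]
    names = args + [s.targets[0].id for s in assigns]

    # Backward pass: walk the statements in reverse and append each recipe.
    backward = []
    for stmt in reversed(assigns):
        target, value = stmt.targets[0].id, stmt.value
        if isinstance(value, ast.BinOp):
            recipe = BINOP_RECIPES[type(value.op)]
            backward += recipe(target, value.left.id, value.right.id)
        else:  # a call such as math.sin(a)
            recipe = CALL_RECIPES[value.func.attr]
            backward += recipe(target, value.args[0].id)

    # Zero every gradient, then seed the output gradient with 1.
    init = [f"d{n} = 0.0" for n in names if n != out_name]
    init.append(f"d{out_name} = 1.0")
    body = forward + init + backward + [f"return d{args[0]}"]
    header = f"def d{fn.name}({', '.join(args)}):"
    return "\n".join([header] + ["    " + line for line in body])

SRC = """
def f(x, w):
    a = x * w
    b = math.sin(a)
    c = b * b
    y = c + x
    return y
"""

generated = grad_source(SRC)   # plain Python source text
namespace = {"math": math}
exec(generated, namespace)     # compile the generated derivative
df = namespace["df"]
```

Because `generated` is ordinary Python text, it can be printed, read, and edited like any other code before it is executed, which is the property the source-to-source approach is after.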

The function df above only works for scalar (non-array) inputs. Tangent also supports arrays of numbers through TensorFlow Eager functions, as well as subroutines and control flow.
Although we started with TensorFlow Eager support, Tangent isn’t tied to one numeric library or another—we would gladly welcome pull requests adding PyTorch or MXNet derivative recipes.

Next Steps

Tangent is open source now at github.com/google/tangent. Go check it out for download and installation instructions. Tangent is still an experiment, so expect some bugs. If you report them to us on GitHub, we will do our best to fix them quickly.

We are working to add support in Tangent for more aspects of the Python language (e.g., closures, inline function definitions, classes, more NumPy and TensorFlow functions). We also hope to add more advanced automatic differentiation and compiler functionality in the future, such as automatic trade-off between memory and compute (Griewank and Walther, 2000; Gruslys et al., 2016), more aggressive optimizations, and lambda lifting.

We intend to develop Tangent together as a community. We welcome pull requests with fixes and features. Happy deriving!

By Alex Wiltschko, Research Scientist, Google Brain Team

Acknowledgments

Bart van Merriënboer contributed immensely to all aspects of Tangent during his internship, and Dan Moldovan led TF Eager integration, infrastructure and benchmarking. Also, thanks to the Google Brain team for their support of this post and special thanks to Sanders Kleinfeld and Aleks Haecky for their valuable contributions to the technical aspects of the post.