Tag Archives: Machine Intelligence

AIY Projects: Updated kits for 2018

Posted by Billy Rutledge, Director of AIY Projects

Last year, AIY Projects launched to give makers the power to build AI into their projects with two do-it-yourself kits. We're seeing continued demand for the kits, especially from the STEM audience where parents and teachers alike have found the products to be great tools for the classroom. The changing nature of work in the future means students may have jobs that haven't yet been imagined, and we know that computer science skills, like analytical thinking and creative problem solving, will be crucial.

We're taking the first of many steps to help educators integrate AIY into STEM lesson plans and help prepare students for the challenges of the future by launching a new version of our AIY kits. The Voice Kit lets you build a voice controlled speaker, while the Vision Kit lets you build a camera that learns to recognize people and objects (check it out here). The new kits make getting started a little easier with clearer instructions, a new app and all the parts in one box.

To make setup easier, both kits have been redesigned to work with the new Raspberry Pi Zero WH, which comes included in the box, along with the USB connector cable and pre-provisioned SD card. Now users no longer need to download the software image and can get running faster. The updated AIY Vision Kit v1.1 also includes the Raspberry Pi Camera v2.

AIY Voice Kit v2 includes Raspberry Pi Zero WH and pre-provisioned SD card

AIY Vision Kit v1.1 includes Raspberry Pi Zero WH, Raspberry Pi Camera v2 and pre-provisioned SD card

We're also introducing the AIY companion app for Android, available here in Google Play, to make wireless setup and configuration a snap. The kits still work with monitor, keyboard and mouse as an alternate path and we're working on iOS and Chrome companions which will be coming soon.

The AIY website has been refreshed with improved documentation, making it easier for young makers to get started and learn as they build. It also includes a new AIY Models area, showcasing a collection of neural networks designed to work with AIY kits. While we've solved one barrier to entry for the STEM audience, we recognize that there are many other things that we can do to make our kits even more useful. We'll once again be at #MakerFaire events to gather feedback from our users and in June we'll be working with teachers from all over the world at the ISTE conference in Chicago.

The new AIY Voice Kit and Vision Kit have arrived at Target Stores and Target.com (US) this month and we're working to make them globally available through retailers worldwide. Sign up on our mailing list to be notified when our products become available.

We hope you'll pick up one of the new AIY kits and learn more about how to build your own smart devices. Be sure to share your recipes on Hackster.io and social media using #aiyprojects.

Introducing the iNaturalist 2018 Challenge



Thanks to recent advances in deep learning, the visual recognition abilities of machines have improved dramatically, permitting the practical application of computer vision to tasks ranging from pedestrian detection for self-driving cars to expression recognition in virtual reality. One area that remains challenging for computers, however, is fine-grained and instance-level recognition. Earlier this month, we posted an instance-level landmark recognition challenge for identifying individual landmarks. Here we focus on fine-grained visual recognition, which is to distinguish species of animals and plants, car and motorcycle models, architectural styles, etc. For computers, discriminating fine-grained categories is challenging because many categories have relatively few training examples (i.e., the long tail problem), the examples that do exist often lack authoritative training labels, and there is variability in illumination, viewing angle and object occlusion.

To help confront these hurdles, we are excited to announce the 2018 iNaturalist Challenge (iNat-2018), a species classification competition offered in partnership with iNaturalist and Visipedia (short for Visual Encyclopedia), a project for which Caltech and Cornell Tech received a Google Focused Research Award. This is a flagship challenge for the 5th International Workshop on Fine Grained Visual Categorization (FGVC5) at CVPR 2018. Building upon the first iNaturalist challenge, iNat-2017, iNat-2018 spans over 8000 categories of plants, animals, and fungi, with a total of more than 450,000 training images. We invite participants to enter the competition on Kaggle, with final submissions due in early June. Training data, annotations, and links to pretrained models can be found on our GitHub repo.

iNaturalist has emerged as a world leader for citizen scientists to share observations of species and connect with nature since its founding in 2008. It hosts research-grade photos and annotations submitted by a thriving, engaged community of users. Consider the following photo from iNaturalist:
The map on the right shows where the photo was taken. Image credit: Serge Belongie.
You may notice that the photo on the left contains a turtle. But did you also know this is a Trachemys scripta, common name “Pond Slider?” If you knew the latter, you possess knowledge of fine-grained or subordinate categories.

In contrast to other image classification datasets such as ImageNet, the dataset in the iNaturalist challenge exhibits a long-tailed distribution, with many species having relatively few images. It is important to enable machine learning models to handle categories in the long-tail, as the natural world is heavily imbalanced – some species are more abundant and easier to photograph than others. The iNaturalist challenge will encourage progress because the training distribution of iNat-2018 has an even longer tail than iNat-2017.
Distribution of training images per species for iNat-2017 and iNat-2018, plotted on a log-log scale, illustrating the long-tail behavior typical of fine-grained classification problems. Image Credit: Grant Van Horn and Oisin Mac Aodha.
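To make the long-tail idea concrete, here is a minimal sketch of how one might measure it for a labeled training set. The function names and the toy labels are hypothetical and are not taken from the iNat-2018 data or tooling; the sketch only counts images per species and reports what fraction of species fall below a chosen image count.

```python
from collections import Counter


def images_per_species(labels):
    """Return per-species image counts, sorted from most to least common."""
    counts = Counter(labels)
    return sorted(counts.values(), reverse=True)


def tail_fraction(sorted_counts, threshold):
    """Fraction of species that have fewer than `threshold` training images."""
    rare = sum(1 for c in sorted_counts if c < threshold)
    return rare / len(sorted_counts)


# Hypothetical toy labels, one entry per training image (not real iNat data).
labels = (
    ["mallard"] * 50
    + ["pond_slider"] * 12
    + ["monarch"] * 5
    + ["rare_orchid"] * 2
    + ["rare_moth"]
)

counts = images_per_species(labels)
print(counts)                     # [50, 12, 5, 2, 1]
print(tail_fraction(counts, 10))  # 0.6
```

Plotting the sorted counts against their rank on log-log axes would give a curve of the kind described in the caption above.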
Along with iNat-2018, FGVC5 will also host the iMaterialist 2018 challenge (including a furniture categorization challenge and a fashion attributes challenge for product images) and a set of “FGVCx” challenges representing smaller scale – but still significant – challenges, featuring content such as food and modern art.

FGVC5 will be showcased on the main stage at CVPR 2018, thereby ensuring broad exposure for the top performing teams. This project will advance the state-of-the-art in automatic image classification for real world, fine-grained categories, with heavy class imbalances, and large numbers of classes. We cordially invite you to participate in these competitions and help move the field forward!

Acknowledgements
We’d like to thank our colleagues and friends at iNaturalist, Visipedia, and FGVC5 for working together to advance this important area. At Google we would like to thank Hartwig Adam, Weijun Wang, Nathan Frey, Andrew Howard, Alessandro Fin, Yuning Chai, Xiao Zhang, Jack Sim, Yuan Li, Grant Van Horn, Yin Cui, Chen Sun, Yanan Qian, Grace Vesom, Tanya Birch, Wendy Kan, and Maggie Demkin.

Understanding Medical Conversations



Good documentation helps create good clinical care by communicating a doctor's thinking, their concerns, and their plans to the rest of the team. Unfortunately, physicians routinely spend more time doing documentation than doing what they love most — caring for patients. Part of the reason is that doctors spend ~6 hours in an 11-hour workday in the Electronic Health Records (EHR) on documentation.1 Consequently, one study found that more than half of surveyed doctors report at least one symptom of burnout.2

In order to help offload note-taking, many doctors have started using medical scribes as a part of their workflow. These scribes listen to the patient-doctor conversations and create notes for the EHR. According to a recent study, introducing scribes not only improved physician satisfaction, but also medical chart quality and accuracy.3 But the number of doctor-patient conversations that need a scribe is far beyond the capacity of people who are available for medical scribing.

We wondered: could the voice recognition technologies already available in Google Assistant, Google Home, and Google Translate be used to document patient-doctor conversations and help doctors and scribes summarize notes more quickly?
In “Speech Recognition for Medical Conversations”, we show that it is possible to build Automatic Speech Recognition (ASR) models for transcribing medical conversations. While most of the current ASR solutions in the medical domain focus on transcribing doctor dictations (i.e., single-speaker speech consisting of predictable medical terminology), our research shows that it is possible to build an ASR model which can handle multi-speaker conversations covering everything from weather to complex medical diagnoses.

Using this technology, we will start working with physicians and researchers at Stanford University, who have done extensive research on how scribes can improve physician satisfaction, to understand how deep learning techniques such as ASR can facilitate the scribing process of physician notes. In our pilot study, we investigate what types of clinically relevant information can be extracted from medical conversations to assist physicians in reducing their interactions with the EHR. The study is fully patient-consented and the content of the recording will be de-identified to protect patient privacy.

We hope these technologies will not only help return joy to practice by assisting doctors and scribes with their everyday workload, but also help patients get more dedicated and thorough medical attention, ideally leading to better care.


1 http://www.annfammed.org/content/15/5/419.full
2 http://www.mayoclinicproceedings.org/article/S0025-6196%2815%2900716-8/abstract
3 http://www.annfammed.org/content/15/5/427.full

Quick Access in Drive: Using Machine Learning to Save You Time



At Google, we research cutting-edge machine learning (ML) techniques that allow us to provide products and services aimed at helping you focus on what’s important. From providing language translations to understanding images to helping you respond to emails, it is our goal to help you save time, making life — and work — a little more convenient.

Recent studies have shown that finding information is second only to managing email as a drain on workplace productivity. To help address this, last year we launched Quick Access, a feature in Google Drive that uses ML to surface the most relevant documents as soon as you visit the Google Drive home screen. Originally available only for G Suite customers on Android, Quick Access is now available for anyone who uses Google Drive (on the Web, Android, and iOS), saving you from having to enter a search or to browse through your folders. Our metrics show that Quick Access takes you to the documents you need in half the time compared to manually navigating or searching.
Quick Access uses deep neural networks to determine patterns from various signals, such as activity in Drive, meetings on your Calendar, and more, to anticipate your needs and show the appropriate documents on the Drive home screen. Traditional ML approaches require domain experts to derive complex features from data, which are in turn used to train the model. For Quick Access, however, we constructed thousands of simple features from the various signals above (for instance, the timestamps of the last 20 edit events on a document would constitute 20 simple input features), and combined them with the power of deep neural networks to learn from the aggregated activity of our users. By using deep neural networks we were able to develop accurate predictive models with simpler features and less feature engineering effort.
Quick Access suggestions on the top row in Drive on a desktop browser.
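As a rough illustration of the "simple features" idea, the sketch below turns a document's edit history into a fixed-length list of 20 numbers. Everything here is an assumption made for illustration: the function name, the use of edit age (seconds before `now`) instead of raw timestamps, and the `-1.0` padding value are not described in the post.

```python
def edit_time_features(edit_timestamps, now, n_events=20):
    """Turn a document's edit history into `n_events` simple numeric features.

    Each feature is the age in seconds of one of the most recent edits,
    newest first. Missing slots are padded with -1.0 (an assumed convention)
    so that every document yields a vector of the same length.
    """
    recent = sorted(edit_timestamps, reverse=True)[:n_events]
    ages = [float(now - t) for t in recent]
    return ages + [-1.0] * (n_events - len(ages))


# A document edited three times (timestamps in seconds, hypothetical values).
features = edit_time_features([1000, 4000, 2500], now=5000)
print(len(features))  # 20
print(features[:4])   # [1000.0, 2500.0, 4000.0, -1.0]
```

A fixed-length vector like this can be concatenated with features built from other signals and fed to a neural network without hand-crafted aggregation.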
The model computes a relevance score for each of the documents in Drive and the top scoring documents are presented on the home screen. For example, if you have a Calendar entry for a meeting with a coworker in the next few minutes, Quick Access might predict that the presentation you’ve been working on with that coworker is more relevant compared to your monthly budget spreadsheet or the photos you uploaded last week. If you’ve been updating a spreadsheet every weekend, then next weekend, Quick Access will likely display that spreadsheet ahead of the other documents you viewed during the week.
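The final ranking step can be sketched in a few lines. The scores below are made-up numbers standing in for the model's output, and `top_documents` is a hypothetical helper, not part of any Drive API.

```python
import heapq


def top_documents(scores, k=3):
    """Return the k document names with the highest relevance scores."""
    return heapq.nlargest(k, scores, key=scores.get)


# Hypothetical relevance scores, as if produced by a trained model.
scores = {
    "team_presentation": 0.92,
    "monthly_budget": 0.40,
    "vacation_photos": 0.15,
    "weekly_report": 0.71,
}

print(top_documents(scores, k=2))  # ['team_presentation', 'weekly_report']
```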

We hope Quick Access helps you use Drive more effectively, allowing you to save time and be more productive. To learn more, watch this talk from Google Cloud Next ’17 that dives into more details on the ML behind Quick Access.

Acknowledgements
Thanks to Alexandrin Popescul and Marc Najork for contributions that made this application of machine learning technology possible. This work was in close collaboration with several engineers on the Drive team including Sean Abraham, Brian Calaci, Mike Colagrosso, Mike Procopio, Jesse Sterr, and Timothy Vis.

On-Device Machine Intelligence



To build the cutting-edge technologies that enable conversational understanding and image recognition, we often apply combinations of machine learning technologies such as deep neural networks and graph-based machine learning. However, the machine learning systems that power most of these applications run in the cloud and are computationally intensive and have significant memory requirements. What if you want machine intelligence to run on your personal phone or smartwatch, or on IoT devices, regardless of whether they are connected to the cloud?

Yesterday, we announced the launch of Android Wear 2.0, along with brand new wearable devices, that will run Google's first entirely “on-device” ML technology for powering smart messaging. This on-device ML system, developed by the Expander research team, enables technologies like Smart Reply to be used for any application, including third-party messaging apps, without ever having to connect with the cloud…so now you can respond to incoming chat messages directly from your watch, with a tap.
The research behind this began last year while our team was developing the machine learning systems that enable conversational understanding capability in Allo and Inbox. The Android Wear team reached out to us and was interested to know whether it would be possible to deploy this Smart Reply technology directly onto a smart device. Because of the limited computing power and memory on smart devices, we quickly realized that it was not possible to do so. Our product manager, Patrick McGregor, realized that this presented a unique challenge and an opportunity for the Expander team to return to the drawing board to design a completely new, lightweight, machine learning architecture — not only to enable Smart Reply on Android Wear, but also to power a wealth of other on-device mobile applications. Together with Tom Rudick, Nathan Beach, and other colleagues from the Android Wear team, we set out to build the new system.

Learning with Projections
A simple strategy to build lightweight conversational models might be to create a small dictionary of common rules (input → reply mappings) on the device and use a naive look-up strategy at inference time. This can work for simple prediction tasks involving a small set of classes using a handful of features (such as binary sentiment classification from text, e.g. “I love this movie” conveys a positive sentiment whereas the sentence “The acting was horrible” is negative). But, it does not scale to complex natural language tasks involving rich vocabularies and the wide language variability observed in chat messages. On the other hand, machine learning models like recurrent neural networks (such as LSTMs), in conjunction with graph learning, have proven to be extremely powerful tools for complex sequence learning in natural language understanding tasks, including Smart Reply. However, compressing such rich models to fit in device memory and produce robust predictions at low computation cost (rapidly on-demand) is extremely challenging. Early experiments with restricting the model to predict only a small handful of replies or using other techniques like quantization or character-level models did not produce useful results.

Instead, we built a different solution for the on-device ML system. We first use a fast, efficient mechanism to group similar incoming messages and project them to similar (“nearby”) bit vector representations. While there are several ways to perform this projection step, such as using word embeddings or encoder networks, we employ a modified version of locality sensitive hashing (LSH) to reduce dimension from millions of unique words to a short, fixed-length sequence of bits. This allows us to compute a projection for an incoming message very fast, on-the-fly, with a small memory footprint on the device since we do not need to store the incoming messages, word embeddings, or even the full model used for training.
Projection step: Similar messages are grouped together and projected to nearby vectors. For example, the messages "hey, how's it going?" and "How's it going buddy?" share similar content and might be projected to the same vector 11100011. Another related message “Howdy, everything going well?” is mapped to a nearby vector 11100110 that differs only in 2 bits.
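The projection step can be approximated with a standard sign-random-projection hash (SimHash style). This is a minimal sketch of plain LSH over a bag of words, not the modified LSH variant the post refers to; the tokenizer, the MD5-based sign function, and the 8-bit width are assumptions chosen to keep the example short.

```python
import hashlib
import re


def tokens(message):
    """Lowercase a message and split it into simple word tokens."""
    return re.findall(r"[a-z']+", message.lower())


def _sign(token, bit_index):
    """Deterministic pseudo-random +1 or -1 for a (token, bit) pair."""
    digest = hashlib.md5(f"{bit_index}:{token}".encode("utf-8")).digest()
    return 1 if digest[0] & 1 else -1


def project(message, n_bits=8):
    """Project a message to a bit string of length `n_bits`.

    Each bit is the sign of a pseudo-random +/-1 sum over the message's
    distinct words, so messages that share most words tend to agree on
    most bits.
    """
    words = set(tokens(message))
    bits = []
    for i in range(n_bits):
        total = sum(_sign(w, i) for w in words)
        bits.append("1" if total >= 0 else "0")
    return "".join(bits)


def hamming(a, b):
    """Number of positions at which two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


print(project("hey, how's it going?"))
print(project("How's it going buddy?"))
print(hamming("11100011", "11100110"))  # 2
```

Because the signs are computed by hashing each word on the fly, nothing about the vocabulary needs to be stored, which is in the spirit of the small memory footprint described above.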
Next, our system takes the incoming message along with its projections and jointly trains a “message projection model” that learns to predict likely replies using our semi-supervised graph learning framework. The graph learning framework enables training a robust model by combining semantic relationships from multiple sources — message/reply interactions, word/phrase similarity, semantic cluster information — learning useful projection operations that can be mapped to good reply predictions.
Learning step: (Top) Messages along with projections and corresponding replies, if available, are used in a machine learning framework to jointly learn a “message projection model”. (Bottom) The message projection model learns to associate replies with the projections of the corresponding incoming messages. For example, the model projects two different messages “Howdy, everything going well?” and “How’s it going buddy?” (bottom center) to nearby bit vectors and learns to map these to relevant replies (bottom right).
It’s worth noting that while the message projection model can be trained using complex machine learning architectures and the power of the cloud, as described above, the model itself resides and performs inference completely on device. Apps running on the device can pass a user’s incoming messages and receive reply predictions from the on-device model without data leaving the device. The model can also be adapted to cater to the user’s writing style and individual preferences to provide a personalized experience.
Inference step: The model applies the learned projections to an incoming message (or sequence of messages) and suggests relevant and diverse replies. Inference is performed on the device, allowing the model to adapt to user data and personal writing styles.
To get the on-device system to work out of the box, we had to make a few additional improvements such as optimizing for speeding up computations on device and generating rich, diverse replies from the model. We will have a forthcoming scientific publication that describes the on-device machine learning work in more detail.

Converse from Your Wrist
When we embarked on our journey to build this technology from scratch, we weren’t sure if the predictions would be useful or of sufficient quality. We’re quite surprised and excited about how well it works even on Android wearable devices with very limited computation and memory resources. We look forward to continuing to improve the models to provide users with more delightful conversational experiences, and we will be leveraging this on-device ML platform to enable completely new applications in the months to come.

You can now use this feature to respond to your messages directly from your Google watches or any watch that runs Android Wear 2.0. It is already enabled on Google Hangouts, Google Messenger, and many third-party messaging apps. We also provide an API for developers of third-party Wear apps.

Acknowledgements
On behalf of the Google Expander team, I would also like to thank the following people who helped make this technology a success: Andrei Broder, Andrew Tomkins, David Singleton, Mirko Ranieri, Robin Dua and Yicheng Fan.

NIPS 2016 & Research at Google



This week, Barcelona hosts the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), a machine learning and computational neuroscience conference that includes invited talks, demonstrations and oral and poster presentations of some of the latest in machine learning research. Google will have a strong presence at NIPS 2016, with over 280 Googlers attending in order to contribute to and learn from the broader academic research community by presenting technical talks and posters, in addition to hosting workshops and tutorials.

Research at Google is at the forefront of innovation in Machine Intelligence, actively exploring virtually all aspects of machine learning including classical algorithms as well as cutting-edge techniques such as deep learning. Focusing on both theory as well as application, much of our work on language understanding, speech, translation, visual processing, ranking, and prediction relies on Machine Intelligence. In all of those tasks and many others, we gather large volumes of direct or indirect evidence of relationships of interest, and develop learning approaches to understand and generalize.

If you are attending NIPS 2016, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people, and to see demonstrations of some of the exciting research we pursue. You can also learn more about our work being presented at NIPS 2016 in the list below (Googlers highlighted in blue).

Google is a Platinum Sponsor of NIPS 2016.

Organizing Committee
Executive Board includes: Corinna Cortes, Fernando Pereira
Advisory Board includes: John C. Platt
Area Chairs include: John Shlens, Moritz Hardt, Navdeep Jaitly, Hugo Larochelle, Honglak Lee, Sanjiv Kumar, Gal Chechik

Invited Talk
Dynamic Legged Robots
Marc Raibert

Accepted Papers:
Boosting with Abstention
Corinna Cortes, Giulia DeSalvo, Mehryar Mohri

Community Detection on Evolving Graphs
Stefano Leonardi, Aris Anagnostopoulos, Jakub Łącki, Silvio Lattanzi, Mohammad Mahdian

Linear Relaxations for Finding Diverse Elements in Metric Spaces
Aditya Bhaskara, Mehrdad Ghadiri, Vahab Mirrokni, Ola Svensson

Nearly Isometric Embedding by Relaxation
James McQueen, Marina Meila, Dominique Joncas

Optimistic Bandit Convex Optimization
Mehryar Mohri, Scott Yang

Reward Augmented Maximum Likelihood for Neural Structured Prediction
Mohammad Norouzi, Samy Bengio, Zhifeng Chen, Navdeep Jaitly, Mike Schuster, Yonghui Wu, Dale Schuurmans

Stochastic Gradient MCMC with Stale Gradients
Changyou Chen, Nan Ding, Chunyuan Li, Yizhe Zhang, Lawrence Carin

Unsupervised Learning for Physical Interaction through Video Prediction
Chelsea Finn*, Ian Goodfellow, Sergey Levine

Using Fast Weights to Attend to the Recent Past
Jimmy Ba, Geoffrey Hinton, Volodymyr Mnih, Joel Leibo, Catalin Ionescu

A Credit Assignment Compiler for Joint Prediction
Kai-Wei Chang, He He, Stephane Ross, Hal Daumé III

A Neural Transducer
Navdeep Jaitly, Quoc Le, Oriol Vinyals, Ilya Sutskever, David Sussillo, Samy Bengio

Attend, Infer, Repeat: Fast Scene Understanding with Generative Models
S. M. Ali Eslami, Nicolas Heess, Theophane Weber, Yuval Tassa, David Szepesvari, Koray Kavukcuoglu, Geoffrey Hinton

Bi-Objective Online Matching and Submodular Allocations
Hossein Esfandiari, Nitish Korula, Vahab Mirrokni

Combinatorial Energy Learning for Image Segmentation
Jeremy Maitin-Shepard, Viren Jain, Michal Januszewski, Peter Li, Pieter Abbeel

Deep Learning Games
Dale Schuurmans, Martin Zinkevich

DeepMath - Deep Sequence Models for Premise Selection
Geoffrey Irving, Christian Szegedy, Niklas Een, Alexander Alemi, François Chollet, Josef Urban

Density Estimation via Discrepancy Based Adaptive Sequential Partition
Dangna Li, Kun Yang, Wing Wong

Domain Separation Networks
Konstantinos Bousmalis, George Trigeorgis, Nathan Silberman, Dilip Krishnan, Dumitru Erhan

Fast Distributed Submodular Cover: Public-Private Data Summarization
Baharan Mirzasoleiman, Morteza Zadimoghaddam, Amin Karbasi

Satisfying Real-world Goals with Dataset Constraints
Gabriel Goh, Andrew Cotter, Maya Gupta, Michael P Friedlander

Can Active Memory Replace Attention?
Łukasz Kaiser, Samy Bengio

Fast and Flexible Monotonic Functions with Ensembles of Lattices
Kevin Canini, Andy Cotter, Maya Gupta, Mahdi Fard, Jan Pfeifer

Launch and Iterate: Reducing Prediction Churn
Quentin Cormier, Mahdi Fard, Kevin Canini, Maya Gupta

On Mixtures of Markov Chains
Rishi Gupta, Ravi Kumar, Sergei Vassilvitskii

Orthogonal Random Features
Felix Xinnan Yu, Ananda Theertha Suresh, Krzysztof Choromanski, Dan Holtmann-Rice, Sanjiv Kumar


Perspective Transformer Nets: Learning Single-View 3D Object Reconstruction without 3D Supervision
Xinchen Yan, Jimei Yang, Ersin Yumer, Yijie Guo, Honglak Lee

Structured Prediction Theory Based on Factor Graph Complexity
Corinna Cortes, Vitaly Kuznetsov, Mehryar Mohri, Scott Yang

Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Amit Daniely, Roy Frostig, Yoram Singer

Demonstrations
Interactive musical improvisation with Magenta
Adam Roberts, Sageev Oore, Curtis Hawthorne, Douglas Eck

Content-based Related Video Recommendation
Joonseok Lee

Workshops, Tutorials and Symposia
Advances in Approximate Bayesian Inference
Advisory Committee includes: Kevin P. Murphy
Invited Speakers include: Matt Johnson
Panelists include: Ryan Sepassi

Adversarial Training
Accepted Authors: Luke Metz, Ben Poole, David Pfau, Jascha Sohl-Dickstein, Augustus Odena, Christopher Olah, Jonathon Shlens

Bayesian Deep Learning
Organizers include: Kevin P. Murphy
Accepted Authors include: Rif A. Saurous, Eugene Brevdo, Kevin Murphy

Brains & Bits: Neuroscience Meets Machine Learning
Organizers include: Jascha Sohl-Dickstein

Connectomics II: Opportunities & Challenges for Machine Learning
Organizers include: Viren Jain

Constructive Machine Learning
Invited Speakers include: Douglas Eck

Continual Learning & Deep Networks
Invited Speakers include: Honglak Lee

Deep Learning for Action & Interaction
Organizers include: Sergey Levine
Invited Speakers include: Honglak Lee
Accepted Authors include: Pararth Shah, Dilek Hakkani-Tur, Larry Heck

End-to-end Learning for Speech and Audio Processing
Invited Speakers include: Tara Sainath
Accepted Authors include: Brian Patton, Yannis Agiomyrgiannakis, Michael Terry, Kevin Wilson, Rif A. Saurous, D. Sculley

Extreme Classification: Multi-class & Multi-label Learning in Extremely Large Label Spaces
Organizers include: Samy Bengio

Interpretable Machine Learning for Complex Systems
Invited Speaker: Honglak Lee
Accepted Authors include: Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda Viegas, Martin Wattenberg

Large Scale Computer Vision Systems
Organizers include: Gal Chechik

Machine Learning Systems
Invited Speakers include: Jeff Dean

Nonconvex Optimization for Machine Learning: Theory & Practice
Organizers include: Hossein Mobahi

Optimizing the Optimizers
Organizers include: Alex Davies

Reliable Machine Learning in the Wild
Accepted Authors: Andres Medina, Sergei Vassilvitskii

The Future of Gradient-Based Machine Learning Software
Invited Speakers: Jeff Dean, Matt Johnson

Time Series Workshop
Organizers include: Vitaly Kuznetsov
Invited Speakers include: Mehryar Mohri

Theory and Algorithms for Forecasting Non-Stationary Time Series
Tutorial Organizers: Vitaly Kuznetsov, Mehryar Mohri

Women in Machine Learning
Invited Speakers include: Maya Gupta



* Work done as part of the Google Brain team

Graph-powered Machine Learning at Google



Recently, there have been significant advances in Machine Learning that enable computer systems to solve complex real-world problems. One of those advances is Google’s large scale, graph-based machine learning platform, built by the Expander team in Google Research. A technology that is behind many of the Google products and features you may use every day, graph-based machine learning is a powerful tool that can be used to power useful features such as reminders in Inbox and smart messaging in Allo, or used in conjunction with deep neural networks to power the latest image recognition system in Google Photos.
Learning with Minimal Supervision

Much of the recent success in deep learning and machine learning, in general, can be attributed to models that demonstrate high predictive capacity when trained on large amounts of labeled data -- often millions of training examples. This is commonly referred to as “supervised learning” since it requires supervision, in the form of labeled data, to train the machine learning systems. (Conversely, some machine learning methods operate directly on raw data without any supervision, a paradigm referred to as unsupervised learning.)

However, the more difficult the task, the harder it is to get sufficient high-quality labeled data. It is often prohibitively labor intensive and time-consuming to collect labeled data for every new problem. This motivated the Expander research team to build new technology for powering machine learning applications at scale and with minimal supervision.

Expander’s technology draws inspiration from how humans learn to generalize and bridge the gap between what they already know (labeled information) and novel, unfamiliar observations (unlabeled information). Known as “semi-supervised” learning, this powerful technique enables us to build systems that can work in situations where training data may be sparse. Two key advantages of a graph-based semi-supervised machine learning approach are that (a) one models labeled and unlabeled data jointly during learning, leveraging the underlying structure in the data, and (b) one can easily combine multiple types of signals (for example, relational information from Knowledge Graph along with raw features) into a single graph representation and learn over them. This is in contrast to other machine learning approaches, such as neural network methods, in which it is typical to first train a system using labeled data with features and then apply the trained system to unlabeled data.

Graph Learning: How It Works

At its core, Expander’s platform combines semi-supervised machine learning with large-scale graph-based learning by building a multi-graph representation of the data with nodes corresponding to objects or concepts and edges connecting concepts that share similarities. The graph typically contains both labeled data (nodes associated with a known output category or label) and unlabeled data (nodes for which no labels were provided). Expander’s framework then performs semi-supervised learning to label all nodes jointly by propagating label information across the graph.

However, this is easier said than done! We have to (1) learn efficiently at scale with minimal supervision (i.e., a tiny amount of labeled data), (2) operate over multi-modal data (i.e., heterogeneous representations and various sources of data), and (3) solve challenging prediction tasks (i.e., large, complex output spaces) involving high dimensional data that might be noisy.

One of the primary ingredients in the entire learning process is the graph and the choice of connections. Graphs come in all sizes and shapes and can be combined from multiple sources. We have observed that it is often beneficial to learn over multi-graphs that combine information from multiple types of data representations (e.g., image pixels, object categories and chat response messages for PhotoReply in Allo). The Expander team’s graph learning platform automatically generates graphs directly from data based on the inferred or known relationships between data elements. The data can be structured (for example, relational data) or unstructured (for example, sparse or dense feature representations extracted from raw data).

To understand how Expander’s system learns, let us consider an example graph shown below.
There are two types of nodes in the graph: “grey” nodes represent unlabeled data whereas the colored nodes represent labeled data. Relationships between node data are represented via edges, and the thickness of each edge indicates the strength of the connection. We can formulate the semi-supervised learning problem on this toy graph as follows: predict a color (“red” or “blue”) for every node in the graph. Note that the specific choice of graph structure and colors depends on the task. For example, as shown in this research paper we recently published, a graph that we built for the Smart Reply feature in Inbox represents email messages as nodes, and colors indicate semantic categories of user responses (e.g., “yes”, “awesome”, “funny”).

The Expander graph learning framework solves this labeling task by treating it as an optimization problem. At the simplest level, it learns a color label assignment for every node in the graph such that neighboring nodes are assigned similar colors depending on the strength of their connection. A naive way to solve this would be to try to learn a label assignment for all nodes at once -- this method does not scale to large graphs. Instead, we can optimize the problem formulation by propagating colors from labeled nodes to their neighbors, and then repeating the process. In each step, an unlabeled node is assigned a label by inspecting color assignments of its neighbors. We can update every node’s label in this manner and iterate until the whole graph is colored. This process is a far more efficient way to optimize the same problem and the sequence of iterations converges to a unique solution in this case. The solution at the end of the graph propagation looks something like this:
Semi-supervised learning on a graph
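To make the propagation step concrete, here is a minimal Python sketch on a toy graph. It is an illustration under stated assumptions, not Expander's implementation: the function name `propagate_labels`, the weighted-average update rule, the fixed iteration count, and the four-node graph are all hypothetical.

```python
from collections import defaultdict


def propagate_labels(edges, seeds, labels, iterations=20):
    """Spread label scores from seed nodes to their neighbours.

    edges  -- list of (node_a, node_b, weight) tuples, undirected, weight > 0
    seeds  -- dict mapping each labeled node to its known label
    labels -- list of all possible labels, e.g. ["red", "blue"]
    """
    neighbours = defaultdict(list)
    for a, b, weight in edges:
        neighbours[a].append((b, weight))
        neighbours[b].append((a, weight))

    # Seed nodes put all their mass on the known label; others start uniform.
    uniform = 1.0 / len(labels)
    scores = {}
    for node in neighbours:
        if node in seeds:
            scores[node] = {l: 1.0 if l == seeds[node] else 0.0 for l in labels}
        else:
            scores[node] = {l: uniform for l in labels}

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbours.items():
            if node in seeds:
                updated[node] = scores[node]  # labeled nodes stay fixed
                continue
            total = sum(weight for _, weight in nbrs)
            # Each unlabeled node takes the edge-weighted average of its neighbours.
            updated[node] = {
                l: sum(weight * scores[n][l] for n, weight in nbrs) / total
                for l in labels
            }
        scores = updated

    # Final assignment: the highest-scoring label for every node.
    return {node: max(dist, key=dist.get) for node, dist in scores.items()}


# Toy graph: "a" is seeded red, "d" is seeded blue; "b" and "c" are unlabeled.
toy_edges = [("a", "b", 1.0), ("b", "c", 0.2), ("c", "d", 1.0)]
toy_seeds = {"a": "red", "d": "blue"}
coloring = propagate_labels(toy_edges, toy_seeds, ["red", "blue"])
```

In this toy run, node "b" is pulled toward red by its strong edge to "a", while "c" is pulled toward blue by its strong edge to "d"; the weak edge between "b" and "c" has little influence.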
In practice, we use complex optimization functions defined over the graph structure, which incorporate additional information and constraints for semi-supervised graph learning that can lead to hard, non-convex problems. The real challenge, however, is to scale this efficiently to graphs containing billions of nodes, trillions of edges and for complex tasks involving billions of different label types.

To tackle this challenge, we created an approach outlined in Large Scale Distributed Semi-Supervised Learning Using Streaming Approximation, published last year. It introduces a streaming algorithm to process information propagated from neighboring nodes in a distributed manner that makes it work on very large graphs. In addition, it addresses other practical concerns. Notably, it guarantees that the space complexity or memory requirements of the system stay constant regardless of the difficulty of the task, i.e., the overall system uses the same amount of memory regardless of whether the number of prediction labels is two (as in the above toy example) or a million or even a billion. This enables wide-ranging applications for natural language understanding, machine perception, user modeling and even joint multimodal learning for tasks involving multiple modalities such as text, image and video inputs.
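The general idea of keeping per-node memory independent of the number of labels can be illustrated with a small sketch. The paper describes the actual streaming algorithm. The code below is only a simplified stand-in that assumes each node keeps its k highest-scoring labels and discards the rest; the function name and the truncation rule are assumptions made for illustration.

```python
import heapq


def truncate_label_scores(scores, k):
    """Keep only the k highest-scoring labels for one node and renormalize.

    scores -- dict mapping label -> non-negative score for a single node
    k      -- fixed per-node budget, independent of the total number of labels
    """
    top = heapq.nlargest(k, scores.items(), key=lambda item: item[1])
    total = sum(score for _, score in top)
    if total == 0:
        return {}
    return {label: score / total for label, score in top}


# Storage per node is bounded by k whether there are three labels or a billion.
kept = truncate_label_scores({"funny": 0.5, "happy": 0.3, "sad": 0.2}, k=2)
```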

Language Graphs for Learning Humor

As an example use of graph-based machine learning, consider emotion labeling, a language understanding task in Smart Reply for Inbox, where the goal is to label words occurring in natural language text with their fine-grained emotion categories. A neural network model is first applied to a text corpus to learn word embeddings, i.e., a mathematical vector representation of the meaning of each word. The dense embedding vectors are then used to build a sparse graph where nodes correspond to words and edges represent semantic relationships between them. Edge strength is computed using similarity between embedding vectors; low-similarity edges are ignored. We seed the graph with emotion labels known a priori for a few nodes (e.g., laugh is labeled as “funny”) and then apply semi-supervised learning over the graph to discover emotion categories for remaining words (e.g., ROTFL gets labeled as “funny” owing to its multi-hop semantic connection to the word “laugh”).
Learning emotion associations using graph constructed from word embedding vectors
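The graph construction step described above can be sketched as follows. This is a brute-force illustration that assumes cosine similarity as the similarity measure and a fixed cut-off; the post does not name the measure or the threshold, and the two-dimensional embeddings are made up.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_similarity_graph(embeddings, threshold=0.7):
    """Return (word_a, word_b, similarity) edges, dropping low-similarity pairs."""
    edges = []
    for (w1, v1), (w2, v2) in combinations(embeddings.items(), 2):
        similarity = cosine(v1, v2)
        if similarity >= threshold:
            edges.append((w1, w2, similarity))
    return edges


# Hypothetical 2-d embeddings; real word embeddings have hundreds of dimensions.
toy_embeddings = {
    "laugh": [1.0, 0.1],
    "lol": [0.9, 0.2],
    "sad": [-1.0, 0.0],
}
toy_graph = build_similarity_graph(toy_embeddings)
```

Only the "laugh"/"lol" pair survives the threshold here, so the resulting graph is sparse even though every pair was compared.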
For applications involving large datasets or dense representations that are observed (e.g., pixels from images) or learned using neural networks (e.g., embedding vectors), it is infeasible to compute pairwise similarity between all objects to construct edges in the graph. The Expander team solves this problem by leveraging approximate, linear-time graph construction algorithms.
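The post does not say which approximate algorithms are used. One well-known family that avoids all-pairs comparison is locality-sensitive hashing, and the sketch below uses random-hyperplane hashing purely as an illustration of that idea, not as a description of Expander. The function name, the number of hyperplanes, and the toy vectors are assumptions.

```python
import random
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(embeddings, num_planes=8, seed=0):
    """Group vectors by the sign pattern of random projections.

    Only vectors that share a bucket become candidate edges, so most of the
    pairwise comparisons are never performed.
    """
    rng = random.Random(seed)
    dim = len(next(iter(embeddings.values())))
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_planes)]

    buckets = defaultdict(list)
    for word, vec in embeddings.items():
        # One bit per hyperplane: which side of the plane the vector falls on.
        signature = tuple(
            sum(p * x for p, x in zip(plane, vec)) >= 0.0 for plane in planes
        )
        buckets[signature].append(word)

    pairs = set()
    for words in buckets.values():
        pairs.update(combinations(sorted(words), 2))
    return pairs


toy_vectors = {"laugh": [1.0, 0.5], "haha": [2.0, 1.0], "sad": [-1.0, -0.5]}
candidates = lsh_candidate_pairs(toy_vectors)
```

Hashing every vector takes time linear in the number of vectors. The pairs inside each bucket still have to be enumerated, so the overall cost depends on how evenly the buckets fill.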

Graph-based Machine Intelligence in Action

The Expander team’s machine learning system is now being used on massive graphs (containing billions of nodes and trillions of edges) to recognize and understand concepts in natural language, images, videos, and queries, powering Google products for applications like reminders, question answering, language translation, visual object recognition, dialogue understanding, and more.

We are excited that with the recent release of Allo, millions of chat users are now experiencing smart messaging technology powered by the Expander team’s system for understanding and assisting with chat conversations in multiple languages. Also, this technology isn’t used only for large-scale models in the cloud - as announced this past week, Android Wear has opened up an on-device Smart Reply capability for developers that will provide smart replies for any messaging application. We’re excited to tackle even more challenging Internet-scale problems with Expander in the years to come.

Acknowledgements

We wish to acknowledge the hard work of all the researchers, engineers, product managers, and leaders across Google who helped make this technology a success. In particular, we would like to highlight the efforts of Allan Heydon, Andrei Broder, Andrew Tomkins, Ariel Fuxman, Bo Pang, Dana Movshovitz-Attias, Fritz Obermeyer, Krishnamurthy Viswanathan, Patrick McGregor, Peter Young, Robin Dua, Sujith Ravi and Vivek Ramavajjala.

Chat Smarter with Allo



At Google, we are continuously building products powered by Machine Learning to delight our users and simplify their lives. Today, we are excited to talk about the technology behind Allo, a new smart messaging app that uses the power of neural networks and Google Search to make your text conversations easier and more productive.

Just like Smart Reply for Inbox, Allo understands the conversation history to generate a set of suggestions that the user will likely want to respond with. In addition to understanding the context of your conversation, Allo learns your individual style, so the responses are personalized for you.

How does it work?

About a year ago, we started exploring how we can make communication easier and more fun. The idea of Smart Reply for Allo came from my teammate Sushant Prakash who, along with Ori Gershony, led their teams to build this technology. We began by experimenting with neural network based model architectures which had proven to be successful for sequence prediction, including the encoder-decoder model used in Smart Reply for Inbox.

One challenge we faced was that response generation in online conversations has very strict latency requirements. To address this, Pavel Sountsov and Sushant came up with an innovative two-stage model that works as follows. First, a recurrent neural network looks at the conversation context one word at a time and encodes it in the hidden state of a long short-term memory (LSTM). Below, we show an example with a context ‘Where are you?’. The context has three tokens, each of which is embedded into a continuous space and input to the LSTM. The LSTM state now encodes the context as a continuous vector. This vector is used to generate the response as a discretized semantic class.
Each semantic class is associated with a set of possible messages that belong to it. We use a second recurrent network to generate a specific message from that set. This network also converts the context into a hidden LSTM state but this time the hidden state is used to generate the full message of the reply one token at a time. For example, now the LSTM after seeing the context “Where are you?” generates the tokens in the response: “I’m at work”.
A beam search is used to efficiently select the top-N highest scoring responses from among the very large set of possible messages that an LSTM can generate. A snippet of the search space explored by such a beam-search technique is shown below.
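As a rough illustration of the beam-search idea, the sketch below expands partial responses token by token and keeps only the best few at each step. It is a generic textbook version, not Allo's code: the `next_token_probs` callable stands in for the LSTM, and the toy probability table is invented.

```python
import math


def beam_search(next_token_probs, beam_width=2, max_len=5, end_token="<end>"):
    """Return up to beam_width finished sequences with the highest log-probability.

    next_token_probs -- hypothetical callable: tuple of tokens -> {token: probability}
    """
    beams = [((), 0.0)]  # (tokens so far, cumulative log-probability)
    finished = []

    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for token, prob in next_token_probs(tokens).items():
                if prob <= 0.0:
                    continue
                candidate = (tokens + (token,), score + math.log(prob))
                if token == end_token:
                    finished.append(candidate)
                else:
                    candidates.append(candidate)
        # Prune: only the best partial sequences survive to the next step.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
        if not beams:
            break

    finished.sort(key=lambda c: c[1], reverse=True)
    return finished[:beam_width]


# Invented next-token distributions for the context "Where are you?".
TOY_MODEL = {
    (): {"I'm": 0.7, "At": 0.3},
    ("I'm",): {"at": 0.7, "home": 0.3},
    ("I'm", "at"): {"work": 0.8, "home": 0.2},
    ("At",): {"work": 1.0},
}


def toy_next_token_probs(tokens):
    # Any prefix not listed above is treated as complete.
    return TOY_MODEL.get(tokens, {"<end>": 1.0})


top_responses = beam_search(toy_next_token_probs, beam_width=2)
```

With a beam width of 2, the branch starting "I'm home" is pruned at the second step, which shows both the efficiency gain and the fact that beam search is approximate.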
As with any large-scale product, there were several engineering challenges we had to solve in generating a set of high-quality responses efficiently. For example, in spite of the two-stage architecture, our first few networks were very slow and required about half a second to generate a response. This was obviously a deal breaker when we are talking about real-time communication apps! So we had to evolve our neural network architecture further to reduce the latency to less than 200ms. We moved from using a softmax layer to a hierarchical softmax layer, which traverses a tree of words instead of traversing a list of words, thus making it more efficient.
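The efficiency gain from a hierarchical softmax comes from scoring one root-to-leaf path instead of every word in the vocabulary. The sketch below shows the standard binary-tree formulation with a made-up four-word vocabulary; the tree layout, vectors, and function names are assumptions and are not taken from Allo.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def hierarchical_softmax_prob(hidden, path, node_vectors):
    """Probability of one word as a product of binary decisions along its path.

    hidden       -- hidden state vector produced by the network
    path         -- list of (internal_node_id, go_right) pairs from root to leaf
    node_vectors -- dict mapping each internal node id to its weight vector
    """
    prob = 1.0
    for node_id, go_right in path:
        p_right = sigmoid(dot(hidden, node_vectors[node_id]))
        prob *= p_right if go_right else (1.0 - p_right)
    return prob


# A balanced tree over four words needs 2 decisions per word instead of 4 scores.
toy_paths = {
    "yes": [(0, False), (1, False)],
    "no": [(0, False), (1, True)],
    "ok": [(0, True), (2, False)],
    "maybe": [(0, True), (2, True)],
}
toy_node_vectors = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
toy_hidden = [0.5, -1.0]
word_probs = {
    word: hierarchical_softmax_prob(toy_hidden, path, toy_node_vectors)
    for word, path in toy_paths.items()
}
```

For a vocabulary of V words in a balanced tree, each word needs about log2(V) decisions, and the probabilities over all leaves still sum to one.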

Another interesting challenge we had to solve when generating predictions is controlling for message length. Sometimes none of the most probable responses are appropriate - if the model predicts too short a message, it might not be useful to the user, and if we predict something too long, it might not fit on the phone screen. We solved this by biasing the beam search to follow paths that lead to higher utility responses instead of favoring just the responses that are most probable. That way, we can efficiently generate appropriate length response predictions that are useful to our users.
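One simple way to picture this kind of bias is a score that combines the model's log-probability with a penalty for responses outside a preferred length range. The post does not define its utility function, so the length bounds, the penalty weight, and the function names below are all hypothetical.

```python
def utility_score(log_prob, num_tokens, min_len=2, max_len=6, penalty=2.0):
    """Model log-probability minus a penalty for too-short or too-long responses."""
    if num_tokens < min_len:
        violation = min_len - num_tokens
    elif num_tokens > max_len:
        violation = num_tokens - max_len
    else:
        violation = 0
    return log_prob - penalty * violation


def rank_by_utility(candidates, **kwargs):
    """Sort (tokens, log_prob) candidates by utility, best first."""
    return sorted(
        candidates,
        key=lambda c: utility_score(c[1], len(c[0]), **kwargs),
        reverse=True,
    )


toy_candidates = [
    (("ok",), -0.5),                  # most probable, but very short
    (("I'm", "at", "work"), -1.0),    # less probable, more informative
]
ranked = rank_by_utility(toy_candidates)
```

The same score could serve as the pruning key inside a beam search, so that paths leading to useful lengths are kept alive rather than only the most probable ones.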

Personalized for you

The best part about these suggestions is that over time they are personalized to you so that your individual style is reflected in your conversations. For example, if you often reply to “How are you?” with “Fine.” instead of “I am good.”, it will learn your preference and your future suggestions will take that into account. This was accomplished by incorporating a user's "style" as one of the features in a Neural Network that is used to predict the next word in a response, resulting in suggestions that are customized for your personality and individual preferences. The user's style is captured in a sequence of numbers that we call the user embedding. These embeddings can be generated as part of the regular model training, but this approach requires waiting many days for training to complete and it cannot handle more than a few million users. To solve this issue, Alon Shafrir implemented an L-BFGS-based technique to generate user embeddings quickly and at scale. Now, you'll be able to enjoy personalized suggestions after only a short time of using Allo.

More than just English

The neural network model described above is language-agnostic, so building separate prediction models for each language works quite well. To make sure that responses for each language benefit from our semantic understanding of other languages, Sujith Ravi came up with a graph-based machine learning technique that can connect possible responses across languages. Dana Movshovitz-Attias and Peter Young applied this technique to build a graph that connects responses to incoming messages and to other responses that have similar word embeddings and syntactic relationships. It also connects responses with similar meaning across languages based on the machine translation models developed by our Translate team.

With this graph, we use semi-supervised learning, as described in this paper, to learn the semantic meaning of responses and determine which are the most useful clusters of possible responses. As a result, we can allow the LSTM to score many possible variants of each possible response meaning, allowing the personalization routines to select the best response for the user in the context of the conversation. This also helps enforce diversity as we can now pick the final set of responses from different semantic clusters.
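The diversity step can be illustrated with a short sketch that keeps the best-scoring response from each semantic cluster and then returns the top few clusters. The cluster ids, scores, and function name are invented for the example; how Allo actually scores and clusters responses is not specified beyond the description above.

```python
def pick_diverse_responses(scored_responses, cluster_of, num_suggestions=3):
    """Pick the best response from each of the highest-scoring semantic clusters.

    scored_responses -- list of (response_text, score) pairs
    cluster_of       -- dict mapping response_text to a semantic cluster id
    """
    best_per_cluster = {}
    for text, score in scored_responses:
        cluster = cluster_of[text]
        if cluster not in best_per_cluster or score > best_per_cluster[cluster][1]:
            best_per_cluster[cluster] = (text, score)
    ranked = sorted(best_per_cluster.values(), key=lambda r: r[1], reverse=True)
    return [text for text, _ in ranked[:num_suggestions]]


toy_scores = [
    ("Hi!", 0.9),
    ("Hello!", 0.85),
    ("Good morning", 0.6),
    ("How are you?", 0.5),
]
toy_clusters = {
    "Hi!": "greeting",
    "Hello!": "greeting",
    "Good morning": "time_greeting",
    "How are you?": "question",
}
suggestions = pick_diverse_responses(toy_scores, toy_clusters)
```

Without the clustering step, the top suggestions would include both "Hi!" and "Hello!", which carry the same meaning.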

Here’s an example of how the graph might look for a set of messages related to greetings:

Beyond Smart Reply

I am also very excited about the Google assistant in Allo with which you can converse and get information about anything that Google Search knows about. It understands your sentences and helps you accomplish tasks directly from the conversation. For example, the Google assistant can help you discover a restaurant and reserve a table from within the Allo app when chatting with your friends. This has been made possible because of the cutting-edge research in natural language understanding that we have been doing at Google. More details to follow soon!

These smart features will be part of the Android and iOS apps for Allo that will be available later this summer. We can’t wait for you to try and enjoy it!

We wish to acknowledge the hard work of the following in building Smart Reply:

Ryan Cassidy, Dave Citron, Ori Gershony, Max Gubin, Pranav Khaitan, Harini Krishnamurthy, Patrick McGregor, Dana Movshovitz-Attias, Sergey Nazarov, Hung Pham, Sushant Prakash, Vivek Ramavajjala, Sujith Ravi, Sunita Sarawagi, Alon Shafrir, Pavel Sountsov, Peter Young, Shu Zhang

Research at Google and ICLR 2016



This week, San Juan, Puerto Rico hosts the 4th International Conference on Learning Representations (ICLR 2016), a conference focused on how one can learn meaningful and useful representations of data for Machine Learning. ICLR includes conference and workshop tracks, with invited talks along with oral and poster presentations of some of the latest research on deep learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.

At the forefront of innovation in cutting-edge technology in Neural Networks and Deep Learning, Google focuses on both theory and application, developing learning approaches to understand and generalize. As Platinum Sponsor of ICLR 2016, Google will have a strong presence with over 40 researchers attending (many from the Google Brain team and Google DeepMind), contributing to and learning from the broader academic research community by presenting papers and posters, in addition to participating on organizing committees and in workshops.

If you are attending ICLR 2016, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about our research being presented at ICLR 2016 in the list below.

Organizing Committee

Program Chairs
Samy Bengio, Brian Kingsbury

Area Chairs include:
John Platt, Tara Sainath

Oral Sessions

Neural Programmer-Interpreters (Best Paper Award Recipient)
Scott Reed, Nando de Freitas

Net2Net: Accelerating Learning via Knowledge Transfer
Tianqi Chen, Ian Goodfellow, Jon Shlens

Conference Track Posters

Prioritized Experience Replay
Tom Schaul, John Quan, Ioannis Antonoglou, David Silver

Reasoning about Entailment with Neural Attention
Tim Rocktäschel, Edward Grefenstette, Karl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

Neural Programmer: Inducing Latent Programs With Gradient Descent
Arvind Neelakantan, Quoc Le, Ilya Sutskever

MuProp: Unbiased Backpropagation For Stochastic Neural Networks
Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih

Multi-Task Sequence to Sequence Learning
Minh-Thang Luong, Quoc Le, Ilya Sutskever, Oriol Vinyals, Lukasz Kaiser

A Test of Relative Similarity for Model Selection in Generative Models
Eugene Belilovsky, Wacha Bounliphone, Matthew Blaschko, Ioannis Antonoglou, Arthur Gretton

Continuous control with deep reinforcement learning
Timothy Lillicrap, Jonathan Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

Policy Distillation
Andrei Rusu, Sergio Gomez, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

Neural Random-Access Machines
Karol Kurach, Marcin Andrychowicz, Ilya Sutskever

Variable Rate Image Compression with Recurrent Neural Networks
George Toderici, Sean O'Malley, Damien Vincent, Sung Jin Hwang, Michele Covell, Shumeet Baluja, Rahul Sukthankar, David Minnen

Order Matters: Sequence to Sequence for Sets
Oriol Vinyals, Samy Bengio, Manjunath Kudlur

Grid Long Short-Term Memory
Nal Kalchbrenner, Alex Graves, Ivo Danihelka

Neural GPUs Learn Algorithms
Lukasz Kaiser, Ilya Sutskever

ACDC: A Structured Efficient Linear Layer
Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas

Workshop Track Posters

Revisiting Distributed Synchronous SGD
Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz

Black Box Variational Inference for State Space Models
Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, Liam Paninski

A Minimalistic Approach to Sum-Product Network Learning for Real Applications
Viktoriya Krakovna, Moshe Looks

Efficient Inference in Occlusion-Aware Generative Models of Images
Jonathan Huang, Kevin Murphy

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke

Deep Autoresolution Networks
Gabriel Pereyra, Christian Szegedy

Learning visual groups from co-occurrences in space and time
Phillip Isola, Daniel Zoran, Dilip Krishnan, Edward H. Adelson

Adding Gradient Noise Improves Learning For Very Deep Networks
Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

Adversarial Autoencoders
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow

Generating Sentences from a Continuous Space
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

DeepMind moves to TensorFlow



At DeepMind, we conduct state-of-the-art research on a wide range of algorithms, from deep learning and reinforcement learning to systems neuroscience, towards the goal of building Artificial General Intelligence. A key factor in facilitating rapid progress is the software environment used for research. For nearly four years, the open source Torch7 machine learning library has served as our primary research platform, combining excellent flexibility with very fast runtime execution, enabling rapid prototyping. Our team has been proud to contribute to the open source project in capacities ranging from occasional bug fixes to being core maintainers of several crucial components.

With Google’s recent open source release of TensorFlow, we initiated a project to test its suitability for our research environment. Over the last six months, we have re-implemented more than a dozen different projects in TensorFlow to develop a deeper understanding of its potential use cases and the tradeoffs for research. Today we are excited to announce that DeepMind will start using TensorFlow for all our future research. We believe that TensorFlow will enable us to execute our ambitious research goals at much larger scale and an even faster pace, providing us with a unique opportunity to further accelerate our research programme.

As one of the core contributors of Torch7, I have had the pleasure of working closely with an excellent community of developers and researchers, and it has been amazing to see all the great work that has been built on top of the platform and the impact this has had on the field. Torch7 is currently being used by Facebook, Twitter, and many start-ups and academic labs as well as DeepMind, and I’m proud of the significant contribution it has made to a large community in both research and industry. Our transition to TensorFlow represents a new chapter, and I feel very excited about the prospect of DeepMind contributing heavily to another great open source machine learning platform that everyone can use to advance the state-of-the-art.