Author Archives: Research Blog

Announcing Google Research, Europe



Google’s ongoing research in Machine Intelligence is what powers many of the products being used by hundreds of millions of people a day - from Translate to Photo Search to Smart Reply for Inbox. One of the things that enables these advances is the extensive collaboration between the Google researchers in our offices across the world, all contributing their unique knowledge and disseminating ideas in state-of-the-art Machine Learning (ML) technologies and techniques in order to develop useful tools and products.

Today, we’re excited to announce a dedicated Machine Learning research group in Europe, based in our Zurich office. Google Research, Europe, will foster an environment where software engineers and researchers specialising in ML will have the opportunity to develop products and conduct research right here in Europe, as part of the wider efforts at Google.
Zurich is already the home of Google’s largest engineering office outside the US, and is responsible for developing the engine that powers Knowledge Graph, as well as the conversation engine that powers the Google Assistant in Allo. In addition to continued collaboration with Google’s various research teams, Google Research, Europe will be focused on three key areas:
In pursuit of these areas, the team will actively research ways in which to improve ML infrastructure, broadly facilitating research for the community, and enabling it to be put to practical use. Furthermore, researchers in the Zurich office will be uniquely able to work closely with team linguists, advancing Natural Language Understanding in collaboration with Google Research groups across the world, all while enjoying Mountain Views of a different kind.
Europe is home to some of the world’s premier technical universities, making it an ideal place to build a top-notch research team. We look forward to collaborating with all the excellent Computer Science research that is coming from the region, and hope to contribute towards the wider academic community through our publications and academic support.

Quantum annealing with a digital twist



One of the key benefits of quantum computing is that it has the potential to solve some of the most complex problems in nature, from physics to chemistry to biology. For example, when attempting to calculate protein folding, or when exploring reaction catalysts and “designer” molecules, one can look at computational challenges as optimization problems, and represent the different configurations of a molecule as an energy landscape in a quantum computer. By letting the system cool, or “anneal”, one finds the lowest energy state in the landscape - the most stable form of the molecule. Thanks to the peculiarities of quantum mechanics, the correct answer simply drops out at the end of the quantum computation. In fact, many tough problems can be dealt with this way, this combination of simplicity and generality makes it appealing.

But finding the lowest energy state in a system is like being put in the Alps, and being told to find the lowest elevation - it’s easy to get stuck in a “local” valley, and not know that there is an even lower point elsewhere. Therefore, we use a different approach: We start with a very simple energy landscape - a flat meadow - and initialize the system of quantum bits (qubits) to represent the known lowest energy point, or “ground state”, in that landscape. We then begin to adjust the simple landscape towards one that represents the problem we are trying to solve - from the smooth meadow to the highly uneven terrain of the Alps. Here’s the fun part: if one evolves the landscape very slowly, the ground state of the qubits also evolves, so that they stay in the ground state of the changing system. This is called “adiabatic quantum computing”, and qubits exploit quantum tunneling to ensure they always find the lowest energy "valley" in the changing system.

While this is great in theory, getting this to work in practice is challenging, as you have to set up the energy landscape using the available qubit interactions. Ideally you’d have multiple interactions going on between all of the qubits, but for a large-scale solver the requirements to accurately keep track of these interactions become enormous. Realistically, the connectivity has to be reduced, but this presents a major limitation for the computational possibilities.

In "Digitized adiabatic quantum computing with a superconducting circuit", published in Nature, we’ve overcome this obstacle by giving quantum annealing a digital twist. With a limited connectivity between qubits you can still construct any of the desired interactions: Whether the interaction is ferromagnetic (the quantum bits prefer an aligned) or antiferromagnetic (anti-aligned orientation), or even defined along an arbitrary different direction, you can make it happen using easy to combine discrete building blocks. In this case, the blocks we use are the logic gates that we've been developing with our superconducting architecture.
Superconducting quantum chip with nine qubits. Each qubit (cross-shaped structures in the center) is connected to its neighbors and individually controlled. Photo credit: Julian Kelly.
The key is controllability. Qubits, like other physical objects in nature, have a resonance frequency, and can be addressed individually with short voltage and current pulses. In our architecture we can steer this frequency, much like you would tune a radio to a broadcast. We can even tune one qubit to the frequency of another one. By moving qubit frequencies to or away from each other, interactions can be turned on or off. The exchange of quantum information resembles a relay race, where the baton can be handed down when the runners meet.

You can see the algorithm in action below. Any problem is encoded as local “directions” we want qubits to point to - like a weathervane pointing into the wind - and interactions, depicted here as links between the balls. We start by aligning all qubits into the same direction, and the interactions between the qubits turned off - this is the simplest ground state of the system. Next, we turn on interactions and change qubit directions to start evolving towards the energy landscape we wish to solve. The algorithmic steps are implemented with many control pulses, illustrating how the problem gets solved in a giant dance of quantum entanglement.
Top: Depiction of the problem, with the gold arrows in the blue balls representing the directions we’d like each qubit to align to, like a weathervane pointing to the wind. The thickness of the link between the balls indicates the strength of the interaction - red denotes a ferromagnetic link, and blue an antiferromagnetic link. Middle: Implementation with qubits (yellow crosses) with control pulses (red) and steering the frequency (vertical direction). Qubits turn blue when there is interaction. The qubits turn green when they are being measured. Bottom: Zoom in of the physical device, showing the corresponding nine qubits (cross-shaped).
To run the adiabatic quantum computation efficiently and design a set of test experiments we teamed up with the QUTIS group at the University of the Basque Country in Bilbao, Spain, led by Prof. E. Solano and Dr. L. Lamata, who are experts in synthesizing digital algorithms. It’s the largest digital algorithm to date, with up to nine qubits and using over one thousand logic gates.

The crucial advantage for the future is that this digital implementation is fully compatible with known quantum error correction techniques, and can therefore be protected from the effects of noise. Otherwise, the noise will set a hard limit, as even the slightest amount can derail the state from following the fragile path to the solution. Since each quantum bit and interaction element can add noise to the system, some of the most important problems are well beyond reach, as they have many degrees of freedom and need a high connectivity. But with error correction, this approach becomes a general-purpose algorithm which can be scaled to an arbitrarily large quantum computer.

Motion Stills – Create beautiful GIFs from Live Photos



Today we are releasing Motion Stills, an iOS app from Google Research that acts as a virtual camera operator for your Apple Live Photos. We use our video stabilization technology to freeze the background into a still photo or create sweeping cinematic pans. The resulting looping GIFs and movies come alive, and can easily be shared via messaging or on social media.
With Motion Stills, we provide an immersive stream experience that makes your clips fun to watch and share. You can also tell stories of your adventures by combining multiple clips into a movie montage. All of this works right on your phone, no Internet connection needed.
A Live Photo before and after stabilization with Motion Stills
How does it work?
We pioneered this technology by stabilizing hundreds of millions of videos and creating GIF animations from photo bursts. Our algorithm uses linear programming to compute a virtual camera path that is optimized to recast videos and bursts as if they were filmed using stabilization equipment, yielding a still background or creating cinematic pans to remove shakiness.

Our challenge was to take technology designed to run distributed in a data center and shrink it down to run even faster on your mobile phone. We achieved a 40x speedup by using techniques such as temporal subsampling, decoupling of motion parameters, and using Google Research’s custom linear solver, GLOP. We obtain further speedup and conserve storage by computing low-resolution warp textures to perform real-time GPU rendering, just like in a videogame.
Making it loop
Short videos are perfect for creating loops, so we added loop optimization to bring out the best in your captures. Our approach identifies optimal start and end points, and also discards blurry frames. As an added benefit, this fixes “pocket shots” (footage of the phone being put back into the pocket).

To keep the background steady while looping, Motion Stills has to separate the background from the rest of the scene. This is a difficult task when foreground elements occlude significant portions of the video, as in the example below. Our novel method classifies motion vectors into foreground (red) and background (green) in a temporally consistent manner. We use a cascade of motion models, moving our motion estimation from simple to more complex models and biasing our results along the way.
Left: Original with virtual camera path (red rectangle) and motion classification; foreground(red) vs. background(green) Right: Motion Stills result
Try it out
We’re excited to see what you can create with this app. From fun family moments to exciting adventures with friends, try it out and let us know what you think. Motion Stills is an on-device experience with no sign-in: even if you’re on top of a glacier without signal, you can see your results immediately. You can show us your favorite clips by using #motionstill on social media.

This app is a way for us to experiment and iterate quickly on the technology needed for short video creation. Based on the feedback we receive, we hope to integrate this feature into existing products like Google Photos.

Motion Stills is available on the App Store.

"Aw, so cute!": Allo helps you respond to shared photos



Today, Google announced Allo — our new mobile messaging app. From day one of the Allo development effort, we set out to build a truly special product that is powered by Google’s strengths in machine intelligence to make messaging easier, more efficient, and more expressive. Photo Reply is a unique feature of Allo that just does that! We use machine learning to understand what a shared photo depicts and to suggest rich natural language replies that the user can tap to send. This makes it easier for users to sustain meaningful conversations while using small mobile keyboards.

Here is an example of the responses that Allo suggests when a friend shares a photo of his child.
Photo Reply — Under the Hood

During the winter, our product managers, Patrick McGregor and Ryan Cassidy, challenged us to develop new approaches to simplify media sharing in messaging while simultaneously delighting users with Google insights. With my colleagues Vivek Ramavajjala, Sergey Nazarov, and Sujith Ravi, we set out to build Photo Reply.

We utilize Google's image recognition technology, developed by our Machine Perception team, to associate images with semantic entities — people, animals, cars, etc. We then apply a machine learned model that maps those recognized entities to actual natural language responses. Our system produces replies for thousands of entity types that are drawn from a taxonomy that is a subset of Google's Knowledge Graph and may be at different granularity levels. For example, when you receive a photo of a dog, the system may detect that the dog is actually a labrador and suggest "Love that lab!". Or given a photo of a pasta dish, it may detect the type of pasta ("Yum linguine!") and even the cuisine ("I love Italian food!").
Examples of response suggestions reflecting fine-grained object classes
One aspect of the system that we find very useful is that it can suggest responses not just for physical objects but also for abstract concepts. It can produce suggestions for events (birthday parties, weddings, etc.), nature (sunrises, mountains, etc.), recreational activities (hiking, camping, etc.), and many more categories. Also, the system can generate responses that reflect the emotions that might be associated with an image, such as “happiness”. Here are some examples of responses for abstract concepts:
Response suggestions reflecting abstract concepts
Learning entity-response associations

At runtime, Photo Reply recognizes entities in the shared photo and triggers responses for the entities. The model that maps entities to natural language responses is learned offline using Expander, which is a large-scale graph-based semi-supervised learning platform at Google. We built a massive a graph where nodes correspond to photos, semantic entities, and textual responses. Edges in the graph indicate when an entity was recognized for a photo, when a specific response was given for a photo, and visual similarities between photos. Some of the nodes are "labeled" and we learn associations for the unlabeled nodes by propagating label information across the graph.

To illustrate this, consider the graph below. There are two labels: the red label corresponds to the response "yummy" and the blue label corresponds to "delicious". The nodes for "spaghetti" and "linguine" are unlabeled, but from the fact that they are close to the red and blue nodes, the algorithm can learn that they should be associated to the "yummy" and "delicious" responses. Notice that in this way, we are associating the entity "linguine" to the response "yummy" even though none of the linguine photos in the graph are directly connected to this answer. Expander can perform this kind of learning at very large scale, for graphs containing billions of nodes and hundred of billions of edges.
Graph of entities, photos, and responses
Photo Reply is an exciting example of multimodal learning, where computer vision and natural language processing come together in order to create a compelling user experience. Allo will be available on Android and iOS later this summer. Be sure to check out what Allo sees in your beautiful photos!

Chat Smarter with Allo



At Google, we are continuously building products powered by Machine Learning to delight our users and simplify their lives. Today, we are excited to talk about the technology behind Allo, a new smart messaging app that uses the power of neural networks and Google Search to make your text conversations easier and more productive.

Just like Smart Reply for Inbox, Allo understands the conversation history to generate a set of suggestions that the user will likely want to respond with. In addition to understanding the context of your conversation, Allo learns your individual style, so the responses are personalized for you.
How does it work?

About a year ago, we started exploring how we can make communication easier and more fun to use. The idea of Smart Reply for Allo came from my teammate Sushant Prakash who, along with Ori Gershony, led their teams to build this technology. We began by experimenting with neural network based model architectures which had proven to be successful for sequence prediction, including the encoder-decoder model used in Smart Reply for Inbox.

One challenge we faced was that response generation in online conversations have very strict latency requirements. To address this, Pavel Sountsov and Sushant came up with an innovative two-stage model that works as follows. First, a recurrent neural network looks at the conversation context one word at a time and encodes it in the hidden state of a long short term memory (LSTM). Below, we show an example with a context ‘Where are you?’. The context has three tokens, each of which is embedded into a continuous space and input to the LSTM. The LSTM state now encodes the context as a continuous vector. This vector is used to generate the response as a discretized semantic class.
Each semantic class is associated with a set of possible messages that belong to it. We use a second recurrent network to generate a specific message from that set. This network also converts the context into a hidden LSTM state but this time the hidden state is used to generate the full message of the reply one token at a time. For example, now the LSTM after seeing the context “Where are you?” generates the tokens in the response: “I’m at work”.
A beam search is used to efficiently select the top-N highest scoring responses from among the very large set of possible messages that a LSTM can generate. A snippet of the search space explored by such a beam-search technique is shown below.
As with any large-scale product, there were several engineering challenges we had to solve in generating a set of high-quality responses efficiently. For example, in spite of the two staged architecture, our first few networks were very slow and required about half a second to generate a response. This was obviously a deal breaker when we are talking about real time communication apps! So we had to evolve our neural network architecture further to reduce the latency to less than 200ms. We moved from using a softmax layer to a hierarchical softmax layer which traverses a tree of words instead of traversing a list of words thus making it more efficient.

Another interesting challenge we had to solve when generating predictions is controlling for message length. Sometimes none of the most probable responses are appropriate - if the model predicts too short a message, it might not be useful to the user, and if we predict something too long, it might not fit on the phone screen. We solved this by biasing the beam search to follow paths that lead to higher utility responses instead of favoring just the responses that are most probable. That way, we can efficiently generate appropriate length response predictions that are useful to our users.

Personalized for you

The best part about these suggestions is that over time they are personalized to you so that your individual style is reflected in your conversations. For example, if you often reply to “How are you?” with “Fine.” instead of “I am good.”, it will learn your preference and your future suggestions will take that into account. This was accomplished by incorporating a user's "style" as one of the features in a Neural Network that is used to predict the next word in a response, resulting in suggestions that are customized for your personality and individual preferences. The user's style is captured in a sequence of numbers that we call the user embedding. These embeddings can be generated as part of the regular model training, but this approach requires waiting for many days for training to be complete and it cannot handle more than a handful of millions of users. To solve this issue, Alon Shafrir implemented a L-BFGS based technique to generate user embeddings quickly and at scale. Now, you'll be able to enjoy personalized suggestions after only a short time of using Allo.

More than just English

The neural network model described above is language agnostic so building separate prediction models for each language works quite well. To make sure that responses for each language benefit from our semantic understanding of other languages, Sujith Ravi came up with a graph-based machine learning technique that can connect possible responses across languages. Dana Movshovitz-Attias and Peter Young applied this technique to build a graph that connects responses to incoming messages and to other responses that have similar word embeddings and syntactic relationships. It also connects responses with similar meaning across languages based on the machine translation models developed by our Translate team.

With this graph, we use semi-supervised learning, as described in this paper, to learn the semantic meaning of responses and determine which are the most useful clusters of possible responses. As a result, we can allow the LSTM to score many possible variants of each possible response meaning, allowing the personalization routines to select the best response for the user in the context of the conversation. This also helps enforce diversity as we can now pick the final set of responses from different semantic clusters.

Here’s an example of how the graph might look for a set of messages related to greetings:
Beyond Smart Reply

I am also very excited about the Google assistant in Allo with which you can converse and get information about anything that Google Search knows about. It understands your sentences and helps you accomplish tasks directly from the conversation. For example, the Google assistant can help you discover a restaurant and reserve a table from within the Allo app when chatting with your friends. This has been made possible because of the cutting-edge research in natural language understanding that we have been doing at Google. More details to follow soon!

These smart features will be part of the Android and iOS apps for Allo that will be available later this summer. We can’t wait for you to try and enjoy it!

We wish to acknowledge the hard work of the following in building Smart Reply:

Ryan Cassidy, Dave Citron, Ori Gershony, Max Gubin, Pranav Khaitan, Harini Krishnamurthy, Patrick McGregor, Dana Movshovitz-Attias, Sergey Nazarov, Hung Pham, Sushant Prakash, Vivek Ramavajjala, Sujith Ravi, Sunita Sarawagi, Alon Shafrir, Pavel Sountsov, Peter Young, Shu Zhang

Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source



At Google, we spend a lot of time thinking about how computer systems can read and understand human language in order to process it in intelligent ways. Today, we are excited to share the fruits of our research with the broader community by releasing SyntaxNet, an open-source neural network framework implemented in TensorFlow that provides a foundation for Natural Language Understanding (NLU) systems. Our release includes all the code needed to train new SyntaxNet models on your own data, as well as Parsey McParseface, an English parser that we have trained for you and that you can use to analyze English text.

Parsey McParseface is built on powerful machine learning algorithms that learn to analyze the linguistic structure of language, and that can explain the functional role of each word in a given sentence. Because Parsey McParseface is the most accurate such model in the world, we hope that it will be useful to developers and researchers interested in automatic extraction of information, translation, and other core applications of NLU.

How does SyntaxNet work?

SyntaxNet is a framework for what’s known in academic circles as a syntactic parser, which is a key first component in many NLU systems. Given a sentence as input, it tags each word with a part-of-speech (POS) tag that describes the word's syntactic function, and it determines the syntactic relationships between words in the sentence, represented in the dependency parse tree. These syntactic relationships are directly related to the underlying meaning of the sentence in question. To take a very simple example, consider the following dependency tree for Alice saw Bob:


This structure encodes that Alice and Bob are nouns and saw is a verb. The main verb saw is the root of the sentence and Alice is the subject (nsubj) of saw, while Bob is its direct object (dobj). As expected, Parsey McParseface analyzes this sentence correctly, but also understands the following more complex example:


This structure again encodes the fact that Alice and Bob are the subject and object respectively of saw, in addition that Alice is modified by a relative clause with the verb reading, that saw is modified by the temporal modifier yesterday, and so on. The grammatical relationships encoded in dependency structures allow us to easily recover the answers to various questions, for example whom did Alice see?, who saw Bob?, what had Alice been reading about? or when did Alice see Bob?.

Why is Parsing So Hard For Computers to Get Right?

One of the main problems that makes parsing so challenging is that human languages show remarkable levels of ambiguity. It is not uncommon for moderate length sentences - say 20 or 30 words in length - to have hundreds, thousands, or even tens of thousands of possible syntactic structures. A natural language parser must somehow search through all of these alternatives, and find the most plausible structure given the context. As a very simple example, the sentence Alice drove down the street in her car has at least two possible dependency parses:


The first corresponds to the (correct) interpretation where Alice is driving in her car; the second corresponds to the (absurd, but possible) interpretation where the street is located in her car. The ambiguity arises because the preposition in can either modify drove or street; this example is an instance of what is called prepositional phrase attachment ambiguity.

Humans do a remarkable job of dealing with ambiguity, almost to the point where the problem is unnoticeable; the challenge is for computers to do the same. Multiple ambiguities such as these in longer sentences conspire to give a combinatorial explosion in the number of possible structures for a sentence. Usually the vast majority of these structures are wildly implausible, but are nevertheless possible and must be somehow discarded by a parser.

SyntaxNet applies neural networks to the ambiguity problem. An input sentence is processed from left to right, with dependencies between words being incrementally added as each word in the sentence is considered. At each point in processing many decisions may be possible—due to ambiguity—and a neural network gives scores for competing decisions based on their plausibility. For this reason, it is very important to use beam search in the model. Instead of simply taking the first-best decision at each point, multiple partial hypotheses are kept at each step, with hypotheses only being discarded when there are several other higher-ranked hypotheses under consideration. An example of a left-to-right sequence of decisions that produces a simple parse is shown below for the sentence I booked a ticket to Google.
Furthermore, as described in our paper, it is critical to tightly integrate learning and search in order to achieve the highest prediction accuracy. Parsey McParseface and other SyntaxNet models are some of the most complex networks that we have trained with the TensorFlow framework at Google. Given some data from the Google supported Universal Treebanks project, you can train a parsing model on your own machine.

So How Accurate is Parsey McParseface?

On a standard benchmark consisting of randomly drawn English newswire sentences (the 20 year old Penn Treebank), Parsey McParseface recovers individual dependencies between words with over 94% accuracy, beating our own previous state-of-the-art results, which were already better than any previous approach. While there are no explicit studies in the literature about human performance, we know from our in-house annotation projects that linguists trained for this task agree in 96-97% of the cases. This suggests that we are approaching human performance—but only on well-formed text. Sentences drawn from the web are a lot harder to analyze, as we learned from the Google WebTreebank (released in 2011). Parsey McParseface achieves just over 90% of parse accuracy on this dataset.

While the accuracy is not perfect, it’s certainly high enough to be useful in many applications. The major source of errors at this point are examples such as the prepositional phrase attachment ambiguity described above, which require real world knowledge (e.g. that a street is not likely to be located in a car) and deep contextual reasoning. Machine learning (and in particular, neural networks) have made significant progress in resolving these ambiguities. But our work is still cut out for us: we would like to develop methods that can learn world knowledge and enable equal understanding of natural language across all languages and contexts.

To get started, see the SyntaxNet code and download the Parsey McParseface parser model. Happy parsing from the main developers, Chris Alberti, David Weiss, Daniel Andor, Michael Collins & Slav Petrov.

Research at Google and ICLR 2016



This week, San Juan, Puerto Rico hosts the 4th International Conference on Learning Representations (ICLR 2016), a conference focused on how one can learn meaningful and useful representations of data for Machine Learning. ICLR includes conference and workshop tracks, with invited talks along with oral and poster presentations of some of the latest research on deep learning, metric learning, kernel learning, compositional models, non-linear structured prediction, and issues regarding non-convex optimization.

At the forefront of innovation in cutting-edge technology in Neural Networks and Deep Learning, Google focuses on both theory and application, developing learning approaches to understand and generalize. As Platinum Sponsor of ICLR 2016, Google will have a strong presence with over 40 researchers attending (many from the Google Brain team and Google DeepMind), contributing to and learning from the broader academic research community by presenting papers and posters, in addition to participating on organizing committees and in workshops.

If you are attending ICLR 2016, we hope you’ll stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for billions of people. You can also learn more about our research being presented at ICLR 2016 in the list below (Googlers highlighted in blue).

Organizing Committee

Program Chairs
Samy Bengio, Brian Kingsbury

Area Chairs include:
John Platt, Tara Sanaith

Oral Sessions

Neural Programmer-Interpreters (Best Paper Award Recipient)
Scott Reed, Nando de Freitas

Net2Net: Accelerating Learning via Knowledge Transfer
Tianqi Chen, Ian Goodfellow, Jon Shlens

Conference Track Posters

Prioritized Experience Replay
Tom Schau, John Quan, Ioannis Antonoglou, David Silver

Reasoning about Entailment with Neural Attention
Tim Rocktäschel, Edward GrefenstetteKarl Moritz Hermann, Tomáš Kočiský, Phil Blunsom

Neural Programmer: Inducing Latent Programs With Gradient Descent
Arvind Neelakantan, Quoc Le, Ilya Sutskever

MuProp: Unbiased Backpropagation For Stochastic Neural Networks
Shixiang Gu, Sergey Levine, Ilya Sutskever, Andriy Mnih

Multi-Task Sequence to Sequence Learning
Minh-Thang Luong, Quoc LeIlya Sutskever, Oriol Vinyals, Lukasz Kaiser

A Test of Relative Similarity for Model Selection in Generative Models
Eugene Belilovsky, Wacha Bounliphone, Matthew Blaschko, Ioannis Antonoglou, Arthur Gretton

Continuous control with deep reinforcement learning
Timothy Lillicrap, Jonathan HuntAlexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, Daan Wierstra

Policy Distillation
Andrei Rusu, Sergio Gomez, Caglar Gulcehre, Guillaume Desjardins, James Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, Raia Hadsell

Neural Random-Access Machines
Karol Kurach, Marcin Andrychowicz, Ilya Sutskever

Variable Rate Image Compression with Recurrent Neural Networks
George Toderici, Sean O'Malley, Damien Vincent, Sung Jin Hwang, Michele Covell, Shumeet Baluja, Rahul Sukthankar, David Minnen

Order Matters: Sequence to Sequence for Sets
Oriol Vinyals, Samy Bengio, Manjunath Kudlur

Grid Long Short-Term Memory
Nal Kalchbrenner, Alex Graves, Ivo Danihelka

Neural GPUs Learn Algorithms
Lukasz Kaiser, Ilya Sutskever

ACDC: A Structured Efficient Linear Layer
Marcin Moczulski, Misha Denil, Jeremy Appleyard, Nando de Freitas

Workshop Track Posters

Revisiting Distributed Synchronous SGD
Jianmin Chen, Rajat Monga, Samy Bengio, Rafal Jozefowicz

Black Box Variational Inference for State Space Models
Evan Archer, Il Memming Park, Lars Buesing, John Cunningham, Liam Paninski

A Minimalistic Approach to Sum-Product Network Learning for Real Applications
Viktoriya Krakovna, Moshe Looks

Efficient Inference in Occlusion-Aware Generative Models of Images
Jonathan Huang, Kevin Murphy

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke

Deep Autoresolution Networks
Gabriel Pereyra, Christian Szegedy

Learning visual groups from co-occurrences in space and time
Phillip Isola, Daniel Zoran, Dilip Krishnan, Edward H. Adelson

Adding Gradient Noise Improves Learning For Very Deep Networks
Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens

Adversarial Autoencoders
Alireza Makhzani, Jonathon Shlens, Navdeep Jaitly, Ian Goodfellow

Generating Sentences from a Continuous Space
Samuel R. Bowman, Luke Vilnis, Oriol Vinyals, Andrew M. Dai, Rafal Jozefowicz, Samy Bengio

DeepMind moves to TensorFlow



At DeepMind, we conduct state-of-the-art research on a wide range of algorithms, from deep learning and reinforcement learning to systems neuroscience, towards the goal of building Artificial General Intelligence. A key factor in facilitating rapid progress is the software environment used for research. For nearly four years, the open source Torch7 machine learning library has served as our primary research platform, combining excellent flexibility with very fast runtime execution, enabling rapid prototyping. Our team has been proud to contribute to the open source project in capacities ranging from occasional bug fixes to being core maintainers of several crucial components.

With Google’s recent open source release of TensorFlow, we initiated a project to test its suitability for our research environment. Over the last six months, we have re-implemented more than a dozen different projects in TensorFlow to develop a deeper understanding of its potential use cases and the tradeoffs for research. Today we are excited to announce that DeepMind will start using TensorFlow for all our future research. We believe that TensorFlow will enable us to execute our ambitious research goals at much larger scale and an even faster pace, providing us with a unique opportunity to further accelerate our research programme.

As one of the core contributors of Torch7, I have had the pleasure of working closely with an excellent community of developers and researchers, and it has been amazing to see all the great work that has been built on top of the platform and the impact this has had on the field. Torch7 is currently being used by Facebook, Twitter, and many start-ups and academic labs as well as DeepMind, and I’m proud of the significant contribution it has made to a large community in both research and industry. Our transition to TensorFlow represents a new chapter, and I feel very excited about the prospect of DeepMind contributing heavily to another great open source machine learning platform that everyone can use to advance the state-of-the-art.

Computer Science Education for All Students



(Cross-posted on the Google for Education Blog)

Computer science education is a pathway to innovation, to creativity, and to exciting career prospects. No longer considered an optional skill, CS is quickly becoming a “new basic”, foundational for learning. In order for our students to be equipped for the world of tomorrow, we need to provide them with access to computer science education today.

At Google, we believe that all students deserve these opportunities. Today we join some of America’s leading companies, governors, and educators to support an open letter to Congress, asking for funding to provide every student in every school the opportunity to learn computer science. Google has long been committed to developing programs, resources, tools and community partnerships that make computer science engaging and accessible for all students.

We are strengthening that commitment today by announcing an additional investment of $10 million towards computer science education for 2017, along with the $23.5 million that we have allocated for 2016. This funding will allow us to build more resources, scale our programs, and provide additional support to our partners, with a goal of reaching an additional 5 million students.

With Congress’ help, we can ensure that every child has access to computer science education. Please join us by signing our online petition at www.change.org/computerscience.

Helping webmasters re-secure their sites



Every week, over 10 million users encounter harmful websites that deliver malware and scams. Many of these sites are compromised personal blogs or small business pages that have fallen victim due to a weak password or outdated software. Safe Browsing and Google Search protect visitors from dangerous content by displaying browser warnings and labeling search results with ‘this site may harm your computer’. While this helps keep users safe in the moment, the compromised site remains a problem that needs to be fixed.

Unfortunately, many webmasters for compromised sites are unaware anything is amiss. Worse yet, even when they learn of an incident, they may lack the security expertise to take action and address the root cause of compromise. Quoting one webmaster from a survey we conducted, “our daily and weekly backups were both infected” and even after seeking the help of a specialist, after “lots of wasted hours/days” the webmaster abandoned all attempts to restore the site and instead refocused his efforts on “rebuilding the site from scratch”.

In order to find the best way to help webmasters clean-up from compromise, we recently teamed up with the University of California, Berkeley to explore how to quickly contact webmasters and expedite recovery while minimizing the distress involved. We’ve summarized our key lessons below. The full study, which you can read here, was recently presented at the International World Wide Web Conference.

When Google works directly with webmasters during critical moments like security breaches, we can help 75% of webmasters re-secure their content. The whole process takes a median of 3 days. This is a better experience for webmasters and their audience.

How many sites get compromised?
Number of freshly compromised sites Google detects every week.
Over the last year Google detected nearly 800,000 compromised websites—roughly 16,500 new sites every week from around the globe. Visitors to these sites are exposed to low-quality scam content and malware via drive-by downloads. While browser and search warnings help protect visitors from harm, these warnings can at times feel punitive to webmasters who learn only after-the-fact that their site was compromised. To balance the safety of our users with the experience of webmasters, we set out to find the best approach to help webmasters recover from security breaches and ultimately reconnect websites with their audience.

Finding the most effective ways to aid webmasters
  1. Getting in touch with webmasters: One of the hardest steps on the road to recovery is first getting in contact with webmasters. We tried three notification channels: email, browser warnings, and search warnings. For webmasters who proactively registered their site with Search Console, we found that email communication led to 75% of webmasters re-securing their pages. When we didn’t know a webmaster’s email address, browser warnings and search warnings helped 54% and 43% of sites clean up respectively.
  2. Providing tips on cleaning up harmful content: Attackers rely on hidden files, easy-to-miss redirects, and remote inclusions to serve scams and malware. This makes clean-up increasingly tricky. When we emailed webmasters, we included tips and samples of exactly which pages contained harmful content. This, combined with expedited notification, helped webmasters clean up 62% faster compared to no tips—usually within 3 days.
  3. Making sure sites stay clean: Once a site is no longer serving harmful content, it’s important to make sure attackers don’t reassert control. We monitored recently cleaned websites and found 12% were compromised again in 30 days. This illustrates the challenge involved in identifying the root cause of a breach versus dealing with the side-effects.
Making security issues less painful for webmasters—and everyone

We hope that webmasters never have to deal with a security incident. If you are a webmaster, there are some quick steps you can take to reduce your risk. We’ve made it easier to receive security notifications through Google Analytics as well as through Search Console. Make sure to register for both services. Also, we have laid out helpful tips for updating your site’s software and adding additional authentication that will make your site safer.

If you’re a hosting provider or building a service that needs to notify victims of compromise, understand that the entire process is distressing for users. Establish a reliable communication channel before a security incident occurs, make sure to provide victims with clear recovery steps, and promptly reply to inquiries so the process feels helpful, not punitive.

As we work to make the web a safer place, we think it’s critical to empower webmasters and users to make good security decisions. It’s easy for the security community to be pessimistic about incident response being ‘too complex’ for victims, but as our findings demonstrate, even just starting a dialogue can significantly expedite recovery.