
Meet Parsey’s Cousins: Syntax for 40 languages, plus new SyntaxNet capabilities



Just in time for ACL 2016, we are pleased to announce that Parsey McParseface, released in May as part of SyntaxNet and the basis for the Cloud Natural Language API, now has 40 cousins! Parsey’s Cousins is a collection of pretrained syntactic models for 40 languages, capable of analyzing the native languages of more than half of the world’s population, often with unprecedented accuracy. To better address the linguistic phenomena occurring in these languages, we have endowed SyntaxNet with new capabilities for text segmentation and morphological analysis.

When we released Parsey, we were already planning to expand to more languages, and it soon became clear that this was both urgent and important, because researchers were having trouble creating top-notch SyntaxNet models for other languages.

The reason for this is a bit subtle. SyntaxNet, like other TensorFlow models, has a lot of knobs to turn that affect accuracy and speed. These knobs are called hyperparameters, and they control things like the learning rate and its decay, momentum, and random initialization. Because neural networks are more sensitive to the choice of these hyperparameters than many other machine learning algorithms, picking the right settings is very important. Unfortunately, there is no tested and proven way of doing this; picking good hyperparameters is mostly an empirical science -- we try a bunch of settings and see what works best.

An additional challenge is that training these models can take a long time: several days, even on very fast hardware. Our solution is to train many models in parallel via MapReduce and, when one looks promising, to train a bunch more with similar settings to fine-tune the results. This really adds up -- on average, we train more than 70 models per language. The plot below shows how accuracy varies with the hyperparameters as training progresses. The best models are up to 4% (absolute) more accurate than ones trained without hyperparameter tuning.
Held-out set accuracy for various English parsing models with different hyperparameters (each line corresponds to one training run with specific hyperparameters). In some cases training is a lot slower and in many cases a suboptimal choice of hyperparameters leads to significantly lower accuracy. We are releasing the best model that we were able to train for each language.
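To make the search procedure concrete, here is a minimal sketch of the two-stage loop described above. The hyperparameter ranges and the train_and_evaluate stub are illustrative placeholders, not SyntaxNet's actual training interface:

import random

def train_and_evaluate(params):
    # Placeholder: a real run trains a parser for days and returns
    # held-out accuracy. A toy random score keeps the sketch executable.
    return random.random()

def random_hyperparameters():
    # Illustrative ranges for the knobs mentioned above.
    return {
        "learning_rate": 10 ** random.uniform(-4, -1),
        "decay": random.choice([0.9, 0.95, 0.99]),
        "momentum": random.uniform(0.7, 0.95),
        "seed": random.randint(0, 2**31),  # random initialization
    }

def perturb(params):
    # Fine-tuning: stay close to a promising setting.
    new = dict(params)
    new["learning_rate"] *= random.uniform(0.5, 2.0)
    new["seed"] = random.randint(0, 2**31)
    return new

# Stage 1: broad random search. The runs are independent, which is what
# lets us farm them out in parallel (e.g. via MapReduce).
scored = [(train_and_evaluate(p), p)
          for p in (random_hyperparameters() for _ in range(50))]

# Stage 2: train a bunch more models near the best setting so far.
_, best = max(scored, key=lambda s: s[0])
scored += [(train_and_evaluate(p), p)
           for p in (perturb(best) for _ in range(20))]

best_accuracy, best_params = max(scored, key=lambda s: s[0])
print("best accuracy %.3f with %s" % (best_accuracy, best_params))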
In order to do a good job at analyzing the grammar of other languages, it was not sufficient to just fine-tune our English setup. We also had to expand the capabilities of SyntaxNet. The first extension is a model for text segmentation, which is the task of identifying word boundaries. In languages like English, this isn’t very hard -- you can mostly look for spaces and punctuation. In Chinese, however, this can be very challenging, because words are not separated by spaces. To correctly analyze dependencies between Chinese words, SyntaxNet needs to understand text segmentation -- and now it does.
Analysis of a Chinese string into a parse tree showing dependency labels, word tokens, and parts of speech (read top to bottom for each word token).
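To see why the whitespace heuristic breaks down, here is a toy tokenizer (our illustration, not SyntaxNet's segmentation model):

import re

def naive_tokenize(text):
    # Split into runs of word characters, keeping punctuation separate.
    return re.findall(r"\w+|[^\w\s]", text)

print(naive_tokenize("Parsey has 40 cousins."))
# ['Parsey', 'has', '40', 'cousins', '.']

print(naive_tokenize("我昨天取钱"))
# ['我昨天取钱'] -- one undivided chunk; recovering the word boundaries
# requires a learned segmentation model.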
The second extension is a model for morphological analysis. Morphology is a language feature that is poorly represented in English. It describes inflection, i.e., how the grammatical function and meaning of a word change as its spelling changes. In English, we add an -s to a word to indicate plurality. In Russian, a heavily inflected language, morphology can indicate number, gender, whether the word is the subject or object of a sentence, possessives, prepositional phrases, and more. To understand the syntax of a sentence in Russian, SyntaxNet needs to understand morphology -- and now it does.
Parse trees showing dependency labels, parts of speech, and morphology.
As you might have noticed, the parse trees for all of the sentences above look very similar. This is because we follow the content-head principle, under which dependencies are drawn between content words, with function words becoming leaves in the parse tree. This idea was developed by the Universal Dependencies project in order to increase parallelism between languages. Parsey’s Cousins are trained on treebanks provided by this project and are designed to be cross-linguistically consistent and thus easier to use in multi-lingual language understanding applications.

Using the same set of labels across languages can help us understand how sentences in different languages, or variations in the same language, convey the same meaning. In all of the above examples, the root indicates the main verb of the sentence and there is a passive nominal subject (indicated by the arc labeled with ‘nsubjpass’) and a passive auxiliary (‘auxpass’). If you look closely, you will also notice some differences because the grammar of each language differs. For example, English uses the preposition ‘by,’ where Russian uses morphology to mark that the phrase ‘the publisher (издателем)’ is in instrumental case -- the meaning is the same, it is just expressed differently.
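For illustration, here is that passive construction written out by hand as (token, head index, dependency label) triples in both languages, with index 0 marking the root. These are our own simplified examples in the Universal Dependencies style, not actual model output:

# "The book was published by the publisher."
english = [
    ("The",       2, "det"),        # function word: a leaf
    ("book",      4, "nsubjpass"),  # passive nominal subject
    ("was",       4, "auxpass"),    # passive auxiliary
    ("published", 0, "root"),       # main verb
    ("by",        7, "case"),       # preposition marking the agent
    ("the",       7, "det"),
    ("publisher", 4, "nmod"),       # content word attached to the verb
]

# "Книга была опубликована издателем."
russian = [
    ("Книга",        3, "nsubjpass"),  # passive nominal subject
    ("была",         3, "auxpass"),    # passive auxiliary
    ("опубликована", 0, "root"),       # main verb
    ("издателем",    3, "nmod"),       # instrumental case marks the agent;
                                       # no preposition needed
]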

Google has been involved in the Universal Dependencies project since its inception and we are very excited to be able to bring together our efforts on datasets and modeling. We hope that this release will facilitate research progress in building computer systems that can understand all of the world’s languages.

Parsey's Cousins can be found on GitHub, along with Parsey McParseface and SyntaxNet.

ACL 2016 & Research at Google



This week, Berlin hosts the 2016 Annual Meeting of the Association for Computational Linguistics (ACL 2016), the premier conference of the field of computational linguistics, covering a broad spectrum of diverse research areas that are concerned with computational approaches to natural language. As a leader in Natural Language Processing (NLP) and a Platinum Sponsor of the conference, Google will be on hand to showcase research interests that include syntax, semantics, discourse, conversation, multilingual modeling, sentiment analysis, question answering, summarization, and generally building better learners using labeled and unlabeled data, state-of-the-art modeling, and learning from indirect supervision.

Our systems are used in numerous ways across Google, impacting user experience in search, mobile, apps, ads, translate and more. Our work spans the range of traditional NLP tasks, with general-purpose syntax and semantic algorithms underpinning more specialized systems.
Our researchers are experts in natural language processing and machine learning, and combine methodological research with applied science, and our engineers are equally involved in long-term research efforts and driving immediate applications of our technology.

If you’re attending ACL 2016, we hope that you’ll stop by the booth to check out some demos, meet our researchers and discuss projects and opportunities at Google that go into solving interesting problems for billions of people. Learn more about Google research being presented at ACL 2016 below (Googlers highlighted in blue), and visit the Natural Language Understanding Team page at g.co/NLUTeam.

Papers
Generalized Transition-based Dependency Parsing via Control Parameters
Bernd Bohnet, Ryan McDonald, Emily Pitler, Ji Ma

Learning the Curriculum with Bayesian Optimization for Task-Specific Word Representation Learning
Yulia Tsvetkov, Manaal Faruqui, Wang Ling (Google DeepMind), Chris Dyer (Google DeepMind)

Morpho-syntactic Lexicon Generation Using Graph-based Semi-supervised Learning (TACL)
Manaal Faruqui, Ryan McDonald, Radu Soricut

Many Languages, One Parser (TACL)
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer (Google DeepMind)*, Noah A. Smith

Latent Predictor Networks for Code Generation
Wang Ling (Google DeepMind), Phil Blunsom (Google DeepMind), Edward Grefenstette (Google DeepMind), Karl Moritz Hermann (Google DeepMind), Tomáš Kočiský (Google DeepMind), Fumin Wang (Google DeepMind), Andrew Senior (Google DeepMind)

Collective Entity Resolution with Multi-Focal Attention
Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira

Plato: A Selective Context Model for Entity Resolution (TACL)
Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, Fernando Pereira

WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia
Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, David Berthelot

Stack-propagation: Improved Representation Learning for Syntax
Yuan Zhang, David Weiss

Cross-lingual Models of Word Embeddings: An Empirical Comparison
Shyam Upadhyay, Manaal Faruqui, Chris Dyer (Google DeepMind), Dan Roth

Globally Normalized Transition-Based Neural Networks (Outstanding Papers Session)
Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, Michael Collins

Posters
Cross-lingual projection for class-based language models
Beat Gfeller, Vlad Schogol, Keith Hall

Synthesizing Compound Words for Machine Translation
Austin Matthews, Eva Schlinger*, Alon Lavie, Chris Dyer (Google DeepMind)*

Cross-Lingual Morphological Tagging for Low-Resource Languages
Jan Buys, Jan A. Botha

Workshops
1st Workshop on Representation Learning for NLP
Keynote Speakers include: Raia Hadsell (Google DeepMind)
Workshop Organizers include: Edward Grefenstette (Google DeepMind), Phil Blunsom (Google DeepMind), Karl Moritz Hermann (Google DeepMind)
Program Committee members include: Tomáš Kočiský (Google DeepMind), Wang Ling (Google DeepMind), Ankur Parikh (Google), John Platt (Google), Oriol Vinyals (Google DeepMind)

1st Workshop on Evaluating Vector-Space Representations for NLP
Contributed Papers:
Problems With Evaluation of Word Embeddings Using Word Similarity Tasks
Manaal Faruqui, Yulia Tsvetkov, Pushpendre Rastogi, Chris Dyer (Google DeepMind)*

Correlation-based Intrinsic Evaluation of Word Vector Representations
Yulia Tsvetkov, Manaal Faruqui, Chris Dyer (Google DeepMind)

SIGFSM Workshop on Statistical NLP and Weighted Automata
Contributed Papers:
Distributed representation and estimation of WFST-based n-gram models
Cyril Allauzen, Michael Riley, Brian Roark

Pynini: A Python library for weighted finite-state grammar compilation
Kyle Gorman


* Work completed at CMU

Computational Thinking for All Students



(Crossposted on the Google for Education Blog and the Huffington Post)

Last year, I wrote about the importance of teaching computational thinking to all K-12 students. Given the growing use of computing, algorithms and data in all fields from the humanities to medicine to business, it’s becoming increasingly important for students to understand the basics of computer science (CS). One lesson we have learned through Google’s CS education outreach efforts is that these skills can be accessible to all students, if we introduce them early in K-5. These are truly 21st century skills which can, over time, produce a workforce ready for a technology-enabled and driven economy.

How can teachers start introducing computational thinking in early school curriculum? It is already present in many topic areas - algorithms for solving math problems, for example. However, what is often missing in current examples of computational thinking is the explicit connection between what students are learning and its application in computing. For example, once a student has mastered adding multi-digit numbers, the following algorithm could be presented:
  1. Add together the digits in the ones place. If the result is less than 10, it becomes the ones digit of the answer. If it is 10 or greater, its ones digit becomes the ones digit of the answer, and you add 1 to the next column.
  2. Add together the digits in the tens place, plus the 1 carried over from the ones place, if necessary. If the result is less than 10, it becomes the tens digit of the answer; if it is 10 or greater, its ones digit becomes the tens digit of the answer and 1 is added to the next column.
  3. Repeat this process for any additional columns until they are all added.
This allows a teacher to present the concept of an algorithm and its use in computing, as well as the most important elements of any computer program: conditional branching (“if the result is less than 10…”) and iteration (“repeat this process…”). Going a step further, a teacher can translate the algorithm into a running program, which can have a compelling effect. When something that students have used to solve one instance of a problem can automatically solve all instances of that problem, it’s quite a powerful moment for them, even if they don’t do the coding themselves.
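As a sketch of what that translation might look like, the column-addition algorithm above maps almost line-for-line onto a short program (digits are stored ones place first):

def add_digits(a, b):
    # Add two numbers given as lists of digits, ones place first.
    # Conditional branching decides whether to carry; iteration
    # repeats the same step for each column.
    result, carry = [], 0
    for i in range(max(len(a), len(b))):
        column = carry
        column += a[i] if i < len(a) else 0
        column += b[i] if i < len(b) else 0
        if column < 10:                # "if the result is less than 10..."
            result.append(column)
            carry = 0
        else:                          # "...10 or greater: keep the ones digit"
            result.append(column - 10)
            carry = 1                  # "...and add 1 to the next column"
    if carry:
        result.append(carry)
    return result

# 47 + 85 = 132; digits are ones-first, so [7, 4] is 47.
print(add_digits([7, 4], [5, 8]))  # [2, 3, 1], i.e. 132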

Google has created an online course for K-12 teachers to learn about computational thinking and how to make these explicit connections for their students. We also have a large repository of lessons, explorations and programs to support teachers and students. Our videos illustrate real-world examples of the application of computational thinking in Google’s products and services, and we have compiled a set of great resources showing how to integrate computational thinking into existing curriculum. We also recently announced Project Bloks to engage younger children in computational thinking. Finally, code.org, for whom Google is a primary sponsor, has curriculum and materials for K-5 teachers and students.

We feel that computational thinking is a core skill for all students. If we can make these explicit connections for students, they will see how the devices and apps that they use everyday are powered by algorithms and programs. They will learn the importance of data in making decisions. They will learn skills that will prepare them for a workforce that will be doing vastly different tasks than the workforce of today. We owe it to all students to give them every possible opportunity to be productive and successful members of society.

Announcing an Open Source ADC board for BeagleBone



(Cross-posted on the Google Open Source Blog)

Working with electronics, we often find ourselves soldering up a half-baked electronic circuit to detect some sort of signal. For example, last year we wanted to measure the strength of a carrier. We started with traditional analog circuits -- amplifier, filter, envelope detector, threshold. You can see some of our prototypes in the image below; they get pretty messy.
While there's a certain satisfaction in taming a signal using the physical properties of capacitors, coils of wire and transistors, it's usually easier to digitize the signal with an Analog to Digital Converter (ADC) and manage it with Digital Signal Processing (DSP) instead of electronic parts. Tweaking software doesn't require a soldering iron, and lets us modify signals in ways that would require impossible analog circuits.
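As a toy example of that shift, here is the analog chain from above -- filter, envelope detector, threshold -- redone as a few lines of DSP on already-digitized samples. The sample rate, carrier frequency, and threshold are made-up numbers for illustration:

import numpy as np
from scipy import signal

fs = 1_000_000       # sample rate, Hz (illustrative)
f_carrier = 100_000  # carrier to detect, Hz (illustrative)

# Stand-in for ADC samples: noise with a burst of carrier in the middle.
t = np.arange(0, 0.01, 1.0 / fs)
samples = 0.1 * np.random.randn(t.size)
samples[2000:6000] += 0.5 * np.sin(2 * np.pi * f_carrier * t[2000:6000])

# "Amplifier + filter": a band-pass around the carrier frequency.
b, a = signal.butter(4, [0.9 * f_carrier, 1.1 * f_carrier], btype="band", fs=fs)
filtered = signal.lfilter(b, a, samples)

# "Envelope detector": magnitude of the analytic signal.
envelope = np.abs(signal.hilbert(filtered))

# "Threshold": the carrier is present wherever the envelope is strong.
carrier_present = envelope > 0.25
print("carrier detected in %.1f%% of samples" % (100 * carrier_present.mean()))

Changing the threshold or the filter here is an edit and a re-run, not another round of soldering.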

There are several standard solutions for digitizing a signal: connect a laptop to an oscilloscope or Data Acquisition System (DAQ) via USB or Ethernet, or use the onboard ADCs of a maker board like an Arduino. The former are sensitive and accurate, but also big and power hungry. The latter are cheap and tiny, but slower and have enough RAM for only milliseconds worth of high speed sample data.

That led us to investigate single board computers like the BeagleBone and Raspberry Pi, which are small and cheap like an Arduino, but have specs like a smartphone. And crucially, the BeagleBone's system-on-a-chip (SoC) combines a beefy ARMv7 CPU with two smaller Programmable Realtime Units (PRUs) that have access to all 512MB of system RAM. This lets us dedicate the PRUs to the time-sensitive and repetitive task of reading each sample out of an external ADC, while the main CPU lets us use the data with the GNU/Linux tools we're used to.

The result is an open source BeagleBone cape we've named PRUDAQ. It's built around the Analog Devices AD9201 ADC, which samples two inputs simultaneously at up to 20 megasamples per second, per channel. Simultaneous sampling and high sample rates make it useful for software-defined radio (SDR) and scientific applications where a built-in ADC isn't quite up to the task.

Our open source electrical design and sample code are available on GitHub, and GroupGets has boards ready to ship for $79. We were also fortunate to have help from Google intern Kumar Abhishek: he added support for PRUDAQ to BeagleLogic, his Google Summer of Code project, which performs much better than our sample code.

We started PRUDAQ for our own needs, but quickly realized that others might also find it useful. We're excited to get your feedback through the email list. Tell us what can be done with inexpensive fast ADCs paired with inexpensive fast CPUs!

Towards an exact (quantum) description of chemistry



“...nature isn’t classical, dammit, and if you want to make a simulation of nature, you’d better make it quantum mechanical...” - Richard Feynman, Simulating Physics with Computers

One of the most promising applications of quantum computing is the ability to efficiently model quantum systems in nature that are considered intractable for classical computers. Now, in collaboration with the Aspuru-Guzik group at Harvard and researchers from Lawrence Berkeley National Labs, UC Santa Barbara, Tufts University and University College London, we have performed the first completely scalable quantum simulation of a molecule. Our experimental results are detailed in the paper Scalable Quantum Simulation of Molecular Energies, which recently appeared in Physical Review X.

The goal of our experiment was to use quantum hardware to efficiently solve the molecular electronic structure problem, which seeks the solution for the lowest energy configuration of electrons in the presence of a given nuclear configuration. In order to predict chemical reaction rates (which govern the mechanism of chemical reactions), one must make these calculations to extremely high precision. The ability to predict such rates could revolutionize the design of solar cells, industrial catalysts, batteries, flexible electronics, medicines, materials and more. The primary difficulty is that molecular systems form highly entangled quantum superposition states which require exponentially many classical computing resources in order to represent to sufficiently high precision. For example, exactly computing the energies of methane (CH4) takes about one second, but the same calculation takes about ten minutes for ethane (C2H6) and about ten days for propane (C3H8).

In our experiment, we focus on an approach known as the variational quantum eigensolver (VQE), which can be understood as a quantum analog of a neural network. Whereas a classical neural network is a parameterized mapping that one trains in order to model classical data, VQE is a parameterized mapping (e.g. a quantum circuit) that one trains in order to model quantum data (e.g. a molecular wavefunction). The training objective for VQE is the molecular energy function, which is always minimized by the true ground state. The quantum advantage of VQE is that quantum bits can efficiently represent the molecular wavefunction whereas exponentially many classical bits would be required.
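To make the analogy concrete, here is a toy VQE loop in plain Python: a one-parameter trial state stands in for the parameterized quantum circuit, and a classical optimizer minimizes the energy objective. The single-qubit Hamiltonian coefficients below are made up for illustration; the real experiment optimized circuit parameters against the measured molecular Hamiltonian of H2:

import numpy as np
from scipy.optimize import minimize_scalar

# Pauli matrices.
I = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.array([[1.0, 0.0], [0.0, -1.0]])

# Toy single-qubit "molecular" Hamiltonian (coefficients are illustrative).
H = -0.5 * I + 0.3 * Z + 0.2 * X

def ansatz(theta):
    # One-parameter trial state: the stand-in for a parameterized circuit.
    return np.array([np.cos(theta / 2.0), np.sin(theta / 2.0)])

def energy(theta):
    # The training objective: the expectation value <psi|H|psi>,
    # which is minimized by the true ground state.
    psi = ansatz(theta)
    return float(psi @ H @ psi)

# The classical outer loop "trains" the circuit parameter, just as a
# neural network's weights are trained against a loss function.
result = minimize_scalar(energy, bounds=(0.0, 2.0 * np.pi), method="bounded")
print("VQE energy:          %.6f" % result.fun)
print("exact ground energy: %.6f" % np.linalg.eigvalsh(H).min())

Because the parameter is set by this closed training loop, a systematic miscalibration in the hardware that prepares the state simply shifts which parameter value the loop settles on.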

Using VQE, we quantum computed the energy landscape of molecular hydrogen, H2. We compared the performance of VQE to another quantum algorithm for chemistry, the phase estimation algorithm (PEA). Experimentally computed energies, as a function of the H - H bond length, are shown below alongside the exact curve. We were able to obtain such high performance with VQE because the neural-network-like training loop helped to establish experimentally optimal circuit parameters for representing the wavefunction in the presence of systematic control errors. One can understand this by considering a hardware implementation of a neural network with a faulty weight, e.g. the weight is only represented half as strong as it should be. Because the weights of the neural network are established via a closed-loop training procedure which can compensate for such systematic errors, the hardware neural network is robust against such imperfections. Likewise, despite systematic errors in our implementation of the VQE circuit, we are still able to learn an accurate model for the wavefunction. This robustness inspires hope that VQE may be able to solve classically intractable problems without quantum error correction.
While the energies of molecular hydrogen can be computed classically (albeit inefficiently), as one scales up quantum hardware it becomes possible to simulate even larger chemical systems, including classically intractable ones. For instance, with only about a hundred reliable quantum bits one could model the process by which bacteria produce fertilizer at room temperature. Elucidating this mechanism is a famous open problem in chemistry because the way humans produce fertilizer is extremely inefficient and consumes 1-2% of the world's energy annually. Such calculations could also assist with breakthroughs in fundamental science, for instance, in the understanding of high temperature superconductivity.

Though many theoretical and experimental challenges lie ahead, a quantum-enabled paradigm shift from qualitative/descriptive chemistry simulations to quantitative/predictive chemistry simulations could modernize the field so dramatically that the examples imaginable today are just the tip of the iceberg.

Wide & Deep Learning: Better Together with TensorFlow



"Learn the rules like a pro, so you can break them like an artist." — Pablo Picasso

The human brain is a sophisticated learning machine, forming rules by memorizing everyday events (“sparrows can fly” and “pigeons can fly”) and generalizing those learnings to apply to things we haven't seen before (“animals with wings can fly”). Perhaps more powerfully, memorization also allows us to further refine our generalized rules with exceptions (“penguins can't fly”). As we were exploring how to advance machine intelligence, we asked ourselves the question—can we teach computers to learn like humans do, by combining the power of memorization and generalization?

It's not an easy question to answer, but by jointly training a wide linear model (for memorization) alongside a deep neural network (for generalization), one can combine the strengths of both to bring us one step closer. At Google, we call it Wide & Deep Learning. It's useful for generic large-scale regression and classification problems with sparse inputs (categorical features with a large number of possible feature values), such as recommender systems, search, and ranking problems.
Today we’re open-sourcing our implementation of Wide & Deep Learning as part of the TF.Learn API so that you can easily train a model yourself. Please check out the TensorFlow tutorials on Linear Models and Wide & Deep Learning, as well as our research paper to learn more.

How Wide & Deep Learning works.
Let's say one day you wake up with an idea for a new app called FoodIO*. A user of the app just needs to say out loud what kind of food he or she is craving (the query). The app magically predicts the dish that the user will like best, and the dish gets delivered to the user's front door (the item). Your key metric is consumption rate—if a dish was eaten by the user, the score is 1; otherwise it's 0 (the label).

You come up with some simple rules to start, like returning the items that match the most characters in the query, and you release the first version of FoodIO. Unfortunately, you find that the consumption rate is pretty low because the matches are too crude to be really useful (people shouting “fried chicken” end up getting “chicken fried rice”), so you decide to add machine learning to learn from the data.

The Wide model.
In the 2nd version, you want to memorize which items work best for each query. So, you train a linear model in TensorFlow with a wide set of cross-product feature transformations to capture how the co-occurrence of a query-item feature pair correlates with the target label (whether or not an item is consumed). The model predicts the probability of consumption P(consumption | query, item) for each item, and FoodIO delivers the item with the highest predicted probability. For example, the model learns that the feature AND(query="fried chicken", item="chicken and waffles") is a huge win, while AND(query="fried chicken", item="chicken fried rice") doesn't get as much love even though the character match is higher. In other words, FoodIO 2.0 does a pretty good job memorizing what users like, and it starts to get more traction.
The Deep model.
Later on you discover that many users are saying that they're tired of the recommendations. They're eager to discover similar but different cuisines with a “surprise me” state of mind. So you brush up on your TensorFlow toolkit again and train a deep feed-forward neural network for FoodIO 3.0. With your deep model, you're learning lower-dimensional dense representations (usually called embedding vectors) for every query and item. With that, FoodIO is able to generalize by matching items to queries that are close to each other in the embedding space. For example, you find that people who asked for “fried chicken” often don't mind having “burgers” as well.
Combining Wide and Deep models.
However, you discover that the deep neural network sometimes generalizes too much and recommends irrelevant dishes. You dig into the historic traffic, and find that there are actually two distinct types of query-item relationships in the data.

The first type of query is very targeted. People shouting very specific items like “iced decaf latte with nonfat milk” really mean it. Just because it's pretty close to “hot latte with whole milk” in the embedding space doesn't mean it's an acceptable alternative. And there are millions of these rules where the transitivity of embeddings may actually do more harm than good. On the other hand, queries that are more exploratory, like “seafood” or “italian food”, may be open to more generalization and to discovering a diverse set of related items. Having realized this, you have an epiphany: why do I have to choose either wide or deep models? Why not both?
Finally, you build FoodIO 4.0 with Wide & Deep Learning in TensorFlow. As shown in the graph above, the sparse features like query="fried chicken" and item="chicken fried rice" are used in both the wide part (left) and the deep part (right) of the model. During training, the prediction errors are backpropagated to both sides to train the model parameters. The cross-feature transformation in the wide model component can memorize all those sparse, specific rules, while the deep model component can generalize to similar items via embeddings.
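A minimal sketch of FoodIO 4.0's model using the TF.Learn API mentioned above (the feature definitions, bucket sizes, and layer widths here are illustrative; see the linked tutorials for a complete, working setup):

import tensorflow as tf

layers = tf.contrib.layers
learn = tf.contrib.learn

# Sparse base features.
query = layers.sparse_column_with_hash_bucket("query", hash_bucket_size=10000)
item = layers.sparse_column_with_hash_bucket("item", hash_bucket_size=10000)

# Wide side: the cross-product transformation memorizes specific pairs
# such as AND(query="fried chicken", item="chicken and waffles").
wide_columns = [layers.crossed_column([query, item], hash_bucket_size=1000000)]

# Deep side: low-dimensional embeddings let the model generalize to
# query-item pairs it has never seen together.
deep_columns = [
    layers.embedding_column(query, dimension=32),
    layers.embedding_column(item, dimension=32),
]

# Jointly trained wide linear model and deep feed-forward network;
# prediction errors are backpropagated to both sides during training.
model = learn.DNNLinearCombinedClassifier(
    linear_feature_columns=wide_columns,
    dnn_feature_columns=deep_columns,
    dnn_hidden_units=[256, 128, 64])

# model.fit(input_fn=train_input_fn, steps=10000)  # train_input_fn: your data pipeline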

Wider. Deeper. Together.
We're excited to share the TensorFlow API and implementation of Wide & Deep Learning with you, so you can try out your ideas with it and share your findings with everyone else. To get started, check out the code on GitHub and our TensorFlow tutorials on Linear Models and Wide & Deep Learning.

Acknowledgement
Bringing Wide & Deep from idea to research to implementation has been a huge team effort. We'd like to thank all the people who have contributed to the project or have given us advice, including: Heng-Tze Cheng, Mustafa Ispir, Zakaria Haque, Lichan Hong, Rohan Anil, Denis Baylor, Vihan Jain, Salem Haykal, Robson Araujo, Xiaobing Liu, Yonghui Wu, Thomas Strohmann, Tal Shaked, Jeremiah Harmsen, Greg Corrado, Glen Anderson, D. Sculley, Tushar Chandra, Ed Chi, Rajat Monga, Rob von Behren, Jarek Wilkiewicz, Christine Robson, Illia Polosukhin, Martin Wicke, Gus Katsiapis, Alexandre Passos, Olivier Chapelle, Levent Koc, Akshay Naresh Modi, Wei Chai, Hrishi Aradhye, Othar Hansson, Xinran He, Martin Zinkevich, Joe Toth, Anton Rusanov, Hemal Shah, Petros Mol, Frank Li, Yutaka Suematsu, Sameer Ahuja, Eugene Brevdo, Philip Tucker, Shanqing Cai, Kester Tong, and more.

* For illustration only. FoodIO is not a real app.

CVPR 2016 & Research at Google



This week, Las Vegas hosts the 2016 Conference on Computer Vision and Pattern Recognition (CVPR 2016), the premier annual computer vision event comprising the main conference and several co-located workshops and short courses. As a leader in computer vision research, Google has a strong presence at CVPR 2016, with many Googlers presenting papers and invited talks at the conference, tutorials and workshops.

We congratulate Google Research Scientist Ce Liu and Google Faculty Advisor Abhinav Gupta, who were selected as this year’s recipients of the PAMI Young Researcher Award for outstanding research contributions within computer vision. We also congratulate Googler Henrik Stewenius for receiving the Longuet-Higgins Prize, a retrospective award that recognizes up to two CVPR papers from ten years ago that have made a significant impact on computer vision research, for his 2006 CVPR paper “Scalable Recognition with a Vocabulary Tree”, co-authored with David Nister.

If you are attending CVPR this year, please stop by our booth and chat with our researchers about the projects and opportunities at Google that go into solving interesting problems for hundreds of millions of people. The Google booth will also showcase several recent efforts, including the technology behind Motion Stills and a live demo of neural network-based image compression. Learn more about our research being presented at CVPR 2016 in the list below (Googlers highlighted in blue).

Oral Presentations
Generation and Comprehension of Unambiguous Object Descriptions
Junhua Mao, Jonathan Huang, Alexander Toshev, Oana Camburu, Alan L. Yuille, Kevin Murphy

Detecting Events and Key Actors in Multi-Person Videos
Vignesh Ramanathan, Jonathan Huang, Sami Abu-El-Haija, Alexander Gorban, Kevin Murphy, Li Fei-Fei

Spotlight Session: 3D Reconstruction
DeepStereo: Learning to Predict New Views From the World’s Imagery
John Flynn, Ivan Neulander, James Philbin, Noah Snavely

Posters
Discovering the Physical Parts of an Articulated Object Class From Multiple Videos
Luca Del Pero, Susanna Ricco, Rahul Sukthankar, Vittorio Ferrari

Blockout: Dynamic Model Selection for Hierarchical Deep Networks
Calvin Murdock, Zhen Li, Howard Zhou, Tom Duerig

Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, Zbigniew Wojna

Improving the Robustness of Deep Neural Networks via Stability Training
Stephan Zheng, Yang Song, Thomas Leung, Ian Goodfellow

Semantic Image Segmentation With Task-Specific Edge Detection Using CNNs and a Discriminatively Trained Domain Transform
Liang-Chieh Chen, Jonathan T. Barron, George Papandreou, Kevin Murphy, Alan L. Yuille

Tutorial
Optimization Algorithms for Subset Selection and Summarization in Large Data Sets
Ehsan Elhamifar, Jeff Bilmes, Alex Kulesza, Michael Gygli

Workshops
Perceptual Organization in Computer Vision: The Role of Feedback in Recognition and Reorganization
Organizers: Katerina Fragkiadaki, Phillip Isola, Joao Carreira
Invited talks: Viren Jain, Jitendra Malik

VQA Challenge Workshop
Invited talks: Jitendra Malik, Kevin Murphy

Women in Computer Vision
Invited talk: Caroline Pantofaru

Computational Models for Learning Systems and Educational Assessment
Invited talk: Jonathan Huang

Large-Scale Scene Understanding (LSUN) Challenge
Invited talk: Jitendra Malik

Large Scale Visual Recognition and Retrieval: BigVision 2016
General Chairs: Jason Corso, Fei-Fei Li, Samy Bengio

ChaLearn Looking at People
Invited talk: Florian Schroff

Medical Computer Vision
Invited talk: Ramin Zabih

Project Bloks: Making code physical for kids



At Google, we’re passionate about empowering children to create and explore with technology. We believe that when children learn to code, they’re not just learning how to program a computer—they’re learning a new language for creative expression and are developing computational thinking: a skillset for solving problems of all kinds.

In fact, it’s a skillset whose importance is being recognised around the world—from President Obama’s CS4All program to the inclusion of Computer Science in the UK National Curriculum. We’ve long supported and advocated the furthering of CS education through programs and platforms such as Blockly, Scratch Blocks, CS First and Made w/ Code.

Today, we’re happy to announce Project Bloks, a research collaboration between Google, Paulo Blikstein (Stanford University) and IDEO with the goal of creating an open hardware platform that researchers, developers and designers can use to build physical coding experiences. As a first step, we’ve created a system for tangible programming and built a working prototype with it. We’re sharing our progress before conducting more research over the summer to inform what comes next.

Physical coding
Kids are inherently playful and social. They naturally play and learn by using their hands, building stuff and doing things together. Making code physical - known as tangible programming - offers a unique way to combine the way children innately play and learn with computational thinking.

Project Bloks is preceded and shaped by a long history of educational theory and research in the area of hands-on learning: from Friedrich Froebel, Maria Montessori and Jean Piaget’s pioneering work on learning through experience, exploration and manipulation, to the research started in the 1970s by Seymour Papert and Radia Perlman with LOGO and TORTIS. This exploration has continued to grow and includes a wide range of research and platforms.

However, designing kits for tangible programming is challenging—requiring the resources and time to develop both the software and the hardware. Our goal is to remove those barriers. By creating an open platform, Project Bloks will allow designers, developers and researchers to focus on innovating, experimenting and creating new ways to help kids develop computational thinking. Our vision is that, one day, the Project Bloks platform becomes for tangible programming what Blockly is for on-screen programming.
The Project Bloks system
We’ve designed a system that developers can customise, reconfigure and rearrange to create all kinds of different tangible programming experiences.
A birdseye view of the customisable and reconfigurable Project Bloks system
The Project Bloks system is made up of three core components: the “Brain Board”, “Base Boards” and “Pucks”. When connected together, they create a set of instructions that can be sent to connected devices, such as toys or tablets, over WiFi or Bluetooth.
The three core components of the Project Bloks system
Pucks: abundant, inexpensive, customisable physical instructions
Pucks are what make the Project Bloks system so versatile. They help bring the infinite flexibility of software programming commands to tangible programming experiences. Pucks can be programmed with different instructions, such as ‘turn on or off’, ‘move left’ or ‘jump’. They can also take the shape of many different interactive forms—like switches, dials or buttons. With no active electronic components, they’re also incredibly cheap and easy to make. At a minimum, all you'd need to make a puck is a piece of paper and some conductive ink.
Pucks allow for the cheap and easy creation and customisation of an endless variety of domain-specific physical instructions.
Base Boards: a modular design for diverse tangible programming experiences
Base Boards read a Puck’s instruction through a capacitive sensor. They act as a conduit for a Puck’s command to the Brain Board. Base Boards are modular and can be connected in sequence and in different orientations to create different programming flows and experiences.
The modularity of the Base Boards means they can be arranged in different configurations and flows
Each Base Board is fitted with a haptic motor and LEDs that can be used to give end-users real time feedback on their programming experience. The Base Boards can also trigger audio feedback from the Brain Board’s built-in speaker.

Brain Board: control any device that has an API over WiFi or Bluetooth
The Brain Board is the processing unit of the system, built on a Raspberry Pi Zero. It also provides the other boards with power, and contains an API to receive and send data to the Base Boards. It sends the Base Boards’ instructions to any device with WiFi or Bluetooth connectivity and an API.

As a whole, the Project Bloks system can take on different form factors and be made out of different materials. This means developers have the flexibility to create diverse experiences that can help kids develop computational thinking: from composing music using functions to playing around with sensors or anything else they care to invent.
The Project Bloks system can be used to create all sorts of different physical programming experiences for kids
The Coding Kit
To show how designers, developers, and researchers might make use of the system, the Project Bloks team worked with IDEO to create a reference device, called the Coding Kit. It lets kids learn basic concepts of programming by allowing them to put code bricks together to create a set of instructions that can be sent to control connected toys and devices—anything from a tablet, to a drawing robot or educational tools for exploring science like LEGO® Education WeDo 2.0.
What’s next?
We are looking for participants (educators, developers, parents and researchers) from around the world who would like to help shape the future of Computer Science education by remotely taking part in our research studies later in the year. If you would like to be part of our research study or simply receive updates on the project, please sign up.

If you want more context and detail on Project Bloks, you can read our position paper.

Finally, a big thank you to the team beyond Google who’ve helped us get this far—including the pioneers of tangible learning and programming who’ve inspired us and informed so much of our thinking.


Bringing Precision to the AI Safety Discussion



We believe that AI technologies are likely to be overwhelmingly useful and beneficial for humanity. But part of being a responsible steward of any new technology is thinking through potential challenges and how best to address any associated risks. So today we’re publishing a technical paper, Concrete Problems in AI Safety, a collaboration among scientists at Google, OpenAI, Stanford and Berkeley.

While possible AI safety risks have received a lot of public attention, most previous discussion has been very hypothetical and speculative. We believe it’s essential to ground concerns in real machine learning research, and to start developing practical approaches for engineering AI systems that operate safely and reliably.

We’ve outlined five problems we think will be very important as we apply AI in more general circumstances. These are all forward-thinking, long-term research questions -- minor issues today, but important to address for future systems:

  • Avoiding Negative Side Effects: How can we ensure that an AI system will not disturb its environment in negative ways while pursuing its goals, e.g. a cleaning robot knocking over a vase because it can clean faster by doing so?
  • Avoiding Reward Hacking: How can we avoid gaming of the reward function? For example, we don’t want this cleaning robot simply covering over messes with materials it can’t see through.
  • Scalable Oversight: How can we efficiently ensure that a given AI system respects aspects of the objective that are too expensive to be frequently evaluated during training? For example, if an AI system gets human feedback as it performs a task, it needs to use that feedback efficiently because asking too often would be annoying.
  • Safe Exploration: How do we ensure that an AI system doesn’t make exploratory moves with very negative repercussions? For example, maybe a cleaning robot should experiment with mopping strategies, but clearly it shouldn’t try putting a wet mop in an electrical outlet.
  • Robustness to Distributional Shift: How do we ensure that an AI system recognizes, and behaves robustly, when it’s in an environment very different from its training environment? For example, heuristics learned on a factory floor may not be safe enough for an office.

We go into more technical detail in the paper. The machine learning research community has already thought quite a bit about most of these problems and many related issues, but we think there’s a lot more work to be done.

We believe in rigorous, open, cross-institution work on how to build machine learning systems that work as intended. We’re eager to continue our collaborations with other research groups to make positive progress on AI.

ICML 2016 & Research at Google



This week, New York hosts the 2016 International Conference on Machine Learning (ICML 2016), a premier annual Machine Learning event supported by the International Machine Learning Society (IMLS). Machine Learning is a key focus area at Google, with highly active research groups exploring virtually all aspects of the field, including deep learning and more classical algorithms.

We work on an extremely wide variety of machine learning problems that arise from a broad range of applications at Google. One particularly important setting is that of large-scale learning, where we utilize scalable tools and architectures to build machine learning systems that work with large volumes of data that often preclude the use of standard single-machine training algorithms. In doing so, we are able to solve deep scientific problems and engineering challenges, exploring theory as well as application, in areas of language, speech, translation, music, visual processing and more.

As Gold Sponsor, Google has a strong presence at ICML 2016 with many Googlers publishing their research and hosting workshops. If you’re attending, we hope you’ll visit the Google booth and talk with our researchers to learn more about the exciting work, creativity and fun that goes into solving interesting ML problems that impact millions of people. You can also learn more about our research being presented at ICML 2016 in the list below (Googlers highlighted in blue).

ICML 2016 Organizing Committee
Area Chairs include: Corinna Cortes, John Blitzer, Maya Gupta, Moritz Hardt, Samy Bengio

IMLS
Board Members include: Corinna Cortes

Accepted Papers
ADIOS: Architectures Deep In Output Space
Moustapha Cisse, Maruan Al-Shedivat, Samy Bengio

Associative Long Short-Term Memory
Ivo Danihelka, Greg Wayne, Benigno Uria, Nal Kalchbrenner, Alex Graves

Asynchronous Methods for Deep Reinforcement Learning
Volodymyr Mnih, Adria Puigdomenech Badia, Mehdi Mirza, Alex Graves, Timothy Lillicrap, Tim Harley, David Silver, Koray Kavukcuoglu

Binary embeddings with structured hashed projections
Anna Choromanska, Krzysztof Choromanski, Mariusz Bojarski, Tony Jebara, Sanjiv Kumar, Yann LeCun

Discrete Distribution Estimation Under Local Privacy
Peter Kairouz, Keith Bonawitz, Daniel Ramage

Dueling Network Architectures for Deep Reinforcement Learning (Best Paper Award recipient)
Ziyu Wang, Nando de Freitas, Tom Schaul, Matteo Hessel, Hado van Hasselt, Marc Lanctot

Exploiting Cyclic Symmetry in Convolutional Neural Networks
Sander Dieleman, Jeffrey De Fauw, Koray Kavukcuoglu

Fast Constrained Submodular Maximization: Personalized Data Summarization
Baharan Mirzasoleiman, Ashwinkumar Badanidiyuru, Amin Karbasi

Greedy Column Subset Selection: New Bounds and Distributed Algorithms
Jason Altschuler, Aditya Bhaskara, Gang Fu, Vahab Mirrokni, Afshin Rostamizadeh, Morteza Zadimoghaddam

Horizontally Scalable Submodular Maximization
Mario Lucic, Olivier Bachem, Morteza Zadimoghaddam, Andreas Krause

Continuous Deep Q-Learning with Model-based Acceleration
Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

Meta-Learning with Memory-Augmented Neural Networks
Adam Santoro, Sergey Bartunov, Matthew Botvinick, Daan Wierstra, Timothy Lillicrap

One-Shot Generalization in Deep Generative Models
Danilo Rezende, Shakir Mohamed, Daan Wierstra

Pixel Recurrent Neural Networks (Best Paper Award recipient)
Aaron Van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu

Pricing a low-regret seller
Hoda Heidari, Mohammad Mahdian, Umar Syed, Sergei Vassilvitskii, Sadra Yazdanbod

Primal-Dual Rates and Certificates
Celestine Dünner, Simone Forte, Martin Takac, Martin Jaggi

Recommendations as Treatments: Debiasing Learning and Evaluation
Tobias Schnabel, Thorsten Joachims, Adith Swaminathan, Ashudeep Singh, Navin Chandak

Recycling Randomness with Structure for Sublinear Time Kernel Expansions
Krzysztof Choromanski, Vikas Sindhwani

Train faster, generalize better: Stability of stochastic gradient descent
Moritz Hardt, Ben Recht, Yoram Singer

Variational Inference for Monte Carlo Objectives
Andriy Mnih, Danilo Rezende

Workshops
Abstraction in Reinforcement Learning
Organizing Committee: Daniel Mankowitz, Timothy Mann, Shie Mannor
Invited Speaker: David Silver

Deep Learning Workshop
Organizers: Antoine Bordes, Kyunghyun Cho, Emily Denton, Nando de Freitas, Rob Fergus
Invited Speaker: Raia Hadsell

Neural Networks Back To The Future
Organizers: Léon Bottou, David Grangier, Tomas Mikolov, John Platt

Data-Efficient Machine Learning
Organizers: Marc Deisenroth, Shakir Mohamed, Finale Doshi-Velez, Andreas Krause, Max Welling

On-Device Intelligence
Organizers: Vikas Sindhwani, Daniel Ramage, Keith Bonawitz, Suyog Gupta, Sachin Talathi
Invited Speakers: Hartwig Adam, H. Brendan McMahan

Online Advertising Systems
Organizing Committee: Sharat Chikkerur, Hossein Azari, Edoardo Airoldi
Opening Remarks: Hossein Azari
Invited Speakers: Martin Pál, Todd Phillips

Anomaly Detection 2016
Organizing Committee: Nico Goernitz, Marius Kloft, Vitaly Kuznetsov

Tutorials
Deep Reinforcement Learning
David Silver

Rigorous Data Dredging: Theory and Tools for Adaptive Data Analysis
Moritz Hardt, Aaron Roth