Tag Archives: Open source

Semantic Reactor: A tool for experimenting with NLU models

Companies are using natural language understanding (NLU) to create digital personal assistants, customer service bots, and semantic search engines for reviews, forums and the news.

However, the perception that using NLU and machine learning is costly and time consuming prevents a lot of potential users from exploring its benefits.

To dispel some of the intimidation of using NLU, and to demonstrate how it can be easily used with pre-trained, generic models, we have released a tool, the Semantic Reactor, and open-sourced example code, The Mystery of the Three Bots.

The Semantic Reactor

The Semantic Reactor is a Google Sheets Add-On that allows the user to sort lines of text in a sheet using a variety of machine-learning models. It is released as a whitelisted experiment, so if you would like to check it out, fill out this application at the Google Cloud AI Workshop. Once approved, you’ll be emailed instructions on how to install it.

The tool offers ranking methods that determine how the list will be sorted. With the semantic similarity method, the lines more similar in meaning to the input will be ranked higher.

With the input-response method, the lines that are the most appropriate conversational responses are ranked higher.
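To make the semantic similarity ranking concrete, here is a minimal sketch of the idea: embed the input and every line, then sort the lines by cosine similarity to the input. The `embed` function below is a toy bag-of-words stand-in, not the sentence encoder the Semantic Reactor uses, and all function names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_by_similarity(query, lines):
    """Return the lines sorted from most to least similar to the query."""
    q = embed(query)
    return sorted(lines, key=lambda line: cosine(q, embed(line)), reverse=True)


lines = [
    "What drinks go well with cookies?",
    "What are cookies made of?",
    "Are cookies also called biscuits?",
]
ranked = rank_by_similarity("Are cookies biscuits?", lines)
print(ranked[0])  # Are cookies also called biscuits?
```

A word-count embedding only rewards shared words. A trained sentence encoder is what lets lines with similar meaning but different wording rank highly.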

Why use the Semantic Reactor?

There are a lot of interesting things you can do with the Semantic Reactor, but let’s look at the following two:
  • Writing dialogue for a bot that exists within a well-defined environment and has a clear purpose (like a customer service bot) using semantic similarity.
  • Searching within large collections of text, like from a message board. For that, we will use input-response.

Writing Dialogue for a Bot Using Semantic Similarity

For the sake of an example, let’s say you are writing dialogue for a bot that answers questions about a product, in this case, cookies.

If you’ve been running a cookie hotline for a while, you probably can list the most common cookie questions. With that data, you can create your cookie bot. Start by opening a Google Sheet and writing the common questions and answers (questions in the A column, answers in the B).

Here is the start of what that Sheet might look like. Make a copy of the Sheet, which will allow you to use the Semantic Reactor Add-on. Use the tool to experiment with new QA pairs and how each model reacts to them.

Here are a few queries to try, using the semantic similarity rank method:

Query: What are cookie ingredients?
Returns: What are cookies made of?

Query: Are cookies biscuits?
Returns: Are cookies also called biscuits?

Query: What should I serve with cookies?
Returns: What drinks go well with cookies?

Of course, that small list of responses won’t cover many of the questions people will ask your cookie bot. What the Reactor allows you to do is quickly add new QA pairs as you learn about what your users want to ask.

For example, maybe people are asking a lot about cookie calories.

You’d write the new question in column A, and the new answer in column B, and then test a few different phrasings with the Reactor. You might need to tweak the target response a few times to make sure it matches a wide variety of phrasings. You should also experiment with the three different models to see which one performs the best.

For instance, let’s say the new target question you want the model to match to is: “How many calories does a typical cookie have?”

That question might be phrased by users as:
  • Are cookies caloric?
  • A lot of calories in a cookie?
  • Will cookies wreck my diet?
  • Are cookies fattening?
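Outside of the Sheet, the same kind of phrasing test could be scripted. The sketch below checks, for each phrasing, whether the target question is the top-scoring row. The scorer is pluggable, and the default `jaccard` is a hypothetical word-overlap stand-in rather than one of the Reactor's models.

```python
def jaccard(a, b):
    """Hypothetical stand-in scorer: word overlap between two questions."""
    words_a = set(a.lower().strip("?").split())
    words_b = set(b.lower().strip("?").split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def check_phrasings(target, candidates, phrasings, score=jaccard):
    """Map each phrasing to True if the target is its top-scoring candidate."""
    results = {}
    for phrasing in phrasings:
        best = max(candidates, key=lambda c: score(phrasing, c))
        results[phrasing] = best == target
    return results


target = "How many calories does a typical cookie have?"
candidates = [
    "What are cookies made of?",
    "Are cookies also called biscuits?",
    "What drinks go well with cookies?",
    target,
]
phrasings = [
    "Are cookies caloric?",
    "A lot of calories in a cookie?",
    "Will cookies wreck my diet?",
    "Are cookies fattening?",
]
results = check_phrasings(target, candidates, phrasings)
for phrasing, matched in results.items():
    print("OK  " if matched else "MISS", phrasing)
```

With this crude scorer, only the phrasing that shares words with the target is matched. That gap is what a semantic model is meant to close, and a harness like this makes it easy to compare models by swapping in a different `score` function.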

The more you test with live users, the more you’ll find that they phrase their questions in ways you don’t expect. As with all things based on machine learning, constantly refreshing data, testing, and improving are all part of the process.

Searching Through Text Using Input-Response

Sometimes you can’t anticipate what users are going to ask, and sometimes you might be dealing with a lot of potential responses, maybe thousands. In cases like that, you should use the input-response ranking method. That means the model will examine the list of potential responses and then rank each one according to what it thinks is the most likely response.

Here is a Sheet containing a list of simple conversational responses. Using the input-response ranking method, try a few generic conversational openers like “Hello” or “How’s it going?”

Note that in input-response mode, the model is predicting the most likely conversational response to an input and not the most semantically similar response.

For example, in input-response mode, “Hello” returns “Nice to meet you.” In semantic similarity mode, “Hello” returns what the model thinks is semantically closest to “Hello,” which is “What’s up?”

Now try your own! Add potential responses. Switch between the models and ranking methods to see how it changes the results (be sure to hit the “reload” button every time you add new responses).

Example Code

One of the models available on TensorFlow Hub is the Universal Sentence Encoder Lite. It’s only 1.6MB and is suitable for use within websites and on-device applications.

An open-sourced sample game that uses USE Lite is The Mystery of the Three Bots on GitHub. It’s a simple demonstration that shows how you can use a small semantic ML model to drive conversations with game characters. The corpora the game uses were created and tested using the Semantic Reactor.

You can play a running version of the game here. You can experiment with the corpora of two of the characters, the Maid and the Butler, contained within this Sheet. Be sure to make a copy of the Sheet so you can edit and add new QA pairs.

Where To Get The Models Used Within The Semantic Reactor

All of the models used in the Semantic Reactor are published and available online.
  • Local – Minified TensorFlow.js version of the Universal Sentence Encoder.
  • Basic Online – Basic version of the Universal Sentence Encoder.
  • Multilingual Online – Universal Sentence Encoder trained on question/answer pairs in 16 languages.

Final Thoughts

These language models are far from perfect. They use their training to give a best estimate of what to return based on the list of responses you gave them. Machine learning is about calculation, prediction, and training. Models can be improved over time with more data and tuning, and in turn, be made more accurate.

Also, because conversational models are trained on dialogue between people, and because people are biased, the models will display biases that exist in the data that they were trained on, sometimes in ways you can’t predict. For more on model bias, and more detail about how these models were trained, see the Semantic Experiences for Developers page.

By Ben Pietrzak, Steve Pucci, Aaron Cohen — Google AI  

A Season of Docs story

Lack of clear and reliable documentation is one of the main shortcomings of many open source projects. Last year, Google set out to help change that by announcing the first ever Season of Docs.

Season of Docs is an initiative that brings together technical writers and open source projects to collaborate for a few months, benefitting both the communities and writers.

This is the story of Audrey Tavares, one of the writers who signed up for Season of Docs.

Turning incipient curiosity into an opportunity

In 2019, Audrey was completing the Technical and Professional Communication program at Glendon College, exploring technical writing out of curiosity. One of Google’s technical writers, Nicola Yap, completed the same program and visited Audrey’s class in March to talk about her career. It was an enlightening experience, showing technical writing as an attractive alternative with plenty of opportunities, and introducing Audrey to Season of Docs.

For Audrey, this experience meant stepping into unknown territory—she knew nothing about open source software. Naturally, the first step was to familiarize herself with the communities and understand the software development paradigm. After spending time learning she submitted her Technical Writer application—which was accepted—and was assigned to Oppia, an online educational platform.

Main challenges

Audrey had two mentors to help her on her journey: one in India and the other in the United States. As you can imagine, this revealed the first challenge: time zones. The first few days were stressful, as navigating schedules across time zones was a daunting task, but with a little work they soon came up with an arrangement that worked for everyone.

The second challenge was learning the tools. For most of us, writing a document involves opening a word processor and typing some text. However, as Audrey was about to find out, things are a bit more intricate when it comes to documenting code.

When presented with the choice of a documentation tool set, Audrey decided on Write the Docs. It seemed like a very popular tool among open source communities. How hard can it be to use, right? Well, it’s not so much about how difficult it is, but how different it is for someone unfamiliar with a common software development workflow, since it entails learning a few new things.

Audrey was not dismayed. She pushed forward and gradually learned these new tools. Both mentors were always available, willing to help, and answered all of her questions. Their mentorship was key to her success.

Every end is a new beginning

After Season of Docs was over, Audrey decided to remain part of the Oppia community to actively contribute to make the platform even better.

The experience allowed Audrey to walk away from Season of Docs with a new set of technical skills, experience communicating with software engineers, an extended professional network, and a new item on her résumé. She now works as a technical writer for a software company in Toronto.

Applications for Season of Docs 2020 start on April 13 for open source organizations and on May 11 for technical writers. Check the official announcement to learn how to participate.

By Geri Ochoa, Google Cloud

Announcing Season of Docs 2020

Google Open Source is delighted to announce Season of Docs 2020!

Season of Docs brings technical writers and open source projects together for a few months to work on open source documentation. 2019 was the first year of Season of Docs, bringing together open source organizations and technical writers to create 44 successful documentation projects!

Docs are key to open source success

Survey after survey shows the importance of good documentation in how developers choose and use open source:
  • 72% of surveyed developers say “Established policies and documentation” is a key decision factor when choosing open source
  • 93% of surveyed developers say “Incomplete or outdated documentation is a pervasive problem” in open source
  • “Lack of documentation” was the top reason developers gave for deciding against using an open source project
Open source communities know this, and still struggle to produce good documentation. Why? Because creating documentation is hard. But...

There are people who know how to do docs well. Technical writers know how to structure a documentation site so that people can find and understand the content. They know how to write docs that fit the needs of their audience. Technical writers can also help optimize a community’s processes for open source contribution and onboarding new contributors.

Season of Docs brings open source projects and technical writers together with the shared goal of creating great documentation. The writers bring their expertise to the projects, and the project mentors help the technical writers learn more about open source and new technologies. Communities gain new docs contributors and technical writers gain valuable open source skills.

Together the technical writers and mentors build a new doc set, improve the structure of the existing docs, develop a much-needed tutorial, or improve contribution processes and guides. See more ideas for technical writing projects.

By working together in Season of Docs we raise awareness of open source, docs, and technical writing.

How does it work?

  • April 13 – May 4: Open source organizations apply to take part in Season of Docs
  • May 11: Google publishes the list of accepted mentoring organizations, along with their ideas for documentation projects
  • May 11 – July 9: Technical writers choose the project they’d like to work on and submit their proposals to Season of Docs
  • August 10: Google announces the accepted technical writer projects
  • August 11 – September 11: Community bonding: Technical writers get to know mentors and the open source community, and refine their projects in collaboration with their mentors
  • September 11 – December 6: Technical writers work with open source mentors on the accepted projects, and submit their work at the end of the period
  • January 7, 2021: Google publishes the list of successfully-completed projects.
See the timeline for details, including the provision for projects that run longer than three months.

Join us

Explore the Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and get ready to apply!

By Erin McKean, Google Open Source

Google and Binomial partner to open source high quality Basis Universal

Today, Google and Binomial are excited to announce the high quality update to the original Basis Universal release.

Basis Universal allows you to have state of the art web performance with your images, keeping images compressed even on the GPU. Older formats like JPEG and PNG may look small in storage size, but once they hit the GPU they are processed as uncompressed data! The original Basis Universal codec created images that were 6-8 times smaller than JPEG on the GPU while maintaining a similar storage size.

Today we release a high quality Basis Universal codec that utilizes the highest quality formats modern GPUs support, finally bringing the web up to modern GPU texture standards—with cross platform support. The textures are larger in storage size and GPU compressed size, but are still 3-4 times smaller than sending a JPEG or PNG file to be processed on the GPU, and can transcode to a lower quality format for older GPUs.
Original Image by Erol Ahmed from Unsplash.com
Visual comparison of Basis Universal High Quality
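The size ratios quoted above can be roughly reproduced with some back-of-the-envelope arithmetic. The sketch below assumes that a decoded JPEG or PNG sits on the GPU as 24-bit RGB or 32-bit RGBA, and that the compressed textures use 4 bits per pixel (as in ETC1 or BC1) or 8 bits per pixel (as in BC7 or ASTC 4x4). The post does not state which pixel formats or bit rates it has in mind, so treat these as assumptions, along with the hypothetical 1024x1024 texture size.

```python
def texture_bytes(width, height, bits_per_pixel):
    """Size in bytes of a single texture level at the given bit rate."""
    return width * height * bits_per_pixel // 8


WIDTH = HEIGHT = 1024  # hypothetical texture size

uncompressed_rgb = texture_bytes(WIDTH, HEIGHT, 24)
uncompressed_rgba = texture_bytes(WIDTH, HEIGHT, 32)
block_4bpp = texture_bytes(WIDTH, HEIGHT, 4)  # assumed: ETC1/BC1-class formats
block_8bpp = texture_bytes(WIDTH, HEIGHT, 8)  # assumed: BC7/ASTC 4x4-class formats

print(uncompressed_rgb / block_4bpp, uncompressed_rgba / block_4bpp)  # 6.0 8.0
print(uncompressed_rgb / block_8bpp, uncompressed_rgba / block_8bpp)  # 3.0 4.0
```

Under these assumptions, the 4 bpp case lands on the "6-8 times" figure and the 8 bpp case on the "3-4 times" figure.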

Best of all, we are actively working on standardizing Basis Universal with the Khronos Group.

Since our original release in Summer 2019 we’ve seen widespread adoption of Basis Universal in engines like three.js, Babylon.js, Godot, and more, changing what is possible for people to create on the web. Now that a high quality option is available, we expect to see even more adoption and groundbreaking applications created with it.

Please feel free to join our community on GitHub and check out the full demo there as well. You can also follow standardization efforts via Khronos Group events and forums.

By Stephanie Hurlburt, Binomial and Jamieson Brettle, Chrome Media

Pigweed: A collection of embedded libraries

We’re excited to announce Pigweed, an open source collection of embedded-targeted libraries, or as we like to call them, modules. Pigweed modules are built to enable faster and more reliable development on 32-bit microcontrollers.

Pigweed is in early development and is not suitable for production use at this time.

Getting Started with Pigweed

As of today, the source is available for everyone at pigweed.googlesource.com under an Apache 2.0 license. Instructions on getting up and running can be found in the README.

See Pigweed in action

Pigweed offers modules that address a wide range of embedded developer needs. These highlight how Pigweed can accelerate embedded development across the entire lifecycle:
  • Setup: Get started faster with simplified setup via a virtual environment with all required tools
  • Development: Accelerated edit-compile-flash-test cycles with watchers and distributed testing
  • Code Submission: Pre-configured code formatting and integrated presubmit checks


A classic challenge in the embedded space is reducing the time from running git clone to having a binary executing on a device. Oftentimes, an entire suite of tools is needed for non-trivial production embedded projects. For example, your project likely needs:
  • A C++ compiler for your target device, and also for your host
  • A build system (or three); for example, GN, Ninja, CMake, Bazel
  • A code formatting program like clang-format
  • A debugger like OpenOCD to flash and debug your embedded device
  • A known Python version with known modules installed for scripting
  • … and so on
Below is a system with Python 2.7, clang-format, and no ARM compiler. The bootstrap script in Pigweed’s pw_env_setup module sets up the current shell to have access to a standardized set of tools, including Python 3.8, clang-format, and an ARM compiler. All of this is done in a virtual environment so the system’s default environment remains unmodified.
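For comparison, this is the kind of manual check that a bootstrapped environment makes unnecessary. It is a small, hypothetical helper that reports which tools are missing from the current PATH. It is not part of Pigweed, and the tool list is only illustrative.

```python
import shutil


def missing_tools(required, which=shutil.which):
    """Return the tools from `required` that are not found on PATH."""
    return [tool for tool in required if which(tool) is None]


# Illustrative list; a real project would have its own requirements.
REQUIRED = ["python3", "clang-format", "arm-none-eabi-gcc"]
print("missing:", missing_tools(REQUIRED))
```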


In typical embedded development, adding even a small change involves the following additional manual steps:
  • Re-building the image
  • Flashing the image to a device
  • Ensuring that the change works as expected
  • Verifying that existing tests continue to pass
This is a huge disparity from web development workflows where file watchers are prevalent—you save a file and instantly see the results of the change.

Pigweed’s pw_watch module solves this inefficiency directly, providing a watcher that automatically invokes a build when a file is saved, and also runs the specific tests affected by the code changes. This drastically reduces the edit-compile-flash-test cycle for changes.

In the demo above, the pw_watch module (on the right) does the following:
  • Detects source file changes
  • Builds the affected libraries, tests, and binaries
  • Flashes the tests to the device (in this case a STM32F429i Discovery board)
  • Runs the specific unit tests
There’s no need to leave your code editor—all of this is done automatically. You can save additional time by using the pw_target_runner module to run tests in parallel across multiple devices.
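The watcher idea itself is simple to sketch. The code below is not how pw_watch is implemented. It is a minimal polling loop body, with hypothetical function names, that detects changed source files and hands them to a callback that would trigger a build and test run.

```python
import os


def snapshot(root, suffixes=(".c", ".cc", ".h", ".py")):
    """Map each matching file under root to its last-modified time."""
    times = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.endswith(suffixes):
                path = os.path.join(folder, name)
                times[path] = os.stat(path).st_mtime_ns
    return times


def changed_paths(before, after):
    """Paths that are new or whose modification time differs."""
    return sorted(p for p, t in after.items() if before.get(p) != t)


def poll_once(root, before, on_change):
    """Take a new snapshot and call on_change(paths) if anything changed."""
    after = snapshot(root)
    changed = changed_paths(before, after)
    if changed:
        on_change(changed)
    return after
```

A caller would keep the returned snapshot and call `poll_once` again after a short sleep, passing a callback that rebuilds and reruns the affected tests. Polling is the simplest approach; a production watcher would more likely rely on file system notifications.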


When developing code as a part of a team, consistent code is an important part of a healthy codebase. However, setting up linters, configuring code formatting, and adding automated presubmit checks is work that often gets delayed indefinitely.

Pigweed’s pw_presubmit module provides an off-the-shelf integrated suite of linters, based on tools that you’ve probably already used, that are pre-configured for immediate use for microcontroller developers. This means that your project can have linting and automatic formatting presubmit checks from its inception.

And a bunch of other modules

There are many modules in addition to the ones highlighted above...
  • pw_tokenizer – A module that converts strings to binary tokens at compile time. This enables logging with dramatically less overhead in flash, RAM, and CPU usage.
  • pw_string – Provides the flexibility, ease-of-use, and safety of C++-style string manipulation, but with no dynamic memory allocation and a much smaller binary size impact. Using pw_string in place of the standard C functions eliminates issues related to buffer overflow or missing null terminators.
  • pw_bloat – A module to generate memory reports for output binaries empowering developers with information regarding the memory impact of any change.
  • pw_unit_test – Unit testing is important and Pigweed offers a portable library that’s broadly compatible with Google Test. Unlike Google Test, pw_unit_test is built on top of embedded-friendly primitives; for example, it does not use dynamic memory allocation. Additionally, it is easy to port to new target platforms by implementing the test event handler interface.
  • pw_kvs – A key-value-store implementation for flash-backed persistent storage with integrated wear levelling. This is a lightweight alternative to a file system for embedded devices.
  • pw_cpu_exception_armv7m – Robust low level hardware fault handler for ARM Cortex-M; the handler even has unit tests written in assembly to verify nested-hardware-fault handling!
  • pw_protobuf – An early preview of our wire-format-oriented protocol buffer implementation. This protobuf compiler makes a different set of implementation tradeoffs than the most popular protocol buffer library in this space, nanopb.
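To illustrate the tokenization idea from the list above, here is a rough sketch in which each log string is replaced by a 32-bit token on the device side and expanded again on the host. This is not the pw_tokenizer API: the hash (CRC-32), the payload layout, and all names are assumptions, and the real module does the string-to-token conversion at compile time rather than at run time.

```python
import struct
import zlib


def tokenize(fmt):
    """Map a format string to a 32-bit token (CRC-32 as a stand-in hash)."""
    return zlib.crc32(fmt.encode("utf-8")) & 0xFFFFFFFF


class TokenDatabase:
    """Host-side table that maps tokens back to their original strings."""

    def __init__(self):
        self._strings = {}

    def add(self, fmt):
        token = tokenize(fmt)
        self._strings[token] = fmt
        return token

    def lookup(self, token):
        return self._strings.get(token)


def encode_log(token, value):
    """Pack a token and one unsigned 32-bit argument into 8 bytes."""
    return struct.pack("<II", token, value)


def decode_log(db, payload):
    """Unpack the payload and expand it with the stored format string."""
    token, value = struct.unpack("<II", payload)
    fmt = db.lookup(token)
    return fmt % value if fmt is not None else "<unknown token 0x%08x>" % token


db = TokenDatabase()
token = db.add("Battery voltage: %d mV")
payload = encode_log(token, 3300)
print(len(payload), decode_log(db, payload))  # 8 Battery voltage: 3300 mV
```

The device only stores and transmits the 8-byte payload, while the full string lives in the host-side database. That is where the savings in flash and bandwidth would come from.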

Why name the project Pigweed?

Pigweed, also known as amaranth, is a nutritious grain and leafy salad green that is also a rapidly growing weed. When developing the project that eventually became Pigweed, we wanted to find a name that was fun, playful, and reflective of how we saw Pigweed growing. Teams would start out using one module that catches their eye, and after that goes well, they’d quickly start using more.

So far, so good 😁

What’s next?

We’re continuing to evolve the collection and add new modules. It’s our hope that others in the embedded community find these modules helpful for their projects.

By Keir Mierle and Mohammed Habibulla, on behalf of the Pigweed team

Introducing Dreamer: Scalable Reinforcement Learning Using World Models

Research into how artificial agents can choose actions to achieve goals is making rapid progress in large part due to the use of reinforcement learning (RL). Model-free approaches to RL, which learn to predict successful actions through trial and error, have enabled DeepMind's DQN to play Atari games and AlphaStar to beat world champions at StarCraft II, but require large amounts of environment interaction, limiting their usefulness for real-world scenarios.

In contrast, model-based RL approaches additionally learn a simplified model of the environment. This world model lets the agent predict the outcomes of potential action sequences, allowing it to play through hypothetical scenarios to make informed decisions in new situations, thus reducing the trial and error necessary to achieve goals. In the past, it has been challenging to learn accurate world models and leverage them to learn successful behaviors. While recent research, such as our Deep Planning Network (PlaNet), has pushed these boundaries by learning accurate world models from images, model-based approaches have still been held back by ineffective or computationally expensive planning mechanisms, limiting their ability to solve difficult tasks.

Today, in collaboration with DeepMind, we present Dreamer, an RL agent that learns a world model from images and uses it to learn long-sighted behaviors. Dreamer leverages its world model to efficiently learn behaviors via backpropagation through model predictions. By learning to compute compact model states from raw images, the agent is able to efficiently learn from thousands of predicted sequences in parallel using just one GPU. Dreamer achieves a new state-of-the-art in performance, data efficiency and computation time on a benchmark of 20 continuous control tasks given raw image inputs. To stimulate further advancement of RL, we are releasing the source code to the research community.

How Does Dreamer Work?
Dreamer consists of three processes that are typical for model-based methods: learning the world model, learning behaviors from predictions made by the world model, and executing its learned behaviors in the environment to collect new experience. To learn behaviors, Dreamer uses a value network to take into account rewards beyond the planning horizon and an actor network to efficiently compute actions. The three processes, which can be executed in parallel, are repeated until the agent has achieved its goals:
The three processes of the Dreamer agent. The world model is learned from past experience. From predictions of this model, the agent then learns a value network to predict future rewards and an actor network to select actions. The actor network is used to interact with the environment.

Learning the World Model
Dreamer leverages the PlaNet world model, which predicts outcomes based on a sequence of compact model states that are computed from the input images, instead of directly predicting from one image to the next. It automatically learns to produce model states that represent concepts helpful for predicting future outcomes, such as object types, positions of objects, and the interaction of the objects with their surroundings. Given a sequence of images, actions, and rewards from the agent's dataset of past experience, Dreamer learns the world model as shown:
Dreamer learns a world model from experience. Using past images (o1–o3) and actions (a1–a2), it computes a sequence of compact model states (green circles) from which it reconstructs the images (ô1–ô3) and predicts the rewards (r̂1–r̂3).
An advantage to using the PlaNet world model is that predicting ahead using compact model states instead of images greatly improves the computational efficiency. This enables the model to predict thousands of sequences in parallel on a single GPU. The approach can also facilitate generalization, leading to accurate long-term video predictions. To gain insights into how the model works, we can visualize the predicted sequences by decoding the compact model states back into images, as shown below for a task of the DeepMind Control Suite and for a task of the DeepMind Lab environment:
Predicting ahead using compact model states enables long-term predictions in complex environments. Shown here are two sequences that the agent has not encountered before. Given five input images, the model reconstructs them and predicts the future images up to time step 50.

Efficient Behavior Learning
Previously developed model-based agents typically select actions either by planning through many model predictions or by using the world model in place of a simulator to reuse existing model-free techniques. Both designs are computationally demanding and do not fully leverage the learned world model. Moreover, even powerful world models are limited in how far ahead they can accurately predict, rendering many previous model-based agents shortsighted. Dreamer overcomes these limitations by learning a value network and an actor network via backpropagation through predictions of its world model.

Dreamer efficiently learns the actor network to predict successful actions by propagating gradients of rewards backwards through predicted state sequences, which is not possible for model-free approaches. This tells Dreamer how small changes to its actions affect what rewards are predicted in the future, allowing it to refine the actor network in the direction that increases the rewards the most. To consider rewards beyond the prediction horizon, the value network estimates the sum of future rewards for each model state. The rewards and values are then backpropagated to refine the actor network to select improved actions:
Dreamer learns long-sighted behaviors from predicted sequences of model states. It first learns the long-term value (v̂2–v̂3) of each state, and then predicts actions (â1–â2) that lead to high rewards and values by backpropagating them through the state sequence to the actor network.
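The role of the value network can be illustrated with a simplified calculation: sum the discounted rewards predicted along an imagined sequence, then add the discounted value estimate of the final state to account for everything beyond the horizon. This is only a sketch of the bootstrapping idea with hypothetical function names. The return estimate used by the actual agent may be more elaborate, and the gradient-based actor update is not shown.

```python
def bootstrapped_return(rewards, final_value, discount=0.99):
    """Discounted sum of predicted rewards plus the discounted final-state value."""
    total = final_value
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


def returns_per_step(rewards, final_value, discount=0.99):
    """Bootstrapped return starting from each step of the imagined sequence."""
    return [
        bootstrapped_return(rewards[t:], final_value, discount)
        for t in range(len(rewards))
    ]


# Three imagined steps with reward 1 each, and a value estimate of 10
# for the state reached at the end of the horizon.
print(bootstrapped_return([1.0, 1.0, 1.0], 10.0, discount=0.5))  # 3.0
print(returns_per_step([1.0, 1.0, 1.0], 10.0, discount=0.5))     # [3.0, 4.0, 6.0]
```

Without the `final_value` term, the agent would only optimize rewards inside the prediction horizon, which is the shortsightedness described above.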
Dreamer differs from PlaNet in several ways. For a given situation in the environment, PlaNet searches for the best action among many predictions for different action sequences. In contrast, Dreamer side-steps this expensive search by decoupling planning and acting. Once its actor network has been trained on predicted sequences, it computes the actions for interacting with the environment without additional search. In addition, Dreamer considers rewards beyond the planning horizon using a value function and leverages backpropagation for efficient planning.

Performance on Control Tasks
We evaluated Dreamer on a standard benchmark of 20 diverse tasks with continuous actions and image inputs. The tasks include balancing and catching objects, as well as locomotion of various simulated robots. The tasks are designed to pose a variety of challenges to the RL agent, including difficult to predict collisions, sparse rewards, chaotic dynamics, small but relevant objects, high degrees of freedom, and 3D perspectives:
Dreamer learns to solve 20 challenging continuous control tasks with image inputs, 5 of which are displayed here. The visualizations show the same 64x64 images that the agent receives from the environment.
We compare the performance of Dreamer to that of three agents: PlaNet, the previous best model-based agent; A3C, a popular model-free agent; and D4PG, the current best model-free agent on this benchmark, which combines several advances of model-free RL. The model-based agents learn efficiently in under 5 million frames, corresponding to 28 hours inside the simulation. The model-free agents learn more slowly and require 100 million frames, corresponding to 23 days inside the simulation.
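The two time figures are consistent with each other, as a quick calculation shows. The frame rate below is derived from the quoted numbers rather than stated in the text.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 24 * SECONDS_PER_HOUR

model_based_frames = 5_000_000
model_free_frames = 100_000_000

# Frame rate implied by "5 million frames, corresponding to 28 hours".
frames_per_second = model_based_frames / (28 * SECONDS_PER_HOUR)

# Simulated time for the model-free budget at that same rate.
model_free_days = model_free_frames / frames_per_second / SECONDS_PER_DAY

print(round(frames_per_second, 1))               # 49.6
print(round(model_free_days, 1))                 # 23.3
print(model_free_frames // model_based_frames)   # 20
```

In other words, both figures imply roughly 50 frames per simulated second, and the model-free agents use 20 times as many frames.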

On the benchmark of 20 tasks, Dreamer outperforms the best model-free agent (D4PG) with an average score of 823 compared to 786, while learning from 20 times fewer environment interactions. Moreover, it exceeds the final performance of the previously best model-based agent (PlaNet) across almost all of the tasks. The computation time of 16 hours for training Dreamer is less than the 24 hours required for the other methods. The final performance of the four agents is shown below:
Dreamer outperforms the previous best model-free (D4PG) and model-based (PlaNet) methods on the benchmark of 20 tasks in terms of final performance, data efficiency, and computation time.
In addition to our main experiments on continuous control tasks, we demonstrate the generality of Dreamer by applying it to tasks with discrete actions. For this, we select Atari games and DeepMind Lab levels that require both reactive and long-sighted behavior, spatial awareness, and understanding of visually more diverse scenes. The resulting behaviors are visualized below, showing that Dreamer also efficiently learns to solve these more challenging tasks:
Dreamer learns successful behaviors on Atari games and DeepMind Lab levels, which feature discrete actions and visually more diverse scenes, including 3D environments with multiple objects.
Our work demonstrates that learning behaviors from sequences predicted by world models alone can solve challenging visual control tasks from image inputs, surpassing the performance of previous model-free approaches. Moreover, Dreamer demonstrates that learning behaviors by backpropagating value gradients through predicted sequences of compact model states is successful and robust, solving a diverse collection of continuous and discrete control tasks. We believe that Dreamer offers a strong foundation for further pushing the limits of reinforcement learning, including better representation learning, directed exploration with uncertainty estimates, temporal abstraction, and multi-task learning.

This project is a collaboration with Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. We further thank everybody in the Brain Team and beyond who commented on our paper draft and provided feedback at any point throughout the project.

Source: Google AI Blog

Google Summer of Code 2020 now open for student applications!

If you’re a university student and want to sharpen your software development skills while doing good for the open source community, check out Google Summer of Code (GSoC) 2020! This will be our 16th year of GSoC!

We are now accepting student applications for our program that introduces university students from around the world to open source software communities, as well as our enthusiastic and generous community of mentors. For three months students code from the comfort of their homes (the program is entirely online!) and receive stipends based on the successful completion of their project milestones.

Past participants say the real-world experience that GSoC provides honed their technical skills, boosted their confidence, expanded their professional network, and enhanced their resume, all while making them better developers.

Interested students can submit proposals on the program site between now and Tuesday, March 31, 2020 at 18:00 UTC.

While many students began preparing in February when we announced the 200 participating open source organizations, it’s not too late for you to start! The first step is to browse the list of organizations and look for project ideas that appeal to you. Next, reach out to the organization to introduce yourself and determine if your skills and interests are a good fit. Since spots are limited, we recommend writing a strong proposal and submitting a draft early so you can communicate with the organization and get their feedback to increase your odds of being selected.

You can learn more about how to prepare by watching the video below and checking out the Student Guide and Advice for Students.

You can find more information on our website, including a full timeline of important dates. We also highly recommend reviewing the FAQ and Program Rules.

Remember to submit your proposals early as you only have until Tuesday, March 31 at 18:00 UTC. Good luck to all who apply!

By Stephanie Taylor, Google Open Source

Fast and Easy Infinitely Wide Networks with Neural Tangents

The widespread success of deep learning across a range of domains, such as natural language processing, conversational agents, and connectomics, has transformed the landscape of research in machine learning and left researchers with a number of interesting and important open questions, such as: Why do deep neural networks (DNNs) generalize so well despite being overparameterized? What is the relationship between architecture, training, and performance for deep networks? How can one extract salient features from deep learning models?

One of the key theoretical insights that has allowed us to make progress in recent years has been that increasing the width of DNNs results in more regular behavior, and makes them easier to understand. A number of recent results have shown that DNNs that are allowed to become infinitely wide converge to another, simpler, class of models called Gaussian processes. In this limit, complicated phenomena (like Bayesian inference or gradient descent dynamics of a convolutional neural network) boil down to simple linear algebra equations. Insights from these infinitely wide networks frequently carry over to their finite counterparts. As such, infinite-width networks can be used as a lens to study deep learning, but also as useful models in their own right.
Left: A schematic showing how deep neural networks induce simple input / output maps as they become infinitely wide. Right: As the width of a neural network increases, we see that the distribution of outputs over different random instantiations of the network becomes Gaussian.
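The convergence to a Gaussian can be checked numerically on a toy case. The following is a minimal sketch, not part of Neural Tangents: it assumes a one-hidden-layer tanh network with standard-normal weights and a 1/sqrt(width) output scaling, samples many random initializations at a single fixed input, and uses excess kurtosis (which is 0 for a Gaussian) as a rough measure of non-Gaussianity. The function names, the choice of tanh, and the sample sizes are all illustrative assumptions.

```python
import math
import random


def random_network_output(width, x, rng):
    """Output at input x of one randomly initialized one-hidden-layer tanh net.

    All weights are standard normal; dividing by sqrt(width) keeps the
    output variance roughly independent of the width.
    """
    total = 0.0
    for _ in range(width):
        w = rng.gauss(0.0, 1.0)  # input-to-hidden weight
        v = rng.gauss(0.0, 1.0)  # hidden-to-output weight
        total += v * math.tanh(w * x)
    return total / math.sqrt(width)


def excess_kurtosis(samples):
    """Sample excess kurtosis; close to 0 when the samples look Gaussian."""
    n = len(samples)
    mean = sum(samples) / n
    m2 = sum((s - mean) ** 2 for s in samples) / n
    m4 = sum((s - mean) ** 4 for s in samples) / n
    return m4 / (m2 * m2) - 3.0


def kurtosis_at_width(width, num_networks=2000, x=1.0, seed=0):
    """Excess kurtosis of the output over many random initializations."""
    rng = random.Random(seed)
    outputs = [random_network_output(width, x, rng) for _ in range(num_networks)]
    return excess_kurtosis(outputs)


# Print the statistic for a narrow, a medium, and a wider network.
for width in (1, 4, 64):
    print(width, round(kurtosis_at_width(width), 3))
```

In this toy setup the statistic should shrink toward zero as the width grows, up to sampling noise. It only looks at the output distribution at initialization and says nothing about training dynamics.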
Unfortunately, deriving the infinite-width limit of a finite network requires significant mathematical expertise and has to be worked out separately for each architecture studied. Once the infinite-width model is derived, coming up with an efficient and scalable implementation further requires significant engineering proficiency. Together, the process of taking a finite-width model to its corresponding infinite-width network could take months and might be the topic of a research paper in its own right.

To address this issue and to accelerate theoretical progress in deep learning, we present Neural Tangents, a new open-source software library written in JAX that allows researchers to build and train infinitely wide neural networks as easily as finite neural networks. At its core, Neural Tangents provides an easy-to-use neural network library that builds finite- and infinite-width versions of neural networks simultaneously.

As an example of the utility of Neural Tangents, imagine training a fully-connected neural network on some data. Normally, a neural network is randomly initialized and then trained using gradient descent. Initializing and training many of these neural networks results in an ensemble. Often researchers and practitioners average the predictions from different members of the ensemble together for better performance. Additionally, the variance in the predictions of members of the ensemble can be used to estimate uncertainty. The downside is that training an ensemble of networks requires a significant computational budget, so it is often avoided. However, when the neural networks become infinitely wide, the ensemble is described by a Gaussian process with a mean and variance that can be computed throughout training.
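The ensemble statistics described above can be sketched in a few lines. This assumes the per-member predictions have already been collected from separately trained networks; the function name and the toy numbers are hypothetical and only illustrate the bookkeeping, not any Neural Tangents API.

```python
from statistics import fmean, pvariance


def ensemble_mean_and_variance(member_predictions):
    """Combine predictions from several independently trained networks.

    member_predictions[m][i] is the prediction of ensemble member m on
    test point i. Returns per-point means and population variances.
    """
    per_point = list(zip(*member_predictions))  # regroup by test point
    means = [fmean(values) for values in per_point]
    variances = [pvariance(values) for values in per_point]
    return means, variances


# Hypothetical predictions from three ensemble members on two test points.
predictions = [
    [1.0, 2.0],
    [1.0, 4.0],
    [1.0, 6.0],
]
means, variances = ensemble_mean_and_variance(predictions)
print(means, variances)
```

The members agree on the first test point, so its variance is zero, while their disagreement on the second point shows up as a larger variance, which is the quantity used as an uncertainty estimate.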

With Neural Tangents, one can construct and train ensembles of these infinite-width networks at once using only five lines of code! The resulting training process is displayed below, and an interactive Colaboratory notebook going through this experiment can be found here.
In both plots we compare training of an ensemble of finite neural networks with the infinite-width ensemble of the same architecture. The empirical mean and variance of the finite ensemble are displayed as a dashed black line between two dotted black lines. The closed-form mean and variance of the infinite-width ensemble are displayed as a solid colored line inside a filled color region. In both plots finite- and infinite-width ensembles match very closely and can be hard to distinguish. Left: Outputs (vertical f-axis) on the input data (horizontal x-axis) as the training progresses. Right: Train and test loss with uncertainty over the course of training.
Despite the fact that the infinite-width ensemble is governed by a simple closed-form expression, it exhibits remarkable agreement with the finite-width ensemble. And since the infinite-width ensemble is a Gaussian process, it naturally provides closed-form uncertainty estimates (filled colored regions in the figure above). These uncertainty estimates closely match the variation of predictions that are observed when training many different copies of the finite network (dashed lines).

The above example shows the power of infinite-width neural networks to capture training dynamics. However, networks built using Neural Tangents can be applied to any problem on which you could apply a regular neural network. For example, below we compare three different infinite-width neural network architectures on image recognition using the CIFAR-10 dataset. Remarkably, we can evaluate ensembles of highly elaborate models like infinitely wide residual networks in closed form under both gradient descent and fully Bayesian inference (an intractable task in the finite-width regime).
We see that, mimicking finite neural networks, infinite-width networks follow a similar hierarchy of performance, with fully-connected networks performing worse than convolutional networks, which in turn perform worse than wide residual networks. However, unlike regular training, the learning dynamics of these models are completely tractable in closed form, which allows unprecedented insight into their behavior.

We invite everyone to explore the infinite-width versions of their models with Neural Tangents, and help us open the black box of deep learning. To get started, please check out the paper, the tutorial Colab notebook, and the GitHub repo. Contributions, feature requests, and bug reports are very welcome. This work has been accepted as a spotlight at ICLR 2020.

Neural Tangents is being actively developed by Lechao Xiao, Roman Novak, Jiri Hron, Jaehoon Lee, Alex Alemi, Jascha Sohl-Dickstein, and Samuel S. Schoenholz. We also thank Yasaman Bahri and Greg Yang for their ongoing contributions to improve the library, as well as Sergey Ioffe, Ben Adlam, Ravid Ziv, and Jeffrey Pennington for frequent discussion and useful feedback. Finally, we thank Tom Small for creating the animation in the first figure.

Source: Google AI Blog

WebAssembly brings extensibility to network proxies

With the Istio 1.5 release we are happy to introduce WebAssembly (Wasm) extensions in Envoy, built with our long-running collaborators Lyft and IBM. With partners like Solo.io deepening their engagement as well, we are excited to see the community and ecosystem developing around this segment of the open source world.

The Envoy service proxy has taken the Cloud Native landscape by storm since it was open sourced by Lyft in 2016, quickly becoming a fixture in modern app deployment—both at the edge and as a sidecar. Since Google and IBM started the Istio project and selected Envoy as the proxy of choice for service mesh, we have been working with the Envoy community to improve performance and add functionality. In fact, Google now commits more code to Envoy than any other company.

Envoy has always had an extension mechanism, either with compiled-in C++ modules or Lua scripts—both with downsides. One of our design goals with Istio was to bring ease of extensibility to allow an ecosystem of policy, telemetry, and logging systems. We did this with a control plane component and out-of-process adapters that could be written in any language, but this approach introduced additional network hops and latency.

This is where Wasm comes in. Wasm is a binary instruction format, compilable from over 30 languages, with a runtime to execute it in a sandboxed environment. It is already embedded in all major browsers, with a W3C working group defining the standards, and we are now bringing it server-side via Envoy. It allows adding functionality to the Envoy proxy without recompiling it, without forking, and without difficult rollouts. Istio can distribute extensions to proxies and load them without even restarting. This really brings together the best of both worlds in terms of extensibility: choice of language and great performance.

“I am extremely excited to see Wasm support land in Envoy; this is the future of Envoy extensibility, full stop. Envoy’s Wasm support coupled with a community driven hub will unlock an incredible amount of innovation in the networking space across both service mesh and API gateway use cases. I can’t wait to see what the community builds moving forward.” – Matt Klein, Envoy creator

Our partner Solo.io has been working hard to make developing Wasm extensions a great experience. Solo.io recently announced WebAssembly Hub, a service for building, sharing, discovering and deploying Wasm extensions. With the WebAssembly Hub, Wasm extensions are as easy to manage, install and run as containers.

“We are committed to creating the most user friendly developer experience for service mesh. Like Docker did for containers, our goal is to simplify the consumption of WebAssembly extensions, which is the ‘why’ behind WebAssembly Hub. By working with Google and the Istio open source community, we are able to simplify the experience of creating, sharing and deploying WebAssembly extensions to Envoy proxy and Istio, to bring the power of WebAssembly to more languages, and to enable a broader set of developers to innovate on service mesh,” said Idit Levine, CEO and Founder, Solo.io.

One major retailer is looking to use Wasm to integrate with their policy system as they standardize use of Envoy—at the edge, as a sidecar, and even in their stores. The ability to roll out a policy change that is enforced everywhere they serve traffic, all with a great developer experience, makes Wasm a very attractive option for them.

By Dan Ciruli, Istio

Measuring Compositional Generalization

People are capable of learning the meaning of a new word and then applying it to other language contexts. As Lake and Baroni put it, “Once a person learns the meaning of a new verb ‘dax’, he or she can immediately understand the meaning of ‘dax twice’ and ‘sing and dax’.” Similarly, one can learn a new object shape and then recognize it with different compositions of previously learned colors or materials (e.g., in the CLEVR dataset). This is because people exhibit the capacity to understand and produce a potentially infinite number of novel combinations of known components, or as Chomsky said, to make “infinite use of finite means.” In the context of a machine learning model learning from a set of training examples, this skill is called compositional generalization.

A common approach for measuring compositional generalization in machine learning (ML) systems is to split the training and testing data based on properties that intuitively correlate with compositional structure. For instance, one approach is to split the data based on sequence length: the training set consists of short examples, while the test set consists of longer examples. Another approach uses sequence patterns, meaning the split is based on randomly assigning clusters of examples sharing the same pattern to either train or test sets. For instance, the questions "Who directed Movie1" and "Who directed Movie2" both fall into the pattern "Who directed <MOVIE>" so they would be grouped together. Yet another method uses held-out primitives: some linguistic primitives are shown very rarely during training (e.g., the verb “jump”), but are very prominent in testing. While each of these experiments is useful, it is not immediately clear which experiment is a "better" measure for compositionality. Is it possible to systematically design an “optimal” compositional generalization experiment?
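To make the first two splitting strategies concrete, here is a minimal sketch. The function names, the whitespace tokenization, and the `mask_movie` pattern function are hypothetical choices for illustration; real experiments derive patterns from a grammar or parser rather than from string replacement.

```python
import random
from collections import defaultdict


def split_by_length(examples, max_train_tokens):
    """Length split: short examples go to train, longer ones to test."""
    train = [e for e in examples if len(e.split()) <= max_train_tokens]
    test = [e for e in examples if len(e.split()) > max_train_tokens]
    return train, test


def split_by_pattern(examples, to_pattern, test_fraction=0.5, seed=0):
    """Pattern split: examples sharing a pattern always land on the same side."""
    groups = defaultdict(list)
    for example in examples:
        groups[to_pattern(example)].append(example)
    patterns = sorted(groups)
    random.Random(seed).shuffle(patterns)  # random assignment of whole clusters
    num_test = int(len(patterns) * test_fraction)
    test_patterns = set(patterns[:num_test])
    train, test = [], []
    for pattern, members in groups.items():
        (test if pattern in test_patterns else train).extend(members)
    return train, test


def mask_movie(question):
    """Hypothetical pattern function: replace movie placeholders by <MOVIE>."""
    for name in ("Movie1", "Movie2"):
        question = question.replace(name, "<MOVIE>")
    return question


questions = [
    "Who directed Movie1",
    "Who directed Movie2",
    "Who produced Movie1",
    "Who produced Movie2",
]
train, test = split_by_pattern(questions, mask_movie)
print(train, test)
```

Because assignment happens per pattern rather than per example, "Who directed Movie1" and "Who directed Movie2" can never be separated across train and test.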

In “Measuring Compositional Generalization: A Comprehensive Method on Realistic Data”, we attempt to address this question by introducing the largest and most comprehensive benchmark for compositional generalization using realistic natural language understanding tasks, specifically, semantic parsing and question answering. In this work, we propose a metric, compound divergence, that allows one to quantitatively assess how much a train-test split measures the compositional generalization ability of an ML system. We analyze the compositional generalization ability of three sequence-to-sequence ML architectures, and find that they fail to generalize compositionally. We are also releasing the Compositional Freebase Questions dataset used in the work as a resource for researchers wishing to improve upon these results.

Measuring Compositionality

In order to measure the compositional generalization ability of a system, we start with the assumption that we understand the underlying principles of how examples are generated. For instance, we begin with the grammar rules to which we must adhere when generating questions and answers. We then draw a distinction between atoms and compounds. Atoms are the building blocks that are used to generate examples and compounds are concrete (potentially partial) compositions of these atoms. For example, in the figure below, every box is an atom (e.g., Shane Steel, brother, <entity>'s <entity>, produce, etc.), and these atoms fit together to form compounds, such as produce and <verb>, Shane Steel’s brother, Did Shane Steel’s brother produce and direct Revenge of the Spy?, etc.
Building compositional sentences (compounds) from building blocks (atoms)

An ideal compositionality experiment then should have a similar atom distribution, i.e., the distribution of words and sub-phrases in the training set is as similar as possible to their distribution in the test set, but with a different compound distribution. To measure compositional generalization on a question answering task about a movie domain, one might, for instance, have the following questions in train and test:

Train set | Test set
Who directed Inception? | Did Greta Gerwig direct Goldfinger?
Did Greta Gerwig produce Goldfinger? | Who produced Inception?
While atoms such as “directed”, “Inception”, and “who <predicate> <entity>” appear in both the train and test sets, the compounds are different.

The Compositional Freebase Questions dataset

In order to conduct an accurate compositionality experiment, we created the Compositional Freebase Questions (CFQ) dataset, a simple yet realistic large dataset of natural language questions and answers generated from the public Freebase knowledge base. CFQ can be used for text-in / text-out tasks, as well as semantic parsing. In our experiments, we focus on semantic parsing, where the input is a natural language question and the output is a query which, when executed against Freebase, produces the correct outcome. CFQ contains around 240k examples and almost 35k query patterns, making it significantly larger and more complex than comparable datasets: about 4 times that of WikiSQL, with about 17 times more query patterns than Complex Web Questions. Special care has been taken to ensure that the questions and answers are natural. We also quantify the complexity of the syntax in each example using the “complexity level” metric (L), which corresponds roughly to the depth of the parse tree, examples of which are shown below.

L | Question → Answer
10 | What did Commerzbank acquire? → Eurohypo; Dresdner Bank
15 | Did Dianna Rhodes’s spouse produce Soldier Blue? → No
20 | Which costume designer of E.T. married Mannequin’s cinematographer? → Deborah Lynn Scott
40 | Was Weekend Cowgirls produced, directed, and written by a film editor that The Evergreen State College and Fairway Pictures employed? → No
50 | Were It’s Not About the Shawerma, The Fifth Wall, Rick’s Canoe, White Stork Is Coming, and Blues for the Avatar executive produced, edited, directed, and written by a screenwriter’s parent? → Yes

Compositional Generalization Experiments on CFQ

For a given train-test split, if the compound distributions of the train and test sets are very similar, then the compound divergence is close to 0, indicating that the split is not a difficult test for compositional generalization. A compound divergence close to 1 means that the train and test sets have many different compounds, which makes the split a good test for compositional generalization. Compound divergence thus captures the notion of "different compound distribution", as desired.
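This post does not spell out the formula, so the sketch below rests on an assumption: that divergence is defined as one minus the Chernoff coefficient, sum over k of p_k^alpha * q_k^(1 - alpha), between the train distribution p and the test distribution q, with alpha = 0.5 for atoms and alpha = 0.1 for compounds. That is our reading of the accompanying paper and should be checked against it. The sketch also omits the paper's compound weighting, and it reduces the four movie questions above to hand-written (verb, movie) pairs, which is a hypothetical simplification.

```python
from collections import Counter


def distribution(items):
    """Relative frequency of each item in a list."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def divergence(train_items, test_items, alpha):
    """1 minus the Chernoff coefficient sum_k p_k**alpha * q_k**(1 - alpha).

    p is the train distribution and q the test distribution (assumed form).
    """
    p = distribution(train_items)
    q = distribution(test_items)
    coefficient = sum(
        p.get(k, 0.0) ** alpha * q.get(k, 0.0) ** (1.0 - alpha)
        for k in set(p) | set(q)
    )
    return 1.0 - coefficient


# Hypothetical (verb, movie) compounds standing in for the four questions.
train = [("direct", "Inception"), ("produce", "Goldfinger")]
test = [("direct", "Goldfinger"), ("produce", "Inception")]

# Atoms are the individual verbs and movies that make up each compound.
train_atoms = [atom for compound in train for atom in compound]
test_atoms = [atom for compound in test for atom in compound]

atom_divergence = divergence(train_atoms, test_atoms, alpha=0.5)
compound_divergence = divergence(train, test, alpha=0.1)
print(atom_divergence, compound_divergence)
```

In this toy split every atom occurs equally often on both sides, so the atom divergence is zero, while no compound is shared, so the compound divergence reaches its maximum of one.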

We algorithmically generate train-test splits using the CFQ dataset that have a compound divergence ranging from 0 to 0.7 (the maximum that we were able to achieve). We fix the atom divergence to be very small. Then, for each split we measure the performance of three standard ML architectures — LSTM+attention, Transformer, and Universal Transformer. The results are shown in the graph below.
Compound divergence vs accuracy for three ML architectures. There is a surprisingly strong negative correlation between compound divergence and accuracy.

We measure the performance of a model by comparing the correct answers with the output string given by the model. All models achieve an accuracy greater than 95% when the compound divergence is very low. The mean accuracy on the split with highest compound divergence is below 20% for all architectures, which means that even a large training set with a similar atom distribution between train and test is not sufficient for the architectures to generalize well. For all architectures, there is a strong negative correlation between the compound divergence and the accuracy. This seems to indicate that compound divergence successfully captures the core difficulty for these ML architectures to generalize compositionally.
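The string comparison described above amounts to exact-match accuracy. A minimal sketch follows; the function name, the whitespace stripping, and the placeholder strings are assumptions for illustration and are not taken from the CFQ evaluation code.

```python
def exact_match_accuracy(predictions, targets):
    """Fraction of predictions whose output string equals the target exactly."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if not targets:
        return 0.0
    # Assumption: leading and trailing whitespace is ignored when comparing.
    matches = sum(p.strip() == t.strip() for p, t in zip(predictions, targets))
    return matches / len(targets)


# Hypothetical placeholder strings standing in for generated queries.
predicted = ["query_a", "query_b", "query_x", "query_d"]
expected = ["query_a", "query_b", "query_c", "query_d"]
accuracy = exact_match_accuracy(predicted, expected)
print(accuracy)
```

Exact match is a strict criterion: an output that is semantically equivalent to the target but written differently still counts as wrong.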

Potentially promising directions for future work might be to apply unsupervised pre-training on input language or output queries, or to use more diverse or more targeted learning architectures, such as syntactic attention. It would also be interesting to apply this approach to other domains such as visual reasoning, e.g. based on CLEVR, or to extend our approach to broader subsets of language understanding, including the use of ambiguous constructs, negations, quantification, comparatives, additional languages, and other vertical domains. We hope that this work will inspire others to use this benchmark to advance the compositional generalization capabilities of learning systems.

By Marc van Zee, Software Engineer, Google Research – Brain Team