Tag Archives: machine learning

Eager Execution: An imperative, define-by-run interface to TensorFlow



Today, we introduce eager execution for TensorFlow. Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive.

The benefits of eager execution include:
  • Fast debugging with immediate run-time errors and integration with Python tools
  • Support for dynamic models using easy-to-use Python control flow
  • Strong support for custom and higher-order gradients
  • Almost all of the available TensorFlow operations
Eager execution is available now as an experimental feature, so we're looking for feedback from the community to guide our direction.

To understand this all better, let's look at some code. This gets pretty technical; familiarity with TensorFlow will help.

Using Eager Execution

When you enable eager execution, operations execute immediately and return their values to Python without requiring a Session.run(). For example, to multiply two matrices together, we write this:
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

x = [[2.]]
m = tf.matmul(x, x)
It’s straightforward to inspect intermediate results with print or the Python debugger.
print(m)
# The 1x1 matrix [[4.]]
Dynamic models can be built with Python flow control. Here's an example of the Collatz conjecture using TensorFlow’s arithmetic operations:
a = tf.constant(12)
counter = 0
while not tf.equal(a, 1):
  if tf.equal(a % 2, 0):
    a = a / 2
  else:
    a = 3 * a + 1
  print(a)
Here, the use of the tf.constant(12) Tensor object promotes all math operations to tensor operations, and as such all return values will be tensors.

Gradients

Most TensorFlow users are interested in automatic differentiation. Because different operations can occur during each call, we record all forward operations to a tape, which is then played backwards when computing gradients. After we've computed the gradients, we discard the tape.

If you’re familiar with the autograd package, the API is very similar. For example:
def square(x):
  return tf.multiply(x, x)

grad = tfe.gradients_function(square)

print(square(3.)) # [9.]
print(grad(3.)) # [6.]
The gradients_function call takes a Python function square() as an argument and returns a Python callable that computes the partial derivatives of square() with respect to its inputs. So, to get the derivative of square() at 3.0, invoke grad(3.0), which is 6.

The same gradients_function call can be used to get the second derivative of square:
gradgrad = tfe.gradients_function(lambda x: grad(x)[0])

print(gradgrad(3.)) # [2.]
As we noted, control flow can cause different operations to run, such as in this example.
def abs(x):
  return x if x > 0. else -x

grad = tfe.gradients_function(abs)

print(grad(2.0)) # [1.]
print(grad(-2.0)) # [-1.]

Custom Gradients

Users may want to define custom gradients for an operation, or for a function. This may be useful for multiple reasons, including providing a more efficient or more numerically stable gradient for a sequence of operations.

Here is an example that illustrates the use of custom gradients. Let's start by looking at the function log(1 + e^x), which commonly occurs in the computation of cross entropy and log likelihoods.
def log1pexp(x):
  return tf.log(1 + tf.exp(x))
grad_log1pexp = tfe.gradients_function(log1pexp)

# The gradient computation works fine at x = 0.
print(grad_log1pexp(0.))
# [0.5]
# However it returns a `nan` at x = 100 due to numerical instability.
print(grad_log1pexp(100.))
# [nan]
We can use a custom gradient for the above function that analytically simplifies the gradient expression. Notice how the gradient function implementation below reuses an expression (tf.exp(x)) that was computed during the forward pass, making the gradient computation more efficient by avoiding redundant computation.
@tfe.custom_gradient
def log1pexp(x):
  e = tf.exp(x)
  def grad(dy):
    return dy * (1 - 1 / (1 + e))
  return tf.log(1 + e), grad
grad_log1pexp = tfe.gradients_function(log1pexp)

# Gradient at x = 0 works as before.
print(grad_log1pexp(0.))
# [0.5]
# And now gradient computation at x=100 works as well.
print(grad_log1pexp(100.))
# [1.0]

Building models

Models can be organized in classes. Here's a model class that creates a (simple) two layer network that can classify the standard MNIST handwritten digits.
class MNISTModel(tfe.Network):
  def __init__(self):
    super(MNISTModel, self).__init__()
    self.layer1 = self.track_layer(tf.layers.Dense(units=10))
    self.layer2 = self.track_layer(tf.layers.Dense(units=10))

  def call(self, input):
    """Actually runs the model."""
    result = self.layer1(input)
    result = self.layer2(result)
    return result
We recommend using the classes (not the functions) in tf.layers since they create and contain model parameters (variables). Variable lifetimes are tied to the lifetime of the layer objects, so be sure to keep track of them.

Why are we using tfe.Network? A Network is a container for layers and is a tf.layers.Layer itself, allowing Network objects to be embedded in other Network objects. It also contains utilities to assist with inspection, saving, and restoring.

Even without training the model, we can imperatively call it and inspect the output:
# Let's make up a blank input image
model = MNISTModel()
batch = tf.zeros([1, 1, 784])
print(batch.shape)
# (1, 1, 784)
result = model(batch)
print(result)
# tf.Tensor([[[ 0. 0., ...., 0.]]], shape=(1, 1, 10), dtype=float32)
Note that we do not need any placeholders or sessions. The first time we pass in the input, the sizes of the layers’ parameters are set.

To train any model, we define a loss function to optimize, calculate gradients, and use an optimizer to update the variables. First, here's a loss function:
def loss_function(model, x, y):
  y_ = model(x)
  return tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_)
And then, our training loop:
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
for (x, y) in tfe.Iterator(dataset):
  grads = tfe.implicit_gradients(loss_function)(model, x, y)
  optimizer.apply_gradients(grads)
implicit_gradients() calculates the derivatives of loss_function with respect to all the TensorFlow variables used during its computation.

We can move computation to a GPU the same way we’ve always done with TensorFlow:
with tf.device("/gpu:0"):
  for (x, y) in tfe.Iterator(dataset):
    optimizer.minimize(lambda: loss_function(model, x, y))
(Note: We're shortcutting storing our loss and directly calling the optimizer.minimize, but you could also use the apply_gradients() method above; they are equivalent.)

Using Eager with Graphs

Eager execution makes development and debugging far more interactive, but TensorFlow graphs have a lot of advantages with respect to distributed training, performance optimizations, and production deployment.

The same code that executes operations when eager execution is enabled will construct a graph describing the computation when it is not. To convert your models to graphs, simply run the same code in a new Python session where eager execution hasn’t been enabled, as demonstrated in the MNIST example. The value of model variables can be saved and restored from checkpoints, allowing us to move between eager (imperative) and graph (declarative) programming easily. With this, models developed with eager execution enabled can be easily exported for production deployment.
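To make the switch concrete, here is a minimal sketch (an illustrative addition, not code from the original post) of the matrix multiplication from the beginning of this article running in graph mode, assuming a fresh Python session where tfe.enable_eager_execution() has not been called:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[1, 1])  # graph-mode input, fed at run time
m = tf.matmul(x, x)                           # adds a node to the graph; nothing runs yet

with tf.Session() as sess:
  print(sess.run(m, feed_dict={x: [[2.]]}))   # [[4.]]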

In the near future, we will provide utilities to selectively convert portions of your model to graphs. In this way, you can fuse parts of your computation (such as the internals of a custom RNN cell) for high performance, while keeping the flexibility and readability of eager execution.

How does my code change?

Using eager execution should be intuitive to current TensorFlow users. There are only a handful of eager-specific APIs; most of the existing APIs and operations work with eager enabled. Some notes to keep in mind:
  • As with TensorFlow generally, we recommend that if you have not yet switched from queues to tf.data for input processing, you should; it's easier to use and usually faster (see the short sketch after this list). For help, see this blog post and the documentation page.
  • Use object-oriented layers, like tf.layers.Conv2D() or Keras layers; these have explicit storage for variables.
  • For most models, you can write code so that it will work the same for both eager execution and graph construction. There are some exceptions, such as dynamic models that use Python control flow to alter the computation based on inputs.
  • Once you invoke tfe.enable_eager_execution(), it cannot be turned off. To get graph behavior, start a new Python session.
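As a concrete illustration of the tf.data recommendation in the first bullet above, here is a minimal sketch with eager execution enabled; the in-memory arrays, batch size, and names are illustrative assumptions rather than code from the original post:
import numpy as np
import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

# Toy stand-in data; in practice these would come from your real input pipeline.
images = np.random.rand(32, 784).astype(np.float32)
labels = np.random.randint(0, 10, size=32).astype(np.int32)

dataset = (tf.data.Dataset.from_tensor_slices((images, labels))
           .shuffle(buffer_size=32)
           .batch(8))

for x, y in tfe.Iterator(dataset):
  print(x.shape, y.shape)  # (8, 784) (8,)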

Getting started and the future

This is still a preview release, so you may hit some rough edges. To get started today:
There's a lot more to talk about with eager execution and we're excited… or, rather, we're eager for you to try it today! Feedback is absolutely welcome.

Closing the Simulation-to-Reality Gap for Deep Robotic Learning



Each of us can learn remarkably complex skills that far exceed the proficiency and robustness of even the most sophisticated robots, when it comes to basic sensorimotor skills like grasping. However, we also draw on a lifetime of experience, learning over the course of multiple years how to interact with the world around us. Requiring such a lifetime of experience for a learning-based robot system is quite burdensome: the robot would need to operate continuously, autonomously, and initially at a low level of proficiency before it can become useful. Fortunately, robots have a powerful tool at their disposal: simulation.

Simulating many years of robotic interaction is quite feasible with modern parallel computing, physics simulation, and rendering technology. Moreover, the resulting data comes with automatically-generated annotations, which is particularly important for tasks where success is hard to infer automatically. The challenge with simulated training is that even the best available simulators do not perfectly capture reality. Models trained purely on synthetic data fail to generalize to the real world, as there is a discrepancy between simulated and real environments, in terms of both visual and physical properties. In fact, the more we increase the fidelity of our simulations, the more effort we have to expend in order to build them, both in terms of implementing complex physical phenomena and in terms of creating the content (e.g., objects, backgrounds) to populate these simulations. This difficulty is compounded by the fact that powerful optimization methods based on deep learning are exceptionally proficient at exploiting simulator flaws: the more powerful the machine learning algorithm, the more likely it is to discover how to "cheat" the simulator to succeed in ways that are infeasible in the real world. The question then becomes: how can a robot utilize simulation to enable it to perform useful tasks in the real world?

The difficulty of transferring simulated experience into the real world is often called the "reality gap." The reality gap is a subtle but important discrepancy between reality and simulation that prevents simulated robotic experience from directly enabling effective real-world performance. Visual perception often constitutes the widest part of the reality gap: while simulated images continue to improve in fidelity, the peculiar and pathological regularities of synthetic pictures, together with the wide, unpredictable diversity of real-world images, make bridging the reality gap particularly difficult when the robot must use vision to perceive the world, as is the case, for example, in many manipulation tasks. Recent advances in closing the reality gap with deep learning in computer vision for tasks such as object classification and pose estimation provide promising solutions. For example, Shrivastava et al. and Bousmalis et al. explored pixel-level domain adaptation, while Ganin et al. and Bousmalis and Trigeorgis et al. focused on feature-level domain adaptation. These advances required a rethinking of the approaches used to solve the simulation-to-reality domain shift problem for robotic manipulation as well. Although a number of recent works have sought to address the reality gap in robotics, through techniques such as machine learning-based domain adaptation (Tzeng et al.) and randomization of simulated environments (Sadeghi and Levine), effective transfer in robotic manipulation has been limited to relatively simple tasks, such as grasping rectangular, brightly-colored objects (Tobin et al. and James et al.) and free-space motion (Christiano et al.). In this post, we describe how learning in simulation, in our case PyBullet, combined with domain adaptation methods (machine learning techniques that deal with the simulation-to-reality domain shift), can accelerate learning of robotic grasping in the real world. This approach can enable real robots to grasp a large variety of physical objects, unseen during training, with a high degree of proficiency.

The performance effect of using 8 million simulated samples of procedural objects with no randomization and various amounts of real data.

Before we consider introducing simulated experience, what does it take for our robots to learn to reliably grasp such not-before-seen objects with only real-world experience? In a previous post, we discussed how the Google Brain team and X’s robotics teams teach robots how to grasp a variety of ordinary objects by just using images from a single monocular camera. It takes tens to hundreds of thousands of grasp attempts, the equivalent of thousands of robot-hours of real-world experience. Although distributing the learning across multiple robots expedites this, the realities of real-world data collection, including maintenance and wear-and-tear, mean that these kinds of data collection efforts still take a significant amount of real time. As mentioned above, an appealing alternative is to use off-the-shelf simulators and learn basic sensorimotor skills like grasping in a virtual environment. Training a robot how to grasp in simulation can be parallelized easily over any number of machines, and can provide large amounts of experience in dramatically less time (e.g., hours rather than months) and at a fraction of the cost.


If the goal is to bridge the reality gap for vision-based robotic manipulation, we must answer a few critical questions. First, how do we design simulation so that simulated experience appears realistic to a neural network? And second, how should we integrate simulated and real experience in a way that maximizes transfer to the real world? We studied these questions in the context of a particularly challenging and important robotic manipulation task: vision-based grasping of diverse objects. We extensively evaluated the effect of various simulation design decisions in combination with various techniques for integrating simulated and real experience for maximal performance.
The setup we used for collecting the simulated and real-world datasets.

Images used during training of simulated grasping experience with procedurally generated objects (left) and of real-world experience with a varied collection of everyday physical objects (right). In both cases, we see pairs of image inputs with and without the robot arm present.
When it comes to simulation, there are a number of choices we have to make: the type of objects to use for simulated grasping, whether to use appearance and/or dynamics randomization, and whether to extract any additional information from the simulator that could aid adaptation to the real world. The type of objects we use in simulation is a particularly important choice. A question that comes naturally is: how realistic do the objects used in simulation need to be? Using randomly generated procedural objects is the most desirable choice, because these objects are generated effortlessly on demand, and are easy to parameterize if we change the requirements of the task. However, they are not realistic, and one could imagine they might not be useful for transferring the experience of grasping them to the real world. Using realistic 3D object models from a publicly available model library, such as the widely used ShapeNet, is another choice, which, however, ties our findings to the characteristics of the specific models we use. In this work, we compared the effect of using procedurally generated objects and realistic objects from the ShapeNet model repository, and found that simply using random objects generated programmatically was not just sufficient for efficient experience transfer from simulation to reality, but also generalized better to the real world than using ShapeNet ones.

Some of the procedurally-generated objects used in simulation.

Some of the ShapeNet objects used in simulation.

Some of the physical objects used to collect real grasping experience.
Another decision about our simulated environment has to do with the randomization of the simulation. Simulation randomization has shown promise in providing generalization to real-world environments in previous work. We evaluated it further by separately measuring the effect of appearance randomization (randomly changing textures of different visual components of the virtual environment) and dynamics randomization (randomly changing object mass and friction properties). For our task, visual randomization had a positive effect when we did not use domain adaptation methods to aid with generalization, and had no effect when we included domain adaptation. Using dynamics randomization did not show a significant improvement for this particular task; however, it is possible that dynamics randomization might be more relevant in other tasks. These results suggest that, although randomization can be an important part of simulation-to-real-world transfer, the inclusion of effective domain adaptation can have a substantially more pronounced impact for vision-based manipulation tasks.

Appearance randomization in simulation.
Finally, the information we choose to extract and use for our domain adaptation methods has a significant impact on performance. In one of our proposed methods, we utilize the extracted semantic map of the simulated image, i.e., the semantic class of each pixel in the simulated image, and use it to ground our proposed domain adaptation approach to produce semantically meaningful, realistic samples, as we discuss below.

Our main proposed approach to integrating simulated and real experience, which we call GraspGAN, takes as input synthetic images generated by a simulator, along with their semantic maps, and produces adapted images that look similar to real-world ones. This is possible with adversarial training, a powerful idea proposed by Goodfellow et al. In our framework, a convolutional neural network, the generator, takes as input synthetic images and generates images that another neural network, the discriminator, cannot distinguish from actual real images. The generator and discriminator networks are trained simultaneously and improve together, resulting in a generator that can produce images that are both realistic and useful for learning a grasping model that will generalize to the real world. One way to make sure that these images are useful is the use of the semantic maps of the synthetic images to ground the generator. By using the prediction of these masks as an auxiliary task, the generator is encouraged to produce meaningful adapted images that correspond to the original label attributed to the simulated experience. We train a deep vision-based grasping model with both visually-adapted simulated and real images, and attempt to account for the domain shift further by using a feature-level domain adaptation technique which helps produce a domain-invariant model. See below the GraspGAN adapting simulated images to realistic ones and a semantic map it infers.
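To make the adversarial setup more concrete, here is a heavily simplified sketch of the generator/discriminator objectives in TensorFlow. This is not the GraspGAN architecture: the tiny networks, the two-class mask head, the image size, and all names are illustrative assumptions.
import tensorflow as tf

def generator(sim_img):
  # Adapts a simulated image and also predicts its semantic mask as an auxiliary task.
  with tf.variable_scope("generator"):
    h = tf.layers.conv2d(sim_img, 32, 3, padding="same", activation=tf.nn.relu)
    adapted = tf.layers.conv2d(h, 3, 3, padding="same", activation=tf.nn.tanh)
    mask_logits = tf.layers.conv2d(h, 2, 3, padding="same")  # 2 illustrative classes
    return adapted, mask_logits

def discriminator(img, reuse=False):
  # Outputs a single logit: is this image a real photo or an adapted simulation?
  with tf.variable_scope("discriminator", reuse=reuse):
    h = tf.layers.conv2d(img, 32, 3, strides=2, activation=tf.nn.leaky_relu)
    return tf.layers.dense(tf.layers.flatten(h), 1)

sim = tf.placeholder(tf.float32, [None, 64, 64, 3])   # simulated images
real = tf.placeholder(tf.float32, [None, 64, 64, 3])  # unlabeled real images
mask = tf.placeholder(tf.int32, [None, 64, 64])       # semantic map from the simulator

adapted, mask_logits = generator(sim)
d_real = discriminator(real)
d_fake = discriminator(adapted, reuse=True)

# Discriminator: classify real images as 1 and adapted images as 0.
d_loss = (tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
              labels=tf.ones_like(d_real), logits=d_real)) +
          tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
              labels=tf.zeros_like(d_fake), logits=d_fake)))

# Generator: fool the discriminator while still predicting the simulator's
# semantic map, which grounds the adapted images in the original labels.
g_loss = (tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(
              labels=tf.ones_like(d_fake), logits=d_fake)) +
          tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(
              labels=mask, logits=mask_logits)))

g_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "generator")
d_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, "discriminator")
g_train = tf.train.AdamOptimizer(1e-4).minimize(g_loss, var_list=g_vars)
d_train = tf.train.AdamOptimizer(1e-4).minimize(d_loss, var_list=d_vars)
# The two train ops are alternated over batches of simulated and real images.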


By using synthetic data and domain adaptation we are able to reduce the number of real-world samples required to achieve a given level of performance by up to 50 times, using only randomly generated objects in simulation. This means that we have no prior information about the objects in the real world, other than pre-specified size limits for the graspable objects. We have shown that we are able to increase performance with various amounts of real-world data, and also that by using only unlabeled real-world data and our GraspGAN methodology, we obtain real-world grasping performance without any real-world labels that is similar to that achieved with hundreds of thousands of labeled real-world samples. This suggests that, instead of collecting labeled experience, it may be sufficient in the future to simply record raw unlabeled images, use them to train a GraspGAN model, and then learn the skills themselves in simulation.

Although this work has not addressed all the issues around closing the reality gap, we believe that our results show that using simulation and domain adaptation to integrate simulated and real robotic experience is an attractive choice for training robots. Most importantly, we have extensively evaluated the performance gains for different available amounts of labeled real-world samples, and for the different design choices for both the simulator and the domain adaptation methods used. This evaluation can hopefully serve as a guide for practitioners to use for their own design decisions and for weighing the advantages and disadvantages of incorporating such an approach in their experimental design.

This research was conducted by K. Bousmalis, A. Irpan, P. Wohlhart, Y. Bai, M. Kelcey, M. Kalakrishnan, L. Downs, J. Ibarz, P. Pastor, K. Konolige, S. Levine, and V. Vanhoucke, with special thanks to colleagues at Google Research and X who've contributed their expertise and time to this research. An early preprint is available on arXiv.

The collection of procedurally-generated objects we used in simulation was made publicly available here by Laura Downs.

TensorFlow lends a hand to build a rock-paper-scissors machine

Editor’s note: It’s hard to think of a more “analog” game than rock-paper-scissors. But this summer, one Googler decided to use TensorFlow, Google’s open source machine learning system, to build a machine that could play rock-paper-scissors. For more technical details and source code, see the original post on the Google Cloud Big Data and Machine Learning Blog.

This summer, my 12-year-old son and I were looking for a science project to do together. He’s interested in CS and has studied programming with Scratch, so we knew we wanted to do something involving coding. After exploring several ideas, we decided to build a rock-paper-scissors machine that detects a hand gesture, then selects the appropriate pose to respond: rock, paper, or scissors.

But our rock-paper-scissors machine had a secret ingredient: TensorFlow, which in this case runs a very simple ML algorithm that detects your hand posture through an Arduino microcontroller connected to the glove.


To build the machine’s hardware, we used littleBits—kid-friendly kits that include a wide variety of components like LEDs, motors, switches, sensors, and controllers—to attach three bend sensors to a plastic glove. When you bend your fingers while wearing the glove, the sensors output an electric signal. To read the signals from the bend sensors and control the machine’s dial, we used an Arduino and Servo module.
The hardware components of the rock-paper-scissors machine

After putting together the hardware component, we wrote the code to read data from the sensors. The Arduino module takes the input signal voltage it receives from the glove, then converts those signals to a series of three numbers.

Estimated probability distribution of rock (red), paper (green) and scissors (blue), as determined by TensorFlow and our linear model

The next step was to determine which combination of three numbers represents rock, paper or scissors. We wanted to do it in a way that could be flexible over time—for example if we wanted to capture more than three hand positions with many more sensors. So we created a linear model—a simple algebraic formula that many of you might have learned in high school or college—and used machine learning in TensorFlow to solve the formula based on the given sensor data and the expected results (rock, paper or scissors). What’s cool about this is that it’s like automated programming—we specify the input and output, and the computer generates the most important transformation in the middle.
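For readers who want to see what that looks like in code, here is a minimal sketch of such a softmax linear model in TensorFlow. This is not the original project’s code: the tiny in-line dataset, the learning rate, and all names are illustrative assumptions.
import numpy as np
import tensorflow as tf

sensor = tf.placeholder(tf.float32, [None, 3])  # three bend-sensor readings
label = tf.placeholder(tf.int32, [None])        # 0 = rock, 1 = paper, 2 = scissors

# A linear model: logits = sensor * W + b, one logit per hand pose.
W = tf.Variable(tf.zeros([3, 3]))
b = tf.Variable(tf.zeros([3]))
logits = tf.matmul(sensor, W) + b

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=label, logits=logits))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# Illustrative readings: rock bends all fingers, paper bends none, scissors bends two.
xs = np.array([[1., 1., 1.], [0., 0., 0.], [0., 1., 1.]], dtype=np.float32)
ys = np.array([0, 1, 2], dtype=np.int32)

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  for _ in range(500):
    sess.run(train_op, feed_dict={sensor: xs, label: ys})
  print(sess.run(tf.argmax(logits, 1), feed_dict={sensor: xs}))  # should print [0 1 2]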

Finally, we put it all together. Once we determine the hand’s posture, the Servo controls the machine hand to win the game. (Of course, if you were feeling competitive, you could always program the machine to let YOU win every time… but we would never do that.)

My son drawing the sign for the machine hand

Rock-paper-scissors probably isn’t what comes to mind when you think about ML, but this project demonstrates how ML can be useful for all kinds of programmers, regardless of the task—reducing human coding work and speeding up calculations. In this case, it also provided some family fun!

Fighting phishing with smarter protections

Editor’s note: October is Cybersecurity Awareness Month, and we're celebrating with a series of security announcements this week. This is the third post; read the first and second ones.

Online security is top of mind for everyone these days, and we’re more focused than ever on protecting you and your data on Google, in the cloud, on your devices, and across the web.


One of our biggest focuses is phishing, attacks that trick people into revealing personal information like their usernames and passwords. You may remember phishing scams as spammy emails from “princes” asking for money via wire-transfer. But things have changed a lot since then. Today’s attacks are often very targeted—this is called “spear-phishing”—more sophisticated, and may even seem to be from someone you know.


Even for savvy users, today’s phishing attacks can be hard to spot. That’s why we’ve invested in automated security systems that can analyze an internet’s worth of phishing attacks, detect subtle clues to uncover them, and help us protect our users in Gmail, as well as in other Google products, and across the web.


Our investments have enabled us to significantly decrease the volume of phishing emails that users and customers ever see. With our automated protections, account security (like security keys) and warnings, Gmail is the most secure email service today.


Here is a look at some of the systems that have helped us secure users over time, and enabled us to add brand new protections in the last year.

More data helps protect your data


The best protections against large-scale phishing operations are even larger-scale defenses. Safe Browsing and Gmail spam filters are effective because they have such broad visibility across the web. By automatically scanning billions of emails, webpages, and apps for threats, they enable us to see the clearest, most up-to-date picture of the phishing landscape.


We’ve trained our security systems to block known issues for years. But, new, sophisticated phishing emails may come from people’s actual contacts (yes, attackers are able to do this), or include familiar company logos or sign-in pages. Here’s one example:


Attacks like this can be really difficult for people to spot. But new insights from our automated defenses have enabled us to immediately detect, thwart and protect Gmail users from subtler threats like these as well.

Smarter protections for Gmail users, and beyond

Since the beginning of the year, we’ve added brand new protections that have reduced the volume of spam in people’s inboxes even further.

  • We now show a warning within Gmail’s Android and iOS apps if a user clicks a link to a phishing site that’s been flagged by Safe Browsing. These supplement the warnings we’ve shown on the web since last year.


  • We’ve built new systems that detect suspicious email attachments and submit them for further inspection by Safe Browsing. This protects all Gmail users, including G Suite customers, from malware that may be hidden in attachments.
  • We’ve also updated our machine learning models to specifically identify pages that look like common log-in pages and messages that contain spear-phishing signals.

Safe Browsing helps protect more than 3 billion devices from phishing, across Google and beyond. It hunts and flags malicious extensions in the Chrome Web Store, helps block malicious ads, helps power Google Play Protect, and more. And of course, Safe Browsing continues to show millions of red warnings about websites it considers dangerous or insecure in multiple browsers—Chrome, Firefox, Safari—and across many different platforms, including iOS and Android.


Layers of phishing protection


Phishing is a complex problem, and there isn’t a single, silver-bullet solution. That’s why we’ve provided additional protections for users for many years.

  • Since 2012, we’ve warned our users if their accounts are being targeted by government-backed attackers. We send thousands of these warnings each year, and we’ve continued to improve them so they are helpful to people.
  • This summer, we began to warn people before they linked their Google account to an unverified third-party app.
  • We first offered two-step verification in 2011, and later strengthened it in 2014 with Security Key, the most secure version of this type of protection. These features add extra protection to your account because attackers need more than just your username and password to sign in.

We’ll never stop working to keep your account secure with industry-leading protections. More are coming soon, so stay tuned.

Pixel Visual Core: image processing and machine learning on Pixel 2

The camera on the new Pixel 2 is packed full of great hardware, software and machine learning (ML), so all you need to do is point and shoot to take amazing photos and videos. One of the technologies that helps you take great photos is HDR+, which makes it possible to get excellent photos of scenes with a large range of brightness levels, from dimly lit landscapes to a very sunny sky.

HDR+ produces beautiful images, and we’ve evolved the algorithm that powers it over the past year to use the Pixel 2’s application processor efficiently, and enable you to take multiple pictures in sequence by intelligently processing HDR+ in the background. In parallel, we’ve also been working on creating hardware capabilities that enable significantly greater computing power—beyond existing hardware—to bring HDR+ to third-party photography applications.

To expand the reach of HDR+, handle the most challenging imaging and ML applications, and deliver lower-latency and even more power-efficient HDR+ processing, we’ve created Pixel Visual Core.

Pixel Visual Core is Google’s first custom-designed co-processor for consumer products. It’s built into every Pixel 2, and in the coming months, we’ll turn it on through a software update to enable more applications to use Pixel 2’s camera for taking HDR+ quality pictures.

Magnified image of Pixel Visual Core

Let's delve into the details for you technical folks out there: The centerpiece of Pixel Visual Core is the Google-designed Image Processing Unit (IPU)—a fully programmable, domain-specific processor designed from scratch to deliver maximum performance at low power. With eight Google-designed custom cores, each with 512 arithmetic logic units (ALUs), the IPU delivers raw performance of more than 3 trillion operations per second on a mobile power budget. Using Pixel Visual Core, HDR+ can run 5x faster while using less than one-tenth the energy of running on the application processor (AP). A key ingredient to the IPU’s efficiency is the tight coupling of hardware and software—our software controls many more details of the hardware than in a typical processor. Handing more control to the software makes the hardware simpler and more efficient, but it also makes the IPU challenging to program using traditional programming languages. To address this, the IPU leverages domain-specific languages that ease the burden on both developers and the compiler: Halide for image processing and TensorFlow for machine learning. A custom Google-made compiler optimizes the code for the underlying hardware.


In the coming weeks, we’ll enable Pixel Visual Core as a developer option in the developer preview of Android Oreo 8.1 (MR1). Later, we’ll enable it for all third-party apps using the Android Camera API, giving them access to the Pixel 2’s HDR+ technology. We can’t wait to see the beautiful HDR+ photography that you already get through your Pixel 2 camera become available in your favorite photography apps.

HDR+ will be the first application to run on Pixel Visual Core. Notably, because Pixel Visual Core is programmable, we’re already preparing the next set of applications. The great thing is that as we port more machine learning and imaging applications to use Pixel Visual Core, Pixel 2 will continuously improve. So keep an eye out!

TensorFlow Lattice: Flexibility Empowered by Prior Knowledge



(Cross-posted on the Google Open Source Blog)

Machine learning has made huge advances in many applications including natural language processing, computer vision and recommendation systems by capturing complex input/output relationships using highly flexible models. However, a remaining challenge is problems with semantically meaningful inputs that obey known global relationships, like “the estimated time to drive a road goes up if traffic is heavier, and all else is the same.” Flexible models like DNNs and random forests may not learn these relationships, and then may fail to generalize well to examples drawn from a different sampling distribution than the examples the model was trained on.

Today we present TensorFlow Lattice, a set of prebuilt TensorFlow Estimators that are easy to use, and TensorFlow operators to build your own lattice models. Lattices are multi-dimensional interpolated look-up tables (for more details, see [1-5]), similar to the look-up tables in the back of a geometry textbook that approximate a sine function. We take advantage of the look-up table’s structure, which can be keyed by multiple inputs to approximate an arbitrarily flexible relationship, to satisfy monotonic relationships that you specify in order to generalize better. That is, the look-up table values are trained to minimize the loss on the training examples, but in addition, adjacent values in the look-up table are constrained to increase along given directions of the input space, which makes the model outputs increase in those directions. Importantly, because they interpolate between the look-up table values, the lattice models are smooth and the predictions are bounded, which helps to avoid spuriously large or small predictions at test time.
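To make the look-up table idea concrete, here is a minimal sketch in plain Python/NumPy (an illustrative addition, not the TensorFlow Lattice API): a 2-D lattice is just a small table of values whose output is multilinearly interpolated between the entries, and constraining the entries to increase along an input makes the whole function monotonic in that input. The 2x2 table below is an illustrative assumption.
import numpy as np

# Look-up table values at the four corners of the unit square, chosen so the
# output increases along both inputs (a monotonic lattice).
table = np.array([[0.0, 0.4],   # values at (x0=0, x1=0) and (x0=0, x1=1)
                  [0.3, 1.0]])  # values at (x0=1, x1=0) and (x0=1, x1=1)

def lattice_2d(x0, x1):
  """Bilinear interpolation of the 2x2 look-up table at (x0, x1) in [0, 1]^2."""
  return ((1 - x0) * (1 - x1) * table[0, 0] +
          (1 - x0) * x1 * table[0, 1] +
          x0 * (1 - x1) * table[1, 0] +
          x0 * x1 * table[1, 1])

print(lattice_2d(0.5, 0.5))  # 0.425, a smooth blend of the four corner values
print(lattice_2d(0.9, 0.5))  # 0.605, larger than at x0=0.5: monotonic in x0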

How Lattice Models Help You
Suppose you are designing a system to recommend nearby coffee shops to a user. You would like the model to learn, “if two cafes are the same, prefer the closer one.” Below we show a flexible model (pink) that accurately fits some training data for users in Tokyo (purple), where there are many coffee shops nearby. The pink flexible model overfits the noisy training examples, and misses the overall trend that a closer cafe is better. If you used this pink model to rank test examples from Texas (blue), where businesses are spread farther out, you would find it acted strangely, sometimes preferring farther cafes!
Slice through a model’s feature space where all the other inputs stay the same and only distance changes. A flexible function (pink) that is accurate on training examples from Tokyo (purple) predicts that a cafe 10km-away is better than the same cafe if it was 5km-away. This problem becomes more evident at test-time if the data distribution has shifted, as shown here with blue examples from Texas where cafes are spread out more.
A monotonic flexible function (green) is both accurate on training examples and can generalize for Texas examples compared to non-monotonic flexible function (pink) from the previous figure.
In contrast, a lattice model, trained over the same example from Tokyo, can be constrained to satisfy such a monotonic relationship and result in a monotonic flexible function (green). The green line also accurately fits the Tokyo training examples, but also generalizes well to Texas, never preferring farther cafes.

In general, you might have many inputs about each cafe, e.g., coffee quality, price, etc. Flexible models have a hard time capturing global relationships of the form, “if all other inputs are equal, nearer is better,” especially in parts of the feature space where your training data is sparse and noisy. Machine learning models that capture prior knowledge (e.g., how inputs should impact the prediction) work better in practice, and are easier to debug and more interpretable.

Pre-built Estimators
We provide a range of lattice model architectures as TensorFlow Estimators. The simplest estimator we provide is the calibrated linear model, which learns the best 1-d transformation of each feature (using 1-d lattices), and then combines all the calibrated features linearly. This works well if the training dataset is very small, or there are no complex nonlinear input interactions. Another estimator is a calibrated lattice model. This model combines the calibrated features nonlinearly using a two-layer single lattice model, which can represent complex nonlinear interactions in your dataset. The calibrated lattice model is usually a good choice if you have 2-10 features, but for 10 or more features, we expect you will get the best results with an ensemble of calibrated lattices, which you can train using the pre-built ensemble architectures. Monotonic lattice ensembles can achieve a 0.3%-0.5% accuracy gain compared to Random Forests [4], and these new TensorFlow lattice estimators can achieve a 0.1%-0.4% accuracy gain compared to the prior state-of-the-art in learning models with monotonicity [5].

Build Your Own
You may want to experiment with deeper lattice networks or research using partial monotonic functions as part of a deep neural network or other TensorFlow architecture. We provide the building blocks: TensorFlow operators for calibrators, lattice interpolation, and monotonicity projections. For example, the figure below shows a 9-layer deep lattice network [5].
Example of a 9-layer deep lattice network architecture [5], alternating layers of linear embeddings and ensembles of lattices with calibrator layers (which act like a sum of ReLUs in neural networks). The blue lines correspond to monotonic inputs, and this monotonicity is preserved layer by layer, and hence for the entire model. This and other arbitrary architectures can be constructed with TensorFlow Lattice because each layer is differentiable.
In addition to the choice of model flexibility and standard L1 and L2 regularization, we offer new regularizers with TensorFlow Lattice:
  • Monotonicity constraints [3] on your choice of inputs as described above.
  • Laplacian regularization [3] on the lattices to make the learned function flatter.
  • Torsion regularization [3] to suppress unnecessary nonlinear feature interactions.
We hope TensorFlow Lattice will be useful to the larger community working with meaningful semantic inputs. This is part of a larger research effort on interpretability and controlling machine learning models to satisfy policy goals, and enable practitioners to take advantage of their prior knowledge. We’re excited to share this with all of you. To get started, please check out our GitHub repository and our tutorials, and let us know what you think!

Acknowledgements
Developing and open sourcing TensorFlow Lattice was a huge team effort. We’d like to thank all the people involved: Andrew Cotter, Kevin Canini, David Ding, Mahdi Milani Fard, Yifei Feng, Josh Gordon, Kiril Gorovoy, Clemens Mewald, Taman Narayan, Alexandre Passos, Christine Robson, Serena Wang, Martin Wicke, Jarek Wilkiewicz, Sen Zhao, Tao Zhu

References
[1] Lattice Regression, Eric Garcia, Maya Gupta, Advances in Neural Information Processing Systems (NIPS), 2009
[2] Optimized Regression for Efficient Function Evaluation, Eric Garcia, Raman Arora, Maya R. Gupta, IEEE Transactions on Image Processing, 2012
[3] Monotonic Calibrated Interpolated Look-Up Tables, Maya Gupta, Andrew Cotter, Jan Pfeifer, Konstantin Voevodski, Kevin Canini, Alexander Mangylov, Wojciech Moczydlowski, Alexander van Esbroeck, Journal of Machine Learning Research (JMLR), 2016
[4] Fast and Flexible Monotonic Functions with Ensembles of Lattices, Mahdi Milani Fard, Kevin Canini, Andrew Cotter, Jan Pfeifer, Maya Gupta, Advances in Neural Information Processing Systems (NIPS), 2016
[5] Deep Lattice Networks and Partial Monotonic Functions, Seungil You, David Ding, Kevin Canini, Jan Pfeifer, Maya R. Gupta, Advances in Neural Information Processing Systems (NIPS), 2017


The best hardware, software and AI—together

Today, we introduced our second generation family of consumer hardware products, all made by Google: new Pixel phones, Google Home Mini and Max, an all new Pixelbook, Google Clips hands-free camera, Google Pixel Buds, and an updated Daydream View headset. We see tremendous potential for devices to be helpful, make your life easier, and even get better over time when they’re created at the intersection of hardware, software and advanced artificial intelligence (AI).


Why Google?

These days many devices—especially smartphones—look and act the same. That means in order to create a meaningful experience for users, we need a different approach. A year ago, Sundar outlined his vision of how AI would change how people would use computers. And in fact, AI is already transforming what Google’s products can do in the real world. For example, swipe typing has been around for a while, but AI lets people use Gboard to swipe-type in two languages at once. Google Maps uses AI to figure out what the parking is like at your destination and suggest alternative spots before you’ve even put your foot on the gas. But, for this wave of computing to reach new breakthroughs, we have to build software and hardware that can bring more of the potential of AI into reality—which is what we’ve set out to do with this year’s new family of products.

Hardware, built from the inside out

We’ve designed and built our latest hardware products around a few core tenets. First and foremost, we want them to be radically helpful. They’re fast, they’re there when you need them, and they’re simple to use. Second, everything is designed for you, so that the technology doesn’t get in the way and instead blends into your lifestyle. Lastly, by creating hardware with AI at the core, our products can improve over time. They’re constantly getting better and faster through automatic software updates. And they’re designed to learn from you, so you’ll notice features—like the Google Assistant—get smarter and more assistive the more you interact with them.


You’ll see this reflected in our 2017 lineup of new Made by Google products:

  • The Pixel 2 has the best camera of any smartphone, again, along with a gorgeous display and augmented reality capabilities. Pixel owners get unlimited storage for their photos and videos, and an exclusive preview of Google Lens, which uses AI to give you helpful information about the things around you.
  • Google Home Mini brings the Assistant to more places throughout your home, with a beautiful design that fits anywhere. And Max is our biggest and best-sounding Google Home device, powered by the Assistant. And with AI-based Smart Sound, Max has the ability to adapt your audio experience to you—your environment, context, and preferences.
  • With Pixelbook, we’ve reimagined the laptop as a high-performance Chromebook, with a versatile form factor that works the way you do. It’s the first laptop with the Assistant built in, and the Pixelbook Pen makes the whole experience even smarter.
  • Our new Pixel Buds combine Google smarts and the best digital sound. You’ll get elegant touch controls that put the Assistant just a tap away, and they’ll even help you communicate in a different language.
  • The updated Daydream View is the best mobile virtual reality (VR) headset on the market, and the simplest, most comfortable VR experience.
  • Google Clips is a totally new way to capture genuine, spontaneous moments—all powered by machine learning and AI. This tiny camera seamlessly sends clips to your phone, and even edits and curates them for you.

Assistant, everywhere

Across all these devices, you can interact with the Google Assistant any way you want—talk to it with your Google Home or your Pixel Buds, squeeze your Pixel 2, or use your Pixelbook’s Assistant key or circle things on your screen with the Pixelbook Pen. Wherever you are, and on any device with the Assistant, you can connect to the information you need and get help with the tasks to get you through your day. No other assistive technology comes close, and it continues to get better every day.

New hardware products

Google’s hardware business is just getting started, and we’re committed to building and investing for the long run. We couldn’t be more excited to introduce you to our second-generation family of products that truly brings together the best of Google software, thoughtfully designed hardware with cutting-edge AI. We hope you enjoy using them as much as we do.
