Tag Archives: machine learning

SLING: A Natural Language Frame Semantic Parser



Until recently, most practical natural language understanding (NLU) systems used a pipeline of analysis stages, from part-of-speech tagging and dependency parsing to steps that computed a semantic representation of the input text. While this facilitated easy modularization of different analysis stages, errors in earlier stages would have cascading effects in later stages and the final representation, and the intermediate stage outputs might not be relevant on their own. For example, a typical pipeline might perform the task of dependency parsing in an early stage and the task of coreference resolution towards the end. If one was only interested in the output of coreference resolution, it would be affected by cascading effects of any errors during dependency parsing.

Today we are announcing SLING, an experimental system for parsing natural language text directly into a representation of its meaning as a semantic frame graph. The output frame graph directly captures the semantic annotations of interest to the user, while avoiding the pitfalls of pipelined systems by not running any intermediate stages, additionally preventing unnecessary computation. SLING uses a special-purpose recurrent neural network model to compute the output representation of input text through incremental editing operations on the frame graph. The frame graph, in turn, is flexible enough to capture many semantic tasks of interest (more on this below). SLING's parser is trained using only the input words, bypassing the need for producing any intermediate annotations (e.g. dependency parses).

SLING provides fast parsing at inference time by providing (a) an efficient and scalable frame store implementation and (b) a JIT compiler that generates efficient code to execute the recurrent neural network. Although SLING is experimental, it achieves a parsing speed of >2,500 tokens/second on a desktop CPU, thanks to its efficient frame store and neural network compiler. SLING is implemented in C++ and it is available for download on GitHub. The entire system is described in detail in a technical report as well.

Frame Semantic Parsing
Frame Semantics [1] represents the meaning of text — such as a sentence — as a set of formal statements. Each formal statement is called a frame, which can be seen as a unit of knowledge or meaning, that also contains interactions with concepts or other frames typically associated with it. SLING organizes each frame as a list of slots, where each slot has a name (role) and a value which could be a literal or a link to another frame. As an example, consider the sentence:

“Many people now claim to have predicted Black Monday.”

The figure below illustrates SLING recognizing mentions of entities (e.g. people, places, or events), measurements (e.g. dates or distances), and other concepts (e.g. verbs), and placing them in the correct semantic roles for the verbs in the input. The word predicted evokes the most dominant sense of the verb "predict", denoted as a PREDICT-01 frame. Additionally, this frame also has interactions (slots) with who made the prediction (denoted via the ARG0 slot, which points to the PERSON frame for people) and what was being predicted (denoted via ARG1, which links to the EVENT frame for Black Monday). Frame semantic parsing is the task of producing a directed graph of such frames linked through slots.
Although the example above is fairly simple, frame graphs are powerful enough to model a variety of complex semantic annotation tasks. For starters, frames provide a convenient way to bring together language-internal and external information types (e.g. knowledge bases). This can then be used to address complex language understanding problems such as reference, metaphor, metonymy, and perspective. The frame graphs for these tasks only differ in the inventory of frame types, roles, and any linking constraints.

SLING
SLING trains a recurrent neural network by optimizing for the semantic frames of interest.
The internal learned representations in the network’s hidden layers replace the hand-crafted feature combinations and intermediate representations in pipelined systems. Internally, SLING uses an encoder-decoder architecture where each input word is encoded into a vector using simple lexical features like the raw word, its suffix(es), punctuation etc. The decoder uses that representation, along with recurrent features from its own history, to compute a sequence of transitions that update the frame graph to obtain the intended frame semantic representation of the input sentence. SLING trains its model using TensorFlow and DRAGNN.

The animation below shows how frames and roles are incrementally added to the under-construction frame graph using individual transitions. As discussed earlier with our simple example sentence, SLING connects the VERB and EVENT frames using the role ARG1, signifying that the EVENT frame is the concept being predicted. The EVOKE transition evokes a frame of a specified type from the next few tokens in the text (e.g. EVENT from Black Monday). Similarly, the CONNECT transition links two existing frames with a specified role. When the input is exhausted and the last transition (denoted as STOP) is executed, the frame graph is deemed as complete and returned to the user, who can inspect the graph to get the semantic meaning behind the sentence.
One key aspect of our transition system is the presence of a small fixed-size attention buffer of frames that represents the most recent frames to be evoked or modified, shown with the orange boxes in the figure above. This buffer captures the intuition that we tend to remember knowledge that was recently evoked, referred to, or enhanced. If a frame is no longer in use, it eventually gets flushed out of this buffer as new frames come into picture. We found this simple mechanism to be surprisingly effective at capturing a large fraction of inter-frame links.

Next Steps
The illustrative experiment above is just a launchpad for research in semantic parsing for tasks such as knowledge extraction, resolving complex references, and dialog understanding. The SLING release on Github comes with a pre-trained model for the task we illustrated, as well as examples and recipes to train your own parser on either the supplied synthetic data or your own data. We hope the community finds SLING useful and we look forward to engaging conversations about applying and extending SLING to other semantic parsing tasks.

Acknowledgements
The research described in this post was done by Michael Ringgaard, Rahul Gupta, and Fernando Pereira. We thank the Tensorflow and DRAGNN teams for open-sourcing their packages, and various colleagues at DRAGNN who helped us with multiple aspects of SLING's training setup.



1 Charles J. Fillmore. 1982. Frame semantics. Linguistics in the Morning Calm, pages 111–138.

Quill.org: better writing with machine learning

Editor’s note: TensorFlow, our open source machine learning library, is just that—open to anyone. Companies, nonprofits, researchers and developers have used TensorFlow in some pretty cool ways, and we’re sharing those stories here on Keyword. Here’s one of them.

Quill.org was founded by a group of educators and technologists to help students become better writers and critical thinkers. Before beginning development, they researched hundreds of studies on writing education and found a common theme—students had a hard time grasping the difference between a run-on sentence and a fragment. So the Quill team developed a tool to help students identify the different parts of a sentence, with a focus on real-time feedback.

Using the Quill tool, students complete a variety of exercises, including joining sentences, writing complex sentences, and explaining their use and understanding of grammar. The tool relies on a huge depository of sentence fragments, which Quill finds, recognizes and compiles using TensorFlow, Google's open source machine learning library. TensorFlow technology is the backbone of the tool and can accurately detect if a student’s answers are correct. After completing the exercises, each student gets a customized explanation of incorrect responses, and the tool learns from each answer to create an individualized testing plan focused on areas of difficulty. Here's an example of how it works:

More than 200,000 students—62 percent from low-income schools—have used Quill. They’ve collectively answered 20 million exercises, and Quill’s quick, personalized writing instruction has helped them master writing standards across the Common Core curriculum.

Teachers have also benefitted from introducing Quill in their classrooms. Each teacher has access to a customized portal, allowing them to see an individual student’s progress. Plus, by using machine learning, teachers have been spared hundreds of hours of manual grading. Laura, a teacher at Caswell Elementary School in California said, "Quill has been a wonderful tool for my third graders, many of whom are second language learners. We especially love the immediate feedback provided after each practice; it has definitely made us pay closer attention to detail.”

Quill’s most recent update is a “multiplayer” feature, allowing students to interact with each other in the tool. They can see their peers’ responses, which fosters spirited classroom discussions and collaboration, and helps students learn from each other.

While students aren’t using quills (or even pens!) anymore, strong writing skills are as important as ever. And with the help of machine learning, Quill makes it fun and engaging to develop those skills.

Source: Education


Quill.org: better writing with machine learning

Editor’s note: TensorFlow, our open source machine learning library, is just that—open to anyone. Companies, nonprofits, researchers and developers have used TensorFlow in some pretty cool ways, and we’re sharing those stories here on Keyword. Here’s one of them.

Quill.org was founded by a group of educators and technologists to help students become better writers and critical thinkers. Before beginning development, they researched hundreds of studies on writing education and found a common theme—students had a hard time grasping the difference between a run-on sentence and a fragment. So the Quill team developed a tool to help students identify the different parts of a sentence, with a focus on real-time feedback.

Using the Quill tool, students complete a variety of exercises, including joining sentences, writing complex sentences, and explaining their use and understanding of grammar. The tool relies on a huge depository of sentence fragments, which Quill finds, recognizes and compiles using TensorFlow, Google's open source machine learning library. TensorFlow technology is the backbone of the tool and can accurately detect if a student’s answers are correct. After completing the exercises, each student gets a customized explanation of incorrect responses, and the tool learns from each answer to create an individualized testing plan focused on areas of difficulty. Here's an example of how it works:

More than 200,000 students—62 percent from low-income schools—have used Quill. They’ve collectively answered 20 million exercises, and Quill’s quick, personalized writing instruction has helped them master writing standards across the Common Core curriculum.

Teachers have also benefitted from introducing Quill in their classrooms. Each teacher has access to a customized portal, allowing them to see an individual student’s progress. Plus, by using machine learning, teachers have been spared hundreds of hours of manual grading. Laura, a teacher at Caswell Elementary School in California said, "Quill has been a wonderful tool for my third graders, many of whom are second language learners. We especially love the immediate feedback provided after each practice; it has definitely made us pay closer attention to detail.”

Quill’s most recent update is a “multiplayer” feature, allowing students to interact with each other in the tool. They can see their peers’ responses, which fosters spirited classroom discussions and collaboration, and helps students learn from each other.

While students aren’t using quills (or even pens!) anymore, strong writing skills are as important as ever. And with the help of machine learning, Quill makes it fun and engaging to develop those skills.

On-Device Conversational Modeling with TensorFlow Lite



Earlier this year, we launched Android Wear 2.0 which featured the first "on-device" machine learning technology for smart messaging. This enabled cloud-based technologies like Smart Reply, previously available in Gmail, Inbox and Allo, to be used directly within any application for the first time, including third-party messaging apps, without ever having to connect to the cloud. So you can respond to incoming chat messages on the go, directly from your smartwatch.

Today, we announce TensorFlow Lite, TensorFlow’s lightweight solution for mobile and embedded devices. This framework is optimized for low-latency inference of machine learning models, with a focus on small memory footprint and fast performance. As part of the library, we have also released an on-device conversational model and a demo app that provides an example of a natural language application powered by TensorFlow Lite, in order to make it easier for developers and researchers to build new machine intelligence features powered by on-device inference. This model generates reply suggestions to input conversational chat messages, with efficient inference that can be easily plugged in to your chat application to power on-device conversational intelligence.

The on-device conversational model we have released uses a new ML architecture for training compact neural networks (as well as other machine learning models) based on a joint optimization framework, originally presented in ProjectionNet: Learning Efficient On-Device Deep Networks Using Neural Projections. This architecture can run efficiently on mobile devices with limited computing power and memory, by using efficient “projection” operations that transform any input to a compact bit vector representation — similar inputs are projected to nearby vectors that are dense or sparse depending on type of projection. For example, the messages “hey, how's it going?” and “How's it going buddy?”, might be projected to the same vector representation.

Using this idea, the conversational model combines these efficient operations at low computation and memory footprint. We trained this on-device model end-to-end using an ML framework that jointly trains two types of models — a compact projection model (as described above) combined with a trainer model. The two models are trained in a joint fashion, where the projection model learns from the trainer model — the trainer is characteristic of an expert and modeled using larger and more complex ML architectures, whereas the projection model resembles a student that learns from the expert. During training, we can also stack other techniques such as quantization or distillation to achieve further compression or selectively optimize certain portions of the objective function. Once trained, the smaller projection model is able to be used directly for inference on device.
For inference, the trained projection model is compiled into a set of TensorFlow Lite operations that have been optimized for fast execution on mobile platforms and executed directly on device. The TensorFlow Lite inference graph for the on-device conversational model is shown here.
TensorFlow Lite execution for the On-Device Conversational Model.
The open-source conversational model released today (along with code) was trained end-to-end using the joint ML architecture described above. Today’s release also includes a demo app, so you can easily download and try out one-touch smart replies on your mobile device. The architecture enables easy configuration for model size and prediction quality based on application needs. You can find a list of sample messages where this model does well here. The system can also fall back to suggesting replies from a fixed set that was learned and compiled from popular response intents observed in chat conversations. The underlying model is different from the ones Google uses for Smart Reply responses in its apps1.

Beyond Conversational Models
Interestingly, the ML architecture described above permits flexible choices for the underlying model. We also designed the architecture to be compatible with different machine learning approaches — for example, when used with TensorFlow deep learning, we learn a lightweight neural network (ProjectionNet) for the underlying model, whereas a different architecture (ProjectionGraph) represents the model using a graph framework instead of a neural network.

The joint framework can also be used to train lightweight on-device models for other tasks using different ML modeling architectures. As an example, we derived a ProjectionNet architecture that uses a complex feed-forward or recurrent architecture (like LSTM) for the trainer model coupled with a simple projection architecture comprised of dynamic projection operations and a few, narrow fully-connected layers. The whole architecture is trained end-to-end using backpropagation in TensorFlow and once trained, the compact ProjectionNet is directly used for inference. Using this method, we have successfully trained tiny ProjectionNet models that achieve significant reduction in model sizes (up to several orders of magnitude reduction) and high performance with respect to accuracy on multiple visual and language classification tasks (a few examples here). Similarly, we trained other lightweight models using our graph learning framework, even in semi-supervised settings.
ML architecture for training on-device models: ProjectionNet trained using deep learning (left), and ProjectionGraph trained using graph learning (right).
We will continue to improve and release updated TensorFlow Lite models in open-source. We think that the released model (as well as future models) learned using these ML architectures may be reused for many natural language and computer vision applications or plugged into existing apps for enabling machine intelligence. We hope that the machine learning and natural language processing communities will be able to build on these to address new problems and use-cases we have not yet conceived.

Acknowledgments
Yicheng Fan and Gaurav Nemade contributed immensely to this effort. Special thanks to Rajat Monga, Andre Hentz, Andrew Selle, Sarah Sirajuddin, and Anitha Vijayakumar from the TensorFlow team; Robin Dua, Patrick McGregor, Andrei Broder, Andrew Tomkins and the Google Expander team.



1 The released on-device model was trained to optimize for small size and low latency applications on mobile phones and wearables. Smart Reply predictions in Google apps, however are generated using larger, more complex models. In production systems, we also use multiple classifiers that are trained to detect inappropriate content and apply further filtering and tuning to optimize user experience and quality levels. We recommend that developers using the open-source TensorFlow Lite version also follow such practices for their end applications.

Fused Video Stabilization on the Pixel 2 and Pixel 2 XL



One of the most important aspects of current smartphones is easily capturing and sharing videos. With the Pixel 2 and Pixel 2 XL smartphones, the videos you capture are smoother and clearer than ever before, thanks to our Fused Video Stabilization technique based on both optical image stabilization (OIS) and electronic image stabilization (EIS). Fused Video Stabilization delivers highly stable footage with minimal artifacts, and the Pixel 2 is currently rated as the leader in DxO's video ranking (also earning the highest overall rating for a smartphone camera). But how does it work?

A key principle in videography is keeping the camera motion smooth and steady. A stable video is free of the distraction, so the viewer can focus on the subject of interest. But, videos taken with smartphones are subject to many conditions that make taking a high-quality video a significant challenge:

Camera Shake
Most people hold their mobile phones in their hands to record videos - you pull the phone from your pocket, record the video, and the video is ready to share right after recording. However, that means your videos shake as much as your hands do -- and they shake a lot! Moreover, if you are walking or running while recording, the camera motion can make videos almost unwatchable:
Motion Blur
If the camera or the subject moves during exposure, the resulting photo or video will appear blurry. Even if we stabilize the motion in between consecutive frames, the motion blur in each individual frame cannot be easily restored in practice, especially on a mobile device. One typical video artifact due to motion blur is sharpness inconsistency: the video may rapidly alternate between blurry and sharp, which is very distracting even after the video is stabilized:
Rolling Shutter
The CMOS image sensor collects one row of pixels, or “scanline”, at a time, and it takes tens of milliseconds to goes from the top scanline to the bottom. Therefore, anything moving during this period can appear distorted. This is called the rolling shutter distortion. Even if you have a steady hand, the rolling shutter distortion will appear when you move quickly:
A simulated rendering of a video with global (left) and rolling (right) shutter.
Focus Breathing
When there are objects of varying distance in a video, the angle of view can change significantly due to objects “jumping” in and out of the foreground. As result, everything shrinks or expands like the video below, which professionals call “breathing”:
A good stabilization system should address all of these issues: the video should look sharp, the motion should be smooth, and the rolling shutter and focus breathing should be corrected.

Many professionals mount the camera on a mechanical stabilizer to entirely isolate hand motion. These devices actively sense and compensate for the camera’s movement to remove all unwanted motions. However, they are usually expensive and cumbersome; you wouldn’t want to carry one every day. There are also handheld gimbal mounts available for mobile phones. However, they are usually larger than the phone itself, and you have to put the phone on it before start recording. You’d need to do it fast before the interesting moment vanishes.

Optical Image Stabilization (OIS) is the most well-known method for suppression of handshake artifacts. Typically, in mobile camera modules with OIS, the lens is suspended in the middle of the module by a number of springs and electromagnets are used to move the lens within its enclosure. The lens module actively senses and compensates for handshake motion at very high speeds. Because OIS responds to motion rapidly, it can greatly suppress the handshake blur. However, the range of correctable motion is fairly limited (usually around 1-2 degrees), which is not enough to correct the unwanted motions between consecutive video frames, or to correct excessive motion blur during walking. Moveover, OIS cannot correct some kinds of motions, such as in-plane rotation. Sometimes it can even introduce a “jello” artifact:
The video is taken by Pixel 2 with only OIS enabled. You can see the frame center is stabilized, but the boundaries have some jello-like artifacts.
Electronic Image Stabilization (EIS) analyzes the camera motion, filters out the unwanted parts, and synthesizes a new video by transforming each frame. The final stabilization quality depends on the algorithm design and implementation optimization of these stages. In general, software-based EIS is more flexible than OIS so it can correct larger and more kinds of motions. However, EIS has some common limitations. First, to prevent undefined regions in the synthesized frame, it needs to reduce the field of view or resolution. Second, compared to OIS or an external stabilizer, EIS requires more computation, which is a limited resource on mobile phones.

Making a Better Video: Fused Video Stabilization
With Fused Video Stabilization, both OIS and EIS are enabled simultaneously during video recording to address all the issues mentioned above. Our solution has three processing stages as shown in the system diagram below. The first processing stage, motion analysis, extracts the gyroscope signal, the OIS motion, and other properties to estimate the camera motion precisely. Then, the motion filtering stage combines machine learning and signal processing to predict a person’s intention in moving the camera. Finally, in the frame synthesis stage, we model and remove the rolling shutter and focus breathing distortion. With Fused Video Stabilization, the videos from Pixel 2 have less motion blur and look more natural. The solution is efficient enough to run in all video modes, such as 60fps or 4K recording.
Motion Analysis
In the motion analysis stage, we use the phone’s high-speed gyroscope to estimate the rotational component of the hand motion (roll, pitch, and yaw). By sensing the motion at 200 Hz, we have dense motion vectors for each scanline, enough to model the rolling shutter distortion. We also measure lens motions that are not sensed by the gyroscope, including both the focus adjustment (z) and the OIS movement (x and y) at high speed. Because we need high temporal precision to model the rolling shutter effect, we carefully optimize the system to ensure perfect timestamp alignment between the CMOS image sensor, the gyroscope, and the lens motion readouts. A misalignment of merely a few milliseconds can introduce noticeable jittering artifact:
Left: The stabilized video of a “running” motion with a 3ms timing error. Note the occasional jittering. Right: The stabilized video with correct timestamps. The bottom right corner shows the original shaky video.
Motion Filtering
The motion filtering stage takes the real camera motion from motion analysis and creates the stabilized virtual camera motion. Note that we push the incoming frames into a queue to defer the processing. This enables us to lookahead at future camera motions, using machine learning to accurately predict the user’s intention. Lookahead filtering is not feasible for OIS or any mechanical stabilizers, which can only react to previous or present motions. We will discuss more about this below.

Frame Synthesis
At the final stage, we derive how the frame is transformed based on the real and virtual camera motions. To handle the rolling shutter distortion, we use multiple transformations for each frame. We split the the input frame into a mesh and warp each part separately:
Left: The input video with mesh overlay. Right: The warped frame, and the red rectangle is the final stabilized output. Note how the non-rigid warping corrects the rolling shutter distortion.
Lookahead Motion Filtering
One key feature in the Fused Video Stabilization is our new lookahead filtering algorithm. It analyzes future motions to recognize the user-intended motion patterns, and creates a smooth virtual camera motion. The lookahead filtering has multiple stages to incrementally improve the virtual camera motion for each frame. In the first step, a Gaussian filtering is applied on the real camera motions of both past and future to obtain a smoothed camera motion:
Left: The input unstabilized video. Right: The smoothed result after Gaussian filtering.
You’ll notice that it’s still not very stable. To further improve the quality, we trained a model to extract intentional motions from the noisy real camera motions. We then apply additional filters given the predicted motion. For example, if we predict the camera is panning horizontally, we would reject more vertical motions. The result is shown below.
Left: The Gaussian filtered result. Right: Our lookahead result. We predict that the user is panning to the right, and suppress more vertical motions.
In practice, the process above does not guarantee there is no undefined “bad” regions, which can appear when the virtual camera is too stabilized and the warped frame falls outside the original field of view. We predict the likelihood of this issue in the next couple frames and adjust the virtual camera motion to get the final result.
Left: Our lookahead result. The undefined area at the bottom-left are shown in cyan. Right: The final result with the bad region removed.
As we mentioned earlier, even with OIS enabled, sometimes the motions are too large and cause motion blur in a single frame. When EIS is further applied to further smooth the camera motion, the motion blur leads to distracting sharpness variations:
Left: Pixel 2 with OIS only. Right: Pixel 2 with the basic Fused Video Stabilization. Note that sharpness variation around the “Exit” label.
This is a very common problem in EIS solutions. To address this issue, we exploit the “masking” property in the human visual system. Motion blur usually blurs the frame along a specific direction, and if the overall frame motion follows that direction, the human eye will not notice it. Instead, our brain treats the blur as a natural part of the motion, and masks it away from our perception.

With the high-frequency gyroscope and OIS signals, we can accurately estimate the motion blur for each frame. We compute where the camera pointed to at both the beginning and end of exposure, and the movement in-between is the motion blur. After that, we apply a machine learning algorithm (trained on a set of videos with and without motion blur) to map the motion blurs in past and future frames to the amount of real camera motion we want to keep, and blend the weighted real camera motion with the virtual one. As you can see below, with the motion blur masking, the distracting sharpness variation is greatly reduced and the camera motion is still stabilized.
Left: Pixel 2 with the basic Fused Video Stabilization. Right: The full Fused Video Stabilization solution with motion blur masking.
Results
We have seen many amazing videos from Pixel 2 with Fused Video Stabilization. Here are some for you to check out:
Videos taken by two Pixel 2 phones mounted on a single hand grip. Fused Video Stabilization is disabled in the left one.
Videos taken by two Pixel 2 phones mounting on a single hand grip. Fused Video Stabilization is disabled in the left one. Note that the videographer jumped together with the subject.
Fused Video Stabilization combines the best of OIS and EIS, shows great results in camera motion smoothing and motion blur reduction, and corrects both rolling shutter and focus breathing. With Fused Video Stabilization on the Pixel 2 and Pixel 2 XL, you no longer have to carefully place the phone before recording, hold it firmly over the entire recording session, or carry a gimbal mount everywhere. The recorded video will always be stable, sharp, and ready to share.

Acknowledgements
Fused Video Stabilization is a large-scale effort across multiple teams in Google, including the camera algorithm team, sensor algorithm team, camera hardware team, and sensor hardware team.

Meet a few of our machine learning makers

From reinventing your smartphone to helping you play rock-paper-scissors, AI is changing the way we use technology today. And while we hear about all the new things we’re able to do thanks to machine learning, we don’t always hear about the people shaping machine learning itself. Who are they? What inspired them to get into the field? And what do they geek out about when it comes to AI?


To answer some of these questions, we went behind the scenes with the Google Brain team, a research group that’s part of Google AI. The Brain team works on deep learning research, a sub-field of machine learning focused on neural networks. For the non-researchers among us, a neural network is a kind of machine learning model that uses raw forms of data to understand patterns and concepts. Neural networks tackle difficult problems in computer vision, speech recognition, language understanding, and many other areas.


You can experience some of the Brain team’s work for yourself by using products like Google Photos or Translate, or by trying out our open-source machine learning framework, TensorFlow. The team also does general research in the field of AI and explores how to apply this technology to new spaces like healthcare, robotics, and even music and art.


Watch below to hear more from the team about the basics of machine learning, puzzles, and one researcher’s hope for using AI as her back-up band.

The Google Brain team

To learn more about the Brain team, check out g.co/brain, or take a look at the team’s recent Reddit AMA ( "Ask Me Anything"), as well as their AMA from 2016

Latest Innovations in TensorFlow Serving



Since initially open-sourcing TensorFlow Serving in February 2016, we’ve made some major enhancements. Let’s take a look back at where we started, review our progress, and share where we are headed next.

Before TensorFlow Serving, users of TensorFlow inside Google had to create their own serving system from scratch. Although serving might appear easy at first, one-off serving solutions quickly grow in complexity. Machine Learning (ML) serving systems need to support model versioning (for model updates with a rollback option) and multiple models (for experimentation via A/B testing), while ensuring that concurrent models achieve high throughput on hardware accelerators (GPUs and TPUs) with low latency. So we set out to create a single, general TensorFlow Serving software stack.

We decided to make it open-sourceable from the get-go, and development started in September 2015. Within a few months, we created the initial end-to-end working system and our open-source release in February 2016.

Over the past year and half, with the help of our users and partners inside and outside our company, TensorFlow Serving has advanced performance, best practices, and standards:
  • Out-of-the-box optimized serving and customizability: We now offer a pre-built canonical serving binary, optimized for modern CPUs with AVX, so developers don't need to assemble their own binary from our libraries unless they have exotic needs. At the same time, we added a registry-based framework, allowing our libraries to be used for custom (or even non-TensorFlow) serving scenarios.
  • Multi-model serving: Going from one model to multiple concurrently-served models presents several performance obstacles. We serve multiple models smoothly by (1) loading in isolated thread pools to avoid incurring latency spikes on other models taking traffic; (2) accelerating initial loading of all models in parallel upon server start-up; (3) multi-model batch interleaving to multiplex hardware accelerators (GPUs/TPUs).
  • Standardized model format: We added SavedModel to TensorFlow 1.0, giving the community a single standard model format that works across training and serving.
  • Easy-to-use inference APIs: We released easy-to-use APIs for common inference tasks (classification, regression) that we know work for a wide swathe of our applications. To support more advanced use-cases we support a lower-level tensor-based API (predict) and a new multi-inference API that enables multi-task modeling.
All of our work has been informed by close collaborations with: (a) Google’s ML SRE team, which helps ensure we are robust and meet internal SLAs; (b) other Google machine learning infrastructure teams including ads serving and TFX; (c) application teams such as Google Play; (d) our partners at the UC Berkeley RISE Lab, who explore complementary research problems with the Clipper serving system; (e) our open-source user base and contributors.

TensorFlow Serving is currently handling tens of millions of inferences per second for 1100+ of our own projects including Google’s Cloud ML Prediction. Our core serving code is available to all via our open-source releases.

Looking forward, our work is far from done and we are exploring several avenues of innovation. Today we are excited to share early progress in two experimental areas:
  • Granular batching: A key technique we employ to achieve high throughput on specialized hardware (GPUs and TPUs) is "batching": processing multiple examples jointly for efficiency. We are developing technology and best practices to improve batching to: (a) enable batching to target just the GPU/TPU portion of the computation, for maximum efficiency; (b) enable batching within recursive neural networks, used to process sequence data e.g. text and event sequences. We are experimenting with batching arbitrary sub-graphs using the Batch/Unbatch op pair.
  • Distributed model serving: We are looking at model sharding techniques as a means of handling models that are too large to fit on one server node or sharing sub-models in a memory-efficient way. We recently launched a 1TB+ model in production with good results, and hope to open-source this capability soon.
Thanks again to all of our users and partners who have contributed feedback, code and ideas. Join the project at: github.com/tensorflow/serving.

Go North to unlock the potential of AI

The 20th century brought about incredible changes to the way we work and live. There was the combustion engine, the telephone, the assembly line and, of course, the Internet. Today artificial intelligence and machine learning is expected to bring similarly profound changes. And Canada stands poised to lead the world into this new era.

Imagine you could take a picture of a mole on your arm and your smartphone could assist your dermatologist in diagnosing if it’s malignant. This is not science fiction, it’s a system Vancouver's MetaOptima is developing today - and it’s powered by artificial intelligence.

From diagnosing cancer to reducing data centre energy consumption, artificial intelligence offers a new tool for tackling real-world challenges. Today, at Google Canada’s annual Go North summit, we’re bringing together AI leaders from across Canada and around the world to help Canadian industry better understand the vast potential of this technology to transform their business, shape future innovations and improve our world.

Follow along on Twitter @GoogleCanada or tune-in to the Go North livestream (starting at 9am ET) featuring Canadian Prime Minister Justin Trudeau; Google Chairman, Eric Schmidt; founder of DeepMind, Demis Hassabis; OpenAI’s Shivon Zilis; YouTuber Taryn Southern who uses AI to compose music; Geoffrey Hinton and many more including the founder of MetaOptima, Dr. Maryam Sadeghi.
Go North will explore Canada’s success in building the superclusters of investment and innovation that have sprung up around the work of Yoshua Bengio in Montreal, Geoffrey Hinton in Toronto and Rich Sutton in Edmonton -- global leaders whose research underpins many of the advances we’re seeing in AI today. The conference will highlight the amazing advancements happening here in Canada and the opportunity for businesses to leverage the country’s rich research pedigree and to unlock the potential of AI. When it comes to artificial intelligence, Canada is having a moment - and we’ll seize it at Go North.

Google Developers Launchpad Studio works with top startups to tackle healthcare challenges with machine learning

Posted by Malika Cantor, Developer Relations Program Manager

Google is an artificial intelligence-first company. Machine Learning (ML) and Cloud are deeply embedded in our product strategies and have been crucial thus far in our efforts to tackle some of humanity's greatest challenges - like bringing high-quality, affordable, and specialized healthcare to people globally.

In that spirit, we're excited to announce the first four startups to join Launchpad Studio, our 6-month mentorship program tailored to help applied-ML startups build great products using the most advanced tools and technologies available. Working side-by-side with experts from across Google product and research teams - including Google Cloud, Verily, X, Brain, ML Research -, we intend to support these startups on their journey to build successful applications, and explore leveraging Google Cloud Platform, TensorFlow, Android, and other Google platforms. Launchpad Studio has also enlisted the expertise of a number of top industry practitioners and thought leaders to ensure Studio startups are successful in practice and long-term.

These four startups were selected based on the novel ways they've found to apply ML to important challenges in the Healthcare industry. Namely:

  1. Reducing doctor burnout and increasing doctor productivity (Augmedix)
  2. Regaining movement in paralyzed limbs (BrainQ)
  3. Accelerating clinical trials and enabling value-based healthcare (Byteflies)
  4. Detecting sepsis (CytoVale)

Let's take a closer look:

Reducing Doctor Burnout and Increasing Doctor Productivity

Numerous studies have shown that primary care physicians currently spend about half of their workday on the computer, documenting in the electronic health records (EHR).

Augmedix is on a mission to reclaim this time and repurpose it for what matters most: patient care. When doctors use the service by wearing Alphabet's Glass hardware, their documentation and administrative load is almost entirely alleviated. This saves doctors 2-3 hours per day and dramatically improves the doctor-patient experience.

Augmedix has started leveraging advances in deep learning and natural language understanding to accelerate these efficiencies and offer additional value that further improves patient care.

Regaining Movement in Paralyzed Limbs

Motor disability following neuro-disorders such as stroke, spinal cord injury, and traumatic brain injury affects tens of millions of people each year worldwide.

BrainQ's mission is to help these patients back on their feet, restoring their ability to perform activities of daily living. BrainQ is currently conducting clinical trials in leading hospitals in Israel.

The company is developing a medical device that utilizes artificial intelligence tools to identify high resolution spectral patterns in patient's brain waves, observed in electroencephalogram (EEG) sensors. These patterns are then translated into a personalized electromagnetic treatment protocol aimed at facilitating targeted neuroplasticity and enhancing patient's recovery.

Accelerating Clinical Trials and Enabling Value-Based Healthcare

Today, sensors are making it easier to collect data about health and diseases. However building a new wearable health application that is clinically validated and end-user friendly is still a daunting task. Byteflies' modular platform makes this whole process much easier and cost-effective. Through their medical and signal processing expertise, Byteflies has made advances in the interpretation of multiple synchronized vital signs. This multimodal high-resolution vital sign data is very useful for healthcare and clinical trial applications. With that level of data ingestion comes a great need for automated data processing. Byteflies plans to use ML to transform these data streams into actionable, personalized, and medically-relevant data.

Early Sepsis Detection

Research suggests that sepsis kills more Americans than breast cancer, prostate cancer, and AIDS combined. Fortunately, sepsis can often be quickly mitigated if caught early on in patient care.

CytoVale is developing a medical diagnostics platform based on cell mechanics, initially for use in early detection of sepsis in the emergency room setting. It analyzes thousands of cells' mechanical properties using ultra high speed video to diagnose disease in a few minutes. Their technology also has applications in immune activation, cancer detection, research tools, and biodefense.

CytoVale is leveraging recent advances in ML and computer vision in conjunction with their unique measurement approach to facilitate this early detection of sepsis.

More about the Program

Each startup will get tailored, equity-free support, with the goal of successfully completing a ML-focused project during the term of the program. To support this process, we provide resources, including deep engagement with engineers in Google Cloud, Google X, and other product teams, as well as Google Cloud credits. We also include both Google Cloud Platform and GSuite training in our engagement with all Studio startups.

Join Us

Based in San Francisco, Launchpad Studio is a fully tailored product development acceleration program that matches top ML startups and experts from Silicon Valley with the best of Google - its people, network, and advanced technologies - to help accelerate applied ML and AI innovation. The program's mandate is to support the growth of the ML ecosystem, and to develop product methodologies for ML.

Launchpad Studio is looking to work with the best and most game-changing ML startups from around the world. While we're currently focused on working with startups in the Healthcare and Biotech space, we'll soon be announcing other industry verticals, and any startup applying AI / ML technology to a specific industry vertical can apply on a rolling-basis.

Eager Execution: An imperative, define-by-run interface to TensorFlow

Posted by Asim Shankar and Wolff Dobson, Google Brain Team

Today, we introduce eager execution for TensorFlow.

Eager execution is an imperative, define-by-run interface where operations are executed immediately as they are called from Python. This makes it easier to get started with TensorFlow, and can make research and development more intuitive.

The benefits of eager execution include:

  • Fast debugging with immediate run-time errors and integration with Python tools
  • Support for dynamic models using easy-to-use Python control flow
  • Strong support for custom and higher-order gradients
  • Almost all of the available TensorFlow operations

Eager execution is available now as an experimental feature, so we're looking for feedback from the community to guide our direction.

To understand this all better, let's look at some code. This gets pretty technical; familiarity with TensorFlow will help.

Using Eager Execution

When you enable eager execution, operations execute immediately and return their values to Python without requiring a Session.run(). For example, to multiply two matrices together, we write this:

import tensorflow as tf
import tensorflow.contrib.eager as tfe

tfe.enable_eager_execution()

x = [[2.]]
m = tf.matmul(x, x)

It's straightforward to inspect intermediate results with print or the Python debugger.


print(m)
# The 1x1 matrix [[4.]]

Dynamic models can be built with Python flow control. Here's an example of the Collatz conjecture using TensorFlow's arithmetic operations:

a = tf.constant(12)
counter = 0
while not tf.equal(a, 1):
if tf.equal(a % 2, 0):
a = a / 2
else:
a = 3 * a + 1
print(a)

Here, the use of the tf.constant(12) Tensor object will promote all math operations to tensor operations, and as such all return values with be tensors.

Gradients

Most TensorFlow users are interested in automatic differentiation. Because different operations can occur during each call, we record all forward operations to a tape, which is then played backwards when computing gradients. After we've computed the gradients, we discard the tape.

If you're familiar with the autograd package, the API is very similar. For example:

def square(x):
return tf.multiply(x, x)

grad = tfe.gradients_function(square)

print(square(3.)) # [9.]
print(grad(3.)) # [6.]

The gradients_function call takes a Python function square() as an argument and returns a Python callable that computes the partial derivatives of square() with respect to its inputs. So, to get the derivative of square() at 3.0, invoke grad(3.0), which is 6.

The same gradients_function call can be used to get the second derivative of square:

gradgrad = tfe.gradients_function(lambda x: grad(x)[0])

print(gradgrad(3.)) # [2.]

As we noted, control flow can cause different operations to run, such as in this example.

def abs(x):
return x if x > 0. else -x

grad = tfe.gradients_function(abs)

print(grad(2.0)) # [1.]
print(grad(-2.0)) # [-1.]

Custom Gradients

Users may want to define custom gradients for an operation, or for a function. This may be useful for multiple reasons, including providing a more efficient or more numerically stable gradient for a sequence of operations.

Here is an example that illustrates the use of custom gradients. Let's start by looking at the function log(1 + ex), which commonly occurs in the computation of cross entropy and log likelihoods.

def log1pexp(x):
return tf.log(1 + tf.exp(x))
grad_log1pexp = tfe.gradients_function(log1pexp)

# The gradient computation works fine at x = 0.
print(grad_log1pexp(0.))
# [0.5]
# However it returns a `nan` at x = 100 due to numerical instability.
print(grad_log1pexp(100.))
# [nan]

We can use a custom gradient for the above function that analytically simplifies the gradient expression. Notice how the gradient function implementation below reuses an expression (tf.exp(x)) that was computed during the forward pass, making the gradient computation more efficient by avoiding redundant computation.

@tfe.custom_gradient
def log1pexp(x):
e = tf.exp(x)
def grad(dy):
return dy * (1 - 1 / (1 + e))
return tf.log(1 + e), grad
grad_log1pexp = tfe.gradients_function(log1pexp)

# Gradient at x = 0 works as before.
print(grad_log1pexp(0.))
# [0.5]
# And now gradient computation at x=100 works as well.
print(grad_log1pexp(100.))
# [1.0]

Building models

Models can be organized in classes. Here's a model class that creates a (simple) two layer network that can classify the standard MNIST handwritten digits.

class MNISTModel(tfe.Network):
def __init__(self):
super(MNISTModel, self).__init__()
self.layer1 = self.track_layer(tf.layers.Dense(units=10))
self.layer2 = self.track_layer(tf.layers.Dense(units=10))
def call(self, input):
"""Actually runs the model."""
result = self.layer1(input)
result = self.layer2(result)
return result

We recommend using the classes (not the functions) in tf.layers since they create and contain model parameters (variables). Variable lifetimes are tied to the lifetime of the layer objects, so be sure to keep track of them.

Why are we using tfe.Network? A Network is a container for layers and is a tf.layer.Layer itself, allowing Networkobjects to be embedded in other Network objects. It also contains utilities to assist with inspection, saving, and restoring.

Even without training the model, we can imperatively call it and inspect the output:

# Let's make up a blank input image
model = MNISTModel()
batch = tf.zeros([1, 1, 784])
print(batch.shape)
# (1, 1, 784)
result = model(batch)
print(result)
# tf.Tensor([[[ 0. 0., ...., 0.]]], shape=(1, 1, 10), dtype=float32)

Note that we do not need any placeholders or sessions. The first time we pass in the input, the sizes of the layers' parameters are set.

To train any model, we define a loss function to optimize, calculate gradients, and use an optimizer to update the variables. First, here's a loss function:

def loss_function(model, x, y):
y_ = model(x)
return tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=y_)

And then, our training loop:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.001)
for (x, y) in tfe.Iterator(dataset):
grads = tfe.implicit_gradients(loss_function)(model, x, y)
optimizer.apply_gradients(grads)

implicit_gradients() calculates the derivatives of loss_function with respect to all the TensorFlow variables used during its computation.

We can move computation to a GPU the same way we've always done with TensorFlow:

with tf.device("/gpu:0"):
for (x, y) in tfe.Iterator(dataset):
optimizer.minimize(lambda: loss_function(model, x, y))

(Note: We're shortcutting storing our loss and directly calling the optimizer.minimize, but you could also use the apply_gradients() method above; they are equivalent.)

Using Eager with Graphs

Eager execution makes development and debugging far more interactive, but TensorFlow graphs have a lot of advantages with respect to distributed training, performance optimizations, and production deployment.

The same code that executes operations when eager execution is enabled will construct a graph describing the computation when it is not. To convert your models to graphs, simply run the same code in a new Python session where eager execution hasn't been enabled, as seen, for example, in the MNIST example. The value of model variables can be saved and restored from checkpoints, allowing us to move between eager (imperative) and graph (declarative) programming easily. With this, models developed with eager execution enabled can be easily exported for production deployment.

In the near future, we will provide utilities to selectively convert portions of your model to graphs. In this way, you can fuse parts of your computation (such as internals of a custom RNN cell) for high-performance, but also keep the flexibility and readability of eager execution.

How does my code change?

Using eager execution should be intuitive to current TensorFlow users. There are only a handful of eager-specific APIs; most of the existing APIs and operations work with eager enabled. Some notes to keep in mind:

  • As with TensorFlow generally, we recommend that if you have not yet switched from queues to using tf.data for input processing, you should. It's easier to use and usually faster. For help, see this blog post and the documentation page.
  • Use object-oriented layers, like tf.layer.Conv2D() or Keras layers; these have explicit storage for variables.
  • For most models, you can write code so that it will work the same for both eager execution and graph construction. There are some exceptions, such as dynamic models that use Python control flow to alter the computation based on inputs.
  • Once you invoke tfe.enable_eager_execution(), it cannot be turned off. To get graph behavior, start a new Python session.

Getting started and the future

This is still a preview release, so you may hit some rough edges. To get started today:

There's a lot more to talk about with eager execution and we're excited… or, rather, we're eager for you to try it today! Feedback is absolutely welcome.