Tag Archives: TensorFlow

MobileNets: Open-Source Models for Efficient On-Device Vision



Deep learning has fueled tremendous progress in the field of computer vision in recent years, with neural networks repeatedly pushing the frontier of visual recognition technology. While many of those technologies such as object, landmark, logo and text recognition are provided for internet-connected devices through the Cloud Vision API, we believe that the ever-increasing computational power of mobile devices can enable the delivery of these technologies into the hands of our users, anytime, anywhere, regardless of internet connection. However, visual recognition for on-device and embedded applications poses many challenges — models must run quickly with high accuracy in a resource-constrained environment, making use of limited computation, power and space.

Today we are pleased to announce the release of MobileNets, a family of mobile-first computer vision models for TensorFlow, designed to effectively maximize accuracy while being mindful of the restricted resources for an on-device or embedded application. MobileNets are small, low-latency, low-power models parameterized to meet the resource constraints of a variety of use cases. They can be built upon for classification, detection, embeddings and segmentation similar to how other popular large scale models, such as Inception, are used.
Example use cases include detection, fine-grained classification, attributes and geo-localization.
This release contains the model definition for MobileNets in TensorFlow using TF-Slim, as well as 16 pre-trained ImageNet classification checkpoints for use in mobile projects of all sizes. The models can be run efficiently on mobile devices with TensorFlow Mobile.
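For those who want to try this immediately, the following is a minimal sketch of restoring one of the released checkpoints through the TF-Slim model definitions. It assumes the TensorFlow-Slim image classification library (which provides nets/mobilenet_v1.py) is on your Python path, and the checkpoint filename is illustrative:

import numpy as np
import tensorflow as tf
from nets import mobilenet_v1  # from the TF-Slim image classification library

slim = tf.contrib.slim

# Build the inference graph for the full-width, 224x224 variant.
images = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope(mobilenet_v1.mobilenet_v1_arg_scope(is_training=False)):
    logits, end_points = mobilenet_v1.mobilenet_v1(
        images, num_classes=1001, depth_multiplier=1.0, is_training=False)

# Restore a released checkpoint (path is illustrative) and classify a batch.
saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, 'mobilenet_v1_1.0_224.ckpt')
    batch = np.zeros((1, 224, 224, 3), np.float32)  # stand-in for real preprocessed images
    predictions = sess.run(end_points['Predictions'], feed_dict={images: batch})

Smaller variants are selected by lowering depth_multiplier (e.g. 0.5) and the input resolution, mirroring the rows of the table below.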

Model Checkpoint           Million MACs   Million Parameters   Top-1 Accuracy   Top-5 Accuracy
MobileNet_v1_1.0_224       569            4.24                 70.7             89.5
MobileNet_v1_1.0_192       418            4.24                 69.3             88.9
MobileNet_v1_1.0_160       291            4.24                 67.2             87.5
MobileNet_v1_1.0_128       186            4.24                 64.1             85.3
MobileNet_v1_0.75_224      317            2.59                 68.4             88.2
MobileNet_v1_0.75_192      233            2.59                 67.4             87.3
MobileNet_v1_0.75_160      162            2.59                 65.2             86.1
MobileNet_v1_0.75_128      104            2.59                 61.8             83.6
MobileNet_v1_0.50_224      150            1.34                 64.0             85.4
MobileNet_v1_0.50_192      110            1.34                 62.1             84.0
MobileNet_v1_0.50_160      77             1.34                 59.9             82.5
MobileNet_v1_0.50_128      49             1.34                 56.2             79.6
MobileNet_v1_0.25_224      41             0.47                 50.6             75.0
MobileNet_v1_0.25_192      34             0.47                 49.0             73.6
MobileNet_v1_0.25_160      21             0.47                 46.0             70.7
MobileNet_v1_0.25_128      14             0.47                 41.3             66.2
Choose the right MobileNet model to fit your latency and size budget. The size of the network in memory and on disk is proportional to the number of parameters. The latency and power usage of the network scale with the number of Multiply-Accumulates (MACs), a count of fused multiply-add operations. Top-1 and Top-5 accuracies are measured on the ILSVRC dataset.
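To make the size relationship concrete, here is a back-of-the-envelope calculation (ours, for illustration) for the largest checkpoint in the table:

# Rough on-disk size of MobileNet_v1_1.0_224, assuming 32-bit float weights.
params = 4.24e6              # parameter count from the table above
size_mb = params * 4 / 1e6   # 4 bytes per float32 parameter
print('%.1f MB' % size_mb)   # ~17.0 MB, before any quantization or compression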
We are excited to share MobileNets with the open-source community. Information for getting started can be found at the TensorFlow-Slim Image Classification Library. To learn how to run models on-device please go to TensorFlow Mobile. You can read more about the technical details of MobileNets in our paper, MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications.

Acknowledgements
MobileNets were made possible with the hard work of many engineers and researchers throughout Google. Specifically we would like to thank:

Core Contributors: Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, Hartwig Adam

Special thanks to: Benoit Jacob, Skirmantas Kligys, George Papandreou, Liang-Chieh Chen, Derek Chow, Sergio Guadarrama, Jonathan Huang, Andre Hentz, Pete Warden

Introducing the TensorFlow Research Cloud

Posted by Zak Stone, Product Manager for TensorFlow
Researchers require enormous computational resources to train the machine learning (ML) models that have delivered recent breakthroughs in medical imaging, neural machine translation, game playing, and many other domains. We believe that significantly larger amounts of computation will make it possible for researchers to invent new types of ML models that will be even more accurate and useful.
To accelerate the pace of open machine-learning research, we are introducing the TensorFlow Research Cloud (TFRC), a cluster of 1,000 Cloud TPUs that will be made available free of charge to support a broad range of computationally-intensive research projects that might not be possible otherwise.
The TensorFlow Research Cloud offers researchers the following benefits:
  • Access to Google's all-new Cloud TPUs that accelerate both training and inference
  • Up to 180 teraflops of floating-point performance per Cloud TPU
  • 64 GB of ultra-high-bandwidth memory per Cloud TPU
  • Familiar TensorFlow programming interfaces
You can sign up here to request to be notified when the TensorFlow Research Cloud application process opens, and you can optionally share more information about your computational needs. We plan to evaluate applications on a rolling basis in search of the most creative and ambitious proposals.
The TensorFlow Research Cloud program is not limited to academia — we recognize that people with a wide range of affiliations, roles, and expertise are making major machine learning research contributions, and we especially encourage those with non-traditional backgrounds to apply. Access will be granted to selected individuals for limited amounts of compute time, and researchers are welcome to apply multiple times with multiple projects.
Since the main goal of the TensorFlow Research Cloud is to benefit the open machine learning research community as a whole, successful applicants will be expected to do the following:
  • Share their TFRC-supported research with the world through peer-reviewed publications, open-source code, blog posts, or other open media
  • Share concrete, constructive feedback with Google to help us improve the TFRC program and the underlying Cloud TPU platform over time
  • Imagine a future in which ML acceleration is abundant and develop new kinds of machine learning models in anticipation of that future
For businesses interested in using Cloud TPUs for proprietary research and development, we will offer a parallel Cloud TPU Alpha program. You can sign up here to learn more about this program. We recommend participating in the Cloud TPU Alpha program if you are interested in any of the following:
  • Accelerating training of proprietary ML models; models that take weeks to train on other hardware can be trained in days or even hours on Cloud TPUs
  • Accelerating batch processing of industrial-scale datasets: images, videos, audio, unstructured text, structured data, etc.
  • Processing live requests in production using larger and more complex ML models than ever before
We hope the TensorFlow Research Cloud will allow as many researchers as possible to explore the frontier of machine learning research and extend it with new discoveries! We encourage you to sign up today to be among the first to know as more information becomes available.

What’s New in Android: O Developer Preview 2 & More

Posted by: Dave Burke, VP of Engineering

With billions of Android devices around the world, Android has surpassed our wildest expectations. Today at Google I/O, we showcased a number of ways we’re pushing Android forward, with the O Release, new tools for developers to help create more performant apps, and an early preview of a project we call Android Go -- a new experience that we’re building for entry-level devices.
Fluid experiences in Android O
It's pretty incredible what you can do on mobile devices today, and how easy it is to rely on them as computers in our pockets. In the O release we've focused on creating fluid experiences that make Android even more powerful and easy to use, and today we highlighted some of those:
  • Picture-in-picture: lets users manage two tasks simultaneously, whether it’s video calling your friend while checking your calendar, or reading a new recipe while watching a video on a specific cooking technique. We’ve designed PIP to provide seamless multitasking on any size screen, and it’s easy for apps to support it.
  • Notification dots extend the reach of notifications, a new way for developers to surface activity in their app, driving engagement. Built on our unique and highly regarded notification system, dots work with zero effort for most apps - we even extract the color of the dot from your icon. 
  • Autofill with Google simplifies setting up a new device and synchronizing passwords by bringing Chrome's Autofill feature to Android. Once a user opts-in, Autofill will work out-of-the-box for most apps. Developers can optimize their apps for Autofill by providing hints about the type of data expected or add support in custom views. 
  • A new homescreen for Android TV makes it easy for users to find, preview, and watch content provided by apps. Apps can publish one or more channels, and users can control the channels that appear on the homescreen. You’ll be able to get started with creating channels using the new TvProvider support library APIs.
  • Smart Text Selection: In Android O, we’re applying on-device machine learning to copy/paste, to let Android recognize entities like addresses, URLs, telephone numbers, and email addresses. This makes the copy/paste experience better by selecting the entire entity and surfacing the right apps to carry out an action based on the type of entity.
  • TensorFlow Lite: As Android continues to take advantage of machine learning to improve the user experience, we want our developer partners to be able to do the same. Today we shared an early look at TensorFlow Lite, an upcoming project based on TensorFlow, Google’s open source machine learning library. TensorFlow Lite is specifically designed to be fast and lightweight for embedded use cases. Since many on-device scenarios require real-time performance, we’re also working on a new Neural Network API that TensorFlow can take advantage of to accelerate computation. We plan to make both of these available to developers in a maintenance update to O later this year, so stay tuned!  
(L) Android O: Picture-in-picture, (R) Android O: Notification dots

Working on the Vitals in Android
We think Android’s foundations are critical, so we’re investing in Android Vitals, a project focused on optimizing battery life, startup time, graphic rendering time, and stability. Today we showcased some of the work we’ve done so far, and introduced new tools to help developers understand power, performance, and reliability issues in their apps:
  • System optimizations: in Android O, we’ve done a lot of work across the system to make apps run faster and smoother. For example we made extensive changes in our runtime - including new optimizations like concurrent compacting garbage collection, code locality, and more. 
  • Background limits: up to now it’s been fairly easy for apps to unintentionally overuse resources while they’re in the background, and this can adversely affect the performance of the system. So in O, we've introduced new limits on background location and wi-fi scans, and changes in the way apps run in the background. These boundaries prevent overuse -- they’re about increasing battery life and freeing up memory.
  • New Android Vitals Dashboards in the Play Console: today we launched six Play Console dashboards to help you pinpoint common issues in your apps - excessive crash rate, ANR rate, frozen frames, slow rendering, excessive wakeups, and stuck wake locks, including how many users are affected, with guidance on the best way to address the issues. You can visit the Play Console today to see your app's data, then learn how to address any issues.
Android Go
Part of Android’s mission is to bring computing to everyone. We’re excited about seeing more users come online for the first time as the price of entry-level smartphones drops, and we want to help manufacturers continue to offer lower-cost devices that provide a great experience for these users. Today we gave a sneak peek of a new experience that we’re building specifically for Android devices that have 1GB or less of memory -- internally we call it “Android Go,” and it’s designed around three things:
  • OS: We’re optimizing Android O to run smoothly and efficiently on entry-level devices
  • Apps: We’re also designing Google apps to use less memory, storage space, and mobile data, including apps such as YouTube Go, Chrome, and Gboard. 
  • Play: On entry-level devices, the Play Store will promote a better user experience by highlighting apps that are specifically designed for these devices -- such as apps that use less memory, storage space, and mobile data -- while still giving users access to the entire app catalog.
The Android Go experience will ship in 2018 for all Android devices that have 1GB or less of memory. We recommend getting your apps ready for these devices soon -- take a look at Building for Billions to learn about the importance of offering a useful offline state, reducing APK size, and minimizing battery and memory use.

O Developer Preview 2, Now in Public Beta
Today’s release of O Developer Preview 2 is our first beta-quality candidate, available to test on your primary phone or tablet. We’re inviting those who want to try the beta release of Android O to enroll now at android.com/beta -- it’s an incredibly convenient way to preview Android O on your Nexus 5X, 6P, and Player, as well as Pixel, Pixel XL, or Pixel C device.

With more users starting to get Android O on their devices through the Android Beta program, now is the time to test your apps for compatibility, resolve any issues, and publish an update as soon as possible. See the migration guide for steps and a recommended timeline.

Later today you’ll be able to download the updated tools for developing on Android O, including the latest canaries of Android Studio, SDK, and tools, Android O system images, and emulators. Along with those, you’ll be able to download support library 26.0.0 beta and other libraries from our new Maven repo. The change to Maven from SDK Manager means a slight change to your build configuration, but gives you much more flexibility in how you integrate library updates with your CI systems.

When you’re ready to get started developing with Android O, visit the O Developer Preview site for details on all of the features you can use in your apps, including notification channels and dots, picture-in-picture, autofill, and others. APIs have changed since the first developer preview, so take a look at the diff report to see where your code might be affected.

Thanks for the feedback you’ve given us so far. Please keep it coming, about Android O features, APIs, issues, or requests -- see the Feedback and Bugs page for details on where to report feedback.

TensorFlow Benchmarks and a New High-Performance Guide

Posted by Josh Gordon on behalf of the TensorFlow team

We recently published a collection of performance benchmarks that highlight TensorFlow's speed and scalability when training image classification models, like InceptionV3 and ResNet, on a variety of hardware and configurations.

To help you build highly scalable models, we've also added a new High-Performance Models guide to the performance site on tensorflow.org. Together with the guide, we hope these benchmarks and associated scripts will serve as a reference point as you tune your code, and help you get the most performance from your new and existing hardware.

When running benchmarks, we tested using both real and synthetic data. We feel this is important to show, as it exercises both the compute and input pipelines, and is more representative of real-world performance numbers than testing with synthetic data alone. For transparency, we've also shared our scripts and methodology.

Collected below are highlights of TensorFlow's performance when training with an NVIDIA® DGX-1™, as well as with 64 NVIDIA® Tesla® K80 GPUs running in a distributed configuration. In-depth results, including details like batch-size and configurations used for the various platforms we tested, are available on the benchmarks site.

Training with NVIDIA® DGX-1™ (8 NVIDIA® Tesla® P100s)

Our benchmarks show that TensorFlow has nearly linear scaling on an NVIDIA® DGX-1™ for training image classification models with synthetic data. With 8 NVIDIA Tesla P100s, we report a speedup of 7.99x (99% efficiency) for InceptionV3 and 7.91x (98% efficiency) for ResNet-50, compared to using a single GPU.
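The speedup and efficiency figures quoted throughout reduce to simple ratios of measured throughput; the single-GPU number below is made up purely to illustrate the arithmetic:

single_gpu = 142.0                  # hypothetical single-GPU images/sec
eight_gpu = single_gpu * 7.99       # the corresponding measured 8-GPU throughput
speedup = eight_gpu / single_gpu    # 7.99x
efficiency = speedup / 8            # 0.99, i.e. 99% scaling efficiency
print('%.2fx speedup, %.0f%% efficiency' % (speedup, efficiency * 100))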
The following are results comparing training with synthetic and real data. The benchmark results show a small difference between training data placed statically on the GPU (synthetic) and executing the full input pipeline with data from ImageNet. One strength of TensorFlow is the ability of its input pipeline to saturate state-of-the-art compute units with large inputs.




Training with NVIDIA® Tesla® K80 (Single server, 8 GPUs)

With 8 NVIDIA® Tesla® K80s in a single-server configuration, TensorFlow has a 7.4x speedup on Inception v3 (93% efficiency) and a 7.4x speedup on ResNet-50, compared to a single GPU. For this benchmark, we used Google Compute Engine instances.

Distributed Training with NVIDIA® Tesla® K80 (up to 64 GPUs)

With 64 Tesla K80s running on Amazon EC2 instances in a distributed configuration, TensorFlow has a 59x speedup (92% efficiency) for InceptionV3 and a 52x speedup (82% efficiency) for ResNet-50 using synthetic data.

Discussion

During our testing of the DGX-1 and other platforms, we explored a variety of configurations using NCCL, a collective communications library, part of the NVIDIA Deep Learning SDK. Our hypothesis before testing began was that replicating the variables across GPUs and syncing them with NCCL would be the optimal approach. The results were not always as expected. Optimal configurations varied depending not only on the GPU, but also on the platform and model tested. On the DGX-1, for example, VGG16 and AlexNet performed best when replicating the variables on each of the GPUs and updating them using NCCL, while InceptionV3 and ResNet performed best when placing the shared variable on the CPU. These intricacies highlight the need for comprehensive benchmarking. Models have to be tuned for each platform, and a one size fits all approach is likely to result in suboptimal performance in many cases.
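In TF 1.x terms, the two placements discussed above look roughly like the sketch below. This is our simplification rather than the benchmark scripts themselves, and it assumes the tf.contrib.nccl module is available:

import tensorflow as tf
from tensorflow.contrib import nccl

NUM_GPUS = 2

# Placement A: one shared variable pinned to the CPU
# (what worked best for InceptionV3 and ResNet on the DGX-1).
with tf.device('/cpu:0'):
    shared_w = tf.get_variable('shared_w', shape=[1024, 1024])

# Placement B: replicate the variable on each GPU and combine
# per-GPU gradients with NCCL (what worked best for VGG16 and AlexNet).
replica_grads = []
for i in range(NUM_GPUS):
    with tf.device('/gpu:%d' % i):
        tf.get_variable('replica_w_%d' % i, shape=[1024, 1024])
        grad = tf.random_normal([1024, 1024])  # stand-in for a real gradient
        replica_grads.append(grad)
summed = nccl.all_sum(replica_grads)  # each GPU receives the summed gradient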

To get peak performance, it is necessary to benchmark with a mix of settings to determine which ones are likely to perform best on each platform. The script that accompanies the article on creating High-Performance Models was created not only to illustrate how to achieve the highest performance, but also as a tool to benchmark a platform with a variety of settings. The benchmarks page lists the configurations that we found which provided optimal performance for the platforms tested.

As many people have pointed out in response to various benchmarks that have been performed on other platforms, an increase in samples per second does not necessarily correlate to faster convergence, and as batch sizes increase it can be more difficult to converge to the highest accuracy levels.

As a team, we hope to do future tests that focus on time to convergence to high levels of accuracy. We hope these numbers and the guide will prove useful to you as you tune your code for performance.

We'd like to thank NVIDIA for sharing a DGX-1 for benchmark testing and for their technical assistance. We're looking forward to NVIDIA's upcoming Volta architecture, and to working closely with them to optimize TensorFlow's performance there, and to expand support for FP16.

Thanks for reading, and as always, we look forward to working with you on forums like GitHub issues, Stack Overflow, the TensorFlow mailing list, and @TensorFlow.

Updating Google Maps with Deep Learning and Street View



Every day, Google Maps provides useful directions, real-time traffic information and information on businesses to millions of people. In order to provide the best experience for our users, this information has to constantly mirror an ever-changing world. While Street View cars collect millions of images daily, it is impossible to manually analyze the more than 80 billion high-resolution images collected to date in order to find new, or updated, information for Google Maps. One of the goals of Google’s Ground Truth team is to enable the automatic extraction of information from our geo-located imagery to improve Google Maps.

In “Attention-based Extraction of Structured Information from Street View Imagery”, we describe our approach to accurately read street names out of very challenging Street View images in many countries, automatically, using a deep neural network. Our algorithm achieves 84.2% accuracy on the challenging French Street Name Signs (FSNS) dataset, significantly outperforming the previous state-of-the-art systems. Importantly, our system is easily extensible to extract other types of information out of Street View images as well, and now helps us automatically extract business names from store fronts. We are excited to announce that this model is now publicly available!
Example of street name from the FSNS dataset correctly transcribed by our system. Up to four views of the same sign are provided.
Text recognition in a natural environment is a challenging computer vision and machine learning problem. While traditional Optical Character Recognition (OCR) systems mainly focus on extracting text from scanned documents, text acquired from natural scenes is more challenging due to visual artifacts, such as distortion, occlusions, directional blur, cluttered background or different viewpoints. Our efforts to solve this research challenge first began in 2008, when we used neural networks to blur faces and license plates in Street View images to protect the privacy of our users. From this initial research, we realized that with enough labeled data, we could additionally use machine learning not only to protect the privacy of our users, but also to automatically improve Google Maps with relevant up-to-date information.

In 2014, Google’s Ground Truth team published a state-of-the-art method for reading street numbers on the Street View House Numbers (SVHN) dataset, implemented by then-summer intern (now Googler) Ian Goodfellow. This work was not only of academic interest but was critical in making Google Maps more accurate. Today, over one-third of addresses globally have had their location improved thanks to this system. In some countries, such as Brazil, this algorithm has improved more than 90% of the addresses in Google Maps, greatly improving the usability of our maps.

The next logical step was to extend these techniques to street names. To solve this problem, we created and released French Street Name Signs (FSNS), a large training dataset of more than 1 million street names. The FSNS dataset was a multi-year effort designed to allow anyone to improve their OCR models on a challenging and real use case. It is much larger and more challenging than SVHN in that accurate recognition of street signs may require combining information from many different images.
These are examples of challenging signs that are properly transcribed by our system by selecting or combining understanding across images. The second example is extremely challenging by itself, but the model learned a language model prior that enables it to remove ambiguity and correctly read the street name.
With this training set, Google intern Zbigniew Wojna spent the summer of 2016 developing a deep learning model architecture to automatically label new Street View imagery. One of the interesting strengths of our new model is that it can normalize the text to be consistent with our naming conventions, as well as ignore extraneous text, directly from the data itself.
Example of text normalization learned from data in Brazil. Here it changes “AV.” into “Avenida” and “Pres.” into “Presidente” which is what we desire.
In this example, the model is not confused by the fact that there are two street names; it properly normalizes “Av” into “Avenue” and correctly ignores the number “1600”.
While this model is accurate, it did show a sequence error rate of 15.8%. However, after analyzing failure cases, we found that 48% of them were due to ground truth errors, highlighting the fact that this model is on par with the label quality (a full analysis of our error rate can be found in our paper).

This new system, combined with the one extracting street numbers, allows us to create new addresses directly from imagery, where we previously didn’t know the name of the street, or the location of the addresses. Now, whenever a Street View car drives on a newly built road, our system can analyze the tens of thousands of images that would be captured, extract the street names and numbers, and properly create and locate the new addresses, automatically, on Google Maps.

But automatically creating addresses for Google Maps is not enough -- additionally we want to be able to provide navigation to businesses by name. In 2015, we published “Large Scale Business Discovery from Street View Imagery”, which proposed an approach to accurately detect business store-front signs in Street View images. However, once a store front is detected, one still needs to accurately extract its name for it to be useful -- the model must figure out which text is the business name, and which text is not relevant. We call this extracting “structured text” information out of imagery. It is not just text, it is text with semantic meaning attached to it.

Using different training data, the same model architecture that we used to read street names can also be used to accurately extract business names out of business facades. In this particular case, we are able to extract only the business name, which enables us to verify whether we already know about this business in Google Maps, allowing us to have more accurate and up-to-date business listings.
The system correctly predicts the business name ‘Zelina Pneus’, despite not receiving any data about the true location of the name in the image. The model is not confused by the tire brands that the sign indicates are available at the store.
Applying these large models across our more than 80 billion Street View images requires a lot of computing power. This is why the Ground Truth team was the first user of Google's TPUs, which were publicly announced earlier this year, to drastically reduce the computational cost of the inferences of our pipeline.

People rely on the accuracy of Google Maps to assist them. While keeping Google Maps up-to-date with the ever-changing landscape of cities, roads and businesses presents a technical challenge that is far from solved, it is the goal of the Ground Truth team to drive cutting-edge innovation in machine learning to create a better experience for over one billion Google Maps users.

Introducing tf-seq2seq: An Open Source Sequence-to-Sequence Framework in TensorFlow



(Crossposted on the Google Open Source Blog)

Last year, we announced Google Neural Machine Translation (GNMT), a sequence-to-sequence (“seq2seq”) model which is now used in Google Translate production systems. While GNMT achieved huge improvements in translation quality, its impact was limited by the fact that the framework for training these models was unavailable to external researchers.

Today, we are excited to introduce tf-seq2seq, an open source seq2seq framework in TensorFlow that makes it easy to experiment with seq2seq models and achieve state-of-the-art results. To that end, we made the tf-seq2seq codebase clean and modular, maintaining full test coverage and documenting all of its functionality.

Our framework supports various configurations of the standard seq2seq model, such as depth of the encoder/decoder, attention mechanism, RNN cell type, or beam size. This versatility allowed us to discover optimal hyperparameters and outperform other frameworks, as described in our paper, “Massive Exploration of Neural Machine Translation Architectures.”
A seq2seq model translating from Mandarin to English. At each time step, the encoder takes in one Chinese character and its own previous state (black arrow), and produces an output vector (blue arrow). The decoder then generates an English translation word-by-word, at each time step taking in the last word, the previous state, and a weighted combination of all the outputs of the encoder (aka attention [3], depicted in blue) and then producing the next English word. Please note that in our implementation we use wordpieces [4] to handle rare words.
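To unpack the “weighted combination” mentioned in the caption, here is a toy NumPy sketch of a single attention step. Dot-product scoring is used for brevity; tf-seq2seq itself supports several scoring functions:

import numpy as np

def attend(decoder_state, encoder_outputs):
    """Weight encoder outputs by their relevance to the current decoder state."""
    scores = encoder_outputs @ decoder_state   # one score per source position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax -> attention distribution
    return weights @ encoder_outputs           # context vector fed to the decoder

# Example: 5 source positions with 8-dimensional states.
context = attend(np.random.randn(8), np.random.randn(5, 8))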
In addition to machine translation, tf-seq2seq can also be applied to any other sequence-to-sequence task (i.e. learning to produce an output sequence given an input sequence), including machine summarization, image captioning, speech recognition, and conversational modeling. We carefully designed our framework to maintain this level of generality and provide tutorials, preprocessed data, and other utilities for machine translation.

We hope that you will use tf-seq2seq to accelerate (or kick off) your own deep learning research. We also welcome your contributions to our GitHub repository, where we have a variety of open issues that we would love to have your help with!

Acknowledgments:
We’d like to thank Eugene Brevdo, Melody Guan, Lukasz Kaiser, Quoc V. Le, Thang Luong, and Chris Olah for all their help. For a deeper dive into how seq2seq models work, please see the resources below.

References:
[1] Massive Exploration of Neural Machine Translation Architectures, Denny Britz, Anna Goldie, Minh-Thang Luong, Quoc Le
[2] Sequence to Sequence Learning with Neural Networks, Ilya Sutskever, Oriol Vinyals, Quoc V. Le. NIPS, 2014
[3] Neural Machine Translation by Jointly Learning to Align and Translate, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio. ICLR, 2015
[4] Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, Wolfgang Macherey, Maxim Krikun, Yuan Cao, Qin Gao, Klaus Macherey, Jeff Klingner, Apurva Shah, Melvin Johnson, Xiaobing Liu, Łukasz Kaiser, Stephan Gouws, Yoshikiyo Kato, Taku Kudo, Hideto Kazawa, Keith Stevens, George Kurian, Nishant Patil, Wei Wang, Cliff Young, Jason Smith, Jason Riesa, Alex Rudnick, Oriol Vinyals, Greg Corrado, Macduff Hughes, Jeffrey Dean. Technical Report, 2016
[5] Attention and Augmented Recurrent Neural Networks, Chris Olah, Shan Carter. Distill, 2016
[6] Neural Machine Translation and Sequence-to-sequence Models: A Tutorial, Graham Neubig
[7] Sequence-to-Sequence Models, TensorFlow.org

By Anna Goldie and Denny Britz, Research Software Engineer and Google Brain Resident, Google Brain Team

An Upgrade to SyntaxNet, New Models and a Parsing Competition

At Google, we continuously improve the language understanding capabilities used in applications ranging from generation of email responses to translation. Last summer, we open-sourced SyntaxNet, a neural-network framework for analyzing and understanding the grammatical structure of sentences. Included in our release was Parsey McParseface, a state-of-the-art model that we had trained for analyzing English, followed quickly by a collection of pre-trained models for 40 additional languages, which we dubbed Parsey's Cousins. While we were excited to share our research and to provide these resources to the broader community, building machine learning systems that work well for languages other than English remains an ongoing challenge. We are excited to announce a few new research resources, available now, that address this problem.

SyntaxNet Upgrade
We are releasing a major upgrade to SyntaxNet. This upgrade incorporates nearly a year’s worth of our research on multilingual language understanding, and is available to anyone interested in building systems for processing and understanding text. At the core of the upgrade is a new technology that enables learning of richly layered representations of input sentences. More specifically, the upgrade extends TensorFlow to allow joint modeling of multiple levels of linguistic structure, and to allow neural-network architectures to be created dynamically during processing of a sentence or document.

Our upgrade makes it, for example, easy to build character-based models that learn to compose individual characters into words (e.g. ‘c-a-t’ spells ‘cat’). By doing so, the models can learn that words can be related to each other because they share common parts (e.g. ‘cats’ is the plural of ‘cat’ and shares the same stem; ‘wildcat’ is a type of ‘cat’). Parsey and Parsey’s Cousins, on the other hand, operated over sequences of words. As a result, they were forced to memorize words seen during training and relied mostly on the context to determine the grammatical function of previously unseen words.
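As a toy illustration of this idea (ours, not SyntaxNet’s actual architecture), a simple recurrent cell can fold a word’s characters into a single vector, so that ‘cat’, ‘cats’ and ‘wildcat’ share much of their computation:

import numpy as np

def word_vector(word, char_vecs, W_h, W_x):
    """Compose per-character embeddings into one word representation with a plain RNN."""
    h = np.zeros(W_h.shape[0])
    for ch in word:  # e.g. 'c', 'a', 't'
        h = np.tanh(W_h @ h + W_x @ char_vecs[ch])
    return h

# Tiny example with random parameters and a four-letter alphabet.
rng = np.random.RandomState(0)
char_vecs = {c: rng.randn(4) for c in 'cats'}
W_h, W_x = 0.1 * rng.randn(8, 8), 0.1 * rng.randn(8, 4)
print(word_vector('cat', char_vecs, W_h, W_x))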

As an example, consider the following (meaningless but grammatically correct) sentence:
“The gostak distims the doshes.”

This sentence was originally coined by Andrew Ingraham, who explained: “You do not know what this means; nor do I. But if we assume that it is English, we know that the doshes are distimmed by the gostak. We know too that one distimmer of doshes is a gostak.” Systematic patterns in morphology and syntax allow us to guess the grammatical function of words even when they are completely novel: we understand that ‘doshes’ is the plural of the noun ‘dosh’ (similar to the ‘cats’ example above) or that ‘distim’ is the third-person singular of the verb ‘distim’. Based on this analysis we can then derive the overall structure of this sentence even though we have never seen the words before.

ParseySaurus
To showcase the new capabilities provided by our upgrade to SyntaxNet, we are releasing a set of new pretrained models called ParseySaurus. These models use the character-based input representation mentioned above and are thus much better at predicting the meaning of new words based both on their spelling and how they are used in context. The ParseySaurus models are far more accurate than Parsey’s Cousins (reducing errors by as much as 25%), particularly for morphologically-rich languages like Russian, or agglutinative languages like Turkish and Hungarian. In those languages there can be dozens of forms for each word and many of these forms might never be observed during training - even in a very large corpus.

Consider the following fictitious Russian sentence, where again the stems are meaningless, but the suffixes define an unambiguous interpretation of the sentence structure:
Even though our Russian ParseySaurus model has never seen these words, it can correctly analyze the sentence by inspecting the character sequences which constitute each word. In doing so, the system can determine many properties of the words (notice how many more morphological features there are here than in the English example). To see the sentence as ParseySaurus does, here is a visualization of how the model analyzes this sentence:
Each square represents one node in the neural network graph, and lines show the connections between them. The left-side “tail” of the graph shows the model consuming the input as one long string of characters. These are intermittently passed to the right side, where the rich web of connections shows the model composing words into phrases and producing a syntactic parse. Check out the full-size rendering here.

A Competition
You might be wondering whether character-based modeling is all we need or whether there are other techniques that might be important. SyntaxNet has lots more to offer, like beam search and different training objectives, but there are of course also many other possibilities. To find out what works well in practice, we are helping co-organize, together with Charles University and other colleagues, a multilingual parsing competition at this year’s Conference on Computational Natural Language Learning (CoNLL), with the goal of building syntactic parsing systems that work well in real-world settings and for 45 different languages.

The competition is made possible by the Universal Dependencies (UD) initiative, whose goal is to develop cross-linguistically consistent treebanks. Because machine learned models can only be as good as the data that they have access to, we have been contributing data to UD since 2013. For the competition, we partnered with UD and DFKI to build a new multilingual evaluation set consisting of 1000 sentences that have been translated into 20+ different languages and annotated by linguists with parse trees. This evaluation set is the first of its kind (in the past, each language had its own independent evaluation set) and will enable more consistent cross-lingual comparisons. Because the sentences have the same meaning and have been annotated according to the same guidelines, we will be able to get closer to answering the question of which languages might be harder to parse.

We hope that the upgraded SyntaxNet framework and our pre-trained ParseySaurus models will inspire researchers to participate in the competition. We have additionally created a tutorial showing how to load a Docker image and train models on the Google Cloud Platform, to facilitate participation by smaller teams with limited resources. So, if you have an idea for making your own models with the SyntaxNet framework, sign up to compete! We believe that the configurations that we are releasing are a good place to start, but we look forward to seeing how participants will be able to extend and improve these models or perhaps create better ones!

Thanks to everyone involved who made this competition happen, including our collaborators at UD-Pipe, who provide another baseline implementation to make it easy to enter the competition. Happy parsing from the main developers, Chris Alberti, Daniel Andor, Ivan Bogatyy, Mark Omernick, Zora Tung and Ji Ma!

By David Weiss and Slav Petrov, Research Scientists
