Tag Archives: machine learning

Facets: An Open Source Visualization Tool for Machine Learning Training Data



(Cross-posted on the Google Open Source Blog)

Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more.

Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data, which, in machine learning, is as important as debugging your model. They can easily be used inside Jupyter notebooks or embedded into webpages. In addition to the open source code, we've also created a Facets demo website, which lets anyone visualize their own datasets directly in the browser, without any software installation or setup and without the data ever leaving their computer.
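As a minimal sketch of the Jupyter workflow for Facets Overview, assuming the facets_overview Python helper from the Facets repository (the module path, file names and HTML import URL are illustrative and may differ from the current release):

```python
# Minimal sketch: embed Facets Overview in a Jupyter notebook.
# Assumes the `facets_overview` helper from the Facets repository;
# module paths and the HTML import URL may differ in the current release.
import base64
import pandas as pd
from IPython.display import display, HTML
from facets_overview.generic_feature_statistics_generator import (
    GenericFeatureStatisticsGenerator)

train_df = pd.read_csv("adult_train.csv")  # hypothetical file names
test_df = pd.read_csv("adult_test.csv")

# Build the feature-statistics proto for both datasets so they can be compared.
proto = GenericFeatureStatisticsGenerator().ProtoFromDataFrames(
    [{"name": "train", "table": train_df},
     {"name": "test", "table": test_df}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")

HTML_TEMPLATE = """
<link rel="import"
 href="https://raw.githubusercontent.com/PAIR-code/facets/master/facets-dist/facets-jupyter.html">
<facets-overview id="fo"></facets-overview>
<script>
  document.querySelector("#fo").protoInput = "{protostr}";
</script>"""
display(HTML(HTML_TEMPLATE.format(protostr=protostr)))
```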

Facets Overview
Facets Overview automatically gives users a quick understanding of the distribution of values across the features of their datasets. Multiple datasets, such as a training set and a test set, can be compared on the same visualization. Common data issues that can hamper machine learning are pushed to the forefront, such as unexpected feature values, features with high percentages of missing values, features with unbalanced distributions, and feature distribution skew between datasets.
Facets Overview visualization of the six numeric features of the UCI Census datasets[1]. The features are sorted by non-uniformity, with the feature with the most non-uniform distribution at the top. Numbers in red indicate possible trouble spots, in this case numeric features with a high percentage of values set to 0. The histograms at right allow you to compare the distributions between the training data (blue) and test data (orange).

Facets Overview visualization showing two of the nine categorical features of the UCI Census datasets[1]. The features are sorted by distribution distance, with the feature with the biggest skew between the training (blue) and test (orange) datasets at the top. Notice in the “Target” feature that the label values differ between the training and test datasets, due to a trailing period in the test set (“<=50K” vs “<=50K.”). This can be seen in the chart for the feature and also in the entries in the “top” column of the table. This label mismatch would cause a model trained and tested on this data to not be evaluated correctly.
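Once Overview surfaces a skew like this, the fix is usually small. A hedged pandas sketch (file names are hypothetical; the “Target” column name follows the caption above):

```python
import pandas as pd

# Hypothetical file names; the "Target" column name follows the caption above.
train_df = pd.read_csv("adult_train.csv")
test_df = pd.read_csv("adult_test.csv")

# Compare label vocabularies between the two splits.
print(set(train_df["Target"].unique()))   # e.g. {"<=50K", ">50K"}
print(set(test_df["Target"].unique()))    # e.g. {"<=50K.", ">50K."}

# Strip the trailing period so train and test labels match.
test_df["Target"] = test_df["Target"].str.strip().str.rstrip(".")
assert set(test_df["Target"]) == set(train_df["Target"])
```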
Facets Dive
Facets Dive provides an easy-to-customize, intuitive interface for exploring the relationship between the data points across the different features of a dataset. With Facets Dive, you control the position, color and visual representation of each data point based on its feature values. If the data points have images associated with them, the images can be used as the visual representations.
Facets Dive visualization showing all 16281 data points in the UCI Census test dataset[1]. The animation shows a user coloring the data points by one feature (“Relationship”), faceting in one dimension by a continuous feature (“Age”) and then faceting in another dimension by a discrete feature (“Marital Status”).
Facets Dive visualization of a large number of face drawings from the “Quick, Draw!” dataset, showing the relationship between the number of strokes and points in the drawings and the ability of the “Quick, Draw!” classifier to correctly categorize them as faces.
Fun Fact: In large datasets, such as the CIFAR-10 dataset[2], a small human labeling error can easily go unnoticed. We inspected the CIFAR-10 dataset with Dive and were able to catch a frog-cat: an image of a frog that had been incorrectly labeled as a cat!
Exploration of the CIFAR-10 dataset using Facets Dive. Here we facet the ground truth labels by row and the predicted labels by column, producing a confusion matrix view that lets us drill into particular kinds of misclassifications. In this case, the ML model incorrectly labels a small percentage of true cats as frogs. By placing the real images in the confusion matrix, we find that one of these "true cats" that the model predicted to be a frog is, on visual inspection, actually a frog. With Facets Dive, we can determine that this misclassification wasn't a true error by the model, but rather incorrectly labeled data in the dataset.
Can you spot the frog-cat?
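Facets Dive builds this confusion-matrix view interactively, but the same row/column faceting idea can be approximated numerically when you only need the counts and the suspect examples. A sketch under assumed column names (not part of the Facets API):

```python
import pandas as pd

# Hypothetical dataframe: one row per CIFAR-10 test image, with ground-truth
# and model-predicted labels plus an image id for follow-up inspection.
df = pd.read_csv("cifar10_predictions.csv")  # columns: image_id, true_label, predicted_label

# Numeric analogue of the Dive layout: ground truth as rows, predictions as columns.
confusion = pd.crosstab(df["true_label"], df["predicted_label"])
print(confusion.loc["cat", "frog"])  # how many "cats" the model called frogs

# List the suspicious examples so they can be inspected visually, as in Dive.
suspects = df[(df["true_label"] == "cat") & (df["predicted_label"] == "frog")]
print(suspects["image_id"].tolist())
```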

We’ve gotten great value out of Facets inside of Google and are excited to share the visualizations with the world. We hope they can help you discover new and interesting things about your data that lead you to create more powerful and accurate machine learning models. And since they are open source, you can customize the visualizations for your specific needs or contribute to the project to help us all better understand our data. If you have feedback about your experience with Facets, please let us know what you think.

Acknowledgments
This work is a collaboration between Mahima Pushkarna, James Wexler and Jimbo Wilson, with input from the entire Big Picture team. We would also like to thank Justine Tunney for providing us with the build tooling.

References
[1] Lichman, M. (2013). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml/datasets/Census+Income]. Irvine, CA: University of California, School of Information and Computer Science

[2] Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky, 2009.

Using Deep Learning to Create Professional-Level Photographs



Machine learning (ML) excels in many areas with well-defined goals. Tasks for which there is a right or wrong answer help with the training process and allow the algorithm to achieve its desired goal, whether it be correctly identifying objects in images or providing a suitable translation from one language to another. However, there are areas where objective evaluations are not available. For example, whether a photograph is beautiful is measured by its aesthetic value, which is a highly subjective concept.
A professional(?) photograph of Jasper National Park, Canada.
To explore how ML can learn subjective concepts, we introduce an experimental deep-learning system for artistic content creation. It mimics the workflow of a professional photographer, roaming landscape panoramas from Google Street View, searching for the best composition, and then carrying out various postprocessing operations to create an aesthetically pleasing image. Our virtual photographer “travelled” through roughly 40,000 panoramas in areas such as the Alps, Banff and Jasper National Parks, Big Sur in California, and Yellowstone National Park, and returned with creations that are quite impressive, some even approaching professional quality as judged by professional photographers.

Training the Model
While aesthetics can be modeled using datasets like AVA, naively using such a model to enhance photos may miss some aspects of aesthetics, for example producing an over-saturated photo. Properly learning the multiple aspects of aesthetics with supervised learning, however, may require a labeled dataset that is intractable to collect.

Our approach relies only on a collection of professional-quality photos, without before/after image pairs or any additional labels. It breaks down aesthetics into multiple aspects automatically, each of which is learned individually with negative examples generated by a coupled image operation. By keeping these image operations semi-“orthogonal”, we can enhance a photo’s composition, saturation/HDR level and dramatic lighting with fast and separable optimizations:
A panorama (a) is cropped into (b), with saturation and HDR strength enhanced in (c), and with dramatic mask applied in (d). Each step is guided by one learned aspect of aesthetics.
A traditional image filter was used to generate negative training examples for saturation, HDR detail and composition. We also introduce a special operation, named dramatic mask, which was created jointly while learning the concept of dramatic lighting. The negative examples were generated by applying a combination of image filters that randomly modify brightness on professional photos, degrading their appearance. For training we use a generative adversarial network (GAN), in which a generative model creates a mask to fix the lighting of the negative examples, while a discriminative model tries to distinguish the enhanced results from real professional photos. Unlike shape-fixed filters such as a vignette, the dramatic mask adds a content-aware brightness adjustment to a photo. The competitive nature of GAN training leads to good variation in such suggestions. You can read more about the training details in our paper.
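To make the adversarial setup concrete, here is a heavily simplified sketch in TensorFlow/Keras. It is an assumption-laden illustration, not the paper's model: the network shapes, the way the mask is applied (a simple multiplicative brightness adjustment) and the losses are all placeholders for the real system.

```python
import tensorflow as tf

# Illustrative architectures only; not the paper's actual networks or losses.
def make_generator():
    # Maps a degraded RGB photo to a single-channel brightness mask in [0, 1].
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu",
                               input_shape=(None, None, 3)),
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid"),
    ])

def make_discriminator():
    # Scores how "professionally lit" a photo looks (a single logit).
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, strides=2, activation="relu",
                               input_shape=(None, None, 3)),
        tf.keras.layers.Conv2D(64, 3, strides=2, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1),
    ])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
generator, discriminator = make_generator(), make_discriminator()
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(professional_photos, degraded_photos):
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        mask = generator(degraded_photos)  # content-aware brightness mask
        enhanced = tf.clip_by_value(degraded_photos * (1.0 + mask), 0.0, 1.0)
        real_logits = discriminator(professional_photos)
        fake_logits = discriminator(enhanced)
        # Discriminator: real professional photos -> 1, enhanced negatives -> 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits) +
                  bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator: fool the discriminator into scoring enhanced photos as real.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return d_loss, g_loss
```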

Results
Some creations of our system from Google Street View are shown below. As you can see, the application of the trained aesthetic filters creates some dramatic results (including the image we started this post with!):
Jasper National Park, Canada.
Interlaken, Switzerland.
Park Parco delle Orobie Bergamasche, Italy.
Jasper National Park, Canada.
Professional Evaluation
To judge how successful our algorithm was, we designed a “Turing-test”-like experiment: we mixed our creations with other photos of varying quality and showed them to several professional photographers. They were instructed to assign each photo a quality score, with the following meaning:
  • 1: Point-and-shoot without consideration for composition, lighting etc.
  • 2: Good photos from general population without a background in photography. Nothing artistic stands out.
  • 3: Semi-pro. Great photos showing clear artistic aspects. The photographer is on the right track to becoming a professional.
  • 4: Pro.
In the following chart, each curve shows scores from professional photographers for images within a certain predicted score range. For our creations with a high predicted score, about 40% of the ratings they received were at the “semi-pro” to “pro” levels.
Scores received from professional photographers for photos with different predicted scores.
Future Work
The Street View panoramas served as a test bed for our project. Someday this technique might even help you take better photos in the real world. We compiled a showcase of photos created to our satisfaction. If you see a photo you like, you can click on it to bring up a nearby Street View panorama. Would you make the same decision if you were there holding the camera at that moment?

Acknowledgements
This work was done by Hui Fang and Meng Zhang from Machine Perception at Google Research. We would like to thank Vahid Kazemi for his earlier work in predicting AVA scores using Inception network, and Sagarika Chalasani, Nick Beato, Bryan Klingner and Rupert Breheny for their help in processing Google Street View panoramas. We would like to thank Peyman Milanfar, Tomas Izo, Christian Szegedy, Jon Barron and Sergey Ioffe for their helpful reviews and comments. Huge thanks to our anonymous professional photographers!

Introducing Gradient Ventures

AI-powered technology holds a lot of promise—from improving patient health to making data centers more efficient. But while we’ve seen some amazing applications of AI so far, we know there are many more out there that haven’t even been imagined yet. And sometimes, these new ideas need support to flourish.

That’s why we’re announcing Gradient Ventures, a new venture fund from Google with technical mentorship for early-stage startups focused on artificial intelligence. Through Gradient, we’ll provide portfolio companies with capital, resources, and dedicated access to experts and bootcamps in AI. We’ll take a minority stake in the startups in which we invest.

Many members of our team are engineers, so we’re familiar with the journey from big idea to product launch. The goal is to help our portfolio companies overcome engineering challenges to create products that will apply artificial intelligence to today’s challenges and those we’ll face in the future.

Our portfolio is already growing, and our first companies are making progress, including Algorithmia, a marketplace for algorithms and functions, and Cogniac, a suite of tools used to create and manage visual models.

Through AI, yesterday's science fiction is becoming today's nonfiction. There's everything to reimagine as we usher in this new era of technology—and we're excited to work with entrepreneurs to start building it.

The Google Brain Residency Program — One Year Later



“Coming from a background in statistics, physics, and chemistry, the Google Brain Residency was my first exposure to both deep learning and serious programming. I enjoyed the autonomy that I was given to research diverse topics of my choosing: deep learning for computer vision and language, reinforcement learning, and theory. I originally intended to pursue a statistics PhD but my experience here spurred me to enroll in the Stanford CS program starting this fall!”
- Melody Guan, 2016 Google Brain Residency Alumna

This month marks the end of an incredibly successful year for our first class of the Google Brain Residency Program. This one-year program was created as an opportunity for individuals from diverse educational backgrounds and experiences to dive into research in machine learning and deep learning. Over the past year, the Residents familiarized themselves with the literature, designed and implemented experiments at Google scale, and engaged in cutting edge research in a wide variety of subjects ranging from theory to robotics to music generation.

To date, the inaugural class of Residents has published over 30 papers at leading machine learning publication venues such as ICLR (15), ICML (11), CVPR (3), EMNLP (2), RSS, GECCO, ISMIR, ISMB and Cosyne. An additional 18 papers are currently under review at NIPS, ICCV, BMVC and Nature Methods. Two of the above papers were published in Distill, exploring how deconvolution causes checkerboard artifacts and presenting ways of visualizing a generative model of handwriting.
A Distill article by residents interactively explores how a neural network generates handwriting.
A system that explores how robots can learn to imitate human motion from observation. For more details, see “Time-Contrastive Networks: Self-Supervised Learning from Multi-View Observation” (Co-authored by Resident Corey Lynch, along with P. Sermanet, J. Hsu, S. Levine, accepted to CVPR Workshop 2017)
A model that uses reinforcement learning to train distributed deep learning networks at large scale by optimizing the assignment of computations to hardware devices. For more details, see “Device Placement Optimization with Reinforcement Learning” (Co-authored by Residents Azalia Mirhoseini and Hieu Pham, along with Q. Le, B. Steiner, R. Larsen, Y. Zhou, N. Kumar, M. Norouzi, S. Bengio, J. Dean, submitted to ICML 2017).
An approach to automate the process of discovering optimization methods, with a focus on deep learning architectures. Final version of the paper “Neural Optimizer Search with Reinforcement Learning” (Co-authored by Residents Irwan Bello and Barret Zoph, along with V. Vasudevan, Q. Le, submitted to ICML 2017) coming soon.
Residents have also made significant contributions to the open source community with general-purpose sequence-to-sequence models (used for example in translation), music synthesis, mimicking human sketching, subsampling a sequence for model training, an efficient “attention” mechanism for models, and time series analysis (particularly for neuroscience).

The end of the program year marks our Residents embarking on the next stages in their careers. Many are continuing their research careers on the Google Brain team as full-time employees. Others have chosen to enter top machine learning Ph.D. programs at schools such as Stanford University, UC Berkeley, Cornell University, Oxford University, NYU, the University of Toronto and CMU. We could not be more proud to see where their hard work and experiences will take them next!

As we “graduate” our first class, this week we welcome our next class of 35 incredibly talented Residents, who join us from a wide range of experiences and educational backgrounds. We can’t wait to see how they will build on the successes of our first class and continue to push the team in new and exciting directions. We look forward to another exciting year of research and innovation ahead of us!

Applications to the 2018 Residency program will open in September 2017. To learn more about the program, visit g.co/brainresidency.

PAIR: the People + AI Research Initiative

The past few years have seen rapid advances in machine learning, with dramatic improvements in technical performance—from more accurate speech recognition, to better image search, to improved translations. But we believe AI can go much further—and be more useful to all of us—if we build systems with people in mind at the start of the process.

Today we’re announcing the People + AI Research initiative (PAIR), which brings together researchers across Google to study and redesign the ways people interact with AI systems. The goal of PAIR is to focus on the "human side" of AI: the relationship between users and technology, the new applications it enables, and how to make it broadly inclusive. The goal isn’t just to publish research; we’re also releasing open source tools for researchers and other experts to use.

PAIR's research is divided into three areas, based on different user needs:

  • Engineers and researchers: AI is built by people. How might we make it easier for engineers to build and understand machine learning systems? What educational materials and practical tools do they need?

  • Domain experts: How can AI aid and augment professionals in their work? How might we support doctors, technicians, designers, farmers, and musicians as they increasingly use AI?

  • Everyday users: How might we ensure machine learning is inclusive, so everyone can benefit from breakthroughs in AI? Can design thinking open up entirely new AI applications? Can we democratize the technology behind AI?

We don't have all the answers—that's what makes this interesting research—but we have some ideas about where to look. One key to the puzzle is design thinking. Instead of viewing AI purely as a technology, what if we imagine it as a material to design with? Here history might serve as a guide: For instance, advances in computer graphics meant more than better ways of drawing pictures—and that led to completely new kinds of interfaces and applications. You can read more in this post on what we call human-centered machine learning (HCML). We’re open sourcing new tools, creating educational materials (such as guidelines for designing AI interfaces), and publishing research to answer these questions and spread the power of AI to as many people as possible.

Open-source tools

Today we're open sourcing two visualization tools, Facets Overview and Facets Dive. These applications are aimed at AI engineers, and address the very beginning of the machine learning process. The Facets applications give engineers a clear view of the data they use to train AI systems.

We think this is important because training data is a key ingredient in modern AI systems, but it can often be a source of opacity and confusion. Indeed, one of the ways that ML engineering seems different than traditional software engineering is a stronger need to debug not just code, but data too. With Facets, engineers can more easily debug and understand what they’re building. You can read full details at our open source repository.

Supporting external research

We also acknowledge that we're not the first to see this opportunity or ask these questions. Many designers and academics have started exploring human/AI interaction. Their work inspires us; we see community-building and research support as an essential part of our mission. We’re working with a pair of visiting academics—Prof. Brendan Meade of Harvard and Prof. Hal Abelson of MIT—who are focusing on education and science in the age of AI.

Focusing on the human element in AI brings new possibilities into view. We're excited to work together to invent and explore what's possible.

Supercharge your Computer Vision models with the TensorFlow Object Detection API

Cross-posted on the Google Research Blog

At Google, we develop flexible state-of-the-art machine learning (ML) systems for computer vision that not only can be used to improve our products and services, but also spur progress in the research community. Creating accurate ML models capable of localizing and identifying multiple objects in a single image remains a core challenge in the field, and we invest a significant amount of time training and experimenting with these systems.
Detected objects in a sample image (from the COCO dataset) made by one of our models.
Image credit: Michael Miley, original image
Last October, our in-house object detection system achieved new state-of-the-art results, and placed first in the COCO detection challenge. Since then, this system has generated results for a number of research publications [1, 2, 3, 4, 5, 6, 7] and has been put to work in Google products such as NestCam, the similar items and style ideas feature in Image Search and street number and name detection in Street View.

Today we are happy to make this system available to the broader research community via the TensorFlow Object Detection API. This codebase is an open source framework built on top of TensorFlow that makes it easy to construct, train and deploy object detection models. Our goal in designing this system was to support state-of-the-art models while allowing for rapid exploration and research. Our first release contains the following:
The SSD models that use MobileNet are lightweight, so that they can be comfortably run in real time on mobile devices. Our winning COCO submission in 2016 used an ensemble of the Faster RCNN models, which are more computationally intensive but significantly more accurate. For more details on the performance of these models, see our CVPR 2017 paper.
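As a rough illustration of the inference workflow (the frozen-graph path is hypothetical, and the tensor names follow the conventions used by the API's exporter and tutorial notebook at the time, so they may differ across versions):

```python
import numpy as np
import tensorflow as tf
from PIL import Image

# Hypothetical path: a frozen graph exported from the detection model zoo.
FROZEN_GRAPH = "ssd_mobilenet_v1_coco/frozen_inference_graph.pb"

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.compat.v1.import_graph_def(graph_def, name="")

# A single image as a uint8 batch of shape [1, height, width, 3].
image = np.expand_dims(np.array(Image.open("my_photo.jpg")), axis=0)

with tf.compat.v1.Session(graph=graph) as sess:
    boxes, scores, classes, num = sess.run(
        ["detection_boxes:0", "detection_scores:0",
         "detection_classes:0", "num_detections:0"],
        feed_dict={"image_tensor:0": image})

print("highest-scoring detection:", classes[0, 0], scores[0, 0])
```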

Are you ready to get started?
We’ve certainly found this code to be useful for our computer vision needs, and we hope that you will as well. Contributions to the codebase are welcome, and please stay tuned for our own further updates to the framework. To get started, download the code here and try detecting objects in some of your own images using the Jupyter notebook, or training your own pet detector on Cloud ML Engine!

By Jonathan Huang, Research Scientist and Vivek Rathod, Software Engineer

Acknowledgements
The release of the TensorFlow Object Detection API and the pre-trained model zoo has been the result of widespread collaboration among Google researchers, with feedback and testing from product groups. In particular, we want to highlight the contributions of the following individuals:

Core Contributors: Derek Chow, Chen Sun, Menglong Zhu, Matthew Tang, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Jasper Uijlings, Viacheslav Kovalevskyi, Kevin Murphy

Also special thanks to: Andrew Howard, Rahul Sukthankar, Vittorio Ferrari, Tom Duerig, Chuck Rosenberg, Hartwig Adam, Jing Jing Long, Victor Gomes, George Papandreou, Tyler Zhu

References
  1. Speed/accuracy trade-offs for modern convolutional object detectors, Huang et al., CVPR 2017 (paper describing this framework)
  2. Towards Accurate Multi-person Pose Estimation in the Wild, Papandreou et al., CVPR 2017
  3. YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video, Real et al., CVPR 2017 (see also our blog post)
  4. Beyond Skip Connections: Top-Down Modulation for Object Detection, Shrivastava et al., arXiv preprint arXiv:1612.06851, 2016
  5. Spatially Adaptive Computation Time for Residual Networks, Figurnov et al., CVPR 2017
  6. AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions, Gu et al., arXiv preprint arXiv:1705.08421, 2017
  7. MobileNets: Efficient convolutional neural networks for mobile vision applications, Howard et al., arXiv preprint arXiv:1704.04861, 2017

How Google Cloud is transforming Japanese businesses

This week, we welcomed 13,000 executives, developers, IT managers and partners to our largest Asia-Pacific Cloud event, Google Cloud Next Tokyo. During this event, we celebrated the many ways that Japanese companies such as Kewpie, Sony (and even cucumber farmers) have transformed and scaled their businesses using Google Cloud. 

Since the launch of the Google Cloud Tokyo region last November, roughly 40 percent of Google Compute Engine core hour usage in Tokyo is from customers new to Google Cloud Platform (GCP). The number of new customers using Compute Engine has increased by an average of 21 percent monthly over the last three months, and the total number of paid customers in Japan has increased by 70 percent over the last year.

By supplying compliance statements and documents for FISC — an important Japanese compliance standard — for both GCP and G Suite, we’re making it easier to do business with Google Cloud in Japan.

Here are a few of the exciting announcements that came out of Next Tokyo:

Retailers embracing enterprise innovation  

One of the biggest retailers in Japan, FamilyMart, will work with Google’s Professional Services Organization to transform the way it works, reform its store operations, and build a retail model for the next generation. FamilyMart is using G Suite to facilitate a collaborative culture and transform its business to embrace an ever-changing landscape. Furthermore, it plans to use big data analysis and machine learning to develop new ways of managing store operations. The project, dubbed “Famima 10x,” kicks off by introducing G Suite to facilitate a more flexible work style and encourage a more collaborative, innovative culture.

Modernizing food production with cloud computing, data analytics and machine learning

Kewpie, a major food manufacturer in Japan famous for its mayonnaise, takes high standards of food production seriously. For its baby food, it used to depend on human eyes to evaluate 4-5 tons of food materials daily, per factory, to root out bad potato cubes — a labor-intensive task that required intense focus on the production line. But over the course of six months, Kewpie tested Cloud Machine Learning Engine and TensorFlow to help identify the bad cubes. The results of the tests were so successful that Kewpie adopted the technology.
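For flavor only, and not Kewpie’s actual pipeline: a defect-screening prototype of this kind can start as a small binary image classifier in TensorFlow/Keras, with the directory layout, image size and architecture below being assumptions.

```python
import tensorflow as tf

# Illustrative only, not Kewpie's production system. Directory layout
# ("good"/"defective" subfolders), image size and architecture are assumptions.
# Requires a recent TensorFlow (image_dataset_from_directory: TF 2.4+).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "potato_cubes/train", label_mode="binary",
    image_size=(96, 96), batch_size=32)

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 255),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # P(cube is defective)
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```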

Empowering employees to conduct effective data analysis

Sony Network Communications Inc. is a division of Sony Group that develops and operates cloud services and applications for Sony group companies. It converted from Hive/Hadoop to BigQuery and established a data analysis platform based on BigQuery, called Private Data Management Platform. This not only reduces data preparation and maintenance costs, but also allows a wide range of employees — from data scientists to those who are only familiar with SQL — to conduct effective data analysis, which in turn made its data-driven business more productive than before.
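As an illustration of the SQL-only workflow this enables (project, dataset and table names are hypothetical), querying BigQuery from Python with the official client library looks roughly like this:

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()  # uses Application Default Credentials

# Hypothetical project, dataset and table names.
query = """
    SELECT user_segment, COUNT(*) AS events
    FROM `my-project.analytics.app_events`
    WHERE event_date >= '2017-01-01'
    GROUP BY user_segment
    ORDER BY events DESC
"""
for row in client.query(query).result():  # runs the job, waits, iterates rows
    print(row.user_segment, row.events)
```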

Collaborating with partners

During Next Tokyo, we announced five new Japanese partners that will help Google Cloud better serve customers.

  • NTT Communications Corporation is a respected Japanese cloud solution provider and new Google Cloud partner that helps enterprises worldwide optimize their information and communications technology environments. GCP will connect with NTT Communications’ Enterprise Cloud, and NTT Communications plans to develop new services utilizing Google Cloud’s big data analysis and machine intelligence solutions. NTT Communications will use both G Suite and GCP to run its own business and will use its experiences to help both Japanese and international enterprises.

  • KDDI is already a key partner for G Suite and Chrome devices and will offer GCP to the Japanese market this summer, in addition to an expanded networking partnership.

  • Softbank has been a G Suite partner since 2011 and will expand the collaboration with Google Cloud to include solutions utilizing GCP in its offerings. As part of the collaboration, Softbank plans to link GCP with its own “White Cloud” service in addition to promoting next-generation workplaces with G Suite.

  • SORACOM, which uses cellular and LoRaWAN networks to provide connectivity for IoT devices, announced two new integrations with GCP. SORACOM Beam, its data transfer support service, now supports Google Cloud IoT Core, and SORACOM Funnel, its cloud resource adapter service, enables constrained devices to send messages to Google Cloud Pub/Sub. This means that a small, battery-powered sensor can keep sending data to GCP by LoRaWAN for months, for example. A minimal Cloud Pub/Sub publish call on the receiving side is sketched after this list.
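For context on the receiving side of an integration like SORACOM Funnel’s (project and topic names are hypothetical, and this shows the standard Cloud Pub/Sub client library rather than anything SORACOM-specific):

```python
from google.cloud import pubsub_v1  # pip install google-cloud-pubsub

# Hypothetical project/topic names for the ingestion side of such a pipeline.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-iot-project", "sensor-readings")

# Publish one small sensor reading; Pub/Sub message payloads are bytes.
payload = b'{"device_id": "sensor-42", "temperature_c": 21.7}'
future = publisher.publish(topic_path, payload, source="soracom-funnel")
print("published message id:", future.result())
```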

Create Cloud Spanner instances in Tokyo

Cloud Spanner is the world’s first horizontally scalable, strongly consistent relational database service. It became generally available in May, delivering long-term value for our customers with mission-critical applications in the cloud, including customer authentication systems, business-transaction and inventory-management systems, and high-volume media systems that require low latency and high throughput. Starting today, customers can store data and create Spanner instances directly in our Tokyo region.

Jamboard coming to Japan in 2018

At Next Tokyo, businesses discussed how they can use technology to improve productivity, and make it easier for employees to work together. Jamboard, a digital whiteboard designed specifically for the cloud, allows employees to sketch their ideas whiteboard-style on a brilliant 4k display, and drop images, add notes and pull things directly from the web while they collaborate with team members from anywhere. This week, we announced that Jamboard will be generally available in Japan in 2018.

Why Japanese companies are choosing Google Cloud

For Kewpie, Sony and FamilyMart, Google’s track record building secure infrastructure all over the world was an important consideration for their move to Google Cloud. From energy-efficient data centers to custom servers to custom networking gear to a software-defined global backbone to specialized ASICs for machine learning, Google has been living cloud at scale for more than 15 years—and we bring all of it to bear in Google Cloud.

We hope to see many of you as we go on the road to meet with customers and partners, and encourage you to learn more about upcoming Google Cloud events.
