Tag Archives: TensorFlow

MobileNetV2: The Next Generation of On-Device Computer Vision Networks



Last year we introduced MobileNetV1, a family of general purpose computer vision neural networks designed with mobile devices in mind to support classification, detection and more. The ability to run deep networks on personal mobile devices improves user experience, offering anytime, anywhere access, with additional benefits for security, privacy, and energy consumption. As new applications emerge allowing users to interact with the real world in real time, so does the need for ever more efficient neural networks.

Today, we are pleased to announce the availability of MobileNetV2 to power the next generation of mobile vision applications. MobileNetV2 is a significant improvement over MobileNetV1 and pushes the state of the art for mobile visual recognition, including classification, object detection and semantic segmentation. MobileNetV2 is released as part of the TensorFlow-Slim Image Classification Library, or you can start exploring MobileNetV2 right away in Colaboratory. Alternatively, you can download the notebook and explore it locally using Jupyter. MobileNetV2 is also available as modules on TF-Hub, and pretrained checkpoints can be found on GitHub.
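As a rough illustration of how the TF-Hub modules can be used, here is a minimal sketch; the module handle and input size are assumptions, so check the published TF-Hub listings for the exact handles.

import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

# Hypothetical module handle; see TF-Hub for the published MobileNetV2 modules.
module = hub.Module("https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/classification/1")

# 224x224 classification modules expect float images with values in [0, 1].
images = tf.placeholder(tf.float32, shape=(None, 224, 224, 3))
logits = module(images)  # ImageNet class scores

with tf.Session() as sess:
    sess.run([tf.global_variables_initializer(), tf.tables_initializer()])
    scores = sess.run(logits, feed_dict={images: np.zeros((1, 224, 224, 3), np.float32)})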

MobileNetV2 builds upon the ideas from MobileNetV1 [1], using depthwise separable convolutions as efficient building blocks. However, V2 introduces two new features to the architecture: 1) linear bottlenecks between the layers, and 2) shortcut connections between the bottlenecks¹. The basic structure is shown below.
Overview of MobileNetV2 Architecture. Blue blocks represent composite convolutional building blocks as shown above.
The intuition is that the bottlenecks encode the model’s intermediate inputs and outputs, while the inner layer encapsulates the model’s ability to transform from lower-level concepts such as pixels to higher-level descriptors such as image categories. Finally, as with traditional residual connections, shortcuts enable faster training and better accuracy. You can learn more about the technical details in our paper, “MobileNetV2: Inverted Residuals and Linear Bottlenecks”.
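To make the block structure concrete, here is a minimal, illustrative sketch of an inverted residual block written with tf.keras layers; it is not the released TF-Slim implementation, and the expansion factor and ReLU6 placement simply follow the description above.

import tensorflow as tf

def inverted_residual_block(x, expansion, filters, stride=1):
    # Expand with a 1x1 convolution, filter with a depthwise 3x3, then project back down.
    in_channels = int(x.shape[-1])
    h = tf.keras.layers.Conv2D(expansion * in_channels, 1, padding='same', use_bias=False)(x)
    h = tf.keras.layers.BatchNormalization()(h)
    h = tf.keras.layers.ReLU(6.0)(h)
    h = tf.keras.layers.DepthwiseConv2D(3, strides=stride, padding='same', use_bias=False)(h)
    h = tf.keras.layers.BatchNormalization()(h)
    h = tf.keras.layers.ReLU(6.0)(h)
    # Linear bottleneck: no activation after the projection.
    h = tf.keras.layers.Conv2D(filters, 1, padding='same', use_bias=False)(h)
    h = tf.keras.layers.BatchNormalization()(h)
    # Shortcut connects the bottlenecks directly when shapes match.
    if stride == 1 and in_channels == filters:
        h = tf.keras.layers.Add()([x, h])
    return h

# Example usage: stack two blocks after a stem convolution.
inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.layers.Conv2D(32, 3, strides=2, padding='same')(inputs)
x = inverted_residual_block(x, expansion=6, filters=32)             # shortcut applies
x = inverted_residual_block(x, expansion=6, filters=64, stride=2)   # downsampling block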

How does it compare to the first generation of MobileNets?
Overall, the MobileNetV2 models are faster for the same accuracy across the entire latency spectrum. In particular, the new models use 2x fewer operations, need 30% fewer parameters and are about 30-40% faster on a Google Pixel phone than MobileNetV1 models, all while achieving higher accuracy.
MobileNetV2 improves speed (reduces latency) and increases ImageNet Top-1 accuracy
MobileNetV2 is a very effective feature extractor for object detection and segmentation. For example, for detection, when paired with the newly introduced SSDLite [2], the new model is about 35% faster than MobileNetV1 at the same accuracy. We have open sourced the model under the TensorFlow Object Detection API [4].

Model                   Params   Multiply-Adds   mAP     Mobile CPU
MobileNetV1 + SSDLite   5.1M     1.3B            22.2%   270ms
MobileNetV2 + SSDLite   4.3M     0.8B            22.1%   200ms

To enable on-device semantic segmentation, we employ MobileNetV2 as a feature extractor in a reduced form of DeepLabv3 [3], which was announced recently. On the semantic segmentation benchmark PASCAL VOC 2012, our resulting model attains performance similar to employing MobileNetV1 as the feature extractor, but requires 5.3 times fewer parameters and 5.2 times fewer operations in terms of Multiply-Adds.

Model                     Params   Multiply-Adds   mIOU
MobileNetV1 + DeepLabV3   11.15M   14.25B          75.29%
MobileNetV2 + DeepLabV3   2.11M    2.75B           75.32%

As we have seen, MobileNetV2 provides a very efficient mobile-oriented model that can be used as a base for many visual recognition tasks. We hope that by sharing it with the broader academic and open-source community we can help to advance research and application development.

Acknowledgements:
We would like to acknowledge our core contributors Menglong Zhu, Andrey Zhmoginov and Liang-Chieh Chen. We also give special thanks to Bo Chen, Dmitry Kalenichenko, Skirmantas Kligys, Mathew Tang, Weijun Wang, Benoit Jacob, George Papandreou, Zhichao Lu, Vivek Rathod, Jonathan Huang, Yukun Zhu, and Hartwig Adam.

References
  1. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications, Howard AG, Zhu M, Chen B, Kalenichenko D, Wang W, Weyand T, Andreetto M, Adam H, arXiv:1704.04861, 2017.
  2. MobileNetV2: Inverted Residuals and Linear Bottlenecks, Sandler M, Howard A, Zhu M, Zhmoginov A, Chen LC. arXiv preprint. arXiv:1801.04381, 2018.
  3. Rethinking Atrous Convolution for Semantic Image Segmentation, Chen LC, Papandreou G, Schroff F, Adam H. arXiv:1706.05587, 2017.
  4. Speed/accuracy trade-offs for modern convolutional object detectors, Huang J, Rathod V, Sun C, Zhu M, Korattikara A, Fathi A, Fischer I, Wojna Z, Song Y, Guadarrama S, Murphy K, CVPR 2017.
  5. Deep Residual Learning for Image Recognition, He K, Zhang X, Ren S, Sun J, arXiv:1512.03385, 2015.


1 The shortcut (also known as skip) connections, popularized by ResNets [5], are commonly used to connect the non-bottleneck layers. MobileNetV2 inverts this notion and connects the bottlenecks directly.

Source: Google AI Blog


Announcing TensorRT integration with TensorFlow 1.7

Posted by Laurence Moroney (Google) and Siddarth Sharma (NVIDIA)

Today we are announcing integration of NVIDIA® TensorRT™ and TensorFlow. TensorRT is a library that optimizes deep learning models for inference and creates a runtime for deployment on GPUs in production environments. It brings a number of FP16 and INT8 optimizations to TensorFlow and automatically selects platform-specific kernels to maximize throughput and minimize latency during inference on GPUs. We are excited about the new integrated workflow as it simplifies the path to use TensorRT from within TensorFlow with world-class performance. In our tests, we found that ResNet-50 performed 8x faster at under 7 ms latency with the TensorFlow-TensorRT integration using NVIDIA Volta Tensor Cores, compared with running TensorFlow only.

Sub-Graph Optimizations within TensorFlow

Now in TensorFlow 1.7, TensorRT optimizes compatible sub-graphs and lets TensorFlow execute the rest. This approach makes it possible to rapidly develop models with the extensive TensorFlow feature set while getting powerful optimizations from TensorRT when performing inference. If you were already using TensorRT with TensorFlow models, you know that certain unsupported TensorFlow layers had to be imported manually, which in some cases could be time consuming.

From a workflow perspective, you need to ask TensorRT to optimize TensorFlow's sub-graphs and replace each subgraph with a TensorRT optimized node. The output of this step is a frozen graph that can then be used in TensorFlow as before.

During inference, TensorFlow executes the graph for all supported areas, and calls TensorRT to execute the TensorRT-optimized nodes. As an example, if your graph has three segments, A, B and C, and segment B is optimized by TensorRT and replaced by a single node, then during inference TensorFlow executes A, calls TensorRT to execute B, and then TensorFlow executes C.

The newly added TensorFlow API for TensorRT optimization takes the frozen TensorFlow graph, applies optimizations to sub-graphs, and sends back to TensorFlow a TensorRT inference graph with the optimizations applied. See the code below as an example.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Reserve memory for the TensorRT inference engine: TensorFlow keeps this fraction
# of GPU memory, and the remainder is available to TensorRT.
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=number_between_0_and_1)
...
# Replace compatible sub-graphs with TensorRT-optimized nodes.
trt_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=output_node_name,
    max_batch_size=batch_size,
    max_workspace_size_bytes=workspace_size,
    precision_mode=precision)  # Get optimized graph

The per_process_gpu_memory_fraction parameter defines the fraction of GPU memory that TensorFlow is allowed to use, with the remainder available for TensorRT. This parameter should be set the first time the TensorFlow-TensorRT process is started. As an example, a value of 0.67 would allocate 67% of GPU memory for TensorFlow and the remaining 33% for TensorRT engines.

The create_inference_graph function takes a frozen TensorFlow graph and returns an optimized graph with TensorRT nodes. Let's look at the function's parameters:

  • input_graph_def: frozen TensorFlow graph
  • outputs: list of strings with names of output nodes e.g. ["resnet_v1_50/predictions/Reshape_1"]
  • max_batch_size: integer, size of input batch e.g. 16
  • max_workspace_size_bytes: integer, maximum GPU memory size available for TensorRT
  • precision_mode: string, allowed values "FP32", "FP16" or "INT8"

As an example, if the GPU has 12GB of memory, then to allocate ~4GB for TensorRT engines, set the per_process_gpu_memory_fraction parameter to (12 - 4) / 12 ≈ 0.67 and the max_workspace_size_bytes parameter to 4000000000.
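The same split can be expressed in a couple of lines of Python; the values below are illustrative, not a recommendation.

# Illustrative memory split for a 12GB GPU with a ~4GB TensorRT workspace.
total_gpu_memory_gb = 12
tensorrt_workspace_gb = 4

per_process_gpu_memory_fraction = (total_gpu_memory_gb - tensorrt_workspace_gb) / float(total_gpu_memory_gb)
max_workspace_size_bytes = tensorrt_workspace_gb * 1000 ** 3

print(per_process_gpu_memory_fraction)  # ~0.67
print(max_workspace_size_bytes)         # 4000000000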

Let's apply the new API to ResNet-50 and see what the optimized model looks like in TensorBoard. The complete code to run the example is available in . The image on the left is ResNet-50 without TensorRT optimizations and the image on the right is after the optimizations. In this case, most of the graph gets optimized by TensorRT and replaced by a single node (highlighted).

Optimized INT8 Inference performance

TensorRT provides capabilities to take models trained in single (FP32) and half (FP16) precision and convert them for deployment with INT8 quantizations at reduced precision with minimal accuracy loss. INT8 models compute faster and place lower requirements on bandwidth but present a challenge in representing weights and activations of neural networks because of the reduced dynamic range available.

        Dynamic Range               Minimum Positive Value
FP32    -3.4×10³⁸ ~ +3.4×10³⁸       1.4×10⁻⁴⁵
FP16    -65504 ~ +65504             5.96×10⁻⁸
INT8    -128 ~ +127                 1

To address this, TensorRT uses a calibration process that minimizes the information loss when approximating the FP32 network with a limited 8-bit integer representation. With the new integration, after optimizing the TensorFlow graph with TensorRT, you can pass the graph to TensorRT for calibration as below.

trt_graph = trt.calib_graph_to_infer_graph(calibGraph)

The rest of the inference workflow remains unchanged from above. The output of this step is a frozen graph that is executed by TensorFlow as described earlier.
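Putting the two steps together, a minimal sketch of the INT8 path might look like the following; it assumes frozen_graph_def, output_node_name, batch_size, workspace_size and a calibration_batches iterator already exist, and the input tensor name is a placeholder for your model's real input.

import tensorflow as tf
import tensorflow.contrib.tensorrt as trt

# Step 1: build a calibration graph at INT8 precision.
calib_graph = trt.create_inference_graph(
    input_graph_def=frozen_graph_def,
    outputs=[output_node_name],
    max_batch_size=batch_size,
    max_workspace_size_bytes=workspace_size,
    precision_mode="INT8")

# Step 2: run representative data through the calibration graph to collect range statistics.
with tf.Graph().as_default():
    output = tf.import_graph_def(calib_graph, return_elements=[output_node_name + ":0"])
    with tf.Session() as sess:
        input_tensor = sess.graph.get_tensor_by_name("import/input:0")  # placeholder name
        for batch in calibration_batches:
            sess.run(output, feed_dict={input_tensor: batch})

# Step 3: convert the calibrated graph into the final INT8 inference graph.
trt_graph = trt.calib_graph_to_infer_graph(calib_graph)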

Automatically use Tensor Cores on NVIDIA Volta GPUs

TensorRT runs half-precision TensorFlow models on Tensor Cores in Volta GPUs for inference. Tensor Cores provide 8x more throughput than single-precision math pipelines. Half-precision (also known as FP16) data reduces the memory usage of a neural network compared with higher-precision FP32 or FP64 data. This allows training and deployment of larger networks, and FP16 data transfers take less time than FP32 or FP64 transfers.

Each Tensor Core performs D = A x B + C, where A, B, C and D are matrices. A and B are half-precision 4x4 matrices, whereas D and C can be either half or single precision 4x4 matrices. The peak performance of Tensor Cores on the V100 is about an order of magnitude (10x) faster than double precision (FP64) and about 4 times faster than single precision (FP32).

Availability

We are excited about this release and will continue to work closely with NVIDIA to enhance this integration. We expect the new solution to deliver the highest performance possible while maintaining the ease and flexibility of TensorFlow. And as TensorRT supports more networks, you will automatically benefit from the updates without any changes to your code.

To get the new solution, you can use the standard pip install process once TensorFlow 1.7 is released:

pip install tensorflow-gpu==1.7.0

Until then, find detailed installation instructions here: https://github.com/tensorflow/tensorflow/tree/r1.7/tensorflow/contrib/tensorrt

Try it out and let us know what you think!

Using Deep Learning to Facilitate Scientific Image Analysis



Many scientific imaging applications, especially microscopy, can produce terabytes of data per day. These applications can benefit from recent advances in computer vision and deep learning. In our work with biologists on robotic microscopy applications (e.g., to distinguish cellular phenotypes) we've learned that assembling high quality image datasets that separate signal from noise is a difficult but important task. We've also learned that there are many scientists who may not write code, but who are still excited to utilize deep learning in their image analysis work. A particular challenge we can help address involves dealing with out-of-focus images. Even with the autofocus systems on state-of-the-art microscopes, poor configuration or hardware incompatibility may result in image quality issues. Having an automated way to rate focus quality can enable the detection, troubleshooting and removal of such images.

Deep Learning to the Rescue
In “Assessing Microscope Image Focus Quality with Deep Learning”, we trained a deep neural network to rate the focus quality of microscopy images with higher accuracy than previous methods. We also integrated the pre-trained TensorFlow model with plugins in Fiji (ImageJ) and CellProfiler, two leading open source scientific image analysis tools that can be used with either a graphical user interface or invoked via scripts.
A pre-trained TensorFlow model rates focus quality for a montage of microscope image patches of cells in Fiji (ImageJ). Hue and lightness of the borders denote predicted focus quality and prediction uncertainty, respectively.
Our publication and source code (TensorFlow, Fiji, CellProfiler) illustrate the basics of a machine learning project workflow: assembling a training dataset (we synthetically defocused 384 in-focus images of cells, avoiding the need for a hand-labeled dataset), training a model using data augmentation, evaluating generalization (in our case, on unseen cell types acquired by an additional microscope) and deploying the pre-trained model. Previous tools for identifying image focus quality often require a user to manually review images for each dataset to determine a threshold between in-focus and out-of-focus images; our pre-trained model requires no user-set parameters, and rates focus quality more accurately as well. To help improve interpretability, our model evaluates focus quality on 84×84 pixel patches which can be visualized with colored patch borders.
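For a sense of how such a training set could be assembled, here is a simplified sketch; the paper's synthetic defocus is based on microscope optics and uses its own augmentation, so the Gaussian blur and defocus levels below are only stand-ins.

import numpy as np
from scipy.ndimage import gaussian_filter

def synthetic_defocus(image, sigma):
    # Approximate defocus by blurring an in-focus image (a stand-in for an optics-based model).
    return gaussian_filter(image, sigma=sigma)

def extract_patches(image, patch_size=84):
    # Split an image into non-overlapping patch_size x patch_size patches.
    h, w = image.shape
    patches = [image[y:y + patch_size, x:x + patch_size]
               for y in range(0, h - patch_size + 1, patch_size)
               for x in range(0, w - patch_size + 1, patch_size)]
    return np.stack(patches)

# Each synthetic defocus level becomes a class label for the focus-quality classifier.
in_focus = np.random.rand(336, 336).astype(np.float32)   # placeholder for a microscope image
dataset = [(extract_patches(synthetic_defocus(in_focus, sigma)), level)
           for level, sigma in enumerate([0.0, 1.0, 2.0, 4.0])]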

What about Images without Objects?
An interesting challenge we overcame was that there are often "blank" image patches with no objects, a scenario where no notion of focus quality exists. Instead of explicitly labeling these "blank" patches and teaching our model to recognize them as a separate category, we configured our model to predict a probability distribution across defocus levels, allowing it to learn to express uncertainty (dim borders in the figure) for these empty patches (e.g. predict equal probability in/out-of-focus).
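As a rough illustration of that idea (not the exact quantities visualized in the figure), a per-patch probability distribution over defocus levels can be summarized as a predicted level plus an uncertainty measure such as entropy:

import numpy as np

def focus_prediction(probabilities):
    # Summarize a distribution over defocus levels as (expected level, entropy).
    levels = np.arange(len(probabilities))
    expected_level = float(np.sum(levels * probabilities))
    entropy = float(-np.sum(probabilities * np.log(probabilities + 1e-12)))
    return expected_level, entropy

# A confident in-focus patch vs. a "blank" patch with near-uniform probabilities.
print(focus_prediction(np.array([0.90, 0.05, 0.03, 0.02])))   # low level, low entropy
print(focus_prediction(np.array([0.25, 0.25, 0.25, 0.25])))   # high entropy = high uncertainty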

What's Next?
Deep learning-based approaches for scientific image analysis will improve accuracy, reduce manual parameter tuning and may reveal new insights. Clearly, the sharing and availability of datasets and models, and implementation into tools that are proven to be useful within respective communities, will be important for widespread adoption.

Acknowledgements
We thank Claire McQuin, Allen Goodman, Anne Carpenter of the Broad Institute and Kevin Eliceiri of the University of Wisconsin at Madison for assistance with CellProfiler and Fiji integration, respectively.

Source: Google AI Blog



Semantic Image Segmentation with DeepLab in TensorFlow

Cross-posted on the Google Research Blog.

Semantic image segmentation, the task of assigning a semantic label, such as “road”, “sky”, “person”, “dog”, to every pixel in an image enables numerous new applications, such as the synthetic shallow depth-of-field effect shipped in the portrait mode of the Pixel 2 and Pixel 2 XL smartphones and mobile real-time video segmentation. Assigning these semantic labels requires pinpointing the outline of objects, and thus imposes much stricter localization accuracy requirements than other visual entity recognition tasks such as image-level classification or bounding box-level detection.


Today, we are excited to announce the open source release of our latest and best performing semantic image segmentation model, DeepLab-v3+ [1]*, implemented in TensorFlow. This release includes DeepLab-v3+ models built on top of a powerful convolutional neural network (CNN) backbone architecture [2, 3] for the most accurate results, intended for server-side deployment. As part of this release, we are additionally sharing our TensorFlow model training and evaluation code, as well as models already pre-trained on the Pascal VOC 2012 and Cityscapes benchmark semantic segmentation tasks.

Since the first incarnation of our DeepLab model [4] three years ago, improved CNN feature extractors, better object scale modeling, careful assimilation of contextual information, improved training procedures, and increasingly powerful hardware and software have led to improvements with DeepLab-v2 [5] and DeepLab-v3 [6]. With DeepLab-v3+, we extend DeepLab-v3 by adding a simple yet effective decoder module to refine the segmentation results especially along object boundaries. We further apply the depthwise separable convolution to both atrous spatial pyramid pooling [5, 6] and decoder modules, resulting in a faster and stronger encoder-decoder network for semantic segmentation.
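To make the building blocks concrete, here is a simplified sketch of an atrous depthwise separable convolution and an ASPP-style module written with tf.keras layers; it is illustrative rather than the released implementation, and it omits details such as the image-level pooling branch of the full ASPP.

import tensorflow as tf

def atrous_separable_conv(x, filters, rate):
    # 3x3 depthwise separable convolution with an atrous (dilation) rate.
    h = tf.keras.layers.SeparableConv2D(filters, 3, dilation_rate=rate,
                                        padding='same', use_bias=False)(x)
    h = tf.keras.layers.BatchNormalization()(h)
    return tf.keras.layers.ReLU()(h)

def aspp(x, filters=256, rates=(6, 12, 18)):
    # Simplified atrous spatial pyramid pooling: parallel branches at several rates.
    branches = [tf.keras.layers.Conv2D(filters, 1, padding='same', use_bias=False)(x)]
    branches += [atrous_separable_conv(x, filters, r) for r in rates]
    h = tf.keras.layers.Concatenate()(branches)
    return tf.keras.layers.Conv2D(filters, 1, padding='same')(h)

# Example: apply ASPP to backbone features before the decoder refines boundaries.
features = tf.keras.Input(shape=(33, 33, 2048))   # hypothetical backbone output
context = aspp(features)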


Modern semantic image segmentation systems built on top of convolutional neural networks (CNNs) have reached accuracy levels that were hard to imagine even five years ago, thanks to advances in methods, hardware, and datasets. We hope that publicly sharing our system with the community will make it easier for other groups in academia and industry to reproduce and further improve upon state-of-art systems, train models on new datasets, and envision new applications for this technology.

By Liang-Chieh Chen and Yukun Zhu, Google Research

Acknowledgements
We would like to acknowledge the support and valuable discussions with Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (co-authors of DeepLab-v1 and -v2), as well as Mark Sandler, Andrew Howard, Menglong Zhu, Chen Sun, Derek Chow, Andre Araujo, Haozhi Qi, Jifeng Dai, and the Google Mobile Vision team.

References
  1. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation, Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, and Hartwig Adam, arXiv:1802.02611, 2018.
  2. Xception: Deep Learning with Depthwise Separable Convolutions, François Chollet, Proc. of CVPR, 2017.
  3. Deformable Convolutional Networks — COCO Detection and Segmentation Challenge 2017 Entry, Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, and Jifeng Dai, ICCV COCO Challenge Workshop, 2017.
  4. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs, Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, Proc. of ICLR, 2015.
  5. Deeplab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, and Alan L. Yuille, TPAMI, 2017.
  6. Rethinking Atrous Convolution for Semantic Image Segmentation, Liang-Chieh Chen, George Papandreou, Florian Schroff, and Hartwig Adam, arXiv:1706.05587, 2017.


* DeepLab-v3+ is not used to power Pixel 2's portrait mode or real time video segmentation. These are mentioned in the post as examples of features this type of technology can enable.

Source: Google AI Blog


Open Sourcing the Hunt for Exoplanets



Recently, we discovered two exoplanets by training a neural network to analyze data from NASA’s Kepler space telescope and accurately identify the most promising planet signals. And while this was only an initial analysis of ~700 stars, we consider this a successful proof-of-concept for using machine learning to discover exoplanets, and more generally another example of using machine learning to make meaningful gains in a variety of scientific disciplines (e.g. healthcare, quantum chemistry, and fusion research).

Today, we’re excited to release our code for processing the Kepler data, training our neural network model, and making predictions about new candidate signals. We hope this release will prove a useful starting point for developing similar models for other NASA missions, like K2 (Kepler’s second mission) and the upcoming Transiting Exoplanet Survey Satellite mission. As well as announcing the release of our code, we’d also like to take this opportunity to dig a bit deeper into how our model works.

A Planet Hunting Primer

First, let’s consider how data collected by the Kepler telescope is used to detect the presence of a planet. The plot below is called a light curve, and it shows the brightness of the star (as measured by Kepler’s photometer) over time. When a planet passes in front of the star, it temporarily blocks some of the light, which causes the measured brightness to decrease and then increase again shortly thereafter, causing a “U-shaped” dip in the light curve.
A light curve from the Kepler space telescope with a “U-shaped” dip that indicates a transiting exoplanet.
However, other astronomical and instrumental phenomena can also cause the measured brightness of a star to decrease, including binary star systems, starspots, cosmic ray hits on Kepler’s photometer, and instrumental noise.
The first light curve has a “V-shaped” pattern that tells us that a very large object (i.e. another star) passed in front of the star that Kepler was observing. The second light curve contains two places where the brightness decreases, which indicates a binary system with one bright and one dim star: the larger dip is caused by the dimmer star passing in front of the brighter star, and vice versa. The third light curve is one example of the many other non-planet signals where the measured brightness of a star appears to decrease.
To search for planets in Kepler data, scientists use automated software (e.g. the Kepler data processing pipeline) to detect signals that might be caused by planets, and then manually follow up to decide whether each signal is a planet or a false positive. To avoid being overwhelmed with more signals than they can manage, the scientists apply a cutoff to the automated detections: those with signal-to-noise ratios above a fixed threshold are deemed worthy of follow-up analysis, while all detections below the threshold are discarded. Even with this cutoff, the number of detections is still formidable: to date, over 30,000 detected Kepler signals have been manually examined, and about 2,500 of those have been validated as actual planets!

Perhaps you’re wondering: does the signal-to-noise cutoff cause some real planet signals to be missed? The answer is, yes! However, if astronomers need to manually follow up on every detection, it’s not really worthwhile to lower the threshold, because as the threshold decreases the rate of false positive detections increases rapidly and actual planet detections become increasingly rare. However, there’s a tantalizing incentive: it’s possible that some potentially habitable planets like Earth, which are relatively small and orbit around relatively dim stars, might be hiding just below the traditional detection threshold — there might be hidden gems still undiscovered in the Kepler data!

A Machine Learning Approach

The Google Brain team applies machine learning to a diverse variety of data, from human genomes to sketches to formal mathematical logic. Considering the massive amount of data collected by the Kepler telescope, we wondered what we might find if we used machine learning to analyze some of the previously unexplored Kepler data. To find out, we teamed up with Andrew Vanderburg at UT Austin and developed a neural network to help search the low signal-to-noise detections for planets.
We trained a convolutional neural network (CNN) to predict the probability that a given Kepler signal is caused by a planet. We chose a CNN because they have been very successful in other problems with spatial and/or temporal structure, like audio generation and image classification.
Luckily, we had 30,000 Kepler signals that had already been manually examined and classified by humans. We used a subset of around 15,000 of these signals, of which around 3,500 were verified planets or strong planet candidates, to train our neural network to distinguish planets from false positives. The inputs to our network are two separate views of the same light curve: a wide view that allows the model to examine signals elsewhere on the light curve (e.g., a secondary signal caused by a binary star), and a zoomed-in view that enables the model to closely examine the shape of the detected signal (e.g., to distinguish “U-shaped” signals from “V-shaped” signals).
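To make the two views concrete, here is a simplified sketch of how a detected signal can be phase-folded and binned into a wide (global) view and a zoomed-in (local) view; the bin counts, window widths, and the synthetic light curve are illustrative, and the released code performs the exact preprocessing.

import numpy as np

def phase_fold(time, flux, period, t0):
    # Fold the light curve on the detected period, with phase 0 at the reference epoch t0.
    phase = np.mod(time - t0 + 0.5 * period, period) - 0.5 * period
    order = np.argsort(phase)
    return phase[order], flux[order]

def median_bin(phase, flux, num_bins, width):
    # Median-bin the folded curve over the window [-width/2, width/2].
    edges = np.linspace(-width / 2, width / 2, num_bins + 1)
    view = np.empty(num_bins)
    for i in range(num_bins):
        mask = (phase >= edges[i]) & (phase < edges[i + 1])
        view[i] = np.median(flux[mask]) if mask.any() else np.median(flux)
    return view

# Synthetic stand-in for a Kepler light curve: 90 days at ~30-minute cadence with a
# shallow box-shaped transit (period 14.44912 days, duration 0.11267 days, epoch 2.2).
rng = np.random.default_rng(0)
time = np.arange(0.0, 90.0, 0.0204)
flux = 1.0 + 1e-4 * rng.standard_normal(time.size)
flux[np.mod(time - 2.2, 14.44912) < 0.11267] -= 5e-4

phase, folded_flux = phase_fold(time, flux, period=14.44912, t0=2.2)
global_view = median_bin(phase, folded_flux, num_bins=201, width=14.44912)    # whole orbit
local_view = median_bin(phase, folded_flux, num_bins=61, width=4 * 0.11267)   # zoom on the transit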

Once we had trained our model, we investigated the features it learned about light curves to see if they matched with our expectations. One technique we used (originally suggested in this paper) was to systematically occlude small regions of the input light curves to see whether the model’s output changed. Regions that are particularly important to the model’s decision will change the output prediction if they are occluded, but occluding unimportant regions will not have a significant effect. Below is a light curve from a binary star that our model correctly predicts is not a planet. The points highlighted in green are the points that most change the model’s output prediction when occluded, and they correspond exactly to the secondary “dip” indicative of a binary system. When those points are occluded, the model’s output prediction changes from ~0% probability of being a planet to ~40% probability of being a planet. So, those points are part of the reason the model rejects this light curve, but the model uses other evidence as well - for example, zooming in on the centred primary dip shows that it's actually “V-shaped”, which is also indicative of a binary system.
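A minimal sketch of that occlusion analysis, using a toy stand-in for the trained model, might look like this:

import numpy as np

def occlusion_importance(model_fn, light_curve, window=5):
    # Occlude successive windows and record how much the model's prediction changes.
    baseline = model_fn(light_curve)
    importance = np.zeros_like(light_curve)
    neutral = np.median(light_curve)
    for start in range(len(light_curve) - window + 1):
        occluded = light_curve.copy()
        occluded[start:start + window] = neutral
        delta = abs(model_fn(occluded) - baseline)
        importance[start:start + window] = np.maximum(importance[start:start + window], delta)
    return importance

# Toy example: a "model" that scores curves by their deepest dip, and a curve with one dip.
toy_curve = np.ones(200)
toy_curve[90:110] -= 0.01
toy_model = lambda lc: float(lc.min())
scores = occlusion_importance(toy_model, toy_curve)   # largest where occlusion hides the dip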

Searching for New Planets

Once we were confident with our model’s predictions, we tested its effectiveness by searching for new planets in a small set of 670 stars. We chose these stars because they were already known to have multiple orbiting planets, and we believed that some of these stars might host additional planets that had not yet been detected. Importantly, we allowed our search to include signals that were below the signal-to-noise threshold that astronomers had previously considered. As expected, our neural network rejected most of these signals as spurious detections, but a handful of promising candidates rose to the top, including our two newly discovered planets: Kepler-90 i and Kepler-80 g.

Find your own Planet(s)!

Let’s take a look at how the code released today can help (re-)discover the planet Kepler-90 i. The first step is to train a model by following the instructions on the code’s home page. It takes a while to download and process the data from the Kepler telescope, but once that’s done, it’s relatively fast to train a model and make predictions about new signals. One way to find new signals to show the model is to use an algorithm called Box Least Squares (BLS), which searches for periodic “box shaped” dips in brightness (see below). The BLS algorithm will detect “U-shaped” planet signals, “V-shaped” binary star signals and many other types of false positive signals to show the model. There are various freely available software implementations of the BLS algorithm, including VARTOOLS and LcTools. Alternatively, you can even look for candidate planet transits by eye, like the Planet Hunters.
A low signal-to-noise detection in the light curve of the Kepler 90 star detected by the BLS algorithm. The detection has period 14.44912 days, duration 2.70408 hours (0.11267 days) beginning 2.2 days after 12:00 on 1/1/2009 (the year the Kepler telescope launched).
To run this detected signal through our trained model, we simply execute the following command:
python predict.py --kepler_id=11442793 --period=14.44912 --t0=2.2 \
  --duration=0.11267 --kepler_data_dir=$HOME/astronet/kepler \
  --output_image_file=$HOME/astronet/kepler-90i.png \
  --model_dir=$HOME/astronet/model
The output of the command is prediction = 0.94, which means the model is 94% certain that this signal is a real planet. Of course, this is only a small step in the overall process of discovering and validating an exoplanet: the model’s prediction is not proof one way or the other. The process of validating this signal as a real exoplanet requires significant follow-up work by an expert astronomer — see Sections 6.3 and 6.4 of our paper for the full details. In this particular case, our follow-up analysis validated this signal as a bona fide exoplanet, and it’s now called Kepler-90 i!
Our work here is far from done. We’ve only searched 670 stars out of 200,000 observed by Kepler — who knows what we might find when we turn our technique to the entire dataset. Before we do that, though, we have a few improvements we want to make to our model. As we discussed in our paper, our model is not yet as good at rejecting binary stars and instrumental false positives as some more mature computer heuristics. We’re hard at work improving our model, and now that it’s open sourced, we hope others will do the same!


By Chris Shallue, Senior Software Engineer, Google Brain Team

If you’d like to learn more, Chris is featured on the latest episode of This Week in Machine Learning & AI discussing his work.


Machine Learning Crash Course

Posted by Barry Rosenberg, Google Engineering Education Team

Today, we're happy to share our Machine Learning Crash Course (MLCC) with the world. MLCC is one of the most popular courses created for Google engineers. Our engineering education team has delivered this course to more than 18,000 Googlers, and now you can take it too! The course develops intuition around fundamental machine learning concepts.

What does the course cover?

MLCC covers many machine learning fundamentals, starting with loss and gradient descent, then building through classification models and neural nets. The programming exercises introduce TensorFlow. You'll watch brief videos from Google machine learning experts, read short text lessons, and play with educational gadgets devised by instructional designers and engineers.
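To give a flavour of the kind of exercise involved, here is a hedged sketch (not taken from the course) of a few lines of TensorFlow fitting a line with gradient descent:

import numpy as np
import tensorflow as tf

# Tiny linear-regression exercise: learn y = 3x + 2 from noisy samples.
x_data = np.random.rand(100).astype(np.float32)
y_data = 3.0 * x_data + 2.0 + 0.05 * np.random.randn(100).astype(np.float32)

w = tf.Variable(0.0)
b = tf.Variable(0.0)
loss = tf.reduce_mean(tf.square(w * x_data + b - y_data))   # mean squared error
train_op = tf.train.GradientDescentOptimizer(learning_rate=0.5).minimize(loss)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for step in range(200):
        sess.run(train_op)
    print(sess.run([w, b]))   # should approach [3.0, 2.0]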

How much does it cost?

MLCC is free.

I don't get it. Why are you offering MLCC to everyone?

We believe that the potential of machine learning is so vast that every technical person should learn machine learning fundamentals. We're offering the course in English, Spanish, Korean, Mandarin, and French.

Does the real world make an appearance in the course?

Yes, MLCC ends with short lessons on designing real-world machine learning systems. MLCC also contains sections enabling you to learn from the mistakes that our experts have made.

Do I have enough mathematical background to understand MLCC?

Understanding a little algebra and a little elementary statistics (mean and standard deviation) is helpful. If you understand calculus, you'll get a bit more out of the course, but calculus is not a requirement. MLCC contains a helpful section to refresh your memory on the background math.

Is this a programming course?

MLCC contains some Python programming exercises. However, those exercises comprise only a small percentage of the course, which non-programmers may safely skip.

I'm new to Python. Will the programming exercises be too hard for me?

Many of the Google engineers who took MLCC didn't know any Python but still completed the exercises. That's because you'll write only a few lines of code during the programming exercises. Instead of writing code from scratch, you'll primarily manipulate the values of existing variables. That said, the code will be easier to understand if you can program in Python.

But how will I learn machine learning concepts without programming?

MLCC relies on a variety of media and hands-on interactive tools to build intuition in fundamental machine learning concepts. You need a technical mind, but you don't need programming skills.

How can I show off my machine learning skills?

As your knowledge about Machine Learning grows, you can test your skill by helping others. We're also kicking off a Kaggle competition to help DonorsChoose.org. DonorsChoose.org is an organization that empowers public school teachers from across the country to request materials and experiences they need to help their students grow. Teachers submit hundreds of thousands of project proposals each year; 500,000 proposals are expected in 2018.

Currently, DonorsChoose.org relies on a large number of volunteers to screen the proposals. The Kaggle competition hopes to help DonorsChoose.org use ML to accelerate the screening process, which will enable volunteers to make better use of their time. In addition, this work should help increase the consistency of decisions about projects.

Is MLCC Google's only machine learning educational project?

MLCC is merely one of many ways to learn about machine learning. To explore the universe of machine learning educational opportunities from Google, see our new Learn with Google AI program at g.co/learnwithgoogleai. To start on MLCC, see g.co/machinelearningcrashcourse.