Tag Archives: machine learning

Understanding Transfer Learning for Medical Imaging



As deep neural networks are applied to an increasingly diverse set of domains, transfer learning has emerged as a highly popular technique in developing deep learning models. In transfer learning, the neural network is trained in two stages: 1) pretraining, where the network is generally trained on a large-scale benchmark dataset representing a wide diversity of labels/categories (e.g., ImageNet); and 2) fine-tuning, where the pretrained network is further trained on the specific target task of interest, which may have fewer labeled examples than the pretraining dataset. The pretraining step helps the network learn general features that can be reused on the target task.

This kind of two-stage paradigm has become extremely popular in many settings, and particularly so in medical imaging. In the context of transfer learning, standard architectures designed for ImageNet with corresponding pretrained weights are fine-tuned on medical tasks ranging from interpreting chest x-rays and identifying eye diseases, to early detection of Alzheimer’s disease. Despite its widespread use, however, the precise effects of transfer learning are not yet well understood. While recent work challenges many common assumptions, including the effects on performance improvement, contribution of the underlying architecture and impact of pretraining dataset type and size, these results are all in the natural image setting, and leave many questions open for specialized domains, such as medical images.

In our NeurIPS 2019 paper, “Transfusion: Understanding Transfer Learning for Medical Imaging,” we investigate these central questions for transfer learning in medical imaging tasks. Through both a detailed performance evaluation and analysis of neural network hidden representations, we uncover many surprising conclusions, such as the limited benefits of transfer learning for performance on the tested medical imaging tasks, a detailed characterization of how representations evolve through the training process across different models and hidden layers, and feature-independent benefits of transfer learning for convergence speed.

Performance Evaluation
We first performed a thorough study on the effect of transfer learning on model performance. We compared models trained from random initialization and applied directly on tasks to those pretrained on ImageNet that leverage transfer learning for the same tasks. We looked at two large scale medical imaging tasks — diagnosing diabetic retinopathy from fundus photographs and identifying five different diseases from chest x-rays. We evaluated various neural network architectures including both standard architectures popularly used for medical imaging (ResNet50, Inception-v3) as well as a family of simple, lightweight convolutional neural networks that consist of four or five layers of the standard convolution-batchnorm-ReLU progression, or CBRs.
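
To make the CBR family concrete, a four-block convolution-batchnorm-ReLU network can be sketched in Keras as follows; the filter counts, kernel size, pooling, and the multi-label output head are illustrative assumptions rather than the exact configurations evaluated in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_cbr(input_shape=(224, 224, 3), num_classes=5, filters=(32, 64, 128, 256)):
    """A small CBR-style network: repeated Conv -> BatchNorm -> ReLU blocks.

    Filter counts, kernel size, and pooling are illustrative assumptions,
    not the exact configuration from the paper.
    """
    inputs = tf.keras.Input(shape=input_shape)
    x = inputs
    for f in filters:
        x = layers.Conv2D(f, kernel_size=3, padding="same", use_bias=False)(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
        x = layers.MaxPooling2D(pool_size=2)(x)
    x = layers.GlobalAveragePooling2D()(x)
    # Sigmoid head for a multi-label task such as the five chest x-ray findings.
    outputs = layers.Dense(num_classes, activation="sigmoid")(x)
    return tf.keras.Model(inputs, outputs)

model = build_cbr()
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
```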

The results from evaluating all of these models on the different tasks with and without transfer learning give us four main takeaways:
  • Surprisingly, transfer learning does not significantly affect performance on medical imaging tasks, with models trained from scratch performing nearly as well as standard ImageNet transferred models.
  • On the medical imaging tasks, the much smaller CBR models perform at a level comparable to the standard ImageNet architectures.
  • As the CBR models are much smaller and shallower than the standard ImageNet models, they perform much worse on ImageNet classification, highlighting that ImageNet performance is not indicative of performance on medical tasks.
  • The two medical tasks are much smaller in size than ImageNet (~200k vs ~1.2m training images), but in the very small data regime, there may only be a few thousand training examples. We evaluated transfer learning in this very small data regime, finding that while there was a larger gap in performance between transfer and training from scratch for large models (ResNet), this was not true for smaller models (CBRs), suggesting that the large models designed for ImageNet might be too overparameterized for the very small data regime.
Representation Analysis
We next study the degree to which transfer learning affects the kinds of features and representations learned by the neural networks. Given the similar performance, does transfer learning result in different representations from random initialization? Is knowledge from the pretraining step reused, and if so, where? To find answers to these questions, this study analyzes and compares the hidden representations (i.e., representations learned in the latent layers of the network) in the different neural networks trained to solve these tasks. This quantitative analysis can be challenging, due to the complexity and lack of alignment in different hidden layers. But a recent method, singular vector canonical correlation analysis (SVCCA; code and tutorials), based on canonical correlation analysis (CCA), helps overcome these challenges, and can be used to calculate a similarity score between a pair of hidden representations.
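
As a rough illustration of the underlying computation (a simplification of the released SVCCA code, with an arbitrary truncation threshold), the CCA similarity between two sets of activations recorded on the same inputs can be computed as follows.

```python
import numpy as np

def svcca_similarity(acts1, acts2, keep_dims=20):
    """Rough SVCCA-style similarity between two activation matrices.

    acts1, acts2: arrays of shape (num_datapoints, num_neurons) for the same inputs.
    keep_dims: number of top singular directions to keep (an arbitrary choice here).
    Returns the mean canonical correlation in [0, 1].
    """
    def reduce(acts):
        acts = acts - acts.mean(axis=0, keepdims=True)      # center each neuron
        u, s, _ = np.linalg.svd(acts, full_matrices=False)   # SVD over datapoints
        return u[:, :keep_dims] * s[:keep_dims]              # keep top directions

    x, y = reduce(acts1), reduce(acts2)
    qx, _ = np.linalg.qr(x)                                  # orthonormal bases
    qy, _ = np.linalg.qr(y)
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)       # canonical correlations
    return float(np.clip(corrs, 0, 1).mean())

# Example with synthetic "representations": a matrix vs. a noisy copy of itself.
a = np.random.randn(1000, 256)
b = a + 0.1 * np.random.randn(1000, 256)
print(svcca_similarity(a, b))                      # close to 1.0
print(svcca_similarity(a, np.random.randn(1000, 256)))  # much lower
```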

Similarity scores are computed for some of the hidden representations from the top latent layers of the networks (closer to the output) between networks trained from random initialization and networks trained from pretrained ImageNet weights. As a baseline, we also compute similarity scores of representations learned from different random initializations. For large models, representations learned from random initialization are much more similar to each other than those learned from transfer learning. For smaller models, there is greater overlap between representation similarity scores.
Representation similarity scores between networks trained from random initialization and networks trained from pretrained ImageNet weights (orange), and baseline similarity scores of representations trained from two different random initializations (blue). Higher values indicate greater similarity. For larger models, representations learned from random initialization are much more similar to each other than those learned through transfer. This is not the case for smaller models.
The reason for this difference between large and small models becomes clear with further investigation into the hidden representations. Large models change less through training, even from random initialization. We perform multiple experiments that illustrate this, from simple filter visualizations to tracking changes between different layers through fine-tuning.

When we combine the results of all the experiments from the paper, we can assemble a table summarizing how much representations change through training on the medical task across (i) transfer learning, (ii) model size and (iii) lower/higher layers.
Effects on Convergence: Feature-Independent Benefits and Hybrid Approaches
One consistent effect of transfer learning was a significant speedup in the time taken for the model to converge. But having seen the mixed results for feature reuse from our representational study, we looked into whether there were other properties of the pretrained weights that might contribute to this speedup. Surprisingly, we found a feature-independent benefit of pretraining: the weight scaling.

We initialized the weights of the neural network as independent and identically distributed (iid), just like random initialization, but using the mean and variance of the pretrained weights. We called this initialization the Mean Var Init, which keeps the pretrained weight scaling but destroys all the features. This Mean Var Init offered significant speedups over random initialization across model architectures and tasks, suggesting that the pretraining process of transfer learning also helps with good weight conditioning.
Filter visualization of weights initialized according to pretrained ImageNet weights, Random Init, and Mean Var Init. Only the ImageNet Init filters have pretrained (Gabor-like) structure, as Rand Init and Mean Var weights are iid.
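
A minimal sketch of the Mean Var Init, assuming two Keras models with identical architectures; drawing from a normal distribution is an assumption here, since only the first two moments of each pretrained weight tensor are matched.

```python
import numpy as np
import tensorflow as tf

def mean_var_init(model, pretrained_model, seed=0):
    """Re-initialize `model` i.i.d. using per-tensor mean/std of `pretrained_model`.

    Assumes the two models have identical architectures (matching weight shapes).
    The normal distribution is an assumption; only the first two moments are matched.
    """
    rng = np.random.default_rng(seed)
    new_weights = []
    for w_pre in pretrained_model.get_weights():
        mu, sigma = w_pre.mean(), w_pre.std()
        new_weights.append(rng.normal(mu, sigma, size=w_pre.shape).astype(w_pre.dtype))
    model.set_weights(new_weights)

# Usage sketch: keep the ImageNet weight scaling, destroy the ImageNet features.
pretrained = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)
scratch = tf.keras.applications.ResNet50(weights=None, include_top=False)
mean_var_init(scratch, pretrained)
```
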
Recall that our earlier experiments suggested that feature reuse primarily occurs in the lowest layers. To understand this, we performed weight transfusion experiments, where only a subset of the pretrained weights (corresponding to a contiguous set of layers) are transferred, with the remainder of weights being randomly initialized. Comparing convergence speeds of these transfused networks with full transfer learning further supports the conclusion that feature reuse is primarily happening in the lowest layers.
Learning curves comparing the convergence speed with AUC on the test set. Using only the scaling of the pretrained weights (Mean Var Init) helps with convergence speed. The figures compare the standard transfer learning and the Mean Var initialization scheme to training from random initialization.
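
A weight transfusion experiment of this kind can be sketched in Keras as follows; the layer-matching approach and the cut-off layer name are assumptions for illustration, not the paper's exact procedure.

```python
import tensorflow as tf

def transfuse_lowest_layers(target, pretrained, last_layer_name):
    """Copy pretrained weights into `target` layer by layer, stopping after
    `last_layer_name`; layers above the cut keep their random initialization.

    Assumes both models share the same architecture and layer ordering.
    """
    for layer, pre_layer in zip(target.layers, pretrained.layers):
        layer.set_weights(pre_layer.get_weights())
        if layer.name == last_layer_name:
            break

pretrained = tf.keras.applications.ResNet50(weights="imagenet", include_top=False)
target = tf.keras.applications.ResNet50(weights=None, include_top=False)
# Transfuse only the lowest layers; "conv2_block3_out" (the end of ResNet50's
# second stage in Keras) is used here purely as an example cut point.
transfuse_lowest_layers(target, pretrained, last_layer_name="conv2_block3_out")
```
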
This suggests hybrid approaches to transfer learning, where instead of reusing the full neural network architecture, we can recycle its lowest layers and redesign the upper layers to better suit the target task. This gives us most of the benefits of transfer learning while further enabling flexible model design. In the Figure below, we show the effect of reusing pretrained weights up to Block2 in Resnet50, halving the remainder of the channels, initializing those layers randomly, and then training end-to-end. This matches the performance and convergence of full transfer learning.
Hybrid approaches to transfer learning on Resnet50 (left) and CBR models (right) — reusing a subset of the weights and slimming the remainder of the network (Slim), and using mathematically synthesized Gabors for conv1 (Synthetic Gabor).
The figure above also shows the results of an extreme version of this partial reuse, transferring only the very first convolutional layer with mathematically synthesized Gabor filters (pictured below). Using just these (synthetic) weights offers significant speedups, and hints at many other creative hybrid approaches.
Synthetic Gabor filters used to initialize the first layer of neural networks in some of the experiments in this paper. The Gabor filters are generated as grayscale images and repeated across the RGB channels. Left: Low frequencies. Right: High frequencies.
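
For illustration, Gabor filters like these can be synthesized with a few lines of NumPy; the filter size, wavelengths, and orientations below are placeholders rather than the parameters used in the paper.

```python
import numpy as np

def gabor_filter(size=7, wavelength=3.0, theta=0.0, sigma=2.0, psi=0.0):
    """Return a size x size Gabor filter: a 2D Gaussian envelope times a sinusoid."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float32)
    x_t = x * np.cos(theta) + y * np.sin(theta)      # rotate coordinates by theta
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t ** 2 + y_t ** 2) / (2 * sigma ** 2))
    carrier = np.cos(2 * np.pi * x_t / wavelength + psi)
    g = envelope * carrier
    return g / np.abs(g).max()

# A small bank of first-layer filters at several orientations, repeated over RGB
# channels (placeholder parameters; the exact layout depends on the framework).
bank = np.stack([gabor_filter(theta=t) for t in np.linspace(0, np.pi, 8, endpoint=False)])
conv1_init = np.repeat(bank[:, :, :, None], 3, axis=-1)  # shape (8, 7, 7, 3)
```
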
Conclusion and Open Questions
Transfer learning is a central technique for many domains. In this paper we provide insights on some of its fundamental properties in the medical imaging context, studying performance, feature reuse, the effect of different architectures, convergence and hybrid approaches. Many interesting open questions remain: How much of the original task has the model forgotten? Why do large models change less? Can we get further gains matching higher order moments of pretrained weight statistics? Are the results similar for other tasks, such as segmentation? We look forward to tackling these questions in future work!

Acknowledgements
Special thanks to Samy Bengio and Jon Kleinberg, who are co-authors on this work. Thanks also to Geoffrey Hinton for helpful feedback.

Source: Google AI Blog


Developing Deep Learning Models for Chest X-rays with Adjudicated Image Labels



With millions of diagnostic examinations performed annually, chest X-rays are an important and accessible clinical imaging tool for the detection of many diseases. However, their usefulness can be limited by challenges in interpretation, which requires rapid and thorough evaluation of a two-dimensional image depicting complex, three-dimensional organs and disease processes. Indeed, early-stage lung cancers or pneumothoraces (collapsed lungs) can be missed on chest X-rays, leading to serious adverse outcomes for patients.

Advances in machine learning (ML) present an exciting opportunity to create new tools to help experts interpret medical images. Recent efforts have shown promise in improving lung cancer detection in radiology, prostate cancer grading in pathology, and differential diagnoses in dermatology. For chest X-ray images in particular, large, de-identified public image sets are available to researchers across disciplines, and have facilitated several valuable efforts to develop deep learning models for X-ray interpretation. However, obtaining accurate clinical labels for the very large image sets needed for deep learning can be difficult. Most efforts have either applied rule-based natural language processing (NLP) to radiology reports or relied on image review by individual readers, both of which may introduce inconsistencies or errors that can be especially problematic during model evaluation. Another challenge involves assembling datasets that represent an adequately diverse spectrum of cases (i.e., ensuring inclusion of both “hard” cases and “easy” cases that represent the full spectrum of disease presentation). Finally, some chest X-ray findings are non-specific and depend on clinical information about the patient to fully understand their significance. As such, establishing labels that are clinically meaningful and have consistent definitions can be a challenging component of developing machine learning models that use only the image as input. Without standardized and clinically meaningful datasets as well as rigorous reference standard methods, successful application of ML to interpretation of chest X-rays will be hindered.

To help address these issues, we recently published “Chest Radiograph Interpretation with Deep Learning Models: Assessment with Radiologist-adjudicated Reference Standards and Population-adjusted Evaluation” in the journal Radiology. In this study we developed deep learning models to classify four clinically important findings on chest X-rays — pneumothorax, nodules and masses, fractures, and airspace opacities. These target findings were selected in consultation with radiologists and clinical colleagues, so as to focus on conditions that are both critical for patient care and for which chest X-ray images alone are an important and accessible first-line imaging study. Selection of these findings also allowed model evaluation using only de-identified images without additional clinical data.

Models were evaluated using thousands of held-out images from each dataset for which we collected high-quality labels using a panel-based adjudication process among board-certified radiologists. Four separate radiologists also independently reviewed the held-out images in order to compare radiologist accuracy to that of the deep learning models (using the panel-based image labels as the reference standard). For all four findings and across both datasets, the deep learning models demonstrated radiologist-level performance. We are sharing the adjudicated labels for the publicly available data here to facilitate additional research.

Data Overview
This work leveraged over 600,000 images sourced from two de-identified datasets. The first dataset was developed in collaboration with co-authors at the Apollo Hospitals, and consists of a diverse set of chest X-rays obtained over several years from multiple locations across the Apollo Hospitals network. The second dataset is the publicly available ChestX-ray14 image set released by the National Institutes of Health (NIH). This second dataset has served as an important resource for many machine learning efforts, yet has limitations stemming from issues with the accuracy and clinical interpretation of the currently available labels.
Chest X-ray depicting an upper left lobe pneumothorax identified by the model and the adjudication panel, but missed by the individual radiologist readers. Left: The original image. Right: The same image with the most important regions for the model prediction highlighted in orange.
Training Set Labels Using Deep Learning and Visual Image Review
For very large datasets consisting of hundreds of thousands of images, such as those needed to train highly accurate deep learning models, it is impractical to manually assign image labels. As such, we developed a separate, text-based deep learning model to extract image labels using the de-identified radiology reports associated with each X-ray. This NLP model was then applied to provide labels for over 560,000 images from the Apollo Hospitals dataset used for training the computer vision models.
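
As a very rough sketch of the general idea (not the actual architecture, vocabulary, labels, or report corpus used in the study), a multi-label text classifier over report text could look like this in Keras.

```python
import tensorflow as tf
from tensorflow.keras import layers

FINDINGS = ["pneumothorax", "nodule_or_mass", "fracture", "airspace_opacity"]

# Toy, hypothetical training data: (de-identified report text, multi-hot finding labels).
reports = ["no acute cardiopulmonary abnormality.",
           "small right apical pneumothorax is present."]
labels = [[0, 0, 0, 0],
          [1, 0, 0, 0]]

vectorizer = layers.TextVectorization(max_tokens=20000, output_sequence_length=256)
vectorizer.adapt(reports)

model = tf.keras.Sequential([
    vectorizer,                                     # raw report text -> token ids
    layers.Embedding(input_dim=20000, output_dim=64),
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(len(FINDINGS), activation="sigmoid"),  # one probability per finding
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
model.fit(tf.constant(reports), tf.constant(labels, dtype=tf.float32), epochs=1)
```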

To reduce noise from any errors introduced by the text-based label extraction and also to provide the relevant labels for a substantial number of the ChestX-ray14 images, approximately 37,000 images across the two datasets were visually reviewed by radiologists. These were separate from the NLP-based labels and helped to ensure high quality labels across such a large, diverse set of training images.

Creating and Sharing Improved Reference Standard Labels
To generate high-quality reference standard labels for model evaluation, we utilized a panel-based adjudication process, whereby three radiologists reviewed all final tune and test set images and resolved disagreements through discussion. This often allowed difficult findings that were initially only detected by a single radiologist to be identified and documented appropriately. To reduce the risk of bias based on any individual radiologist’s personality or seniority, the discussions took place anonymously via an online discussion and adjudication system.

Because the lack of available adjudicated labels was a significant initial barrier to our work, we are sharing with the research community all of the adjudicated labels for the publicly available ChestX-ray14 dataset, including 2,412 training/validation set images and 1,962 test set images (4,374 images in total). We hope that these labels will facilitate future machine learning efforts and enable better apples-to-apples comparisons between machine learning models for chest X-ray interpretation.

Future Outlook
This work presents several contributions: (1) releasing adjudicated labels for images from a publicly available dataset; (2) a method to scale accurate labeling of training data using a text-based deep learning model; (3) evaluation using a diverse set of images with expert-adjudicated reference standard labels; and ultimately (4) radiologist-level performance of deep learning models for clinically important findings on chest X-rays.

However, with regard to model performance, achieving expert-level accuracy on average is just a part of the story. Even though overall accuracy for the deep learning models was consistently similar to that of radiologists for any given finding, performance for both varied across datasets. For example, the sensitivity for detecting pneumothorax among radiologists was approximately 79% for the ChestX-ray14 images, but was only 52% for the same radiologists on the other dataset, suggesting a more difficult collection of cases in the latter. This highlights the importance of validating deep learning tools on multiple, diverse datasets and eventually across the patient populations and clinical settings in which any model is intended to be used.

The performance differences between datasets also emphasize the need for standardized evaluation image sets with accurate reference standards in order to allow comparison across studies. For example, if two different models for the same finding were evaluated using different datasets, comparing performance would be of minimal value without knowing additional details such as the case mix, model error modes, or radiologist performance on the same cases.

Finally, the model often identified findings that were consistently missed by radiologists, and vice versa. As such, strategies that combine the unique “skills” of both the deep learning systems and human experts are likely to hold the most promise for realizing the potential of AI applications in medical image interpretation.

Acknowledgements
Key contributors to this project at Google include Sid Mittal, Gavin Duggan, Anna Majkowska, Scott McKinney, Andrew Sellergren, David Steiner, Krish Eswaran, Po-Hsuan Cameron Chen, Yun Liu, Shravya Shetty, and Daniel Tse. Significant contributions and input were also made by radiologist collaborators Joshua Reicher, Alexander Ding, and Sreenivasa Raju Kalidindi. The authors would also like to acknowledge many members of the Google Health radiology team including Jonny Wong, Diego Ardila, Zvika Ben-Haim, Rory Sayres, Shahar Jamshy, Shabir Adeel, Mikhail Fomitchev, Akinori Mitani, Quang Duong, William Chen and Sahar Kazemzadeh. Sincere appreciation also goes to the many radiologists who enabled this work through their expert image interpretation efforts throughout the project.

Source: Google AI Blog


#AndroidDevChallenge: today is the last day to apply!

Dev Challenge banner with Android logo

Today is the last day to apply for the Android Developer Challenge! And to spark your imagination, we wanted to take a look at one of the original Android Developer Challenge winners, from over 10 years ago. Meet Maurizio Leo:

Maurizio and team have been working on Android for a while now. In fact, he was one of the winners of the original Android Developer Challenge, which launched with the start of Android over ten years ago. Their app, which won 3rd place worldwide at the time, has gone on to be downloaded over 30 million times!

If you’ve got a great idea that can help users get things done, we want to hear from you! We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition, in part from this amazing set of experts we’ve assembled. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. You can read more about all of the prizes here.

There’s still time to submit your idea before the deadline today! Submitting your idea is as simple as creating a repository on GitHub, telling us what you’d build and how we can help (we’ve included all of the materials here), and then officially submitting your repository here. Ideas can range from an early concept to something that’s already complete; we can’t wait to hear what you come up with, and to work with you on bringing helpful innovation powered by machine learning to more and more users!

Updates from Coral: Mendel Linux 4.0 and much more!

Posted by Carlos Mendonça (Product Manager), Coral Team

Illustration of the Coral Dev Board placed next to Fall foliage

Last month, we announced that Coral graduated out of beta, into a wider, global release. Today, we're announcing the next version of Mendel Linux (4.0 release Day) for the Coral Dev Board and SoM, as well as a number of other exciting updates.

We have made significant updates to improve performance and stability. Mendel Linux 4.0 release Day is based on Debian 10 Buster and includes upgraded GStreamer pipelines and support for Python 3.7, OpenCV, and OpenCL. The Linux kernel has also been updated to version 4.14 and U-Boot to version 2017.03.3.

We’ve also made it possible to use the Dev Board's GPU to convert YUV to RGB pixel data at up to 130 frames per second at 1080p resolution, which is one to two orders of magnitude faster than on Mendel Linux 3.0 release Chef. These changes make it possible to run inferences with YUV-producing sources such as cameras and hardware video decoders.

To upgrade your Dev Board or SoM, follow our guide to flash a new system image.

MediaPipe on Coral

MediaPipe is an open-source, cross-platform framework for building multi-modal machine learning perception pipelines that can process streaming data like video and audio. For example, you can use MediaPipe to run on-device machine learning models and process video from a camera to detect, track and visualize hand landmarks in real-time.

Developers and researchers can prototype their real-time perception use cases starting with the creation of the MediaPipe graph on desktop. Then they can quickly convert and deploy that same graph to the Coral Dev Board, where the quantized TensorFlow Lite model will be accelerated by the Edge TPU.

As part of this first release, MediaPipe is making available new experimental samples for both object and face detection, with support for the Coral Dev Board and SoM. The source code and instructions for compiling and running each sample are available on GitHub and on the MediaPipe documentation site.

New Teachable Sorter project tutorial


A new Teachable Sorter tutorial is now available. The Teachable Sorter is a physical sorting machine that combines the Coral USB Accelerator's ability to perform very low latency inference with an ML model that can be trained to rapidly recognize and sort different objects as they fall through the air. It leverages Google’s new Teachable Machine 2.0, a web application that makes it easy for anyone to quickly train a model in a fun, hands-on way.

The tutorial walks through how to build the free-fall sorter, which separates marshmallows from cereal and can be trained using Teachable Machine.

Coral is now on TensorFlow Hub

Earlier this month, the TensorFlow team announced a new version of TensorFlow Hub, a central repository of pre-trained models. With this update, the interface has been improved with a fresh landing page and search experience. Pre-trained Coral models compiled for the Edge TPU continue to be available on our Coral site, but a select few are also now available from the TensorFlow Hub. On the site, you can find models featuring an Overlay interface, allowing you to test the model's performance against a custom set of images right from the browser. Check out the experience for MobileNet v1 and MobileNet v2.

We are excited to share all that Coral has to offer as we continue to evolve our platform. For a list of worldwide distributors, system integrators and partners, visit the new Coral partnerships page. We hope you’ll use the new features offered on Coral.ai as a resource and encourage you to keep sending us feedback at coral-support@google.com.

Our panel of experts for the #AndroidDevChallenge (apply by Dec. 2)

Just a little over a week left to finish your submission for the Android Developer Challenge, due December 2! Technology is enabling us to create a whole new era of helpful innovation by helping people get things done more quickly and surfacing patterns that would be difficult to detect using traditional methods. Ultimately, this helpful innovation is enabling us to live better, more productive, and safer lives.

Earlier this week, we highlighted the type of helpful innovation ideas powered by machine learning which are the sort of examples we’re looking for, to help inspire you. Today, we wanted to share the names of the panel of experts we’ve assembled to help bring your projects to life as part of the Android Developer Challenge. These experts will be making the final decision on the 10 finalists of the Android Developer Challenge, and if you’re selected as one of those finalists, we plan to have you meet them when we bring you to Google HQ for a bootcamp next year:

  • Dave Burke is Vice President of Engineering at Google where he leads engineering for the Android platform. Android is the largest mobile platform and ecosystem in the world, with over 2 billion active devices spanning smartphones, tablets, wearables, auto, TV, and IoT. Dave joined Google UK in 2007, becoming an engineering site lead and later moving to California in 2011. Prior to Google, Dave co-founded and was CTO of an internet/telecoms voice startup and helped define related Web and Internet standards.
  • Stephanie Cuthbertson is Senior Director of Developer PM, DevRel and UX for Android. She previously worked on Google’s Search & Ads businesses, as well as a range of developer tools used by Google employees internally. Prior to Google, she was at AWS where she led the product management team for Storage, including Amazon S3. Before AWS, she spent 10 years working on Visual Studio and developer tools.
  • Brahim Elbouchikhi is a Director of Product Management on the Android team. On Android, Brahim is responsible for developer and consumer facing ML and Camera products including CameraX and ML Kit. Prior to Android, Brahim led Daydream’s software team. Brahim was also a founding PM of the Google Play store where he led monetization, search, and discovery.
  • Yossi Matias is Vice President, Engineering, at Google. He is leading efforts in Search (Google Autocomplete, Search Live Results, Google Trends), Conversational AI (Google Duplex, Call Screen, Live Caption, Live Relay, Recorder, Pronunciation), and other Research initiatives. Yossi is the founding Head of Google's R&D Center in Israel, and the founding executive lead of Google for Startups Campus Tel Aviv and of Launchpad. He is the lead of Crisis Response and co-lead of Google’s AI for Social Good. In addition to his experience as an executive and entrepreneur, Yossi has a rich record of scientific research, has published extensively, and holds dozens of patents to his name. Yossi is a recipient of the Gödel Prize and is an ACM Fellow.
  • Sarah Sirajuddin is an engineering director working on TensorFlow at Google. She leads the teams working on on-device machine learning, TensorFlow Extended, and efforts around training models for the best accuracy and performance with Google’s cutting-edge infrastructure, including TensorFlow and tensor processing units (TPUs).

If you’ve got a great idea that can help users get things done, we want to hear from you! We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition, in part from this amazing set of experts we’ve assembled. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. You can read more about all of the prizes here.

There’s still time to submit your idea before the December 2 deadline. Submitting your idea is as simple as creating a repository on GitHub, telling us what you’d build and how we can help (we’ve included all of the materials here), and then officially submitting your repository here. Ideas can range from an early concept to something that’s already complete; we can’t wait to hear what you come up with, and to work with you on bringing helpful innovation powered by machine learning to more and more users!

Android Developer Challenge: here’s what we’re looking for! (Apply by Dec. 2)

Last month, we kicked off the next Android Developer Challenge, and asked you to submit your ideas focused on helpful innovation, powered by on-device machine learning. But what exactly do we mean when we say helpful innovation? We’re glad you asked! We rounded up a few of Google’s on-device machine learning offerings, together with some great recent examples of this technology in action, to help inspire your submission. Don’t forget, submit your idea by December 2!

Using machine learning to tackle Fall Armyworm

Take Nazirini Siraji. When she and a team of developers noticed a crop pest threatening the livelihood of Ugandan farmers, they taught themselves TensorFlow to combat it. They collected training data from nearby fields in the form of images. With TensorFlow, they re-trained a MobileNet, a technique known as transfer learning, and then used the TensorFlow Lite Converter to generate a TensorFlow Lite FlatBuffer file, which they deployed in an Android app. With the app, a farmer can snap a picture of their crop, and the image frame is analysed to look for Fall armyworm damage. Depending on the results, a suggestion of a possible solution is given. It’s pretty cool!
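
For reference, converting a trained Keras model into a TensorFlow Lite FlatBuffer that can be bundled into an Android app can be sketched as follows; the model path, output filename, and the optional optimization flag are illustrative assumptions, not details of the team's actual pipeline.

```python
import tensorflow as tf

# Assume `model` is the retrained MobileNet-based Keras classifier.
model = tf.keras.models.load_model("farmers_companion_model.h5")  # hypothetical path

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_flatbuffer = converter.convert()

with open("fall_armyworm_detector.tflite", "wb") as f:  # hypothetical filename
    f.write(tflite_flatbuffer)
```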

Helping doctors detect respiratory diseases using machine learning

Tambua Health is helping doctors determine the likelihood of respiratory diseases by turning any smartphone into a powerful non-invasive screening tool. They developed an app using TensorFlow Lite that can help doctors analyze lung sounds for the presence of abnormal sounds like wheezes, crackles, stridor, and other adventitious sounds.

adidas uses machine learning to make the shopping experience easier

Even brands are tapping the power of machine learning. Take adidas, who recently launched a new “Bring It to Me” experience for their London store. Shoppers can use Visual Lookup to scan products on their phones while they are in the store, and the app lets them check stock and request their size without the need for queues. Under the hood, ML Kit is helping power the experience. It’s another way machine learning is helping users get things done more quickly.

The benefits of on-device machine learning

Running machine learning on a user’s device comes with a number of benefits. First, you reduce the amount of data you send to your server, enhancing user privacy. And because it runs on device, it can also work offline - perfect for inaccessible areas such as the middle of a rainforest, a desert or the London Underground. Last but not least, the most exciting aspect of running your model on device is the low latency, which can enable all kinds of new user experiences. Machine learning is not just for automating tasks; it can work alongside your users and give them superpowers too!

At Google, we offer a number of different technologies to help you take advantage of this:

  • ML Kit offers a turnkey SDK to help you tackle tasks with powerful Google Machine Learning models
  • The TensorFlow Lite Framework lets you take a custom model and optimise it to run on Android
  • There’s also the infrastructure of Firebase / Google Cloud, which can help you train on-device models using AutoML Vision Edge for specific model types or give you the raw processing power to train your own model

If you’ve got a great idea that can help users get things done, we want to hear from you! We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. You can read more about all of the prizes here.

There’s still time to submit your idea before the December 2 deadline. We can’t wait to hear what you come up with, and to work with you on bringing helpful innovation powered by on-device machine learning to more and more users!

Accelerating Japan’s AI startups in our new Tokyo Campus

Posted by Takuo Suzuki

Japan is well known as an epicenter of innovation and technology, and its startup ecosystem is no different. We’ve seen this first hand from our work with startups such as Cinnamon, which uses artificial intelligence to remove repetitive tasks from office workers’ daily work, allowing more work to get done by fewer people, faster.

This is why we are pleased to announce our second accelerator program, housed at the new Google for Startups Campus in the heart of Tokyo.

Accelerated with Google in Japan

The Google for Startups Accelerator (previously Launchpad Accelerator) is an intensive three-month program for high-potential, AI-focused startups, utilizing the proven Launchpad foundational components and content.

Founders who successfully apply for the accelerator will have the opportunity to work on the technical problems facing their startup alongside relevant experts from Google and the industry. They will receive mentorship on these challenges, support on machine learning best practices, as well as connections to relevant teams from across Google to help grow their business.

In addition to mentorship and technical project support, the accelerator also includes deep dives and workshops focused on product design, customer acquisition, and leadership development for founders.

“We hope that by providing these founders with the tools, mentorship, and connections to prepare for the next step in their journey, it will, in turn, contribute to a stronger Japanese economy,” says Takuo Suzuki, Google Developers Regional Lead for Japan. “We are excited to work with such passionate startups in a new Google for Startups Campus, an environment built to foster startup growth, and to meet our next cohort in 2020.”

The program will run from February to May 2020, and applications are now open until 13 December 2019.

Using machine learning to tackle Fall Armyworm

Guest post by Nsubuga Hassan, CEO at Hansu Mobile and Intelligent Innovations, Android and Machine Learning Developer

Nazirini doing work on laptop

In 2016, Fall armyworm (FAW) was first reported in Africa. It has devastated maize crops across the continent.

Research shows the potential impact of FAW on continent-wide maize yield lies between 8.3 and 20.6 million tonnes per year (out of a total expected production of 39m tonnes per year), with losses lying between US$2.48m and US$6.19m per year (of a US$11.59m annual expected value). The impact of FAW is far reaching, and is now reported in many countries around the world.

Agriculture is the backbone of Uganda’s economy, employing 70% of the population. It contributes to half of Uganda’s export earnings and a quarter of the country’s gross domestic product (GDP). Fall armyworm poses a great threat to our livelihoods.

Two people having a conversation

We are a small group of like-minded developers living and working in Uganda. Most of our relatives grow maize, so the impact of the worm was very close to home. We really felt like we needed to do something about it.

Fall armyworm that threatens crops

The vast damage and yield losses in maize production due to FAW got the attention of global organizations, who are calling for innovators to help. It is the perfect time to apply machine learning. Our goal is to build an intelligent agent to help local farmers fight this pest in order to increase our food security.

Based on a Machine Learning Crash Course, our Google Developer Group (GDG) in Mbale hosted some study jams in May 2018, alongside several other code labs. This is where we first got hands-on experience using TensorFlow, from which the foundations were laid for the Farmers Companion app. Finally, we felt as though an intelligent solution to help farmers had been conceived.

Equipped with this knowledge & belief, the team embarked on collecting training data from nearby fields. This was done using a smartphone to take images, with the help of some GDG Mbale members. With farmers miles from town, and many fields inaccessible by road (not to mention the floods), this was not as simple as we had first hoped. To hinder us further, our smartphones were (and still are) the only hard drives we had, which limited the number of images & data we could capture in a day.

But we persisted! Once gathered, the images were sorted, one at a time, and categorized. With TensorFlow we re-trained a MobileNet, a technique known as transfer learning. We then used the TensorFlow Lite Converter to generate a TensorFlow Lite FlatBuffer file, which we deployed in an Android app.

Demonstration of the Android app identifying the Fall armyworm

We started with about 3,956 images, but our dataset is growing exponentially. We are actively collecting more and more data to improve our model’s accuracy. The improvements in TensorFlow, with its Keras high-level APIs, have really made our approach to deep learning easy and enjoyable, and we are now experimenting with TensorFlow 2.0.
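
For anyone who wants to try a similar approach, a minimal Keras sketch of this kind of transfer learning looks like the following (MobileNetV2 is used as an example backbone, and the dataset directory, image size, and hyperparameters are placeholders rather than our exact setup):

```python
import tensorflow as tf
from tensorflow.keras import layers

IMG_SIZE = (224, 224)

# Hypothetical dataset directory with one sub-folder per class.
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/maize_images", image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False  # transfer learning: reuse ImageNet features, train only the new head

model = tf.keras.Sequential([
    layers.Rescaling(1.0 / 127.5, offset=-1),   # MobileNetV2 expects inputs in [-1, 1]
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dropout(0.2),
    layers.Dense(len(train_ds.class_names), activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```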

The app is simple for the user. Once installed, the user points the camera, through the app, at a maize crop. An image frame is then picked and, using TensorFlow Lite, analysed to look for Fall armyworm damage. Depending on the results from this phase, a suggestion of a possible solution is given.
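
On the device, the Android app runs the converted model with the TensorFlow Lite interpreter. The Python equivalent of that flow looks roughly like this (the model filename, preprocessing, and class names are illustrative assumptions):

```python
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="fall_armyworm_detector.tflite")  # hypothetical file
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

def classify_frame(frame_rgb):
    """frame_rgb: HxWx3 uint8 camera frame. Returns class probabilities."""
    h, w = input_details[0]["shape"][1:3]
    x = tf.image.resize(frame_rgb, (h, w)).numpy()
    x = (x / 127.5 - 1.0).astype(np.float32)[None, ...]  # match the training preprocessing
    interpreter.set_tensor(input_details[0]["index"], x)
    interpreter.invoke()
    return interpreter.get_tensor(output_details[0]["index"])[0]

probs = classify_frame(np.zeros((480, 640, 3), dtype=np.uint8))   # dummy frame for illustration
print(dict(zip(["fall_armyworm_damage", "healthy"], probs)))       # hypothetical class names
```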

The app is available for download and it is constantly undergoing updates, as we push for local farmers to adapt and use it. We strive to ensure a world with #ZeroHunger and believe technology can do a lot to help us achieve this.

We have so far been featured on a national TV station in Uganda, participated in #hackAgainstHunger and in ‘The International Symposium on Agricultural Innovation for Family Farmers’, organized by the Food and Agriculture Organization of the United Nations, where our solution was highlighted.

More recently, Google highlighted our work with this film:

We have embarked on scaling the solution to coffee and cassava diseases, and will slowly be moving on to more. We have also introduced virtual reality to help showcase good farming practices and provide training to farmers.

Our plan is to collect more data and to scale the solution to handle more pests and diseases. We are also shifting to cloud services and Firebase to improve and serve our model better despite the lack of resources. With improved hardware and greater localised understanding, there's huge scope for Machine Learning to make a difference in the fight against hunger.

Learning to Assemble and to Generalize from Self-Supervised Disassembly



Our physical world is full of different shapes, and learning how they are all interconnected is a natural part of interacting with our surroundings — for example, we understand that coat hangers hook onto clothing racks, power plugs insert into wall outlets, and USB cables fit into USB sockets. This general concept of “how things fit together” based on their shapes is something that people acquire over time and experience, and it helps to increase the efficiency with which we perform tasks, like assembling DIY furniture kits or packing gifts into a box. If robots could learn “how things fit together,” then perhaps they could become more adaptable to new manipulation tasks involving objects they have never seen before, like reconnecting severed pipes, or building makeshift shelters by piecing together debris during disaster response scenarios.

To explore this idea, we worked with researchers from Stanford and Columbia Universities to develop Form2Fit, a robotic manipulation algorithm that uses deep neural networks to learn to visually recognize how objects correspond (or “fit”) to each other. To test this algorithm, we tasked a real robot to perform kit assembly, where it needed to accurately assemble objects into a blister pack or corrugated display to form a single unit. Previous systems built for this task required extensive manual tuning to assemble a single kit unit at a time. However, we demonstrate that by learning the general concept of “how things fit together,” Form2Fit enables our robot to assemble various types of kits with a 94% success rate. Furthermore, Form2Fit is one of the first systems capable of generalizing to new objects and kitting tasks not seen during training.
Form2Fit learns to assemble a wide variety of kits by finding geometric correspondences between object surfaces and their target placement locations. By leveraging geometric information learned from multiple kits during training, the system generalizes to new objects and kits.
While often overlooked, shape analysis plays an important role in manipulation, especially for tasks like kit assembly. In fact, the shape of an object often matches the shape of its corresponding space in the packaging, and understanding this relationship is what allows people to do this task with minimal guesswork. At its core, Form2Fit aims to learn this relationship by training over numerous pairs of objects and their corresponding placing locations across multiple different kitting tasks – with the goal of acquiring a broader understanding of how shapes and surfaces fit together. Form2Fit improves itself over time with minimal human supervision, gathering its own training data by repeatedly disassembling completed kits through trial and error, then time-reversing the disassembly sequences to get assembly trajectories. After training overnight for 12 hours, our robot learns effective pick and place policies for a variety of kits, achieving 94% assembly success rates with objects and kits in varying configurations, and over 86% assembly success rates when handling completely new objects and kits.

Data-Driven Shape Descriptors For Generalizable Assembly
The core component of Form2Fit is a two-stream matching network that learns to infer orientation-sensitive geometric pixel-wise descriptors for objects and their target placement locations from visual data. These descriptors can be understood as compressed 3D point representations that encode object geometry, textures, and contextual task-level knowledge. Form2Fit uses these descriptors to establish correspondences between objects and their target locations (i.e., where they should be placed). Since these descriptors are orientation-sensitive, they allow Form2Fit to infer how the picked object should be rotated before it is placed in its target location.

Form2Fit uses two additional networks to generate valid pick and place candidates. A suction network gets fed a 3D image of the objects and generates pixel-wise predictions of suction success. The suction probability map is visualized as a heatmap, where hotter pixels indicate better locations to grasp the object at the 3D location of the corresponding pixel. In parallel, a place network gets fed a 3D image of the target kit and outputs pixel-wise predictions of placement success. These, too, are visualized as a heatmap, where higher confidence values serve as better locations for the robot arm to approach from a top-down angle to place the object. Finally, the planner integrates the output of all three modules to produce the final pick location, place location and rotation angle.
Overview of Form2Fit. The suction and place networks infer candidate picking and placing locations in the scene respectively. The matching network generates pixel-wise orientation-sensitive descriptors to match picking locations to their corresponding placing locations. The planner then integrates it all to control the robot to execute the next best pick and place action.
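
A highly simplified sketch of how such a planner could combine the three outputs (the array shapes and nearest-descriptor matching rule below are assumptions, not Form2Fit's actual implementation): take the best suction pixel, then choose the place pixel and rotation whose descriptor is closest to the descriptor at that pick.

```python
import numpy as np

def plan_pick_and_place(suction_map, place_map, pick_desc, place_descs, top_k=100):
    """Toy planner combining suction, placing, and matching outputs.

    suction_map: (H, W) suction success probabilities over the object scene.
    place_map:   (H, W) placement success probabilities over the kit scene.
    pick_desc:   (H, W, D) descriptors for the object scene.
    place_descs: (R, H, W, D) descriptors for the kit scene at R candidate rotations.
    """
    # 1) Best pick: the pixel with the highest suction score.
    pick_yx = np.unravel_index(np.argmax(suction_map), suction_map.shape)
    d = pick_desc[pick_yx]                                   # descriptor at the pick point

    # 2) Candidate places: the top-k pixels of the place heatmap.
    flat = np.argsort(place_map.ravel())[-top_k:]
    cand_yx = np.stack(np.unravel_index(flat, place_map.shape), axis=1)   # (top_k, 2)

    # 3) Match: the rotation + place pixel whose descriptor is nearest to the pick descriptor.
    best = None
    for r in range(place_descs.shape[0]):
        dists = np.linalg.norm(place_descs[r][cand_yx[:, 0], cand_yx[:, 1]] - d, axis=1)
        i = int(np.argmin(dists))
        if best is None or dists[i] < best[0]:
            best = (dists[i], r, tuple(cand_yx[i]))
    _, rotation_index, place_yx = best
    return pick_yx, place_yx, rotation_index

# Dummy example with random maps and descriptors.
H, W, D, R = 64, 64, 16, 8
print(plan_pick_and_place(np.random.rand(H, W), np.random.rand(H, W),
                          np.random.rand(H, W, D), np.random.rand(R, H, W, D)))
```
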
Learning Assembly from Disassembly
Neural networks require large amounts of training data, which can be difficult to collect for tasks like assembly. Precisely inserting objects into tight spaces with the correct orientation (e.g., in kits) is challenging to learn through trial and error, because the chances of success from random exploration can be slim. In contrast, disassembling completed units is often easier to learn through trial and error, since there are fewer incorrect ways to remove an object than there are to correctly insert it. We leveraged this difference in order to amass training data for Form2Fit.
An example of self-supervision through time-reversal: rewinding a disassembly sequence of a deodorant kit over time generates a valid assembly sequence.
Our key observation is that in many cases of kit assembly, a disassembly sequence – when reversed over time – becomes a valid assembly sequence. This concept, called time-reversed disassembly, enables Form2Fit to train entirely through self-supervision by randomly picking with trial and error to disassemble a fully-assembled kit, then reversing that disassembly sequence to learn how the kit should be put together.
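
In sketch form (the transition format is an assumption, not Form2Fit's data schema), turning a logged disassembly episode into assembly training data amounts to reversing the sequence and swapping each step's pick and place poses.

```python
from dataclasses import dataclass
from typing import List, Tuple

Pose = Tuple[float, float, float]  # e.g. (x, y, rotation); a simplifying assumption

@dataclass
class Step:
    pick: Pose    # where the object was grasped
    place: Pose   # where it was released

def disassembly_to_assembly(disassembly: List[Step]) -> List[Step]:
    """Time-reverse a disassembly episode into a valid assembly episode:
    the last object removed is the first to be inserted, and each removal's
    drop location becomes the new pick location (and vice versa)."""
    return [Step(pick=s.place, place=s.pick) for s in reversed(disassembly)]

# Toy example: two objects pulled out of a kit and dropped on the table.
disassembly = [Step(pick=(0.10, 0.20, 0.0), place=(0.40, 0.50, 1.57)),
               Step(pick=(0.12, 0.22, 0.0), place=(0.45, 0.55, 0.0))]
print(disassembly_to_assembly(disassembly))
```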

Generalization Results
The results of our experiments show great potential for learning generalizable policies for assembly. For instance, when a policy is trained to assemble a kit in only one specific position and orientation, it can still robustly assemble random rotations and translations of the kit 90% of the time.
Form2Fit policies are robust to a wide range of rotations and translations of the kits.
We also find that Form2Fit is capable of tackling novel configurations it has not been exposed to during training. For example, when training a policy on two single-object kits (floss and tape), we find that it can successfully assemble new combinations and mixtures of those kits, even though it has never seen such configurations before.
Form2Fit policies can generalize to novel kit configurations such as multiple versions of the same kit and mixtures of different kits.
Furthermore, when given completely novel kits on which it has not been trained, Form2Fit can generalize using its learned shape priors to assemble those kits with over 86% assembly accuracy.
Form2Fit policies can generalize to never-before-seen single and multi-object kits.
What Have the Descriptors Learned?
To explore what the descriptors of the matching network from Form2Fit have learned to encode, we visualize the pixel-wise descriptors of various objects in RGB colorspace through use of an embedding technique called t-SNE.
The t-SNE embedding of the learned object descriptors. Similarly oriented objects of the same category display identical colors (e.g. A, B or F, G) while different objects (e.g. C, H) and same objects but different orientation (e.g. A, C, D or H, F) exhibit different colors.
We observe that the descriptors have learned to encode (a) rotation — objects oriented differently have different descriptors (A, C, D, E) and (H, F); (b) spatial correspondence — same points on the same oriented objects share similar descriptors (A, B) and (F, G); and (c) object identity — zoo animals and fruits exhibit unique descriptors (columns 3 and 4).
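
As a rough recipe for this kind of visualization (using scikit-learn's t-SNE with placeholder hyperparameters), pixel-wise descriptors can be embedded into three dimensions and rescaled into RGB so that similar descriptors receive similar colors.

```python
import numpy as np
from sklearn.manifold import TSNE

def descriptors_to_rgb(descriptors):
    """descriptors: (num_pixels, D) array of pixel-wise descriptors.
    Returns (num_pixels, 3) RGB colors in [0, 1] from a 3D t-SNE embedding."""
    emb = TSNE(n_components=3, init="random", perplexity=30,
               random_state=0).fit_transform(descriptors)
    emb = emb - emb.min(axis=0)
    return emb / emb.max(axis=0)   # rescale each embedding axis to [0, 1]

# Dummy example: 2,000 random 64-dimensional descriptors.
colors = descriptors_to_rgb(np.random.randn(2000, 64))
print(colors.shape)  # (2000, 3)
```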

Limitations & Future Work
While Form2Fit’s results are promising, its limitations suggest directions for future work. In our experiments, we assume a 2D planar workspace to constrain the kit assembly task so that it can be solved by sequencing top-down picking and placing actions. This may not work for all cases of assembly – for example, when a peg needs to be precisely inserted at a 45 degree angle. It would be interesting to expand Form2Fit to more complex action representations for 3D assembly.

You can learn more about this work and download the code from our GitHub repository.


Acknowledgments
This research was done by Kevin Zakka, Andy Zeng, Johnny Lee, and Shuran Song (faculty at Columbia University), with special thanks to Nick Hynes, Alex Nichol, and Ivan Krasin for fruitful technical discussions; Adrian Wong, Brandon Hurd, Julian Salazar, and Sean Snyder for hardware support; Ryan Hickman for valuable managerial support; and Chad Richards for helpful feedback on writing.

Source: Google AI Blog


Android Developer Challenge: helpful innovation, powered by On-Device Machine Learning + you!

Posted by The Android team

Android Developer Challenge banner

Developers like you have always played an important role in shaping the direction of Android, fueling the wave of Android innovation. It’s the reason that when we first launched the SDK for Android 10+ years ago, we simultaneously announced the Android Developer Challenge: a way to help reward model apps and show us what user problems you wanted to solve. As Android continues to push the boundaries into emerging areas like ML, 5G, foldables and more, we need your help to bring to life the consumer experiences that will define these new frontiers.

So we’re bringing back the Android Developer Challenge and asking you to help us unlock new experiences on Android, and help inspire other developers around these emerging technologies.

As we kick off this challenge, the first area we’ll be focusing on is On-Device Machine Learning. At Google, we’re big believers in how this new technology can open up a world of helpful innovation so you can get things done in ways you never thought possible. Take Live Caption: for the almost 500 million people who are deaf and hard of hearing, Live Caption brings content to life and is exactly the type of machine learning-powered innovation we expect to see more of someday, and with your help we can turn someday into today!

Bringing your idea to life in front of billions of eyes

Got an idea? Whether it’s still a concept or ready for users, tell us how you could use Google’s help, and how it supports the mission of using machine learning to help people get something done. Join the #AndroidDeveloperChallenge topic on GitHub, and share your idea as a repository under this topic. Don’t forget to come back here and officially submit your concept.

We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. Here’s what those 10 developers will get:

Expertise and development support bootcamp: We’ll work with you to provide expertise and guidance to help in your plans to bring your app from concept to reality, including:

  • An all-expenses paid, working session with a panel of experts at Google HQ in Mountain View, CA
  • Google engineer mentorship at the bootcamp, providing guidance and technical expertise to help bring your app to fruition

Exposure and street cred! Once your idea is ready for prime-time, we’ll help you get users, and celebrate you to the broader Android community, including:

  • A collection on Google Play where we’ll feature your app (apps must be ready for Google Play on May 1, and must meet Google’s minimum quality requirements)
  • Tickets to Google I/O 2020
  • And we’ll celebrate these experiences to the broader Android developer community on developers.android.com. We might even showcase you at Google I/O, in places like the sandbox, sessions, perhaps even a keynote!

Helpful innovation is an important investment area for us on the Android team, and On-Device Machine Learning has played a critical role in powering new features in the last several releases of Android. We’re just beginning to scratch the surface, and we can’t wait to see what you come up with!