Tag Archives: AI

Building SMILY, a Human-Centric, Similar-Image Search Tool for Pathology

Advances in machine learning (ML) have shown great promise for assisting in the work of healthcare professionals, such as aiding the detection of diabetic eye disease and metastatic breast cancer. Though high-performing algorithms are necessary to gain the trust and adoption of clinicians, they are not always sufficient—what information is presented to doctors and how doctors interact with that information can be crucial determinants in the utility that ML technology ultimately has for users.

The medical specialty of anatomic pathology, which is the gold standard for the diagnosis of cancer and many other diseases through microscopic analysis of tissue samples, can greatly benefit from applications of ML. Though diagnosis through pathology is traditionally done on physical microscopes, there has been a growing adoption of “digital pathology,” where high-resolution images of pathology samples can be examined on a computer. With this movement comes the potential to much more easily look up information, as is needed when pathologists tackle the diagnosis of difficult cases or rare diseases, when “general” pathologists approach specialist cases, and when trainee pathologists are learning. In these situations, a common question arises, “What is this feature that I’m seeing?” The traditional solution is for doctors to ask colleagues, or to laboriously browse reference textbooks or online resources, hoping to find an image with similar visual characteristics. The general computer vision solution to problems like this is termed content-based image retrieval (CBIR), one example of which is the “reverse image search” feature in Google Images, in which users can search for similar images by using another image as input.

Today, we are excited to share two research papers describing further progress in human-computer interaction research for similar image search in medicine. In “Similar Image Search for Histopathology: SMILY,” published in Nature Partner Journal (npj) Digital Medicine, we report on our ML-based tool for reverse image search for pathology. In our second paper, “Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making” (preprint available here), which received an honorable mention at the 2019 ACM CHI Conference on Human Factors in Computing Systems, we explored different modes of refinement for image-based search, and evaluated their effects on doctor interaction with SMILY.

SMILY Design
The first step in developing SMILY was to apply a deep learning model, trained using 5 billion natural, non-pathology images (e.g., dogs, trees, man-made objects, etc.), to compress images into a “summary” numerical vector, called an embedding. The network learned during the training process to distinguish similar images from dissimilar ones by computing and comparing their embeddings. This model is then used to create a database of image patches and their associated embeddings using a corpus of de-identified slides from The Cancer Genome Atlas. When a query image patch is selected in the SMILY tool, the query patch’s embedding is similarly computed and compared with the database to retrieve the image patches with the most similar embeddings.
Schematic of the steps in building the SMILY database and the process by which input image patches are used to perform the similar image search.
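The retrieval step described above can be sketched in a few lines. This is a toy illustration only: the random vectors stand in for a trained network's embeddings, and the real system searches billions of patches with an efficient index rather than a brute-force comparison.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: a database of embeddings for cropped image patches, and a
# fake "embedding function" in place of the trained network.
database = rng.normal(size=(10_000, 128))     # 10k patches, 128-dim vectors

def embed(patch):
    """Stand-in for the network that compresses a patch into an embedding."""
    return rng.normal(size=128)

def most_similar(query_embedding, database, k=5):
    # Normalize so dot products become cosine similarities.
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    q = query_embedding / np.linalg.norm(query_embedding)
    sims = db @ q
    # Indices of the k patches whose embeddings are most similar to the query.
    return np.argsort(sims)[::-1][:k]

query = embed(None)
print(most_similar(query, database, k=5))
```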
The tool allows a user to select a region of interest, and obtain visually-similar matches. We tested SMILY’s ability to retrieve images along a pre-specified axis of similarity (e.g. histologic feature or tumor grade), using images of tissue from the breast, colon, and prostate (3 of the most common cancer sites). We found that SMILY demonstrated promising results despite not being trained specifically on pathology images or using any labeled examples of histologic features or tumor grades.
Example of selecting a small region in a slide and using SMILY to retrieve similar images. SMILY efficiently searches a database of billions of cropped images in a few seconds. Because pathology images can be viewed at different magnifications (zoom levels), SMILY automatically searches images at the same magnification as the input image.
Second example of using SMILY, this time searching for a lobular carcinoma, a specific subtype of breast cancer.
Refinement tools for SMILY
However, a problem emerged when we observed how pathologists interacted with SMILY. Specifically, users were trying to answer the nebulous question of “What looks similar to this image?” so that they could learn from past cases containing similar images. Yet, there was no way for the tool to understand the intent of the search: Was the user trying to find images that have a similar histologic feature, glandular morphology, overall architecture, or something else? In other words, users needed the ability to guide and refine the search results on a case-by-case basis in order to actually find what they were looking for. Furthermore, we observed that this need for iterative search refinement was rooted in how doctors often perform “iterative diagnosis”—by generating hypotheses, collecting data to test these hypotheses, exploring alternative hypotheses, and revisiting or retesting previous hypotheses in an iterative fashion. It became clear that, for SMILY to meet real user needs, it would need to support a different approach to user interaction.

Through careful human-centered research described in our second paper, we designed and augmented SMILY with a suite of interactive refinement tools that enable end-users to express what similarity means on-the-fly: 1) refine-by-region allows pathologists to crop a region of interest within the image, limiting the search to just that region; 2) refine-by-example gives users the ability to pick a subset of the search results and retrieve more results like those; and 3) refine-by-concept sliders can be used to specify that more or less of a clinical concept be present in the search results (e.g., fused glands). Rather than requiring that these concepts be built into the machine learning model, we instead developed a method that enables end-users to create new concepts post-hoc, customizing the search algorithm towards concepts they find important for each specific use case. This enables new explorations via post-hoc tools after a machine learning model has already been trained, without needing to re-train the original model for each concept or application of interest.
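One plausible way to implement such a post-hoc concept (the paper's exact method may differ) is to derive a direction in embedding space from a handful of user-labeled example patches, then shift the query embedding along that direction according to the slider position:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical embeddings for patches a pathologist has marked as showing
# the concept of interest (e.g. fused glands) and patches that don't.
with_concept = rng.normal(loc=1.0, size=(8, 16))
without_concept = rng.normal(loc=0.0, size=(8, 16))

def concept_direction(pos, neg):
    """A post-hoc concept vector: the direction in embedding space from the
    negative examples toward the positive ones."""
    v = pos.mean(axis=0) - neg.mean(axis=0)
    return v / np.linalg.norm(v)

def refine_query(query_embedding, concept, slider):
    """Slider in [-1, 1]: shift the query to ask for less (-) or more (+)
    of the concept in the retrieved results."""
    return query_embedding + slider * concept

concept = concept_direction(with_concept, without_concept)
query = rng.normal(size=16)
more_fused = refine_query(query, concept, slider=0.5)
```

Because the concept vector is computed from examples after the fact, no retraining of the embedding model is needed for each new concept.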
Through our user study with pathologists, we found that the tool-based SMILY not only increased the clinical usefulness of search results, but also significantly increased users’ trust and likelihood of adoption, compared to a conventional version of SMILY without these tools. Interestingly, these refinement tools appeared to have supported pathologists’ decision-making process in ways beyond simply performing better on similarity searches. For example, pathologists used the observed changes to their results from iterative searches as a means of progressively tracking the likelihood of a hypothesis. When search results were surprising, many re-purposed the tools to test and understand the underlying algorithm, for example, by cropping out regions they thought were interfering with the search or by adjusting the concept sliders to increase the presence of concepts they suspected were being ignored. Beyond being passive recipients of ML results, doctors were empowered with the agency to actively test hypotheses and apply their expert domain knowledge, while simultaneously leveraging the benefits of automation.
With these interactive tools enabling users to tailor each search experience to their desired intent, we are excited for SMILY’s potential to assist with searching large databases of digitized pathology images. One potential application of this technology is to index textbooks of pathology images with descriptive captions, and enable medical students or pathologists in training to search these textbooks using visual search, speeding up the educational process. Another application is for cancer researchers interested in studying the correlation of tumor morphologies with patient outcomes, to accelerate the search for similar cases. Finally, pathologists may be able to leverage tools like SMILY to locate all occurrences of a feature (e.g. signs of active cell division, or mitosis) in the same patient’s tissue sample to better understand the severity of the disease to inform cancer therapy decisions. Importantly, our findings add to the body of evidence that sophisticated machine learning algorithms need to be paired with human-centered design and interactive tooling in order to be most useful.

This work would not have been possible without Jason D. Hipp, Yun Liu, Emily Reif, Daniel Smilkov, Michael Terry, Craig H. Mermel, Martin C. Stumpe and members of Google Health and PAIR. Preprints of the two papers are available here and here.

Source: Google AI Blog

Google Translate’s instant camera translation gets an upgrade

Google Translate allows you to explore unfamiliar lands, communicate in different languages, and make connections that would be otherwise impossible. One of my favorite features on the Google Translate mobile app is instant camera translation, which allows you to see the world in your language by just pointing your camera lens at the foreign text. Similar to the real-time translation feature we recently launched in Google Lens, this is an intuitive way to understand your surroundings, and it’s especially helpful when you’re traveling abroad as it works even when you’re not connected to Wi-Fi or using cellular data. Today, we’re launching new upgrades to this feature, so that it’s even more useful.


Translate from 88 languages into 100+ languages

Instant camera translation now supports 60 additional languages, such as Arabic, Hindi, Malay, Thai and Vietnamese, bringing the total to 88. Here’s the full list of all supported languages.

What’s more exciting: previously you could only translate between English and other languages, but now you can translate into any of the 100+ languages supported on Google Translate. That means you can now translate from Arabic to French, or from Japanese to Chinese.

Automatically detect the language

When traveling abroad, especially in a region with multiple languages, it can be challenging to determine the language of the text you need to translate. We took care of that—in the new version of the app, you can just select “Detect language” as the source language, and the Translate app will automatically detect the language and translate. Say you’re traveling through South America, where both Portuguese and Spanish are spoken, and you encounter a sign. The Translate app can now determine what language the sign is in, and then translate it into your language of choice.

Better translations powered by Neural Machine Translation

For the first time, Neural Machine Translation (NMT) technology is built into instant camera translations. This produces more accurate and natural translations, reducing errors by 55-85 percent in certain language pairs. And most of the languages can be downloaded onto your device, so that you can use the feature without an internet connection. However, when your device is connected to the internet, the feature uses that connection to produce higher quality translations.

A new look

Last but not least, the feature has a new look and is more intuitive to use. In the past, you might have noticed that the translated text would flicker when viewed on your phone, making it difficult to read. We’ve reduced that flickering, making the text more stable and easier to read. The new look places all three camera translation features conveniently at the bottom of the app: “Instant” translates foreign text when you point your camera at it. “Scan” lets you take a photo and use your finger to highlight the text you want translated. And “Import” lets you translate text from photos on your camera roll.

To try out the instant camera translation feature, download the Google Translate app.

Source: Translate

To reduce plastic waste in Indonesia, one startup turns to AI

In Indonesia, plastic waste poses a major challenge. With 50,000 km of coastline and a lack of widespread public awareness of waste management across the archipelago, much of Indonesia’s trash could end up in the ocean. Gringgo Indonesia Foundation has started tackling this problem using technology—and more recently, with a little help from Google. 

Earlier this year, Gringgo was named one of 20 grantees of the Google AI Impact Challenge. In addition to receiving $500,000 of funding from Google.org, Gringgo is part of our Launchpad Accelerator program that gives them guidance and resources to jumpstart their work. 

We sat down with Febriadi Pratama, CTO & co-founder at Gringgo, to find out how this so-called “trash tech start-up” plans to change waste management in Indonesia with the help of artificial intelligence (AI). 


The team at Gringgo Indonesia Foundation.

Why is plastic waste such a problem for Indonesia? 
In the past 30 years, Indonesia has become overwhelmed by plastic waste. Sadly, we haven’t found a solution to deal with this waste across our many islands. 

Indonesia’s geography makes it harder to put a price on recyclables. The country consists of more than 17,000 islands, yet most recycling facilities are based on the main island of Java. This makes transporting recyclables from other islands expensive, so low-value materials aren’t sorted and end up polluting the environment.  

To add to the complexity, waste workers often have irregular routes and schedules, leaving many parts of the country unserviced. Workers also don’t always have the knowledge and expertise to accurately identify what can be recycled, and what recycled items are worth. Together, these factors have a devastating impact on recycling rates and the livelihoods of waste workers.

How are you proposing to address this problem? 
Waste workers’ livelihood depends on the volume and value of the recyclable waste they collect. We realized that giving workers tools to track their collections and productivity could boost their earning power while also helping the environment. 

We came up with the idea to build an image recognition tool that would help improve plastic recycling rates by classifying different materials and giving them a monetary value.  In turn, this will reduce ocean plastic pollution and strengthen waste management in under-resourced communities. We believe this creates a new economic model for waste management that prioritizes people and the planet. 

How does the tool work in practice? 
We launched several apps in 2017—for both waste workers and the public. One app lets waste workers track the amount and type of waste they collect; it saves them time by suggesting a more organized route and helps them quantify their collections and earning potential. Within a year of launching the apps, we were able to improve recycling rates by 35 percent in our first pilot village, Sanur Kaja in Bali. We also launched an app for the public, connecting people with waste collection services for their homes.


Febriadi Pratama with waste worker, Baidi, using the Gringgo mobile app

Tell us about the role that AI will play in your app? 

With Google’s support, we’re working with Indonesian startup Datanest to build an image recognition tool using Google’s machine learning platform, TensorFlow. The goal is to allow waste workers to better analyze and classify waste items, and quantify their value. 

With AI built into the app, waste workers will be able to take a photo of trash, and through image recognition, the tool will identify the items and their associated value. This will educate waste workers about the market value of materials, help them optimize their operations, and maximize their wages.  Ultimately, this will motivate waste workers to collect and process waste more efficiently, and boost recycling rates. 

So whether it’s a plastic bottle (worth Rp 2,500/kg or 18 cents/kg) or a cereal box (worth Rp 10,000/kg or 71 cents/kg), these new technologies should allow more precious materials to be sorted and reused, thereby removing the guesswork for workers and putting more money in their pockets.
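The value-lookup step described here can be sketched as follows. This is a hypothetical fragment: the label is assumed to come from the image-recognition model (not implemented here), and the prices per kilogram are the figures quoted above.

```python
# Hypothetical price table (Rp per kg), using the figures quoted above.
PRICE_PER_KG = {
    "plastic_bottle": 2500,   # ~18 cents/kg
    "cereal_box": 10000,      # ~71 cents/kg
}

def estimate_value(predicted_label, weight_kg):
    """Look up the market value of a classified item; unknown labels are
    worth nothing until someone adds them to the table."""
    return PRICE_PER_KG.get(predicted_label, 0) * weight_kg

print(estimate_value("plastic_bottle", 2.0))  # → 5000.0
```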


A mock-up shows how Gringgo thinks the app will be able to identify waste through AI-powered image recognition

What do you aspire to achieve in the next ten years? 

Waste management issues aren’t specific to Bali or to Indonesia. We think our technology has the potential to benefit many people and places around the globe. Our goal is to improve our AI model, make it economically sustainable, and ultimately help implement it across Indonesia, Asia and around the world.

Responsible AI: Putting our principles into action

Every day, we see how AI can help people from around the world and make a positive difference in our lives—from helping radiologists detect lung cancer, to increasing literacy rates in rural India, to conserving endangered species. These examples are just scratching the surface—AI could also save lives through natural disaster mitigation with our flood forecasting initiative and research on predicting earthquake aftershocks.

As AI expands our reach into the once-unimaginable, it also sparks conversation around topics like fairness and privacy. This is an important conversation and one that requires the engagement of societies globally. A year ago, we announced Google’s AI Principles that help guide the ethical development and use of AI in our research and products. Today we’re sharing updates on our work.

Internal education

We’ve educated and empowered our employees to understand the important issues of AI and think critically about how to put AI into practice responsibly. This past year, thousands of Googlers have completed training in machine learning fairness. We’ve also piloted ethics trainings across four offices and organized an AI ethics speaker series hosted on three continents.

Tools and research

Over the last year, we’ve focused on sharing knowledge, building technical tools and product updates, and cultivating a framework for developing responsible and ethical AI that benefits everyone. This includes releasing more than 75 research papers on topics in responsible AI, including machine learning fairness, explainability, privacy, and security, and developing and open-sourcing 12 new tools. For example:

  • The What-If Tool is a new feature that lets users analyze an ML model without writing code. It enables users to visualize biases and the effects of various fairness constraints as well as compare performance across multiple models.
  • Google Translate reduces gender bias by providing feminine and masculine translations for some gender-neutral words on the Google Translate website.
  • We expanded our work in federated learning, a new approach to machine learning that allows developers to train AI models and make products smarter without your data ever leaving your device. It’s also now open-sourced as TensorFlow Federated.
  • Our People + AI Guidebook is a toolkit of methods and decision-making frameworks for how to build human-centered AI products. It launched in May and includes contributions from 40 Google product teams. 
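The federated learning idea in the list above can be illustrated with a toy federated-averaging round (this is not the TensorFlow Federated API, just a sketch of the concept): each client improves the model on its own private data, and only the resulting weights, never the data, go back to the server, which averages them.

```python
import numpy as np

rng = np.random.default_rng(0)

def local_step(weights, X, y, lr=0.1):
    """One gradient step of linear regression on a client's private data."""
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

global_w = np.zeros(3)
# Five clients, each holding its own small private dataset.
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(5)]

for _ in range(10):  # federated rounds
    # Each client trains locally; only the updated weights leave the device.
    updates = [local_step(global_w, X, y) for X, y in clients]
    global_w = np.mean(updates, axis=0)  # server averages client weights

print(global_w.shape)
```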

We continue to update the Responsible AI Practices quarterly, as we reflect on the latest technical ideas and work at Google.

Review process

Our review process helps us meet our AI Principles. We encourage all Google employees to consider how the AI Principles affect their projects, and we’re evolving our processes to ensure we’re thoughtfully considering and assessing new projects, products, and deals. In each case we consider benefits and assess how we can mitigate risks. Here are two examples:

Cloud AI Hub

With Cloud AI Hub, enterprises and other organizations can share and more readily access a variety of already-trained machine learning models. Much of AI Hub’s content would be published by organizations outside of Google, which would make it difficult for us to evaluate all of the content against the AI Principles. As a result, we evaluated the ethical considerations around releasing the AI Hub, such as the potential for harmful dual use, abuse, or the presentation of misleading information. 

In the course of the review, the team developed a two-tiered strategy for handling potentially risky and harmful content: 

  1. Encouraging community members to weigh in on issues like unfair bias. To support the community, Cloud AI provides resources (like the inclusive ML guide) to help users identify trustworthy content.
  2. Crafting a Terms of Service for Cloud AI Hub, specifically the sections on content and conduct restrictions.

These safeguards made it more likely that the AI Hub’s content ecosystem would be useful and well-maintained, so we went ahead with launching the AI Hub.

Text-to-speech (TTS) research paper

A research group within Google wrote an academic paper that addresses a major challenge in AI research: systems often need to be retrained from scratch, with huge amounts of data, to take on even slightly different tasks. This paper detailed an efficient text-to-speech (TTS) network, which allows a system to be trained once and then adapted to new speakers with much less time and data.

While smarter text-to-speech networks could help individuals with voice disabilities, ALS, or tracheotomies, we recognize the potential for such technologies to be used for harmful applications, like synthesizing an individual’s voice for deceptive purposes.

Ultimately we determined that the technology described in the paper had limited potential for misuse for several reasons, including the quality of data required to make it work. Arbitrary recordings from the internet would not satisfy these requirements. In addition, there are enough differences between samples generated by the network and speakers’ voices for listeners to identify what’s real and what’s not. As a result, we concluded that this paper aligned with our AI Principles, but this exercise reinforced our commitment to identifying and preempting the potential for misuse.

Engaging with external stakeholders

Ongoing dialogue with the broader community is essential to developing socially responsible AI. We’ve engaged with policymakers and the tech community, participated in more than 100 workshops, research conferences and summits, and directly engaged with more than 4,000 stakeholders across the world.

As advances in AI continue, we’ll continue to share our perspectives and engage with academia, industry, and policymakers to promote the responsible development of AI. We support smart regulation tailored to specific sectors and use cases, and earlier this year we published this white paper to help promote pragmatic and forward-looking approaches to AI governance. It outlines five areas where government should work with civil society and AI practitioners to cultivate a framework for AI.

We recognize there’s always more to do and will continue working with leaders, policymakers, academics, and other stakeholders from across industries to tackle these important issues. Having these conversations, doing the proper legwork, and ensuring the inclusion of the widest array of perspectives, is critical to ensuring that AI joins the long list of technologies transforming life for the better. 

Whale songs and AI, for everyone to explore

Back in the 1960s, scientists first discovered that humpback whales actually sing songs, which evolve over time. But there’s still so much we don’t understand. Why do humpbacks sing? What is the meaning of the patterns within their songs?

Scientists sift through an ocean of sound to find answers to these questions. But what if anyone could help make discoveries?

For the past year, Google AI has been partnering with NOAA’s Pacific Island Fisheries Science Center to train an artificial intelligence model on their vast collection of underwater recordings. This project is helping scientists better understand whales’ behavioral and migratory patterns, so scientists can better protect whales. The effort fits into Google’s AI for Social Good program, applying the latest in machine learning to the world’s biggest humanitarian and environmental challenges.


NOAA research oceanographer Ann Allen (left) works onboard a research vessel and Google software engineer Matt Harvey (right) field tests the algorithm.

Now, everyone can play a role in this project using a website called Pattern Radio: Whale Songs. It’s a new tool that visualizes audio at a vast scale and uses AI to make it easy to explore. The site hosts more than 8,000 hours of NOAA’s recordings, which means scientists aren’t the only ones who can explore this data and make discoveries. Everyone can.


Zooming in on the spectrogram shows you individual sounds. 

On the site, you can zoom all the way in to see individual sounds on a spectrogram (in addition to humpback songs, you can see the sounds of ships, fish and all kinds of mysterious and even unknown noises). You can also zoom all the way out to see months of sound at a time. An AI heat map helps you find whale calls, and visualizations help you see repetitions and patterns of the sounds within the songs.
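The spectrogram view the site is built around can be approximated in a few lines. This is a generic short-time Fourier transform sketch, not Pattern Radio's actual pipeline: the signal is sliced into overlapping windows and each window is Fourier-transformed, yielding a frequency-versus-time image.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram: slice the signal into overlapping Hann-windowed
    frames and take the FFT of each one."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

# A 2 kHz test tone sampled at 16 kHz: energy concentrates in one frequency row.
sr = 16_000
t = np.arange(sr) / sr
spec = spectrogram(np.sin(2 * np.pi * 2000 * t))
print(spec.shape)
```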


Highlights help visualize patterns and repetitions of individual sounds within the songs.

The idea is to get everyone listening—and maybe even make a totally new discovery. If you find something you think others should hear, you can share a link that goes directly to that sound. And if you need a bit more context around what you’re hearing, guided tours from whale song experts—like NOAA research oceanographer Ann Allen, bioacoustic scientist Christopher Clark, Cornell music professor Annie Lewandowski and more—point out especially interesting parts of the data.

You can start exploring at g.co/patternradio. And to dive even deeper, learn more about the project at our about page and check out Ann Allen’s article on how this whole project got started on her NOAA Fisheries blog. Jump on in!

EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling

Convolutional neural networks (CNNs) are commonly developed at a fixed resource cost, and then scaled up in order to achieve better accuracy when more resources are made available. For example, ResNet can be scaled up from ResNet-18 to ResNet-200 by increasing the number of layers, and recently, GPipe achieved 84.3% ImageNet top-1 accuracy by scaling up a baseline CNN by a factor of four. The conventional practice for model scaling is to arbitrarily increase the CNN depth or width, or to use larger input image resolution for training and evaluation. While these methods do improve accuracy, they usually require tedious manual tuning, and still often yield suboptimal performance. What if, instead, we could find a more principled method to scale up a CNN to obtain better accuracy and efficiency?

In our ICML 2019 paper, “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”, we propose a novel model scaling method that uses a simple yet highly effective compound coefficient to scale up CNNs in a more structured manner. Unlike conventional approaches that arbitrarily scale network dimensions, such as width, depth and resolution, our method uniformly scales each dimension with a fixed set of scaling coefficients. Powered by this novel scaling method and recent progress on AutoML, we have developed a family of models, called EfficientNets, which surpass state-of-the-art accuracy with up to 10x better efficiency (smaller and faster).

Compound Model Scaling: A Better Way to Scale Up CNNs
In order to understand the effect of scaling the network, we systematically studied the impact of scaling different dimensions of the model. While scaling individual dimensions improves model performance, we observed that balancing all dimensions of the network—width, depth, and image resolution—against the available resources would best improve overall performance.

The first step in the compound scaling method is to perform a grid search to find the relationship between different scaling dimensions of the baseline network under a fixed resource constraint (e.g., 2x more FLOPS). This determines the appropriate scaling coefficient for each of the dimensions mentioned above. We then apply those coefficients to scale up the baseline network to the desired target model size or computational budget.
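In miniature, compound scaling looks like this: depth, width, and resolution all grow as powers of a single compound coefficient φ, using the per-dimension bases the grid search produced (the values below are the ones reported in the paper).

```python
# Per-dimension scaling bases from the grid search (values from the paper):
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Scale all three network dimensions together from one coefficient."""
    depth = ALPHA ** phi        # multiplier on the number of layers
    width = BETA ** phi         # multiplier on the number of channels
    resolution = GAMMA ** phi   # multiplier on the input image size
    return depth, width, resolution

# FLOPS grow roughly as depth * width^2 * resolution^2, so each unit of phi
# costs about alpha * beta^2 * gamma^2 ≈ 2x more compute.
print(compound_scale(2))
```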

Comparison of different scaling methods. Unlike conventional scaling methods (b)-(d) that arbitrarily scale a single dimension of the network, our compound scaling method uniformly scales up all dimensions in a principled way.
This compound scaling method consistently improves model accuracy and efficiency for scaling up existing models such as MobileNet (+1.4% ImageNet accuracy) and ResNet (+0.7%), compared to conventional scaling methods.

EfficientNet Architecture
The effectiveness of model scaling also relies heavily on the baseline network. So, to further improve performance, we have also developed a new baseline network by performing a neural architecture search using the AutoML MNAS framework, which optimizes both accuracy and efficiency (FLOPS). The resulting architecture uses mobile inverted bottleneck convolution (MBConv), similar to MobileNetV2 and MnasNet, but is slightly larger due to an increased FLOP budget. We then scale up the baseline network to obtain a family of models, called EfficientNets.
The architecture for our baseline network EfficientNet-B0 is simple and clean, making it easier to scale and generalize.
EfficientNet Performance
We have compared our EfficientNets with other existing CNNs on ImageNet. In general, the EfficientNet models achieve both higher accuracy and better efficiency than existing CNNs, reducing parameter size and FLOPS by an order of magnitude. For example, in the high-accuracy regime, our EfficientNet-B7 reaches state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on CPU inference than the previous GPipe. Compared with the widely used ResNet-50, our EfficientNet-B4 uses similar FLOPS, while improving the top-1 accuracy from 76.3% of ResNet-50 to 82.6% (+6.3%).
Model Size vs. Accuracy Comparison. EfficientNet-B0 is the baseline network developed by AutoML MNAS, while EfficientNet-B1 to B7 are obtained by scaling up the baseline network. In particular, our EfficientNet-B7 achieves new state-of-the-art 84.4% top-1 / 97.1% top-5 accuracy, while being 8.4x smaller than the best existing CNN.
Though EfficientNets perform well on ImageNet, to be most useful, they should also transfer to other datasets. To evaluate this, we tested EfficientNets on eight widely used transfer learning datasets. EfficientNets achieved state-of-the-art accuracy in 5 out of the 8 datasets, such as CIFAR-100 (91.7%) and Flowers (98.8%), with an order of magnitude fewer parameters (up to 21x parameter reduction), suggesting that our EfficientNets also transfer well.

By providing significant improvements to model efficiency, we expect EfficientNets could potentially serve as a new foundation for future computer vision tasks. Therefore, we have open-sourced all EfficientNet models, which we hope can benefit the larger machine learning community. You can find the EfficientNet source code and TPU training scripts here.

Special thanks to Hongkun Yu, Ruoming Pang, Vijay Vasudevan, Alok Aggarwal, Barret Zoph, Xianzhi Du, Xiaodan Song, Samy Bengio, Jeff Dean, and the Google Brain team.

Source: Google AI Blog

How AI could tackle a problem shared by a billion people

Earlier this month, Google AI Impact Challenge grantees from around the world gathered in San Francisco to start applying artificial intelligence to address some of the world’s toughest problems, from protecting rainforests to improving emergency response times.

In addition to receiving part of the $25 million pool of funding from Google.org, each organization is participating in a six-month program called the Launchpad Accelerator. The accelerator kicked off with a week-long boot camp which included mentorship, workshops, presentations from AI and product experts, and an opportunity to connect with other grantees. Grantees also received support and guidance from DataKind, a global non-profit dedicated to harnessing the power of data science and AI in the service of humanity. Throughout the accelerator, grantees will receive ongoing support and coaching from their Google mentors as they complete the first phase of their projects.

We sat down with Rajesh Jain from grantee Wadhwani AI, an Indian organization with a project dedicated to using AI to help farmers with pest control, to learn more about the problem he and his team are setting out to solve, and how support from Google will help them get there.


Rajesh Jain taking a photo of a pest trap

Why is pest control such a big issue?

More than a billion people around the world live in smallholder farmer households. These are farms that support a single family with a mixture of cash crops and subsistence farming. Many of these farmers struggle with pest damage that can wipe out a devastating amount of annual crop yield, despite heavy usage of pesticides. Currently, many farmers track and manage pests by manually identifying and counting them. In India, some send photos of pests to universities for analysis, but the advice often arrives after it’s too late to prevent irreversible damage to their crops. Last season, nearly 1,000 cotton farmers in India committed suicide after a pink bollworm attack destroyed 40% of the cotton yield. At Wadhwani AI, we’re creating technology that will help reduce crop losses.

What will you use the funding and support from Google for?

Before applying for this grant, we had already developed algorithms that detect two major pests, and had successfully tested them in parts of India. We plan to use the mentorship and funding from Google.org to develop a globally scalable pest management solution. This will allow farmers and agriculture program workers to take photos of pest traps and use image classification models on their phones to identify and count the pests and receive timely intervention recommendations, including what pesticides to spray and when. The goal is to provide millions of farmers with timely, localized advice to reduce pesticide usage and improve crop yield.
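The downstream step, turning per-trap pest counts from the classifier into an advisory, can be sketched in a few lines. The threshold below is a hypothetical placeholder; real economic thresholds vary by pest, crop, and region:

```python
# Hypothetical advisory logic: real thresholds depend on pest, crop, and
# region, and would come from agronomists rather than this constant.
SPRAY_THRESHOLD = 8  # average pests per trap that triggers a spray advisory

def advise(trap_counts):
    """Given per-trap pest counts from the classifier, return an advisory."""
    avg = sum(trap_counts) / len(trap_counts)
    if avg >= SPRAY_THRESHOLD:
        return f"spray recommended (avg {avg:.1f} pests/trap)"
    return f"no action needed (avg {avg:.1f} pests/trap)"

print(advise([12, 9, 7]))
print(advise([2, 1, 3]))
```

The value of doing this on-device is latency: the advisory arrives while intervention can still prevent crop damage, rather than weeks later.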

Going into the Launchpad Accelerator kickoff, what were you hoping to get out of it?

We’ve been working on this for nine months now—and we believe we’ve discovered a solution to this problem. We’re a small team, so funding and collaborations are necessary for us to succeed. We really needed help to scale our whole infrastructure. At first our goal was to get the project working, but we want this to be helpful to people around the globe, not just the subset of people who tested our first prototype. Companies like Google operate at scale so we were excited to get advice on how to do this.

What did you learn? 

We were really impressed with the kind of mentors we met with, especially the AI coach we’ve been assigned to work with in the coming months. He helped us set very concrete goals and we’re excited to continue to have his support in accomplishing them. It was also helpful to learn practices that are commonplace at Google and can change how we do our work. One thing we learned and plan to implement right away when we get back is “Stand-up Mondays.” It’s important for us to be on the same page, and this is a way to get us focused and connected at the start of the week.

What do you think is going to be the most challenging part of your project?

We have the technology, but scaling it and making it accessible to farmers will be difficult. There are differences in literacy, culture, and climate, and we need to scale our solution to address all of these challenges. We’re really looking to lean on the mentorship from Google to help us design the app so it’s scalable.

What are you most optimistic about?

We have confidence in our technical capabilities - and we got a lot of confirmation from other AI experts that this is a good idea. We’re excited to get to work.


How artists use AI and AR: collaborations with Google Arts & Culture

For centuries, creative people have turned tools into art, or come up with inventions to change how we think about the world around us. Today you can explore the intersection of art and technology through two new experiments, created by artists in collaboration with the Google Arts & Culture Lab and recently announced at Google I/O 2019.

Created by artists Molmol Kuo & Zach Lieberman, Weird Cuts lets you make collages using augmented reality. You can select one of the cutouts shown in the camera screen to take a photo in a particular shape. The resulting cut-out can then be copy-pasted into the space around you, as seen through your camera’s eye. Download the app, available on iOS and Android, at g.co/weirdcuts.


Weird cuts in action 

Want to design your very own artwork with AI? Artist duo Pinar & Viola and Google engineer Alexander Mordvintsev—best known for his work on DeepDream—used machine learning to create a tool to do so. To use Infinite Patterns, upload an image and a DeepDream algorithm will transform and morph it into a unique pattern. For Pinar & Viola it is the perfect way to find new design inspirations for fashion by challenging one’s perception of shape, color and reality.


Infinite Patterns

These experiments were created in the Google Arts & Culture Lab, where we invite artists and coders to explore how technology can inspire artistic creativity. Collaborations at the Lab gave birth to Cardboard, the affordable VR headset, and Art Selfie, which has matched millions of selfies with works of art around the world.

To continue to encourage this emerging field of art with machine intelligence, we’re announcing the Artists + Machine Intelligence Grants for contemporary artists exploring creative applications of machine learning. This program will offer artists engineering mentorship, access to core Google research, and project funding.

Machine learning and artificial intelligence are great tools for artists, and there’s so much more to learn. If you’re curious about their origins and future, dive into the online exhibition “AI: More than Human” by the Barbican Centre, in which some of the world’s leading experts, artists and innovators explore the evolving relationship between humans and technology.

You can try our new experiments as well as the digital exhibition on the Google Arts & Culture app for iOS and Android.

Behind Magenta, the tech that rocked I/O

On the second day of I/O 2019, two bands took the stage—with a little help from machine learning. Both YACHT and The Flaming Lips worked with Google engineers who say that machine learning could change the way artists create music.

“Any time there has been a new technological development, it has made its way into music and art,” says Adam Roberts, a software engineer on the Magenta team. “The history of the piano, essentially, went from acoustic to electric to the synthesizer, and now there are ways to play it directly from your computer. That just happens naturally. If it’s a new technology, people figure out how to use it in music.”

Magenta, which started nearly three years ago, is an open-source research project powered by TensorFlow that explores the role of machine learning as a tool in the creative process. Machine learning is a process of teaching computers to recognize patterns, with a goal of letting them learn by example rather than constantly receiving input from a programmer. So with music, for example, you can input two types of melodies, then use machine learning to combine them in a novel way.
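That idea of learning patterns from example melodies can be illustrated with a deliberately tiny model. Magenta’s actual systems use deep networks, but a first-order Markov chain over MIDI pitches shows the same learn-by-example principle; everything below is an illustrative toy, not Magenta code:

```python
import random

# A miniature pattern-learner: a first-order Markov chain over MIDI pitches.
# Real Magenta models are deep networks, but the idea of learning note
# transitions from example melodies and sampling new ones is the same.
def train(melodies):
    """Record, for each pitch, which pitches followed it in the examples."""
    table = {}
    for mel in melodies:
        for a, b in zip(mel, mel[1:]):
            table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, seed=0):
    """Sample a new melody by following learned transitions from `start`."""
    rng = random.Random(seed)
    out = [start]
    while len(out) < length:
        out.append(rng.choice(table.get(out[-1], [start])))
    return out

major = [60, 62, 64, 65, 67, 69, 71, 72]  # C major scale, MIDI pitches
minor = [57, 59, 60, 62, 64, 65, 67, 69]  # A natural minor scale
table = train([major, minor])
print(generate(table, 60, 8))
```

Feeding in two types of melodies makes the transition table a blend of both, so sampled output mixes their patterns in a novel way.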


Jesse Engel, Claire Evans, Wayne Coyne and Adam Roberts speak at I/O.  

But the Magenta team isn’t just teaching computers to make music—instead, they’re working hand-in-hand with musicians to help take their art in new directions. YACHT was one of Magenta’s earliest collaborators; the trio came to Google to learn more about how to use artificial intelligence and machine learning in their upcoming album.

The band first took all 82 songs from their back catalog and isolated each part, from bass lines to vocal melodies to drum rhythms; they then broke those isolated parts up into four-bar loops. Then, they put those loops into the machine learning model, which put out new melodies based on their old work. They followed a similar process with lyrics, using their old songs plus other material they considered inspiring. The final task was to pick lyrics and melodies that made sense, and pair them together to make a song.
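The preprocessing step, chopping an isolated part into fixed-length loops, is straightforward to sketch. Assuming a simple representation of a part as a list of notes with a fixed number of notes per bar (a simplification; real parts have variable rhythm):

```python
# Chop an isolated part into non-overlapping four-bar training loops.
# Assumes a fixed number of notes per bar for simplicity; real MIDI parts
# would be sliced by time rather than note count.
def four_bar_loops(notes, notes_per_bar=4, bars=4):
    """Split a note list into complete loops of `bars` bars; drop any remainder."""
    loop_len = notes_per_bar * bars
    return [notes[i:i + loop_len]
            for i in range(0, len(notes) - loop_len + 1, loop_len)]

part = list(range(40))  # stand-in for 40 notes of a bass line
print(len(four_bar_loops(part)))  # complete 16-note loops extracted
```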

Music and Machine Learning (Google I/O'19)


“They used these tools to push themselves out of their comfort zone,” says Jesse Engel, a research scientist on the Magenta team. “They imposed some rules on themselves that they had to use the outputs of the model to some extent, and it helped them make new types of music.”

Claire Evans, the singer of YACHT, explained the process during a presentation at I/O. “Using machine learning to make a song with structure, with a beginning, middle and end, is a little bit still out of our reach,” she explained. “But that’s a good thing. The melody was the model’s job, but the arrangement and performance was entirely our job.”

The Flaming Lips’ use of Magenta is a lot more recent; the band started working with the Magenta team to prepare for their performance at I/O. The Magenta team showcased all their projects to the band, who were drawn to one in particular: Piano Genie, which was dreamed up by a graduate student, Chris Donahue, who was a summer intern at Google. They decided to use Piano Genie as the basis for a new song to be debuted on the I/O stage.

Google AI collaboration with The Flaming Lips bears fruit at I/O 2019

Piano Genie distills the 88 notes on a piano down to eight buttons, which you can push to your heart’s content to make piano music. In what Jesse calls “an initial moment of inspiration,” someone put a piece of wire inside a piece of fruit, turning the fruit into buttons for Piano Genie. “Fruit can be used as a capacitive sensor, like the screen on your phone, so you can detect whether or not someone is touching the fruit,” Jesse explains. “They were playing these fruits just by touching these different fruits, and they got excited by how that changed the interaction.”
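Piano Genie learns its eight-button-to-88-key mapping with a neural network so that presses come out musically plausible. The contrast is easiest to see against a naive baseline that just splits the keyboard into eight fixed zones; the function below is that naive stand-in, not Piano Genie’s model:

```python
# Naive stand-in for Piano Genie's learned mapping: split the 88 keys into
# eight contiguous 11-key zones. Piano Genie instead uses a neural network
# to pick a musically sensible key for each button press in context.
def naive_button_to_key(button, position=0.5):
    """Map a button (0-7) and a position within its zone to a key index (0-87)."""
    keys_per_button = 88 // 8  # 11 keys per zone
    return button * keys_per_button + round(position * (keys_per_button - 1))

print(naive_button_to_key(0, 0.0))   # lowest key
print(naive_button_to_key(7, 1.0))   # highest key
```

With fixed zones, mashing buttons produces arbitrary pitches; the learned mapping is what makes a wired-up bowl of fruit sound like music.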

Wayne Coyne, the singer of The Flaming Lips, noted during an I/O panel that a quick turnaround time, plus close collaboration with Google, gave them the inspiration to think outside the box. “For me, the idea that we’re not playing it on a keyboard, we’re not playing it on a guitar, we’re playing it on fruit, takes it into this other realm,” he said.

During their performance that night, Steven Drozd from The Flaming Lips, who usually plays a variety of instruments, played a “magical bowl of fruit” for the first time. He tapped each fruit in the bowl, which then played different musical tones, “singing” the fruit’s own name. With help from Magenta, the band broke into a brand-new song, “Strawberry Orange.”


The Flaming Lips’ Steven Drozd plays a bowl of fruit.

The Flaming Lips also got help from the audience: At one point, they tossed giant, blow-up “fruits” into the crowd, and each fruit was also set up as a sensor, so any audience member who got their hands on one played music, too. The end result was a cacophonous, joyous moment when a crowd truly contributed to the band’s sound.


Audience members “play” an inflatable banana.

You can learn more about the "Fruit Genie" and how to build your own at g.co/magenta/fruitgenie.

Though the Magenta team collaborated on a much deeper level with YACHT, they also found the partnership with The Flaming Lips to be an exciting look toward the future. “The Flaming Lips is a proof of principle of how far we’ve come with the technologies,” Jesse says. “Through working with them we understood how to make our technologies more accessible to a broader base of musicians. We were able to show them all these things and they could just dive in and play with it.”

A promising step forward for predicting lung cancer

Over the past three years, teams at Google have been applying AI to problems in healthcare—from diagnosing eye disease to predicting patient outcomes in medical records. Today we’re sharing new research showing how AI can predict lung cancer in ways that could boost the chances of survival for many people at risk around the world.

Lung cancer results in over 1.7 million deaths per year, making it the deadliest of all cancers worldwide—more than breast, prostate, and colorectal cancers combined—and it’s the sixth most common cause of death globally, according to the World Health Organization. While lung cancer has one of the worst survival rates among all cancers, interventions are much more successful when the cancer is caught early. Unfortunately, the statistics are sobering because the overwhelming majority of cancers are not caught until later stages.

Over the last three decades, doctors have explored ways to screen people at high risk for lung cancer. Though low-dose CT screening has been proven to reduce mortality, challenges remain that can lead to unclear diagnoses, unnecessary follow-up procedures, financial costs, and more.

Our latest research

In late 2017, we began exploring how we could address some of these challenges using AI. Using advances in 3D volumetric modeling alongside datasets from our partners (including Northwestern University), we’ve made progress in modeling lung cancer prediction as well as laying the groundwork for future clinical testing. Today we’re publishing our promising findings in “Nature Medicine.”

Radiologists typically look through hundreds of 2D images within a single CT scan, and cancer can be minuscule and hard to spot. We created a model that can not only generate the overall lung cancer malignancy prediction (viewed in 3D volume) but also identify subtle malignant tissue in the lungs (lung nodules). The model can also factor in information from previous scans, useful in predicting lung cancer risk because the growth rate of suspicious lung nodules can be indicative of malignancy.


A high-level view of the modeling framework. For each patient, the AI uses the current CT scan and, if available, a previous CT scan as input. The model outputs an overall malignancy prediction.
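The shape of that framework, an optional prior scan feeding a change signal into the prediction, can be sketched with a toy scorer. The real system is a learned 3D CNN; the featurizer and weights below are arbitrary illustrations, not the published model:

```python
import numpy as np

# Toy sketch of the two-input framework: a current CT volume and an
# optional prior volume go in, one malignancy probability comes out.
# The real model is a learned 3D CNN; these features and weights are
# arbitrary stand-ins for illustration only.
def malignancy_score(current_ct, prior_ct=None):
    """Return a probability in (0, 1) from one or two CT volumes."""
    feats = [current_ct.mean(), current_ct.std()]
    if prior_ct is not None:
        # Growth between scans is informative, so add a change feature.
        feats.append(current_ct.mean() - prior_ct.mean())
    else:
        feats.append(0.0)
    logit = np.dot(feats, [0.5, 0.3, 2.0])  # arbitrary toy weights
    return 1.0 / (1.0 + np.exp(-logit))

current = np.ones((8, 8, 8)) * 0.5  # stand-in for a normalized CT volume
prior = np.zeros((8, 8, 8))
print(malignancy_score(current), malignancy_score(current, prior))
```

Note how the score rises when a prior scan suggests growth; that is the intuition behind feeding both scans into one model.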

In our research, we leveraged 45,856 de-identified chest CT screening cases (some in which cancer was found) from NIH’s research dataset from the National Lung Screening Trial study and Northwestern University. We validated the results with a second dataset and also compared our results against six U.S. board-certified radiologists.

When using a single CT scan for diagnosis, our model performed on par or better than the six radiologists. We detected five percent more cancer cases while reducing false-positive exams by more than 11 percent compared to unassisted radiologists in our study. Our approach achieved an AUC of 94.4 percent (AUC is a common metric used in machine learning that provides an aggregate measure of classification performance).
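One way to build intuition for AUC: it equals the probability that a randomly chosen positive case is scored higher than a randomly chosen negative case, with ties counting half. A tiny pairwise implementation of that definition (for intuition, not how large-scale evaluation is actually computed):

```python
# AUC as a pairwise probability: how often does a random positive case
# outscore a random negative case (ties count as half a win)?
def auc(pos_scores, neg_scores):
    """Exhaustive pairwise AUC; fine for small lists, O(n*m) in general."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

print(auc([0.9, 0.8, 0.4], [0.3, 0.2, 0.5]))
```

A perfect classifier scores 1.0, random guessing scores 0.5, so 94.4 percent sits well toward the perfect end of the scale.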


For an asymptomatic patient with no history of cancer, the AI system reviewed and detected potential lung cancer that had been previously called normal.

Next steps

Despite the value of lung cancer screenings, only 2-4 percent of eligible patients in the U.S. are screened today. This work demonstrates the potential for AI to increase both accuracy and consistency, which could help accelerate adoption of lung cancer screening worldwide.

These initial results are encouraging, but further studies will assess the impact and utility in clinical practice. We’re collaborating with the Google Cloud Healthcare and Life Sciences team to serve this model through the Cloud Healthcare API and are in early conversations with partners around the world to continue additional clinical validation research and deployment. If you’re a research institution or hospital system that is interested in collaborating in future research, please fill out this form.