Tag Archives: machine learning

AIY Projects: A first step into STEM

Artificial Intelligence allows computers to perform increasingly complex tasks like understanding speech or identifying what’s in an image. AI built into hardware lets you build devices that feel more personal, can be trained to solve individual problems, and do things people haven’t thought of yet. Building AI-based things used to require expensive hardware and an advanced computer science degree, but with AIY Projects we’ve created two simple kits that allow students and makers to start building, playing and learning about AI.

Each kit takes you through step-by-step instructions to build a cardboard shell and then install the electronics to assemble your own device. The Voice Kit lets you build a voice-controlled speaker, while the Vision Kit lets you build a camera that learns to recognize people and objects. Along the way you learn the basics of building simple electronic circuits, some light programming, and setting up a Raspberry Pi (a small circuit board computer). But building the kit is just a starting point, and once it’s built you can start to customize its functionality and dive even deeper into programming and hardware.

Since launching these kits last year, we’ve seen interest from parents and teachers who have found the products to be great learning tools in and out of the classroom. While the changing nature of work means that our students may have jobs that haven't yet been imagined, we do know that computer science skills—like analytical thinking and creative problem solving— will be crucial in the future. AIY Projects kits aim to help prepare students, lowering the barriers to entry for learning computer science.

We’ve created a new version of our original kits that make classroom use easier with the AIY Vision Kit 2 and the AIY Voice Kit 2. Each one now includes everything you need to get going right in the box. We’ve also released a new Android App that greatly simplifies configuration of the device.

To help students learn more about AI we’re introducing a new AIY Models area to our site that showcases a collection of pre-built AI models designed to work with AIY kits. Now students can load up new models to explore different facets of AI, like a new smile detector model that lets you instantly tell whether someone looking into a Vision Kit is smiling. Over time we’ll be adding new models that explore new functionality and content about each model.

Finally, on the refreshed the AIY website we’ve improved documentation with better photos and instructions, to make it easier for young makers to get started and learn as they build.  

These are our first steps in starting to address the needs of the STEM market and improving our products for parents, students and teachers. However, it’s also the start of a conversation with the STEM community to learn more about their needs as we build, iterate, and make content for our new and existing products. Send us your feedback, thoughts, and ideas on how we can make these kits a meaningful part of STEM education at [email protected] or stop by Maker Faire in May and ISTE in June.

The new Vision Kit and Voice Kit have arrived at U.S. Target Stores and Target.com this month and we’re working to make them globally available through retailers worldwide. Be sure to sign up on our mailing list to be notified when our products become available, or check out what we’re doing on social media by searching for #aiyprojects.

AIY Projects: Updated kits for 2018

Posted by Billy Rutledge, Director of AIY Projects

Last year, AIY Projects launched to give makers the power to build AI into their projects with two do-it-yourself kits. We're seeing continued demand for the kits, especially from the STEM audience where parents and teachers alike have found the products to be great tools for the classroom. The changing nature of work in the future means students may have jobs that haven't yet been imagined, and we know that computer science skills, like analytical thinking and creative problem solving, will be crucial.

We're taking the first of many steps to help educators integrate AIY into STEM lesson plans and help prepare students for the challenges of the future by launching a new version of our AIY kits. The Voice Kit lets you build a voice controlled speaker, while the Vision Kit lets you build a camera that learns to recognize people and objects (check it out here). The new kits make getting started a little easier with clearer instructions, a new app and all the parts in one box.

To make setup easier, both kits have been redesigned to work with the new Raspberry Pi Zero WH, which comes included in the box, along with the USB connector cable and pre-provisioned SD card. Now users no longer need to download the software image and can get running faster. The updated AIY Vision Kit v1.1 also includes the Raspberry Pi Camera v2.

AIY Voice Kit v2 includes Raspberry Pi Zero WH and pre-provisioned SD card

AIY Voice Kit v1.1 includes Raspberry Pi Zero WH, Raspberry Pi Cam 2 and pre-provisioned SD card

We're also introducing the AIY companion app for Android, available here in Google Play, to make wireless setup and configuration a snap. The kits still work with monitor, keyboard and mouse as an alternate path and we're working on iOS and Chrome companions which will be coming soon.

The AIY website has been refreshed with improved documentation, now easier for young makers to get started and learn as they build. It also includes a new AIY Models area, showcasing a collection of neural networks designed to work with AIY kits. While we've solved one barrier to entry for the STEM audience, we recognize that there are many other things that we can do to make our kits even more useful. We'll once again be at #MakerFaire events to gather feedback from our users and in June we'll be working with teachers from all over the world at the ISTE conference in Chicago.

The new AIY Voice Kit and Vision Kit have arrived at Target Stores and Target.com (US) this month and we're working to make them globally available through retailers worldwide. Sign up on our mailing list to be notified when our products become available.

We hope you'll pick up one of the new AIY kits and learn more about how to build your own smart devices. Be sure to share your recipes on Hackster.io and social media using #aiyprojects.

Text Embedding Models Contain Bias. Here’s Why That Matters.

Posted by Ben Packer, Yoni Halpern, Mario Guajardo-Céspedes & Margaret Mitchell (Google AI)

As Machine Learning practitioners, when faced with a task, we usually select or train a model primarily based on how well it performs on that task. For example, say we're building a system to classify whether a movie review is positive or negative. We take 5 different models and see how well each performs this task:

Figure 1: Model performances on a task. Which model would you choose?

Normally, we'd simply choose Model C. But what if we found that while Model C performs the best overall, it's also most likely to assign a more positive sentiment to the sentence "The main character is a man" than to the sentence "The main character is a woman"? Would we reconsider?

Bias in Machine Learning Models

Neural network models can be quite powerful, effectively helping to identify patterns and uncover structure in a variety of different tasks, from language translation to pathology to playing games. At the same time, neural models (as well as other kinds of machine learning models) can contain problematic biases in many forms. For example, classifiers trained to detect rude, disrespectful, or unreasonable comments may be more likely to flag the sentence "I am gay" than "I am straight" [1]; face classification models may not perform as well for women of color [2]; speech transcription may have higher error rates for African Americans than White Americans [3].

Many pre-trained machine learning models are widely available for developers to use -- for example, TensorFlow Hub recently launched its platform publicly. It's important that when developers use these models in their applications, they're aware of what biases they contain and how they might manifest in those applications.

Human data encodes human biases by default. Being aware of this is a good start, and the conversation around how to handle it is ongoing. At Google, we are actively researching unintended bias analysis and mitigation strategies because we are committed to making products that work well for everyone. In this post, we'll examine a few text embedding models, suggest some tools for evaluating certain forms of bias, and discuss how these issues matter when building applications.

WEAT scores, a general-purpose measurement tool

Text embedding models convert any input text into an output vector of numbers, and in the process map semantically similar words near each other in the embedding space:

Figure 2: Text embeddings convert any text into a vector of numbers (left). Semantically similar pieces of text are mapped nearby each other in the embedding space (right).

Given a trained text embedding model, we can directly measure the associations the model has between words or phrases. Many of these associations are expected and are helpful for natural language tasks. However, some associations may be problematic or hurtful. For example, the ground-breaking paper by Bolukbasi et al. [4] found that the vector-relationship between "man" and "woman" was similar to the relationship between "physician" and "registered nurse" or "shopkeeper" and "housewife" in the popular publicly-available word2vec embedding trained on Google News text.

The Word Embedding Association Test (WEAT) was recently proposed by Caliskan et al. [5] as a way to examine the associations in word embeddings between concepts captured in the Implicit Association Test (IAT). We use the WEAT here as one way to explore some kinds of problematic associations.

The WEAT test measures the degree to which a model associates sets of target words (e.g., African American names, European American names, flowers, insects) with sets of attribute words (e.g., "stable", "pleasant" or "unpleasant"). The association between two given words is defined as the cosine similarity between the embedding vectors for the words.

For example, the target lists for the first WEAT test are types of flowers and insects, and the attributes are pleasant words (e.g., "love", "peace") and unpleasant words (e.g., "hatred," "ugly"). The overall test score is the degree to which flowers are more associated with the pleasant words, relative to insects. A high positive score (the score can range between 2.0 and -2.0) means that flowers are more associated with pleasant words, and a high negative score means that insects are more associated with pleasant words.

While the first two WEAT tests proposed in Caliskan et al. measure associations that are of little social concern (except perhaps to entomologists), the remaining tests measure more problematic biases.

We used the WEAT score to examine several word embedding models: word2vec and GloVe (previously reported in Caliskan et al.), and three newly-released models available on the TensorFlow Hub platform -- nnlm-en-dim50, nnlm-en-dim128, and universal-sentence-encoder. The scores are reported in Table 1.

Table 1: Word Embedding Association Test (WEAT) scores for different embedding models. Cell color indicates whether the direction of the measured bias is in line with (blue) or against (yellow) the common human biases recorded by the Implicit Association Tests. *Statistically significant (p < 0.01) using Caliskan et al. (2015) permutation test. Rows 3-5 are variations whose word lists come from [6], [7], and [8]. See Caliskan et al. for all word lists. * For GloVe, we follow Caliskan et al. and drop uncommon words from the word lists. All other analyses use the full word lists.

These associations are learned from the data that was used to train these models. All of the models have learned the associations for flowers, insects, instruments, and weapons that we might expect and that may be useful in text understanding. The associations learned for the other targets vary, with some -- but not all -- models reinforcing common human biases.

For developers who use these models, it's important to be aware that these associations exist, and that these tests only evaluate a small subset of possible problematic biases. Strategies to reduce unwanted biases are a new and active area of research, and there exists no "silver bullet" that will work best for all applications.

When focusing in on associations in an embedding model, the clearest way to determine how they will affect downstream applications is by examining those applications directly. We turn now to a brief analysis of two sample applications: A Sentiment Analyzer and a Messaging App.

Case study 1: Tia's Movie Sentiment Analyzer

WEAT scores measure properties of word embeddings, but they don't tell us how those embeddings affect downstream tasks. Here we demonstrate the effect of how names are embedded in a few common embeddings on a movie review sentiment analysis task.

Tia is looking to train a sentiment classifier for movie reviews. She does not have very many samples of movie reviews, and so she leverages pretrained embeddings which map the text into a representation which can make the classification task easier.

Let's simulate Tia's scenario using an IMDB movie review dataset [9], subsampled to 1,000 positive and 1,000 negative reviews. We'll use a pre-trained word embedding to map the text of the IMDB reviews to low-dimensional vectors and use these vectors as features in a linear classifier. We'll consider a few different word embedding models and training a linear sentiment classifier with each.

We'll evaluate the quality of the sentiment classifier using the area under the ROC curve (AUC) metric on a held-out test set.

Here are AUC scores for movie sentiment classification using each of the embeddings to extract features:

Figure 3: Performance scores on the sentiment analysis task, measured in AUC, for each of the different embeddings.

At first, Tia's decision seems easy. She should use the embedding that result in the classifier with the highest score, right?

However, let's think about some other aspects that could affect this decision. The word embeddings were trained on large datasets that Tia may not have access to. She would like to assess whether biases inherent in those datasets may affect the behavior of her classifier.

Looking at the WEAT scores for various embeddings, Tia notices that some embeddings consider certain names more "pleasant" than others. That doesn't sound like a good property of a movie sentiment analyzer. It doesn't seem right to Tia that names should affect the predicted sentiment of a movie review. She decides to check whether this "pleasantness bias" affects her classification task.

She starts by constructing some test examples to determine whether a noticeable bias can be detected.

In this case, she takes the 100 shortest reviews from her test set and appends the words "reviewed by _______", where the blank is filled in with a name. Using the lists of "African American" and "European American" names from Caliskan et al. and common male and female names from the United States Social Security Administration, she looks at the difference in average sentiment scores.

Figure 4: Difference in average sentiment scores on the modified test sets where "reviewed by ______" had been added to the end of each review. The violin plots show the distribution over differences when models are trained on small samples of the original IMDB training data.

The violin-plots above show the distribution in differences of average sentiment scores that Tia might see, simulated by taking subsamples of 1,000 positive and 1,000 negative reviews from the original IMDB training set. We show results for five word embeddings, as well as a model (No embedding) that doesn't use a word embedding.

Checking the difference in sentiment with no embedding is a good check that confirms that the sentiment associated with the names is not coming from the small IMDB supervised dataset, but rather is introduced by the pretrained embeddings. We can also see that different embeddings lead to different system outcomes, demonstrating that the choice of embedding is a key factor in the associations that Tia's sentiment classifier will make.

Tia needs to think very carefully about how this classifier will be used. Maybe her goal is just to select a few good movies for herself to watch next. In this case, it may not be a big deal. The movies that appear at the top of the list are likely to be very well-liked movies. But what if she hires and pays actors and actresses according to their average movie review ratings, as assessed by her model? That sounds much more problematic.

Tia may not be limited to the choices presented here. There are other approaches that she may consider, like mapping all names to a single word type, retraining the embeddings using data designed to mitigate sensitivity to names in her dataset, or using multiple embeddings and handling cases where the models disagree.

There is no one "right" answer here. Many of these decisions are highly context dependent and depend on Tia's intended use. There is a lot for Tia to think about as she chooses between feature extraction methods for training text classification models.

Case study 2: Tamera's Messaging App

Tamera is building a messaging app, and she wants to use text embedding models to give users suggested replies when they receive a message. She's already built a system to generate a set of candidate replies for a given message, and she wants to use a text embedding model to score these candidates. Specifically, she'll run the input message through the model to get the message embedding vector, do the same for each of the candidate responses, and then score each candidate with the cosine similarity between its embedding vector and the message embedding vector.

While there are many ways that a model's bias may play a role in these suggested replies, she decides to focus on one narrow aspect in particular: the association between occupations and binary gender. An example of bias in this context is if the incoming message is "Did the engineer finish the project?" and the model scores the response "Yes he did" higher than "Yes she did." These associations are learned from the data used to train the embeddings, and while they reflect the degree to which each gendered response is likely to be the actual response in the training data (and the degree to which there's a gender imbalance in these occupations in the real world), it can be a negative experience for users when the system simply assumes that the engineer is male.

To measure this form of bias, she creates a templated list of prompts and responses. The templates include questions such as, "Is/was your cousin a(n) ?" and "Is/was the here today?", with answer templates of "Yes, s/he is/was." For a given occupation and question (e.g., "Will the plumber be there today?"), the model's bias score is the difference between the model's score for the female-gendered response ("Yes, she will") and that of the male-gendered response ("Yes, he will"):

For a given occupation overall, the model's bias score is the sum of the bias scores for all question/answer templates with that occupation.

Tamera runs 200 occupations through this analysis using the Universal Sentence Encoder embedding model. Table 2 shows the occupations with the highest female-biased scores (left) and the highest male-biased scores (right):

Highest female biasHighest male bias

Table 2: Occupations with the highest female-biased scores (left) and the highest male-biased scores (right).

Tamera isn't bothered by the fact that "waitress" questions are more likely to induce a response that contains "she," but many of the other response biases give her pause. As with Tia, Tamera has several choices she can make. She could simply accept these biases as is and do nothing, though at least now she won't be caught off-guard if users complain. She could make changes in the user interface, for example by having it present two gendered responses instead of just one, though she might not want to do that if the input message has a gendered pronoun (e.g., "Will she be there today?"). She could try retraining the embedding model using a bias mitigation technique (e.g., as in Bolukbasi et al.) and examining how this affects downstream performance, or she might mitigate bias in the classifier directly when training her classifier (e.g., as in Dixon et al. [1], Beutel et al. [10], or Zhang et al. [11]). No matter what she decides to do, it's important that Tamera has done this type of analysis so that she's aware of what her product does and can make informed decisions.

Conclusions

To better understand the potential issues that an ML model might create, both model creators and practitioners who use these models should examine the undesirable biases that models may contain. We've shown some tools for uncovering particular forms of stereotype bias in these models, but this certainly doesn't constitute all forms of bias. Even the WEAT analyses discussed here are quite narrow in scope, and so should not be interpreted as capturing the full story on implicit associations in embedding models. For example, a model trained explicitly to eliminate negative associations for 50 names in one of the WEAT categories would likely not mitigate negative associations for other names or categories, and the resulting low WEAT score could give a false sense that negative associations as a whole have been well addressed. These evaluations are better used to inform us about the way existing models behave and to serve as one starting point in understanding how unwanted biases can affect the technology that we make and use. We're continuing to work on this problem because we believe it's important and we invite you to join this conversation as well.

Acknowledgments

We would like to thank Lucy Vasserman, Eric Breck, Erica Greene, and the TensorFlow Hub and Semantic Experiences teams for collaborating on this work.

References

[1] Dixon, L., Li, J., Sorensen, J., Thain, M. and Vasserman, L., 2018. Measuring and Mitigating Unintended Bias in Text Classification. AIES.

[2] Buolamwini, J. and Gebru, T., 2018. Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification. FAT/ML.

[3] Tatman, R. and Kasten, C. 2017. Effects of Talker Dialect, Gender & Race on Accuracy of Bing Speech and YouTube Automatic Captions. INTERSPEECH.

[4] Bolukbasi, T., Chang, K., Zou, J., Saligrama, V. and Kalai, A. 2016. Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings. NIPS.

[5] Caliskan, A., Bryson, J. J. and Narayanan, A. 2017. Semantics derived automatically from language corpora contain human-like biases. Science.

[6] Greenwald, A. G., McGhee, D. E., and Schwartz, J. L. 1998. Measuring individual differences in implicit cognition: the implicit association test. Journal of personality and social psychology.

[7] Bertrand, M. and Mullainathan, S. 2004. Are emily and greg more employable than lakisha and jamal? a field experiment on labor market discrimination. The American Economic Review.

[8] Nosek, B. A., Banaji, M., and Greenwald, A. G. 2002. Harvesting implicit group attitudes and beliefs from a demonstration web site. Group Dynamics: Theory, Research, and Practice.

[9] Andrew L. Maas, Raymond E. Daly, Peter T. Pham, Dan Huang, Andrew Y. Ng, and Christopher Potts. 2011. Learning Word Vectors for Sentiment Analysis. ACL.

[10] Beutel, A., Chen, J., Zhao, Z., & Chi, E. H. 2017 Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations. FAT/ML.

[11] Zhang, B., Lemoine, B., and Mitchell, M. 2018. Mitigating Unwanted Biases with Adversarial Learning. AIES.

How to make AI that’s good for people

For a field that was not well known outside of academia a decade ago, artificial intelligence has grown dizzyingly fast. Tech companies from Silicon Valley to Beijing are betting everything on it, venture capitalists are pouring billions into research and development, and start-ups are being created on what seems like a daily basis. If our era is the next Industrial Revolution, as many claim, AI is surely one of its driving forces.

It is an especially exciting time for a researcher like me. When I was a graduate student in computer science in the early 2000s, computers were barely able to detect sharp edges in photographs, let alone recognize something as loosely defined as a human face. But thanks to the growth of big data, advances in algorithms like neural networks and an abundance of powerful computer hardware, something momentous has occurred: AI has gone from an academic niche to the leading differentiator in a wide range of industries, including manufacturing, health care, transportation and retail.

I worry, however, that enthusiasm for AI is preventing us from reckoning with its looming effects on society. Despite its name, there is nothing “artificial” about this technology—it is made by humans, intended to behave like humans and affects humans. So if we want it to play a positive role in tomorrow’s world, it must be guided by human concerns.

I call this approach “human-centered AI.” It consists of three goals that can help responsibly guide the development of intelligent machines.

First, AI needs to reflect more of the depth that characterizes our own intelligence. Consider the richness of human visual perception. It’s complex and deeply contextual, and naturally balances our awareness of the obvious with a sensitivity to nuance. By comparison, machine perception remains strikingly narrow.

Sometimes this difference is trivial. For instance, in my lab, an image-captioning algorithm once fairly summarized a photo as “a man riding a horse” but failed to note the fact that both were bronze sculptures. Other times, the difference is more profound, as when the same algorithm described an image of zebras grazing on a savanna beneath a rainbow. While the summary was technically correct, it was entirely devoid of aesthetic awareness, failing to detect any of the vibrancy or depth a human would naturally appreciate.

That may seem like a subjective or inconsequential critique, but it points to a major aspect of human perception beyond the grasp of our algorithms. How can we expect machines to anticipate our needs—much less contribute to our well-being—without insight into these “fuzzier” dimensions of our experience?

Making AI more sensitive to the full scope of human thought is no simple task. The solutions are likely to require insights derived from fields beyond computer science, which means programmers will have to learn to collaborate more often with experts in other domains.

Such collaboration would represent a return to the roots of our field, not a departure from it. Younger AI enthusiasts may be surprised to learn that the principles of today’s deep-learning algorithms stretch back more than 60 years to the neuroscientific researchers David Hubel and Torsten Wiesel, who discovered how the hierarchy of neurons in a cat’s visual cortex responds to stimuli.

Likewise, ImageNet, a data set of millions of training photographs that helped to advance computer vision, is based on a project called WordNet, created in 1995 by the cognitive scientist and linguist George Miller. WordNet was intended to organize the semantic concepts of English.

Reconnecting AI with fields like cognitive science, psychology and even sociology will give us a far richer foundation on which to base the development of machine intelligence. And we can expect the resulting technology to collaborate and communicate more naturally, which will help us approach the second goal of human-centered AI: enhancing us, not replacing us.

Imagine the role that AI might play during surgery. The goal need not be to automate the process entirely. Instead, a combination of smart software and specialized hardware could help surgeons focus on their strengths—traits like dexterity and adaptability—while keeping tabs on more mundane tasks and protecting against human error, fatigue and distraction.

Or consider senior care. Robots may never be the ideal custodians of the elderly, but intelligent sensors are already showing promise in helping human caretakers focus more on their relationships with those they provide care for by automatically monitoring drug dosages and going through safety checklists.

These are examples of a trend toward automating those elements of jobs that are repetitive, error-prone and even dangerous. What’s left are the creative, intellectual and emotional roles for which humans are still best suited.

No amount of ingenuity, however, will fully eliminate the threat of job displacement. Addressing this concern is the third goal of human-centered AI: ensuring that the development of this technology is guided, at each step, by concern for its effect on humans.

Today’s anxieties over labor are just the start. Additional pitfalls include bias against underrepresented communities in machine learning, the tension between AI’s appetite for data and the privacy rights of individuals and the geopolitical implications of a global intelligence race.

Adequately facing these challenges will require commitments from many of our largest institutions. Universities are uniquely positioned to foster connections between computer science and traditionally unrelated departments like the social sciences and even humanities, through interdisciplinary projects, courses and seminars. Governments can make a greater effort to encourage computer science education, especially among young girls, racial minorities and other groups whose perspectives have been underrepresented in AI. And corporations should combine their aggressive investment in intelligent algorithms with ethical AI policies that temper ambition with responsibility.

No technology is more reflective of its creators than AI. It has been said that there are no “machine” values at all, in fact; machine values are human values. A human-centered approach to AI means these machines don’t have to be our competitors, but partners in securing our well-being. However autonomous our technology becomes, its impact on the world—for better or worse—will always be our responsibility.

This article was originally published in the New York Times.

Noodle on this: Machine learning that can identify ramen by shop

There are casual ramen fans and then there are ramen lovers. There are people who are all tonkatsu all the time, and others who swear by tsukemen. And then there’s machine learning, which—based on a recent case study out of Japan—might be the biggest ramen aficionado of them all.


Recently, data scientist Kenji Doi used machine learning models and AutoML Vision to classify bowls of ramen and identify the exact shop each bowl is made at, out of 41 ramen shops, with 95 percent accuracy. Sounds crazy (also delicious), especially when you see what these bowls look like:
Ramen bowls made at three different Ramen Jiro shops.
Ramen bowls made at three different Ramen Jiro shops

With 41 locations around Tokyo, Ramen Jiro is one of the most popular restaurant franchises in Japan, because of its generous portions of toppings, noodles and soup served at low prices. They serve the same basic menu at each shop, and as you can see above, it's almost impossible for a human (especially if you're new to Ramen Jiro) to tell what shop each bowl is made at.


But Kenji thought deep learning could discern the minute details that make one shop’s bowl of ramen different from the next. He had already built a machine learning model to classify ramen, but wanted to see if AutoML Vision could do it more efficiently.


AutoML Vision creates customized ML models automatically—to identify animals in the wild, or recognize types of products to improve an online store, or in this case classify ramen. You don’t have to be a data scientist to know how to use it—all you need to do is upload well-labeled images and then click a button. In Kenji’s case, he compiled a set of 48,000 photos of bowls of soup from Ramen Jiro locations, along with labels for each shop, and uploaded them to AutoML Vision. The model took about 24 hours to train, all automatically (although a less accurate, “basic” mode had a model ready in just 18 minutes). The results were impressive: Kenji’s model got 94.5 percentaccuracy on predicting the shop just from the photos.

Confusion matrix of Ramen Jiro shop classifier by AutoML Vision

Confusion matrix of Ramen Jiro shop classifier by AutoML Vision (Advanced mode). Row = actual shop, column = predicted shop. You can see AutoML Vision incorrectly identified the restaurant location in only a couple of instances for each test case.

AutoML Vision is designed for people without ML expertise, but it also speeds things up dramatically for experts. Building a model for ramen classification from scratch would be a time-consuming process requiring multiple steps—labeling, hyperparameter tuning, multiple attempts with different neural net architectures, and even failed training runs—and experience as a data scientist. As Kenji puts it, “With AutoML Vision, a data scientist wouldn’t need to spend a long time training and tuning a model to achieve the best results. This means businesses could scale their AI work even with a limited number of data scientists." We wrote about another recent example of AutoML Vision at work in this Big Data blog post, which also has more technical details on Kenji’s model.


As for how AutoML detects the differences in ramen, it’s certainly not from the taste. Kenji’s first hypothesis was that the model was looking at the color or shape of the bowl or table—but that seems unlikely, since the model was highly accurate even when each shop used the same bowl and table design. Kenji’s new theory is that the model is accurate enough to distinguish very subtle differences between cuts of the meat, or the way toppings are served. He plans on continuing to experiment with AutoML to see if his theories are true. Sounds like a project that might involve more than a few bowls of ramen. Slurp on.

Source: Google Cloud


Using Machine Learning to Discover Neural Network Optimizers



Deep learning models have been deployed in numerous Google products, such as Search, Translate and Photos. The choice of optimization method plays a major role when training deep learning models. For example, stochastic gradient descent works well in many situations, but more advanced optimizers can be faster, especially for training very deep networks. Coming up with new optimizers for neural networks, however, is challenging due to to the non-convex nature of the optimization problem. On the Google Brain team, we wanted to see if it could be possible to automate the discovery of new optimizers, in a way that is similar to how AutoML has been used to discover new competitive neural network architectures.

In “Neural Optimizer Search with Reinforcement Learning”, we present a method to discover optimization methods with a focus on deep learning architectures. Using this method we found two new optimizers, PowerSign and AddSign, that are competitive on a variety of different tasks and architectures, including ImageNet classification and Google’s neural machine translation system. To help others benefit from this work we have made the optimizers available in Tensorflow.

Neural Optimizer Search makes use of a recurrent neural network controller which is given access to a list of simple primitives that are typically relevant for optimization. These primitives include, for example, the gradient or the running average of the gradient and lead to search spaces with over 1010 possible combinations. The controller then generates the computation graph for a candidate optimizer or update rule in that search space.

In our paper, proposed candidate update rules (U) are used to train a child convolutional neural network on CIFAR10 for a few epochs and the final validation accuracy (R) is fed as a reward to the controller. The controller is trained with reinforcement learning to maximize the validation accuracies of the sampled update rules. This process is illustrated below.
An overview of Neural Optimizer Search using an iterative process to discover new optimizers.
Interestingly, the optimizers we have found are interpretable. For example, in the PowerSign optimizer we are releasing, each update compares the sign of the gradient and its running average, adjusting the step size according to whether those two values agree. The intuition behind this is that if these values agree, one is more confident in the direction of the update, and thus the step size can be larger. We also discovered a simple learning rate decay scheme, linear cosine decay, which we found can lead to faster convergence.
Graph comparing learning rate decay functions for linear cosine decay, stepwise decay and cosine decay.
Neural Optimizer Search found several optimizers that outperform commonly used optimizers on the small ConvNet model. Among the ones that transfer well to other tasks, we found that PowerSign and AddSign improve top-1 and top-5 accuracy of a state-of-the-art ImageNet mobile-sized model by up to 0.4%. They also work well on Google’s Neural Machine Translation system, giving an improvement of up to 0.7 using bilingual evaluation metrics (BLEU) on an English to German translation task.

We are excited that Neural Optimizer Search can not only improve the performance of machine learning models but also potentially lead to new, interpretable equations and discoveries. It is our hope that open sourcing these optimizers in Tensorflow will be useful to machine learning practitioners.

Source: Google AI Blog


Using Machine Learning to Discover Neural Network Optimizers



Deep learning models have been deployed in numerous Google products, such as Search, Translate and Photos. The choice of optimization method plays a major role when training deep learning models. For example, stochastic gradient descent works well in many situations, but more advanced optimizers can be faster, especially for training very deep networks. Coming up with new optimizers for neural networks, however, is challenging due to to the non-convex nature of the optimization problem. On the Google Brain team, we wanted to see if it could be possible to automate the discovery of new optimizers, in a way that is similar to how AutoML has been used to discover new competitive neural network architectures.

In “Neural Optimizer Search with Reinforcement Learning”, we present a method to discover optimization methods with a focus on deep learning architectures. Using this method we found two new optimizers, PowerSign and AddSign, that are competitive on a variety of different tasks and architectures, including ImageNet classification and Google’s neural machine translation system. To help others benefit from this work we have made the optimizers available in Tensorflow.

Neural Optimizer Search makes use of a recurrent neural network controller which is given access to a list of simple primitives that are typically relevant for optimization. These primitives include, for example, the gradient or the running average of the gradient and lead to search spaces with over 1010 possible combinations. The controller then generates the computation graph for a candidate optimizer or update rule in that search space.

In our paper, proposed candidate update rules (U) are used to train a child convolutional neural network on CIFAR10 for a few epochs and the final validation accuracy (R) is fed as a reward to the controller. The controller is trained with reinforcement learning to maximize the validation accuracies of the sampled update rules. This process is illustrated below.
An overview of Neural Optimizer Search using an iterative process to discover new optimizers.
Interestingly, the optimizers we have found are interpretable. For example, in the PowerSign optimizer we are releasing, each update compares the sign of the gradient and its running average, adjusting the step size according to whether those two values agree. The intuition behind this is that if these values agree, one is more confident in the direction of the update, and thus the step size can be larger. We also discovered a simple learning rate decay scheme, linear cosine decay, which we found can lead to faster convergence.
Graph comparing learning rate decay functions for linear cosine decay, stepwise decay and cosine decay.
Neural Optimizer Search found several optimizers that outperform commonly used optimizers on the small ConvNet model. Among the ones that transfer well to other tasks, we found that PowerSign and AddSign improve top-1 and top-5 accuracy of a state-of-the-art ImageNet mobile-sized model by up to 0.4%. They also work well on Google’s Neural Machine Translation system, giving an improvement of up to 0.7 using bilingual evaluation metrics (BLEU) on an English to German translation task.

We are excited that Neural Optimizer Search can not only improve the performance of machine learning models but also potentially lead to new, interpretable equations and discoveries. It is our hope that open sourcing these optimizers in Tensorflow will be useful to machine learning practitioners.

The fight against illegal deforestation with TensorFlow

Editor’s Note: Rainforest Connection is using technology to protect the rainforest. Founder and CEO Topher White shares how TensorFlow, Google’s open-source machine learning framework, aids in their efforts.

For me, growing up in the 80s and 90s, the phrase “Save the Rainforest” was a directive that barely progressed over the years. The appeal was clear, but the threat was abstract and distant. And the solution (if there was one) seemed difficult to grasp. Since then, other worries—even harder to grasp in their immediacy and scope—have come to dominate our conversations: climate change, as an example.

So many of us believe that technology has a crucial role to play in fighting climate change, but few are as aware that “Saving the Rainforest” and fighting climate change are nearly one and the same. By the numbers, destruction of forests accounts for nearly one-fifth of all greenhouse gas emissions every year. And in the tropical rainforest deforestation accelerated on the heels of rampant logging—up to 90 percent of which is done illegally and under the radar.

Stopping illegal logging and protecting the world’s rainforests may be the fastest, cheapest way for humanity to slow climate change. And who’s best suited to protect the rainforest? The locals and the indigenous tribes that have lived there for generations.

Rainforest Connection is a group of engineers and developers focused on building technology to help locals—like the Tembé tribe from central Amazon—protect their land, and in the process, protect the rest of us from the effects of climate change. Chief Naldo Tembé reached out to me a couple years ago seeking to collaborate on ways technology could help stop illegal loggers from destroying their land. Together, we embarked on an ambitious plan to address this issue using recycled cell phones and machine learning.

Our team has built the world’s first scalable, real-time detection and alert system for logging and environmental conservation in the rainforest. Building hardware that will survive in the rainforest is challenging, but we’re using what’s already there: the trees. We’ve hidden modified smartphones powered with solar panels—called “Guardian” devices—in trees in threatened areas, and continuously monitor the sounds of the forest, sending all audio up to our cloud-based servers over the standard, local cell-­phone network.

Once the audio is in the cloud, we use TensorFlow, Google’s machine learning framework, to analyze all the auditory data in real-time and listen for chainsaws, logging trucks and other sounds of illegal activity that can help us pinpoint problems in the forest. Audio pours in constantly from every phone, 24 hours a day, every day, and the stakes of missed detections are high.

That’s why we’ve come to use TensorFlow, due to its ability to analyze every layer of our data heavy detection process. The versatility of the machine learning framework empowers us to use a wide range of AI techniques with Deep Learning on one unified platform. This allows us to tweak our audio inputs and improve detection quality. Without the help of machine learning, this process is impossible. When fighting deforestation, every improvement can mean one more saved tree.

The next step is to involve those who will inherit the planet from us: kids. Today, we’re launching the “Planet Guardians” program with hundreds of students from Los Angeles STEM science programs. These students will speak with the local Tembé Tribe through Google Hangouts, and build their own Guardian devices to be sent to the Amazon. We expect that the Guardian devices built by these LA students will protect nearly 100,000 acres through the year 2020.

With technology, and the possibilities that TensorFlow opens up for us, Rainforest Connection will continue to do our part in the fight against climate change. And programs like “Planet Guardians” will ensure that the next generation is a part of this fight, too.

Making music using new sounds generated with machine learning

Technology has always played a role in inspiring musicians in new and creative ways. The guitar amp gave rock musicians a new palette of sounds to play with in the form of feedback and distortion. And the sounds generated by synths helped shape the sound of electronic music. But what about new technologies like machine learning models and algorithms? How might they play a role in creating new tools and possibilities for a musician’s creative process? Magenta, a research project within Google, is currently exploring answers to these questions.

Building upon past research in the field of machine learning and music, last year Magenta released NSynth (Neural Synthesizer). It’s a machine learning algorithm that uses deep neural networks to learn the characteristics of sounds, and then create a completely new sound based on these characteristics. Rather than combining or blending the sounds, NSynth synthesizes an entirely new sound using the acoustic qualities of the original sounds—so you could get a sound that’s part flute and part sitar all at once.

Since then, Magenta has continued to experiment with different musical interfaces and tools to make the algorithm more easily accessible and playable. As part of this exploration, Google Creative Lab and Magenta collaborated to create NSynth Super. It’s an open source experimental instrument which gives musicians the ability to explore new sounds generated with the NSynth algorithm.

Making music using new sounds generated with machine learning

To create our prototype, we recorded 16 original source sounds across a range of 15 pitches and fed them into the NSynth algorithm. The outputs, over 100,000 new sounds, were then loaded into NSynth Super to precompute the new sounds. Using the dials, musicians can select the source sounds they would like to explore between, and drag their finger across the touchscreen to navigate the new, unique sounds which combine their acoustic qualities. NSynth Super can be played via any MIDI source, like a DAW, sequencer or keyboard.

03. NSynth-Super-Bathing_2880x1800.jpg

Part of the goal of Magenta is to close the gap between artistic creativity and machine learning. It’s why we work with a community of artists, coders and machine learning researchers to learn more about how machine learning tools might empower creators. It’s also why we create everything, including NSynth Super, with open source libraries, including TensorFlow and openFrameworks. If you’re maker, musician, or both, all of the source code, schematics, and design templates are available for download on GitHub.

04. Open-NSynth-Super-Parts-2880x1800.jpg

New sounds are powerful. They can inspire musicians in creative and unexpected ways, and sometimes they might go on to define an entirely new musical style or genre. It’s impossible to predict where the new sounds generated by machine learning tools might take a musician, but we're hoping they lead to even more musical experimentation and creativity.


Learn more about NSynth Super at g.co/nsynthsuper.

Understanding the inner workings of neural networks

Neural networks are a powerful approach to machine learning, allowing computers to understand images, recognize speech, translate sentences, play Go, and much more. As much as we’re using neural networks in our technology at Google, there’s more to learn about how these systems accomplish these feats. For example, neural networks can learn how to recognize images far more accurately than any program we directly write, but we don’t really know how exactly they decide whether a dog in a picture is a Retriever, a Beagle, or a German Shepherd.

We’ve been working for several years to better grasp how neural networks operate. Last week we shared new research on how these techniques come together to give us a deeper understanding of why networks make the decisions they do—but first, let’s take a step back to explain how we got here.

Neural networks consist of a series of “layers,” and their understanding of an image evolves over the course of multiple layers. In 2015, we started a project called DeepDream to get a sense of what neural networks “see” at the different layers. Itled to a much larger research project that would not only develop beautiful art, but also shed light on the inner workings of neural networks.

pasted image 0 (11).png
Outside Google, DeepDream grew into a small art movement producing all sorts of amazing things.

Last year, we shared new work on this subject, showing how techniques building on DeepDream—and lots of excellent research from our colleagues around the world—can help us explore how neural networks build up their understanding of images. We showed that neural networks build on previous layers to detect more sophisticated ideas and eventually reach complex conclusions. For instance, early layers detect edges and textures of images, but later layers progress to detecting parts of objects.

pasted image 0 (12).png
The neural network first detects edges, then textures, patterns, parts, and objects.

Last week we released another milestone in our research: an exploration of how different techniques for understanding neural networks fit together into a bigger picture.

This work, which we've published in the online journal Distill, explores how different techniques allow us to “stand in the middle of a neural network” and see how decisions made at an individual point influence a final output. For instance, we can see how a network detects a “floppy ear,” and then that increases the probability that the image will be labeled as a Labrador Retriever or Beagle.

In one example, we explore which neurons activate in response to different inputs—a kind of “MRI for neural networks.” The network has some floppy ear detectors that really like this dog!

SemanticDict2-ezgif-crop.gif

We can also see how different neurons in the middle of the network—like those floppy ear detectors—affect the decision to classify an image as a Labrador Retriever or tiger cat.

pasted image 0 (13).png

If you want to learn more, check out our interactive paper, published in Distill. We’ve also open sourced our neural net visualization library, Lucid, so you can make these visualizations, too.