
ML Kit expands into NLP with Language Identification and Smart Reply

Posted by Christiaan Prins and Max Gubin

Today we are announcing the release of two new features to ML Kit: Language Identification and Smart Reply.

You might notice that both of these features are different from our existing APIs that were all focused on image/video processing. Our goal with ML Kit is to offer powerful but simple-to-use APIs to leverage the power of ML, independent of the domain. As such, we are excited to expand ML Kit with solutions for Natural Language Processing (NLP)!

NLP is a category of ML that deals with analyzing and generating text, speech, and other kinds of natural language data. We're excited to start out with two APIs: one that helps you identify the language of text, and one that generates reply suggestions in chat applications. Both of these features work fully on-device and are available on the latest version of the ML Kit SDK, on iOS (9.0 and higher) and Android (4.1 and higher).

Generate reply suggestions based on previous messages

A new feature popping up in messaging apps is to provide the user with a selection of suggested responses, either as actions on a notification or inside the app itself. This can help a user respond quickly when they are busy, or serve as a handy way to initiate a longer message.

With the new Smart Reply API you can now quickly achieve the same in your own apps. The API provides suggestions based on the last 10 messages in a conversation, although it still works if only one previous message is available. It is a stateless API that fully runs on-device, so we don't keep message history in memory nor send it to a server.

textPlus app providing response suggestions using Smart Reply

We have worked closely with partners like textPlus to ensure Smart Reply is ready for prime time and they have now implemented in-app response suggestions with the latest version of their app (screenshot above).

Adding Smart Reply to your own app is done with a simple function call (using Swift in this example):

let smartReply = NaturalLanguage.naturalLanguage().smartReply()
smartReply.suggestReplies(for: conversation) { result, error in
    // Bail out if the API returned an error or no result at all.
    guard error == nil, let result = result else {
        return
    }
    // Suggestions are only available when the result status is .success.
    if result.status == .success {
        for suggestion in result.suggestions {
            print("Suggested reply: \(suggestion.text)")
        }
    }
}

After you initialize a Smart Reply instance, call suggestReplies with a list of recent messages. The callback provides the result, which contains a list of suggestions.
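To give a sense of the input, here is a minimal sketch of how such a conversation might be assembled before calling suggestReplies. The TextMessage initializer shown follows the ML Kit documentation at the time of writing; treat the exact parameter names as an assumption and check the current reference docs.

// Hypothetical conversation history; parameter names assumed from the ML Kit docs.
var conversation: [TextMessage] = []

// A message received from the other participant.
conversation.append(TextMessage(
    text: "Are we still on for dinner tonight?",
    timestamp: Date().timeIntervalSince1970,
    userID: "friend-123",
    isLocalUser: false))

// A message sent by the local user.
conversation.append(TextMessage(
    text: "Running late, be there soon",
    timestamp: Date().timeIntervalSince1970,
    userID: "me",
    isLocalUser: true))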

For details on how to use the Smart Reply API, check out the documentation.

Tell me more ...

Although as a developer you can just pick up this new API and easily integrate it in your app, it may be interesting to reveal a bit about how it works under the hood. At the core of Smart Reply is a machine-learned model that is executed using TensorFlow Lite and has a state-of-the-art architecture based on SentencePiece text encoding[1] and Transformer[2].

However, as we realized when we started development of the API, the core suggestion model is not all that's needed to provide a solution that developers can use in their apps. For example, we added a model to detect sensitive topics, so that we avoid making suggestions in response to profanity or in cases of personal tragedy/hardship. Also, we included language identification, to ensure we do not provide suggestions for languages the core model is not trained on. The Smart Reply feature is launching with English support first.
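Conceptually, that extra gating looks something like the sketch below. This is not ML Kit's internal implementation, just an illustration of the policy described above, with the inputs (a detected language code and a sensitive-content flag) assumed for the example:

// Illustration only: suggestions are surfaced only when the conversation is in a
// supported language and no sensitive content was detected.
func shouldOfferSuggestions(detectedLanguageCode: String,
                            containsSensitiveContent: Bool) -> Bool {
    // Smart Reply launches with English support first.
    let supportedLanguages: Set<String> = ["en"]
    return supportedLanguages.contains(detectedLanguageCode) && !containsSensitiveContent
}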

Identify the language of a piece of text

The language of a given text string is a subtle but helpful piece of information. Many apps have functionality that depends on the language: think of features like spell checking, text translation or Smart Reply. Rather than asking a user to specify the language they use, you can use our new Language Identification API.

ML Kit recognizes text in 103 different languages and typically only requires a few words to make an accurate determination. It is fast as well, typically providing a response within 1 to 2 ms across iOS and Android phones.

Similar to the Smart Reply API, you can identify the language with a function call (using Swift in this example):

let languageId = NaturalLanguage.naturalLanguage().languageIdentification()
languageId.identifyLanguage(for: "¿Cómo estás?") { languageCode, error in
  guard error == nil, let languageCode = languageCode else {
    print("Failed to identify language with error: \(error!)")
    return
  }

  print("Identified Language: \(languageCode)")
}

The identifyLanguage function takes a piece of text and its callback provides a BCP-47 language code. If no language can be confidently recognized, ML Kit returns a code of "und" for undetermined. The Language Identification API can also provide a list of possible languages and their confidence values.
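For the candidate list, the SDK exposes a companion call. The snippet below follows the pattern in the ML Kit documentation at the time of writing (using Swift again); treat the exact method and property names as an assumption and confirm them against the current reference:

// Retrieve all plausible languages along with their confidence values.
languageId.identifyPossibleLanguages(for: "¿Cómo estás?") { identifiedLanguages, error in
  guard error == nil, let identifiedLanguages = identifiedLanguages else {
    print("Failed to identify possible languages with error: \(error!)")
    return
  }
  for language in identifiedLanguages {
    print("\(language.languageCode): \(language.confidence)")
  }
}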

For details on how to use the Language Identification API, check out the documentation.

Get started today

We're really excited to expand ML Kit to include Natural Language APIs. Give the two new NLP APIs a spin today and let us know what you think! You can always reach us in our Firebase Talk Google Group.

As ML Kit grows, we look forward to adding more APIs and categories that enable you to provide smarter experiences for your users. With that, please keep an eye out for some exciting ML Kit announcements at Google I/O.

Using Deep Learning to Improve Usability on Mobile Devices



Tapping is the most commonly used gesture on mobile interfaces, and is used to trigger all kinds of actions ranging from launching an app to entering text. While the style of clickable elements (e.g., buttons) in traditional desktop graphical user interfaces is often conventionally defined, on mobile interfaces it can still be difficult for people to distinguish tappable versus non-tappable elements due to the diversity of styles. This confusion can lead to false affordances (e.g., a feature that could be mistaken for a button) and a lack of discoverability that can lead to user frustration, uncertainty, and errors. To avoid this, interface designers can conduct a study or a visual affordance test to help clarify the tappability of items in their interfaces. However, such studies are time-consuming and their findings are often limited to a specific app or interface design.

In our CHI'19 paper, "Modeling Mobile Interface Tappability Using Crowdsourcing and Deep Learning", we introduced an approach for modeling the usability of mobile interfaces at scale. We crowdsourced a task to study UI elements across a range of mobile apps and measure how users perceive their tappability. Our model's predictions were consistent with the user group at the ~90% level, demonstrating that a machine learning model can be used effectively to estimate the perceived tappability of interface elements in a design without the need for expensive and time-consuming user testing.
Predicting Tappability with Deep Learning
Designers often use visual properties such as the color or depth of an element to signify its availability for interaction on interfaces, e.g., the blue color and underline of a link. While these common signifiers are useful, it is not always clear when to apply them in each specific design setting. Furthermore, with design trends evolving, traditional signifiers are constantly being altered and challenged, potentially causing user uncertainty and mistakes.

To understand how users perceive this changing landscape, we analyzed the potential signifiers affecting tappability in real mobile apps—element type (e.g., check boxes, text boxes, etc.), location, size, color, and words. We started by crowdsourcing volunteers to label the perceived clickability of ~20,000 unique interface elements from ~3,500 apps. With the exception of text boxes, type signifiers yielded low uncertainty in user perceived tappability. The location signifier refers to the position of a feature on the screen and is informed by the common layout design in mobile apps, as demonstrated in the figure below.
Heatmaps displaying the accuracy of tappable and non-tappable elements by location, where warmer colors represent areas of higher accuracy. Users labeled non-tappable elements more accurately towards the upper center of the interface, and tappable elements towards the bottom center of the interface.
The impact of element size was relatively weak, but did indicate confusion in the case of large non-tappable elements. Users tended to associate bright colors and short word counts with tappable elements, though word semantics also played a significant role.

We used these labels to train a simple deep neural network that predicts the likelihood that a user will perceive an interface element as tappable versus non-tappable. For a given element of the interface, the model uses a range of features, including the spatial context of the element on the screen (location), the semantics and functionality of the element (words and type), and the visual appearance (size as well as raw pixels). The neural network model applies a convolutional neural network (CNN) to extract features from raw pixels, and uses learned semantic embeddings to represent text content and element properties. The concatenation of all these features is then fed to a fully-connected network layer, the output of which produces a binary classification of an element's tappability.
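To make that last step concrete, here is a small conceptual sketch (in Swift, and not the paper's actual implementation) of how the concatenated features could feed a fully-connected layer with a sigmoid output; the feature vectors themselves are assumed to come from the CNN and embedding layers described above:

import Foundation

struct TappabilityHead {
    let weights: [Double]   // one weight per concatenated feature dimension
    let bias: Double

    func probabilityTappable(pixelFeatures: [Double],
                             textEmbedding: [Double],
                             typeEmbedding: [Double],
                             locationAndSize: [Double]) -> Double {
        // Concatenate the per-modality feature vectors into a single input.
        let features = pixelFeatures + textEmbedding + typeEmbedding + locationAndSize
        precondition(features.count == weights.count, "feature/weight size mismatch")
        // Fully-connected layer: weighted sum plus bias...
        let logit = zip(features, weights).map(*).reduce(bias, +)
        // ...followed by a sigmoid, giving the probability the element is perceived as tappable.
        return 1.0 / (1.0 + exp(-logit))
    }
}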

Evaluation of the Model
The model allowed us to automatically diagnose mismatches between the tappability of each interface element as perceived by a user—predicted by our model—and the intended or actual tappable state of the element specified by the developer or designer. In the example below, our model predicts that there is a 73% chance that a user would think labels such as "Followers" or "Following" are tappable, while these interface elements are in fact not programmed to be tappable.
To understand how our model behaves compared to human users, particularly when there is ambiguity in human perception, we generated a second, independent dataset by crowdsourcing an effort among 290 volunteers to label each of 2,000 unique interface elements with respect to their perceived tappability. Each element was labeled independently by five different users. We found that more than 40% of the elements in our sample were labeled inconsistently by volunteers. Our model matches this uncertainty in human perception quite well, as demonstrated in the figure below.
The scatterplot of the tappability probability predicted by the model (the Y axis) versus the consistency in the human user labels (the X axis) for each element in the consistency dataset.
When users agree on an element's tappability, our model tends to give a more definite answer—a probability close to 1 for tappable and close to 0 for not tappable. When workers are less consistent on an element (towards the middle of the X axis), our model is also less certain about the decision. Overall, our model achieved reasonable accuracy in matching human perception in identifying tappable UI elements, with a mean precision of 90.2% and recall of 87.0%.

Predicting tappability is merely one example of what we can do with machine learning to solve usability issues in user interfaces. There are many other challenges in interaction design and user experience research where deep learning models can offer a vehicle to distill large, diverse user experience datasets and advance scientific understanding of interaction behaviors.

Acknowledgements
This research was a joint work of Amanda Swangson, summer intern at Google, and Yang Li, a Research Scientist in Deep Learning and Human Computer Interaction.

Source: Google AI Blog


A Summary of the Google Flood Forecasting Meets Machine Learning Workshop



Recently, we hosted the Google Flood Forecasting Meets Machine Learning workshop in our Tel Aviv office, which brought together hydrology and machine learning experts from Google and the broader research community to discuss existing efforts in this space, build a common vocabulary between these groups, and catalyze promising collaborations. In line with our belief that machine learning has the potential to significantly improve flood forecasting efforts and help the hundreds of millions of people affected by floods every year, this workshop discussed improving flood forecasting by aggregating and sharing large data sets, automating calibration and modeling processes, and applying modern statistical and machine learning tools to the problem.

Panel on challenges and opportunities in flood forecasting, featuring (from left to right): Prof. Paolo Burlando (ETH Zürich), Dr. Tyler Erickson (Google Earth Engine), Dr. Peter Salamon (Joint Research Centre) and Prof. Dawei Han (University of Bristol).
The event was kicked off by Google's Yossi Matias, who discussed recent machine learning work and its potential relevance for flood forecasting, crisis response and AI for Social Good. This was followed by two introductory sessions aimed at bridging some of the knowledge gap between the two fields: an introduction to hydrology for computer scientists by Prof. Peter Molnar of ETH Zürich, and an introduction to machine learning for hydrologists by Prof. Yishay Mansour of Tel Aviv University and Google.

Included in the 2-day event was a wide range of fascinating talks and posters across the flood forecasting landscape, from both hydrologic and machine learning points of view.

An overview of research areas in flood forecasting addressed in the workshop.
Presentations from the research community included:
Alongside these talks, we presented the various efforts across Google to try and improve flood forecasting and foster collaborations in the field, including:
Additionally, at this workshop we piloted an experimental "ML Consultation" panel, where Googlers Gal Elidan, Sasha Goldshtein and Doron Kukliansky gave advice on how to best use machine learning in several hydrology-related tasks. Finally, we concluded the workshop with a moderated panel on the greatest challenges and opportunities in flood forecasting, with hydrology experts Prof. Paolo Burlando of ETH Zürich, Prof. Dawei Han of the University of Bristol, Dr. Peter Salamon of the Joint Research Centre and Dr. Tyler Erickson of Google Earth Engine.
Flood forecasting is an incredibly important and challenging task that is one part of our larger AI for Social Good efforts. We believe that effective global-scale solutions can be achieved by combining modern techniques with the domain expertise already existing in the field. The workshop was a great first step towards creating much-needed understanding, communication and collaboration between the flood forecasting community and the machine learning community, and we look forward to our continued engagement with the broad research community to tackle this challenge.

Acknowledgements
We would like to thank Avinatan Hassidim, Carla Bromberg, Doron Kukliansky, Efrat Morin, Gal Elidan, Guy Shalev, Jennifer Ye, Nadav Rabani and Sasha Goldshtein for their contributions to making this workshop happen.

Source: Google AI Blog


This is the Future of Finance

Posted by Roy Glasberg, Head of Launchpad

Launchpad's mission is to accelerate innovation and to help startups build world-class technologies by leveraging the best of Google - its people, network, research, and technology.

In September 2018, the Launchpad team welcomed ten of the world's leading FinTech startups to join their accelerator program, helping them fast-track their application of advanced technology. Today, March 15th, we will see this cohort graduate from the program at the Launchpad team's inaugural event - The Future of Finance - a global discussion on the impact of applied ML/AI on the finance industry. These startups are ensuring that everyone has relevant insights at their fingertips and that all people, no matter where they are, have access to equitable money, banking, loans, and marketplaces.

Tune into the event from wherever you are via the livestream link

The Graduating Class of Launchpad FinTech Accelerator San Francisco'19

  • Alchemy (USA), bridging blockchain and the real world
  • Axinan (Singapore), providing smart insurance for the digital economy
  • Aye Finance (India), transforming financing in India
  • Celo (USA), increasing financial inclusion through a mobile-first cryptocurrency
  • Frontier Car Group (Germany), investing in the transformation of used-car marketplaces
  • GO-JEK (Indonesia), improving the welfare and livelihoods of informal sectors
  • GuiaBolso (Brazil), improving the financial lives of Brazilians
  • JUMO (South Africa), creating a transparent, fair money marketplace for mobile users to access loans
  • m.Paani (India), (em)powering local retailers and the next billion users in India
  • Starling Bank (UK), improving financial health with a 100% mobile-only bank

Since joining the accelerator, these startups have made great strides and are going from strength to strength. Some recent announcements from this cohort include:

  • JUMO has announced the launch of Opportunity Co, a 500M fund for credit where all the profits go back to the customers.
  • The team at Aye Finance has just closed a $30m Series D equity round.
  • Starling Bank has provided 150 new jobs in Southampton, received a £100m grant from a fund aimed at increasing competition and innovation in the British banking sector, and completed a £75m fundraise.
  • GuiaBolso ran a campaign to pay the bills of some of its users (the beginning of the year in Brazil is a time of high expenses and debts) and is having a significant impact on credit, with interest rates on loans cheaper than those of traditional banks in 80% of cases.

We look forward to following the success of all our participating founders as they continue to make a significant impact on the global economy.

Want to know more about the Launchpad Accelerator? Visit our site, stay updated on developments and future opportunities by subscribing to the Google Developers newsletter and visit The Launchpad Blog.

Introducing Class II of Launchpad Accelerator India

https://lh6.googleusercontent.com/Gxl43TzIBGARTYC9VQmiY_1cbFn2_NSAuh0wL9GlaDG-dyr9P2hrFQuABDoN1ZrmVJuvTE8o4zfVEA87UgVveiHwJ00j_br_8Nxbe53FxqxLF6JYoShY3-zbPo75g0Qo8z8ceU4f
In December 2018, we opened applications for Class II of Launchpad Accelerator India and are thrilled to announce the start of the new class.
The second batch of the Launchpad Accelerator India announced
Similar to Class I, these 10 incredible startups will get access to the best of Google -- including mentorship from Google teams and industry experts, free support, cloud credits, and more. These startups will undergo an intensive 1-week mentorship bootcamp in March, followed by more engagements in April and May.  


At the bootcamp, they will meet with mentors both from Google and subject matter experts from the industry to set their goals for the upcoming three months. During the course of the program, the startups will receive insights and support on advanced technologies such as ML, in-depth design sprints for specifically identified challenges, guidance on focused tech projects, networking opportunities at industry events, and much more.


The first class kicks off today in Bangalore.


Meet the 10 startups of Class II:


(1) Opentalk Pvt Ltd: An app to talk to new people around the world, become a better speaker and make new friends
(2) THB: Helping healthcare providers organize and standardize healthcare information to drive clinical and commercial analytical applications and use cases
(3) Perceptiviti Data Solutions: An AI platform for insurance claim flagging, payment integrity, and fraud and abuse management
(4) DheeYantra: A cognitive conversational AI for Indian vernacular languages
(5) Kaleidofin: Customized financial solutions that combine multiple financial products such as savings, credit, and insurance in intuitive ways, to help customers achieve their real-life goals
(6) FinancePeer: A P2P lending company that connects lenders with borrowers online
(7) SmartCoin: An app for providing credit access to the vastly underserved lower- and middle-income segments through advanced AI/ML models
(8) HRBOT: Using AI and video analytics to find employable candidates in tier 2 and tier 3 cities, remotely.
(9) Savera.ai: A service that remotely maps your roof and helps you make an informed decision about having a solar panel, followed by chatbot-based support to help you learn about solar tech while enabling connections to local service providers
(10) Adiuvo Diagnostics: A rapid wound infection assessment and management device

By Paul Ravindranath, Program Manager, Launchpad Accelerator India

Introduction to Fairness in Machine Learning

Posted by Andrew Zaldivar, Developer Advocate, Google AI

A few months ago, we announced our AI Principles, a set of commitments we are upholding to guide our work in artificial intelligence (AI) going forward. Along with our AI Principles, we shared a set of recommended practices to help the larger community design and build responsible AI systems.

In particular, one of our AI Principles speaks to the importance of recognizing that AI algorithms and datasets are the product of the environment—and, as such, we need to be conscious of any potential unfair outcomes generated by an AI system and the risk it poses across cultures and societies. A recommended practice here for practitioners is to understand the limitations of their algorithm and datasets—but this is a problem that is far from solved.

To help practitioners take on the challenge of building fairer and more inclusive AI systems, we developed a short, self-study training module on fairness in machine learning. This new module is part of our Machine Learning Crash Course, which we highly recommend taking first—unless you know machine learning really well, in which case you can jump right into the Fairness module.

The Fairness module features a hands-on technical exercise. This exercise demonstrates how you can use tools and techniques that may already exist in your development stack (such as Facets Dive, Seaborn, pandas, scikit-learn and TensorFlow Estimators to name a few) to explore and discover ways to make your machine learning system fairer and more inclusive. We created our exercise in a Colaboratory notebook, which you are more than welcome to use, modify and distribute for your own purposes.

From exploring datasets to analyzing model performance, it's really easy to forget to make time for responsible reflection when building an AI system. So rather than having you run every code cell in sequential order without pause, we added what we call FairAware tasks throughout the exercise. FairAware tasks help you zoom in and out of the problem space. That way, you can remind yourself of the big picture: finding the undesirable biases that could disproportionately affect model performance across groups. We hope a process like FairAware will become part of your workflow, helping you find opportunities for inclusion.

FairAware task guiding practitioner to compare performances across gender.
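As a minimal illustration of the kind of check a FairAware task prompts (independent of the notebook's Python tooling, and with the field names invented for the example), you can slice your evaluation data by a group attribute and compare a metric across slices:

struct EvalExample {
    let group: String      // e.g. a gender value in the module's exercise
    let label: Bool        // ground truth
    let prediction: Bool   // model output
}

// Accuracy per group; large gaps between groups are the kind of disparity
// a FairAware task asks you to pause and investigate.
func accuracyByGroup(_ examples: [EvalExample]) -> [String: Double] {
    let slices = Dictionary(grouping: examples, by: { $0.group })
    return slices.mapValues { slice in
        let correct = slice.filter { $0.label == $0.prediction }.count
        return Double(correct) / Double(slice.count)
    }
}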

The Fairness module was created to provide you with enough of an understanding to get started in addressing fairness and inclusion in AI. Keep an eye on this space for future work as this is only the beginning.

If you wish to learn more from our other examples, check out the Fairness section of our Responsible AI Practices guide. There, you will find a full set of Google recommendations and resources. From our latest research proposal on reporting model performance with fairness and inclusion considerations, to our recently launched diagnostic tool that lets anyone investigate trained models for fairness, our resource guide highlights many areas of research and development in fairness.

Let us know what your thoughts are on our Fairness module. If you have any specific comments on the notebook exercise itself, then feel free to leave a comment on our GitHub repo.


On behalf of many contributors and supporters,

Andrew Zaldivar – Developer Advocate, Google AI

Improving Search for the next 20 years

https://storage.googleapis.com/gweb-uniblog-publish-prod/images/BLR_-_Koshys_1_1.max-1000x1000.jpg
Growing up in India, I had access to one good library in my town, run by the British Council. It was modest by western standards, and I had to take two buses just to get there. But I was lucky, because for every child like me, there were many more who didn’t have access to the same information that I did. Access to information changed my life, bringing me to the U.S. to study computer science and opening up huge possibilities for me that would not have been available without the education I had.
The British Council Library in my hometown.


When Google started 20 years ago, our mission was to organize the world’s information and make it universally accessible and useful. That seemed like an incredibly ambitious mission at the time—even considering that in 1998 the web consisted of just 25 million pages (roughly the equivalent of books in a small library).
Fast forward to today, and we now index hundreds of billions of pages—more information than all the libraries in the world could hold. We’ve grown to serve people all over the world, offering Search in more than 150 languages and over 190 countries.
Through all of this, we’ve remained grounded in our mission. In fact, providing greater access to information is as core to our work today as it was when we first started. And while almost everything has changed about technology and the information available to us, the core principles of Search have stayed the same.
  • First and foremost, we focus on the user. Whether you’re looking for recipes, studying for an exam, or finding information on where to vote, we’re focused on serving your information needs.
  • We strive to give you the most relevant, highest quality information as quickly as possible. This was true when Google started with the PageRank algorithm—the foundational technology to Search. And it’s just as true today.
  • We see billions of queries every day, and 15 percent of queries are ones we’ve never seen before. Given this scale, the only way to provide Search effectively is through an algorithmic approach. This helps us not just solve all the queries we’ve seen yesterday, but also all the ones we can’t anticipate for tomorrow.
  • Finally, we rigorously test every change we make. A key part of this testing is the rater guidelines which define our goals in search, and which are publicly available for anyone to see. Every change to Search is evaluated by experimentation and by raters using these guidelines. Last year alone, we ran more than 200,000 experiments that resulted in 2,400+ changes to search. Search will serve you better today than it did yesterday, and even better tomorrow.
As Google marks our 20th anniversary, I wanted to share a first look at the next chapter of Search, and how we’re working to make information more accessible and useful for people everywhere. This next chapter is driven by three fundamental shifts in how we think about Search:
    Underpinning each of these are our advancements in AI, improving our ability to understand language in ways that weren’t possible when Google first started. This is incredibly exciting, because over 20 years ago when I studied neural nets at school, they didn’t actually work very well...at all!
    But we’ve now reached the point where neural networks can help us take a major leap forward from understanding words to understanding concepts. Neural embeddings, an approach developed in the field of neural networks, allow us to transform words to fuzzier representations of the underlying concepts, and then match the concepts in the query with the concepts in the document. We call this technique neural matching. This can enable us to address queries like: “why does my TV look strange?” to surface the most relevant results for that question, even if the exact words aren’t contained in the page. (By the way, it turns out the reason is called the soap opera effect).
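    As a toy illustration of the idea (not Google's production system), neural matching boils down to comparing dense vectors instead of exact words; the embedding table referenced below is a hypothetical placeholder for what a trained model would provide:

import Foundation

// Cosine similarity between two concept vectors.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    let dot = zip(a, b).map(*).reduce(0, +)
    let normA = sqrt(a.map { $0 * $0 }.reduce(0, +))
    let normB = sqrt(b.map { $0 * $0 }.reduce(0, +))
    return dot / (normA * normB)
}

// Average word embeddings into a single vector for a piece of text.
func textEmbedding(_ words: [String], lookup: [String: [Double]], dims: Int) -> [Double] {
    var sum = [Double](repeating: 0, count: dims)
    var matched = 0
    for word in words {
        guard let vector = lookup[word] else { continue }
        for i in 0..<dims { sum[i] += vector[i] }
        matched += 1
    }
    return matched > 0 ? sum.map { $0 / Double(matched) } : sum
}

// A query about a TV that "looks strange" can score highly against a page about
// the "soap opera effect" even with little word overlap, as long as a trained
// model places the two phrasings near the same concept.

    With fuzzier representations like these, the match is made at the level of concepts rather than surface words.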
    Finding the right information about my TV is helpful in the moment. But AI can have much more profound effects. Whether it’s predicting areas that might be affected in a flood, or helping you identify the best job opportunities for you, AI can dramatically improve our ability to make information more accessible and useful.
    I’ve worked on Search at Google since the early days of its existence. One of the things that keeps me so inspired about Search all these years is our mission and how timeless it is. Providing greater access to information is fundamental to what we do, and there are always more ways we can help people access the information they need. That’s what pushes us forward to continue to make Search better for our users. And that’s why our work here is never done.

    Posted by Ben Gomes, VP, Search, News and Assistant

    Keeping people safe with AI-enabled flood forecasting

    https://storage.googleapis.com/gweb-uniblog-publish-prod/original_images/Flood_Forecast.gif
    For 20 years, Google Search has provided people with the information they need, and in times of crisis, access to timely, actionable information is often crucial. Last year we launched SOS Alerts on Search and Maps to make emergency information more accessible. Since then, we’ve activated SOS Alerts in more than 200 crisis situations, in addition to tens of thousands of Google Public Alerts, which have been viewed more than 1.5 billion times.
    Floods are devastating natural disasters worldwide—it’s estimated that every year, 250 million people around the world are affected by floods, causing billions of dollars in damages. Flood forecasting can help individuals and authorities better prepare to keep people safe, but accurate forecasting isn’t currently available in many areas. And the warning systems that do exist can be imprecise and non-actionable, resulting in far too many people being underprepared and underinformed before a flood happens.
    To help improve awareness of impending floods, we're using AI and significant computational power to create better forecasting models that predict when and where floods will occur, and incorporating that information into Google Public Alerts. A variety of elements—from historical events, to river level readings, to the terrain and elevation of a specific area—feed into our models. From there, we generate maps and run up to hundreds of thousands of simulations in each location. With this information, we’ve created river flood forecasting models that can more accurately predict not only when and where a flood might occur, but the severity of the event as well.
    These images depict a flood simulation of a river in Hyderabad, India. The left side uses publicly available data while the right side uses Google data and technology. Our models provide higher resolution, greater accuracy, and more up-to-date information.


    We started these flood forecasting efforts in India, where 20 percent of global flood-related fatalities occur. We’re partnering with India’s Central Water Commission to get the data we need to roll out early flood warnings, starting with the Patna region. The first alert went out earlier this month after heavy rains in the region.
    Flood alert shown to users in the Patna region.


    We’re also looking to expand coverage to more countries, to help more people around the world get access to these early warnings, and help keep them informed and safe.

    Posted by Yossi Matias, VP, Engineering

    Cloud AutoML: Making AI accessible to every business

    https://img.youtube.com/vi/GbLQE2C181U/maxresdefault.jpg
    When we both joined Google Cloud just over a year ago, we embarked on a mission to democratize AI. Our goal was to lower the barrier of entry and make AI available to the largest possible community of developers, researchers and businesses.


    Our Google Cloud AI team has been making good progress towards this goal. In 2017, we introduced Google Cloud Machine Learning Engine, to help developers with machine learning expertise easily build ML models that work on any type of data, of any size. We showed how modern machine learning services, i.e., APIs—including Vision, Speech, NLP, Translation and Dialogflow—could be built upon pre-trained models to bring unmatched scale and speed to business applications. Kaggle, our community of data scientists and ML researchers, has grown to more than 1 million members. And today, more than 10,000 businesses are using Google Cloud AI services, including companies like Box, Rolls Royce Marine, Kewpie, and Ocado.


    But there’s much more we can do. Currently, only a handful of businesses in the world have access to the talent and budgets needed to fully appreciate the advancements of ML and AI. There’s a very limited number of people that can create advanced machine learning models. And if you’re one of the companies that has access to ML/AI engineers, you still have to manage the time-intensive and complicated process of building your own custom ML model. While Google has offered pre-trained machine learning models via APIs that perform specific tasks, there's still a long road ahead if we want to bring AI to everyone.


    To close this gap, and to make AI accessible to every business, we’re introducing Cloud AutoML. Cloud AutoML helps businesses with limited ML expertise start building their own high-quality custom models by using advanced techniques like learning2learn and transfer learning from Google. We believe Cloud AutoML will make AI experts even more productive, advance new fields in AI, and help less-skilled engineers build powerful AI systems they previously only dreamed of.


    Our first Cloud AutoML release will be Cloud AutoML Vision, a service that makes it faster and easier to create custom ML models for image recognition. Its drag-and-drop interface lets you easily upload images, train and manage models, and then deploy those trained models directly on Google Cloud. Early results using Cloud AutoML Vision to classify popular public datasets like ImageNet and CIFAR have shown more accurate results with fewer misclassifications than generic ML APIs.


    Here’s a little more on what Cloud AutoML Vision has to offer:
    • Increased accuracy: Cloud AutoML Vision is built on Google’s leading image recognition approaches, including transfer learning and neural architecture search technologies. This means you’ll get a more accurate model even if your business has limited machine learning expertise.
    • Faster turnaround time to production-ready models: With Cloud AutoML, you can create a simple model in minutes to pilot your AI-enabled application, or build out a full, production-ready model in as little as a day.
    • Easy to use: AutoML Vision provides a simple graphical user interface that lets you specify data, then turns that data into a high quality model customized for your specific needs.



      "Urban Outfitters is constantly looking for new ways to enhance our customers’ shopping experience," says Alan Rosenwinkel, Data Scientist at URBN. "Creating and maintaining a comprehensive set of product attributes is critical to providing our customers relevant product recommendations, accurate search results, and helpful product filters; however, manually creating product attributes is arduous and time-consuming. To address this, our team has been evaluating Cloud AutoML to automate the product attribution process by recognizing nuanced product characteristics like patterns and neckline styles. Cloud AutoML has great promise to help our customers with better discovery, recommendation, and search experiences."


      Mike White, CTO and SVP, for Disney Consumer Products and Interactive Media, says: “Cloud AutoML’s technology is helping us build vision models to annotate our products with Disney characters, product categories, and colors. These annotations are being integrated into our search engine to enhance the impact on Guest experience through more relevant search results, expedited discovery, and product recommendations on shopDisney.”

      And Sophie Maxwell, Conservation Technology Lead at the Zoological Society of London, tells us: "ZSL is an international conservation charity devoted to the worldwide conservation of animals and their habitats. A key requirement to deliver on this mission is to track wildlife populations to learn more about their distribution and better understand the impact humans are having on these species. In order to achieve this, ZSL has deployed a series of camera traps in the wild that take pictures of passing animals when triggered by heat or motion. The millions of images captured by these devices are then manually analysed and annotated with the relevant species, such as elephants, lions, and giraffes, which is a labour-intensive and expensive process. ZSL’s dedicated Conservation Technology Unit has been collaborating closely with Google’s CloudML team to help shape the development of this exciting technology, which ZSL aims to use to automate the tagging of these images—cutting costs, enabling wider-scale deployments, and gaining a deeper understanding of how to conserve the world’s wildlife effectively."


      If you’re interested in trying out AutoML Vision, you can request access via this form.

      AutoML Vision is the result of our close collaboration with Google Brain and other Google AI teams, and is the first of several Cloud AutoML products in development. While we’re still at the beginning of our journey to make AI more accessible, we’ve been deeply inspired by what our 10,000+ customers using Cloud AI products have been able to achieve. We hope the release of Cloud AutoML will help even more businesses discover what’s possible through AI.

      By Jia Li, Head of R&D, Cloud AI, and Fei-Fei Li, Chief Scientist, Cloud AI

      How Machine Learning with TensorFlow Enabled Mobile Proof-Of-Purchase at Coca-Cola

      In this guest editorial, Patrick Brandt of The Coca-Cola Company tells us how they're using AI and TensorFlow to achieve frictionless proof-of-purchase.

      Coca-Cola's core loyalty program launched in 2006 as MyCokeRewards.com. The "MCR.com" platform included the creation of unique product codes for every Coca-Cola, Sprite, Fanta, and Powerade product sold in 20oz bottles and cardboard "fridge-packs" purchasable at grocery stores and other retail outlets. Users could enter these product codes at MyCokeRewards.com to participate in promotional campaigns.

      Fast-forward to 2016: Coke's loyalty programs are still hugely popular with millions of product codes having been entered for promotions and sweepstakes. However, mobile browsing went from non-existent in 2006 to over 50% share by the end of 2016. The launch of Coke.com as a mobile-first web experience (replacing MCR.com) was a response to these changes in browsing behavior. Thumb-entering 14-character codes into a mobile device could be a difficult enough user experience to impact the success of our programs. We want to provide our mobile audience the best possible experience, and recent advances in artificial intelligence opened new opportunities.

      The quest for frictionless proof-of-purchase

      For years Coke attempted to use off-the-shelf optical character recognition (OCR) libraries and services to read product codes with little success. Our printing process typically uses low-resolution dot-matrix fonts with the cap or fridge-pack media running under the printhead at very high speeds. All of this translates into a low-fidelity string of characters that defeats off-the-shelf OCR offerings (and can sometimes be hard to read with the human eye as well). OCR is critical to simplifying the code-entry process for mobile users: they should be able to take a picture of a code and automatically have the purchase registered for a promotional entry. We needed a purpose-built OCR system to recognize our product codes.

      Bottlecap and fridge-pack examples

      Our research led us to a promising solution: Convolutional Neural Networks. CNNs are a family of "deep learning" neural networks that are at the heart of modern artificial intelligence products. Google has used CNNs to extract street address numbers from StreetView images. CNNs also perform remarkably well at recognizing handwritten digits. These number-recognition use-cases were a perfect proxy for the type of problem we were trying to solve: extracting strings from images that contain small character sets with lots of variance in the appearance of the characters.

      CNNs with TensorFlow

      In the past, developing deep neural networks like CNNs has been a challenge because of the complexity of available training and inference libraries. TensorFlow, a machine learning framework that was open sourced by Google in November 2015, is designed to simplify the development of deep neural networks.

      TensorFlow provides high-level interfaces to different kinds of neuron layers and popular loss functions, which makes it easier to implement different CNN model architectures. The ability to rapidly iterate over different model architectures dramatically reduced the time required to build Coke's custom OCR solution because different models could be developed, trained, and tested in a matter of days. TensorFlow models are also portable: the framework supports model execution natively on mobile devices ("AI on the edge") or in servers hosted remotely in the cloud. This enables a "create once, run anywhere" approach for model execution across many different platforms, including web-based and mobile.

      Machine learning: practice makes perfect

      Any neural network is only as good as the data used to train it. We knew that we needed a large set of labeled product-code images to train a CNN that would achieve our performance goals. Our training set would be built in three phases:

      1. Pre-launch simulated images
      2. Pre-launch real-world images
      3. Images labeled by our users in production

      The pre-launch training phase began by programmatically generating millions of simulated product-code images. These simulated images included variations in tilt, lighting, shadows, and blurriness. The prediction accuracy (i.e. how often all 14 characters were correctly predicted within the top-10 predictions) was at 50% against real-world images when the model was trained using only simulated images. This provided a baseline for transfer-learning: a model initially trained with simulated images was the foundation for a more accurate model that would be trained against real-world images.

      The challenge now turned to enriching the simulated images with enough real-world images to hit our performance goals. We created a purpose-built training app for iOS and Android devices that "trainers" could use to take pictures of codes and label them; these labeled images were then transferred to cloud storage for training. We did a production run of several thousand product codes on bottle caps and fridge-packs and distributed these to multiple suppliers who used the app to create the initial real-world training set.

      Even with an augmented and enriched training set, there is no substitute for images created by end-users in a variety of environmental conditions. We knew that scans would sometimes result in an inaccurate code prediction, so we needed to provide a user-experience that would allow users to quickly correct these predictions. Two components are essential to delivering this experience: a product-code validation service that has been in use since the launch of our original loyalty platform in 2006 (to verify that a predicted code is an actual code) and a prediction algorithm that performs a regression to determine a per-character confidence at each one of the 14 character positions. If a predicted code is invalid, the top prediction as well as the confidence levels for each character are returned to the user interface. Low-confidence characters are visually highlighted to guide the user to update characters that need attention.
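      A simplified sketch of that correction flow is shown below (the validation callback, threshold, and type names are assumptions for illustration, not Coca-Cola's actual code): if the predicted code fails validation, flag every position whose confidence falls under a threshold so the user only has to fix the characters that need attention.

struct CodePrediction {
    let characters: [Character]   // top predicted character for each of the 14 positions
    let confidences: [Double]     // per-character confidence from the model
}

// Returns the character positions the UI should highlight for manual correction.
func positionsToHighlight(_ prediction: CodePrediction,
                          isValidCode: (String) -> Bool,
                          threshold: Double = 0.5) -> [Int] {
    let code = String(prediction.characters)
    // A valid code needs no correction UI at all.
    guard !isValidCode(code) else { return [] }
    // Otherwise highlight every low-confidence position.
    return prediction.confidences.enumerated()
        .filter { $0.element < threshold }
        .map { $0.offset }
}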

      Error correction user interface lets users correct invalid predictions and generate useful training data

      This user interface innovation enables an active learning process: a feedback loop allows the model to gradually improve by returning corrected predictions to the training pipeline. In this way, our users organically improve the accuracy of the character recognition model over time.

      Product-code recognition pipeline

      Optimizing for maximum performance

      To meet user expectations around performance, we established a few ambitious requirements for the product-code OCR pipeline:

      • It had to be fast: we needed a one-second average processing time once the image of the product-code was sent into the OCR pipeline
      • It had to be accurate: our goal was to achieve 95% string recognition accuracy at launch with the guarantee that the model could be improved over time via active learning
      • It had to be small: the OCR pipeline needs to be small enough to be distributed directly to mobile apps and accommodate over-the-air updates as the model improves over time
      • It had to handle diverse product code media: dozens of different combinations of font types, bottlecaps, and cardboard fridge-pack media

      We initially explored an architecture that used a single CNN for all product-code media. This approach created a model that was too large to be distributed to mobile apps and the execution time was longer than desired. Our applied-AI partners at Quantiphi, Inc. began iterating on different model architectures, eventually landing on one that used multiple CNNs.

      This new architecture reduced the model size dramatically without sacrificing accuracy, but it was still on the high end of what we needed in order to support over-the-air updates to mobile apps. We next used TensorFlow's prebuilt quantization module to reduce the model size by reducing the fidelity of the weights between connected neurons. Quantization reduced the model size by a factor of 4, but a dramatic reduction in model size occurred when Quantiphi had a breakthrough using a new approach called SqueezeNet.
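      The factor-of-4 reduction from quantization is easy to sanity-check with back-of-the-envelope arithmetic: each weight drops from a 32-bit float to an 8-bit integer. The parameter count below is a made-up illustration, not the size of Coke's actual model.

let parameterCount = 5_000_000                                   // hypothetical model size
let float32Bytes = parameterCount * MemoryLayout<Float>.size     // 4 bytes per weight
let int8Bytes = parameterCount * MemoryLayout<Int8>.size         // 1 byte per weight
print("float32 weights: \(float32Bytes / 1_000_000) MB")         // 20 MB
print("int8 weights:    \(int8Bytes / 1_000_000) MB")            // 5 MB, roughly 4x smaller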

      The SqueezeNet model was published by a team of researchers from UC Berkeley and Stanford in November of 2016. It uses a small but highly complex design to achieve accuracy levels on par with much larger models against popular benchmarks such as Imagenet. After re-architecting our character recognition models to use a SqueezeNet CNN, Quantiphi was able to reduce the model size of certain media types by a factor of 100. Since the SqueezeNet model was inherently smaller, a richer feature detection architecture could be constructed, achieving much higher accuracy at much smaller sizes compared to our first batch of models trained without SqueezeNet. We now have a highly accurate model that can be easily updated on remote devices; the recognition success rate of our final model before active learning was close to 96%, which translates into a 99.7% character recognition accuracy (just 3 misses for every 1000 character predictions).
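      Those two accuracy figures are consistent with each other: if each of the 14 characters is read correctly 99.7% of the time and the per-character errors are treated as independent (a simplifying assumption), the expected full-string accuracy works out to about 96%.

import Foundation

let perCharacterAccuracy = 0.997                      // 3 misses per 1,000 character predictions
let stringAccuracy = pow(perCharacterAccuracy, 14.0)  // all 14 characters must be correct
print(String(format: "%.1f%%", stringAccuracy * 100)) // prints 95.9%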

      Valid product-code recognition examples with different types of occlusion, translation, and camera focus issues

      Crossing boundaries with AI

      Advances in artificial intelligence and the maturity of TensorFlow enabled us to finally achieve a long-sought proof-of-purchase capability. Since launching in late February 2017, our product code recognition platform has fueled more than a dozen promotions and resulted in over 180,000 scanned codes; it is now a core component for all of Coca-Cola North America's web-based promotions.

      Moving to an AI-enabled product-code recognition platform has been valuable for two key reasons:

      • Frictionless proof-of-purchase was enabled in a timely fashion, corresponding to our overall move to a mobile-first marketing platform.
      • Coke saved millions of dollars by avoiding the requirement to update printers in our production lines to support higher-fidelity fonts that would work with existing off-the-shelf OCR software.

      Our product-code recognition platform is the first execution of new AI-enabled capabilities at scale within Coca-Cola. We're now exploring AI applications across multiple lines of business, from new product development to ecommerce retail optimization.