Tag Archives: machine learning

Privacy Considerations in Large Language Models

Machine learning-based language models trained to predict the next word in a sentence have become increasingly capable, common, and useful, leading to groundbreaking improvements in applications like question-answering, translation, and more. But as language models continue to advance, new and unexpected risks can be exposed, requiring the research community to proactively work to develop new ways to mitigate potential problems.

One such risk is the potential for models to leak details from the data on which they’re trained. While this may be a concern for all large language models, additional issues may arise if a model trained on private data were to be made publicly available. Because these datasets can be large (hundreds of gigabytes) and pull from a range of sources, they can sometimes contain sensitive data, including personally identifiable information (PII) — names, phone numbers, addresses, etc., even if trained on public data. This raises the possibility that a model trained using such data could reflect some of these private details in its output. It is therefore important to identify and minimize the risks of such leaks, and to develop strategies to address the issue for future models.

If one prompts the GPT-2 language model with the prefix “East Stroudsburg Stroudsburg...”, it will autocomplete a long block of text that contains the full name, phone number, email address, and physical address of a particular person whose information was included in GPT-2’s training data.

In “Extracting Training Data from Large Language Models”, a collaboration with OpenAI, Apple, Stanford, Berkeley, and Northeastern University, we demonstrate that, given only the ability to query a pre-trained language model, it is possible to extract specific pieces of training data that the model has memorized. As such, training data extraction attacks are realistic threats on state-of-the-art large language models. This research represents an early, critical step intended to inform researchers about this class of vulnerabilities, so that they may take steps to mitigate these weaknesses.

Ethics of Language Model Attacks
A training data extraction attack has the greatest potential for harm when applied to a model that is available to the public, but for which the dataset used in training is not. However, since conducting this research on such a dataset could have harmful consequences, we instead mount a proof of concept training data extraction attack on GPT-2, a large, publicly available language model developed by OpenAI, that was trained using only public data. While this work focuses on GPT-2 specifically, the results apply to understanding what privacy threats are possible on large language models generally.

As with other privacy- and security-related research, it is important to consider the ethics of such attacks before actually performing them. To minimize potential risk, the training data extraction attack in this work was developed using publicly available data. Furthermore, the GPT-2 model itself was made public by OpenAI in 2019, and the training data used to train GPT-2 was collected from the public internet and is available for download by anyone who follows the data collection process documented in the GPT-2 paper.

Additionally, in accordance with responsible computer security disclosure norms, we followed up with individuals whose PII was extracted, and secured their permission before including references to this data in publication. Further, in all publications of this work, we have redacted any personally identifying information that may identify individuals. We have also worked closely with OpenAI in the analysis of GPT-2.

The Training Data Extraction Attack
By design, language models make it very easy to generate a large amount of output data. By seeding the model with random short phrases, the model can generate millions of continuations, i.e., probable phrases that complete the sentence. Most of the time, these continuations will be benign strings of sensible text. For example, when asked to predict the continuation of the string “Mary had a little…”, a language model will have high confidence that the next token is the word “lamb”. However, if one particular training document happened to repeat the string “Mary had a little wombat” many times, the model might predict that phrase instead.

The goal of a training data extraction attack is then to sift through the millions of output sequences from the language model and predict which text is memorized. To accomplish this, our approach leverages the fact that models tend to be more confident on results captured directly from their training data. These membership inference attacks enable us to predict if a result was used in the training data by checking the confidence of the model on a particular sequence.

The main technical contribution of this work is the development of a method for inferring membership with high accuracy along with techniques for sampling from models in a way that encourages the output of memorized content. We tested a number of different sampling strategies, the most successful of which generates text conditioned on a wide variety of input phrases. We then compare the output of two different language models. When one model has high confidence in a sequence, but the other (equally accurate) model has low confidence in a sequence, it's likely that the first model has memorized the data.
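
To make the comparison concrete, below is a minimal sketch of this confidence-ratio idea, assuming the Hugging Face transformers library and two public GPT-2 checkpoints; the model choices and thresholding are illustrative assumptions, not the exact pipeline used in the paper.

```python
# A minimal sketch of the confidence-comparison idea, assuming the Hugging Face
# transformers library and two publicly available GPT-2 checkpoints.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

def perplexity(model, tokenizer, text):
    """Average per-token perplexity of `text` under `model` (lower = more confident)."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels makes the model return the mean cross-entropy loss.
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
large = GPT2LMHeadModel.from_pretrained("gpt2-xl").eval()
small = GPT2LMHeadModel.from_pretrained("gpt2").eval()

candidate = "Mary had a little lamb, its fleece was white as snow."
ratio = perplexity(small, tokenizer, candidate) / perplexity(large, tokenizer, candidate)
# A large ratio means the big model is much more confident than the smaller
# reference model on this exact string -- a signal that it may be memorized.
print(f"confidence ratio: {ratio:.2f}")
```

In practice, several such signals and sampling strategies are combined before candidates are manually verified; the ratio above is just the core idea.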

Results
Out of 1800 candidate sequences from the GPT-2 language model, we extracted over 600 that were memorized from the public training data, with the total number limited by the need for manual verification. The memorized examples cover a wide range of content, including news headlines, log messages, JavaScript code, PII, and more. Many of these examples are memorized even though they appear infrequently in the training dataset. For example, many of the PII samples we extract are found in only a single document in the dataset. However, in most of these cases, the originating document contains multiple instances of the PII, and as a result, the model still learns it as high likelihood text.

Finally, we also find that the larger the language model, the more easily it memorizes training data. For example, in one experiment we find that the 1.5 billion parameter GPT-2 XL model memorizes 10 times more information than the 124 million parameter GPT-2 Small model. Given that the research community has already trained models 10 to 100 times larger, this means that as time goes by, more work will be required to monitor and mitigate this problem in increasingly large language models.

Lessons
While we demonstrate these attacks on GPT-2 specifically, they show potential flaws in all large generative language models. The fact that these attacks are possible has important consequences for the future of machine learning research using these types of models.

Fortunately, there are several ways to mitigate this issue. The most straightforward solution is to ensure that models do not train on any potentially problematic data. But this can be difficult to do in practice.

The use of differential privacy, which allows training on a dataset without revealing any details of individual training examples, is one of the most principled techniques for training machine learning models with privacy. In TensorFlow, this can be achieved with the tensorflow/privacy module (or similar libraries for PyTorch or JAX), which provides drop-in replacements for existing optimizers. However, even this has limitations and won’t prevent memorization of content that is repeated often enough. If differentially private training is not possible, we recommend at least measuring how much memorization occurs so appropriate action can be taken.
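
As a rough illustration, here is a minimal sketch of differentially private training with the tensorflow_privacy package’s Keras optimizer; the model, data, and hyperparameter values are placeholders rather than a recommended configuration.

```python
# A minimal sketch of differentially private training in TensorFlow, assuming the
# tensorflow_privacy package; model and hyperparameter values are placeholders.
import tensorflow as tf
from tensorflow_privacy.privacy.optimizers.dp_optimizer_keras import DPKerasSGDOptimizer

model = tf.keras.Sequential([
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10),
])

# Drop-in replacement for a regular SGD optimizer: per-example gradients are
# clipped and noised before being applied, which bounds what any single
# training example can contribute to the learned weights.
optimizer = DPKerasSGDOptimizer(
    l2_norm_clip=1.0,        # max L2 norm of per-example gradients
    noise_multiplier=1.1,    # amount of Gaussian noise added to clipped gradients
    num_microbatches=32,     # must evenly divide the batch size
    learning_rate=0.05,
)

# The loss must be computed per example (no reduction) so the optimizer can
# clip each example's gradient individually.
loss = tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=True, reduction=tf.losses.Reduction.NONE)

model.compile(optimizer=optimizer, loss=loss, metrics=["accuracy"])
# model.fit(train_x, train_y, batch_size=256, epochs=5)
```

Note that tighter privacy (more noise, a smaller clip norm) generally trades off against model accuracy, which is part of why measuring memorization remains useful.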

Language models continue to demonstrate great utility and flexibility—yet, like all innovations, they can also pose risks. Developing them responsibly means proactively identifying those risks and developing ways to mitigate them. We hope that this effort to highlight current weaknesses in large language modeling will raise awareness of this challenge in the broader machine learning community and motivate researchers to continue to develop effective techniques to train models with reduced memorization.

Acknowledgements
This work was performed jointly with Florian Tramer, Eric Wallace, Matthew Jagielski, Ariel Herbert-Voss, Katherine Lee, Adam Roberts, Tom Brown, Dawn Song, Ulfar Erlingsson, Alina Oprea, and Colin Raffel.

Source: Google AI Blog


Announcing the Newest Addition to MLKit: Entity Extraction

Posted by Kenny Sulaimon, Product Manager, ML Kit; Tory Voight, Product Manager, ML Kit; Daniel Furlong and Lei Yu, Software Engineers, ML Kit; and Dong Chen, Technical Lead, ML Kit

Six months ago, we introduced the standalone version of the ML Kit SDK, making it even easier to integrate on-device machine learning into mobile apps. Since then, we’ve launched the Digital Ink Recognition and Pose Detection APIs, and also introduced the ML Kit early access program. Today we are excited to add Entity Extraction to the official ML Kit lineup and also debut a new API for our early access program, Selfie Segmentation!

Overview of ML Kit's Entity Extraction API

With ML Kit’s Entity Extraction API, you can now improve the user experience inside your app by understanding text and performing specific actions on it.

The Entity Extraction API allows you to detect and locate entities from raw text, and take action based on those entities. The API works on static text and also in real-time while a user is typing. It supports 11 different entities and 15 different languages (with more coming in the future) to allow developers to make any text interaction a richer experience for the user.

Supported Entities

  • Address (350 third street, cambridge)
  • Date-Time* (12/12/2020, tomorrow at 3pm, let's meet tomorrow at 6pm)
  • Email ([email protected])
  • Flight Number* (LX37)
  • IBAN* (CH52 0483 0000 0000 0000 9)
  • ISBN* (978-1101904190)
  • Money (including currency)* ($12, 25USD)
  • Payment Card* (4111 1111 1111 1111)
  • Phone Number ((555) 225-3556, 12345)
  • Tracking Number* (1Z204E380338943508)
  • URL (www.google.com, https://en.wikipedia.org/wiki/Platypus, seznam.cz)
Example Results
(Table of input text and detected entities)

Real World Applications

(Two phones showing TamTam using the Entity Extraction API; images courtesy of TamTam)

Our early access partner, TamTam, has been using the Entity Extraction API to provide helpful suggestions to their users during their chat conversations. This feature allows users to quickly perform actions based on the context of their conversations.

While integrating this API, Iurii Dorofeev, Head of TamTam Android Development, mentioned, “We appreciated the ease of integration of the ML Kit ... and it works offline. Clustering the content of messages right on the device allowed us to save resources. ML Kit capabilities will help us develop other features for TamTam messenger in the future.”

Check out their messaging app on Google Play and the App Store today.

Under The Hood

(Diagram of underlying Text Classifier API)

ML Kit’s Entity Extraction API builds upon the technology powering the Smart Linkify feature in Android 10+ to deliver an easy-to-use and streamlined experience for developers. For an in-depth review of the Text Classifier API, please see our blog post here.

The neural network annotators/models in the Entity Extraction API work as follows: A given input text is first split into words (based on space separation), then all possible word subsequences of a certain maximum length (15 words in the example above) are generated, and for each candidate, the scoring neural net assigns a value (between 0 and 1) based on whether it represents a valid entity.

Next, the generated entities that overlap are removed, favoring the ones with a higher score over the conflicting ones with a lower score. Then a second neural network is used to classify the type of the entity as a phone number, an address, or in some cases, a non-entity.
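
To make the two steps above concrete, here is a toy Python sketch of the candidate generation and overlap resolution. The scoring function is a stand-in for the neural scorer, and this is not the ML Kit SDK itself (which is exposed through Android and iOS APIs).

```python
# A toy sketch of the candidate generation + overlap resolution described above.
# `score_candidate` stands in for the scoring neural network; it is not part of ML Kit.
from typing import Callable, List, Tuple

MAX_SUBSEQUENCE_WORDS = 15

def generate_candidates(text: str) -> List[Tuple[int, int, str]]:
    """All word subsequences up to the maximum length, as (start, end, phrase)."""
    words = text.split(" ")
    candidates = []
    for start in range(len(words)):
        for end in range(start + 1, min(start + MAX_SUBSEQUENCE_WORDS, len(words)) + 1):
            candidates.append((start, end, " ".join(words[start:end])))
    return candidates

def extract_entities(text: str, score_candidate: Callable[[str], float],
                     threshold: float = 0.5) -> List[Tuple[int, int, str, float]]:
    """Keep high-scoring candidates, dropping lower-scoring ones that overlap them."""
    scored = [(s, e, phrase, score_candidate(phrase))
              for s, e, phrase in generate_candidates(text)]
    scored = [c for c in scored if c[3] >= threshold]
    scored.sort(key=lambda c: c[3], reverse=True)  # highest score wins conflicts
    selected, taken = [], set()
    for start, end, phrase, score in scored:
        span = set(range(start, end))
        if span & taken:
            continue  # overlaps an already-selected, higher-scoring candidate
        taken |= span
        selected.append((start, end, phrase, score))
    return selected
```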

The neural network models in the Entity Extraction API are used in combination with other types of models (e.g. rule-based) to identify additional entities in text, such as: flight numbers, currencies and other examples listed above. Therefore, if multiple entities are detected for one text input, the Entity Extraction API can return several overlapping results.

Lastly, ML Kit will automatically download the required language-specific models to the device dynamically. You can also explicitly manage models you want available on the device by using ML Kit’s model management API. This can be useful if you want to download models ahead of time for your users. The API also allows you to delete models that are no longer required.

New APIs Coming Soon

Selfie Segmentation

With the increased usage of selfie cameras and webcams, being able to quickly and easily add effects to camera experiences has become a necessity for many app developers.

ML Kit's Selfie Segmentation API allows developers to easily separate the background from a scene and focus on what matters. Adding cool effects to selfies or inserting your users into interesting background environments has never been easier. This API produces great results with low latency on both Android and iOS devices.

(Example of ML Kit Selfie Segmentation)

Key capabilities:

  • Easily allows developers to replace or blur a user’s background
  • Works well with single or multiple people
  • Cross-platform support (iOS and Android)
  • Runs real-time on most modern phones

To join our early access program and request access to ML Kit's Selfie Segmentation API, please fill out this form.

Using AutoML for Time Series Forecasting

Time series forecasting is an important research area for machine learning (ML), particularly in industries where accurate forecasting is critical, such as retail, supply chain, energy, and finance. For example, in the consumer goods domain, improving the accuracy of demand forecasting by 10-20% can reduce inventory by 5% and increase revenue by 2-3%. Current ML-based forecasting solutions are usually built by experts and require significant manual effort, including model construction, feature engineering, and hyper-parameter tuning. However, such expertise may not be broadly available, which can limit the benefits of applying ML to time series forecasting challenges.

To address this, automated machine learning (AutoML) is an approach that makes ML more widely accessible by automating the process of creating ML models, and has recently accelerated both ML research and the application of ML to real-world problems. For example, the initial work on neural architecture search enabled breakthroughs in computer vision, such as NasNet, AmoebaNet, and EfficientNet, and in natural language processing, such as Evolved Transformer. More recently, AutoML has also been applied to tabular data.

Today we introduce a scalable end-to-end AutoML solution for time series forecasting, which meets three key criteria:

  • Fully automated: The solution takes in data as input, and produces a servable TensorFlow model as output with no human intervention.
  • Generic: The solution works for most time series forecasting tasks and automatically searches for the best model configuration for each task.
  • High-quality: The produced models have competitive quality compared to those manually crafted for specific tasks.

We demonstrate the success of this approach through participation in the M5 forecasting competition, where this AutoML solution achieved competitive performance against hand-crafted models with moderate compute cost.

Challenges in Time Series Forecasting
Time series forecasting presents several challenges to machine learning models. First, the uncertainty is often high since the goal is to predict the future based on historical data. Unlike other machine learning problems, the test set, for example, future product sales, might have a different distribution from the training and validation set, which are extracted from the historical data. Second, the time series data from the real world often suffers from missing data and high intermittency (i.e., when a high fraction of the time series has the value of zero). Some time series tasks may not have historical data available and suffer from the cold start problem, for example, when predicting the sales of a new product. Third, since we aim to build a fully automated generic solution, the same solution needs to apply to a variety of datasets, which can vary significantly in the domain (product sales, web traffic, etc), the granularity (daily, hourly, etc), the history length, the types of features (categorical, numerical, date time, etc), and so on.

An AutoML Solution
To tackle these challenges, we designed an end-to-end TensorFlow pipeline with a specialized search space for time series forecasting. It is based on an encoder-decoder architecture, in which an encoder transforms the historical information in a time series into a set of vectors, and a decoder generates the future predictions based on these vectors. Inspired by the state-of-the-art sequence models, such as Transformer and WaveNet, and best practices in time series forecasting, our search space included components such as attention, dilated convolution, gating, skip connections, and different feature transformations. The resulting AutoML solution searches for the best combination of these components as well as core hyperparameters.

To combat the uncertainty in predicting the future of a time series, an ensemble of the top models discovered in the search is used to make final predictions. The diversity in the top models made the predictions more robust to uncertainty and less prone to overfitting the historical data. To handle time series with missing data, we fill in the gaps with a trainable vector and let the model learn to adapt to the missing time steps. To address intermittency, we predict, for each future time step, not only the value, but also the probability that the value at this time step is non-zero, and combine the two predictions. Finally, we found that the automated search is able to adjust the architecture and hyperparameter choices for different datasets, which makes the AutoML solution generic and automates the modeling efforts.
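
As a simplified illustration of the intermittency handling, the sketch below shows a Keras output head that predicts both a value and the probability that the value is non-zero, and combines them into the final forecast. The layer sizes and loss weights are assumptions for illustration, not the configuration found by the AutoML search.

```python
# A minimal sketch of the intermittency handling described above: one head predicts
# the value, another predicts P(value != 0), and the forecast is their product.
import tensorflow as tf

def build_forecast_head(encoder_output_dim: int = 64) -> tf.keras.Model:
    decoded = tf.keras.Input(shape=(encoder_output_dim,), name="decoder_state")
    value = tf.keras.layers.Dense(1, name="value")(decoded)            # point forecast
    p_nonzero = tf.keras.layers.Dense(1, activation="sigmoid",
                                      name="p_nonzero")(decoded)       # P(value != 0)
    # The final forecast is the expected value: P(non-zero) * predicted value.
    forecast = tf.keras.layers.Multiply(name="forecast")([value, p_nonzero])
    return tf.keras.Model(decoded, [forecast, p_nonzero])

head = build_forecast_head()
head.compile(
    optimizer="adam",
    loss={"forecast": "mse", "p_nonzero": "binary_crossentropy"},
    loss_weights={"forecast": 1.0, "p_nonzero": 0.2},
)
```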

Benchmarking in Forecasting Competitions
To benchmark our AutoML solution, we participated in the M5 forecasting competition, the latest in the M-competition series, which is one of the most important competitions in the forecasting community, with a long history spanning nearly 40 years. This most recent competition was hosted on Kaggle and used a dataset from Walmart product sales, the real-world nature of which makes the problem quite challenging.

We participated in the competition with our fully automated solution and achieved a rank of 138 out of 5558 participants (top 2.5%) on the final leaderboard, which is in the silver medal zone. Participants in the competition had almost four months to produce their models. While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.

We also benchmarked our AutoML forecasting solution on several other Kaggle datasets and found that on average it outperforms 92% of hand-crafted models, despite its limited resource use.

Evaluation of the AutoML Forecasting solution on other Kaggle Datasets (Rossman Store Sales, Web Traffic, Favorita Grocery Sales) besides M5.

This work demonstrates the strength of an end-to-end AutoML solution for time series forecasting, and we are excited about its potential impact on real-world applications.

Acknowledgements
This project was a joint effort of Google Brain team members Chen Liang, Da Huang, Yifeng Lu and Quoc V. Le. We also thank Junwei Yuan, Xingwei Yang, Dawei Jia, Chenyu Zhao, Tin-yun Ho, Meng Wang, Yaguang Li, Nicolas Loeff, Manish Kurse, Kyle Anderson and Nishant Patil for their collaboration.

Source: Google AI Blog


Transformers for Image Recognition at Scale

While convolutional neural networks (CNNs) have been used in computer vision since the 1980s, they were not at the forefront until 2012 when AlexNet surpassed the performance of contemporary state-of-the-art image recognition methods by a large margin. Two factors helped enable this breakthrough: (i) the availability of training sets like ImageNet, and (ii) the use of commoditized GPU hardware, which provided significantly more compute for training. As such, since 2012, CNNs have become the go-to model for vision tasks.

The benefit of using CNNs was that they avoided the need for hand-designed visual features, instead learning to perform tasks directly from data “end to end”. However, while CNNs avoid hand-crafted feature-extraction, the architecture itself is designed specifically for images and can be computationally demanding. Looking forward to the next generation of scalable vision models, one might ask whether this domain-specific design is necessary, or if one could successfully leverage more domain agnostic and computationally efficient architectures to achieve state-of-the-art results.

As a first step in this direction, we present the Vision Transformer (ViT), a vision model based as closely as possible on the Transformer architecture originally designed for text-based tasks. ViT represents an input image as a sequence of image patches, similar to the sequence of word embeddings used when applying Transformers to text, and directly predicts class labels for the image. ViT demonstrates excellent performance when trained on sufficient data, outperforming a comparable state-of-the-art CNN with four times fewer computational resources. To foster additional research in this area, we have open-sourced both the code and models.

The Vision Transformer treats an input image as a sequence of patches, akin to a series of word embeddings generated by a natural language processing (NLP) Transformer.

The Vision Transformer
The original text Transformer takes as input a sequence of words, which it then uses for classification, translation, or other NLP tasks. For ViT, we make the fewest possible modifications to the Transformer design to make it operate directly on images instead of words, and observe how much about image structure the model can learn on its own.

ViT divides an image into a grid of square patches. Each patch is flattened into a single vector by concatenating the channels of all pixels in a patch and then linearly projecting it to the desired input dimension. Because Transformers are agnostic to the structure of the input elements, we add learnable position embeddings to each patch, which allow the model to learn about the structure of the images. A priori, ViT does not know about the relative location of patches in the image, or even that the image has a 2D structure — it must learn such relevant information from the training data and encode structural information in the position embeddings.
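
The sketch below illustrates this patch-embedding step in TensorFlow; the patch size, hidden dimension, and initialization are illustrative values, and details such as the class token are omitted.

```python
# A minimal sketch of the ViT patch-embedding step: split into patches, flatten,
# linearly project, and add learnable position embeddings. Values are illustrative.
import tensorflow as tf

def embed_patches(images: tf.Tensor, patch_size: int = 16, hidden_dim: int = 768):
    """images: [batch, height, width, channels] -> [batch, num_patches, hidden_dim]."""
    # Extract non-overlapping patch_size x patch_size patches.
    patches = tf.image.extract_patches(
        images=images,
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )
    num_patches = patches.shape[1] * patches.shape[2]
    patch_dim = patches.shape[-1]  # patch_size * patch_size * channels
    patches = tf.reshape(patches, [-1, num_patches, patch_dim])

    # Created inline for brevity; in a real model these would be layer attributes.
    projection = tf.keras.layers.Dense(hidden_dim)  # linear projection of flat patches
    position_embedding = tf.Variable(               # learnable position embeddings
        tf.random.truncated_normal([1, num_patches, hidden_dim], stddev=0.02))
    return projection(patches) + position_embedding

tokens = embed_patches(tf.random.normal([2, 224, 224, 3]))  # -> [2, 196, 768]
```

In the full model, a learnable classification token is prepended to this sequence and the result is fed to a standard Transformer encoder.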

Scaling Up

We first train ViT on ImageNet, where it achieves a best score of 77.9% top-1 accuracy. While this is decent for a first attempt, it falls far short of the state of the art — the current best CNN trained on ImageNet with no extra data reaches 85.8%. Despite mitigation strategies (e.g., regularization), ViT overfits the ImageNet task due to its lack of inbuilt knowledge about images.

To investigate the impact of dataset size on model performance, we train ViT on ImageNet-21k (14M images, 21k classes) and JFT (300M images, 18k classes), and compare the results to a state-of-the-art CNN, Big Transfer (BiT), trained on the same datasets. As previously observed, ViT performs significantly worse than the CNN equivalent (BiT) when trained on ImageNet (1M images). However, on ImageNet-21k (14M images) performance is comparable, and on JFT (300M images), ViT now outperforms BiT.

Finally, we investigate the impact of the amount of computation involved in training the models. For this, we train several different ViT models and CNNs on JFT. These models span a range of model sizes and training durations. As a result, they require varying amounts of compute for training. We observe that, for a given amount of compute, ViT yields better performance than the equivalent CNNs.

Left: Performance of ViT when pre-trained on different datasets. Right: ViT yields a good performance/compute trade-off.

High-Performing Large-Scale Image Recognition
Our data suggest that (1) with sufficient training ViT can perform very well, and (2) ViT yields an excellent performance/compute trade-off at both smaller and larger compute scales. Therefore, to see if performance improvements carried over to even larger scales, we trained a 600M-parameter ViT model.

This large ViT model attains state-of-the-art performance on multiple popular benchmarks, including 88.55% top-1 accuracy on ImageNet and 99.50% on CIFAR-10. ViT also performs well on the cleaned-up version of the ImageNet evaluation set, “ImageNet-ReaL”, attaining 90.72% top-1 accuracy. Finally, ViT works well on diverse tasks, even with few training data points. For example, on the VTAB-1k suite (19 tasks with 1,000 data points each), ViT attains 77.63%, significantly ahead of the single-model state of the art (SOTA) (76.3%), and even matching SOTA attained by an ensemble of multiple models (77.6%). Most importantly, these results are obtained using fewer compute resources compared to previous SOTA CNNs, e.g., 4x fewer than the pre-trained BiT models.

Vision Transformer matches or outperforms state-of-the-art CNNs on popular benchmarks. Left: Popular image classification tasks (ImageNet, including new validation labels ReaL, and CIFAR, Pets, and Flowers). Right: Average across 19 tasks in the VTAB classification suite.

Visualizations
To gain some intuition into what the model learns, we visualize some of its internal workings. First, we look at the position embeddings — parameters that the model learns to encode the relative location of patches — and find that ViT is able to reproduce an intuitive image structure. Each position embedding is most similar to others in the same row and column, indicating that the model has recovered the grid structure of the original images. Second, we examine the average spatial distance between one element attending to another for each transformer block. At higher layers (depths of 10-20) only global features are used (i.e., large attention distances), but the lower layers (depths 0-5) capture both global and local features, as indicated by a large range in the mean attention distance. By contrast, only local features are present in the lower layers of a CNN. These experiments indicate that ViT can learn features hard-coded into CNNs (such as awareness of grid structure), but is also free to learn more generic patterns, such as a mix of local and global features at lower layers, that can aid generalization.
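
For readers who want to reproduce the attention-distance analysis on their own models, here is a small NumPy sketch of the computation; the grid size and the uniform-attention example are illustrative assumptions.

```python
# A small sketch of the "mean attention distance" analysis described above, given an
# attention matrix from one head of one Transformer block. Values are illustrative.
import numpy as np

def mean_attention_distance(attention: np.ndarray, grid_size: int) -> float:
    """Average spatial distance between patches, weighted by attention.

    attention: [num_patches, num_patches] matrix, rows summing to 1, where
    attention[i, j] is how much patch i attends to patch j.
    """
    coords = np.array([(i // grid_size, i % grid_size)
                       for i in range(grid_size * grid_size)], dtype=np.float32)
    # Pairwise Euclidean distance between patch positions, in units of patches.
    distances = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Weight each distance by how much attention flows along it, then average.
    return float(np.mean(np.sum(attention * distances, axis=-1)))

# Example: uniform attention over a 14x14 grid gives a large mean distance,
# while attention concentrated on neighboring patches gives a small one.
uniform = np.full((196, 196), 1.0 / 196)
print(mean_attention_distance(uniform, grid_size=14))
```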

Left: ViT learns the grid-like structure of the image patches via its position embeddings. Right: The lower layers of ViT contain both global and local features, while the higher layers contain only global features.

Summary
While CNNs have revolutionized computer vision, our results indicate that models tailor-made for imaging tasks may be unnecessary, or even sub-optimal. With ever-increasing dataset sizes, and the continued development of unsupervised and semi-supervised methods, the development of new vision architectures that train more efficiently on these datasets becomes increasingly important. We believe ViT is a preliminary step towards generic, scalable architectures that can solve many vision tasks, or even tasks from many domains, and are excited for future developments.

A preprint of our work as well as code and models are publicly available.

Acknowledgements
We would like to thank our co-authors in Berlin, Zürich, and Amsterdam: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, and Jakob Uszkoreit. We would like to thank Andreas Steiner for crucial help with infrastructure and open-sourcing, Joan Puigcerver and Maxim Neumann for work on large-scale training infrastructure, and Dmitry Lepikhin, Aravindh Mahendran, Daniel Keysers, Mario Lučić, Noam Shazeer, and Colin Raffel for useful discussions. Finally, we thank Tom Small for creating the Visual Transformer animation in this post.

Source: Google AI Blog


From MLPerf to MLCommons: moving machine learning forward

Today, the community of machine learning researchers and engineers behind the MLPerf benchmark is launching an open engineering consortium called MLCommons. For us, this is the next step in a journey that started almost three years ago.


Early in 2018, we gathered a group of industry researchers and academics who had published work on benchmarking machine learning (ML), in a conference room to propose the creation of an industry standard benchmark to measure ML performance. Everyone had doubts: creating an industry standard is challenging under the best conditions and ML was (and is) a poorly understood stochastic process running on extremely diverse software and hardware. Yet, we all agreed to try.

Together, along with a growing community of researchers and academics, we created a new benchmark called MLPerf. The effort took off. MLPerf is now an industry standard with over 2,000 submitted results and multiple benchmark suites that span systems from smartphones to supercomputers. Over that time, the fastest result submitted to MLPerf for training the classic ML network ResNet improved by over 13x.

We created MLPerf because we believed in three principles:
  • Machine learning has tremendous potential: Already, machine learning helps billions of people find and understand information through tools like Google’s search engine and translation service. Active research in machine learning could one day save millions of lives through improvements in healthcare and automotive safety.
  • Transforming machine learning from promising research into widespread industrial practice requires investment in common infrastructure -- especially metrics: Much like computing in the ’80s, real innovation is mixed with hype, and adopting new ideas is slow and cumbersome. We need good metrics to identify the best ideas, and good infrastructure to make adoption of new techniques fast and easy.
  • Developing common infrastructure is best done by an open, fast-moving collaboration: We need the vision of academics and the resources of industry. We need the agility of startups and the scale of leading tech companies. Working together, a diverse community can develop new ideas, launch experiments, and rapidly iterate to arrive at shared solutions.
Our belief in the principles behind MLPerf has only gotten stronger, and we are excited to be part of the next step for the MLPerf community with the launch of MLCommons.

MLCommons aims to accelerate machine learning to benefit everyone. MLCommons will build a common set of tools for ML practitioners, including:
  • Benchmarks to measure progress: MLCommons will leverage MLPerf to measure speed, but also expand benchmarking other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and consequently cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
  • Public datasets to fuel research: MLCommons’ new People’s Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude, better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet’s impact on the field of computer vision.
  • Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons’ MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmarking), develop, and ultimately deploy.
Google believes in the potential of machine learning, the importance of common infrastructure, and the power of open, collaborative development. Our leadership in co-founding, and deep support in sustaining, MLPerf and MLCommons has echoed our involvement in other efforts like TensorFlow and NNAPI. Together with the MLCommons community, we can improve machine learning to benefit everyone.

Want to get involved? Learn more at mlcommons.org.


By Peter Mattson – ML Metrics, Naveen Kumar – ML Performance, and Cliff Young – Google Brain

Aligning thousands of Billie Eilish covers in an infinite music video experiment

Posted by Google Creative Lab


“Bad Guy” by Billie Eilish is one of the most-covered songs on YouTube, inspiring thousands of fans to upload their own versions. To celebrate all these covers, YouTube and Google Creative Lab built an AI experiment to combine all of them seamlessly in the world’s first infinite music video: Infinite Bad Guy. The experience aligns every cover to the same beat, no matter its genre, language, or instrumentation.

Finding all the covers

How do you find “Bad Guy” covers amidst all the billions of videos on YouTube? Just searching for “Bad Guy” would result in false positives, like videos of Billie being interviewed about the song, or miss covers that didn’t use the song name in their titles. YouTube’s ContentID system allows us to find videos that match the musical composition “Bad Guy” and also allows us to narrow our search to videos that appear to be performances or creative interpretations of the song. That way, we can also avoid videos where “Bad Guy” was just background music. We continue to run this search daily, collecting an ever-expanding list of potential covers to use in the experience.


Aligning all the covers to the same beat

A key part of the experience is being able to jump from cover to cover seamlessly. But fan covers of “Bad Guy” vary widely. Some might be similar to the original, like a dance video set to Billie’s track. Some might vary more in tempo and instrumentation, like a heavy metal cover. And others might diverge greatly from the original, like a clarinet version with no lyrics. How can you get all these covers on the same beat? After trying several approaches like dynamic time warping and chord recognition, we’ve found the most success with a recurrent neural network trained to recognize sections and beats of “Bad Guy.” We collaborated with our friends at IYOYO on cover alignment and they have a great writeup about the process.


Building the experience

Finding and aligning the covers is a fascinating research problem, but the crucial final step is making them explorable to everyone. We’ve tried to make it intuitive and fun to navigate all the infinite combinations, while keeping latency low so the song never drops a beat.

The experience centers around three YouTube players, a number we settled on after a lot of experimentation. Initially we thought more players would be more interesting, but the experience got chaotic and slow. Around the players we’ve added discoverable features like the hashtag drawer and stats page. Video game interfaces have been a big inspiration for us, as they combine multiple interactions in a single dashboard. We’ve also added an autoplay mode for users who want to just sit back and be taken through an ever-changing mix of covers.

We’re excited about how Infinite Bad Guy showcases the incredibly diverse talent of YouTube and the potential machine learning can have for music and creativity. Give it a try and see what beautiful, strange, and brilliant covers you can find.

Irem from Turkey shares her groundbreaking work in TensorFlow and advice for the community

Posted by Jennifer Kohl, Global Program Manager, Google Developer Groups

Irem presenting at a Google Developer Group event

We recently caught up with Irem Komurcu, a TensorFlow developer and researcher at Istanbul Technical University in Turkey. Irem has been a long-serving member of Google Developer Groups (GDG) Düzce and also serves as a Women Techmakers (WTM) ambassador. Her work with TensorFlow has received several accolades, including being named a Hamdi Ulukaya Girişimi fellow. As one of twenty-four young entrepreneurs selected, she was flown to New York City last year to learn more about business and receive professional development.

With all this experience to share, we wanted you to hear how she approaches pursuing a career in tech, hones her TensorFlow skills with the GDG community, and thinks about how upcoming programmers can best position themselves for success. Check out the full interview below for more.

What inspired you to pursue a career in technology?

I first became interested in tech when I was in high school and went on to study computer engineering. At university, I had an eye-opening experience when I traveled from Turkey to the Google Developer Day event in India. It was here where I observed various code languages, products, and projects that were new to me.

In particular, I saw TensorFlow in action for the first time. Watching the powerful machine learning tool truly sparked my interest in deep learning and project development.

Can you describe your work with TensorFlow and Machine Learning?

I have studied many different aspects of TensorFlow and ML. My first work was on voice recognition and deep learning. However, I am now working as a computer vision researcher conducting various segmentation, object detection, and classification processes with TensorFlow. In my free time, I write various articles about best practices and strategies to leverage TensorFlow in ML.

What has been a useful learning resource you have used in your career?

I kicked off my studies on deep learning on tensorflow.org. It’s a basic first step, but a powerful one. There were so many blogs, code samples, examples, and tutorials for me to dive into. Both the Google Developer Group and TensorFlow communities also offered chances to bounce questions and ideas off other developers as I learned.

Between these technical resources and the person-to-person support, I was lucky to start working with the GDG community while also taking the first steps of my career. There were so many opportunities to meet people and grow all around.

What is your favorite part of the Google Developer Group community?

I love being in a large community with technology-oriented people. GDG is a network of professionals who support each other, and that enables people to develop. I am continuously sharing my knowledge with other programmers as they simultaneously mentor me. The chance for us to collaborate together is truly fulfilling.

What is unique about being a developer in your country/region?

The number of women supported in science, technology, engineering, and mathematics (STEM) is low in Turkey. To address this, I partner with Women Techmakers (WTM) to give educational talks on TensorFlow and machine learning to women who want to learn how to code in my country. So many women are interested in ML, but just need a friendly, familiar face to help them get started. With WTM, I’ve already given over 30 talks to women in STEM.

What advice would you give to someone who is trying to grow their career as a developer?

Keep researching new things. Read everything you can get your eyes on. Technology has been developing rapidly, and it is necessary to make sure your mind can keep up with the pace. That’s why I recommend communities like GDG that help make sure you’re up to date on the newest trends and learnings.


Want to work with other developers like Irem? Then find the right Google Developer Group for you, here.

Passionate former DSC lead Irene inspires others to learn Google technologies with her new podcast and more

Posted by Erica Hanson, Global Program Manager, Google Developer Student Clubs

(Irene (left) and her DSC team from the Polytechnic University of Cartagena; photo taken prior to COVID-19)

Irene Ruiz Pozo is a former Google Developer Student Club (DSC) Lead at the Polytechnic University of Cartagena in Murcia, Spain. As one of the founding members, Irene has seen the club grow from just a few student developers at her university to hosting multiple learning events across Spain. Recently, we spoke with Irene to understand more about the unique ways in which her team helped local university students learn more about Google technologies.

Real world ML and AR learning opportunities

Irene mentioned two fascinating projects that she had the chance to work on through her DSC at the Polytechnic University of Cartagena. The first was a learning lab that helped students understand how to use 360º cameras and 3D scanners for machine learning.

(A DSC member giving a demo of a 360º camera to students at the National Museum of Underwater Archeology in Cartagena)

The second was a partnership with the National Museum of Underwater Archeology, where Irene and her team created an augmented reality game that let students explore a digital rendition of the museum’s exhibitions.

(An image from the augmented reality game created for the National Museum of Underwater Archeology)

In the above AR experience created by Irene’s team, users can create their own character and move throughout the museum and explore different virtual renditions of exhibits in a video game-like setting.

Hash Code competition and experiencing the Google work culture

One particularly memorable experience for Irene and her DSC was participating in Google’s annual programming competition, Hash Code. As Irene explained, the event allowed developers to share their skills and connect in small teams of two to four programmers. They would then come together to tackle engineering problems like how to best design the layout of a Google data center, create the perfect video streaming experience on YouTube, or establish the best practices for compiling code at Google scale.

(Students working on the Hash Code competition; photo taken prior to COVID-19)

To Irene, the experience felt like a live look at being a software engineer at Google. The event taught her and her DSC team that while programming skills are important, communication and collaboration skills are what really help solve problems. For Irene, the experience truly bridged the gap between theory and practice.

Expanding knowledge with a podcast for student developers

(Irene’s team working with other student developers; photo taken before COVID-19)

After the event, Irene felt that if a true mentorship network was established among other DSCs in Europe, students would feel more comfortable partnering with one another to talk about common problems they faced. Inspired, she began to build out her mentorship program which included a podcast where student developers could collaborate on projects together.

The podcast, which just released its second episode, also highlights upcoming opportunities for students. In the most recent episode, Irene and friends dive into how to apply for Google Summer of Code Scholarships and talk about other upcoming open source project opportunities. Organizing these types of learning experiences for the community was one of the most fulfilling parts of working as a DSC Lead, according to Irene. She explained that the podcast has been an exciting space that allows her and other students to get more experience presenting ideas to an audience. Through this podcast, Irene has already seen many new DSC members eager to join the conversation and collaborate on new ideas.

As Irene now looks out on her future, she is excited for all the learning and career development that awaits her from the entire Google Developer community. Having graduated from university, Irene is now a Google Developer Groups (GDG) Lead - a program similar to DSC, but created for the professional developer community. In this role, she is excited to learn new skills and make professional connections that will help her start her career.

Are you also a student with a passion for code? Then join a local Google Developer Student Club near you, here.

Coral makes edge AI even more accessible in 2020

Posted by the Coral team

Coral Dev Board Mini and Accelerator Module feature Google's Edge TPU co-processor to accelerate AI at the edge.

Since we launched Coral back in March 2019, we’ve added a number of new product form factors to accommodate the many ways users are adding on-device ML to their products. We've also streamlined the ML workflow and added capabilities like model pipelining with multiple Edge TPUs for an easier and more robust developer experience. And from this, we’ve helped enable amazing use cases from smart water meters that prevent water loss with Olea Edge, to systems for improving harvest yield with Farmwave, to noise cancellation in meetings in Google’s own Series One meeting kits.

This week, we’ll begin shipping the Coral Accelerator Module, a multi-chip module that combines the Edge TPU and its power circuitry into a solderable package. The module exposes PCIe and USB2 interfaces, which make it even easier to integrate Coral into custom designs. Several companies are already taking advantage of the compact size and capabilities with their new products coming to market. Read more about how Gumstix, STD, Siana Systems and IEI are using our module.

And in December, we’ll begin shipping the Dev Board Mini, a smaller, more power-efficient, and value-oriented board that brings forward a more traditional, flattened single-board computer design. The Dev Board Mini pairs a Mediatek 8167 SoC with the Coral Accelerator Module over USB 2 and is a great way to evaluate the module as the center of a project or deployment.

You can see the new Dev Board Mini and Accelerator Module in action in the latest episode of Level Up, where Markku Lepisto controls his studio lights with speech commands.

To get updates on when the board will be available for purchase and other Coral news, sign up for our newsletter.

Developing for the edge, now simplified

We recently announced a new version of the Coral ML APIs and tools. This release brings the C++ API into parity with Python and makes it more modular, reusable, and performant. At the same time, it eliminates unnecessary abstractions and surfaces, replacing them with native TensorFlow Lite APIs. This release also graduates the Model Pipelining API out of beta and introduces a new model partitioner that automatically partitions models based on profiling, delivering up to 10x better performance.

We’ve added a pre-trained version of MobileDet — a state-of-the-art object detection model for mobile systems — into our models portfolio. We’re migrating our model-development workflow to TensorFlow 2, and we’re including a handful of updated or new models based on the TF2 Keras framework. For details, check out the full announcement on the TensorFlow blog.

We’re also excited to see great developer tools coming from our ecosystem partners. For example, PerceptiLabs offers a visual API for building TensorFlow models and recently published a new demo which trains a machine learning model to identify sign language optimized for the edge with Coral.

The MRQ design from SigFox enables prototyping at the edge for low bandwidth IoT solutions with Coral

And SigFox released a radio transceiver board that stacks on either the Coral Dev Board or Dev Board Mini. This allows small data payloads to be transmitted across low power, long range radio networks for use cases like smart cities, fleet management, asset tracking, agriculture and energy. The PCB design will be offered as a free download on SigFox’s website. Google Cloud Solutions Architect Markku Lepisto will present the new design today, in the opening keynote at SigFox Connect.

Customers with a Coral edge

The tool, from Farmwave, includes custom-developed ML models, a harvester-mounted box with cameras, an in-cab display, and on-device AI acceleration from Coral.

Just in time for harvest, we wanted to share a story about how Farmwave is using Coral to improve the efficiency of farm equipment and reduce food waste. Traditional yield loss analysis involves hand-counting grains of corn left on the ground mid-harvest. It’s a time- and labor-intensive task, and not feasible for farmers who measure the value of their half-million-dollar combines in minutes spent running them.

By leveraging Coral’s on-device AI capabilities, Farmwave was able to build a system that automates the count while the machine is running, allowing farmers to make real-time adjustments to harvesting machines in response to conditions in the field, which can make a big difference in yield.

Kura Sushi designed their intelligent QA system using a Raspberry Pi paired with the Coral USB Accelerator

Kura Revolving Sushi Bar in Japan has always been committed to the highest standards of health and safety for its customers. Known for their tech forward approach, Kura has dabbled in sushi making robots, an automated prize machine called Bikkura-pon, and a patented dome-shaped dish cover, aptly dubbed Mr. Fresh. But most recently, Kura has used Coral to develop an AI powered system that not only facilitates efficiency for better customer experiences, but also enables better tracking to prevent foodborne illnesses.

Making AI more accessible

While this year has presented the world with many obstacles, we’ve been impressed by the new ideas and innovations coming forward through technology. By providing the necessary tools and technology for edge AI, we strive to empower society to create affordable, adaptable, and intelligent systems.

We are excited to share all that Coral has to offer as we evolve our platform. For a list of worldwide distributors, system integrators and partners, visit the Coral partnerships page.

Please visit Coral.ai to discover more about our edge ML platform and share your feedback at [email protected]. To receive future Coral updates directly in your inbox, sign up for our newsletter.
