Tag Archives: machine learning

Using AutoML for Time Series Forecasting

Time series forecasting is an important research area for machine learning (ML), particularly in industries where accurate forecasting is critical, such as retail, supply chain, energy, and finance. For example, in the consumer goods domain, improving the accuracy of demand forecasting by 10-20% can reduce inventory by 5% and increase revenue by 2-3%. Current ML-based forecasting solutions are usually built by experts and require significant manual effort, including model construction, feature engineering and hyper-parameter tuning. However, such expertise may not be broadly available, which can limit the benefits of applying ML towards time series forecasting challenges.

To address this, automated machine learning (AutoML) is an approach that makes ML more widely accessible by automating the process of creating ML models, and has recently accelerated both ML research and the application of ML to real-world problems. For example, the initial work on neural architecture search enabled breakthroughs in computer vision, such as NasNet, AmoebaNet, and EfficientNet, and in natural language processing, such as Evolved Transformer. More recently, AutoML has also been applied to tabular data.

Today we introduce a scalable end-to-end AutoML solution for time series forecasting, which meets three key criteria:

  • Fully automated: The solution takes in data as input, and produces a servable TensorFlow model as output with no human intervention.
  • Generic: The solution works for most time series forecasting tasks and automatically searches for the best model configuration for each task.
  • High-quality: The produced models have competitive quality compared to those manually crafted for specific tasks.

We demonstrate the success of this approach through participation in the M5 forecasting competition, where this AutoML solution achieved competitive performance against hand-crafted models with moderate compute cost.

Challenges in Time Series Forecasting
Time series forecasting presents several challenges to machine learning models. First, the uncertainty is often high since the goal is to predict the future based on historical data. Unlike other machine learning problems, the test set, for example, future product sales, might have a different distribution from the training and validation set, which are extracted from the historical data. Second, the time series data from the real world often suffers from missing data and high intermittency (i.e., when a high fraction of the time series has the value of zero). Some time series tasks may not have historical data available and suffer from the cold start problem, for example, when predicting the sales of a new product. Third, since we aim to build a fully automated generic solution, the same solution needs to apply to a variety of datasets, which can vary significantly in the domain (product sales, web traffic, etc), the granularity (daily, hourly, etc), the history length, the types of features (categorical, numerical, date time, etc), and so on.

An AutoML Solution
To tackle these challenges, we designed an end-to-end TensorFlow pipeline with a specialized search space for time series forecasting. It is based on an encoder-decoder architecture, in which an encoder transforms the historical information in a time series into a set of vectors, and a decoder generates the future predictions based on these vectors. Inspired by the state-of-the-art sequence models, such as Transformer and WaveNet, and best practices in time series forecasting, our search space included components such as attention, dilated convolution, gating, skip connections, and different feature transformations. The resulting AutoML solution searches for the best combination of these components as well as core hyperparameters.

To combat the uncertainty in predicting the future of a time series, an ensemble of the top models discovered in the search is used to make final predictions. The diversity in the top models made the predictions more robust to uncertainty and less prone to overfitting the historical data. To handle time series with missing data, we fill in the gaps with a trainable vector and let the model learn to adapt to the missing time steps. To address intermittency, we predict, for each future time step, not only the value, but also the probability that the value at this time step is non-zero, and combine the two predictions. Finally, we found that the automated search is able to adjust the architecture and hyperparameter choices for different datasets, which makes the AutoML solution generic and automates the modeling efforts.
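The ensembling and intermittency handling described above can be illustrated with a small sketch. The post does not say how the value and the non-zero probability are combined, or how the top models are ensembled, so the choices here (multiplying the two predictions to get an expected value, and averaging across models) are assumptions made for illustration. The function names are hypothetical.

```python
from statistics import mean


def combine_intermittent(value_pred, nonzero_prob):
    """Combine a predicted value with the predicted probability of a non-zero value.

    Assumption: the combination is an expected value, i.e. the series is
    zero with probability (1 - nonzero_prob) and value_pred otherwise.
    """
    if not 0.0 <= nonzero_prob <= 1.0:
        raise ValueError("nonzero_prob must be in [0, 1]")
    return nonzero_prob * value_pred


def ensemble_forecast(model_forecasts):
    """Average the combined forecasts of several models, time step by time step.

    model_forecasts: one list per model, each holding (value_pred, nonzero_prob)
    pairs for the same future time steps. Averaging is an assumed ensembling rule.
    """
    horizon = len(model_forecasts[0])
    if any(len(forecast) != horizon for forecast in model_forecasts):
        raise ValueError("all models must forecast the same number of steps")
    return [
        mean(combine_intermittent(value, prob) for value, prob in step)
        for step in zip(*model_forecasts)
    ]


# Two hypothetical models forecasting two future time steps.
forecasts = [
    [(10.0, 0.5), (4.0, 1.0)],
    [(6.0, 0.5), (2.0, 0.0)],
]
print(ensemble_forecast(forecasts))
```

With this rule, a model that is confident the value is zero pulls the ensemble forecast for that step towards zero, which is the behavior one would want on highly intermittent series.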

Benchmarking in Forecasting Competitions
To benchmark our AutoML solution, we participated in the M5 forecasting competition, the latest in the M-competition series, which is one of the most important competitions in the forecasting community, with a long history spanning nearly 40 years. This most recent competition was hosted on Kaggle and used a dataset from Walmart product sales, the real-world nature of which makes the problem quite challenging.

We participated in the competition with our fully automated solution and achieved a rank of 138 out of 5558 participants (top 2.5%) on the final leaderboard, which is in the silver medal zone. Participants in the competition had almost four months to produce their models. While many of the competitive forecasting models required months of manual effort to create, our AutoML solution found the model in a short time with only a moderate compute cost (500 CPUs for 2 hours) and no human intervention.

We also benchmarked our AutoML forecasting solution on several other Kaggle datasets and found that on average it outperforms 92% of hand-crafted models, despite its limited resource use.

Evaluation of the AutoML Forecasting solution on other Kaggle Datasets (Rossmann Store Sales, Web Traffic, Favorita Grocery Sales) besides M5.

This work demonstrates the strength of an end-to-end AutoML solution for time series forecasting, and we are excited about its potential impact on real-world applications.

Acknowledgements
This project was a joint effort of Google Brain team members Chen Liang, Da Huang, Yifeng Lu and Quoc V. Le. We also thank Junwei Yuan, Xingwei Yang, Dawei Jia, Chenyu Zhao, Tin-yun Ho, Meng Wang, Yaguang Li, Nicolas Loeff, Manish Kurse, Kyle Anderson and Nishant Patil for their collaboration.

Source: Google AI Blog


Transformers for Image Recognition at Scale

While convolutional neural networks (CNNs) have been used in computer vision since the 1980s, they were not at the forefront until 2012 when AlexNet surpassed the performance of contemporary state-of-the-art image recognition methods by a large margin. Two factors helped enable this breakthrough: (i) the availability of training sets like ImageNet, and (ii) the use of commoditized GPU hardware, which provided significantly more compute for training. As such, since 2012, CNNs have become the go-to model for vision tasks.

The benefit of using CNNs was that they avoided the need for hand-designed visual features, instead learning to perform tasks directly from data “end to end”. However, while CNNs avoid hand-crafted feature extraction, the architecture itself is designed specifically for images and can be computationally demanding. Looking forward to the next generation of scalable vision models, one might ask whether this domain-specific design is necessary, or if one could successfully leverage more domain-agnostic and computationally efficient architectures to achieve state-of-the-art results.

As a first step in this direction, we present the Vision Transformer (ViT), a vision model based as closely as possible on the Transformer architecture originally designed for text-based tasks. ViT represents an input image as a sequence of image patches, similar to the sequence of word embeddings used when applying Transformers to text, and directly predicts class labels for the image. ViT demonstrates excellent performance when trained on sufficient data, outperforming a comparable state-of-the-art CNN with four times fewer computational resources. To foster additional research in this area, we have open-sourced both the code and models.

The Vision Transformer treats an input image as a sequence of patches, akin to a series of word embeddings generated by a natural language processing (NLP) Transformer.

The Vision Transformer
The original text Transformer takes as input a sequence of words, which it then uses for classification, translation, or other NLP tasks. For ViT, we make the fewest possible modifications to the Transformer design to make it operate directly on images instead of words, and observe how much about image structure the model can learn on its own.

ViT divides an image into a grid of square patches. Each patch is flattened into a single vector by concatenating the channels of all pixels in the patch, and this vector is then linearly projected to the desired input dimension. Because Transformers are agnostic to the structure of the input elements, we add learnable position embeddings to each patch, which allow the model to learn about the structure of the images. A priori, ViT does not know about the relative location of patches in the image, or even that the image has a 2D structure; it must learn such relevant information from the training data and encode structural information in the position embeddings.
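The patch pipeline described above can be sketched in a few lines of plain Python. This is only an illustration of the steps named in the text (split into a grid, flatten, project linearly, add a position embedding), not the released ViT code. The function names are hypothetical, and the projection matrix and position embeddings are fixed toy values here, whereas in the real model they are learned during training.

```python
def extract_patches(image, patch_size):
    """Split an H x W x C image (nested lists) into flattened square patches.

    Patches are returned in row-major grid order. Each patch is one flat
    vector built by concatenating the channels of every pixel in the patch.
    """
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be a multiple of patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])
            patches.append(flat)
    return patches


def embed_patches(patches, projection, position_embeddings):
    """Linearly project each patch and add its position embedding.

    projection: D rows, each as long as a flattened patch.
    position_embeddings: one D-dimensional vector per patch position.
    """
    tokens = []
    for patch, position in zip(patches, position_embeddings):
        projected = [sum(w * x for w, x in zip(row, patch)) for row in projection]
        tokens.append([p + e for p, e in zip(projected, position)])
    return tokens


# Toy example: a 4x4 single-channel image split into 2x2 patches.
image = [[[row * 4 + col] for col in range(4)] for row in range(4)]
patches = extract_patches(image, patch_size=2)
projection = [[1, 0, 0, 0], [0, 0, 0, 1]]  # D = 2, fixed toy weights
position_embeddings = [[0.5, 0.5]] * len(patches)
tokens = embed_patches(patches, projection, position_embeddings)
print(len(tokens), tokens[0])
```

Note that nothing in the token sequence itself records where a patch came from; the position embedding is the only place the model can store that information, which is why it has to be learned.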

Scaling Up

We first train ViT on ImageNet, where it achieves a best score of 77.9% top-1 accuracy. While this is decent for a first attempt, it falls far short of the state of the art — the current best CNN trained on ImageNet with no extra data reaches 85.8%. Despite mitigation strategies (e.g., regularization), ViT overfits the ImageNet task due to its lack of inbuilt knowledge about images.

To investigate the impact of dataset size on model performance, we train ViT on ImageNet-21k (14M images, 21k classes) and JFT (300M images, 18k classes), and compare the results to a state-of-the-art CNN, Big Transfer (BiT), trained on the same datasets. As previously observed, ViT performs significantly worse than the CNN equivalent (BiT) when trained on ImageNet (1M images). However, on ImageNet-21k (14M images) performance is comparable, and on JFT (300M images), ViT now outperforms BiT.

Finally, we investigate the impact of the amount of computation involved in training the models. For this, we train several different ViT models and CNNs on JFT. These models span a range of model sizes and training durations. As a result, they require varying amounts of compute for training. We observe that, for a given amount of compute, ViT yields better performance than the equivalent CNNs.

Left: Performance of ViT when pre-trained on different datasets. Right: ViT yields a good performance/compute trade-off.

High-Performing Large-Scale Image Recognition
Our data suggest that (1) with sufficient training ViT can perform very well, and (2) ViT yields an excellent performance/compute trade-off at both smaller and larger compute scales. Therefore, to see if performance improvements carried over to even larger scales, we trained a 600M-parameter ViT model.

This large ViT model attains state-of-the-art performance on multiple popular benchmarks, including 88.55% top-1 accuracy on ImageNet and 99.50% on CIFAR-10. ViT also performs well on the cleaned-up version of the ImageNet evaluation set, “ImageNet-ReaL”, attaining 90.72% top-1 accuracy. Finally, ViT works well on diverse tasks, even with few training data points. For example, on the VTAB-1k suite (19 tasks with 1,000 data points each), ViT attains 77.63%, significantly ahead of the single-model state of the art (SOTA) (76.3%), and even matching SOTA attained by an ensemble of multiple models (77.6%). Most importantly, these results are obtained using fewer compute resources compared to previous SOTA CNNs, e.g., 4x fewer than the pre-trained BiT models.

Vision Transformer matches or outperforms state-of-the-art CNNs on popular benchmarks. Left: Popular image classification tasks (ImageNet, including new validation labels ReaL, and CIFAR, Pets, and Flowers). Right: Average across 19 tasks in the VTAB classification suite.

Visualizations
To gain some intuition into what the model learns, we visualize some of its internal workings. First, we look at the position embeddings — parameters that the model learns to encode the relative location of patches — and find that ViT is able to reproduce an intuitive image structure. Each position embedding is most similar to others in the same row and column, indicating that the model has recovered the grid structure of the original images. Second, we examine the average spatial distance between one element attending to another for each transformer block. At higher layers (depths of 10-20) only global features are used (i.e., large attention distances), but the lower layers (depths 0-5) capture both global and local features, as indicated by a large range in the mean attention distance. By contrast, only local features are present in the lower layers of a CNN. These experiments indicate that ViT can learn features hard-coded into CNNs (such as awareness of grid structure), but is also free to learn more generic patterns, such as a mix of local and global features at lower layers, that can aid generalization.
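To make the attention-distance measurement concrete, here is a minimal sketch of one way such a statistic could be computed for a single attention head. It assumes patches are indexed in row-major order on a square grid and that each row of the attention matrix sums to 1. The function name and the exact averaging are assumptions for illustration, not the code used for the figures in this post.

```python
import math


def mean_attention_distance(attention, grid_width, patch_size=1):
    """Average spatial distance between a query patch and the patches it attends to.

    attention: N x N list of lists; attention[q][k] is the weight that
    query patch q places on key patch k.
    grid_width: number of patches per row of the image grid.
    patch_size: patch side length in pixels (1 gives distances in patch units).
    """
    num_patches = len(attention)
    total = 0.0
    for query, weights in enumerate(attention):
        query_row, query_col = divmod(query, grid_width)
        for key, weight in enumerate(weights):
            key_row, key_col = divmod(key, grid_width)
            distance = math.hypot(query_row - key_row, query_col - key_col)
            total += weight * distance * patch_size
    return total / num_patches


# 2x2 grid of patches: purely local attention versus uniform (global) attention.
local = [[1.0 if q == k else 0.0 for k in range(4)] for q in range(4)]
uniform = [[0.25] * 4 for _ in range(4)]
print(mean_attention_distance(local, grid_width=2))
print(mean_attention_distance(uniform, grid_width=2))
```

A head that only looks at its own patch scores zero, while a head that spreads its attention evenly over the image scores much higher, which is the contrast between local and global features discussed above.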

Left: ViT learns the grid-like structure of the image patches via its position embeddings. Right: The lower layers of ViT contain both global and local features, while the higher layers contain only global features.

Summary
While CNNs have revolutionized computer vision, our results indicate that models tailor-made for imaging tasks may be unnecessary, or even sub-optimal. With ever-increasing dataset sizes, and the continued development of unsupervised and semi-supervised methods, the development of new vision architectures that train more efficiently on these datasets becomes increasingly important. We believe ViT is a preliminary step towards generic, scalable architectures that can solve many vision tasks, or even tasks from many domains, and are excited for future developments.

A preprint of our work as well as code and models are publicly available.

Acknowledgements
We would like to thank our co-authors in Berlin, Zürich, and Amsterdam: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, and Jakob Uszkoreit. We would like to thank Andreas Steiner for crucial help with infrastructure and open-sourcing, Joan Puigcerver and Maxim Neumann for work on large-scale training infrastructure, and Dmitry Lepikhin, Aravindh Mahendran, Daniel Keysers, Mario Lučić, Noam Shazeer, and Colin Raffel for useful discussions. Finally, we thank Tom Small for creating the Visual Transformer animation in this post.

Source: Google AI Blog


From MLPerf to MLCommons: moving machine learning forward

Today, the community of machine learning researchers and engineers behind the MLPerf benchmark is launching an open engineering consortium called MLCommons. For us, this is the next step in a journey that started almost three years ago.


Early in 2018, we gathered a group of industry researchers and academics who had published work on benchmarking machine learning (ML) in a conference room to propose the creation of an industry standard benchmark to measure ML performance. Everyone had doubts: creating an industry standard is challenging under the best conditions, and ML was (and is) a poorly understood stochastic process running on extremely diverse software and hardware. Yet, we all agreed to try.

Together, along with a growing community of researchers and academics, we created a new benchmark called MLPerf. The effort took off. MLPerf is now an industry standard with over 2,000 submitted results and multiple benchmark suites that span systems from smartphones to supercomputers. Over that time, the fastest result submitted to MLPerf for training the classic ML network ResNet improved by over 13x.

We created MLPerf because we believed in three principles:
  • Machine learning has tremendous potential: Already, machine learning helps billions of people find and understand information through tools like Google’s search engine and translation service. Active research in machine learning could one day save millions of lives through improvements in healthcare and automotive safety.
  • Transforming machine learning from promising research into widespread industrial practice requires investment in common infrastructure -- especially metrics: Much like computing in the ‘80s, real innovation is mixed with hype and adopting new ideas is slow and cumbersome. We need good metrics to identify the best ideas, and good infrastructure to make adoption of new techniques fast and easy.
  • Developing common infrastructure is best done by an open, fast-moving collaboration: We need the vision of academics and the resources of industry. We need the agility of startups and the scale of leading tech companies. Working together, a diverse community can develop new ideas, launch experiments, and rapidly iterate to arrive at shared solutions.
Our belief in the principles behind MLPerf has only gotten stronger, and we are excited to be part of the next step for the MLPerf community with the launch of MLCommons.

MLCommons aims to accelerate machine learning to benefit everyone. MLCommons will build a common set of tools for ML practitioners including:
  • Benchmarks to measure progress: MLCommons will leverage MLPerf to measure speed, but also expand benchmarking to other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and consequently cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
  • Public datasets to fuel research: MLCommons’ new People’s Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude, better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet’s impact on the field of computer vision.
  • Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons’ MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmark), develop, and ultimately deploy.
Google believes in the potential of machine learning, the importance of common infrastructure, and the power of open, collaborative development. Our leadership in co-founding, and deep support in sustaining, MLPerf and MLCommons has echoed our involvement in other efforts like TensorFlow and NNAPI. Together with the MLCommons community, we can improve machine learning to benefit everyone.

Want to get involved? Learn more at mlcommons.org.


By Peter Mattson – ML Metrics, Naveen Kumar – ML Performance, and Cliff Young – Google Brain

Aligning thousands of Billie Eilish covers in an infinite music video experiment

Posted by Google Creative Lab

“Bad Guy” by Billie Eilish is one of the most-covered songs on YouTube, inspiring thousands of fans to upload their own versions. To celebrate all these covers, YouTube and Google Creative Lab built an AI experiment to combine all of them seamlessly in the world’s first infinite music video: Infinite Bad Guy. The experience aligns every cover to the same beat, no matter its genre, language, or instrumentation.

Finding all the covers

How do you find “Bad Guy” covers amidst all the billions of videos on YouTube? Just searching for “Bad Guy” would result in false positives, like videos of Billie being interviewed about the song, or miss covers that didn’t use the song name in their titles. YouTube’s ContentID system allows us to find videos that match the musical composition “Bad Guy” and also allows us to narrow our search to videos that appear to be performances or creative interpretations of the song. That way, we can also avoid videos where “Bad Guy” was just background music. We continue to run this search daily, collecting an ever-expanding list of potential covers to use in the experience.

Aligning all the covers to the same beat

A key part of the experience is being able to jump from cover to cover seamlessly. But fan covers of “Bad Guy” vary widely. Some might be similar to the original, like a dance video set to Billie’s track. Some might vary more in tempo and instrumentation, like a heavy metal cover. And others might diverge greatly from the original, like a clarinet version with no lyrics. How can you get all these covers on the same beat? After trying several approaches like dynamic time warping and chord recognition, we’ve found the most success with a recurrent neural network trained to recognize sections and beats of “Bad Guy.” We collaborated with our friends at IYOYO on cover alignment and they have a great writeup about the process.

Building the experience

Finding and aligning the covers is a fascinating research problem, but the crucial final step is making them explorable to everyone. We’ve tried to make it intuitive and fun to navigate all the infinite combinations, while keeping latency low so the song never drops a beat.

The experience centers around three YouTube players, a number we settled on after a lot of experimentation. Initially we thought more players would be more interesting, but the experience got chaotic and slow. Around the players we’ve added discoverable features like the hashtag drawer and stats page. Video game interfaces have been a big inspiration for us, as they combine multiple interactions in a single dashboard. We’ve also added an autoplay mode for users who want to just sit back and be taken through an ever-changing mix of covers.

We’re excited about how Infinite Bad Guy showcases the incredibly diverse talent of YouTube and the potential machine learning can have for music and creativity. Give it a try and see what beautiful, strange, and brilliant covers you can find.

Irem from Turkey shares her groundbreaking work in TensorFlow and advice for the community

Posted by Jennifer Kohl, Global Program Manager, Google Developer Groups

Irem presenting at a Google Developer Group event

We recently caught up with Irem Komurcu, a TensorFlow developer and researcher at Istanbul Technical University in Turkey. Irem has been a long-serving member of Google Developer Groups (GDG) Düzce and also serves as a Women Techmakers (WTM) ambassador. Her work with TensorFlow has received several accolades, including being named a Hamdi Ulukaya Girişimi fellow. As one of twenty-four young entrepreneurs selected, she was flown to New York City last year to learn more about business and receive professional development.

With all this experience to share, we wanted you to hear how she approaches pursuing a career in tech, hones her TensorFlow skills with the GDG community, and thinks about how upcoming programmers can best position themselves for success. Check out the full interview below for more.

What inspired you to pursue a career in technology?

I first became interested in tech when I was in high school and went on to study computer engineering. At university, I had an eye-opening experience when I traveled from Turkey to the Google Developer Day event in India. It was here where I observed various code languages, products, and projects that were new to me.

In particular, I saw TensorFlow in action for the first time. Watching the powerful machine learning tool truly sparked my interest in deep learning and project development.

Can you describe your work with TensorFlow and Machine Learning?

I have studied many different aspects of TensorFlow and ML. My first work was on voice recognition and deep learning. However, I am now working as a computer vision researcher conducting various segmentation, object detection, and classification processes with TensorFlow. In my free time, I write various articles about best practices and strategies to leverage TensorFlow in ML.

What has been a useful learning resource you have used in your career?

I kicked off my studies on deep learning on tensorflow.org. It’s a basic first step, but a powerful one. There were so many blogs, codes, examples, and tutorials for me to dive into. Both the Google Developer Group and TensorFlow communities also offered chances to bounce questions and ideas off other developers as I learned.

Between these technical resources and the person-to-person support, I was lucky to start working with the GDG community while also taking the first steps of my career. There were so many opportunities to meet people and grow all around.

What is your favorite part of the Google Developer Group community?

I love being in a large community with technology-oriented people. GDG is a network of professionals who support each other, and that enables people to develop. I am continuously sharing my knowledge with other programmers as they simultaneously mentor me. The chance for us to collaborate together is truly fulfilling.

What is unique about being a developer in your country/region?

The number of women supported in science, technology, engineering, and mathematics (STEM) is low in Turkey. To address this, I partner with Women Techmakers (WTM) to give educational talks on TensorFlow and machine learning to women who want to learn how to code in my country. So many women are interested in ML, but just need a friendly, familiar face to help them get started. With WTM, I’ve already given over 30 talks to women in STEM.

What advice would you give to someone who is trying to grow their career as a developer?

Keep researching new things. Read everything you can get your eyes on. Technology has been developing rapidly, and it is necessary to make sure your mind can keep up with the pace. That’s why I recommend communities like GDG that help make sure you’re up to date on the newest trends and learnings.


Want to work with other developers like Irem? Then find the right Google Developer Group for you, here.

Passionate former DSC lead Irene inspires others to learn Google technologies with her new podcast and more

Posted by Erica Hanson, Global Program Manager, Google Developer Student Clubs

(Irene (left) and her DSC team from the Polytechnic University of Cartagena (photo prior to COVID-19))

Irene Ruiz Pozo is a former Google Developer Student Club (DSC) Lead at the Polytechnic University of Cartagena in Murcia, Spain. As one of the founding members, Irene has seen the club grow from just a few student developers at her university to hosting multiple learning events across Spain. Recently, we spoke with Irene to understand more about the unique ways in which her team helped local university students learn more about Google technologies.

Real world ML and AR learning opportunities

Irene mentioned two fascinating projects that she had the chance to work on through her DSC at the Polytechnic University of Cartagena. The first was a learning lab that helped students understand how to use 360º cameras and 3D scanners for machine learning.

(A DSC member giving a demo of a 360º camera to students at the National Museum of Underwater Archeology in Cartagena)

The second was a partnership with the National Museum of Underwater Archeology, where Irene and her team created an augmented reality game that let students explore a digital rendition of the museum’s exhibitions.

(An image from the augmented reality game created for the National Museum of Underwater Archeology)

In the above AR experience created by Irene’s team, users can create their own character and move throughout the museum and explore different virtual renditions of exhibits in a video game-like setting.

Hash Code competition and experiencing the Google work culture

One particularly memorable experience for Irene and her DSC was participating in Google’s annual programming competition, Hash Code. As Irene explained, the event allowed developers to share their skills and connect in small teams of two to four programmers. They would then come together to tackle engineering problems like how to best design the layout of a Google data center, create the perfect video streaming experience on YouTube, or establish the best practices for compiling code at Google scale.

(Students working on the Hash Code competition (photo taken prior to COVID-19))

To Irene, the experience felt like a live look at being a software engineer at Google. The event taught her and her DSC team that while programming skills are important, communication and collaboration skills are what really help solve problems. For Irene, the experience truly bridged the gap between theory and practice.

Expanding knowledge with a podcast for student developers

(Irene’s team working with other student developers (photo taken before COVID-19))

After the event, Irene felt that if a true mentorship network was established among other DSCs in Europe, students would feel more comfortable partnering with one another to talk about common problems they faced. Inspired, she began to build out her mentorship program which included a podcast where student developers could collaborate on projects together.

The podcast, which just released its second episode, also highlights upcoming opportunities for students. In the most recent episode, Irene and friends dive into how to apply for Google Summer of Code Scholarships and talk about other upcoming open source project opportunities. Organizing these types of learning experiences for the community was one of the most fulfilling parts of working as a DSC Lead, according to Irene. She explained that the podcast has been an exciting space that allows her and other students to get more experience presenting ideas to an audience. Through this podcast, Irene has already seen many new DSC members eager to join the conversation and collaborate on new ideas.

As Irene now looks out on her future, she is excited for all the learning and career development that awaits her from the entire Google Developer community. Having graduated from university, Irene is now a Google Developer Groups (GDG) Lead - a program similar to DSC, but created for the professional developer community. In this role, she is excited to learn new skills and make professional connections that will help her start her career.

Are you also a student with a passion for code? Then join a local Google Developer Student Club near you, here.

Coral makes edge AI even more accessible in 2020

Posted by the Coral team

Coral Dev Board Mini and Accelerator Module feature Google's Edge TPU co-processor to accelerate AI at the edge.

Since we launched Coral back in March 2019, we’ve added a number of new product form factors to accommodate the many ways users are adding on-device ML to their products. We've also streamlined the ML workflow and added capabilities like model pipelining with multiple Edge TPUs for an easier and more robust developer experience. And from this, we’ve helped enable amazing use cases from smart water meters that prevent water loss with Olea Edge, to systems for improving harvest yield with Farmwave, to noise cancellation in meetings in Google’s own Series One meeting kits.

This week, we’ll begin shipping the Coral Accelerator Module, a multi-chip module that combines the Edge TPU and its power circuitry into a solderable package. The module exposes PCIe and USB2 interfaces, which make it even easier to integrate Coral into custom designs. Several companies are already taking advantage of the compact size and capabilities with their new products coming to market. Read more about how Gumstix, STD, Siana Systems and IEI are using our module.

And in December, we’ll begin shipping the Dev Board Mini, a smaller, more power-efficient, and value-oriented board that brings forward a more traditional, flattened single-board computer design. The Dev Board Mini pairs a MediaTek 8167 SoC with the Coral Accelerator Module over USB 2 and is a great way to evaluate the module as the center of a project or deployment.

You can see the new Dev Board Mini and Accelerator Module in action in the latest episode of Level Up, where Markku Lepisto controls his studio lights with speech commands.

To get updates on when the board will be available for purchase and other Coral news, sign up for our newsletter.

Developing for the edge, now simplified

We recently announced a new version of the Coral ML APIs and tools. This release brings the C++ API into parity with Python and makes it more modular, reusable and performant. At the same time, it eliminates unnecessary abstractions and surfaces, replacing them with native TensorFlow Lite APIs. This release also graduates the Model Pipelining API out of beta and introduces a new model partitioner that automatically partitions models based on profiling, with up to 10x better performance.
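To make the idea of profiling-based partitioning more concrete, here is a minimal sketch. It is not Coral's partitioner or its API: the function name partition_by_latency and the per-layer latencies are hypothetical. It assumes a simple chain of layers and splits it into contiguous segments, one per Edge TPU, so that the slowest segment (which bounds the throughput of a pipeline) is as fast as possible.

```python
from itertools import accumulate


def partition_by_latency(latencies_ms, num_segments):
    """Split a chain of layers into contiguous segments, minimizing the slowest one.

    latencies_ms: profiled per-layer latencies (hypothetical numbers).
    Returns (segments, bottleneck), where segments is a list of layer-index lists.
    """
    n = len(latencies_ms)
    if not 1 <= num_segments <= n:
        raise ValueError("need between 1 and len(latencies_ms) segments")

    prefix = [0.0] + list(accumulate(latencies_ms))

    def segment_latency(i, j):
        # Total latency of layers i .. j-1.
        return prefix[j] - prefix[i]

    inf = float("inf")
    # best[k][j]: smallest bottleneck when the first j layers use k segments.
    best = [[inf] * (n + 1) for _ in range(num_segments + 1)]
    cut = [[0] * (n + 1) for _ in range(num_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, num_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                cost = max(best[k - 1][i], segment_latency(i, j))
                if cost < best[k][j]:
                    best[k][j], cut[k][j] = cost, i

    # Walk the recorded cut points backwards to recover the segments.
    segments, j = [], n
    for k in range(num_segments, 0, -1):
        i = cut[k][j]
        segments.append(list(range(i, j)))
        j = i
    return segments[::-1], best[num_segments][n]


# Six layers with made-up profiled latencies, split across three accelerators.
segments, bottleneck = partition_by_latency([4, 2, 3, 1, 5, 5], 3)
```

Splitting by measured latency rather than by layer count or parameter size is the design choice being illustrated: a pipeline runs only as fast as its slowest stage, so balancing profiled time per stage is what matters for throughput.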

We’ve added a pre-trained version of MobileDet — a state-of-the-art object detection model for mobile systems — into our models portfolio. We’re migrating our model-development workflow to TensorFlow 2, and we’re including a handful of updated or new models based on the TF2 Keras framework. For details, check out the full announcement on the TensorFlow blog.

We’re also excited to see great developer tools coming from our ecosystem partners. For example, PerceptiLabs offers a visual API for building TensorFlow models and recently published a new demo that trains a machine learning model, optimized for the edge with Coral, to identify sign language.

The MRQ design from SigFox enables prototyping at the edge for low bandwidth IoT solutions with Coral

And SigFox released a radio transceiver board that stacks on either the Coral Dev Board or Dev Board Mini. This allows small data payloads to be transmitted across low-power, long-range radio networks for use cases like smart cities, fleet management, asset tracking, agriculture and energy. The PCB design will be offered as a free download on SigFox’s website. Google Cloud Solutions Architect Markku Lepisto will present the new design today, in the opening keynote at SigFox Connect.

Customers with a Coral edge

The tool, from Farmwave, includes custom-developed ML models, a harvester-mounted box with cameras, an in-cab display, and on-device AI acceleration from Coral.

Just in time for harvest, we wanted to share a story about how Farmwave is using Coral to improve the efficiency of farm equipment and reduce food waste. Traditional yield loss analysis involves hand-counting grains of corn left on the ground mid-harvest. It’s a time- and labor-intensive task, and not feasible for farmers who measure the value of their half-million-dollar combines in minutes spent running them.

By leveraging Coral’s on-device AI capabilities, Farmwave was able to build a system that automates the count while the machine is running. This allows farmers to make real-time adjustments to harvesting machines in response to conditions in the field, which can make a big difference in yield.

Kura Sushi designed their intelligent QA system using a Raspberry Pi paired with the Coral USB Accelerator

Kura Revolving Sushi Bar in Japan has always been committed to the highest standards of health and safety for its customers. Known for its tech-forward approach, Kura has dabbled in sushi-making robots, an automated prize machine called Bikkura-pon, and a patented dome-shaped dish cover, aptly dubbed Mr. Fresh. Most recently, Kura has used Coral to develop an AI-powered system that not only improves efficiency for a better customer experience, but also enables better tracking to prevent foodborne illnesses.

Making AI more accessible

While this year has presented the world with many obstacles, we’ve been impressed by the new ideas and innovations coming forward through technology. By providing the necessary tools and technology for edge AI, we strive to empower society to create affordable, adaptable, and intelligent systems.

We are excited to share all that Coral has to offer as we evolve our platform. For a list of worldwide distributors, system integrators and partners, visit the Coral partnerships page.

Please visit Coral.ai to discover more about our edge ML platform and share your feedback at [email protected]. To receive future Coral updates directly in your inbox, sign up for our newsletter.

Using GANs to Create Fantastical Creatures

Creating art for digital video games takes a high degree of artistic creativity and technical knowledge, while also requiring game artists to quickly iterate on ideas and produce a high volume of assets, often in the face of tight deadlines. What if artists had a paintbrush that acted less like a tool and more like an assistant? A machine learning model acting as such a paintbrush could reduce the amount of time necessary to create high-quality art without sacrificing artistic choices, perhaps even enhancing creativity.

Today, we present Chimera Painter, a trained machine learning (ML) model that automatically creates a fully fleshed out rendering from a user-supplied creature outline. Employed as a demo application, Chimera Painter adds features and textures to a creature outline segmented with body part labels, such as “wings” or “claws”, when the user clicks the “transform” button. Below is an example using the demo with one of the preset creature outlines.

Using an image imported to Chimera Painter or generated with the tools provided, an artist can iteratively construct or modify a creature outline and use the ML model to generate realistic looking surface textures. In this example, an artist (Lee Dotson) customizes one of the creature designs that comes pre-loaded in the Chimera Painter demo.

In this post, we describe some of the challenges in creating the ML model behind Chimera Painter and demonstrate how one might use the tool for the creation of video game-ready assets.

Prototyping for a New Type of Model
In developing an ML model to produce video game-ready creature images, we created a digital card game prototype around the concept of combining creatures into new hybrids that can then battle each other. In this game, a player would begin with cards of real-world animals (e.g., an axolotl or a whale) and could make them more powerful by combining them (making the dreaded Axolotl-Whale chimera). This provided a creative environment for demonstrating an image-generating model, as the number of possible chimeras necessitated a method for quickly designing large volumes of artistic assets that could be combined naturally, while still retaining identifiable visual characteristics of the original creatures.

Since our goal was to create high-quality creature card images guided by artist input, we experimented with generative adversarial networks (GANs), informed by artist feedback, to create creature images that would be appropriate for our fantasy card game prototype. GANs pit two convolutional neural networks against each other: a generator network that creates new images and a discriminator network that determines whether these images are samples from the training dataset (in this case, artist-created images) or not. We used a variant called a conditional GAN, where the generator takes a separate input to guide the image generation process. Interestingly, our approach was a strict departure from other GAN efforts, which typically focus on photorealism.
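The adversarial setup described above can be sketched numerically. The post does not say which loss formulation was used, so the snippet below assumes the common binary cross-entropy form with a non-saturating generator loss. The function names and the scores passed in are illustrative. In a conditional GAN, the discriminator would score an (outline, image) pair rather than an image alone.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Binary cross-entropy for the discriminator.

    d_real: discriminator score in (0, 1) for an artist-created image.
    d_fake: discriminator score in (0, 1) for a generated image.
    The loss is low when real images score near 1 and generated ones near 0.
    """
    return -(math.log(d_real) + math.log(1.0 - d_fake))


def generator_loss(d_fake):
    """Non-saturating generator loss: low when the discriminator is fooled."""
    return -math.log(d_fake)


# A confident, correct discriminator: low discriminator loss, high generator loss.
print(discriminator_loss(d_real=0.9, d_fake=0.1))
print(generator_loss(d_fake=0.1))

# A fooled discriminator: the generator loss drops.
print(generator_loss(d_fake=0.9))
```

The two losses pull in opposite directions on the same score, which is what makes the training adversarial: any improvement in the generator's images raises the discriminator's loss until it adapts.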

To train the GANs, we created a dataset of full color images with single-species creature outlines adapted from 3D creature models. The creature outlines characterized the shape and size of each creature, and provided a segmentation map that identified individual body parts. After model training, the model was tasked with generating multi-species chimeras, based on outlines provided by artists. The best performing model was then incorporated into Chimera Painter. Below we show some sample assets generated using the model, including single-species creatures, as well as the more complex multi-species chimeras.

Generated card art integrated into the card game prototype showing basic creatures (bottom row) and chimeras from multiple creatures, including an Antlion-Porcupine, Axolotl-Whale, and a Crab-Antlion-Moth (top row). More info about the game itself is detailed in this Stadia Research presentation.

Learning to Generate Creatures with Structure
An issue with using GANs for generating creatures was the potential for loss of anatomical and spatial coherence when rendering subtle or low-contrast parts of images, despite these being of high perceptual importance to humans. Examples of this can include eyes, fingers, or even distinguishing between overlapping body parts with similar textures (see the affectionately named BoggleDog below).

GAN-generated image showing mismatched body parts.

Generating chimeras required a new non-photographic fantasy-styled dataset with unique characteristics, such as dramatic perspective, composition, and lighting. Existing repositories of illustrations were not appropriate to use as datasets for training an ML model, because they may be subject to licensing restrictions, have conflicting styles, or simply lack the variety needed for this task.

To solve this, we developed a new artist-led, semi-automated approach for creating an ML training dataset from 3D creature models, which allowed us to work at scale and rapidly iterate as needed. In this process, artists would create or obtain a set of 3D creature models, one for each creature type needed (such as hyenas or lions). Artists then produced two sets of textures that were overlaid on the 3D model using the Unreal Engine: one with the full color texture (left image, below) and the other with flat colors for each body part (e.g., head, ears, neck), called a “segmentation map” (right image, below). This second set of body part segments was given to the model at training to ensure that the GAN learned about body part-specific structure, shapes, textures, and proportions for a variety of creatures.
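As a rough illustration of how a flat-color segmentation map can be turned into something a model can consume, the sketch below maps each pixel color to a body-part label and then to a one-hot vector. The palette, the part names, and the helper names are hypothetical. The post does not describe the actual colors or the encoding used.

```python
# Hypothetical palette: flat RGB colors and part names are illustrative only.
PALETTE = {
    (255, 0, 0): "head",
    (0, 255, 0): "ears",
    (0, 0, 255): "neck",
}
PART_INDEX = {name: index for index, name in enumerate(PALETTE.values())}


def to_label_map(seg_pixels):
    """Convert rows of flat RGB colors into rows of integer body-part labels.

    Raises KeyError for any color that is not in the palette.
    """
    return [[PART_INDEX[PALETTE[rgb]] for rgb in row] for row in seg_pixels]


def one_hot(label_map, num_parts):
    """Expand each integer label into a one-hot vector, one channel per part."""
    return [
        [[1.0 if part == label else 0.0 for part in range(num_parts)]
         for label in row]
        for row in label_map
    ]


# A tiny 2x2 "segmentation map".
seg = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 0, 0)],
]
labels = to_label_map(seg)             # [[0, 1], [2, 0]]
channels = one_hot(labels, len(PALETTE))
```

Because each body part is a single flat color, the mapping from pixel to label is an exact lookup, which is one reason flat-color textures are convenient for producing segmentation maps automatically.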

Example dataset training image and its paired segmentation map.

The 3D creature models were all placed in a simple 3D scene, again using the Unreal Engine. A set of automated scripts would then take this 3D scene and interpolate between different poses, viewpoints, and zoom levels for each of the 3D creature models, creating the full color images and segmentation maps that formed the training dataset for the GAN. Using this approach, we generated 10,000+ image + segmentation map pairs per 3D creature model, saving the artists thousands of hours per creature model compared to creating such data manually (at approximately 20 minutes per image, 10,000 images alone would take more than 3,300 hours).
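The sketch below shows one way such a sweep over poses, viewpoints, and zoom levels could be enumerated, followed by a back-of-the-envelope estimate of the equivalent manual effort for a single creature model. The parameter ranges, the counts per axis, and the function names are assumptions chosen so that the grid yields 10,000 combinations. The actual scripts ran inside the Unreal Engine and are not described in the post.

```python
from itertools import product


def linspace(start, stop, count):
    """Return `count` evenly spaced values from start to stop, inclusive."""
    if count == 1:
        return [start]
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]


def render_settings(num_poses, num_yaws, num_zooms):
    """Enumerate (pose_blend, yaw_degrees, zoom) combinations to render."""
    poses = linspace(0.0, 1.0, num_poses)            # blend between two key poses
    yaws = linspace(0.0, 360.0, num_yaws + 1)[:-1]   # drop 360, it repeats 0
    zooms = linspace(1.0, 2.0, num_zooms)
    return list(product(poses, yaws, zooms))


settings = render_settings(num_poses=25, num_yaws=40, num_zooms=10)
print(len(settings))           # 25 * 40 * 10 combinations

# Manual effort for the same number of images at roughly 20 minutes each.
manual_hours = len(settings) * 20 / 60
print(round(manual_hours))
```

Each tuple would drive one render of both the full color image and its segmentation map, so the two stay pixel-aligned by construction.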

Fine Tuning
The GAN had many different hyper-parameters that could be adjusted, leading to different qualities in the output images. To better understand which versions of the model were better than others, artists were provided samples for different creature types generated by these models and asked to cull them down to a few best examples. We gathered feedback about desired characteristics present in these examples, such as a feeling of depth, style with regard to creature textures, and realism of faces and eyes. This information was used both to train new versions of the model and, after the model had generated hundreds of thousands of creature images, to select the best image from each creature category (e.g., gazelle, lynx, gorilla).

We tuned the GAN for this task by focusing on the perceptual loss. This loss function component (also used in Stadia’s Style Transfer ML) computes a difference between two images using extracted features from a separate convolutional neural network (CNN) that was previously trained on millions of photographs from the ImageNet dataset. The features are extracted from different layers of the CNN and a weight is applied to each, which affects their contribution to the final loss value. We discovered that these weights were critically important in determining what a final generated image would look like. Below are some examples from the GAN trained with different perceptual loss weights.
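The structure of such a loss can be sketched as a weighted sum of per-layer feature differences. In practice the features would come from a pretrained CNN. Here they are small hand-written lists so that the example stays self-contained. The layer names, the weights, and the choice of mean squared difference as the per-layer distance are assumptions, not details taken from the post.

```python
def mean_squared_diff(a, b):
    """Mean squared difference between two equally sized feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def perceptual_loss(features_generated, features_target, layer_weights):
    """Weighted sum of feature differences across CNN layers.

    features_*: dict mapping a layer name to a flat list of activations.
    layer_weights: dict mapping a layer name to its contribution weight.
    """
    return sum(
        weight * mean_squared_diff(features_generated[layer], features_target[layer])
        for layer, weight in layer_weights.items()
    )


# Hypothetical activations from two layers of a feature-extraction network.
generated = {"shallow": [1.0, 2.0], "deep": [0.0, 4.0]}
target = {"shallow": [1.0, 0.0], "deep": [2.0, 4.0]}

# Changing the weights shifts which kind of mismatch the generator is penalized for.
loss_a = perceptual_loss(generated, target, {"shallow": 1.0, "deep": 0.5})
loss_b = perceptual_loss(generated, target, {"shallow": 0.1, "deep": 2.0})
```

Shallow layers of a CNN tend to respond to edges and fine texture while deeper layers respond to larger structures, so reweighting the layers changes what kind of difference the generator is pushed to reduce.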

Dino-Bat Chimeras generated using varying perceptual loss weights.

Some of the variation in the images above is due to the fact that the dataset includes multiple textures for each creature (for example, a reddish or grayish version of the bat). However, ignoring the coloration, many differences are directly tied to changes in perceptual loss values. In particular, we found that certain values brought out sharper facial features (e.g., bottom right vs. top right) or a “smooth” versus “patterned” look (top right vs. bottom left), which made generated creatures feel more real.

Here are some creatures generated from the GAN trained with different perceptual loss weights, showing off a small sample of the outputs and poses that the model can handle.

Creatures generated using different models.
A generated chimera (Dino-Bat-Hyena, to be exact) created using the conditional GAN. Output from the GAN (left) and the post-processed / composited card (right).

Chimera Painter
The trained GAN is now available in the Chimera Painter demo, allowing artists to work iteratively with the model, rather than drawing dozens of similar creatures from scratch. An artist can select a starting point and then adjust the shape, type, or placement of creature parts, enabling rapid exploration and the creation of a large volume of images. The demo also allows for uploading a creature outline created in an external program, like Photoshop. Simply download one of the preset creature outlines to get the colors needed for each creature part, use it as a template for drawing an outline outside of Chimera Painter, and then use the “Load” button in the demo to flesh out your creation from that outline.

It is our hope that these GAN models and the Chimera Painter demonstration tool might inspire others to think differently about their art pipeline. What can one create when using machine learning as a paintbrush?

Acknowledgments
This project was conducted in collaboration with many people. Thanks to Ryan Poplin, Lee Dotson, Trung Le, Monica Dinculescu, Marc Destefano, Aaron Cammarata, Maggie Oh, Richard Wu, Ji Hun Kim, Erin Hoffman-John, and Colin Boswell. Thanks to everyone who pitched in to give hours of art direction, technical feedback, and drawings of fantastic creatures.

Source: Google AI Blog