Updates from Coral: Mendel Linux 4.0 and much more!

Posted by Carlos Mendonça (Product Manager), Coral Team

Last month, we announced that Coral graduated out of beta, into a wider, global release. Today, we're announcing the next version of Mendel Linux (4.0 release Day) for the Coral Dev Board and SoM, as well as a number of other exciting updates.

We have made significant updates to improve performance and stability. Mendel Linux 4.0 release Day is based on Debian 10 Buster and includes upgraded GStreamer pipelines and support for Python 3.7, OpenCV, and OpenCL. The Linux kernel has also been updated to version 4.14 and U-Boot to version 2017.03.3.

We’ve also made it possible to use the Dev Board's GPU to convert YUV to RGB pixel data at up to 130 frames per second at 1080p resolution, which is one to two orders of magnitude faster than on Mendel Linux 3.0 release Chef. These changes make it possible to run inferences with YUV-producing sources such as cameras and hardware video decoders.
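
To give a sense of how that GPU conversion can feed an inference pipeline, here is a minimal sketch in Python, assuming a standard GStreamer installation and a V4L2 camera at /dev/video0; the element names are stock GStreamer elements, and the exact pipeline Mendel Linux 4.0 uses under the hood may differ.

```python
# Sketch only: capture 1080p YUV frames, convert to RGBA on the GPU, and hand the
# result to application code for inference. Camera device and caps are assumptions.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    "v4l2src device=/dev/video0 "
    "! video/x-raw,format=YUY2,width=1920,height=1080 "
    "! glupload ! glcolorconvert ! video/x-raw(memory:GLMemory),format=RGBA "
    "! gldownload ! video/x-raw,format=RGBA "
    "! appsink name=sink emit-signals=true max-buffers=1 drop=true"
)

def on_new_sample(sink):
    sample = sink.emit("pull-sample")  # RGBA frame, ready for a TFLite interpreter
    # ... run inference on the mapped buffer here ...
    return Gst.FlowReturn.OK

pipeline.get_by_name("sink").connect("new-sample", on_new_sample)
pipeline.set_state(Gst.State.PLAYING)
# A GLib.MainLoop (omitted here) would keep the pipeline running in a real app.
```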

To upgrade your Dev Board or SoM, follow our guide to flash a new system image.

MediaPipe on Coral

MediaPipe is an open-source, cross-platform framework for building multi-modal machine learning perception pipelines that can process streaming data like video and audio. For example, you can use MediaPipe to run on-device machine learning models and process video from a camera to detect, track and visualize hand landmarks in real-time.

Developers and researchers can prototype their real-time perception use cases starting with the creation of the MediaPipe graph on desktop. Then they can quickly convert and deploy that same graph to the Coral Dev Board, where the quantized TensorFlow Lite model will be accelerated by the Edge TPU.
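
As a concrete illustration of that last step, here is a minimal sketch of invoking a quantized TensorFlow Lite model through the Edge TPU delegate from Python; the model file name is a placeholder, and MediaPipe performs the equivalent inference step inside its graph nodes.

```python
# Minimal Edge TPU inference sketch. Requires the Edge TPU runtime (libedgetpu)
# and the tflite_runtime package; the model path below is a placeholder.
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

interpreter = Interpreter(
    model_path="model_quantized_edgetpu.tflite",
    experimental_delegates=[load_delegate("libedgetpu.so.1")],
)
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()[0]
output_details = interpreter.get_output_details()[0]

# Feed one uint8 image of whatever shape the model expects (e.g. 1x224x224x3).
frame = np.zeros(input_details["shape"], dtype=np.uint8)
interpreter.set_tensor(input_details["index"], frame)
interpreter.invoke()

scores = interpreter.get_tensor(output_details["index"])
print("Top class:", int(np.argmax(scores)))
```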

As part of this first release, MediaPipe is making available new experimental samples for both object and face detection, with support for the Coral Dev Board and SoM. The source code and instructions for compiling and running each sample are available on GitHub and on the MediaPipe documentation site.

New Teachable Sorter project tutorial

A new Teachable Sorter tutorial is now available. The Teachable Sorter is a physical sorting machine that combines the Coral USB Accelerator's ability to perform very low latency inference with an ML model that can be trained to rapidly recognize and sort different objects as they fall through the air. It leverages Google’s new Teachable Machine 2.0, a web application that makes it easy for anyone to quickly train a model in a fun, hands-on way.

The tutorial walks through how to build the free-fall sorter, which separates marshmallows from cereal and can be trained using Teachable Machine.

Coral is now on TensorFlow Hub

Earlier this month, the TensorFlow team announced a new version of TensorFlow Hub, a central repository of pre-trained models. With this update, the interface has been improved with a fresh landing page and search experience. Pre-trained Coral models compiled for the Edge TPU continue to be available on our Coral site, but a select few are also now available from the TensorFlow Hub. On the site, you can find models featuring an Overlay interface, allowing you to test the model's performance against a custom set of images right from the browser. Check out the experience for MobileNet v1 and MobileNet v2.

We are excited to share all that Coral has to offer as we continue to evolve our platform. For a list of worldwide distributors, system integrators and partners, visit the new Coral partnerships page. We hope you’ll use the new features offered on Coral.ai as a resource and encourage you to keep sending us feedback at [email protected].

Our panel of experts for the #AndroidDevChallenge (apply by Dec. 2)

Just a little over a week left to finish your submission for the Android Developer Challenge, due December 2! Technology is enabling us to create a whole new era of helpful innovation by helping people get things done more quickly and surfacing patterns that would be difficult to detect using traditional methods. Ultimately, this helpful innovation is enabling us to live better, more productive, and safer lives.

Earlier this week, we highlighted the types of helpful, machine learning-powered innovation ideas we're looking for, to help inspire you. Today, we wanted to share the panel of experts we've assembled to help bring your projects to life as part of the Android Developer Challenge. These experts will make the final decision on the 10 finalists of the Android Developer Challenge, and if you're selected as one of those finalists, we plan to have you meet them when we bring you to Google HQ for a bootcamp next year:

  • Dave Burke is Vice President of Engineering at Google, where he leads engineering for the Android platform. Android is the largest mobile platform and ecosystem in the world, with over 2 billion active devices spanning smartphones, tablets, wearables, auto, TV, and IoT. Dave joined Google UK in 2007, becoming an engineering site lead, and later moved to California in 2011. Prior to Google, Dave co-founded and was CTO of an internet/telecoms voice startup and helped define related Web and Internet standards.
  • Stephanie Cuthbertson is Senior Director of Developer PM, DevRel and UX for Android. She previously worked on Google’s Search & Ads businesses, as well as a range of developer tools used by Google employees internally. Prior to Google, she was at AWS where she led the product management team for Storage, including Amazon S3. Before AWS, she spent 10 years working on Visual Studio and developer tools.
  • Brahim Elbouchikhi is a Director of Product Management on the Android team. On Android, Brahim is responsible for developer and consumer facing ML and Camera products including CameraX and ML Kit. Prior to Android, Brahim led Daydream’s software team. Brahim was also a founding PM of the Google Play store where he led monetization, search, and discovery.
  • Yossi Matias is Vice President, Engineering, at Google. He is leading efforts in Search (Google Autocomplete, Search Live Results, Google Trends), Conversational AI (Google Duplex, Call Screen, Live Caption, Live Relay, Recorder, Pronunciation), and other Research initiatives. Yossi is the founding Head of Google's R&D Center in Israel, and the founding executive lead of Google for Startups Campus Tel Aviv and of Launchpad. He is the lead of Crisis Response and co-lead of Google’s AI for Social Good. In addition to his experience as an executive and entrepreneur, Yossi has a rich record of scientific research, has published extensively, and holds dozens of patents. Yossi is a recipient of the Gödel Prize and is an ACM Fellow.
  • Sarah Sirajuddin is an engineering director working on TensorFlow at Google. She leads the teams working on on-device machine learning, TensorFlow Extended, and efforts around training models for the best accuracy and performance with Google’s cutting-edge infrastructure, including TensorFlow and tensor processing units (TPUs).

If you’ve got a great idea that can help users get things done, we want to hear from you! We’ll pick 10 concepts and provide expertise and guidance to those developers to help them bring their ideas to fruition, in part from this amazing set of experts we’ve assembled. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. You can read more about all of the prizes here.

There’s still time to submit your idea before the December 2 deadline. Submitting your idea is as simple as creating a repository on GitHub, telling us what you’d build and how we can help (we’ve included all of the materials here), and then officially submitting your repository here. Ideas can range from an early concept to something that’s already complete; we can’t wait to hear what you come up with, and to work with you on bringing helpful innovation powered by machine learning to more and more users!

Android Developer Challenge: here’s what we’re looking for! (Apply by Dec. 2)

Last month, we kicked off the next Android Developer Challenge, and asked you to submit your ideas focused on helpful innovation, powered by on-device machine learning. But what exactly do we mean when we say helpful innovation? We’re glad you asked! We rounded up a few of Google’s on-device machine learning offerings, together with some great recent examples of this technology in action, to help inspire your submission. Don’t forget, submit your idea by December 2!

Using machine learning to tackle Fall Armyworm

Take Nazirini Siraji. When she and a team of developers noticed a crop pest threatening the livelihood of Ugandan farmers, they taught themselves TensorFlow to combat it. They collected training data from nearby fields in the form of images. With TensorFlow, they re-trained a MobileNet (a technique known as transfer learning) and then used the TensorFlow Lite Converter to generate a TensorFlow Lite FlatBuffer file, which they deployed in an Android app. With the app, a farmer can snap a picture of their crop and the image frame is analysed to look for Fall armyworm damage. Depending on the results, a possible solution is suggested. It’s pretty cool!

Helping doctors detect respiratory diseases using machine learning

Tambua Health is helping doctors determine the likelihood of respiratory diseases by turning any smartphone into a powerful non-invasive screening tool. They developed an app using TensorFlow Lite that can help doctors analyze lung sounds for the presence of abnormal sounds like wheezes, crackles, stridor, and other adventitious sounds.

adidas uses machine learning to make the shopping experience easier

Even brands are tapping the power of machine learning. Take adidas, who recently launched a new “Bring It to Me” experience for their London store. Shoppers can use Visual Lookup to scan products on their phones while they are in the store, and the app lets them check stock and request their size without the need for queues. Under the hood, ML Kit is helping power the experience. It’s another way machine learning is helping users get things done more quickly.

The benefits of on-device machine learning

Running machine learning on a user’s device comes with a number of benefits. First, you reduce the amount of data you send to your server, enhancing user privacy. And because it runs on device, it can also work offline - perfect for inaccessible areas such as the middle of a rainforest, a desert or the London Underground. Last but not least, the most exciting aspect of running your model on device is low latency, which can enable all kinds of new user experiences. Machine learning is not just for automating tasks; it can work alongside your users and give them superpowers too!

At Google, we offer a number of different technologies to help you take advantage of this:

  • ML Kit offers a turnkey SDK to help you tackle tasks with powerful Google Machine Learning models
  • The TensorFlow Lite framework lets you take a custom model and optimise it to run on Android (see the sketch after this list)
  • There’s also the infrastructure of Firebase / Google Cloud, which can help you train on-device models using AutoML Vision Edge for specific model types or give you the raw processing power to train your own model
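
As a minimal sketch of the TensorFlow Lite path mentioned in the list above, the snippet below converts a small custom Keras model into an optimized .tflite file that an Android app can bundle as an asset; the model architecture and file names are placeholders, not a recommended design.

```python
# Sketch: build a toy Keras model, convert it to TensorFlow Lite with default
# post-training optimizations, and write the FlatBuffer to disk.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # e.g. post-training quantization
tflite_model = converter.convert()

with open("my_model.tflite", "wb") as f:  # ship this file as an asset in the app
    f.write(tflite_model)
```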

If you’ve got a great idea that can help users get things done, we want to hear from you! We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. You can read more about all of the prizes here.

There’s still time to submit your idea before the December 2 deadline. We can’t wait to hear what you come up with, and to work with you on bringing helpful innovation powered by on-device machine learning to more and more users!

Accelerating Japan’s AI startups in our new Tokyo Campus

Posted by Takuo Suzuki

Japan is well known as an epicenter of innovation and technology, and its startup ecosystem is no different. We’ve seen this first hand from our work with startups such as Cinnamon, which uses artificial intelligence to remove repetitive tasks from office workers' daily routines, allowing more work to get done by fewer people, faster.

This is why we are pleased to announce our second accelerator program, housed at the new Google for Startups Campus in the heart of Tokyo. The Google for Startups Accelerator (previously Launchpad Accelerator) is an intensive three-month program for high-potential, AI-focused startups, utilizing the proven Launchpad foundational components and content.

Founders who successfully apply for the accelerator will have the opportunity to work on the technical problems facing their startup alongside relevant experts from Google and the industry. They will receive mentorship on these challenges, support on machine learning best practices, as well as connections to relevant teams from across Google to help grow their business.

In addition to mentorship and technical project support, the accelerator also includes deep dives and workshops focused on product design, customer acquisition, and leadership development for founders.

“We hope that by providing these founders with the tools, mentorship, and connections to prepare for the next step in their journey it will, in turn, contribute to a stronger Japanese economy,” says Takuo Suzuki, Google Developers Regional Lead for Japan. “We are excited to work with such passionate startups in a new Google for Startups Campus, an environment built to foster startup growth, and to meet our next cohort in 2020.”

The program will run from February to May 2020, and applications are open until December 13, 2019.

Using machine learning to tackle Fall Armyworm

Guest post by Nsubuga Hassan, CEO at Hansu Mobile and Intelligent Innovations, Android and Machine Learning Developer

In 2016, Fall armyworm (FAW) was first reported in Africa. It has devastated maize crops across the continent.

Research shows that the potential impact of FAW on continent-wide maize yield lies between 8.3 and 20.6 million tonnes per year (out of a total expected production of 39 million tonnes per year), with losses lying between US$2.48 billion and US$6.19 billion per year (of a US$11.59 billion annual expected value). The impact of FAW is far-reaching, and the pest is now reported in many countries around the world.

Agriculture is the backbone of Uganda’s economy, employing 70% of the population. It contributes half of Uganda’s export earnings and a quarter of the country’s gross domestic product (GDP). Fall armyworm poses a great threat to our livelihoods.

We are a small group of like-minded developers living and working in Uganda. Most of our relatives grow maize, so the impact of the worm was very close to home. We really felt like we needed to do something about it.

The vast damage and yield losses in maize production due to FAW got the attention of global organizations, who are calling for innovators to help. It is the perfect time to apply machine learning. Our goal is to build an intelligent agent to help local farmers fight this pest in order to increase our food security.

Based on a Machine Learning Crash Course, our Google Developer Group (GDG) in Mbale hosted some study jams in May 2018, alongside several other code labs. This is where we first got hands-on experience using TensorFlow, from which the foundations were laid for the Farmers Companion app. Finally, we felt as though an intelligent solution to help farmers had been conceived.

Equipped with this knowledge and belief, the team embarked on collecting training data from nearby fields. This was done using a smartphone to take images, with the help of some GDG Mbale members. With farmers miles from town, and many fields inaccessible by road (not to mention the floods), this was not as simple as we had first hoped. To hinder us further, our smartphones were (and still are) the only hard drives we had, limiting the number of images and the amount of data we could capture in a day.

But we persisted! Once gathered, the images were sorted, one at a time, and categorized. With TensorFlow we re-trained a MobileNet, a technique known as transfer learning. We then used the TensorFlow Lite Converter to generate a TensorFlow Lite FlatBuffer file, which we deployed in an Android app.

We started with about 3,956 images, but our dataset is growing exponentially. We are actively collecting more and more data to improve our model’s accuracy. The improvements in TensorFlow, with the Keras high-level APIs, have really made our approach to deep learning easy and enjoyable, and we are now experimenting with TensorFlow 2.0.
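
For anyone who wants to try the same workflow, here is a minimal sketch of transfer learning on MobileNet followed by TensorFlow Lite conversion. It assumes a recent TensorFlow 2.x installation and a hypothetical folder of labeled images, and illustrates the approach rather than reproducing our exact code.

```python
# Sketch: re-train MobileNet on a small labeled image folder ("data/<class>/*.jpg"),
# then convert the result to a TensorFlow Lite FlatBuffer for the Android app.
import tensorflow as tf

IMG_SIZE = (224, 224)
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/", image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.MobileNet(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False  # keep the pre-trained features frozen (transfer learning)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g. "healthy" vs "FAW damage"
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, epochs=5)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
with open("farmers_companion.tflite", "wb") as f:  # hypothetical file name
    f.write(converter.convert())
```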

The app is simple for the user. Once installed, the user focuses the camera through the app, on a maize crop. Then an image frame is picked and, using TensorFlow Lite, the image frame is analysed to look for Fall armyworm damage. Depending on the results from this phase, a suggestion of a possible solution is given.

The app is available for download and is constantly being updated, as we push for local farmers to adopt and use it. We strive to ensure a world with #ZeroHunger and believe technology can do a lot to help us achieve this.

We have so far been featured on a national TV station in Uganda, and participated in #hackAgainstHunger and ‘The International Symposium on Agricultural Innovations’ for family farmers, organized by the Food and Agriculture Organization of the United Nations, where our solution was highlighted.

More recently, Google also highlighted our work in a short film.

We have embarked on scaling the solution to coffee and cassava diseases and will slowly be moving on to more. We have also introduced virtual reality to showcase good farming practices and provide training to farmers.

Our plan is to collect more data and to scale the solution to handle more pests and diseases. We are also shifting to cloud services and Firebase to improve and serve our model better despite the lack of resources. With improved hardware and greater localised understanding, there's huge scope for Machine Learning to make a difference in the fight against hunger.

Learning to Assemble and to Generalize from Self-Supervised Disassembly



Our physical world is full of different shapes, and learning how they are all interconnected is a natural part of interacting with our surroundings — for example, we understand that coat hangers hook onto clothing racks, power plugs insert into wall outlets, and USB cables fit into USB sockets. This general concept of “how things fit together” based on their shapes is something that people acquire over time and experience, and it helps to increase the efficiency with which we perform tasks, like assembling DIY furniture kits or packing gifts into a box. If robots could learn “how things fit together,” then perhaps they could become more adaptable to new manipulation tasks involving objects they have never seen before, like reconnecting severed pipes, or building makeshift shelters by piecing together debris during disaster response scenarios.

To explore this idea, we worked with researchers from Stanford and Columbia Universities to develop Form2Fit, a robotic manipulation algorithm that uses deep neural networks to learn to visually recognize how objects correspond (or “fit”) to each other. To test this algorithm, we tasked a real robot to perform kit assembly, where it needed to accurately assemble objects into a blister pack or corrugated display to form a single unit. Previous systems built for this task required extensive manual tuning to assemble a single kit unit at a time. However, we demonstrate that by learning the general concept of “how things fit together,” Form2Fit enables our robot to assemble various types of kits with a 94% success rate. Furthermore, Form2Fit is one of the first systems capable of generalizing to new objects and kitting tasks not seen during training.
Form2Fit learns to assemble a wide variety of kits by finding geometric correspondences between object surfaces and their target placement locations. By leveraging geometric information learned from multiple kits during training, the system generalizes to new objects and kits.
While often overlooked, shape analysis plays an important role in manipulation, especially for tasks like kit assembly. In fact, the shape of an object often matches the shape of its corresponding space in the packaging, and understanding this relationship is what allows people to do this task with minimal guesswork. At its core, Form2Fit aims to learn this relationship by training over numerous pairs of objects and their corresponding placing locations across multiple different kitting tasks – with the goal to acquire a broader understanding of how shapes and surfaces fit together. Form2Fit improves itself over time with minimal human supervision, gathering its own training data by repeatedly disassembling completed kits through trial and error, then time-reversing the disassembly sequences to get assembly trajectories. After training overnight for 12 hours, our robot learns effective pick and place policies for a variety of kits, achieving 94% assembly success rates with objects and kits in varying configurations, and over 86% assembly success rates when handling completely new objects and kits.

Data-Driven Shape Descriptors For Generalizable Assembly
The core component of Form2Fit is a two-stream matching network that learns to infer orientation-sensitive geometric pixel-wise descriptors for objects and their target placement locations from visual data. These descriptors can be understood as compressed 3D point representations that encode object geometry, textures, and contextual task-level knowledge. Form2Fit uses these descriptors to establish correspondences between objects and their target locations (i.e., where they should be placed). Since these descriptors are orientation-sensitive, they allow Form2Fit to infer how the picked object should be rotated before it is placed in its target location.

Form2Fit uses two additional networks to generate valid pick and place candidates. A suction network gets fed a 3D image of the objects and generates pixel-wise predictions of suction success. The suction probability map is visualized as a heatmap, where hotter pixels indicate better locations to grasp the object at the 3D location of the corresponding pixel. In parallel, a place network gets fed a 3D image of the target kit and outputs pixel-wise predictions of placement success. These, too, are visualized as a heatmap, where higher confidence values serve as better locations for the robot arm to approach from a top-down angle to place the object. Finally, the planner integrates the output of all three modules to produce the final pick location, place location and rotation angle.
Overview of Form2Fit. The suction and place networks infer candidate picking and placing locations in the scene respectively. The matching network generates pixel-wise orientation-sensitive descriptors to match picking locations to their corresponding placing locations. The planner then integrates it all to control the robot to execute the next best pick and place action.
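
To make the matching step more concrete, here is a conceptual NumPy sketch of how orientation-sensitive pixel-wise descriptors could be matched: for a chosen pick pixel, search place-descriptor maps computed at several candidate rotations for the nearest descriptor, which yields both a place pixel and a rotation. The shapes, rotation binning, and brute-force nearest-neighbor search are illustrative, not the released Form2Fit implementation.

```python
# Conceptual sketch of descriptor matching; all shapes and values are toy examples.
import numpy as np

def match_pick_to_place(object_desc, place_desc_per_rot, pick_yx):
    """object_desc:        (H, W, D) descriptor map of the object image.
       place_desc_per_rot: (R, H, W, D) descriptor maps of the kit image, one per
                           candidate rotation of the object.
       pick_yx:            (row, col) of the chosen pick pixel.
       Returns the best (rotation_index, place_row, place_col)."""
    query = object_desc[pick_yx[0], pick_yx[1]]                  # (D,)
    dists = np.linalg.norm(place_desc_per_rot - query, axis=-1)  # (R, H, W)
    return np.unravel_index(np.argmin(dists), dists.shape)

# Toy usage: 20 candidate rotations over 64x64 descriptor maps of dimension 16.
H, W, D, R = 64, 64, 16, 20
obj = np.random.rand(H, W, D).astype(np.float32)
kit = np.random.rand(R, H, W, D).astype(np.float32)
rot_idx, place_y, place_x = match_pick_to_place(obj, kit, pick_yx=(10, 32))
print(f"place at ({place_y}, {place_x}), rotation bin {rot_idx} x {360 / R:.0f} degrees")
```
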
Learning Assembly from Disassembly
Neural networks require large amounts of training data, which can be difficult to collect for tasks like assembly. Precisely inserting objects into tight spaces with the correct orientation (e.g., in kits) is challenging to learn through trial and error, because the chances of success from random exploration can be slim. In contrast, disassembling completed units is often easier to learn through trial and error, since there are fewer incorrect ways to remove an object than there are to correctly insert it. We leveraged this difference in order to amass training data for Form2Fit.
An example of self-supervision through time-reversal: rewinding a disassembly sequence of a deodorant kit over time generates a valid assembly sequence.
Our key observation is that in many cases of kit assembly, a disassembly sequence – when reversed over time – becomes a valid assembly sequence. This concept, called time-reversed disassembly, enables Form2Fit to train entirely through self-supervision by randomly picking with trial and error to disassemble a fully-assembled kit, then reversing that disassembly sequence to learn how the kit should be put together.
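
A toy sketch of the idea, using hypothetical pick and place records, is simply to reverse a logged disassembly sequence and swap each pick with its place to obtain an assembly sequence.

```python
# Sketch: each disassembly step records where an object was picked from (its slot
# in the completed kit) and where it was dropped. Reversing the sequence and
# swapping pick/place yields a plausible assembly sequence.
def reverse_disassembly(disassembly_steps):
    return [
        {"pick": step["place"], "place": step["pick"]}
        for step in reversed(disassembly_steps)
    ]

disassembly = [
    {"pick": "slot_A", "place": "bin_1"},  # object removed from slot A first
    {"pick": "slot_B", "place": "bin_2"},
]
print(reverse_disassembly(disassembly))
# -> [{'pick': 'bin_2', 'place': 'slot_B'}, {'pick': 'bin_1', 'place': 'slot_A'}]
```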

Generalization Results
The results of our experiments show great potential for learning generalizable policies for assembly. For instance, when a policy is trained to assemble a kit in only one specific position and orientation, it can still robustly assemble random rotations and translations of the kit 90% of the time.
Form2Fit policies are robust to a wide range of rotations and translations of the kits.
We also find that Form2Fit is capable of tackling novel configurations it has not been exposed to during training. For example, when training a policy on two single-object kits (floss and tape), we find that it can successfully assemble new combinations and mixtures of those kits, even though it has never seen such configurations before.
Form2Fit policies can generalize to novel kit configurations such as multiple versions of the same kit and mixtures of different kits.
Furthermore, when given completely novel kits on which it has not been trained, Form2Fit can generalize using its learned shape priors to assemble those kits with over 86% assembly accuracy.
Form2Fit policies can generalize to never-before-seen single and multi-object kits.
What Have the Descriptors Learned?
To explore what the descriptors of the matching network from Form2Fit have learned to encode, we visualize the pixel-wise descriptors of various objects in RGB colorspace through use of an embedding technique called t-SNE.
The t-SNE embedding of the learned object descriptors. Similarly oriented objects of the same category display identical colors (e.g. A, B or F, G) while different objects (e.g. C, H) and same objects but different orientation (e.g. A, C, D or H, F) exhibit different colors.
We observe that the descriptors have learned to encode (a) rotation — objects oriented differently have different descriptors (A, C, D, E) and (H, F); (b) spatial correspondence — same points on the same oriented objects share similar descriptors (A, B) and (F, G); and (c) object identity — zoo animals and fruits exhibit unique descriptors (columns 3 and 4).

Limitations & Future Work
While Form2Fit’s results are promising, its limitations suggest directions for future work. In our experiments, we assume a 2D planar workspace to constrain the kit assembly task so that it can be solved by sequencing top-down picking and placing actions. This may not work for all cases of assembly – for example, when a peg needs to be precisely inserted at a 45 degree angle. It would be interesting to expand Form2Fit to more complex action representations for 3D assembly.

You can learn more about this work and download the code from our GitHub repository.


Acknowledgments
This research was done by Kevin Zakka, Andy Zeng, Johnny Lee, and Shuran Song (faculty at Columbia University), with special thanks to Nick Hynes, Alex Nichol, and Ivan Krasin for fruitful technical discussions; Adrian Wong, Brandon Hurd, Julian Salazar, and Sean Snyder for hardware support; Ryan Hickman for valuable managerial support; and Chad Richards for helpful feedback on writing.

Source: Google AI Blog


Android Developer Challenge: helpful innovation, powered by On-Device Machine Learning + you!

Posted by The Android team

Developers like you have always played an important role in shaping the direction of Android, fueling the wave of Android innovation. It’s the reason that when we first launched the SDK for Android 10+ years ago, we simultaneously announced the Android Developer Challenge: a way to help reward model apps and show us what user problems you wanted to solve. As Android continues to push the boundaries into emerging areas like ML, 5G, foldables and more, we need your help to bring to life the consumer experiences that will define these new frontiers.

So we’re bringing back the Android Developer Challenge and asking you to help us unlock new experiences on Android, and help inspire other developers around these emerging technologies.

As we kick off this challenge, the first area we’ll be focusing on is On-Device Machine Learning. At Google, we’re big believers in how this new technology can open up a world of helpful innovation so you can get things done in ways you never thought possible. Take Live Caption: for the almost 500 million people who are deaf and hard of hearing, Live Caption brings content to life and is exactly the type of machine learning-powered innovation we expect to see more of someday, and with your help we can turn someday into today!

Bringing your idea to life in front of billions of eyes

Got an idea? Whether it’s still a concept or ready for users, tell us how you could use Google’s help, and how it supports the mission of using machine learning to help people get something done. Join the #AndroidDeveloperChallenge topic on GitHub, and share your idea as a repository under this topic. Don’t forget to come back here and officially submit your concept.

We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. Here’s what those 10 developers will get:

Expertise and development support bootcamp: We’ll work with you to provide expertise and guidance to help in your plans to bring your app from concept to reality, including:

  • An all-expenses paid, working session with a panel of experts at Google HQ in Mountain View, CA
  • Google engineer mentorship at the bootcamp, providing guidance and technical expertise to help bring your app to fruition

Exposure and street cred! Once your idea is ready for prime-time, we’ll help you get users, and celebrate you to the broader Android community, including:

  • A collection on Google Play where we’ll feature your app (apps must be ready for Google Play on May 1, and must meet Google’s minimum quality requirements)
  • Tickets to Google I/O 2020
  • And we’ll celebrate these experiences to the broader Android developer community on developers.android.com. We might even showcase you at Google I/O, in places like the sandbox, sessions, perhaps even a keynote!

Helpful innovation is an important investment area for us on the Android team, and On-Device Machine Learning has played a critical role in powering new features in the last several releases of Android. We’re just beginning to scratch the surface, and we can’t wait to see what you come up with!

Coral moves out of beta

Posted by Vikram Tank (Product Manager), Coral Team

Last March, we launched Coral beta from Google Research. Coral helps engineers and researchers bring new models out of the data center and onto devices, running TensorFlow models efficiently at the edge. Coral is also at the core of new applications of local AI in industries ranging from agriculture to healthcare to manufacturing. We've received a lot of feedback over the past six months and used it to improve our platform. Today we’re thrilled to graduate Coral out of beta, into a wider, global release.

Coral is already delivering impact across industries, and several of our partners are including Coral in products that require fast ML inferencing at the edge.

In healthcare, Care.ai is using Coral to build a device that enables hospitals and care centers to respond quickly to falls, prevent bed sores, improve patient care, and reduce costs. Virgo SVS is also using Coral as the basis of a polyp detection system that helps doctors improve the accuracy of endoscopies.

In a very different use case, Olea Edge employs Coral to help municipal water utilities accurately measure the amount of water used by their commercial customers. Their Meter Health Analytics solution uses local AI to reduce waste and predict equipment failure in industrial water meters.

Nexcom is using Coral to build gateways with local AI and provide a platform for next-gen, AI-enabled IoT applications. By moving AI processing to the gateway, existing sensor networks can stay in service without the need to add AI processing to each node.

From prototype to production

Coral’s Dev Board is designed as an integrated prototyping solution for new product development. Under the heatsink is the detachable Coral SoM, which combines Google’s Edge TPU with the NXP IMX8M SoC, Wi-Fi and Bluetooth connectivity, memory, and storage. We’re happy to announce that you can now purchase the Coral SoM standalone. We’ve also created a baseboard developer guide to help integrate it into your own production design.

Our Coral USB Accelerator allows users with existing system designs to add local AI inferencing via USB 2/3. For production workloads, we now offer three new Accelerators that feature the Edge TPU and connect via PCIe interfaces: Mini PCIe, M.2 A+E key, and M.2 B+M key. You can easily integrate these Accelerators into new products or upgrade existing devices that have an available PCIe slot.

The new Coral products are available globally and for sale at Mouser; for large volume sales, contact our sales team. By the end of 2019, we'll continue to expand our distribution of the Coral Dev Board and SoM into new markets including: Taiwan, Australia, New Zealand, India, Thailand, Singapore, Oman, Ghana and the Philippines.

Better resources

We’ve also revamped the Coral site with better organization for our docs and tools, a set of success stories, and industry-focused pages. All of it can be found at a new, easier-to-remember URL: Coral.ai.

To help you get the most out of the hardware, we’re also publishing a new set of examples. The included models and code can provide solutions to the most common on-device ML problems, such as image classification, object detection, pose estimation, and keyword spotting.

For those looking for a more in-depth application—and a way to solve the eternal problem of squirrels plundering your bird feeder—the Smart Bird Feeder project shows you how to perform classification with a custom dataset on the Coral Dev board.

Finally, we’ll soon release a new version of the Mendel OS that updates the system to Debian Buster, and we're hard at work on more improvements to the Edge TPU compiler and runtime that will improve the model development workflow.

The official launch of Coral is, of course, just the beginning, and we’ll continue to evolve the platform. Please keep sending us feedback at [email protected].

Exploring Massively Multilingual, Massive Neural Machine Translation



“... perhaps the way [of translation] is to descend, from each language, down to the common base of human communication — the real but as yet undiscovered universal language — and then re-emerge by whatever particular route is convenient.”
Warren Weaver, 1949

Over the last few years there has been enormous progress in the quality of machine translation (MT) systems, breaking language barriers around the world thanks to the developments in neural machine translation (NMT). The success of NMT however, owes largely to the great amounts of supervised training data. But what about languages where data is scarce, or even absent? Multilingual NMT, with the inductive bias that “the learning signal from one language should benefit the quality of translation to other languages”, is a potential remedy.

Multilingual machine translation processes multiple languages using a single translation model. The success of multilingual training for data-scarce languages has been demonstrated for automatic speech recognition and text-to-speech systems, and by prior research on multilingual translation [1,2,3]. We previously studied the effect of scaling up the number of languages that can be learned in a single neural network, while controlling the amount of training data per language. But what happens once all constraints are removed? Can we train a single model using all of the available data, despite the huge differences across languages in data size, scripts, complexity and domains?

In “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges” and follow-up papers [4,5,6,7], we push the limits of research on multilingual NMT by training a single NMT model on 25+ billion sentence pairs, from 100+ languages to and from English, with 50+ billion parameters. The result is an approach for massively multilingual, massive neural machine translation (M4) that demonstrates large quality improvements on both low- and high-resource languages and can be easily adapted to individual domains/languages, while showing great efficacy on cross-lingual downstream transfer tasks.

Massively Multilingual Machine Translation
Though data skew across language-pairs is a great challenge in NMT, it also creates an ideal scenario in which to study transfer, where insights gained through training on one language can be applied to the translation of other languages. On one end of the distribution, there are high-resource languages like French, German and Spanish where there are billions of parallel examples, while on the other end, supervised data for low-resource languages such as Yoruba, Sindhi and Hawaiian, is limited to a few tens of thousands.
The data distribution over all language pairs (in log scale) and the relative translation quality (BLEU score) of the bilingual baselines trained on each one of these specific language pairs.
Once trained using all of the available data (25+ billion examples from 103 languages), we observe strong positive transfer towards low-resource languages, dramatically improving the translation quality of 30+ languages at the tail of the distribution by an average of 5 BLEU points. This effect is already known, but surprisingly encouraging, considering the comparison is between bilingual baselines (i.e., models trained only on specific language pairs) and a single multilingual model with representational capacity similar to a single bilingual model. This finding hints that massively multilingual models are effective at generalization, and capable of capturing the representational similarity across a large body of languages.
Translation quality comparison of a single massively multilingual model against bilingual baselines that are trained for each one of the 103 language pairs.
In our EMNLP’19 paper [5], we compare the representations of multilingual models across different languages. We find that multilingual models learn shared representations for linguistically similar languages without the need for external constraints, validating long-standing intuitions and empirical results that exploit these similarities. In [6], we further demonstrate the effectiveness of these learned representations on cross-lingual transfer on downstream tasks.
Visualization of the clustering of the encoded representations of all 103 languages, based on representational similarity. Languages are color-coded by their linguistic family.
Building Massive Neural Networks
As we increase the number of low-resource languages in the model, the quality of high-resource language translations starts to decline. This regression is recognized in multi-task setups, arising from inter-task competition and the unidirectional nature of transfer (i.e., from high- to low-resource). While working on better learning and capacity control algorithms to mitigate this negative transfer, we also extend the representational capacity of our neural networks by making them bigger, increasing the number of model parameters to improve the quality of translation for high-resource languages.

Numerous design choices can be made to scale neural network capacity, including adding more layers or making the hidden representations wider. Continuing our study on training deeper networks for translation, we utilized GPipe [4] to train 128-layer Transformers with over 6 billion parameters. Increasing the model capacity resulted in significantly improved performance across all languages by an average of 5 BLEU points. We also studied other properties of very deep networks, including the depth-width trade-off, trainability challenges and design choices for scaling Transformers to over 1500 layers with 84 billion parameters.

While scaling depth is one approach to increasing model capacity, exploring architectures that can exploit the multi-task nature of the problem is a very plausible complementary way forward. By modifying the Transformer architecture through the substitution of the vanilla feed-forward layers with sparsely-gated mixture of experts, we drastically scale up the model capacity, allowing us to successfully train models with more than 50 billion parameters, which further improved translation quality across the board.
Translation quality improvement of a single massively multilingual model as we increase the capacity (number of parameters) compared to 103 individual bilingual baselines.
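
For intuition only, the sketch below shows a highly simplified sparsely-gated mixture-of-experts layer with top-2 gating in NumPy. The dimensions are toy-sized and the routing is per-token and unbatched; it conveys the idea of replacing a dense feed-forward layer with sparsely activated experts, not the distributed implementation used to train the M4 models.

```python
# Toy sparsely-gated mixture-of-experts layer: each token is routed to its top-2
# experts, and the expert outputs are combined with softmax-normalized gate weights.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, num_experts, top_k = 64, 256, 8, 2

# Each expert is a small two-layer feed-forward block.
experts = [
    (rng.standard_normal((d_model, d_ff)) * 0.02,
     rng.standard_normal((d_ff, d_model)) * 0.02)
    for _ in range(num_experts)
]
gate_w = rng.standard_normal((d_model, num_experts)) * 0.02

def moe_layer(x):
    """x: (tokens, d_model) -> (tokens, d_model), touching only top-k experts per token."""
    logits = x @ gate_w                            # (tokens, num_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of the chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        weights = np.exp(chosen) / np.exp(chosen).sum()      # softmax over chosen experts
        for w, e in zip(weights, top[t]):
            w1, w2 = experts[e]
            out[t] += w * (np.maximum(x[t] @ w1, 0.0) @ w2)  # ReLU feed-forward expert
    return out

tokens = rng.standard_normal((4, d_model))
print(moe_layer(tokens).shape)  # (4, 64)
```
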
Making M4 Practical
It is inefficient to train large models with extremely high computational costs for every individual language, domain or transfer task. Instead, we present methods [7] to make these models more practical by using capacity tunable layers to adapt a new model to specific languages or domains, without altering the original.

Next Steps
At least half of the 7,000 languages currently spoken will no longer exist by the end of this century*. Can multilingual machine translation come to the rescue? We see the M4 approach as a stepping stone towards serving the next 1,000 languages; starting from such multilingual models will allow us to easily extend to new languages, domains and down-stream tasks, even when parallel data is unavailable. Indeed the path is rocky, and on the road to universal MT many promising solutions appear to be interdisciplinary. This makes multilingual NMT a plausible test bed for machine learning practitioners and theoreticians interested in exploring the annals of multi-task learning, meta-learning, training dynamics of deep nets and much more. We still have a long way to go.

Acknowledgements
This effort is built on contributions from Naveen Arivazhagan, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Chen, Yuan Cao, Yanping Huang, Sneha Kudugunta, Isaac Caswell, Aditya Siddhant, Wei Wang, Roee Aharoni, Sébastien Jean, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen and Yonghui Wu. We would also like to acknowledge support from the Google Translate, Brain, and Lingvo development teams, Jakob Uszkoreit, Noam Shazeer, Hyouk Joong Lee, Dehao Chen, Youlong Cheng, David Grangier, Colin Raffel, Katherine Lee, Thang Luong, Geoffrey Hinton, Manisha Jain, Pendar Yousefi and Macduff Hughes.


* The Cambridge Handbook of Endangered Languages (Austin and Sallabank, 2011).

Source: Google AI Blog


ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots



Learning-based methods for solving robotic control problems have recently seen significant momentum, driven by the widening availability of simulated benchmarks (like dm_control or OpenAI-Gym) and advancements in flexible and scalable reinforcement learning techniques (DDPG, QT-Opt, or Soft Actor-Critic). While learning through simulation is effective, policies trained in these simulated environments often have difficulty transferring to real-world robots, due to factors such as inaccurate modeling of physical phenomena and system delays. This motivates the need to develop robotic control solutions directly in the real world, on real physical hardware.

The majority of current robotics research on physical hardware is conducted on high-cost, industrial-quality robots (PR2, Kuka-arms, ShadowHand, Baxter, etc.) intended for precise, monitored operation in controlled environments. Furthermore, these robots are designed around traditional control methods that focus on precision, repeatability, and ease of characterization. This stands in sharp contrast with the learning-based methods that are robust to imperfect sensing and actuation, and demand (a) a high degree of resilience to allow real-world trial-and-error learning, (b) low cost and ease of maintenance to enable scalability through replication and (c) a reliable reset mechanism to alleviate strict human monitoring requirements.

In “ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots”, to be presented at CoRL 2019, we introduce an open-source platform of cost-effective robots and curated benchmarks designed primarily to facilitate research and development on physical hardware in the real world. Analogous to an optical table in the field of optics, ROBEL serves as a rapid experimentation platform, supporting a wide range of experimental needs and the development of new reinforcement learning and control methods. ROBEL consists of D'Claw, a three-fingered hand robot that facilitates learning of dexterous manipulation tasks and D'Kitty, a four-legged robot that enables the learning of agile legged locomotion tasks. The robotic platforms are low-cost, modular, easy to maintain, and are robust enough to sustain on-hardware reinforcement learning from scratch.
Left: The 12 DoF D’Kitty; Middle: The 9 DoF D’Claw; Right: A functional D’Claw setup D’Lantern.
In order to make the robots relatively inexpensive and easy to build, we based ROBEL’s designs on off-the-shelf components and commonly-available prototyping tools (3D-printed or laser cut). Designs are easy to assemble and require only a few hours to build. Detailed part lists (with CAD details), assembly instructions, and software instructions for getting started are available here.

ROBEL Benchmarks
We devised a set of tasks suitable for each platform, D’Claw and D’Kitty, which can be used for benchmarking real-world robotic learning. ROBEL’s task definitions include both dense and sparse task objectives, and introduce metrics for hardware-safety in the task definition, which for example, indicate if joints are exceeding “safe” operating bounds or force thresholds. ROBEL also supports a simulator for all tasks to facilitate algorithmic development and rapid prototyping. D’Claw tasks are centered around three commonly observed manipulation behaviors — Pose, Turn, and Screw.
Left: Pose — Conform to the shape of the environment. Center: Turn — Turn the object to a specified angle. Right: Screw — Continuously rotate the object. (Click images for video.)
D’Kitty tasks are centered around three commonly observed locomotion behaviors — Stand, Orient, and Walk.
Left: Stand — Stand upright. Center: Orient — Align heading with the target. Right: Walk — Move to the target. (Click images for video.)
We evaluated several classes (on-policy, off-policy, demo-accelerated, supervised) of deep reinforcement learning methods on each of these benchmark tasks. The evaluation results and the final policies are included as baselines in the software package for comparison. Full task details and baseline performances are available in the technical report.
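
As a minimal sketch of how the benchmarks are exercised in code, the snippet below runs a random policy on one simulated D'Claw task through the Gym interface; the environment ID follows the naming used in the ROBEL repository, and the hardware-backed variants accept additional device configuration described in the documentation.

```python
# Sketch: run a random policy for one episode on a simulated ROBEL task.
import gym
import robel  # registers the D'Claw and D'Kitty environments with Gym

env = gym.make("DClawTurnFixed-v0")  # task name as documented in the ROBEL repo
obs = env.reset()
total_reward = 0.0
for _ in range(200):
    action = env.action_space.sample()  # random policy as a placeholder
    obs, reward, done, info = env.step(action)
    total_reward += reward
    if done:
        break
print("episode return:", total_reward)
```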

Reproducibility & Robustness
ROBEL platforms are robust enough to sustain direct hardware training, and have clocked over 14,000 hours of real-world experience to date. The platforms have significantly matured over the year. Owing to the modularity of the design, repairs are trivial and require minimal to no domain expertise, making the overall system easy to maintain.

To establish the replicability of the platforms and reproducibility of the benchmarks, ROBEL was studied in isolation by two different research labs. Only the software distribution and documentation were used in this study; no in-person visits were allowed. Using ROBEL’s design files and assembly instructions, both sites were able to replicate both hardware platforms. Benchmark tasks were trained on robots built at both sites. In the figure below we see that two D’Claw robots built at two different sites not only exhibit similar training progress but also converge to the same final performance, establishing the reproducibility of the ROBEL benchmarks.
SAC training performance of a task on two real D’Claw robots developed at different laboratory locations.
Results Gallery
ROBEL has been useful in a variety of reinforcement learning studies so far. Below we highlight a few of the key results, and you can find all our results in this comprehensive gallery. D’Claw platforms are completely autonomous and can sustain reliable experimentation for an extended period of time, and they have facilitated experimentation with a wide variety of reinforcement learning paradigms and tasks using both rigid and flexible objects.
Left: Flexible Objects — On-hardware training with DAPG effectively learns to turn flexible objects. We observe manipulation targeting the center of the valve where there is more rigidity. D'Claw is robust to on-hardware training, facilitating successful outcomes on hard to simulate tasks. Center: Disturbance Rejection — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with object perturbations (amongst others) being tested on hardware. We observe fingers working together to resist external disturbances. Right: Obstructed Finger — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with external perturbations (amongst others) being tested on hardware. We observe that free fingers fill in for the missing finger.
Importantly, D’Claw platforms are modular and easy to replicate, which facilitates scalable experimentation. With our scaled setup, we find that multiple D’Claws can collectively learn tasks faster by sharing experience.
On-hardware training with a distributed version of SAC learning to turn multiple objects to arbitrary angles in conjunction by sharing experience. Five tasks only need twice the amount of experience of a single task, thanks to the multi-task formulation. In the video we observe five D'Claws turning different objects to 180 degrees (picked for visual effectiveness; the actual policy can turn to any angle).
We have also been successful in deploying robust locomotion policies on the D’Kitty platform. Below we show a blind D’Kitty walking over indoor and outdoor terrains exhibiting the robustness of its gait in presence of unseen disturbances.
Left: Indoor – Walking in Clutter — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized perturbations learns to walk in clutter and step over objects. Center: Outdoor – Gravel and Branches — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized height field learns to walk outdoors over gravel and branches. Right: Outdoor – Slope and Grass — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized height field learns to handle moderate slopes.
When presented with information about its torso and objects present in the scene, D’Kitty can learn to interact with these objects exhibiting complex behaviors.
Left: Avoid Moving Obstacles — Policy trained via Hierarchical Sim2Real learns to avoid a moving block and reach the target (marked by the controller on the floor). Center: Push to Moving Goal — Policy trained via Hierarchical Sim2Real learns to push block towards a moving target (marked by the controller in the hand). Right: Co-ordinate — Policy trained via Hierarchical Sim2Real learns to coordinate two D'Kitties to push a heavy block towards a target (marked by two + signs on the floor).
In conclusion, ROBEL platforms are low cost, robust, reliable and are designed to accommodate the needs of the emerging learning-based paradigms that need scalability and resilience. We are proud to announce the release of ROBEL to the open source community and are excited to learn about the diversity of research and experimentation they will enable. For getting started on ROBEL platforms and ROBEL benchmarks refer to roboticsbenchmarks.org.

Acknowledgments
Google's ROBEL D'Claw evolved from earlier designs Vikash Kumar developed at the Universities of Washington and Berkeley. Multiple people across organizations have contributed towards the ROBEL projects. We thank our co-authors Henry Zhu (UC Berkeley), Kristian Hartikainen (UC Berkeley), Abhishek Gupta (UC Berkeley) and Sergey Levine (Google and UC Berkeley) for their contributions and extensive feedback throughout the project. We would like to acknowledge Matt Neiss (Google) and Chad Richards (Google) for their significant contribution to the platform designs. We would also like to thank Aravind Rajeshwaran (U-Washington), Emo Todorov (U-Washington), and Vincent Vanhoucke (Google) for their helpful discussions and comments throughout the project.

Source: Google AI Blog