Tag Archives: machine learning

Android Developer Challenge: here’s what we’re looking for! (Apply by Dec. 2)

Last month, we kicked off the next Android Developer Challenge, and asked you to submit your ideas focused on helpful innovation, powered by on-device machine learning. But what exactly do we mean when we say helpful innovation? We’re glad you asked! We rounded up a few of Google’s on-device machine learning offerings, together with some great recent examples of this technology in action, to help inspire your submission. Don’t forget, submit your idea by December 2!

Using machine learning to tackle Fall Armyworm

Take Nazirini Siraji. When she and a team of developers noticed a crop-pest threatening the livelihood of Ugandan farmers, they taught themselves TensorFlow to combat this pest. They collected training data from nearby fields in the form of images. With TensorFlow, they re-trained a MobileNet, a technique known as transfer learning and then used the TensorFlow Converter to generate a TensorFlow Lite FlatBuffer file which they deployed in an Android app. With the app, a farmer can snap a picture of their crop and the image frame is analysed to look for Fall armyworm damage. Depending on the results from this phase, a suggestion of a possible solution is given. It’s pretty cool!

Helping doctors detect respiratory diseases using machine learning

Tambua Health is helping doctors determine the likelihood of respiratory diseases by turning any smartphone into a powerful non-invasive screening tool. They developed an app using TensorFlow Lite that can help doctors analyze lung sounds for the presence of abnormal sounds like wheezes, crackles, stridor, and other adventitious sounds.

adidas uses machine learning to make the shopping experience easier

Even brands are tapping the power of machine learning. Take adidas, who recently launched a new “Bring It to Me” experience for their London store. Shoppers can use Visual Lookup to scan products on their phones while they are in the store, and the app lets them check stock and request their size without the need for queues. Under the hood, ML Kit is helping power the experience. It’s another way machine learning is helping users get things done more quickly.

The benefits of on-device machine learning

Running machine learning on a user’s device comes with a number of benefits. First, you reduce the amount of data you send to your server, enhancing user privacy. And because it runs on device, it can also work offline - perfect for inaccessible areas such as the middle of a rainforest, a desert or the London Underground. Last but not least, the most exciting aspect of running your model on device is low latency and this can enable all kinds of new user experiences. Machine learning is not just for automating tasks, it can work alongside your users and give them super powers too!

At Google, we offer a number of different technologies to help you take advantage of this:

  • ML Kit offers a turnkey SDK to help you tackle tasks with powerful Google Machine Learning models
  • The TensorFlow Lite Framework lets you take a custom model and optimise it to run it on Android
  • There’s also the infrastructure of Firebase / Google Cloud, which can help you train on-device models using AutoML Vision Edge for specific model types or give you the raw processing power to train your own model

If you’ve got a great idea that can help users get things done, we want to hear from you! We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. You can read more about all of the prizes here.

There’s still time to submit your idea before the December 2 deadline. We can’t wait to hear what you come up with, and to work with you on bringing helpful innovation powered by on-device machine learning to more and more users!

Accelerating Japan’s AI startups in our new Tokyo Campus

Posted by Takuo Suzuki

Japan is well known as an epicenter of innovation and technology, and its startup ecosystem is no different. We’ve seen this first hand from our work with startups such as Cinnamon-- who uses artificial intelligence to remove repetitive tasks from office workers daily function, allowing more work to get done by fewer people, faster.

This is why we are pleased to announce our second accelerator program, housed at the new Google for Startups Campus in the heart of Tokyo. Accelerated with Google in JapanThe Google for Startups Accelerator (previously Launchpad Accelerator) is an intensive three-month program for high potential, AI-focused startups, utilizing the proven Launchpad foundational components and content.

Founders who successfully apply for the accelerator will have the opportunity to work on the technical problems facing their startup alongside relevant experts from Google and the industry. They will receive mentorship on these challenges, support on machine learning best practices, as well as connections to relevant teams from across Google to help grow their business.

In addition to mentorship and technical project support, the accelerator also includes deep dives and workshops focused on product design, customer acquisition, and leadership development for founders.

“We hope that by providing these founders with the tools, mentorship, and connections to prepare for the next step in their journey it will, in turn, contribute to a stronger Japanese economy.” says Takuo Suzuki, Google Developers Regional Lead for Japan. “We are excited to work with such passionate startups in a new Google for Startups Campus, an environment built to foster startup growth, and meet our next cohort in 2020”

The program will run from February-May 2020 and applications are now open until 13th December 2019.

Using machine learning to tackle Fall Armyworm

Guest post by Nsubuga Hassan, CEO at Hansu Mobile and Intelligent Innovations, Android and Machine Learning DeveloperNazirini doing work on laptopIn 2016 Fall armyworm (FAW) was first reported in Africa. It has devastated maize crops across the continent.

Research shows the potential impact of FAW on continental wide maize yield lies between 8.3 and 20.6 million tonnes per year (total expected production of 39m tonnes per year); with losses lying between US$2,48m and US$6,19m per year (of a US$11,59m annual expected value). The impact of FAW is far reaching, and is now reported in many countries around the world.

Agriculture is the backbone of Uganda’s economy, employing 70% of the population. It contributes to half of Uganda’s export earnings and a quarter of the country’s gross domestic product (GDP). Fall armyworm posses a great threat on our livelihoods. Two people having a conversationWe are a small group of like minded developers living and working in Uganda. Most of our relatives grow maize so the impact of the worm was very close to home. We really felt like we needed to do something about it. Fall Armyworm that threatens cropsThe vast damage and yield losses in maize production, due to FAW, got the attention of global organizations, who are calling for innovators to help. It is the perfect time to apply machine learning. Our goal is to build an intelligent agent to help local farmers fight this pest in order to increase our food security.

Based on a Machine Learning Crash Course, our Google Developer Group (GDG) in Mbale hosted some study jams in May 2018, alongside several other code labs. This is where we first got hands-on experience using TensorFlow, from which the foundations were laid for the Farmers Companion app. Finally, we felt as though an intelligent solution to help farmers had been conceived.

Equipped with this knowledge & belief, the team embarked on collecting training data from nearby fields. This was done using a smartphone to take images, with the help of some GDG Mbale members. With farmers miles from town, and many fields inaccessible by road (not to mention the floods), this was not as simple as we had first hoped. To inhibit us further, our smartphones were (and still are) the only hard drives we had, thus decreasing the number of images & data we can capture in a day.

But we persisted! Once gathered, the images were sorted, one at a time, and categorized. With TensorFlow we re-trained a MobileNet, a technique known as transfer learning. We then used the TensorFlow Converter to generate a TensorFlow Lite FlatButter file which we deployed in an Android app. Demonstration of the Android app identifying the Fall armywormWe started with about 3956 images, but our dataset is growing exponentially. We are actively collecting more and more data to improve our model’s accuracy. The improvements in TensorFlow, with Keras high level APIs, has really made our approach to deep learning easy and enjoyable and we are now experimenting with TensorFlow 2.0.

The app is simple for the user. Once installed, the user focuses the camera through the app, on a maize crop. Then an image frame is picked and, using TensorFlow Lite, the image frame is analysed to look for Fall armyworm damage. Depending on the results from this phase, a suggestion of a possible solution is given.

The app is available for download and it is constantly undergoing updates, as we push for local farmers to adapt and use it. We strive to ensure a world with #ZeroHunger and believe technology can do a lot to help us achieve this.

We have so far been featured on a national TV station in Uganda, participated in the #hackAgainstHunger and ‘The International Symposium on Agricultural Innovations’ for family farmers, organized by the Food Agricultural Organization of the United Nations, where our solution was highlighted.

More recently, Google highlighted our work with this film:

We have embarked on scaling the solution to coffee disease and cassava diseases and will slowly be moving on to more. We have also introduced virtual reality to help farmers showcase good farming practices and various training.

Our plan is to collect more data and to scale the solution to handle more pests and diseases. We are also shifting to cloud services and Firebase to improve and serve our model better despite the lack of resources. With improved hardware and greater localised understanding, there's huge scope for Machine Learning to make a difference in the fight against hunger.

Learning to Assemble and to Generalize from Self-Supervised Disassembly



Our physical world is full of different shapes, and learning how they are all interconnected is a natural part of interacting with our surroundings — for example, we understand that coat hangers hook onto clothing racks, power plugs insert into wall outlets, and USB cables fit into USB sockets. This general concept of “how things fit together'' based on their shapes is something that people acquire over time and experience, and it helps to increase the efficiency with which we perform tasks, like assembling DIY furniture kits or packing gifts into a box. If robots could learn “how things fit together,” then perhaps they could become more adaptable to new manipulation tasks involving objects they have never seen before, like reconnecting severed pipes, or building makeshift shelters by piecing together debris during disaster response scenarios.

To explore this idea, we worked with researchers from Stanford and Columbia Universities to develop Form2Fit, a robotic manipulation algorithm that uses deep neural networks to learn to visually recognize how objects correspond (or “fit”) to each other. To test this algorithm, we tasked a real robot to perform kit assembly, where it needed to accurately assemble objects into a blister pack or corrugated display to form a single unit. Previous systems built for this task required extensive manual tuning to assemble a single kit unit at a time. However, we demonstrate that by learning the general concept of “how things fit together,” Form2Fit enables our robot to assemble various types of kits with a 94% success rate. Furthermore, Form2Fit is one of the first systems capable of generalizing to new objects and kitting tasks not seen during training.
Form2Fit learns to assemble a wide variety of kits by finding geometric correspondences between object surfaces and their target placement locations. By leveraging geometric information learned from multiple kits during training, the system generalizes to new objects and kits.
While often overlooked, shape analysis plays an important role in manipulation, especially for tasks like kit assembly. In fact, the shape of an object often matches the shape of its corresponding space in the packaging, and understanding this relationship is what allows people to do this task with minimal guesswork. At its core, Form2Fit aims to learn this relationship by training over numerous pairs of objects and their corresponding placing locations across multiple different kitting tasks – with the goal to acquire a broader understanding of how shapes and surfaces fit together. Form2Fit improves itself over time with minimal human supervision, gathering its own training data by repeatedly disassembling completed kits through trial and error, then time-reversing the disassembly sequences to get assembly trajectories. After training overnight for 12 hours, our robot learns effective pick and place policies for a variety of kits, achieving 94% assembly success rates with objects and kits in varying configurations, and over 86% assembly success rates when handling completely new objects and kits.

Data-Driven Shape Descriptors For Generalizable Assembly
The core component of Form2Fit is a two-stream matching network that learns to infer orientation-sensitive geometric pixel-wise descriptors for objects and their target placement locations from visual data. These descriptors can be understood as compressed 3D point representations that encode object geometry, textures, and contextual task-level knowledge. Form2Fit uses these descriptors to establish correspondences between objects and their target locations (i.e., where they should be placed). Since these descriptors are orientation-sensitive, they allow Form2Fit to infer how the picked object should be rotated before it is placed in its target location.

Form2Fit uses two additional networks to generate valid pick and place candidates. A suction network gets fed a 3D image of the objects and generates pixel-wise predictions of suction success. The suction probability map is visualized as a heatmap, where hotter pixels indicate better locations to grasp the object at the 3D location of the corresponding pixel. In parallel, a place network gets fed a 3D image of the target kit and outputs pixel-wise predictions of placement success. These, too, are visualized as a heatmap, where higher confidence values serve as better locations for the robot arm to approach from a top-down angle to place the object. Finally, the planner integrates the output of all three modules to produce the final pick location, place location and rotation angle.
Overview of Form2Fit. The suction and place networks infer candidate picking and placing locations in the scene respectively. The matching network generates pixel-wise orientation-sensitive descriptors to match picking locations to their corresponding placing locations. The planner then integrates it all to control the robot to execute the next best pick and place action.
Learning Assembly from Disassembly
Neural networks require large amounts of training data, which can be difficult to collect for tasks like assembly. Precisely inserting objects into tight spaces with the correct orientation (e.g., in kits) is challenging to learn through trial and error, because the chances of success from random exploration can be slim. In contrast, disassembling completed units is often easier to learn through trial and error, since there are fewer incorrect ways to remove an object than there are to correctly insert it. We leveraged this difference in order to amass training data for Form2Fit.
An example of self-supervision through time-reversal: rewinding a disassembly sequence of a deodorant kit over time generates a valid assembly sequence.
Our key observation is that in many cases of kit assembly, a disassembly sequence – when reversed over time – becomes a valid assembly sequence. This concept, called time-reversed disassembly, enables Form2Fit to train entirely through self-supervision by randomly picking with trial and error to disassemble a fully-assembled kit, then reversing that disassembly sequence to learn how the kit should be put together.

Generalization Results
The results of our experiments show great potential for learning generalizable policies for assembly. For instance, when a policy is trained to assemble a kit in only one specific position and orientation, it can still robustly assemble random rotations and translations of the kit 90% of the time.
Form2Fit policies are robust to a wide range of rotations and translations of the kits.
We also find that Form2Fit is capable of tackling novel configurations it has not been exposed to during training. For example, when training a policy on two single-object kits (floss and tape), we find that it can successfully assemble new combinations and mixtures of those kits, even though it has never seen such configurations before.
Form2Fit policies can generalize to novel kit configurations such as multiple versions of the same kit and mixtures of different kits.
Furthermore, when given completely novel kits on which it has not been trained, Form2Fit can generalize using its learned shape priors to assemble those kits with over 86% assembly accuracy.
Form2Fit policies can generalize to never-before-seen single and multi-object kits.
What Have the Descriptors Learned?
To explore what the descriptors of the matching network from Form2Fit have learned to encode, we visualize the pixel-wise descriptors of various objects in RGB colorspace through use of an embedding technique called t-SNE.
The t-SNE embedding of the learned object descriptors. Similarly oriented objects of the same category display identical colors (e.g. A, B or F, G) while different objects (e.g. C, H) and same objects but different orientation (e.g. A, C, D or H, F) exhibit different colors.
We observe that the descriptors have learned to encode (a) rotation — objects oriented differently have different descriptors (A, C, D, E) and (H, F); (b) spatial correspondence — same points on the same oriented objects share similar descriptors (A, B) and (F, G); and (c) object identity — zoo animals and fruits exhibit unique descriptors (columns 3 and 4).

Limitations & Future Work
While Form2Fit’s results are promising, its limitations suggest directions for future work. In our experiments, we assume a 2D planar workspace to constrain the kit assembly task so that it can be solved by sequencing top-down picking and placing actions. This may not work for all cases of assembly – for example, when a peg needs to be precisely inserted at a 45 degree angle. It would be interesting to expand Form2Fit to more complex action representations for 3D assembly.

You can learn more about this work and download the code from our GitHub repository.


Acknowledgments
This research was done by Kevin Zakka, Andy Zeng, Johnny Lee, and Shuran Song (faculty at Columbia University), with special thanks to Nick Hynes, Alex Nichol, and Ivan Krasin for fruitful technical discussions; Adrian Wong, Brandon Hurd, Julian Salazar, and Sean Snyder for hardware support; Ryan Hickman for valuable managerial support; and Chad Richards for helpful feedback on writing.

Source: Google AI Blog


Android Developer Challenge: helpful innovation, powered by On-Device Machine Learning + you!

Posted by The Android team

Android Developer Challenge banner

Developers like you have always played an important role in shaping the direction of Android, fueling the wave of Android innovation. It’s the reason that when we first launched the SDK for Android 10+ years ago, we simultaneously announced the Android Developer Challenge: a way to help reward model apps and show us what user problems you wanted to solve. As Android continues to push the boundaries into emerging areas like ML, 5G, foldables and more, we need your help to bring to life the consumer experiences that will define these new frontiers.

So we’re bringing back the Android Developer Challenge and asking you to help us unlock new experiences on Android, and help inspire other developers around these emerging technologies.

As we kick off this challenge, the first area we’ll be focusing on is On-Device Machine Learning. At Google, we’re big believers in how this new technology can open up a world of helpful innovation so you can get things done in ways you never thought possible. Take Live Captions: for the almost 500 million people who are deaf and hard of hearing, Live Captions bring content to life and is exactly the type of machine learning-powered innovation we expect to see more of someday, and with your help we can turn someday into today!

Bringing your idea to life in front of billions of eyes

Got an idea? Whether it’s still a concept or ready for users, tell us how you could use Google’s help, and how it supports the mission of using machine learning to help people get something done. Join the #AndroidDeveloperChallenge topic on GitHub, and share your idea as a repository under this topic. Don’t forget to come back here and officially submit your concept.

We’ll pick 10 concepts and provide expertise and guidance to those developers to help in their plans to bring their ideas to fruition. And once the app is ready, we’ll help showcase it in front of the billions of users on Google Play, through a collection and more. Here’s what those 10 developers will get:

Expertise and development support bootcamp: We’ll work with you to provide expertise and guidance to help in your plans to bring your app from concept to reality, including:

  • An all-expenses paid, working session with a panel of experts at Google HQ in Mountain View, CA
  • Google engineer mentorship at the bootcamp, providing guidance and technical expertise on how to help your plans to bring your app to fruition

Exposure and street cred! Once your idea is ready for prime-time, we’ll help you get users, and celebrate you to the broader Android community, including:

  • A collection on Google Play where we’ll feature your app (apps must be ready for Google Play on May 1, and must meet Google’s minimum quality requirements)
  • Tickets to Google I/O 2020
  • And we’ll celebrate these experiences to the broader Android developer community on developers.android.com. We might even showcase you at Google I/O, in places like the sandbox, sessions, perhaps even a keynote!

Helpful innovation is an important investment area for us on the Android team, and On-Device Machine Learning has played a critical role in powering new features in the last several releases of Android. We’re just beginning to scratch the surface, and we can’t wait to see what you come up with!

Coral moves out of beta

Posted by Vikram Tank (Product Manager), Coral Team

microchips on coral colored background

Last March, we launched Coral beta from Google Research. Coral helps engineers and researchers bring new models out of the data center and onto devices, running TensorFlow models efficiently at the edge. Coral is also at the core of new applications of local AI in industries ranging from agriculture to healthcare to manufacturing. We've received a lot of feedback over the past six months and used it to improve our platform. Today we’re thrilled to graduate Coral out of beta, into a wider, global release.

Coral is already delivering impact across industries, and several of our partners are including Coral in products that require fast ML inferencing at the edge.

In healthcare, Care.ai is using Coral to build a device that enables hospitals and care centers to respond quickly to falls, prevent bed sores, improve patient care, and reduce costs. Virgo SVS is also using Coral as the basis of a polyp detection system that helps doctors improve the accuracy of endoscopies.

In a very different use case, Olea Edge employs Coral to help municipal water utilities accurately measure the amount of water used by their commercial customers. Their Meter Health Analytics solution uses local AI to reduce waste and predict equipment failure in industrial water meters.

Nexcom is using Coral to build gateways with local AI and provide a platform for next-gen, AI-enabled IoT applications. By moving AI processing to the gateway, existing sensor networks can stay in service without the need to add AI processing to each node.

From prototype to production

Microchips on white background

Coral’s Dev Board is designed as an integrated prototyping solution for new product development. Under the heatsink is the detachable Coral SoM, which combines Google’s Edge TPU with the NXP IMX8M SoC, Wi-Fi and Bluetooth connectivity, memory, and storage. We’re happy to announce that you can now purchase the Coral SoM standalone. We’ve also created a baseboard developer guide to help integrate it into your own production design.

Our Coral USB Accelerator allows users with existing system designs to add local AI inferencing via USB 2/3. For production workloads, we now offer three new Accelerators that feature the Edge TPU and connect via PCIe interfaces: Mini PCIe, M.2 A+E key, and M.2 B+M key. You can easily integrate these Accelerators into new products or upgrade existing devices that have an available PCIe slot.

The new Coral products are available globally and for sale at Mouser; for large volume sales, contact our sales team. By the end of 2019, we'll continue to expand our distribution of the Coral Dev Board and SoM into new markets including: Taiwan, Australia, New Zealand, India, Thailand, Singapore, Oman, Ghana and the Philippines.

Better resources

We’ve also revamped the Coral site with better organization for our docs and tools, a set of success stories, and industry focused pages. All of it can be found at a new, easier to remember URL Coral.ai.

To help you get the most out of the hardware, we’re also publishing a new set of examples. The included models and code can provide solutions to the most common on-device ML problems, such as image classification, object detection, pose estimation, and keyword spotting.

For those looking for a more in-depth application—and a way to solve the eternal problem of squirrels plundering your bird feeder—the Smart Bird Feeder project shows you how to perform classification with a custom dataset on the Coral Dev board.

Finally, we’ll soon release a new version of the Mendel OS that updates the system to Debian Buster, and we're hard at work on more improvements to the Edge TPU compiler and runtime that will improve the model development workflow.

The official launch of Coral is, of course, just the beginning, and we’ll continue to evolve the platform. Please keep sending us feedback at [email protected].

Exploring Massively Multilingual, Massive Neural Machine Translation



“... perhaps the way [of translation] is to descend, from each language, down to the common base of human communication — the real but as yet undiscovered universal language — and then re-emerge by whatever particular route is convenient.”Warren Weaver, 1949

Over the last few years there has been enormous progress in the quality of machine translation (MT) systems, breaking language barriers around the world thanks to the developments in neural machine translation (NMT). The success of NMT however, owes largely to the great amounts of supervised training data. But what about languages where data is scarce, or even absent? Multilingual NMT, with the inductive bias that “the learning signal from one language should benefit the quality of translation to other languages”, is a potential remedy.

Multilingual machine translation processes multiple languages using a single translation model. The success of multilingual training for data-scarce languages has been demonstrated for automatic speech recognition and text-to-speech systems, and by prior research on multilingual translation [1,2,3]. We previously studied the effect of scaling up the number of languages that can be learned in a single neural network, while controlling the amount of training data per language. But what happens once all constraints are removed? Can we train a single model using all of the available data, despite the huge differences across languages in data size, scripts, complexity and domains?

In “Massively Multilingual Neural Machine Translation in the Wild: Findings and Challenges” and follow-up papers [4,5,6,7], we push the limits of research on multilingual NMT by training a single NMT model on 25+ billion sentence pairs, from 100+ languages to and from English, with 50+ billion parameters. The result is an approach for massively multilingual, massive neural machine translation (M4) that demonstrates large quality improvements on both low- and high-resource languages and can be easily adapted to individual domains/languages, while showing great efficacy on cross-lingual downstream transfer tasks.

Massively Multilingual Machine Translation
Though data skew across language-pairs is a great challenge in NMT, it also creates an ideal scenario in which to study transfer, where insights gained through training on one language can be applied to the translation of other languages. On one end of the distribution, there are high-resource languages like French, German and Spanish where there are billions of parallel examples, while on the other end, supervised data for low-resource languages such as Yoruba, Sindhi and Hawaiian, is limited to a few tens of thousands.
The data distribution over all language pairs (in log scale) and the relative translation quality (BLEU score) of the bilingual baselines trained on each one of these specific language pairs.
Once trained using all of the available data (25+ billion examples from 103 languages), we observe strong positive transfer towards low-resource languages, dramatically improving the translation quality of 30+ languages at the tail of the distribution by an average of 5 BLEU points. This effect is already known, but surprisingly encouraging, considering the comparison is between bilingual baselines (i.e., models trained only on specific language pairs) and a single multilingual model with representational capacity similar to a single bilingual model. This finding hints that massively multilingual models are effective at generalization, and capable of capturing the representational similarity across a large body of languages.
Translation quality comparison of a single massively multilingual model against bilingual baselines that are trained for each one of the 103 language pairs.
In our EMNLP’19 paper [5], we compare the representations of multilingual models across different languages. We find that multilingual models learn shared representations for linguistically similar languages without the need for external constraints, validating long-standing intuitions and empirical results that exploit these similarities. In [6], we further demonstrate the effectiveness of these learned representations on cross-lingual transfer on downstream tasks.
Visualization of the clustering of the encoded representations of all 103 languages, based on representational similarity. Languages are color-coded by their linguistic family.
Building Massive Neural Networks
As we increase the number of low-resource languages in the model, the quality of high-resource language translations starts to decline. This regression is recognized in multi-task setups, arising from inter-task competition and the unidirectional nature of transfer (i.e., from high- to low-resource). While working on better learning and capacity control algorithms to mitigate this negative transfer, we also extend the representational capacity of our neural networks by making them bigger by increasing the number of model parameters to improve the quality of translation for high-resource languages.

Numerous design choices can be made to scale neural network capacity, including adding more layers or making the hidden representations wider. Continuing our study on training deeper networks for translation, we utilized GPipe [4] to train 128-layer Transformers with over 6 billion parameters. Increasing the model capacity resulted in significantly improved performance across all languages by an average of 5 BLEU points. We also studied other properties of very deep networks, including the depth-width trade-off, trainability challenges and design choices for scaling Transformers to over 1500 layers with 84 billion parameters.

While scaling depth is one approach to increasing model capacity, exploring architectures that can exploit the multi-task nature of the problem is a very plausible complementary way forward. By modifying the Transformer architecture through the substitution of the vanilla feed-forward layers with sparsely-gated mixture of experts, we drastically scale up the model capacity, allowing us to successfully train and pass 50 billion parameters, which further improved translation quality across the board.
Translation quality improvement of a single massively multilingual model as we increase the capacity (number of parameters) compared to 103 individual bilingual baselines.
Making M4 Practical
It is inefficient to train large models with extremely high computational costs for every individual language, domain or transfer task. Instead, we present methods [7] to make these models more practical by using capacity tunable layers to adapt a new model to specific languages or domains, without altering the original.

Next Steps
At least half of the 7,000 languages currently spoken will no longer exist by the end of this century*. Can multilingual machine translation come to the rescue? We see the M4 approach as a stepping stone towards serving the next 1,000 languages; starting from such multilingual models will allow us to easily extend to new languages, domains and down-stream tasks, even when parallel data is unavailable. Indeed the path is rocky, and on the road to universal MT many promising solutions appear to be interdisciplinary. This makes multilingual NMT a plausible test bed for machine learning practitioners and theoreticians interested in exploring the annals of multi-task learning, meta-learning, training dynamics of deep nets and much more. We still have a long way to go.

Acknowledgements
This effort is built on contributions from Naveen Arivazhagan, Dmitry Lepikhin, Melvin Johnson, Maxim Krikun, Mia Chen, Yuan Cao, Yanping Huang, Sneha Kudugunta, Isaac Caswell, Aditya Siddhant, Wei Wang, Roee Aharoni, Sébastien Jean, George Foster, Colin Cherry, Wolfgang Macherey, Zhifeng Chen and Yonghui Wu. We would also like to acknowledge support from the Google Translate, Brain, and Lingvo development teams, Jakob Uszkoreit, Noam Shazeer, Hyouk Joong Lee, Dehao Chen, Youlong Cheng, David Grangier, Colin Raffel, Katherine Lee, Thang Luong, Geoffrey Hinton, Manisha Jain, Pendar Yousefi and Macduff Hughes.


* The Cambridge Handbook of Endangered Languages (Austin and Sallabank, 2011).

Source: Google AI Blog


ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots



Learning-based methods for solving robotic control problems have recently seen significant momentum, driven by the widening availability of simulated benchmarks (like dm_control or OpenAI-Gym) and advancements in flexible and scalable reinforcement learning techniques (DDPG, QT-Opt, or Soft Actor-Critic). While learning through simulation is effective, these simulated environments often encounter difficulty in deploying to real-world robots due to factors such as inaccurate modeling of physical phenomena and system delays. This motivates the need to develop robotic control solutions directly in the real world, on real physical hardware.

The majority of current robotics research on physical hardware is conducted on high-cost, industrial-quality robots (PR2, Kuka-arms, ShadowHand, Baxter, etc.) intended for precise, monitored operation in controlled environments. Furthermore, these robots are designed around traditional control methods that focus on precision, repeatability, and ease of characterization. This stands in sharp contrast with the learning-based methods that are robust to imperfect sensing and actuation, and demand (a) a high degree of resilience to allow real-world trial-and-error learning, (b) low cost and ease of maintenance to enable scalability through replication and (c) a reliable reset mechanism to alleviate strict human monitoring requirements.

In “ROBEL: Robotics Benchmarks for Learning with Low-Cost Robots”, to be presented at CoRL 2019, we introduce an open-source platform of cost-effective robots and curated benchmarks designed primarily to facilitate research and development on physical hardware in the real world. Analogous to an optical table in the field of optics, ROBEL serves as a rapid experimentation platform, supporting a wide range of experimental needs and the development of new reinforcement learning and control methods. ROBEL consists of D'Claw, a three-fingered hand robot that facilitates learning of dexterous manipulation tasks and D'Kitty, a four-legged robot that enables the learning of agile legged locomotion tasks. The robotic platforms are low-cost, modular, easy to maintain, and are robust enough to sustain on-hardware reinforcement learning from scratch.
Left: The 12 DoF D’Kitty; Middle: The 9 DoF D’Claw; Right: A functional D’Claw setup D’Lantern.
In order to make the robots relatively inexpensive and easy to build, we based ROBEL’s designs on off-the-shelf components and commonly-available prototyping tools (3D-printed or laser cut). Designs are easy to assemble and require only a few hours to build. Detailed part lists (with CAD details), assembly instructions, and software instructions for getting started are available here.

ROBEL Benchmarks
We devised a set of tasks suitable for each platform, D’Claw and D’Kitty, which can be used for benchmarking real-world robotic learning. ROBEL’s task definitions include both dense and sparse task objectives, and introduce metrics for hardware-safety in the task definition, which for example, indicate if joints are exceeding “safe” operating bounds or force thresholds. ROBEL also supports a simulator for all tasks to facilitate algorithmic development and rapid prototyping. D’Claw tasks are centered around three commonly observed manipulation behaviors — Pose, Turn, and Screw.
Left: Pose — Conform to the shape of the environment. Center: Turn — Turn the object to a specified angle. Right: Screw — Continuously rotate the object. (Click images for video.)
D’Kitty tasks are centered around three commonly observed locomotion behaviors — Stand, Orient, and Walk.
Left: Stand — Stand upright. Center: Orient — Align heading with the target. Right: Walk — Move to the target. (Click images for video.)
We evaluated several classes (on-policy, off policy, demo-accelerated, supervised) of deep reinforcement learning methods on each of these benchmark tasks. The evaluation results and the final policies are included as baselines in the software package for comparison. Full task details and baseline performances are available in the technical report.

Reproducibility & Robustness
ROBEL platforms are robust to sustain direct hardware training, and have clocked over 14,000 hours of real-world experience to-date. The platforms have significantly matured over the year. Owing to the modularity of the design, repairs are trivial and require minimal to no domain expertise, making the overall system easy to maintain.

To establish the replicability of the platforms and reproducibility of the benchmarks, ROBEL was studied in isolation by two different research labs. Only software distribution and documentation was used in this study. No in-person visits were allowed. Using ROBEL’s design files and assembly instructions both sites were able to replicate both hardware platforms. Benchmark tasks were trained on robots built at both sites. In the figure below we see that two D’Claw robots built at two different sites not only exhibit similar training progress but also converge to the same final performance, establishing reproducibility of the ROBEL benchmarks.
SAC training performance of a task on two real D’Claw robots developed at different laboratory locations.
Results Gallery
ROBEL has been useful in a variety of reinforcement learning studies so far. Below we highlight a few of the key results, and you can find all our results in this comprehensive gallery. D’Claw platforms are completely autonomous and can sustain reliable experimentation for an extended period of time, and has facilitated experimentation with a wide variety of reinforcement learning paradigms and tasks using both rigid and flexible objects.
Left: Flexible Objects — On-hardware training with DAPG effectively learns to turn flexible objects. We observe manipulation targeting the center of the valve where there is more rigidity. D'Claw is robust to on-hardware training, facilitating successful outcomes on hard to simulate tasks. Center: Disturbance Rejection — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with object perturbations (amongst others) being tested on hardware. We observe fingers working together to resist external disturbances. Right: Obstructed Finger — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with external perturbations (amongst others) being tested on hardware. We observe that free fingers fill in for the missing finger.
Importantly, D’Claw platforms are modular and easy to replicate, which facilitates scalable experimentation. With our scaled setup, we find that multiple D’Claws can collectively learn tasks faster by sharing experience.
On-hardware training with distributed version of SAC leaning to turn multiple objects to arbitrary angles in conjunction by sharing experience. Five tasks only need twice the amount of experience of single tasks, thanks to the multi-task formulation. In the video we observe five D'Claws turning different objects to 180 degrees (picked for visual effectiveness, actual policy can turn to any angle).
We have also been successful in deploying robust locomotion policies on the D’Kitty platform. Below we show a blind D’Kitty walking over indoor and outdoor terrains exhibiting the robustness of its gait in presence of unseen disturbances.
Left: Indoor – Walking in Clutter — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized perturbations learns to walk in clutter and step over objects. Center: Outdoor – Gravel and Branches — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized height field learns to walk outdoors over gravel and branches. Right: Outdoor – Slope and Grass — A Sim2Real policy trained via Natural Policy Gradient on MuJoCo simulation with randomized height field learns to handle moderate slopes.
When presented with information about its torso and objects present in the scene, D’Kitty can learn to interact with these objects exhibiting complex behaviors.
Left: Avoid Moving Obstacles — Policy trained via Hierarchical Sim2Real learns to avoid a moving block and reach the target (marked by the controller on the floor). Center: Push to Moving Goal — Policy trained via Hierarchical Sim2Real learns to push block towards a moving target (marked by the controller in the hand). Right: Co-ordinate — Policy trained via Hierarchical Sim2Real learns to coordinate two D'Kitties to push a heavy block towards a target (marked by two + signs on the floor).
In conclusion, ROBEL platforms are low cost, robust, reliable and are designed to accommodate the needs of the emerging learning-based paradigms that need scalability and resilience. We are proud to announce the release of ROBEL to the open source community and are excited to learn about the diversity of research and experimentation they will enable. For getting started on ROBEL platforms and ROBEL benchmarks refer to roboticsbenchmarks.org.

Acknowledgments
Google's ROBEL D'Claw evolved from earlier designs Vikash Kumar developed at the Universities of Washington and Berkeley. Multiple people across organizations have contributed towards the ROBEL projects. We thank our co-authors Henry Zhu (UC Berkeley), Kristian Hartikainen (UC Berkeley), Abhishek Gupta (UC Berkeley) and Sergey Levine (Google and UC Berkeley) for their contributions and extensive feedback throughout the project. We would like to acknowledge Matt Neiss (Google) and Chad Richards (Google) for their significant contribution to the platform designs. We would also like to thank Aravind Rajeshwaran (U-Washington), Emo Todorov (U-Washington), and Vincent Vanhoucke (Google) for their helpful discussions and comments throughout the project.

Source: Google AI Blog


Improving Quantum Computation with Classical Machine Learning



One of the primary challenges for the realization of near-term quantum computers has to do with their most basic constituent: the qubit. Qubits can interact with anything in close proximity that carries energy close to their own—stray photons (i.e., unwanted electromagnetic fields), phonons (mechanical oscillations of the quantum device), or quantum defects (irregularities in the substrate of the chip formed during manufacturing)—which can unpredictably change the state of the qubits themselves.

Further complicating matters, there are numerous challenges posed by the tools used to control qubits. Manipulating and reading out qubits is performed via classical controls: analog signals in the form of electromagnetic fields coupled to a physical substrate in which the qubit is embedded, e.g., superconducting circuits. Imperfections in these control electronics (giving rise to white noise), interference from external sources of radiation, and fluctuations in digital-to-analog converters, introduce even more stochastic errors that degrade the performance of quantum circuits. These practical issues impact the fidelity of the computation and thus limit the applications of near-term quantum devices.

To improve the computational capacity of quantum computers, and to pave the road towards large-scale quantum computation, it is necessary to first build physical models that accurately describe these experimental problems.

In “Universal Quantum Control through Deep Reinforcement Learning”, published in Nature Partner Journal (npj) Quantum Information, we present a new quantum control framework generated using deep reinforcement learning, where various practical concerns in quantum control optimization can be encapsulated by a single control cost function. Our framework provides a reduction in the average quantum logic gate error of up to two orders-of-magnitude over standard stochastic gradient descent solutions and a significant decrease in gate time from optimal gate synthesis counterparts. Our results open a venue for wider applications in quantum simulation, quantum chemistry and quantum supremacy tests using near-term quantum devices.

The novelty of this new quantum control paradigm hinges upon the development of a quantum control function and an efficient optimization method based on deep reinforcement learning. To develop a comprehensive cost function, we first need to develop a physical model for the realistic quantum control process, one where we are able to reliably predict the amount of error. One of the most detrimental errors to the accuracy of quantum computation is leakage: the amount of quantum information lost during the computation. Such information leakage usually occurs when the quantum state of a qubit gets excited to a higher energy state, or decays to a lower energy state through spontaneous emission. Leakage errors not only lose useful quantum information, they also degrade the “quantumness” and eventually reduce the performance of a quantum computer to that of a classical one.

A common practice to accurately evaluate the leaked information during the quantum computation is to simulate the whole computation first. However, this defeats the purpose of building large-scale quantum computers, since their advantage is that they are able to perform calculations infeasible for classical systems. With improved physical modeling, our generic cost function enables a joint optimization over the accumulated leakage errors, violations of control boundary conditions, total gate time, and gate fidelity.

With the new quantum control cost function in hand, the next step is to apply an efficient optimization tool to minimize it. Existing optimization methods turn out to be unsatisfactory in finding high fidelity solutions that are also robust to control fluctuations. Instead, we apply an on-policy deep reinforcement learning (RL) method, trusted-region RL, since this method exhibits good performance in all benchmark problems, is inherently robust to sample noise, and has the capability to optimize hard control problems with hundreds of millions of control parameters. The salient difference between this on-policy RL from previously studied off-policy RL methods is that the control policy is represented independently from the control cost. Off-policy RL, such as Q-learning, on the other hand, uses a single neural network (NN) to represent both the control trajectory, and the associated reward, where the control trajectory specifies the control signals to be coupled to qubits at different time steps, and the associated award evaluates how good the current step of the quantum control is.

On-policy RL is well known for its ability to leverage non-local features in control trajectories, which becomes crucial when the control landscape is high-dimensional and packed with a combinatorially large number of non-global solutions, as is often the case for quantum systems.

We encode the control trajectory into a three-layer, fully connected NN—the policy NN—and the control cost function into a second NN—the value NN—which encodes the discounted future reward. Robust control solutions were obtained by reinforcement learning agents, which trains both NNs under a stochastic environment that mimics a realistic noisy control actuation. We provide control solutions to a set of continuously parameterized two-qubit quantum gates that are important for quantum chemistry applications but are costly to implement using the conventional universal gate set.
Under this new framework, our numerical simulations show a 100x reduction in quantum gate errors and reduced gate times for a family of continuously parameterized simulation gates by an average of one order-of-magnitude over traditional approaches using a universal gate set.

This work highlights the importance of using novel machine learning techniques and near-term quantum algorithms that leverage the flexibility and additional computational capacity of a universal quantum control scheme. More experiments are needed to integrate machine learning techniques, such as the one developed in this work, into practical quantum computation procedures to fully improve its computational capacity through machine learning.

Source: Google AI Blog


An Inside Look at Flood Forecasting



Several years ago, we identified flood forecasts as a unique opportunity to improve people’s lives, and began looking into how Google’s infrastructure and machine learning expertise can help in this field. Last year, we started our flood forecasting pilot in the Patna region, and since then we have expanded our flood forecasting coverage, as part of our larger AI for Social Good efforts. In this post, we discuss some of the technology and methodology behind this effort.

The Inundation Model
A critical step in developing an accurate flood forecasting system is to develop inundation models, which use either a measurement or a forecast of the water level in a river as an input, and simulate the water behavior across the floodplain.
A 3D visualization of a hydraulic model simulating various river conditions.
This allows us to translate current or future river conditions, to highly spatially accurate risk maps - which tell us what areas will be flooded and what areas will be safe. Inundation models depend on four major components, each with its own challenges and innovations:

Real-time Water Level Measurements
To run these models operationally, we need to know what is happening on the ground in real-time, and thus we rely on partnerships with the relevant government agencies to receive timely and accurate information. Our first governmental partner is the Indian Central Water Commission (CWC), which measures water levels hourly in over a thousand stream gauges across all of India, aggregates this data, and produces forecasts based on upstream measurements. The CWC provides these real-time river measurements and forecasts, which are then used as inputs for our models.
CWC employees measuring water level and discharge near Lucknow, India.
Elevation Map Creation
Once we know how much water is in a river, it is critical that the models have a good map of the terrain. High-resolution digital elevation models (DEMs) are incredibly useful for a wide range of applications in the earth sciences, but are still difficult to acquire in most of the world, especially for flood forecasting. This is because meter-wide features of the ground conditions can create a critical difference in the resulting flooding (embankments are one exceptionally important example), but publicly accessible global DEMs have resolutions of tens of meters. To help address this challenge, we’ve developed a novel methodology to produce high resolution DEMs based on completely standard optical imagery.

We start with the large and varied collection of satellite images used in Google Maps. Correlating and aligning the images in large batches, we simultaneously optimize for satellite camera model corrections (for orientation errors, etc.) and for coarse terrain elevation. We then use the corrected camera models to create a depth map for each image. To make the elevation map, we optimally fuse the depth maps together at each location. Finally, we remove objects such as trees and bridges so that they don’t block water flow in our simulations. This can be done manually or by training convolutional neural networks that can identify where the terrain elevations need to be interpolated. The result is a roughly 1 meter DEM, which can be used to run hydraulic models.

Hydraulic Modeling
Once we have both these inputs - the riverine measurements and forecasts, and the elevation map - we can begin the modeling itself, which can be divided into two main components. The first and most substantial component is the physics-based hydraulic model, which updates the location and velocity of the water through time based on (an approximated) computation of the laws of physics. Specifically, we’ve implemented a solver for the 2D form of the shallow-water Saint-Venant equations. These models are suitably accurate when given accurate inputs and run at high resolutions, but their computational complexity creates challenges - it is proportional to the cube of the resolution desired. That is, if you double the resolution, you’ll need roughly 8 times as much processing time. Since we’re committed to the high-resolution required for highly accurate forecasts, this can lead to unscalable computational costs, even for Google!

To help address this problem, we’ve created a unique implementation of our hydraulic model, optimized for Tensor Processing Units (TPUs). While TPUs were optimized for neural networks (rather than differential equation solvers like our hydraulic model), their highly parallelized nature leads to the performance per TPU core being 85x times faster than the performance per CPU core. For additional efficiency improvements, we’re also looking at using machine learning to replace some of the physics-based algorithmics, extending data-driven discretization to two-dimensional hydraulic models, so we can support even larger grids and cover even more people.
A snapshot of a TPU-based simulation of flooding in Goalpara, mid-event.
As mentioned earlier, the hydraulic model is only one component of our inundation forecasts. We’ve repeatedly found locations where our hydraulic models are not sufficiently accurate - whether that’s due to inaccuracies in the DEM, breaches in embankments, or unexpected water sources. Our goal is to find effective ways to reduce these errors. For this purpose, we added a predictive inundation model, based on historical measurements. Since 2014, the European Space Agency has been operating a satellite constellation named Sentinel-1 with C-band Synthetic-Aperture Radar (SAR) instruments. SAR imagery is great at identifying inundation, and can do so regardless of weather conditions and clouds. Based on this valuable data set, we correlate historical water level measurements with historical inundations, allowing us to identify consistent corrections to our hydraulic model. Based on the outputs of both components, we can estimate which disagreements are due to genuine ground condition changes, and which are due to modeling inaccuracies.
Flood warnings across Google’s interfaces.
Looking Forward
We still have a lot to do to fully realize the benefits of our inundation models. First and foremost, we’re working hard to expand the coverage of our operational systems, both within India and to new countries. There’s also a lot more information we want to be able to provide in real time, including forecasted flood depth, temporal information and more. Additionally, we’re researching how to best convey this information to individuals to maximize clarity and encourage them to take the necessary protective actions.

Computationally, while the inundation model is a good tool for improving the spatial resolution (and therefore the accuracy and reliability) of existing flood forecasts, multiple governmental agencies and international organizations we’ve spoken to are concerned about areas that do not have access to effective flood forecasts at all, or whose forecasts don’t provide enough lead time for effective response. In parallel to our work on the inundation model, we’re working on some basic research into improved hydrologic models, which we hope will allow governments not only to produce more spatially accurate forecasts, but also achieve longer preparation time.

Hydrologic models accept as inputs things like precipitation, solar radiation, soil moisture and the like, and produce a forecast for the river discharge (among other things), days into the future. These models are traditionally implemented using a combination of conceptual models approximating different core processes such as snowmelt, surface runoff, evapotranspiration and more.
The core processes of a hydrologic model. Designed by Daniel Klotz, JKU Institute for Machine Learning.
These models also traditionally require a large amount of manual calibration, and tend to underperform in data scarce regions. We are exploring how multi-task learning can be used to address both of these problems — making hydrologic models both more scalable, and more accurate. In research collaboration with JKU Institute For Machine Learning group under Sepp Hochreiter on developing ML-based hydrologic models, Kratzert et al. show how LSTMs perform better than all benchmarked classic hydrologic models.
The distribution of NSE scores on basins across the United States for various models, showing the proposed EA-LSTM consistently outperforming a wide range of commonly used models.
Though this work is still in the basic research stage and not yet operational, we think it is an important first step, and hope it can already be useful for other researchers and hydrologists. It’s an incredible privilege to take part in the large eco-system of researchers, governments, and NGOs working to reduce the harms of flooding. We’re excited about the potential impact this type of research can provide, and look forward to where research in this field will go.

Acknowledgements
There are many people who contributed to this large effort, and we’d like to highlight some of the key contributors: Aaron Yonas, Adi Mano, Ajai Tirumali, Avinatan Hassidim, Carla Bromberg, Damien Pierce, Gal Elidan, Guy Shalev, John Anderson, Karan Agarwal, Kartik Murthy, Manan Singhi, Mor Schlesinger, Ofir Reich, Oleg Zlydenko, Pete Giencke, Piyush Poddar, Ruha Devanesan, Slava Salasin, Varun Gulshan, Vova Anisimov, Yossi Matias, Yi-fan Chen, Yotam Gigi, Yusef Shafi, Zach Moshe and Zvika Ben-Haim.


Source: Google AI Blog