Tag Archives: AI

Safety-first AI for autonomous data center cooling and industrial control

Many of society’s most pressing problems have grown increasingly complex, so the search for solutions can feel overwhelming. At DeepMind and Google, we believe that if we can use AI as a tool to discover new knowledge, solutions will be easier to reach.

In 2016, we jointly developed an AI-powered recommendation system to improve the energy efficiency of Google’s already highly optimized data centers. Our thinking was simple: Even minor improvements would provide significant energy savings and reduce CO2 emissions to help combat climate change.

Now we’re taking this system to the next level: instead of human-implemented recommendations, our AI system is directly controlling data center cooling, while remaining under the expert supervision of our data center operators. This first-of-its-kind cloud-based control system is now safely delivering energy savings in multiple Google data centers.

How it works

Every five minutes, our cloud-based AI pulls a snapshot of the data center cooling system from thousands of sensors and feeds it into our deep neural networks, which predict how different combinations of potential actions will affect future energy consumption. The AI system then identifies which actions will minimize energy consumption while satisfying a robust set of safety constraints. Those actions are sent back to the data center, where they are verified by the local control system and then implemented.
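
To make that loop concrete, here is a minimal sketch of one control cycle in Python. Everything in it is a hypothetical stand-in (the sensor, model, and controller interfaces, and the way candidate actions are enumerated); the production system’s models and checks are far more involved.

```python
def control_cycle(sensors, energy_model, safety_constraints, local_controller,
                  candidate_actions):
    """One simplified five-minute control cycle (illustrative only)."""
    # 1. Pull a snapshot of the cooling system from thousands of sensors.
    snapshot = sensors.get_snapshot()

    # 2. Predict future energy use for each candidate action and keep the
    #    best action that satisfies every operator-defined safety constraint.
    best_action, best_energy = None, float("inf")
    for action in candidate_actions:
        if not all(ok(snapshot, action) for ok in safety_constraints):
            continue
        predicted = energy_model.predict(snapshot, action)
        if predicted < best_energy:
            best_action, best_energy = action, predicted

    # 3. Send the chosen action back to the data center, where the local
    #    control system re-verifies it before implementation.
    if best_action is not None:
        local_controller.verify_and_apply(best_action)
```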

The idea evolved out of feedback from our data center operators who had been using our AI recommendation system. They told us that although the system had taught them some new best practices—such as spreading the cooling load across more, rather than fewer, pieces of equipment—implementing the recommendations required too much operator effort and supervision. Naturally, they wanted to know whether we could achieve similar energy savings without manual implementation.


We’re pleased to say the answer was yes!

We wanted to achieve energy savings with less operator overhead. Automating the system enabled us to implement more granular actions at greater frequency, while making fewer mistakes.
Dan Fuenffinger
Data Center Operator, Google

Designed for safety and reliability

Google's data centers contain thousands of servers that power popular services including Google Search, Gmail and YouTube. Ensuring that they run reliably and efficiently is mission-critical. We've designed our AI agents and the underlying control infrastructure from the ground up with safety and reliability in mind, and use eight different mechanisms to ensure the system will behave as intended at all times.

One simple method we’ve implemented is to estimate uncertainty. For every potential action—and there are billions—our AI agent calculates its confidence that this is a good action. Actions with low confidence are eliminated from consideration.

Another method is two-layer verification. Optimal actions computed by the AI are first vetted against an internal list of safety constraints defined by our data center operators. Once the instructions are sent from the cloud to the physical data center, the local control system verifies the instructions against its own set of constraints. This redundant check ensures that the system remains within local constraints and operators retain full control of the operating boundaries.
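
Taken together, the confidence filter and the two verification layers form a simple pipeline. The sketch below only illustrates that flow; the threshold, constraint checks, and object interfaces are invented for the example.

```python
CONFIDENCE_THRESHOLD = 0.95  # hypothetical cut-off for discarding low-confidence actions

def select_action(candidates, agent, cloud_constraints):
    """Keep only confident, constraint-satisfying actions; return the most efficient one."""
    vetted = []
    for action in candidates:
        if agent.confidence(action) < CONFIDENCE_THRESHOLD:
            continue  # uncertainty estimate too low: eliminate from consideration
        if not all(check(action) for check in cloud_constraints):
            continue  # first verification layer, in the cloud
        vetted.append((agent.predicted_energy(action), action))
    return min(vetted, key=lambda pair: pair[0])[1] if vetted else None

def apply_locally(action, local_constraints, controller):
    """Second verification layer, run by the on-site control system."""
    if action is None or not all(check(action) for check in local_constraints):
        controller.fall_back_to_rules()  # operators' rules and heuristics take over
    else:
        controller.apply(action)
```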

Most importantly, our data center operators are always in control and can choose to exit AI control mode at any time. In these scenarios, the control system will transfer seamlessly from AI control to the on-site rules and heuristics that define the automation industry today.

Find out about the other safety mechanisms we’ve developed below:

[Infographic: the safety mechanisms built into our AI control system]

Increasing energy savings over time

Whereas our original recommendation system had operators vetting and implementing actions, our new AI control system directly implements the actions. We’ve purposefully constrained the system’s optimization boundaries to a narrower operating regime to prioritize safety and reliability, meaning there is a risk/reward trade-off in terms of energy reductions.

Despite being in place for only a matter of months, the system is already delivering consistent energy savings of around 30 percent on average, with further expected improvements. That’s because these systems get better over time with more data, as the graph below demonstrates. Our optimization boundaries will also be expanded as the technology matures, for even greater reductions.

[Graph: AI control system performance (kW/ton) relative to the historical baseline over nine months]

This graph plots AI performance over time relative to the historical baseline before AI control. Performance is measured by a common industry metric for cooling energy efficiency, kW/ton (or energy input per ton of cooling achieved). Over nine months, our AI control system performance increases from a 12 percent improvement (the initial launch of autonomous control) to around a 30 percent improvement.
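
As a quick worked example of how those percentages relate to the kW/ton metric (the numbers below are invented for illustration, not measured values):

```python
baseline_kw_per_ton = 0.80  # hypothetical cooling efficiency before AI control
current_kw_per_ton = 0.56   # hypothetical efficiency under AI control

improvement = (baseline_kw_per_ton - current_kw_per_ton) / baseline_kw_per_ton
print(f"Cooling energy improvement vs. baseline: {improvement:.0%}")  # -> 30%
```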

Our direct AI control system is finding yet more novel ways to manage cooling that have surprised even the data center operators. Dan Fuenffinger, one of Google’s data center operators who has worked extensively alongside the system, remarked: "It was amazing to see the AI learn to take advantage of winter conditions and produce colder than normal water, which reduces the energy required for cooling within the data center. Rules don’t get better over time, but AI does."

We’re excited that our direct AI control system is operating safely and dependably, while consistently delivering energy savings. However, data centers are just the beginning. In the long term, we think there's potential to apply this technology in other industrial settings, and help tackle climate change on an even grander scale.

After NEXT 2018: Trends in higher education and research

From classrooms to campus infrastructure, higher education is rapidly adapting to cloud technology. So it’s no surprise that academic faculty and staff were well represented among panelists and attendees at this year’s Google Cloud Next. Several of our more than 500 breakout sessions at Next spoke to the needs of higher education, as did critical announcements like our partnership with the National Institutes of Health to make public biomedical datasets available to researchers. Here are ten major themes that came out of our higher education sessions at Next:

  1. Collaborating across campuses. Learning technologists from St. Norbert College, Lehigh University, University of Notre Dame, and Indiana University explained how G Suite and CourseKit, Google’s new integrated learning management tool, are helping teachers and students exchange ideas.
  2. Navigating change. Academic IT managers told stories of how they’ve overcome the organizational challenges of cloud migration and offered some tips for others: start small, engage key stakeholders, and take advantage of Google’s teams of engineers and representatives, who are enthusiastic and knowledgeable allies. According to Joshua Humphrey, Team Lead, Enterprise Computing, Georgia State University, "We've been using GCP for almost three years now and we've seen an average yearly savings of 44%. Whenever people ask why we moved to the cloud this is what we point to. Usability and savings."
  3. Fostering student creativity. In our higher education booth at Next, students demonstrated projects that extended their learning beyond the classroom. For example, students at California State University at San Bernardino built a mobile rover that checks internet connectivity on campus, and students at High Tech High used G Suite and Chromebooks to help them create their own handmade soap company.
  4. Reproducing scientific research. Science is built on consistent, reliable, repeatable findings. Academic research panelists at the University of Michigan are using Docker on Compute Engine to containerize pipeline tools so any researcher can run the same pipeline without having to worry about affecting the final outcome.
  5. Powering bioinformatics. Today’s biomedical research often requires storing and processing hundreds of terabytes of data. Teams at SUNY Downstate, Northeastern, and the University of South Carolina demonstrated how they used BigQuery and Compute Engine to build complex simulations and manage huge datasets for neuroscience, epidemiology, and environmental research.
  6. Accelerating genomics research. Moving data to the cloud enables faster processing to test more hypotheses and uncover insights. Researchers from Stanford, Duke, and Michigan showed how they streamlined their genomics workloads and cut months off their processing time using GCP.
  7. Democratizing access to deep learning. AutoML Vision, Natural Language, and Translation, all in beta, were announced at Next and can help researchers build custom ML models without specialized knowledge in machine learning or coding. As Google’s Chief Scientist of AI and Machine Learning Fei-Fei Li noted in her blog post, Google’s aim “is to make AI not just more powerful, but more accessible.”
  8. Transforming LMS analytics. Scalable tools can turn the data collected by learning management systems and student information services into insights about student behavior. Google’s strategic partnership with Unizin allows a consortium of universities to integrate data and learning sciences, while Ivy Tech used ML Engine to build a predictive algorithm to improve student performance in courses.
  9. Personalizing machine learning and AI for student services. We’re seeing a growing trend of universities investigating AI to create virtual assistants. Recently Strayer University shared with us how they used Dialogflow to do just that, and at Next, Carnegie Mellon walked us through their process of building SARA, a socially-aware robot assistant.
  10. Strengthening security for academic IT. Natural disasters threaten on-premises data centers, with earthquakes, flooding, and hurricanes demanding robust disaster-recovery planning. Georgia State, the University of Minnesota, and Stanford’s Graduate School of Business shared how they improved the reliability and cost-efficiency of their data backup by migrating to GCP.



To learn more about our solutions for higher education, visit our website, explore our credits programs for teaching and research, or speak with a member of our team.

Android 9 Pie: Powered by AI for a smarter, simpler experience that adapts to you





The latest release of Android is here! And it comes with a heaping helping of artificial intelligence baked in to make your phone smarter, simpler and more tailored to you. Today we’re officially introducing Android 9 Pie.


We’ve built Android 9 to learn from you—and work better for you—the more you use it. From predicting your next task so you can jump right into the action you want to take, to prioritizing battery power for the apps you use most, to helping you disconnect from your phone at the end of the day, Android 9 adapts to your life and the ways you like to use your phone.


Tailored to you


Android 9 aims to make your phone even smarter by learning from you and adapting to your usage patterns. That’s why Android 9 comes with features like Adaptive Battery, which learns the apps you use most and prioritizes battery for them, and Adaptive Brightness, which learns how you like to set the brightness in different settings, and does it for you.
 
Android 9 also helps you get things done faster with App Actions, which predicts what you’ll want to do next based on your context and displays that action right on your phone. Say it’s Tuesday morning and you’re preparing for your commute: Android will suggest actions like navigating to work on Google Maps or resuming an audiobook with Google Play Books. And when you put in headphones after work, you may see options to call your mom or start your favorite Spotify playlist.
Later this fall, we’ll also roll out Slices (pie...slices...get it?!) which shows relevant information from your favorite apps when you need it. If you start typing “Lyft” into Google Search, you’ll see a “slice” of the Lyft app, showing prices for your ride home and the ETA for a driver so you can take action more quickly and easily.




Android 9 — now easy as pie
Making your phone smarter and more adaptive is important, but we also want Android to be easier to use and more approachable. In Android 9, we’ve introduced a new system navigation featuring a single home button.


This is especially helpful as phones grow taller and it’s more difficult to get things done on your phone with one hand. With a single, clean home button, you can swipe up to see a newly designed Overview, the spot where at a glance you have full-screen previews of your recently used apps.


You can swipe up from anywhere to see full-screen previews of recently used apps and simply tap to jump back into one of them. If you find yourself constantly switching between apps on your Pixel, we’ve got good news for you: Smart Text Selection (which recognizes the meaning of the text you’re selecting and suggests relevant actions) now works on the Overview of your recent apps, making it easier to perform the action you want.

Changing how you navigate your phone is a big deal, but small changes can make a big difference too. Android 9 also brings a redesigned Quick Settings, a better way to take and edit screenshots (say goodbye to the Vulcan grip that was required before), simplified volume controls, an easier way to manage notifications and more. You’ll notice small changes like these across the platform, to help make the things you do all the time easier than ever.


Find the balance that’s right for your life
While much of the time we spend on our phones is useful, many of us wish we could disconnect more easily and free up time for other things. In fact, over 70 percent of people we talked to in our research said they want more help with this. So we’ve been working to add key capabilities right into Android to help people achieve the balance with technology they’re looking for.  


At Google I/O in May, we previewed some of these digital wellbeing features for Android, including a new Dashboard that helps you understand how you’re spending time on your device; an App Timer that lets you set time limits on apps and grays out the icon on your home screen when the time is up; the new Do Not Disturb, which silences all the visual interruptions that pop up on your screen; and Wind Down, which switches on Night Light and Do Not Disturb and fades the screen to grayscale before bedtime.

Digital Wellbeing will officially launch on Pixel phones this fall, with Android One and other devices coming later this year. But these features are available in beta now for Pixel phones running Android 9. To try them out:

  1. Make sure you’re running Android 9 Pie on your device. (Learn how to check which version of Android you have.)
  2. Sign up for the beta with the email address you use with Google Play.
  3. Accept your invitation to become a beta tester by clicking the link in your welcome email.

Once you’ve accepted your invitation, Digital Wellbeing will appear in your phone’s Settings app. It may take up to 24 hours for Digital Wellbeing to appear on your device.


Security and privacy baked in


Improving security is always important in each of our platform releases. In addition to continuously hardening the platform and improving the security model for biometrics, Android 9 enables industry-leading hardware security capabilities that allow sensitive data like credit card information to be protected by a secure, dedicated chip. Android 9 also brings important privacy improvements, such as TLS by default and DNS over TLS, to help protect all web communications and keep them private.


Coming to a device near you
Starting today, an over-the-air update to Android 9 will begin rolling out to Pixel phones. And devices that participated in the Beta program from Sony Mobile, Xiaomi, HMD Global, Oppo, Vivo, OnePlus, and Essential, as well as all qualifying Android One devices, will receive this update by the end of this fall! We are also working with a number of other partners to launch or upgrade devices to Android 9 this year.


Learn more about Android 9 Pie at android.com/9.

Posted by: Sameer Samat, VP of Product Management, Android & Google Play

New AIY Edge TPU Boards

Posted by Billy Rutledge, Director of AIY Projects

Over the past year and a half, we've seen more than 200K people build, modify, and create with our Voice Kit and Vision Kit products. Today at Cloud Next we announced two new devices to help professional engineers build new products with on-device machine learning (ML) at their core: the AIY Edge TPU Dev Board and the AIY Edge TPU Accelerator. Both are powered by Google's Edge TPU and represent our first steps towards expanding AIY into a platform for experimentation with on-device ML.

The Edge TPU is Google's purpose-built ASIC chip designed to run TensorFlow Lite ML models on your device. We've learned that performance-per-watt and performance-per-dollar are critical benchmarks when processing neural networks within a small footprint. The Edge TPU delivers both in a package that's smaller than the head of a penny. It can accelerate ML inferencing on device, or can pair with Google Cloud to create a full cloud-to-edge ML stack. In either configuration, by processing data directly on-device, a local ML accelerator increases privacy, removes the need for persistent connections, reduces latency, and allows for high performance using less power.

The AIY Edge TPU Dev Board is an all-in-one development board that allows you to prototype embedded systems that demand fast ML inferencing. The baseboard provides all the peripheral connections you need to effectively prototype your device — including a 40-pin GPIO header to integrate with various electrical components. The board also features a removable System-on-Module (SOM) daughter board that can be directly integrated into your own hardware once you're ready to scale.

The AIY Edge TPU Accelerator is a neural network coprocessor for your existing system. This small USB-C stick can connect to any Linux-based system to perform accelerated ML inferencing. The casing includes mounting holes for attachment to host boards such as a Raspberry Pi Zero or your custom device.
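
To give a sense of what accelerated on-device inferencing looks like in code, here is a minimal TensorFlow Lite sketch that attaches the Edge TPU delegate. The model file name and delegate library path are placeholders, and the AIY products ship with their own documentation and APIs, so treat this as a generic illustration rather than the official workflow.

```python
import numpy as np
from tflite_runtime.interpreter import Interpreter, load_delegate

# Load a TensorFlow Lite model compiled for the Edge TPU and attach the
# Edge TPU delegate so inference runs on the accelerator instead of the CPU.
interpreter = Interpreter(
    model_path="model_edgetpu.tflite",  # placeholder file name
    experimental_delegates=[load_delegate("libedgetpu.so.1")])
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run inference on a dummy input with the model's expected shape and dtype.
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()

scores = interpreter.get_tensor(output_details[0]["index"])
print("Top class index:", int(np.argmax(scores)))
```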

On-device ML is still in its early days, and we're excited to see how these two products can be applied to solve real world problems — such as increasing manufacturing equipment reliability, detecting quality control issues in products, tracking retail foot-traffic, building adaptive automotive sensing systems, and more applications that haven't been imagined yet.

Both devices will be available online this fall in the US with other countries to follow shortly.

For more product information visit g.co/aiy and sign up to be notified as products become available.

Announcing Cirq: An Open Source Framework for NISQ Algorithms



Over the past few years, quantum computing has experienced a growth not only in the construction of quantum hardware, but also in the development of quantum algorithms. With the availability of Noisy Intermediate Scale Quantum (NISQ) computers (devices with ~50 - 100 qubits and high fidelity quantum gates), the development of algorithms to understand the power of these machines is of increasing importance. However, a common problem when designing a quantum algorithm on a NISQ processor is how to take full advantage of these limited quantum devices—spending resources on solving the hardest part of the problem rather than on overheads from poor mappings between the algorithm and hardware. Furthermore, some quantum processors have complex geometric constraints and other nuances, and ignoring these will either result in faulty quantum computation, or a computation that is modified and sub-optimal.*

Today at the First International Workshop on Quantum Software and Quantum Machine Learning (QSML), the Google AI Quantum team announced the public alpha of Cirq, an open source framework for NISQ computers. Cirq is focused on near-term questions and helping researchers understand whether NISQ quantum computers are capable of solving computational problems of practical importance. Cirq is licensed under Apache 2, and is free to be modified or embedded in any commercial or open source package.
Once installed, Cirq enables researchers to write quantum algorithms for specific quantum processors. Cirq gives users fine-tuned control over quantum circuits, specifying gate behavior using native gates, placing these gates appropriately on the device, and scheduling the timing of these gates within the constraints of the quantum hardware. Data structures are optimized for writing and compiling these quantum circuits to allow users to get the most out of NISQ architectures. Cirq supports running these algorithms locally on a simulator, and is designed to easily integrate with future quantum hardware or larger simulators via the cloud.
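
For a flavor of the library, here is a small circuit written against Cirq’s public Python API (the alpha-era API may differ slightly in its details): build a circuit from native gates, print it, and run it on the built-in simulator.

```python
import cirq

# Two qubits on a line; real devices add geometric constraints on which qubits interact.
q0, q1 = cirq.LineQubit.range(2)

# A small circuit: Hadamard, entangling CNOT, then measurement of both qubits.
circuit = cirq.Circuit([
    cirq.H(q0),
    cirq.CNOT(q0, q1),
    cirq.measure(q0, q1, key="m"),
])
print(circuit)

# Simulate the circuit locally and look at the measurement statistics.
simulator = cirq.Simulator()
result = simulator.run(circuit, repetitions=100)
print(result.histogram(key="m"))
```
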
We are also announcing the release of OpenFermion-Cirq, an example of a Cirq based application enabling near-term algorithms. OpenFermion is a platform for developing quantum algorithms for chemistry problems, and OpenFermion-Cirq is an open source library which compiles quantum simulation algorithms to Cirq. The new library uses the latest advances in building low depth quantum algorithms for quantum chemistry problems to enable users to go from the details of a chemical problem to highly optimized quantum circuits customized to run on particular hardware. For example, this library can be used to easily build quantum variational algorithms for simulating properties of molecules and complex materials.

Quantum computing will require strong cross-industry and academic collaborations if it is going to realize its full potential. In building Cirq, we worked with early testers to gain feedback and insight into algorithm design for NISQ computers. Below are some examples of Cirq work resulting from these early adopters:
To learn more about how Cirq is helping enable NISQ algorithms, please visit the links above where many of the adopters have provided example source code for their implementations.

Today, the Google AI Quantum team is using Cirq to create circuits that run on Google’s Bristlecone processor. In the future, we plan to make this processor available in the cloud, and Cirq will be the interface in which users write programs for this processor. In the meantime, we hope Cirq will improve the productivity of NISQ algorithm developers and researchers everywhere. Please check out the GitHub repositories for Cirq and OpenFermion-Cirq — pull requests welcome!

Acknowledgements
We would like to thank Craig Gidney for leading the development of Cirq, Ryan Babbush and Kevin Sung for building OpenFermion-Cirq and a whole host of code contributors to both frameworks.


* An analogous situation is how early classical programmers needed to run complex programs in very small memory spaces by paying careful attention to the lowest level details of the hardware.

Source: Google AI Blog


Google AI in Ghana

We've seen people across Africa do amazing things with the internet and technology—for themselves, their communities and the world. Over the past 10 years in which Google has had offices in Africa, we've been excited to be a part of that transformation. Ultimately 10 million Africans will benefit from our digital skills training program, with 2 million people having already completed the course, and we’re supporting 100,000 developers and over 60 tech startups through our Launchpad Accelerator Africa. We’re also adapting our products to make it easy for people to discover the best of the internet, even on low-RAM smartphones or unstable network connections.

In recent years we've also witnessed an increasing interest in machine learning research across the continent.  Events like Data Science Africa 2017 in Tanzania, the 2017 Deep Learning Indaba event in South Africa, and follow-on IndabaX events in 2018 in multiple countries have shown an exciting and continuing growth of the computer science research community in Africa.

Today, we’re announcing a Google AI research center in Africa, which will open later this year in Accra, Ghana. We’ll bring together top machine learning researchers and engineers in this new center dedicated to AI research and its applications.  

We’re committed to collaborating with local universities and research centers, as well as working with policy makers on the potential uses of AI in Africa. On a personal note, both of the authors have ties to Africa—Jeff spent part of his childhood in Uganda and Somalia, and Moustapha grew up in Senegal. As such, we’re excited to combine our research interests in AI and machine learning and our experience in Africa to push the boundaries of AI while solving challenges in areas such as healthcare, agriculture, and education.

AI has great potential to positively impact the world, and more so if the world is well represented in the development of new AI technologies. Our new AI center in Accra joins the list of other locations around the world where we focus on AI. If you’re a machine learning researcher interested in joining this new center, you can apply as a Research Scientist or a Research Software Engineer. You can also view all our open opportunities on our site.

Offline translations are now a lot better thanks to on-device AI

Just about two years ago we introduced neural machine translation (NMT) to Google Translate, significantly improving the accuracy of our online translations. Today, we’re bringing NMT technology offline—on device. This means that the technology will run in the Google Translate apps directly on your Android or iOS device, so that you can get high-quality translations even when you don't have access to an internet connection.

The neural system translates whole sentences at a time, rather than piece by piece. It uses broader context to help determine the most relevant translation, which it then rearranges and adjusts to sound more like a real person speaking with proper grammar. This makes translated paragraphs and articles a lot smoother and easier to read.

Offline translations can be useful when traveling to other countries without a local data plan, if you don’t have access to internet, or if you just don’t want to use cellular data. And since each language set is just 35-45MB, they won’t take too much storage space on your phone when you download them.

Comparison between phrase based translation and online/offline NMT

A comparison between our current phrase-based machine translation (PBMT), new offline neural machine translation (on-device), and online neural machine translation

To try NMT offline translations, go to your Translate app on Android or iOS. If you’ve used offline translations before, you’ll see a banner on your home screen which will take you to the right place to update your offline files. If not, go to your offline translation settings and tap the arrow next to the language name to download the package for that language. Now you’ll be ready to translate text whether you’re online or not. 

Google Translate offline NMT

We're rolling out this update in 59 languages over the next few days, so get out there and connect to the world around you!

Improving Deep Learning Performance with AutoAugment



The success of deep learning in computer vision can be partially attributed to the availability of large amounts of labeled training data — a model’s performance typically improves as you increase the quality, diversity and the amount of training data. However, collecting enough quality data to train a model to perform well is often prohibitively difficult. One way around this is to hardcode image symmetries into neural network architectures so they perform better, or to have experts manually design data augmentation methods, like rotation and flipping, that are commonly used to train well-performing vision models. However, until recently, less attention has been paid to finding ways to automatically augment existing data using machine learning. Inspired by the results of our AutoML efforts to design neural network architectures and optimizers to replace components of systems that were previously human designed, we asked ourselves: can we also automate the procedure of data augmentation?

In “AutoAugment: Learning Augmentation Policies from Data”, we explore a reinforcement learning algorithm which increases both the amount and diversity of data in an existing training dataset. Intuitively, data augmentation is used to teach a model about image invariances in the data domain in a way that makes a neural network invariant to these important symmetries, thus improving its performance. Unlike previous state-of-the-art deep learning models that used hand-designed data augmentation policies, we used reinforcement learning to find the optimal image transformation policies from the data itself. The result improved performance of computer vision models without relying on the production of new and ever expanding datasets.

Augmenting Training Data
The idea behind data augmentation is simple: images have many symmetries that don’t change the information present in the image. For example, the mirror reflection of a dog is still a dog. While some of these “invariances” are obvious to humans, many are not. For example, the mixup method augments data by placing images on top of each other during training, resulting in data which improves neural network performance.
Left: An original image from the ImageNet dataset. Right: The same image transformed by a commonly used data augmentation transformation, a horizontal flip about the center.
AutoAugment is an automatic way to design custom data augmentation policies for computer vision datasets, e.g., guiding the selection of basic image transformation operations, such as flipping an image horizontally/vertically, rotating an image, changing the color of an image, etc. AutoAugment not only predicts what image transformations to combine, but also the per-image probability and magnitude of the transformation used, so that the image is not always manipulated in the same way. AutoAugment is able to select an optimal policy from a search space of 2.9 × 10^32 image transformation possibilities.
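
Concretely, a learned (sub-)policy can be thought of as a list of (operation, probability, magnitude) triples applied in sequence. The snippet below illustrates that structure with a couple of PIL transformations; the operations, probabilities, and magnitudes here are invented for the example and are not the learned policies reported in the paper (those appear in the paper’s appendix).

```python
import random
from PIL import Image, ImageEnhance, ImageOps

# A hypothetical sub-policy: each entry is (operation, probability, magnitude).
SUB_POLICY = [
    ("rotate", 0.7, 15),    # rotate by up to 15 degrees
    ("color",  0.4, 1.8),   # boost color saturation
    ("invert", 0.1, None),  # occasionally invert pixel values
]

def apply_op(image, op, magnitude):
    if op == "rotate":
        return image.rotate(random.uniform(-magnitude, magnitude))
    if op == "color":
        return ImageEnhance.Color(image).enhance(magnitude)
    if op == "invert":
        return ImageOps.invert(image)
    return image

def augment(image, sub_policy=SUB_POLICY):
    # Apply each operation with its probability and magnitude.
    for op, prob, magnitude in sub_policy:
        if random.random() < prob:
            image = apply_op(image, op, magnitude)
    return image

# Usage: augmented = augment(Image.open("example.jpg").convert("RGB"))
```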

AutoAugment learns different transformations depending on what dataset it is run on. For example, for images involving street view of house numbers (SVHN) which include natural scene images of digits, AutoAugment focuses on geometric transforms like shearing and translation, which represent distortions commonly observed in this dataset. In addition, AutoAugment has learned to completely invert colors, which naturally occur in the original SVHN dataset given the diversity of building and house number materials in the world.
Left: An original image from the SVHN dataset. Right: The same image transformed by AutoAugment. In this case, the optimal transformation was a result of shearing the image and inverting the colors of the pixels.
On CIFAR-10 and ImageNet, AutoAugment does not use shearing because these datasets generally do not include images of sheared objects, nor does it invert colors completely as these transformations would lead to unrealistic images. Instead, AutoAugment focuses on slightly adjusting the color and hue distribution, while preserving the general color properties. This suggests that the actual colors of objects in CIFAR-10 and ImageNet are important, whereas on SVHN only the relative colors are important.


Left: An original image from the ImageNet dataset. Right: The same image transformed by the AutoAugment policy. First, the image contrast is maximized, after which the image is rotated.
Results
Our AutoAugment algorithm found augmentation policies for some of the most well-known computer vision datasets that, when incorporated into the training of the neural network, led to state-of-the-art accuracies. By augmenting ImageNet data we obtain a new state-of-the-art top-1 accuracy of 83.54%, and on CIFAR-10 we achieve an error rate of 1.48%, a 0.83% improvement over the default data augmentation designed by scientists. On SVHN, we improved the state-of-the-art error from 1.30% to 1.02%. Importantly, AutoAugment policies are found to be transferable — the policy found for the ImageNet dataset could also be applied to other vision datasets (Stanford Cars, FGVC-Aircraft, etc.), which in turn improves neural network performance.

We are pleased to see that our AutoAugment algorithm achieved this level of performance on many different competitive computer vision datasets and look forward to seeing future applications of this technology across more computer vision tasks and even in other domains such as audio processing or language models. The policies with the best performance are included in the appendix of the paper, so that researchers can use them to improve their models on relevant vision tasks.

Acknowledgements
Special thanks to the co-authors of the paper Dandelion Mane, Vijay Vasudevan, and Quoc V. Le. We’d also like to thank Alok Aggarwal, Gabriel Bender, Yanping Huang, Pieter-Jan Kindermans, Simon Kornblith, Augustus Odena, Avital Oliver, and Colin Raffel for their help with this project.

Source: Google AI Blog


Smart Compose: Using Neural Networks to Help Write Emails



Last week at Google I/O, we introduced Smart Compose, a new feature in Gmail that uses machine learning to interactively offer sentence completion suggestions as you type, allowing you to draft emails faster. Building upon technology developed for Smart Reply, Smart Compose offers a new way to help you compose messages — whether you are responding to an incoming email or drafting a new one from scratch.
In developing Smart Compose, there were a number of key challenges to face, including:
  • Latency: Since Smart Compose provides predictions on a per-keystroke basis, it must respond ideally within 100ms for the user not to notice any delays. Balancing model complexity and inference speed was a critical issue.
  • Scale: Gmail is used by more than 1.4 billion diverse users. In order to provide auto completions that are useful for all Gmail users, the model has to have enough modeling capacity so that it is able to make tailored suggestions in subtly different contexts.
  • Fairness and Privacy: In developing Smart Compose, we needed to address sources of potential bias in the training process, and had to adhere to the same rigorous user privacy standards as Smart Reply, making sure that our models never expose users’ private information. Furthermore, researchers had no access to emails, which meant they had to develop and train a machine learning system to work on a dataset that they themselves cannot read.
Finding the Right Model
Typical language generation models, such as n-gram, neural bag-of-words (BoW) and RNN language (RNN-LM) models, learn to predict the next word conditioned on the prefix word sequence. In an email, however, the words a user has typed in the current email composing session are only one “signal” a model can use to predict the next word. In order to incorporate more context about what the user wants to say, our model is also conditioned on the email subject and the previous email body (if the user is replying to an incoming email).

One approach to include this additional context is to cast the problem as a sequence-to-sequence (seq2seq) machine translation task, where the source sequence is the concatenation of the subject and the previous email body (if there is one), and the target sequence is the current email the user is composing. While this approach worked well in terms of prediction quality, it failed to meet our strict latency constraints by orders of magnitude.

To improve on this, we combined a BoW model with an RNN-LM, which is faster than the seq2seq models with only a slight sacrifice to model prediction quality. In this hybrid approach, we encode the subject and previous email by averaging the word embeddings in each field. We then join those averaged embeddings, and feed them to the target sequence RNN-LM at every decoding step, as the model diagram below shows.
Smart Compose RNN-LM model architecture. Subject and previous email message are encoded by averaging the word embeddings in each field. The averaged embeddings are then fed to the RNN-LM at each decoding step.
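
A toy version of that hybrid architecture can be expressed in a few lines of Keras: average the context embeddings into a single vector, tile it across decoding steps, and concatenate it with each target-token embedding before the RNN-LM. The vocabulary size, dimensions, and layer choices below are arbitrary stand-ins, not the production model.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, EMB, HIDDEN, CTX_LEN, TGT_LEN = 20000, 128, 256, 50, 30  # arbitrary sizes

context_ids = layers.Input(shape=(CTX_LEN,), dtype="int32")  # subject + previous email
target_ids = layers.Input(shape=(TGT_LEN,), dtype="int32")   # email being composed

embed = layers.Embedding(VOCAB, EMB)                          # shared word embeddings
context_vec = layers.GlobalAveragePooling1D()(embed(context_ids))  # BoW-style encoding

# Tile the averaged context across decoding steps and concatenate it with the
# embedding of each target token, then feed the result to the RNN-LM.
tiled_ctx = layers.RepeatVector(TGT_LEN)(context_vec)
decoder_in = layers.Concatenate()([embed(target_ids), tiled_ctx])
rnn_out = layers.LSTM(HIDDEN, return_sequences=True)(decoder_in)
next_word_probs = layers.Dense(VOCAB, activation="softmax")(rnn_out)

model = tf.keras.Model([context_ids, target_ids], next_word_probs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.summary()
```
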
Accelerated Model Training & Serving
Of course, once we decided on this modeling approach we still had to tune various model hyperparameters and train the models over billions of examples, all of which can be very time-intensive. To speed things up, we used a full TPUv2 Pod to perform experiments. In doing so, we’re able to train a model to convergence in less than a day.

Even after training our faster hybrid model, our initial version of Smart Compose running on a standard CPU had an average serving latency of hundreds of milliseconds, which is still unacceptable for a feature that is trying to save users' time. Fortunately, TPUs can also be used at inference time to greatly speed up the user experience. By offloading the bulk of the computation onto TPUs, we improved the average latency to tens of milliseconds while also greatly increasing the number of requests that can be served by a single machine.

Fairness and Privacy
Fairness in machine learning is very important, as language understanding models can reflect human cognitive biases resulting in unwanted word associations and sentence completions. As Caliskan et al. point out in their recent paper “Semantics derived automatically from language corpora contain human-like biases”, these associations are deeply entangled in natural language data, which presents a considerable challenge to building any language model. We are actively researching ways to continue to reduce potential biases in our training procedures. Also, since Smart Compose is trained on billions of phrases and sentences, similar to the way spam machine learning models are trained, we have done extensive testing to make sure that only common phrases used by multiple users are memorized by our model, using findings from this paper.

Future work
We are constantly working on improving the suggestion quality of the language generation model by following state-of-the-art architectures (e.g., Transformer, RNMT+, etc.) and experimenting with the most recent and advanced training techniques. We will deploy those more advanced models to production once our strict latency constraints can be met. We are also working on incorporating into our system personal language models, designed to more accurately emulate an individual’s style of writing.

Acknowledgements
Smart Compose language generation model was developed by Benjamin Lee, Mia Chen, Gagan Bansal, Justin Lu, Jackie Tsay, Kaushik Roy, Tobias Bosch, Yinan Wang, Matthew Dierker, Katherine Evans, Thomas Jablin, Dehao Chen, Vinu Rajashekhar, Akshay Agrawal, Yuan Cao, Shuyuan Zhang, Xiaobing Liu, Noam Shazeer, Andrew Dai, Zhifeng Chen, Rami Al-Rfou, DK Choe, Yunhsuan Sung, Brian Strope, Timothy Sohn, Yonghui Wu, and many others.

Source: Google AI Blog


Automatic Photography with Google Clips



To me, photography is the simultaneous recognition, in a fraction of a second, of the significance of an event as well as of a precise organization of forms which give that event its proper expression.
Henri Cartier-Bresson

The last few years have witnessed a Cambrian-like explosion in AI, with deep learning methods enabling computer vision algorithms to recognize many of the elements of a good photograph: people, smiles, pets, sunsets, famous landmarks and more. But, despite these recent advancements, automatic photography remains a very challenging problem. Can a camera capture a great moment automatically?

Recently, we released Google Clips, a new, hands-free camera that automatically captures interesting moments in your life. We designed Google Clips around three important principles:
  • We wanted all computations to be performed on-device. In addition to extending battery life and reducing latency, on-device processing means that none of your clips leave the device unless you decide to save or share them, which is a key privacy control.
  • We wanted the device to capture short videos, rather than single photographs. Moments with motion can be more poignant and true-to-memory, and it is often easier to shoot a video around a compelling moment than it is to capture a perfect, single instant in time.
  • We wanted to focus on capturing candid moments of people and pets, rather than the more abstract and subjective problem of capturing artistic images. That is, we did not attempt to teach Clips to think about composition, color balance, light, etc.; instead, Clips focuses on selecting ranges of time containing people and animals doing interesting activities.
Learning to Recognize Great Moments
How could we train an algorithm to recognize interesting moments? As with most machine learning problems, we started with a dataset. We created a dataset of thousands of videos in diverse scenarios where we imagined Clips being used. We also made sure our dataset represented a wide range of ethnicities, genders, and ages. We then hired expert photographers and video editors to pore over this footage to select the best short video segments. These early curations gave us examples for our algorithms to emulate. However, it is challenging to train an algorithm solely from the subjective selection of the curators — one needs a smooth gradient of labels to teach an algorithm to recognize the quality of content, ranging from "perfect" to "terrible."

To address this problem, we took a second data-collection approach, with the goal of creating a continuous quality score across the length of a video. We split each video into short segments (similar to the content Clips captures), randomly selected pairs of segments, and asked human raters to select the one they prefer.
We took this pairwise comparison approach, instead of having raters score videos directly, because it is much easier to choose the better of a pair than it is to specify a number. We found that raters were very consistent in pairwise comparisons, and less so when scoring directly. Given enough pairwise comparisons for any given video, we were able to compute a continuous quality score over the entire length. In this process, we collected over 50,000,000 pairwise comparisons on clips sampled from over 1,000 videos. That’s a lot of human effort!
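
One standard way to turn such pairwise preferences into a continuous score is a Bradley-Terry-style fit: each segment gets a latent score, and the probability that raters prefer segment i over segment j is a logistic function of the score difference. The sketch below, with made-up comparisons, shows that idea; it is not the specific fitting procedure used for Clips.

```python
import numpy as np

# Made-up pairwise preferences: (winner_index, loser_index) over 4 video segments.
comparisons = [(0, 1), (0, 2), (1, 2), (3, 2), (0, 3), (3, 1)]
num_segments = 4

scores = np.zeros(num_segments)
learning_rate = 0.1

# Gradient ascent on the Bradley-Terry log-likelihood:
# P(winner preferred over loser) = sigmoid(score_winner - score_loser).
for _ in range(2000):
    grad = np.zeros(num_segments)
    for winner, loser in comparisons:
        p = 1.0 / (1.0 + np.exp(scores[loser] - scores[winner]))
        grad[winner] += 1.0 - p
        grad[loser] -= 1.0 - p
    grad -= 0.1 * scores            # small L2 term keeps the scores finite
    scores += learning_rate * grad

print(np.round(scores, 2))          # higher score = more consistently preferred
```
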
Training a Clips Quality Model
Given this quality score training data, our next step was to train a neural network model to estimate the quality of any photograph captured by the device. We started with the basic assumption that knowing what’s in the photograph (e.g., people, dogs, trees, etc.) will help determine “interestingness”. If this assumption is correct, we could learn a function that uses the recognized content of the photograph to predict its quality score derived above from human comparisons.

To identify content labels in our training data, we leveraged the same Google machine learning technology that powers Google image search and Google Photos, which can recognize over 27,000 different labels describing objects, concepts, and actions. We certainly didn’t need all these labels, nor could we compute them all on device, so our expert photographers selected the few hundred labels they felt were most relevant to predicting the “interestingness” of a photograph. We also added the labels most highly correlated with the rater-derived quality scores.

Once we had this subset of labels, we then needed to design a compact, efficient model that could predict them for any given image, on-device, within strict power and thermal limits. This presented a challenge, as the deep learning techniques behind computer vision typically require strong desktop GPUs, and algorithms adapted to run on mobile devices lag far behind state-of-the-art techniques on desktop or cloud. To train this on-device model, we first took a large set of photographs and again used Google’s powerful, server-based recognition models to predict label confidence for each of the “interesting” labels described above. We then trained a MobileNet Image Content Model (ICM) to mimic the predictions of the server-based model. This compact model is capable of recognizing the most interesting elements of photographs, while ignoring non-relevant content.

The final step was to predict a single quality score for an input photograph from its content predicted by the ICM, using the 50M pairwise comparisons as training data. This score is computed with a piecewise linear regression model that combines the output of the ICM into a frame quality score. This frame quality score is averaged across the video segment to form a moment score. Given a pairwise comparison, our model should compute a moment score that is higher for the video segment preferred by humans. The model is trained so that its predictions match the human pairwise comparisons as well as possible.
Diagram of the training process for generating frame quality scores. Piecewise linear regression maps from an ICM embedding to a score which, when averaged across a video segment, yields a moment score. The moment score of the preferred segment should be higher.
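
That training objective can be written compactly: frame scores come from a learned function of the ICM outputs (a plain linear map here, standing in for the piecewise linear regression), moment scores are per-segment averages, and a pairwise loss pushes the preferred segment’s moment score above the other’s. The TensorFlow sketch below uses random stand-in data and illustrates the objective only, not the actual Clips model.

```python
import tensorflow as tf

NUM_LABELS = 16            # stand-in for the ICM's content-label outputs
FRAMES_PER_SEGMENT = 8
BATCH = 32

# Random stand-in data: pairs of (preferred, other) segments of ICM outputs.
preferred = tf.random.uniform((BATCH, FRAMES_PER_SEGMENT, NUM_LABELS))
other = tf.random.uniform((BATCH, FRAMES_PER_SEGMENT, NUM_LABELS))

# Frame score = linear function of the ICM outputs (the real model is piecewise linear).
weights = tf.Variable(tf.zeros(NUM_LABELS))

def moment_score(segments):
    frame_scores = tf.reduce_sum(segments * weights, axis=-1)  # (batch, frames)
    return tf.reduce_mean(frame_scores, axis=-1)               # average over the segment

optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
for _ in range(200):
    with tf.GradientTape() as tape:
        margin = moment_score(preferred) - moment_score(other)
        # Pairwise logistic loss: the preferred segment should score higher.
        loss = tf.reduce_mean(tf.math.softplus(-margin))
    grads = tape.gradient(loss, [weights])
    optimizer.apply_gradients(zip(grads, [weights]))

print("final pairwise loss:", float(loss))
```
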
This process allowed us to train a model that combines the power of Google image recognition technology with the wisdom of human raters–represented by 50 million opinions on what makes interesting content!

While this data-driven score does a great job of identifying interesting (and non-interesting) moments, we also added some bonuses to our overall quality score for phenomena that we know we want Clips to capture, including faces (especially recurring and thus “familiar” ones), smiles, and pets. In our most recent release, we added bonuses for certain activities that customers particularly want to capture, such as hugs, kisses, jumping, and dancing. Recognizing these activities required extensions to the ICM model.

Shot Control
Given this powerful model for predicting the “interestingness” of a scene, the Clips camera can decide which moments to capture in real-time. Its shot control algorithms follow three main principles:
  1. Respect Power & Thermals: We want the Clips battery to last roughly three hours, and we don’t want the device to overheat — the device can’t run at full throttle all the time. Clips spends much of its time in a low-power mode that captures one frame per second. If the quality of that frame exceeds a threshold set by how much Clips has recently shot, it moves into a high-power mode, capturing at 15 fps. Clips then saves a clip at the first quality peak encountered (a simplified sketch of this capture policy follows this list).
  2. Avoid Redundancy: We don’t want Clips to capture all of its moments at once, and ignore the rest of a session. Our algorithms therefore cluster moments into visually similar groups, and limit the number of clips in each cluster.
  3. The Benefit of Hindsight: It’s much easier to determine which clips are the best when you can examine the totality of clips captured. Clips therefore captures more moments than it intends to show to the user. When clips are ready to be transferred to the phone, the Clips device takes a second look at what it has shot, and only transfers the best and least redundant content.
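
Principle 1 above amounts to a small state machine. The sketch below, with invented interfaces and the frame rates from the text, shows roughly how such a capture policy could be structured; it is not the Clips implementation.

```python
def capture_session(frames, quality_fn, threshold_fn):
    """Toy capture policy: idle at 1 fps, burst to 15 fps past an adaptive
    threshold, and save a clip at the first quality peak. Illustrative only."""
    mode = "low_power"                  # capturing one frame per second
    peak_frame, peak_quality = None, float("-inf")
    saved_clips = []

    for frame in frames:                # frames arrive at the current frame rate
        quality = quality_fn(frame)
        if mode == "low_power":
            # The threshold adapts to how much has been shot recently.
            if quality > threshold_fn(saved_clips):
                mode = "high_power"     # switch to capturing at 15 fps
                peak_frame, peak_quality = frame, quality
        else:
            if quality >= peak_quality:
                peak_frame, peak_quality = frame, quality
            else:                       # first quality peak has passed
                saved_clips.append(peak_frame)
                mode = "low_power"      # drop back to one frame per second
                peak_quality = float("-inf")
    return saved_clips
```
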
Machine Learning Fairness
In addition to making sure our video dataset represented a diverse population, we also constructed several other tests to assess the fairness of our algorithms. We created controlled datasets by sampling subjects from different genders and skin tones in a balanced manner, while keeping variables like content type, duration, and environmental conditions constant. We then used this dataset to test that our algorithms had similar performance when applied to different groups. To help detect any regressions in fairness that might occur as we improved our moment quality models, we added fairness tests to our automated system. Any change to our software was run across this battery of tests, and was required to pass. It is important to note that this methodology can’t guarantee fairness, as we can’t test for every possible scenario and outcome. However, we believe that these steps are an important part of our long-term work to achieve fairness in ML algorithms.

Conclusion
Most machine learning algorithms are designed to estimate objective qualities – a photo contains a cat, or it doesn’t. In our case, we aim to capture a more elusive and subjective quality – whether a personal photograph is interesting, or not. We therefore combine the objective, semantic content of photographs with subjective human preferences to build the AI behind Google Clips. Also, Clips is designed to work alongside a person, rather than autonomously; to get good results, a person still needs to be conscious of framing, and make sure the camera is pointed at interesting content. We’re happy with how well Google Clips performs, and are excited to continue to improve our algorithms to capture that “perfect” moment!

Acknowledgements
The algorithms described here were conceived and implemented by a large group of Google engineers, research scientists, and others. Figures were made by Lior Shapira. Thanks to Lior and Juston Payne for video content.

Source: Google AI Blog