
Safety-first AI for autonomous data center cooling and industrial control

Many of society’s most pressing problems have grown increasingly complex, so the search for solutions can feel overwhelming. At DeepMind and Google, we believe that if we can use AI as a tool to discover new knowledge, solutions will be easier to reach.

In 2016, we jointly developed an AI-powered recommendation system to improve the energy efficiency of Google’s already highly-optimized data centers. Our thinking was simple: Even minor improvements would provide significant energy savings and reduce CO2 emissions to help combat climate change.

Now we’re taking this system to the next level: instead of human-implemented recommendations, our AI system is directly controlling data center cooling, while remaining under the expert supervision of our data center operators. This first-of-its-kind cloud-based control system is now safely delivering energy savings in multiple Google data centers.

How it works

Every five minutes, our cloud-based AI pulls a snapshot of the data center cooling system from thousands of sensors and feeds it into our deep neural networks, which predict how different combinations of potential actions will affect future energy consumption. The AI system then identifies which actions will minimize energy consumption while satisfying a robust set of safety constraints. Those actions are sent back to the data center, where they are verified by the local control system and then implemented.
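To make that loop concrete, here is a minimal, self-contained sketch of a single control cycle. Everything in it (the sensor snapshot, the energy model, the constraint check and the dispatch function) is an invented stand-in rather than our actual system, but it shows the shape of the decision: predict energy for each candidate action, discard unsafe ones, and send the cheapest remaining action to the local control system.

```python
# A toy, self-contained sketch of one five-minute control cycle.
# All functions below are invented stand-ins, not the real interfaces.
import random

def fetch_sensor_snapshot():
    # Stand-in for the snapshot pulled from thousands of cooling-system sensors.
    return {"outside_temp_c": 18.0, "it_load_kw": random.uniform(900, 1100)}

def predict_energy(snapshot, action):
    # Stand-in for the deep neural networks that forecast future energy use
    # under a given combination of actions.
    return snapshot["it_load_kw"] * (0.20 - 0.01 * action["chillers_running"])

def is_safe(snapshot, action):
    # Stand-in for the safety constraints defined by the data center operators.
    return 1 <= action["chillers_running"] <= 4

def send_to_local_control(action):
    # Stand-in for dispatching the action; the on-site control system
    # re-verifies it against its own constraints before implementing it.
    print("implementing", action)

def control_cycle(candidate_actions):
    """Pick the lowest-predicted-energy action that passes every safety check."""
    snapshot = fetch_sensor_snapshot()
    safe_actions = [a for a in candidate_actions if is_safe(snapshot, a)]
    if safe_actions:
        best = min(safe_actions, key=lambda a: predict_energy(snapshot, a))
        send_to_local_control(best)

control_cycle([{"chillers_running": n} for n in range(1, 6)])
```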

The idea evolved out of feedback from our data center operators who had been using our AI recommendation system. They told us that although the system had taught them some new best practices—such as spreading the cooling load across more, rather than less, equipment—implementing the recommendations required too much operator effort and supervision. Naturally, they wanted to know whether we could achieve similar energy savings without manual implementation.


We’re pleased to say the answer was yes!

We wanted to achieve energy savings with less operator overhead. Automating the system enabled us to implement more granular actions at greater frequency, while making fewer mistakes.
Dan Fuenffinger
Data Center Operator, Google

Designed for safety and reliability

Google's data centers contain thousands of servers that power popular services including Google Search, Gmail and YouTube. Ensuring that they run reliably and efficiently is mission-critical. We've designed our AI agents and the underlying control infrastructure from the ground up with safety and reliability in mind, and use eight different mechanisms to ensure the system will behave as intended at all times.

One simple method we’ve implemented is to estimate uncertainty. For every potential action—and there are billions—our AI agent calculates its confidence that this is a good action. Actions with low confidence are eliminated from consideration.
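As an illustration of the idea only (the post doesn’t spell out how confidence is computed), a confidence filter could work roughly like this, using disagreement across an ensemble of predictions as a stand-in for uncertainty and an invented threshold:

```python
# Illustrative only: keep just the candidate actions whose outcome the agent
# is confident about. The confidence proxy (agreement across an ensemble of
# predictions) and the threshold are assumptions, not details from the post.
from statistics import mean, pstdev

CONFIDENCE_THRESHOLD = 0.9  # hypothetical cut-off

def confidence(predictions):
    # Low spread relative to the mean -> the ensemble agrees -> high confidence.
    spread = pstdev(predictions)
    return 1.0 / (1.0 + spread / max(abs(mean(predictions)), 1e-9))

def filter_actions(actions, ensemble_predict):
    """Drop actions whose predicted outcome falls below the confidence threshold."""
    return [a for a in actions if confidence(ensemble_predict(a)) >= CONFIDENCE_THRESHOLD]

# Toy usage: two ensemble predictions of energy use per action.
fake_predictions = {
    "raise_setpoint": [520.0, 530.0],   # ensemble agrees    -> kept
    "lower_setpoint": [480.0, 700.0],   # ensemble disagrees -> dropped
}
print(filter_actions(list(fake_predictions), lambda a: fake_predictions[a]))
```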

Another method is two-layer verification. Optimal actions computed by the AI are vetted against an internal list of safety constraints defined by our data center operators. Once the instructions are sent from the cloud to the physical data center, the local control system verifies the instructions against its own set of constraints. This redundant check ensures that the system remains within local constraints and operators retain full control of the operating boundaries.
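A rough way to picture the redundant check: the same action has to pass a cloud-side constraint list and then an independent list held by the local control system. The constraint names and limits below are made up for the sketch.

```python
# Illustrative two-layer verification. Both constraint lists are invented;
# the point is that the local check is independent of the cloud check.
def within(limits, action):
    return all(lo <= action[key] <= hi for key, (lo, hi) in limits.items())

CLOUD_CONSTRAINTS = {"cold_water_setpoint_c": (15.0, 22.0)}  # defined by operators
LOCAL_CONSTRAINTS = {"cold_water_setpoint_c": (14.0, 23.0)}  # enforced on site

def verify_and_apply(action):
    if not within(CLOUD_CONSTRAINTS, action):
        return "rejected in the cloud"
    if not within(LOCAL_CONSTRAINTS, action):
        return "rejected by the local control system"
    return "implemented"

print(verify_and_apply({"cold_water_setpoint_c": 18.0}))  # -> implemented
```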

Most importantly, our data center operators are always in control and can choose to exit AI control mode at any time. In these scenarios, the control system will transfer seamlessly from AI control to the on-site rules and heuristics that define the automation industry today.
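The handover itself can be thought of as a simple mode switch: if operators exit AI control, or the AI produces no viable action, the system falls back to conventional rules. Again, this is only a sketch with an invented heuristic, not the actual fallback logic.

```python
# Sketch of the fallback: when AI mode is off (or no viable AI action exists),
# revert to a rule-based setpoint of the kind used across the industry today.
def rule_based_action(snapshot):
    # Invented heuristic standing in for the on-site rules.
    return {"cold_water_setpoint_c": 20.0}

def choose_action(snapshot, ai_action, ai_mode_enabled):
    if ai_mode_enabled and ai_action is not None:
        return ai_action
    return rule_based_action(snapshot)  # seamless handover to on-site rules

print(choose_action({}, None, ai_mode_enabled=False))  # -> rule-based setpoint
```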

Find out about the other safety mechanisms we’ve developed below:

[Infographic: the safety mechanisms built into our AI control system]

Increasing energy savings over time

Whereas our original recommendation system had operators vetting and implementing actions, our new AI control system implements the actions directly. We’ve purposefully constrained the system’s optimization boundaries to a narrower operating regime to prioritize safety and reliability, which means trading away some potential energy reduction in exchange for lower risk.

Despite being in place for only a matter of months, the system is already delivering consistent energy savings of around 30 percent on average, with further expected improvements. That’s because these systems get better over time with more data, as the graph below demonstrates. Our optimization boundaries will also be expanded as the technology matures, for even greater reductions.

[Graph: cooling energy efficiency (kW/ton) under AI control, relative to the historical baseline]

This graph plots AI performance over time relative to the historical baseline before AI control. Performance is measured by a common industry metric for cooling energy efficiency, kW/ton (the electrical power consumed per ton of cooling delivered). Over nine months, our AI control system’s performance increases from a 12 percent improvement at the initial launch of autonomous control to around a 30 percent improvement.
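As a worked example with made-up numbers: a 30 percent improvement in this metric means the cooling plant now draws 30 percent less electrical power for the same amount of cooling than it did under the pre-AI baseline.

```python
# Worked example with invented numbers (lower kW/ton is better).
baseline_kw_per_ton = 0.70   # hypothetical efficiency before AI control
current_kw_per_ton = 0.49    # hypothetical efficiency under AI control

improvement = (baseline_kw_per_ton - current_kw_per_ton) / baseline_kw_per_ton
print(f"{improvement:.0%} improvement over the baseline")  # -> 30% improvement
```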

Our direct AI control system is finding yet more novel ways to manage cooling that have surprised even the data center operators. Dan Fuenffinger, one of Google’s data center operators who has worked extensively alongside the system, remarked: "It was amazing to see the AI learn to take advantage of winter conditions and produce colder than normal water, which reduces the energy required for cooling within the data center. Rules don’t get better over time, but AI does."

We’re excited that our direct AI control system is operating safely and dependably, while consistently delivering energy savings. However, data centers are just the beginning. In the long term, we think there's potential to apply this technology in other industrial settings, and help tackle climate change on an even grander scale.

Building a new data center in Singapore

We started building our first Southeast Asia data center in Singapore back in 2011, expanding quickly to a second building in 2015 due to the rapid growth in users and usage in the region.


The pace hasn’t slowed. In the three years since our last update, more than 70 million people in Southeast Asia have gotten online for the first time, bringing the region’s total to more than 330 million. That’s more than the population of the United States.


More businesses are getting online too, so demand for our expanding Google Cloud Platform (GCP) offerings has grown quickly. Since first opening our GCP region in Singapore last year, companies like Singapore Airlines, Ninjavan and Wego have joined the likes of GO-JEK and Carousell in using GCP to serve their customers globally.


To keep up with that demand, we’re starting work on a third facility in Singapore. Located in Jurong West, just down the road from our first two buildings (Singapore’s not a very large place), and looking something like the rendering below, this expansion will bring our long-term investment in Singapore data centers to US$850 million.

[Rendering: our third data center building in Jurong West, Singapore]

The multi-story facility will be one of the most efficient and environmentally friendly sites in Asia, in line with our global approach. It will feature the latest machine learning technology to reduce energy use, will use recycled water, and will divert 100 percent of the data center’s waste away from landfill.

We’re looking forward to growing our small team at the data centers here, as well as expanding our ties with the local community. Data center Googlers like Haikal Fadly have been helping out with STEM workshops at the nearby Zhenghua Secondary School, and back in December we did a “Walk for Rice” hosted by St Joseph’s Home for the Aged.

And we’re always on the lookout for nonprofits with good ideas for benefiting the community. So I’d like to encourage community organizations and registered nonprofits in Singapore seeking funding to reach out to us to learn more and apply for our annual grants program; the application window opens today.