We’re excited to announce that our second-generation Tensor Processing Units (TPUs) are coming to Google Cloud to accelerate a wide range of machine learning workloads, including both training and inference. We call them Cloud TPUs, and they will initially be available via Google Compute Engine.
We’ve witnessed extraordinary advances in machine learning (ML) over the past few years. Neural networks have dramatically improved the quality of Google Translate, played a key role in ranking Google Search results and made it more convenient to find the photos you want with Google Photos. Machine learning allowed DeepMind’s AlphaGo program to defeat Lee Sedol, one of the world’s top Go players, and also made it possible for software to generate natural-looking sketches.
These breakthroughs required enormous amounts of computation, both to train the underlying machine learning models and to run those models once they’re trained (this is called “inference”). We’ve designed, built and deployed a family of Tensor Processing Units, or TPUs, to allow us to support larger and larger amounts of machine learning computation, first internally and now externally.
While our first TPU was designed to run machine learning models quickly and efficiently—to translate a set of sentences or choose the next move in Go—those models still had to be trained separately. Training a machine learning model is even more difficult than running it, and days or weeks of computation on the best available CPUs and GPUs are commonly required to reach state-of-the-art levels of accuracy.
Research and engineering teams at Google and elsewhere have made great progress scaling machine learning training using readily-available hardware. However, this wasn’t enough to meet our machine learning needs, so we designed an entirely new machine learning system to eliminate bottlenecks and maximize overall performance. At the heart of this system is the second-generation TPU we're announcing today, which can both train and run machine learning models.
Each of these new TPU devices delivers up to 180 teraflops of floating-point performance. As powerful as these TPUs are on their own, though, we designed them to work even better together. Each TPU includes a custom high-speed network that allows us to build machine learning supercomputers we call “TPU pods.” A TPU pod contains 64 second-generation TPUs and provides up to 11.5 petaflops to accelerate the training of a single large machine learning model. That’s a lot of computation!
Using these TPU pods, we've already seen dramatic improvements in training times. One of our new large-scale translation models used to take a full day to train on 32 of the best commercially-available GPUs—now it trains to the same accuracy in an afternoon using just one eighth of a TPU pod.
Introducing Cloud TPUs
We’re bringing our new TPUs to Google Compute Engine as Cloud TPUs, where you can connect them to virtual machines of all shapes and sizes and mix and match them with other types of hardware, including Skylake CPUs and NVIDIA GPUs. You can program these TPUs with TensorFlow, the most popular open-source machine learning framework on GitHub, and we’re introducing high-level APIs, which will make it easier to train machine learning models on CPUs, GPUs or Cloud TPUs with only minimal code changes.
With Cloud TPUs, you have the opportunity to integrate state-of-the-art ML accelerators directly into your production infrastructure and benefit from on-demand, accelerated computing power without any up-front capital expenses. Since fast ML accelerators place extraordinary demands on surrounding storage systems and networks, we’re making optimizations throughout our Cloud infrastructure to help ensure that you can train powerful ML models quickly using real production data.
Our goal is to help you build the best possible machine learning systems from top to bottom. While Cloud TPUs will benefit many ML applications, we remain committed to offering a wide range of hardware on Google Cloud so you can choose the accelerators that best fit your particular use case at any given time. For example, Shazam recently announced that they successfully migrated major portions of their music recognition workloads to NVIDIA GPUs on Google Cloud and saved money while gaining flexibility.
Introducing the TensorFlow Research Cloud
Much of the recent progress in machine learning has been driven by unprecedentedly open collaboration among researchers around the world across both industry and academia. However, many top researchers don’t have access to anywhere near as much compute power as they need. To help as many researchers as we can and further accelerate the pace of open machine learning research, we'll make 1,000 Cloud TPUs available at no cost to ML researchers via the TensorFlow Research Cloud.