
7 best practices for building containers

Kubernetes Engine is a great place to run your workloads at scale. But before you can use Kubernetes, you need to containerize your applications. You can run most applications in a Docker container without too much hassle. However, running those containers effectively in production and streamlining the build process is another story. Paying attention to a few details will make your security and operations teams happier. This post provides tips and best practices to help you build containers effectively.

1. Package a single application per container


A container works best when a single application runs inside it, with a single parent process. This ties the lifecycle of the application to that of the container. For example, do not run PHP and MySQL in the same container: it's harder to debug, Linux signals will not be properly handled, you can't horizontally scale the PHP containers, and so on.
The container on the left follows the best practice. The container on the right does not.

2. Properly handle PID 1, signal handling, and zombie processes


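Kubernetes and Docker send Linux signals to your application inside the container to stop it. They send those signals to the process with the process identifier (PID) 1. If you want your application to stop gracefully when needed, you need to properly handle those signals. The process running as PID 1 is also expected to reap orphaned child processes: if your application starts child processes and never reaps them, they accumulate as zombie processes. Either reap them in your application, or run it under a minimal init process such as tini, which Docker can add for you with the --init flag.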
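As a minimal sketch, assume a Python application started with the exec form of CMD (for example CMD ["python", "app.py"]), so that Python itself runs as PID 1 instead of being wrapped by /bin/sh. The application installs a SIGTERM handler and finishes its current unit of work before exiting. Installing the handler matters: the kernel does not apply the default "terminate" action to PID 1 for signals it has no handler for, so an unhandled SIGTERM is ignored and the container is only killed once the grace period runs out. The serve loop and its max_iterations parameter are hypothetical and only keep the example bounded.

```python
import signal
import threading

# Set by the signal handler when Docker or Kubernetes asks the process to stop.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Record the stop request; the main loop notices it and exits cleanly."""
    shutdown_requested.set()


def serve(max_iterations=None):
    """Hypothetical worker loop that runs until SIGTERM arrives.

    max_iterations is only there to keep the example bounded.
    """
    # Install the handler explicitly: running as PID 1, a process without a
    # SIGTERM handler never sees the signal.
    signal.signal(signal.SIGTERM, handle_sigterm)
    iterations = 0
    while not shutdown_requested.is_set():
        # ... handle one unit of work here ...
        iterations += 1
        if max_iterations is not None and iterations >= max_iterations:
            break
        shutdown_requested.wait(0.05)
    # Graceful shutdown: flush buffers, close connections, then return.
    return iterations
```

In a real image you would call serve() from the module's entry point; the handler only runs in the main thread, which is where Python delivers signals.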

Google Developer Advocate Sandeep Dinesh's article "Kubernetes best practices: terminating with grace" explains the whole Kubernetes termination lifecycle.

3. Optimize for the Docker build cache


Docker can cache layers of your images to accelerate later builds. This is a very useful feature, but it introduces some behaviors that you need to take into account when writing your Dockerfiles. For example, you should add the source code of your application as late as possible in your Dockerfile so that the base image and your application’s dependencies get cached and aren’t rebuilt on every build.

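Take this Dockerfile as an example:
```dockerfile
FROM python:3.5
# Any change under my_code/ invalidates this layer and every layer after it
COPY my_code/ /src
RUN pip install my_requirements
```
You should swap the last two lines:
```dockerfile
FROM python:3.5
# Dependencies are installed before the source is copied, so this layer stays cached
RUN pip install my_requirements
COPY my_code/ /src
```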
In the new version, the result of the pip command will be cached and will not be rerun each time the source code changes.
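To see why the order matters, here is a deliberately simplified model of the layer cache, not Docker's actual implementation: each layer's cache key is derived from the previous layer's key, the instruction text and, for COPY, the content of the copied files. Once one key changes, every later key changes too, which is why Docker rebuilds all subsequent steps. The file-content dictionary is a hypothetical stand-in for the build context.

```python
import hashlib


def layer_keys(instructions, file_contents):
    """Return one cache key per instruction, chained to the parent layer."""
    keys = []
    parent = ""
    for instruction in instructions:
        h = hashlib.sha256()
        h.update(parent.encode())
        h.update(instruction.encode())
        if instruction.startswith("COPY"):
            # COPY layers also depend on the content of what is copied.
            source = instruction.split()[1]
            h.update(file_contents.get(source, "").encode())
        parent = h.hexdigest()
        keys.append(parent)
    return keys


def rebuilt_steps(instructions, old_files, new_files):
    """Instructions whose cached layer can no longer be reused."""
    old = layer_keys(instructions, old_files)
    new = layer_keys(instructions, new_files)
    return [ins for ins, a, b in zip(instructions, old, new) if a != b]
```

With the source copied first, a code change invalidates both the COPY and the pip step; with dependencies installed first, only the COPY step is rebuilt.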

4. Remove unnecessary tools


Reducing the attack surface of your host system is always a good idea, and it’s much easier to do with containers than with traditional systems. Remove everything that the application doesn’t need from your container. Or better yet, include just your application in a distroless or scratch image. You should also, if possible, make the filesystem of the container read-only. This should get you some excellent feedback from your security team during your performance review.

5. Build the smallest image possible


Who likes to download hundreds of megabytes of useless data? Aim to have the smallest images possible. This decreases download times, cold start times, and disk usage. You can use several strategies to achieve that: start with a minimal base image, leverage common layers between images and make use of Docker’s multi-stage build feature.
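The benefit of common layers can be sketched with a small model: a node only downloads the layers it doesn't already have, so images built on the same base layer share that download. The layer digests and sizes below are hypothetical values chosen for illustration.

```python
def pull_sequence(images, cached=None):
    """Bytes downloaded for each image when pulled, in order, onto one node.

    Each image is a list of (layer_digest, size_in_bytes) tuples; layers
    already present on the node (cached) are not downloaded again.
    """
    cached = set() if cached is None else set(cached)
    downloaded = []
    for layers in images:
        missing = [(digest, size) for digest, size in layers if digest not in cached]
        downloaded.append(sum(size for _, size in missing))
        cached.update(digest for digest, _ in missing)
    return downloaded
```

For two applications built on the same hypothetical 50 MB base layer, the second pull only fetches its own application layer.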
The Docker multi-stage build process.

Google Developer Advocate Sandeep Dinesh's article "Kubernetes best practices: How and why to build small container images" covers this topic in depth.

6. Properly tag your images


Tags are how users choose which version of your image they want to use. There are two main ways to tag your images: Semantic Versioning, or the Git commit hash of your application. Whichever you choose, document it and clearly set the expectations that users of the image should have. Be careful: while users expect some tags, like the “latest” tag, to move from one image to another, they expect other tags to be immutable, even if they are not technically so. For example, once you have tagged a specific version of your image with something like “1.2.3”, you should never move this tag.
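One way to encode those expectations in a release script is sketched below. It assumes a common convention that the post does not prescribe: the full version and a short Git commit hash are immutable, while "MAJOR.MINOR", "MAJOR" and "latest" move to the newest matching release. The helper names are hypothetical.

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def release_tags(version, git_commit):
    """Return (immutable_tags, moving_tags) for one release of an image."""
    match = SEMVER.match(version)
    if match is None:
        raise ValueError(f"expected MAJOR.MINOR.PATCH, got {version!r}")
    major, minor, _patch = match.groups()
    # These always point at exactly this build and must never be moved.
    immutable = [version, git_commit[:7]]
    # These are expected to follow the newest matching release.
    moving = [f"{major}.{minor}", major, "latest"]
    return immutable, moving


def can_push(tag, existing_tags, moving_tags):
    """Allow a push only for new tags or tags that users expect to move."""
    return tag not in existing_tags or tag in moving_tags
```

A registry that enforces tag immutability would give the same guarantee server-side; the check here only protects against mistakes in your own release tooling.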

7. Carefully consider whether to use a public image


Using public images can be a great way to start working with a particular piece of software. However, using them in production can come with a set of challenges, especially in a high-constraint environment. You might need to control what's inside them, or you might not want to depend on an external repository, for example. On the other hand, building your own images for every piece of software you use is not trivial, particularly because you need to keep up with the security updates of the upstream software. Carefully weigh the pros and cons of each for your particular use case, and make a conscious decision.

Next steps

You can read more about those best practices on Best Practices for Building Containers, and learn more about our Kubernetes Best Practices. You can also try out our Quickstarts for Kubernetes Engine and Container Builder.

Kubernetes 1.11: a look from inside Google

Congratulations to everyone involved in the recent Kubernetes 1.11 release. Now that the core has been stabilized, we here at Google have been focusing our upstream work on increasing Kubernetes' pluggability, i.e., moving more pieces out into other repositories. As the project has matured, adding a plugin no longer means "sending Tim Hockin a pull request," but instead means creating proper, well-defined interfaces with names like CNI, CRI and CSI. In fact, this maturity and extensibility is one of the things that helps us make Google Kubernetes Engine an enterprise-ready platform. Back in March, we gave you a look at what was new in Kubernetes 1.10. Now, with the release of 1.11, let's take a look at the core Kubernetes work that Google is driving, as well as some of the innovation we've built on Kubernetes' foundations in the last three months.

New features in 1.11

Priority and preemption
Pod priority and preemption is one of the main features of our internal scheduling system that lets us achieve high resource utilization in our data centers. We wrote about that key use case when we introduced it in Alpha in Kubernetes 1.9, and since then, we've improved scheduling performance and added better support for critical system pods. Now, we're pleased to move it to Beta in this release, meaning it's enabled by default in Kubernetes Engine clusters that run 1.11. This is a feature that many users who run larger clusters have been waiting for!
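The core idea of preemption can be illustrated with a greatly simplified, single-node model: when a pending pod doesn't fit, the scheduler may evict lower-priority pods to make room. This sketch is not the kube-scheduler's algorithm (which considers many nodes, PodDisruptionBudgets and graceful termination); it only evicts the lowest-priority pods first and never touches pods of equal or higher priority. The pod names, priorities and CPU requests are hypothetical.

```python
def pods_to_preempt(capacity, running, incoming_priority, incoming_request):
    """Pick lower-priority pods to evict from one node so a pending pod fits.

    running is a list of (name, priority, cpu_request) tuples. Returns the
    names to evict (possibly empty), or None if the pod cannot fit even after
    evicting every lower-priority pod.
    """
    free = capacity - sum(request for _, _, request in running)
    victims = []
    # Consider the lowest-priority pods first.
    for name, priority, request in sorted(running, key=lambda pod: pod[1]):
        if free >= incoming_request or priority >= incoming_priority:
            break
        victims.append(name)
        free += request
    return victims if free >= incoming_request else None
```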

Changes to CRDs
Custom Resource Definitions (CRDs) are one of the most popular extension mechanisms for Kubernetes, and new features in 1.11 make them even more powerful. CRDs are used for a broad array of Kubernetes extensions, for example to enable the use of Spark or Functions natively through the Kubernetes API.

Kubernetes objects have a schema version (e.g. v1beta1 or v1), but we only ever store one version in the etcd database. When you query an object at a particular version, a server-side conversion is done to convert the object to match the schema of the version you request.

Previously, CRD authors had to delete and recreate resources to move them between different versions. In 1.11, you can now define multiple versions for your own resources. The next step will be to enable server-side conversion for CRDs, to allow for schema changes like renaming fields without breaking existing clients.
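The "store one version, convert on read" model described above can be sketched as follows. This is an illustration of the idea, not Kubernetes code: the Widget kind, the example.com group and the renamed field (v1beta1 spec.image becoming v1 spec.containerImage) are all hypothetical, and in 1.11 this kind of conversion is still the next step for CRDs rather than something they support.

```python
def convert(obj, to_version):
    """Convert a hypothetical Widget object between API versions.

    The only schema difference assumed here is a renamed field:
    v1beta1 spec.image <-> v1 spec.containerImage
    """
    group, from_version = obj["apiVersion"].rsplit("/", 1)
    if from_version == to_version:
        return obj
    spec = dict(obj["spec"])
    if (from_version, to_version) == ("v1beta1", "v1"):
        spec["containerImage"] = spec.pop("image")
    elif (from_version, to_version) == ("v1", "v1beta1"):
        spec["image"] = spec.pop("containerImage")
    else:
        raise ValueError(f"no conversion from {from_version} to {to_version}")
    return {**obj, "apiVersion": f"{group}/{to_version}", "spec": spec}


class VersionedStore:
    """Stores every object in a single storage version, as etcd does."""

    def __init__(self, storage_version):
        self.storage_version = storage_version
        self.objects = {}

    def create(self, name, obj):
        # Normalize to the storage version on write.
        self.objects[name] = convert(obj, self.storage_version)

    def get(self, name, version):
        # Convert on read so each client sees the version it asked for.
        return convert(self.objects[name], version)
```

Because old clients keep reading and writing v1beta1 through the conversion, the field can be renamed in v1 without breaking them.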

Cloud Provider plugins
Google continues to invest in the long-term sustainability and multi-cloud portability of core Kubernetes. The Cloud Provider interface allows infrastructure providers to deliver a "batteries-included" experience for user workloads on their platform, powering common services like dynamic provisioning and management of storage and external load balancing for Services.

This code is currently compiled into Kubernetes core binaries. Google is leading a long-running effort to extract this functionality into provider-specific repositories, in order to reduce the scope of the Kubernetes core. This will also allow providers to deliver enhancements and fixes to users more quickly than Kubernetes' three-month release cadence. As part of this work, we're excited to announce the creation of SIG-Cloud Provider to provide technical oversight and governance.

New features not in 1.11

That's not a headline you normally see, right?

One thing that is not in 1.11, not even a bit of it, is Server-side Apply, a feature that moves the logic for kubectl apply from the client to the server. This makes the expected behavior clearer and allows more clients to take advantage of server-side processing without shelling out to kubectl.

Normally, a feature like this would be committed to the project as it was built. But if a release is due and the feature isn't ready, reverting it would take a large amount of effort. Instead, Google has been leading the effort to introduce feature branches in Kubernetes, which let us work on long-running features in parallel to the main codebase. This lets us avoid last-minute scrambles to adjust for surprises, and is an example of how we are working to ensure the stability of the Kubernetes project.

Work on server-side apply is happening in the open in its feature branch, and we look forward to welcoming it into Kubernetes when it's ready, and not a moment before.

Kubernetes ecosystem work
Our work with Kubernetes doesn't stop at releasing core binaries every three months. Some of the work we are most excited about is in the form of extensions we've released since the last Kubernetes release:

Kustomize
We've thought a lot about how to declaratively manage application configuration. A common pattern that we saw was the use of templating solutions such as Helm (based on Google Cloud's Deployment Manager), which require users to learn a configuration language different from the format the API server returns when you query it. A templating approach also means that if you download a YAML example, you have to turn it into a template before you can use it in your environment.

With kustomize, we're introducing a new approach to application definition. Kustomize allows you to apply overlays to existing YAML configurations, so you can customize a forked repository with your local changes, or define separate 'staging' and 'production' variants with different settings and replica counts.

Kustomize is well suited for a GitOps-style workflow, where there's a common base configuration that is tweaked in various directions with overlays to create different variants. The base and overlays can be managed by separate teams in different repositories.
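The overlay idea can be sketched with a plain recursive merge over a parsed YAML document: the base stays untouched, and each variant is the base plus a small patch. This is a simplified stand-in, not kustomize itself; kustomize uses strategic merge patches (which, for example, merge lists of containers by name) plus transformers such as name prefixes and common labels. The Deployment below is a hypothetical example.

```python
import copy


def apply_overlay(base, patch):
    """Return a copy of base with patch merged in.

    Nested dicts are merged key by key; any other value in the patch
    (scalars, lists) replaces the value in the base.
    """
    result = copy.deepcopy(base)
    for key, value in patch.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = apply_overlay(result[key], value)
        else:
            result[key] = copy.deepcopy(value)
    return result


base = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web"},
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{"name": "web", "image": "web:1.0"}]}},
    },
}

# Each variant only states what differs from the shared base.
staging = apply_overlay(base, {"spec": {"replicas": 2}})
production = apply_overlay(base, {"spec": {"replicas": 5}})
```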

Application API
Applications are made up of many services and resources, but the whole is more than the sum of its parts. Once those parts are created, there is no well-defined way to tell Kubernetes which of them belong to a given application. We want cluster users to be able to think in terms of their applications, and we want tools and UIs to be able to define, update and display an application-centric view of the cluster.

The new Application API provides a way to aggregate Kubernetes components (e.g. Services, Deployments, StatefulSets, Ingresses, CRDs), and manage them as a group.
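As a rough sketch of how such grouping can work, assume (as in the alpha Application resource) that an application is described by a label selector plus the list of component kinds it includes; a tool then collects every resource of those kinds whose labels match. The resources, names and the app.kubernetes.io/name label values below are hypothetical, and only matchLabels semantics are modeled.

```python
def matches(match_labels, labels):
    """matchLabels semantics: every key/value pair must be present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def application_components(resources, match_labels, component_kinds):
    """Return the resources that make up one application."""
    return [
        r
        for r in resources
        if r["kind"] in component_kinds and matches(match_labels, r.get("labels", {}))
    ]


resources = [
    {"kind": "Deployment", "name": "wordpress", "labels": {"app.kubernetes.io/name": "wordpress"}},
    {"kind": "Service", "name": "wordpress-svc", "labels": {"app.kubernetes.io/name": "wordpress"}},
    {"kind": "StatefulSet", "name": "wordpress-mysql", "labels": {"app.kubernetes.io/name": "wordpress"}},
    {"kind": "ConfigMap", "name": "wordpress-config", "labels": {"app.kubernetes.io/name": "wordpress"}},
    {"kind": "Deployment", "name": "blog", "labels": {"app.kubernetes.io/name": "blog"}},
]
```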

We have had contributions from friends at Samsung, Bitnami, Heptio, Red Hat and more, and we are looking for more contributions and feedback to ensure that the project adds value across the community.

The Application API is currently in Alpha. We hope to promote it to Beta in the next few weeks, and you'll hear more about it from us then.

Looking forward to Kubernetes Engine

If you'd like to get access to Kubernetes 1.11 on Kubernetes Engine ahead of general availability, please complete this form.

And if you liked reading this post, you'll love the Kubernetes Podcast from Google, which I co-host with Adam Glick. Every Tuesday we take a look at the week’s news and talk with Googlers or members of the wider Kubernetes community. So far we've spoken about product launches, processes and community, and this week we talk to the Kubernetes 1.11 release leads. Subscribe now!

New Cloud Filestore service brings GCP users high-performance file storage

As we celebrate the upcoming Los Angeles region for Google Cloud Platform (GCP) in one of the creative centers of the world, we’re really excited about helping you bring your creative visions to life. At Google, we want to empower artist collaboration and creation with high-performance cloud technology. We know folks need to create, read and write large files with low latency. We also know that film studios and production shops are always looking to render movies and create CGI images faster and more efficiently. So alongside our LA region launch, we’re pleased to enable these creative projects by bringing file storage capabilities to GCP for the first time with Cloud Filestore.

Cloud Filestore (beta) is managed file storage for applications that require a file system interface and a shared file system. It gives users a simple, integrated, native experience for standing up fully managed network-attached storage (NAS) with their Google Compute Engine and Kubernetes Engine instances.

We’re pleased to add Cloud Filestore to the GCP storage portfolio because it enables native platform support for a broad range of enterprise applications that depend on a shared file system.

Cloud Filestore will be available as a storage option in the GCP console
We're especially excited about the high performance that Cloud Filestore offers to applications that require high throughput, low latency and high IOPS. Applications such as content management systems, website hosting, render farms and virtual workstations for artists typically require low-latency file operations, high-performance random I/O, and high throughput and performance for metadata-intensive operations. We’ve heard from some of our early users that they’ve saved time serving up websites with Cloud Filestore, cut down on hardware needs and sped up the compute-intensive process of rendering a movie.

Putting Cloud Filestore into practice

For organizations with lots of rich unstructured content, Cloud Filestore is a good place to keep it. For example, graphic design, video and image editing, and other media workflows use files as an input and files as the output. Filestore also helps creators access shared storage to manipulate and produce these types of large files. If you're a web developer creating websites and blogs that serve file content to your audience, you'll find it easy to integrate Cloud Filestore with web software like WordPress. That's what Jellyfish did.

Jellyfish is a boutique marketing agency focused on delivering high-performance marketing services to their global clients. A major part of that service is delivering a modern and flexible digital web presence.

“WordPress hosts 30% of the world’s websites, so delivering a highly available and high performance WordPress solution for our clients is critical to our business. Cloud Filestore enabled us to simply and natively integrate WordPress on Kubernetes Engine, and take advantage of the flexibility that will provide our team.”
- Ashley Maloney, Lead DevOps Engineer at Jellyfish Online Marketing
Cloud Filestore also provides the reliability and consistency that latency-sensitive workloads need. One example is fuzzing, the process of running millions of permutations to identify security vulnerabilities in code. At Google, ClusterFuzz is the distributed fuzzing infrastructure behind Chrome and OSS-Fuzz that’s built for fuzzing at scale. The ClusterFuzz team needed a shared storage platform to store the millions of files that are used as input for fuzzing mutations.
“We focus on simplicity that helps us scale. Having grown from a hundred VMs to tens of thousands of VMs, we appreciate technology that is efficient, reliable, requires little to no configuration and scales seamlessly without management. It took one premium Filestore instance to support a workload that previously required 16 powerful servers. That frees us to focus on making Chrome and OSS safer and more reliable.”
- Abhishek Arya, Information Security Engineer, Google Chrome
Write once, read many is another type of workload where consistency and reliability are critical. At ever.ai, they're training an advanced facial recognition platform on 12 billion photos and videos for tens of millions of users in 95 countries. The team constantly needs to share large amounts of data among many servers; that data is written once but read many times. Storing it in non-POSIX object storage was a challenge, because reading it back required either custom code or downloading the data first. So they turned to Cloud Filestore.
“Cloud Filestore was easy to provision and mount, and reliable for the kind of workload we have. Having a POSIX file system that we can mount and use directly helps us speed-read our files, especially on new machines. We can also use the normal I/O features of any language and don’t have to use a specific SDK to use an object store."
- Charlie Rice, Chief Technology Officer, ever.ai
Cloud Filestore is also particularly helpful for rendering. Rendering is the process by which media production companies create computer-generated images by running specialized imaging software to create one or more frames of a movie. We've just announced our newest GCP region in Los Angeles, where we expect more than a few visual effects artists and designers can put Cloud Filestore to use. Let's take a closer look at an example rendering workflow so you can see how Cloud Filestore can read and write data for this specialized purpose without tying up on-site hardware.

Using Cloud Filestore for rendering

When you render a movie, the rendering job typically runs across fleets ("render farms") of compute machines, all of which mount a shared file system. Chances are you’re doing this with on-premises machines and on-premises files, but with Cloud Filestore you now have a cloud option.

To get started, create a Cloud Filestore instance, and seed it with the 3D models and raw footage for the render. Set up your Compute Engine instance templates to mount the Cloud Filestore instance. Once that's set, spin up your render farm with however many nodes you need, and kick off your rendering job. The render nodes all concurrently read the same source data set from the Network File System (NFS) share, perform the rendering computations and write the output artifacts back to the share. Finally, your reassembly process reads the artifacts from Cloud Filestore, assembles them, and writes the final output.
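The data flow of that workflow can be sketched locally. In this illustration a temporary directory stands in for the NFS share that each render node would mount from Cloud Filestore (for example at a hypothetical /mnt/filestore), a thread pool stands in for the render nodes, and render_frame is a hypothetical placeholder for the real rendering software.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import tempfile


def render_frame(share, frame):
    """Hypothetical render step: read shared source data, write one frame."""
    scene = (share / "source" / "scene.txt").read_text()
    out = share / "frames" / f"frame_{frame:04d}.txt"
    out.write_text(f"{scene} @ frame {frame}\n")
    return out


def run_render_job(share, frames, nodes):
    """Render all frames in parallel, then reassemble them from the share."""
    (share / "frames").mkdir(exist_ok=True)
    # Every "node" reads the same source data and writes back to the share.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        list(pool.map(lambda frame: render_frame(share, frame), frames))
    # Reassembly: read the artifacts back in frame order and write the result.
    parts = sorted((share / "frames").glob("frame_*.txt"))
    final = share / "final.txt"
    final.write_text("".join(part.read_text() for part in parts))
    return final


# Seed the stand-in share with source data, then run a 12-frame job on 4 nodes.
share = Path(tempfile.mkdtemp())
(share / "source").mkdir()
(share / "source" / "scene.txt").write_text("teapot")
final = run_render_job(share, range(12), nodes=4)
```

Zero-padding the frame numbers keeps a simple name sort in frame order during reassembly.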

Cloud Filestore Price and Performance

We offer two price-for-performance tiers. The high-performance Premium tier is $0.30 per GB per month, and the midrange Standard tier is $0.20 per GB per month in us-east1, us-central1, and us-west1 (other regions vary). To keep your bill simple and predictable, we charge for provisioned capacity. You can resize on demand without downtime, up to a maximum of 64TB*. We do not charge per-operation fees. Networking is free within the same zone; standard egress charges apply to cross-zone traffic.
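Because billing is based only on provisioned capacity, a monthly estimate is a single multiplication. The sketch below uses the us-east1, us-central1 and us-west1 list prices quoted above, assumes 1 TB = 1024 GB when applying the 64TB maximum, and does not model cross-zone egress charges.

```python
# Per-GB monthly prices quoted above for us-east1, us-central1 and us-west1.
PRICE_PER_GB_MONTH = {"premium": 0.30, "standard": 0.20}
# The stated 64TB maximum, assuming 1 TB = 1024 GB for this sketch.
MAX_CAPACITY_GB = 64 * 1024


def monthly_cost(tier, provisioned_gb):
    """Monthly charge for one instance, based on provisioned (not used) capacity."""
    if not 0 < provisioned_gb <= MAX_CAPACITY_GB:
        raise ValueError(f"capacity must be in (0, {MAX_CAPACITY_GB}] GB")
    return round(PRICE_PER_GB_MONTH[tier] * provisioned_gb, 2)
```

For example, a 10TB Standard instance (10,240 GB) comes to 10,240 × $0.20 = $2,048 per month, however much of it is actually filled.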

Cloud Filestore Premium instance throughput is designed to provide up to 700 MB/s and 30,000 IOPS for reads, regardless of the Cloud Filestore instance capacity. Standard instances are lower priced and performance scales with capacity, hitting peak performance at 10TB and above. A simple performance model makes it easier to predict costs and optimize configurations. High performance means your applications run faster. As you can see in the image below, the Cloud Filestore Premium tier outperforms the design goal with the specified benchmarks, based on performance testing we completed in-house.

Trying Cloud Filestore for yourself

Cloud Filestore will release into beta next month. To sign up to be notified about the beta release, complete this request form. Visit our Filestore page to learn more.

In addition to our new Cloud Filestore offering, we partner with many file storage providers to meet all of your file needs. We recently announced NetApp Cloud Volumes for GCP and you can find other partner solutions in our launcher.

If you’re interested in learning more about file storage from Google, check out this session at Next 2018 next month. For more information, and to register, visit the Next ’18 website.

GPUs as a service with Kubernetes Engine are now generally available

[Editor's note: This is one of many posts on enterprise features enabled by Kubernetes Engine 1.10. For the full coverage, follow along here.]

Today, we're excited to announce the general availability of GPUs in Google Kubernetes Engine, which have become one of the platform's fastest-growing features since they entered beta earlier this year, with core-hours soaring by 10X since the end of 2017.

Together with the GA of Kubernetes Engine 1.10, GPUs make Kubernetes Engine a great fit for enterprise machine learning (ML) workloads. By using GPUs in Kubernetes Engine for your CUDA workloads, you benefit from the massive processing power of GPUs whenever you need it, without having to manage hardware or even VMs. We recently introduced the latest and fastest NVIDIA Tesla V100 to the portfolio, and the P100 is generally available. Last but not least, we also offer the entry-level K80, which is largely responsible for the popularity of GPUs. All our GPU models are available as Preemptible GPUs, as a way to reduce costs while benefiting from GPUs in Google Cloud. Check out the latest prices for GPUs here.

As the growth in GPU core-hours indicates, our users are excited about GPUs in Kubernetes Engine. Ocado, the world’s largest online-only grocery retailer, is always looking to apply state-of-the-art machine learning models for Ocado.com customers and Ocado Smart Platform retail partners, and runs the models on preemptible, GPU-accelerated instances on Kubernetes Engine.
“GPU-attached nodes combined with Kubernetes provide a powerful, cost-effective and flexible environment for enterprise-grade machine learning. Ocado chose Kubernetes for its scalability, portability, strong ecosystem and huge community support. It’s lighter, more flexible and easier to maintain compared to a cluster of traditional VMs. It also has great ease-of-use and the ability to attach hardware accelerators such as GPUs and TPUs, providing a huge boost over traditional CPUs.”
— Martin Nikolov, Research Software Engineer, Ocado
GPUs in Kubernetes Engine also have a number of unique features:
  • Node Pools allow your existing cluster to use GPUs whenever you need.
  • Cluster Autoscaler automatically creates nodes with GPUs whenever pods requesting GPUs are scheduled, and scales down to zero when GPUs are no longer consumed by any active pods.
  • Taint and toleration technology ensures that only pods that request GPUs will be scheduled on the nodes with GPUs, and prevents pods that do not require GPUs from running on them.
  • Resource quotas allow administrators to limit resource consumption per namespace in a large cluster shared by multiple users or teams.
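The taint-and-toleration check can be sketched as follows, following the Kubernetes matching rules: a toleration with operator Exists matches a taint by key alone (an empty key matches every taint), Equal also compares values, and an empty toleration effect matches any effect. We assume the GPU nodes carry a taint like nvidia.com/gpu=present:NoSchedule; the helpers are illustrative and model only the hard NoSchedule and NoExecute effects, not the full scheduler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""            # empty key with operator Exists matches every taint
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""         # empty effect matches every effect


def tolerates(toleration, taint):
    if toleration.effect and toleration.effect != taint.effect:
        return False
    if toleration.operator == "Exists":
        return toleration.key in ("", taint.key)
    return toleration.key == taint.key and toleration.value == taint.value


def can_schedule(tolerations, node_taints):
    """A pod fits a node only if every hard taint on the node is tolerated.

    PreferNoSchedule is only a soft preference, so it is ignored here.
    """
    return all(
        any(tolerates(t, taint) for t in tolerations)
        for taint in node_taints
        if taint.effect in ("NoSchedule", "NoExecute")
    )


gpu_taint = Taint("nvidia.com/gpu", "present", "NoSchedule")
gpu_toleration = Toleration(key="nvidia.com/gpu", operator="Exists", effect="NoSchedule")
```

A pod without the toleration is kept off the GPU node, while a pod that requests GPUs (and therefore carries the toleration) can land there.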
We also heard from you that you need an easy way to understand how your GPU jobs are performing: how busy the GPUs are, how much memory is available, and how much memory is allocated. We are thrilled to announce that you can now monitor this information natively from the GCP Console. You can also visualize these metrics in Stackdriver.
Fig 1. GPU memory usage and duty cycle 

The general availability of GPUs in Kubernetes Engine represents a lot of hard work behind the scenes, polishing the internals for enterprise workloads. Jiaying Zhang, the technical lead for this general availability, led the Device Plugins effort in Kubernetes 1.10, working closely with the OSS community to understand its needs, identify common requirements, and come up with an execution plan to build a production-ready system.

Try them today

To use GPUs in Kubernetes Engine with our $300 free-trial credits, you'll need to upgrade your account and apply for GPU quota for the credits to take effect. For a more detailed explanation of Kubernetes Engine with GPUs, for example how to install NVIDIA drivers and how to configure a pod to consume GPUs, check out the documentation.

In addition to GPUs in Kubernetes Engine, Cloud TPUs are also now publicly available in Google Cloud. For example, RiseML uses Cloud TPUs in Kubernetes Engine for a hassle-free machine learning infrastructure that is easy-to-use, highly scalable, and cost-efficient. If you want to be among the first to access Cloud TPUs in Kubernetes Engine, join our early access program today.

Thanks for your feedback on how to shape our roadmap to better serve your needs. Keep the conversation going by connecting with us on the Kubernetes Engine Slack channel.

GCP arrives in the Nordics with a new region in Finland


Our sixteenth Google Cloud Platform (GCP) region, located in Finland, is now open for you to build applications and store your data.

The new Finland region, europe-north1, joins the Netherlands, Belgium, London, and Frankfurt in Europe and makes it easier to build highly available, performant applications using resources across those geographies.

Hosting applications in the new region can improve latencies by up to 65% for end-users in the Nordics and by up to 88% for end-users in Eastern Europe, compared to hosting them in the previously closest region. You can visit www.gcping.com to see for yourself how fast the Finland region is from your location.


The Nordic region has everything you need to build the next great application, including three zones that let you distribute applications and storage across zones to protect against service disruptions.

You can also access our Multi-Regional services in Europe (such as BigQuery) and all the other GCP services via the Google Network, the largest cloud network as measured by number of points of presence. Please visit our Service Specific Terms to get detailed information on our data storage capabilities.

Build sustainably

The new region is located in our existing data center in Hamina. This facility is one of the most advanced and efficient data centers in the Google fleet. Our high-tech cooling system, which uses sea water from the Gulf of Finland, reduces energy use and is the first of its kind anywhere in the world. This means that when you use this region to run your compute workloads, store your data, and develop your applications, you are doing so sustainably.

Hear from our customers

“The road to emission-free and sustainable shipping is a long and challenging one, but thanks to exciting innovation and strong partnerships, Rolls-Royce is well-prepared for the journey. For us being able to train machine learning models to deliver autonomous vessels in the most effective manner is key to success. We see the Google Cloud for Finland launch as a great advantage to speed up our delivery of the project.”
– Karno Tenovuo, Senior Vice President Ship Intelligence, Rolls-Royce

“Being the world's largest producer of renewable diesel refined from waste and residues, as well as being a technologically advanced refiner of high-quality oil products, requires us to take advantage of leading-edge technological possibilities. We have worked together with Google Cloud to accelerate our journey into the digital future. We share the same vision to leave a healthier planet for our children. Running services on an efficient and sustainably operated cloud is important for us. And even better that it is now also available physically in Finland.”
– Tommi Touvila, Chief Information Officer, Neste

“We believe that technology can enhance and improve the lives of billions of people around the world. To do this, we have joined forces with visionary industry leaders such as Google Cloud to provide a platform for our future innovation and growth. We’re seeing tremendous growth in the market for our operations, and it’s essential to select the right platform. The Google Cloud Platform cloud region in Finland stands for innovation.”
– Anssi Rönnemaa, Chief Finance and Commercial Officer, HMD Global

“Digital services are key growth drivers for our renewal of a 108-year old healthcare company. 27% of our revenue is driven by digital channels, where modern technology is essential. We are moving to a container-based architecture running on GCP at Hamina. Google has a unique position to provide services within Finland. We also highly appreciate the security and environmental values of Google’s cloud operations.”
– Kalle Alppi, Chief Information Officer, Mehiläinen

Partners in the Nordics

Our partners in the Nordics are available to help design and support your deployment, migration and maintenance needs.

"Public cloud services like those provided by Google Cloud help businesses of all sizes be more agile in meeting the changing needs of the digital era—from deploying the latest innovations in machine learning to cost savings in their infrastructure. Google Cloud Platform's new Finland region enables this business optimization and acceleration with the help of cloud-native partners like Nordcloud and we believe Nordic companies will appreciate the opportunity to deploy the value to their best benefit.”
– Jan Kritz, Chief Executive Officer, Nordcloud

Nordic partners include: Accenture, Adapty, AppsPeople, Atea, Avalan Solutions, Berge, Cap10, Cloud2, Cloudpoint, Computas, Crayon, DataCenterFinland, DNA, Devoteam, Doberman, Deloitte, Enfo, Evry, Gapps, Greenbird, Human IT Cloud, IIH Nordic, KnowIT, Koivu Solutions, Lamia, Netlight, Nordcloud, Online Partners, Outfox Intelligence AB, Pilvia, Precis Digital, PwC, Quality of Service IT-Support, Qvik, Skye, Softhouse, Solita, Symfoni Next, Soprasteria, Tieto, Unifoss, Vincit, Wizkids, and Webstep.

If you want to learn more or wish to become a partner, visit our partners page.

Getting started

For additional details on the region, please visit our Finland region page where you’ll get access to free resources, whitepapers, the "Cloud On-Air" on-demand video series and more. Our locations page provides updates on the availability of additional services and regions. Contact us to request access to new regions and help us prioritize what we build next.

Try full-stack monitoring with Stackdriver on us

In advance of the new simplified Stackdriver pricing that will go into effect on June 30, we want to make sure everyone gets a chance to try Stackdriver. That’s why we’ve decided to offer the full power of Stackdriver, including premium monitoring, logging and application performance management (APM), to all customers—new and existing—for free until the new pricing goes into effect. This offer will be available starting June 18.

Stackdriver, our full-stack logging and monitoring tool, collects logs and metrics, as well as other data from your cloud apps and other sources, then generates useful dashboards, charts and alerts to let you act on information as soon as you get it. Here’s what’s included when you try Stackdriver:
  • Out-of-the-box observability across the Google Cloud Platform (GCP) and Amazon Web Services (AWS) services you use
  • Platform, system, application and custom metrics on demand with Metrics Explorer
  • Uptime checks to monitor the availability of the internet-facing endpoints you depend on
  • Alerting policies to let you know when something is wrong. Alerting and notification options, previously available only on the premium tier, are now available for free during this limited time
  • Access to logging and APM features such as logs-based metrics, Trace for understanding application health, Debugger for live debugging, and more
Want to estimate your usage once the new pricing goes into effect? Check out our earlier blog post on viewing and managing your costs. You’ll see the various ways you can estimate usage to plan for the best use of Stackdriver monitoring in your environment. And if you are not already a Stackdriver user, you can sign up to try Stackdriver now!

Introducing improved pricing for Preemptible GPUs

Not everyone needs the extra performance that GPUs bring to a compute workload, but those who do, really do. Earlier this year, we announced that you could attach GPUs to Preemptible VMs on Google Compute Engine and Google Kubernetes Engine, lowering the price of using GPUs by 50%. Today, Preemptible GPUs are generally available (GA) and we’ve lowered preemptible prices on our entire GPU portfolio to be 70% cheaper than GPUs attached to on-demand VMs.

Preemptible GPUs are ideal for customers with short-lived, fault-tolerant and batch workloads such as machine learning (ML) and high-performance computing (HPC). Customers get access to large-scale GPU infrastructure and predictable low pricing without having to bid on capacity. GPUs attached to Preemptible VMs are the same as equivalent on-demand resources, with two key differences: Compute Engine may shut them down after giving you a 30-second warning, and you can use them for a maximum of 24 hours. Any GPUs attached to a Preemptible VM instance are considered Preemptible and are billed at the lower rate.
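One way to put the 30-second warning to use is to have the workload check whether its VM is being preempted. The sketch below reads the Compute Engine metadata server's `instance/preempted` value, which reports `TRUE` once preemption has started. The fetch function is injectable so the logic can be exercised off GCP; `run_steps` and the placeholder "work" are hypothetical, and what you do in the warning window (for example, saving a checkpoint) is up to your application.

```python
import urllib.request
from typing import Callable

# Metadata server value reporting whether this VM is being preempted
# ("TRUE" or "FALSE"). Reachable only from inside a Compute Engine VM.
PREEMPTED_URL = (
    "http://metadata.google.internal/computeMetadata/v1/instance/preempted"
)


def fetch_preempted_flag(timeout: float = 5.0) -> str:
    """Read the raw preemption flag from the metadata server."""
    request = urllib.request.Request(
        PREEMPTED_URL, headers={"Metadata-Flavor": "Google"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def is_preempted(fetch: Callable[[], str] = fetch_preempted_flag) -> bool:
    """True once the (injectable) fetcher reports that preemption began."""
    return fetch().strip().upper() == "TRUE"


def run_steps(steps: int, fetch: Callable[[], str] = fetch_preempted_flag) -> int:
    """Do up to `steps` units of work, stopping early if preempted.

    Returns how many steps finished, so the caller can save progress
    within the roughly 30-second warning window.
    """
    completed = 0
    for _ in range(steps):
        if is_preempted(fetch):
            break
        completed += 1  # Placeholder for one unit of real work.
    return completed


# Off-GCP simulation: the fake fetcher reports preemption on the 4th check.
answers = iter(["FALSE", "FALSE", "FALSE", "TRUE"])
print(run_steps(10, fetch=lambda: next(answers)))  # prints 3
```

On a real VM you would call `run_steps` (or `is_preempted`) without the `fetch` argument so it queries the metadata server directly.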

We offer three different GPU platforms to choose from, making it easy to pick the right GPU for your workload.

Table: GPU hourly pricing* (on-demand prices vary by location), shown alongside the previous and new Preemptible prices (same in all locations).
* GPU prices are listed as an hourly rate per GPU attached to a VM and are billed by the second. Prices listed are for US regions; prices for other regions may differ. Additional Sustained Use Discounts of up to 30% apply only to non-preemptible GPU usage.
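To make the discount rules concrete, here is a small cost-estimate sketch. The 70% Preemptible discount, per-second billing, and the up-to-30% sustained use discount (non-preemptible only) come from the text above; the on-demand hourly rate is a hypothetical placeholder rather than a published price, and the caller supplies the sustained use fraction instead of it being derived from monthly usage.

```python
def gpu_cost(
    hourly_rate: float,
    seconds: int,
    preemptible: bool,
    sustained_use_discount: float = 0.0,
) -> float:
    """Estimate the charge for one GPU, billed by the second.

    hourly_rate: on-demand USD per GPU-hour (supply a real price yourself).
    preemptible: apply the 70% Preemptible discount.
    sustained_use_discount: fraction up to 0.30, applied only to
        non-preemptible usage, as the pricing footnote states.
    """
    if not 0.0 <= sustained_use_discount <= 0.30:
        raise ValueError("sustained use discount must be between 0 and 0.30")
    if preemptible:
        rate = hourly_rate * (1 - 0.70)  # 70% cheaper than on-demand.
    else:
        rate = hourly_rate * (1 - sustained_use_discount)
    return rate * seconds / 3600  # Per-second billing of an hourly rate.


HYPOTHETICAL_RATE = 2.00  # USD per GPU-hour: a placeholder, not a real price.
ninety_minutes = 90 * 60
print(round(gpu_cost(HYPOTHETICAL_RATE, ninety_minutes, preemptible=False), 4))  # 3.0
print(round(gpu_cost(HYPOTHETICAL_RATE, ninety_minutes, preemptible=True), 4))   # 0.9
```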

Combined with custom machine types, Preemptible VMs with Preemptible GPUs let you build your compute stack with exactly the resources you need, and no more. Attaching Preemptible GPUs to custom Preemptible VMs allows you to reduce the amount of vCPU or host memory for your GPU VM, to save even further over predefined VM shapes. Additionally, customers can use Preemptible Local SSD for a low-cost, high-performance storage option with our Preemptible GPUs. Check out this pricing calculator to configure your own preemptible environment.

The use case for Preemptible GPUs
Hardware-accelerated infrastructure is in high demand among innovators, researchers, and academics doing machine learning research, particularly when coupled with the low, predictable pricing of Preemptible GPUs.

“Preemptible GPUs have been instrumental in enabling our research group to process large video collections at scale using our Scanner open-source platform. The predictable low cost makes it feasible for a single grad student to repeatedly deploy hundreds of GPUs in ML-based analyses of 100,000 hours of TV news video. This price drop enables us to perform twice the amount of processing with the same budget."
- Kayvon Fatahalian, Assistant Professor, Stanford University

Machine Learning Training and Preemptible GPUs
Training ML workloads is a great fit for Preemptible VMs with GPUs. Kubernetes Engine and Compute Engine’s managed instance groups allow you to create dynamically scalable clusters of Preemptible VMs with GPUs for your large compute jobs. To help deal with Preemptible VM terminations, you can use TensorFlow’s checkpointing feature to save and restore work progress. An example and walk-through is provided here.
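TensorFlow's checkpointing is what the text recommends; as a framework-neutral illustration of the same save-and-resume idea, the sketch below persists a step counter and running state to a JSON file using an atomic rename, so a restarted Preemptible VM picks up from the last checkpoint. The file layout, the `train` loop, and the `stop_after` preemption simulation are hypothetical stand-ins, not TensorFlow APIs.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state atomically so a preemption mid-write can't corrupt it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as tmp:
        json.dump(state, tmp)
    os.replace(tmp_path, path)  # Atomic rename over the old checkpoint.


def load_checkpoint(path: str) -> dict:
    """Return the last saved state, or a fresh state if none exists."""
    if not os.path.exists(path):
        return {"step": 0, "total": 0}
    with open(path) as f:
        return json.load(f)


def train(path: str, total_steps: int, stop_after: Optional[int] = None,
          checkpoint_every: int = 10) -> dict:
    """Hypothetical training loop that resumes from the last checkpoint.

    stop_after simulates a preemption after that many steps in this run;
    any work since the last checkpoint is lost and redone on restart.
    """
    state = load_checkpoint(path)
    steps_this_run = 0
    while state["step"] < total_steps:
        if stop_after is not None and steps_this_run == stop_after:
            return state  # Simulated preemption: nothing more is saved.
        state["total"] += state["step"]  # Placeholder for a training step.
        state["step"] += 1
        steps_this_run += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)
    return state
```

Calling `train(path, 25, stop_after=17)` and then `train(path, 25)` ends in the same state as one uninterrupted run; only the seven steps after the step-10 checkpoint are repeated. Checkpointing more often shrinks that redo window at the cost of more writes.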

Getting Started
To get started with Preemptible GPUs in Google Compute Engine, simply append --preemptible to your instance create command in gcloud, set scheduling.preemptible to true in the REST API, or set Preemptibility to "On" in the Google Cloud Platform Console, and then attach a GPU as usual. You can use your regular GPU quota to launch Preemptible GPUs or, alternatively, you can request a special Preemptible GPUs quota that only applies to GPUs attached to Preemptible VMs. Check out our documentation to learn more. To learn how to use Preemptible GPUs with Google Kubernetes Engine, head over to our Kubernetes Engine GPU documentation.
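If you script VM creation, the gcloud route can be assembled programmatically. The sketch below builds the argument list for `gcloud compute instances create` with `--preemptible`, a GPU attached through `--accelerator`, and `--maintenance-policy TERMINATE` (VMs with GPUs cannot live-migrate). The instance name, zone, and GPU type in the example are hypothetical placeholders; check the GPU documentation for the types available in your zone.

```python
import shlex
from typing import List


def preemptible_gpu_command(
    name: str, zone: str, gpu_type: str, gpu_count: int = 1
) -> List[str]:
    """Build a gcloud command that creates a Preemptible VM with GPUs."""
    if gpu_count < 1:
        raise ValueError("attach at least one GPU")
    return [
        "gcloud", "compute", "instances", "create", name,
        "--zone", zone,
        "--preemptible",  # The same flag used for any Preemptible VM.
        "--accelerator", f"type={gpu_type},count={gpu_count}",
        "--maintenance-policy", "TERMINATE",  # GPU VMs can't live-migrate.
    ]


cmd = preemptible_gpu_command("my-gpu-vm", "us-central1-a", "nvidia-tesla-k80")
print(shlex.join(cmd))
# To actually create the VM: subprocess.run(cmd, check=True)
```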

For a certain class of workloads, Google Cloud GPUs provide exceptional compute performance. Now, with new low Preemptible GPU pricing, we invite you to see for yourself how easy it is to get the performance you need, at the low, predictable price that you want.

Time to “Hello, World”: VMs vs. containers vs. PaaS vs. FaaS

Do you want to build applications on Google Cloud Platform (GCP) but have no idea where to start? That was me, just a few months ago, before I joined the Google Cloud compute team. To prepare for my interview, I watched a bunch of GCP Next 2017 talks to get up to speed with application development on GCP.

And since there is no better way to learn than by doing, I also decided to build a “Hello, World” web application on each of GCP’s compute offerings—Google Compute Engine (VMs), Google Kubernetes Engine (containers), Google App Engine (PaaS), and Google Cloud Functions (FaaS). To make this exercise more fun (and to do it in a single weekend), I timed things and took notes, the results of which I recently wrote up in a lengthy Medium post—check it out if you’re interested in following along and taking the same journey. 

So, where do I run my code?

At a high level, the answer to which compute option to use is... it depends. Generally speaking, it boils down to the following three criteria:
  1. Level of abstraction (what you want to think about)
  2. Technical requirements and constraints
  3. Where your team and organization are going
Google Developer Advocate Brian Dorsey gave a great talk at Next last year on Deciding between Compute Engine, Container Engine, App Engine; here’s a condensed version:

As a general rule, developers prefer to take advantage of the higher levels of the compute abstraction ladder, as they allow us to focus on the application and the problem we are solving while avoiding undifferentiated work such as server maintenance and capacity planning. With Cloud Functions, all you need to think about is code that runs in response to events (developer's paradise!). But depending on the details of the problem you are trying to solve, technical constraints can pull you down the stack. For example, if you need a very specific kernel, you might be down at the base layer (Compute Engine). (For a good resource on navigating these decision points, check out Choosing the right compute option in GCP: a decision tree.)
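The reasoning in the paragraph above (start at the top of the ladder and step down only when a constraint forces you to) can be captured in a tiny function. The three constraint flags are hypothetical simplifications chosen for illustration; this is not the official decision tree, and real choices weigh many more factors, including where your team is going.

```python
from dataclasses import dataclass

# GCP compute options, from the highest level of abstraction to the lowest.
LADDER = ["Cloud Functions", "App Engine", "Kubernetes Engine", "Compute Engine"]


@dataclass
class Constraints:
    """Hypothetical, heavily simplified technical constraints."""
    event_handlers_only: bool = True       # Code just reacts to events.
    needs_container_control: bool = False  # You package/orchestrate containers.
    needs_specific_kernel: bool = False    # Requires control over the OS kernel.


def choose_compute_option(c: Constraints) -> str:
    """Start at the top of the ladder; step down only when forced to."""
    if c.needs_specific_kernel:
        return LADDER[3]  # Only VMs let you choose the kernel.
    if c.needs_container_control:
        return LADDER[2]  # Containers, orchestrated by Kubernetes.
    if not c.event_handlers_only:
        return LADDER[1]  # A whole app, still with no servers to manage.
    return LADDER[0]      # Just code that runs in response to events.


print(choose_compute_option(Constraints()))                            # Cloud Functions
print(choose_compute_option(Constraints(needs_specific_kernel=True)))  # Compute Engine
```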

What programming language should I use?

GCP broadly supports the following programming languages: Go, Java, .NET, Node.js, PHP, Python, and Ruby (details and specific runtimes may vary by the service). The best language is a function of many factors, including the task at hand as well as personal preference. Since I was coming at this with no real-world backend development experience, I chose Node.js.

Quick aside for those of you who might not be familiar with Node.js: it’s an asynchronous JavaScript runtime designed for building scalable web application back-ends. Let’s unpack this last sentence:

  • Asynchronous means first-class support for asynchronous operations (compared to many other server-side languages where you might have to think about async operations and threading—a totally different mindset). It’s an ideal fit for most cloud applications, where a lot of operations are asynchronous. 
  • Node.js also is the easiest way for a lot of people who are coming from the frontend world (where JavaScript is the de-facto language) to start writing backend code. 
  • And there is also npm, the world’s largest collection of free, reusable code. That means you can import a lot of useful functionality without having to write it yourself.
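To illustrate the asynchronous style described above, here is the same idea in Python's `asyncio` (a stand-in for Node.js, not the runtime the author used): two simulated I/O calls run concurrently, so the request waits roughly as long as the slower call rather than the sum of both. The `fetch_user` and `fetch_orders` coroutines are hypothetical stand-ins for real network or database calls.

```python
import asyncio
import time
from typing import List


async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.2)  # Simulated network call; yields to the event loop.
    return {"id": user_id, "name": f"user-{user_id}"}


async def fetch_orders(user_id: int) -> List[int]:
    await asyncio.sleep(0.3)  # Another simulated call, running concurrently.
    return [101, 102]


async def handle_request(user_id: int) -> dict:
    # Start both operations at once and wait until both have finished.
    user, orders = await asyncio.gather(fetch_user(user_id), fetch_orders(user_id))
    return {**user, "orders": orders}


start = time.perf_counter()
result = asyncio.run(handle_request(7))
elapsed = time.perf_counter() - start
# Roughly 0.3 s (the longer delay), not 0.5 s (the sum of both delays).
print(result, f"{elapsed:.2f}s")
```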

Node.js is pretty cool, huh? I, for one, am convinced!

On your mark… Ready, set, go!

For my interview prep, I started with Compute Engine and VMs, then moved up the compute service-abstraction ladder to Kubernetes Engine and containers, App Engine and apps, and finally Cloud Functions. The following summary lists the basic steps and my time check for each option; my Medium post has the detailed journey and useful getting-started resources.

Compute Engine (time check: 4.5 hours)

Basic steps:
  1. Create and set up a VM instance
  2. Set up a Node.js dev environment
  3. Code “Hello, World”
  4. Start the Node server
  5. Expose the app to external traffic
  6. Understand how scaling works

Kubernetes Engine (time check: 6 hours)

Basic steps:
  1. Code “Hello, World”
  2. Package the app into a container
  3. Push the image to Container Registry
  4. Create a Kubernetes cluster
  5. Expose the app to external traffic
  6. Understand how scaling works

App Engine (time check: 1.5-2 hours)

Basic steps:
  1. Code “Hello, World”
  2. Configure an app.yaml project file
  3. Deploy the application
  4. Understand scaling options

Cloud Functions (time check: 15 minutes)

Basic steps:
  1. Code “Hello, World”
  2. Deploy the application
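Every path above starts with coding “Hello, World”. The author wrote it in Node.js; as a hedged stand-in, here is a minimal equivalent using Python's standard-library `http.server`. Reading the port from a `PORT` environment variable (defaulting to 8080) is an assumption, chosen because it is a common convention on container and PaaS platforms, not a detail from the author's write-up.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Optional


class HelloHandler(BaseHTTPRequestHandler):
    """Answer every GET request with a plain-text greeting."""

    def do_GET(self) -> None:
        body = b"Hello, World!\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: Optional[int] = None) -> HTTPServer:
    """Listen on all interfaces so external traffic can reach the app."""
    if port is None:
        port = int(os.environ.get("PORT", "8080"))
    return HTTPServer(("0.0.0.0", port), HelloHandler)


def main() -> None:
    make_server().serve_forever()  # Blocks until interrupted (Ctrl+C).
```

Calling `main()` serves the greeting on port 8080 by default. On Compute Engine you would still need a firewall rule for that port, which corresponds to the "expose the app to external traffic" step.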

Time-to-results comparison

Although this might be somewhat like comparing apples and oranges, here is a summary of my results. (As a reminder, this is just in the context of standing up a “Hello, World” web application from scratch, all concerns such as running the app in production aside.)

Your speed-to-results could be very different depending on multiple factors, including your level of expertise with a given technology. My goal was to grasp the fundamentals of every option in GCP’s compute stack and assess the amount of work required to get from point A to point B. That said, if there is ever a cross-technology, Top Gear-style fighter jet vs. car contest on standing up a scalable HTTP microservice from scratch, I wouldn’t be afraid to take on a Kubernetes grandmaster like Kelsey Hightower with Cloud Functions!

To find out more about application development on GCP, check out Computing on Google Cloud Platform. Don’t forget—you get $300 in free credits when you sign up.

Happy building!

Introducing sole-tenant nodes for Google Compute Engine — when sharing isn’t an option

Today we are excited to announce beta availability of sole-tenant nodes on Google Compute Engine. Sole-tenant nodes are physical Compute Engine servers designed for your dedicated use. Normally, VM instances run on physical hosts that may be shared by many customers. With sole-tenant nodes, you have the host all to yourself.

You can launch instances using the same options you would use for regular compute instances, except on server capacity dedicated to you. You can launch instances of any shape (i.e., vCPU and memory). A placement algorithm automatically finds the optimal location to launch your instance across all your nodes. If you prefer more control, you can manually select the location on which to launch your instances. Instances launched on sole-tenant nodes can take advantage of live migration to avoid downtime during proactive maintenance. Pricing remains simple: pay only for the nodes you use, on a per-second basis with a one-minute minimum charge. Sustained use discounts automatically apply, as do any new or existing committed use discounts.
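Per-second billing with a one-minute minimum is easy to get subtly wrong, so here is a small sketch of the arithmetic. It assumes the minimum applies per node and that a node with no usage is not billed; the hourly node rate is a hypothetical placeholder (see the pricing page for real rates), and sustained and committed use discounts are not modeled.

```python
from typing import List


def sole_tenant_charge(hourly_node_rate: float, seconds_per_node: List[int]) -> float:
    """Charge for sole-tenant nodes billed per second, 60-second minimum.

    Assumes the minimum applies per node and that a node with no usage is
    not billed. Sustained and committed use discounts are not modeled.
    """
    total = 0.0
    for seconds in seconds_per_node:
        if seconds <= 0:
            continue  # Node never ran: nothing to bill.
        billable = max(seconds, 60)  # One-minute minimum charge.
        total += hourly_node_rate * billable / 3600
    return total


HYPOTHETICAL_NODE_RATE = 3.60  # USD per node-hour: placeholder only.
# One node ran 30 s (billed as 60 s), another ran 2 minutes.
print(round(sole_tenant_charge(HYPOTHETICAL_NODE_RATE, [30, 120]), 4))  # 0.18
```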

Sole-tenant nodes enable a number of valuable use cases:

  • Compliance and regulation - Organizations with strict compliance and regulatory requirements can use sole-tenant nodes with VM placement to facilitate physical separation of their compute resources in the cloud. 
  • Isolation and utilization - Control instance placement directly via user-defined labels, or let Compute Engine automatically handle instance placement across nodes. You can also create and launch different machine types, or “shapes” on your nodes, in order to achieve the highest level of utilization. 
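Compute Engine's actual placement algorithm isn't described here, so the following is only an illustrative best-fit sketch of the two modes mentioned: automatic placement across your nodes, or manual placement on a node you choose. Node capacities assume the n1-node-96-624 node type (96 vCPUs, 624 GB of memory); the node names, VM names, and the best-fit heuristic are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_vcpus: int = 96        # Assumed n1-node-96-624 capacity.
    free_memory_gb: int = 624
    vms: List[str] = field(default_factory=list)

    def fits(self, vcpus: int, memory_gb: int) -> bool:
        return vcpus <= self.free_vcpus and memory_gb <= self.free_memory_gb


def place_vm(nodes: List[Node], vm: str, vcpus: int, memory_gb: int,
             node_name: Optional[str] = None) -> str:
    """Place a VM of any shape and return the chosen node's name.

    node_name pins the VM to one node (manual placement). Otherwise this
    uses best fit: the node left with the fewest free vCPUs, keeping big
    gaps open for big shapes (a hypothetical heuristic).
    """
    pool = [n for n in nodes if node_name is None or n.name == node_name]
    candidates = [n for n in pool if n.fits(vcpus, memory_gb)]
    if not candidates:
        raise RuntimeError(f"no node has room for {vm}")
    best = min(candidates, key=lambda n: n.free_vcpus - vcpus)
    best.free_vcpus -= vcpus
    best.free_memory_gb -= memory_gb
    best.vms.append(vm)
    return best.name


nodes = [Node("node-a"), Node("node-b")]
print(place_vm(nodes, "big-vm", 64, 400))                      # node-a
print(place_vm(nodes, "small-vm", 16, 32))                     # node-a (best fit)
print(place_vm(nodes, "pinned-vm", 4, 8, node_name="node-b"))  # node-b
```

Best fit packs mixed shapes tightly, which matches the utilization goal above; a real scheduler would also weigh memory, labels, and maintenance constraints.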

It’s easy to get started with sole-tenant nodes. You can launch a VM onto a sole-tenant node from the Google Cloud SDK, as well as from the Compute Engine APIs (support for Google Cloud Console coming soon):

gcloud beta compute sole-tenancy node-templates create mynodetemplate \
    --node-type n1-node-96-624 --region us-central1

gcloud beta compute sole-tenancy node-groups create mynodegroup \
    --node-template mynodetemplate --target-size 2 --zone us-central1-a

gcloud beta compute instances create my-vm --node-group mynodegroup \
    --custom-cpu 4 --custom-memory 8 --zone us-central1-a

For guidance on manual placements and more, check out the documentation. Visit the pricing page to learn more about the offering, as well as regional availability.

How to deploy geographically distributed services on Kubernetes Engine with kubemci

Increasingly, many enterprise Google Cloud Platform (GCP) customers use multiple Google Kubernetes Engine clusters to host their applications, for better resilience, scalability, isolation and compliance. In addition, their users expect low-latency access to applications from anywhere around the world. Today we are introducing a new command-line interface (CLI) tool called kubemci to automatically configure ingress using Google Cloud Load Balancer (GCLB) for multi-cluster Kubernetes Engine environments. This allows you to use a Kubernetes Ingress definition to leverage GCLB along with multiple Kubernetes Engine clusters running in regions around the world, serving traffic from the closest cluster through a single anycast IP address and taking advantage of GCP’s 100+ Points of Presence and global network. For more information on how GCLB handles cross-region traffic, see this link.

Further, kubemci will be the initial interface to an upcoming controller-based multi-cluster ingress (MCI) solution that can adapt to different use-cases and can be manipulated using the standard kubectl CLI tool or via Kubernetes API calls.

For example, in the picture below, we have created three independent Kubernetes Engine clusters and spread them across three continents (Asia, North America, and Europe). We then deployed the same service, “zone-printer”, to each of these clusters and used kubemci to create a single GCLB instance to stitch the services together. In this case, the 1000 requests per second (rps) from Tokyo are routed to the cluster in Asia, the New York requests are routed to the North American cluster, and the remaining 1 rps from London is routed to the European cluster. Because each of these requests arrives at the cluster closest to the end user, they benefit from low round-trip latency. Additionally, if a region, cluster, or service were ever to become unavailable, GCLB automatically detects that and routes users to one of the other healthy service instances.
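GCLB's anycast routing happens inside Google's network, not in your code, but the behavior described above (send each user to the closest healthy cluster, and fail over when one becomes unavailable) can be sketched in a few lines. The latency figures and cluster names are hypothetical, and real GCLB behavior involves more than this, such as backend capacity, which the sketch ignores.

```python
from typing import Dict, Set

# Hypothetical round-trip latencies (ms) from user locations to clusters.
LATENCY_MS: Dict[str, Dict[str, int]] = {
    "Tokyo":    {"asia": 5,   "north-america": 110, "europe": 220},
    "New York": {"asia": 170, "north-america": 10,  "europe": 75},
    "London":   {"asia": 210, "north-america": 70,  "europe": 8},
}


def route(user_location: str, healthy: Set[str]) -> str:
    """Return the lowest-latency cluster that is currently healthy."""
    options = {c: ms for c, ms in LATENCY_MS[user_location].items() if c in healthy}
    if not options:
        raise RuntimeError("no healthy cluster available")
    return min(options, key=options.get)


all_up = {"asia", "north-america", "europe"}
print(route("Tokyo", all_up))             # asia
print(route("Tokyo", all_up - {"asia"}))  # north-america (failover)
```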

The feedback on kubemci has been great so far. Marfeel is a Spanish ad tech platform and has been using kubemci in production to improve their service offering:
“At Marfeel, we appreciate the value that this tool provides for us and our customers. Kubemci is simple to use and easily integrates with our current processes, helping to speed up our Multi-Cluster deployment process. In summary, kubemci offers us granularity, simplicity, and speed.”
-Borja García - SRE Marfeel

Getting started

To get started with kubemci, please check out the how-to guide, which contains information on the prerequisites along with step-by-step instructions on how to download the tool and set up your clusters, services and ingress objects.

As a quick preview, once your applications and services are running, you can set up a multi-cluster ingress by running the following command:
$ kubemci create my-mci --ingress=ingress.yaml \
To learn more, check out this talk on Multicluster Ingress by Google software engineers Greg Harmon and Nikhil Jindal, at KubeCon Europe in Copenhagen, demonstrating some initial work in this space.