Tag Archives: GKE

Transforming Kubernetes and GKE into the leading platform for AI/ML

The world is rapidly embracing the power of AI/ML, from training cutting-edge foundation models to deploying intelligent applications at scale. As these workloads become more sophisticated and demanding, the infrastructure required to support them must evolve. Kubernetes has emerged as the standard for container orchestration, but AI/ML introduces unique challenges that push traditional infrastructure to its limits.

AI training jobs often require massive scale, coordinating thousands of specialized accelerators such as GPUs and TPUs. Reliability is critical, as failures can be costly for long-running, large-scale training jobs. Efficient resource sharing across teams and workloads is essential given the expense of accelerators. Furthermore, deploying and scaling AI models for inference demands low latency and fast startup times for large container images and models.

At Google, we are deeply invested in the AI/ML revolution. This is why we are doubling down on our commitment to advancing Kubernetes as the foundational open standard for these workloads. Our strategy centers on evolving the core Kubernetes platform to meet the needs of the "next trillion core hours," specifically focusing on batch and AI/ML. We then bring these advancements, alongside enterprise-grade management and optimizations, to users through Google Kubernetes Engine (GKE).

Here's how we are transforming Kubernetes and GKE:

Redefining Kubernetes' relationship with specialized hardware

Kubernetes was initially designed for more uniform CPU compute. The surge of AI/ML brought new requirements for seamless integration and efficient management of expensive, scarce, and diverse accelerators. To support these new demands, Google has been a key investor in upstream Kubernetes, helping it offer robust support for a diverse portfolio of the latest accelerators, including multiple generations of TPUs and a wide range of NVIDIA GPUs.

A core Kubernetes enhancement driven by Google and the community to better support AI/ML workloads is Dynamic Resource Allocation (DRA). This framework, developed in the heart of Kubernetes, provides a more flexible and extensible way for workloads to request and consume specialized hardware resources beyond traditional CPU and memory, which is crucial for efficiently managing accelerators. Building on such foundational open-source capabilities, GKE can then offer features like Custom Compute Classes, which improve the obtainability of these resources through intelligent fallback priorities across different capacity types like reservations, on-demand, and Spot instances. Google's active contributions to advanced resource management and scheduling capabilities within the Kubernetes community ensure that the platform evolves to meet the sophisticated demands of AI/ML, making efficient use of these specialized hardware resources more broadly accessible.

Unlocking scale and reliability

AI/ML workloads demand unprecedented scale and have new failure modes compared to traditional applications. GKE is built to handle this, supporting up to 65,000 nodes in a single cluster. We've demonstrated the ability to run the largest publicly announced training jobs, coordinating 50,000 TPU chips with near-ideal scaling efficiency.

Critically, we are enhancing core Kubernetes capabilities to support the scale and reliability needed for AI/ML. For instance, to better manage distributed AI workloads like serving large models split across multiple hosts, Google has been instrumental in developing features like JobSet (emerging from earlier concepts like LeaderWorkerSet) within the Kubernetes community (SIG Apps). This provides robust orchestration for co-scheduled, interdependent groups of Pods. We are also actively working upstream to improve Kubernetes reliability and stability through initiatives like Production Readiness Reviews, promoting safer upgrade paths, and enhancing etcd stability for the benefit of all Kubernetes users.

Optimizing Kubernetes performance for efficient inference

Low-latency and cost-efficient inference is critical for AI applications. For serving, the GKE Inference Gateway routes requests based on model server metrics like KVCache utilization and pending queue length, reducing serving costs by up to 30% and tail latency by 60% compared to traditional load balancing. We've even achieved vLLM fungibility across TPUs and GPUs, allowing users to serve the same model on either accelerator without incremental effort.

To address slow startup times for large AI/ML container images (often 20GB+), GKE offers rapid scale-out features. Secondary boot disks allow preloading container images and data, resulting in up to 29x faster container mounting time. GCS FUSE enables streaming data directly from Cloud Storage, leading to faster model load times. Furthermore, GKE Inference Quickstart provides data-driven, optimized Kubernetes deployment configurations, saving extensive benchmarking effort and enabling up to 30% lower cost, 60% lower tail latency, and 40% higher throughput.

Simplifying the Kubernetes experience and enhancing observability for AI/ML

We understand that data scientists and ML researchers may not be Kubernetes experts. Google aims to simplify the setup and management of AI-optimized Kubernetes clusters. This includes contributions to Kubernetes usability efforts and SIG-Usability. Managed offerings like GKE provide multiple paths to set up AI-optimized environments, from default configurations to customizable blueprints. Offerings like GKE Autopilot further abstract away infrastructure management, aiming for the ease of use that benefits all users.
Ensuring visibility into AI/ML workloads is paramount. Google actively supports and contributes to the integration of standard open-source observability tools within the Kubernetes ecosystem, such as Prometheus, Grafana, and OpenTelemetry. Building on this open foundation, GKE then provides enhanced, out-of-the-box observability integrated with popular AI frameworks & tools, including specific insights into workload startup latency and end-to-end tracing.

Looking ahead: continued investment in Open Source Kubernetes for AI/ML

The transformation continues. Our roadmap includes exciting developments in upstream Kubernetes for easily deploying and managing large-scale clusters, support for new GPU & TPU generations integrated through open-source mechanisms, and continued community-driven innovations in fast startup, reliability, and ease of use for AI/ML workloads.

Google is committed to making Kubernetes the premier open-source platform for AI/ML, pushing the boundaries of scale, performance, and efficiency while maintaining stability and ease of use. By driving innovation in core Kubernetes and building powerful, deeply integrated capabilities in our managed offering, GKE, we are empowering organizations to accelerate their AI/ML initiatives and unlock the next generation of intelligent applications built on an open foundation.

Come explore the possibilities with Kubernetes and GKE for your AI/ML workloads!

By Francisco Cabrera & Federico Bongiovanni, GCP Google Kubernetes Engine

Kubernetes 1.33 is available on GKE!

Kubernetes 1.33 is now available in the Google Kubernetes Engine (GKE) Rapid Channel! For more information about the content of Kubernetes 1.33, read the official Kubernetes 1.33 Release Notes and the specific GKE 1.33 Release Notes.

Enhancements in 1.33:

In-place Pod Resizing

Workloads can be scaled horizontally by updating the Pod replica count, or vertically by updating the resources required by a Pod's container(s). Before this enhancement, container resources defined in a Pod's spec were immutable, and updating any of these details within a Pod template would trigger Pod replacement, impacting the service's reliability.

In-place Pod Resizing (IPPR, Public Preview) allows you to change the CPU and memory requests and limits assigned to containers within a running Pod through the new /resize Pod subresource, often without requiring a container restart, reducing service disruptions.

This opens up various possibilities for vertical scale-up of stateful processes without any downtime, seamless scale-down when the traffic is low, and even allocating larger resources during startup, which can then be reduced once the initial setup is complete.
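
As a hedged sketch (the container name and image are placeholders), a Pod can declare per-resource resize policies so that CPU and memory changes are applied without a restart; the change itself is then submitted through the new /resize subresource, for example with kubectl patch --subresource resize on a sufficiently recent kubectl.

apiVersion: v1
kind: Pod
metadata:
  name: resize-demo
spec:
  containers:
  - name: app                          # placeholder container
    image: registry.k8s.io/pause:3.9   # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired       # apply CPU changes in place
    - resourceName: memory
      restartPolicy: NotRequired       # apply memory changes in place
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 512Mi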

Review Resize CPU and Memory Resources assigned to Containers for detailed guidance on using the new API.

DRA

Kubernetes Dynamic Resource Allocation (DRA), currently in beta as of v1.33, offers a more flexible API for requesting devices than the Device Plugin framework. (See the instructions for opting in to beta features in GKE.)

Recent updates include the promotion of driver-owned resource claim status to beta. New alpha features introduced are partitionable devices, device taints and tolerations for managing device availability, prioritized device lists for versatile workload allocation, and enhanced admin access controls. Preparations for general availability include a new v1beta2 API to improve user experience and simplify future feature integration, alongside improved RBAC rules and support for seamless driver upgrades. DRA is anticipated to reach general availability in Kubernetes v1.34.
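
To make the request model concrete, here is a minimal, hedged sketch using the v1beta1 API: a ResourceClaimTemplate asks for one device from a device class, and a Pod consumes it. The gpu.example.com class name and the image are placeholders supplied by whichever DRA driver you install.

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaimTemplate
metadata:
  name: single-gpu
spec:
  spec:
    devices:
      requests:
      - name: gpu
        deviceClassName: gpu.example.com   # placeholder; created by your DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: dra-demo
spec:
  resourceClaims:
  - name: gpu
    resourceClaimTemplateName: single-gpu
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9       # placeholder image
    resources:
      claims:
      - name: gpu                          # bind the claim to this container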

containerd 2.0

With GKE 1.33, we are excited to introduce support for containerd 2.0. This marks the first major version update for the underlying container runtime used by GKE. Adopting this version ensures that GKE continues to leverage the latest advancements and security enhancements from the upstream containerd community.

It's important to note that as a major version update, containerd 2.0 introduces many new features and enhancements while also deprecating others. To ensure a smooth transition and maintain compatibility for your workloads, we strongly encourage you to review your Cloud Recommendations. These recommendations will help identify any workloads that may be affected by these changes. Please see "Migrate nodes to containerd 2" for detailed guidance on making your workloads forward-compatible.

Multiple Service CIDRs

This enhancement introduced a new implementation of allocation logic for Service IPs. The updated IP address allocator logic uses two newly stable API objects: ServiceCIDR and IPAddress. Now generally available, these APIs allow cluster administrators to dynamically increase the number of IP addresses available for Services by creating new ServiceCIDR objects.
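
As a small, hedged example against the now-stable API (the range shown is illustrative and must not overlap existing allocations), adding Service IP space is a matter of creating an additional ServiceCIDR object:

apiVersion: networking.k8s.io/v1
kind: ServiceCIDR
metadata:
  name: extra-service-cidr
spec:
  cidrs:
  - 10.100.0.0/16   # illustrative additional Service range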

Highlight of Googlers' contributions in 1.33 cycle:

Coordinated Leader Election

The Coordinated Leader Election feature progressed to beta, introducing significant enhancements in how a lease candidate's availability is determined for an election. Specifically, the ping-acknowledgement checking process has been optimized to be fully concurrent instead of the previous sequential approach, ensuring faster and more efficient detection of unresponsive candidates. This is essential for promptly identifying truly available lease candidates and maintaining the reliability of the leader election process.

Compatibility Versions

New CLI flags were added to the apiserver as options for adjusting API enablement with respect to an apiserver's emulated version. --emulation-forward-compatible implicitly enables all APIs that are introduced after the emulation version and have higher priority than APIs of the same group resource enabled at the emulation version. --runtime-config-emulation-forward-compatible explicitly enables specific APIs introduced after the emulation version through the runtime-config.

zPages

The ComponentStatusz and ComponentFlagz alpha features can now be turned on for all control plane components. Components then expose two new HTTP endpoints, /statusz and /flagz, providing enhanced visibility into their internal state: /statusz details the component's uptime and its Go, binary, and emulation version information, while /flagz reveals the command-line arguments used at startup.

Streaming List Responses

To improve cluster stability when handling large datasets, streaming encoding for List responses was introduced as a new Beta feature. Previously, serializing entire List responses into a single memory block could strain kube-apiserver memory. The new streaming encoder processes and transmits each item in a list individually, preventing large memory allocations. This significantly reduces memory spikes, improves API server reliability, and enhances overall cluster performance, especially for clusters with large resources, all while maintaining backward compatibility and requiring no client-side changes.

Snapshottable API server cache

Further enhancing API server performance and stability, a new Alpha feature introduces snapshotting to the watchcache. This allows serving LIST requests for historical or paginated data directly from its in-memory cache. Previously, these types of requests would query etcd directly, requiring the data to be piped through multiple encoding, decoding, and validation stages. This process often led to increased memory pressure, unpredictable performance, and potential stability issues, especially with large resources. By leveraging efficient B-tree based snapshotting within the watchcache, this enhancement significantly reduces direct etcd load and minimizes memory allocations on the API server. This results in more predictable performance, increased API server reliability, and better overall resource utilization, while incorporating mechanisms to ensure data consistency between the cache and etcd.

Declarative Validation

Kubernetes thrives on its large, vibrant community of contributors. We're constantly looking for ways to make it easier to maintain and contribute to this project. For years, one area that posed challenges was how the Kubernetes API itself was validated: using hand-written Go code. This traditional method has proven difficult to author, challenging to review, and cumbersome to document, impacting overall maintainability and the contributor experience. To address these pain points, the declarative validation project was initiated.
In 1.33, the foundational infrastructure was established to transition Kubernetes API validation from handwritten Go code to a declarative model using IDL tags. This release introduced the validation-gen code generator, designed to parse these IDL tags and produce Go validation functions.

Ordered Namespace Deletion

The current namespace deletion process is semi-random, which may lead to security gaps or unintended behavior, such as Pods persisting after the deletion of their associated NetworkPolicies. By implementing an opinionated deletion mechanism, the Pods will be deleted before other resources with respect to logical and security dependencies. This design enhances the security and reliability of Kubernetes by mitigating risks arising from the non-deterministic deletion order.

Acknowledgements

As always, we want to thank all the Googlers that provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. We would like to mention especially Googlers who helped drive the contributions mentioned in this blog: Tim Allclair, Natasha Sarkar, Vivek Bansal, Anish Shah, Dawn Chen, Tim Hockin, John Belamaric, Morten Torkildsen, Yu Liao, Cici Huang, Samuel Karp, Chris Henzie, Luiz Oliveira, Piotr Betkier, Alex Curtis, Jonah Peretz, Brad Hoekstra, Yuhan Yao, Ray Wainman, Richa Banker, Marek Siarkowicz, Siyuan Zhang, Jeffrey Ying, Henry Wu, Yuchen Zhou, Jordan Liggitt, Benjamin Elder, Antonio Ojea, Yongrui Lin, Joe Betz, Aaron Prindle and the Googlers who helped bring 1.33 to GKE!

- Benjamin Elder & Sen Lu, Google Kubernetes Engine

Kubernetes 1.32 is now available on GKE


Kubernetes 1.32 is now available in the Google Kubernetes Engine (GKE) Rapid Channel, just one week after the OSS release! For more information about the content of Kubernetes 1.32, read the official Kubernetes 1.32 Release Notes and the specific GKE 1.32 Release Notes.

This release consists of 44 enhancements. Of those enhancements, 13 have graduated to Stable, 12 are entering Beta, and 19 have graduated to Alpha.


Kubernetes 1.32: Key Features


Dynamic Resource Allocation graduated to beta

  • Dynamic Resource Allocation graduated to beta, enabling advanced selection, configuration, scheduling and sharing of accelerators and other devices. As a beta API, using it in GKE clusters requires opt-in. You must also deploy a DRA-compatible kubelet plugin for your devices and use the DRA API instead of the traditional extended resource API used for the existing Device Plugin.

Support for more efficient API streaming

  • The Streaming lists operation has graduated to beta and is enabled by default; the new operation supplies the initial list needed by the list + watch data access pattern over a watch stream and improves kube-apiserver stability and resource usage by enabling informers to receive a continuous data stream. See k8s blog for more information.

Recovery from volume expansion failure

  • Support for recovery from volume expansion failure graduated to beta and is enabled by default. If a user initiates an invalid volume resize, for example by specifying a new size that is too big to be satisfied by the underlying storage system, expansion of PVC will continuously be retried and fail. With this new feature, such a PVC can now be edited to request a smaller size to unblock the PVC. The PVC can be monitored by watching .status.allocatedResourceStatuses and events on the PVC.
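
A hedged sketch of the recovery path (the PVC name, storage class, and sizes are placeholders): if an expansion request cannot be satisfied by the underlying storage system, the same PVC can now be edited back down to a size it can provide.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc                  # placeholder
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: standard-rwo  # placeholder class
  resources:
    requests:
      storage: 100Gi              # lowered from a failed, larger request to unblock expansion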

Job API for management by external controllers

  • Support in the Job API for the managed-by mechanism graduated to beta and is enabled by default. This enables integration with external controllers like MultiKueue.

Improved scheduling performance

  • The Kubernetes QueueingHint feature enhances scheduling throughput by preventing unnecessary scheduling retries. It’s achieved by allowing scheduler plugins to provide per-plugin callback functions that make efficient requeuing decisions.

Acknowledgements

As always, we want to thank all the Googlers that provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. We would like to mention especially Googlers who helped drive the features mentioned in this blog: John Belamaric, Wojciech Tyczyński, Michelle Au, Matthew Cary, Aldo Culquicondor, Tim Hockin, Maciej Skoczeń, and Michał Woźniak, as well as the Googlers who helped bring 1.32 to GKE in record time.

By Federico Bongiovanni, Benjamin Elder, and Sen Lu – Google Kubernetes Engine

Kubernetes 1.31 is now available on GKE, just one week after Open Source Release!


Kubernetes 1.31 is now available in the Google Kubernetes Engine (GKE) Rapid Channel, just one week after the OSS release! For more information about the content of Kubernetes 1.31, read the official Kubernetes 1.31 Release Notes and the specific GKE 1.31 Release Notes.

This release consists of 45 enhancements. Of those enhancements, 11 have graduated to Stable, 22 are entering Beta, and 12 have graduated to Alpha.


Kubernetes 1.31: Key Features


Field Selectors for Custom Resources

Kubernetes 1.31 makes it possible to use field selectors with custom resources. JSONPath expressions may now be added to the spec.versions[].selectableFields field in CustomResourceDefinitions to declare which fields may be used by field selectors. For example, if a custom resource has a spec.environment field, and the field is included in the selectableFields of the CustomResourceDefinition, then it is possible to filter by environment using a field selector like spec.environment=production. The filtering is performed on the server and can be used for both list and watch requests.
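
As a hedged sketch (the group, kind, and spec.environment field are illustrative), the relevant CustomResourceDefinition fragment looks roughly like this; clients can then filter with a field selector such as --field-selector spec.environment=production.

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: widgets.example.com            # illustrative CRD
spec:
  group: example.com
  names:
    kind: Widget
    plural: widgets
    singular: widget
  scope: Namespaced
  versions:
  - name: v1
    served: true
    storage: true
    selectableFields:
    - jsonPath: .spec.environment      # exposed to field selectors
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              environment:
                type: string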


SPDY / Websockets migration

Kubernetes exposes an HTTP/REST interface, but a small subset of these HTTP/REST calls are upgraded to streaming connections. For example, both kubectl exec and kubectl port-forward use streaming connections. But the streaming protocol Kubernetes originally used (SPDY) has been deprecated for eight years. Users may notice this if they use a proxy or gateway in front of their cluster. If the proxy or gateway does not support the old, deprecated SPDY streaming protocol, then these streaming kubectl calls will not work. With this release, we have modernized the protocol for the streaming connections from SPDY to WebSockets. Proxies and gateways will now interact better with Kubernetes clusters.


Consistent Reads

Kubernetes 1.31 introduces a significant performance and reliability boost with the beta release of "Consistent Reads from Cache." This feature leverages etcd's progress notifications to allow Kubernetes to intelligently serve consistent reads directly from its watch cache, improving performance particularly for requests using label or field selectors that return only a small subset of a larger resource. For example, when a Kubelet requests a list of pods scheduled on its node, this feature can significantly reduce the overhead associated with filtering the entire list of pods in the cluster. Additionally, serving reads from the cache leads to more predictable request costs, enhancing overall cluster reliability.


Traffic Distribution for Services

The .spec.trafficDistribution field provides another way to influence traffic routing within a Kubernetes Service. While traffic policies focus on strict semantic guarantees, traffic distribution allows you to express preferences (such as routing to topologically closer endpoints). This can help optimize for performance, cost, or reliability.
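
For illustration, a hedged sketch of a Service expressing that preference (the name, selector, and ports are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: store-svc                    # placeholder
spec:
  selector:
    app: store                       # placeholder
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferClose   # prefer topologically closer endpoints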


Multiple Service CIDRs

Service IP ranges are defined during cluster creation and cannot be modified during the cluster's lifetime. GKE also allocates the Service IP space from the VPC. When dealing with IP exhaustion problems, cluster admins need to expand the assigned Service CIDR range. This new beta feature in Kubernetes 1.31 allows users to dynamically add Service CIDR ranges with zero downtime.


Acknowledgements

As always, we want to thank all the Googlers that provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. From the features mentioned in this blog, we would like to mention especially Googlers Joe Betz, Jordan Liggitt, Sean Sullivan, Tim Hockin, Antonio Ojea, Marek Siarkowicz, Wojciech Tyczynski, Rob Scott, Gaurav Ghildiyal.

By Federico Bongiovanni – Google Kubernetes Engine

Driving etcd Stability and Kubernetes Success


Introduction: The Critical Role of etcd in Cloud-Native Infrastructure

Imagine a cloud-native world without Kubernetes. It's hard, right? But have you ever considered the unsung hero that makes Kubernetes tick? Enter etcd, the distributed key-value store that serves as the central nervous system for Kubernetes. Etcd's ability to consistently store and replicate critical cluster state data is essential for maintaining the health and harmony of distributed systems.


etcd: The Backbone of Kubernetes

Think of Kubernetes as a magnificent vertebrate animal, capable of complex movements and adaptations. In this analogy, etcd is the animal's backbone – a strong, flexible structure that supports the entire system. Just as a backbone protects the spinal cord (which carries vital information), etcd safeguards the critical data that defines the Kubernetes environment. And just as a backbone connects to every other part of the body, etcd facilitates communication and coordination between all the components of Kubernetes, allowing it to move, adapt, and thrive in the dynamic world of distributed systems.

Credit: Original image xkcd.com/2347, alterations by Josh Berkus.

Google's Deep-Rooted Commitment to Open Source

Google has a long history of contributing to open source projects, and our commitment to etcd is no exception. As the initiator of Kubernetes, Google understands the critical role that etcd plays in its success. Google engineers consistently invest in etcd to enhance its functionality and reliability, driven by their extensive use of etcd for their own internal systems.


Google's Collaborative Impact on etcd Reliability

Google engineers have actively contributed to the stability and resilience of etcd, working alongside the wider community to address challenges and improve the project. Here are some key areas where their impact has been felt:

Post-Release Support: Following the release of etcd v3.5.0, Google engineers quickly identified and addressed several critical issues, demonstrating their commitment to maintaining a stable and production-ready etcd for Kubernetes and other systems.

Data Consistency: Early Detection and Swift Action: Google engineers led efforts to identify and resolve data inconsistency issues in etcd, advocating for public awareness and mitigation strategies. Drawing from their Site Reliability Engineering (SRE) expertise, they fostered a culture of "blameless postmortems" within the etcd community—a practice where the focus is on learning from incidents rather than assigning blame. Their detailed postmortem of the v3.5 data inconsistency issue and a co-presented KubeCon talk served to share these valuable lessons with the broader cloud-native community.

Refocusing on Stability and Testing: The v3.5 incident highlighted the need for more comprehensive testing and documentation. Google engineers took action on multiple fronts:

  • Improving Documentation: They contributed to creating "The Implicit Kubernetes-ETCD Contract," which formalizes the interactions between the two systems, guiding development and troubleshooting.
  • Prioritizing Stability and Testing: They developed the "etcd Robustness Tests," a rigorous framework simulating extreme scenarios to proactively identify inconsistency and correctness issues.

These contributions have fostered a collaborative environment where the entire community can learn from incidents and work together to improve etcd's stability and resilience. The etcd Robustness Tests have been particularly impactful, not only reproducing all the data inconsistencies found in v3.5 but also uncovering other bugs introduced in that version. Furthermore, they've found previously unnoticed bugs that existed in earlier etcd versions, some dating back to the original v3 implementation. These results demonstrate the effectiveness of the robustness tests and highlight how they've made etcd the most reliable it has ever been in the history of the project.


etcd Robustness Tests: Making etcd the Most Reliable It's Ever Been

The "etcd Robustness Tests," inspired by the Jepsen methodology, subject etcd to rigorous simulations of network partitions, node failures, and other real-world disruptions. This ensures etcd's data consistency and correctness even under extreme conditions. These tests have proven remarkably effective, identifying and addressing a variety of issues:

For deeper insights into ensuring etcd's data consistency, Marek Siarkowicz's talk, "On the Hunt for Etcd Data Inconsistencies," offers valuable information about distributed systems testing and the innovative approaches used to build these tests. To foster transparency and collaboration, the etcd community holds bi-weekly meetings to discuss test results, open to engineers, researchers, and other interested parties.


The Kubernetes-etcd Contract: A Partnership Forged in Rigorous Testing

To solidify the Kubernetes-etcd partnership, Google engineers formally defined the implicit contract between the two systems. This shared understanding guided development and troubleshooting, leading to improved testing strategies and ensuring etcd meets Kubernetes' demanding requirements.

When subtle issues were discovered in how Kubernetes utilized etcd watch, the value of this formal contract became clear. These issues could lead to missed events under specific conditions, potentially impacting Kubernetes' operation. In response, Google engineers are actively working to integrate the contract directly into the etcd Robustness Tests to proactively identify and prevent such compatibility issues.


Conclusion: Google's Continued Commitment to etcd and the Cloud-Native Community

Google's ongoing investment in etcd underscores their commitment to the stability and success of the cloud-native ecosystem. Their contributions, along with the wider community's efforts, have made etcd a more reliable and performant foundation for Kubernetes and other critical systems. As the ecosystem evolves, etcd remains a critical linchpin, empowering organizations to build and deploy distributed applications with confidence. We encourage all etcd and Kubernetes contributors to continue their active participation and contribute to the project's ongoing success.

By Marek Siarkowicz – GKE etcd

The Kubernetes ecosystem is a candy store


For the 10th anniversary of Kubernetes, I wanted to look at the ecosystem we created together.

I recently wrote about the pervasiveness and magnitude of the Kubernetes and CNCF ecosystem. This was the result of a deliberate flywheel. This is a diagram I used several years ago:

Flywheel diagram of Kubernetes and CNCF ecosystem

Because Kubernetes runs on public clouds, private clouds, on the edge, etc., it is attractive to developers and vendors to build solutions targeting its users. Most tools built for Kubernetes or integrated with Kubernetes can work across all those environments, whereas integrating directly with cloud providers entails individual work for each one. Thus, Kubernetes created a large addressable market with a comparatively lower cost to build.

We also deliberately encouraged open source contribution, to Kubernetes and to other projects. Many tools in the ecosystem, not just those in CNCF, are open source. This includes many tools built by Kubernetes users, tools built by vendors that were too small to be products, and those intended to be the cores of products. Developers built and/or wrote about solutions to problems they experienced or saw, and shared them with the community. This made Kubernetes more usable and more visible, which likely attracted more users.

Today, the result is that if you need a tool, extension, or off-the-shelf component for pretty much anything, you can probably find one compatible with Kubernetes rather than having to build it yourself, and it’s more likely that you can find one that works out of the box with Kubernetes than for your cloud provider. And often there are several options to choose from. I’ll just mention a few. Also, I want to give a shout out to Kubetools, which has a great list of Kubernetes tools that helped me discover a few new ones.

For example, if you’re an application developer whose application runs on Kubernetes, you can build and deploy with Skaffold, test it on Kubernetes locally with Minikube, or connect to Kubernetes remotely with Telepresence, or sync to a preview environment with Gitpod or Okteto. When you need to debug multiple instances, you can use kubetail to view the logs in real time.

To deploy to production, you can use GitOps tools like FluxCD, ArgoCD, or Google Cloud’s Config Sync. You can perform database migrations with Schemahero. To aggregate logs from your production deployments, you can use fluentbit. To monitor them, you have your pick of observability tools, including Prometheus, which was inspired by Google’s Borgmon tool similar to how Kubernetes was inspired by Borg, and which was the 2nd project accepted into the CNCF.

If your application needs to receive traffic from the Internet, you can use one of the many Ingress controllers or Gateway implementations to configure HTTPS routing, and cert-manager to obtain and renew the certificates. For mutual TLS and advanced routing, you can use a service mesh like Istio, and take advantage of it for progressive delivery using tools like Flagger.

If you have a more specialized type of workload to run, you can run event-driven workloads using Knative, batch workloads using Kueue, ML workflows using Kubeflow, and Kafka using Strimzi.

If you’re responsible for operating Kubernetes workloads, to monitor costs, there’s kubecost. To enforce policy constraints, there’s OPA Gatekeeper and Kyverno. For disaster recovery, you can use Velero. To debug permissions issues, there are RBAC tools. And, of course, there are AI-powered assistants.

You can manage infrastructure using Kubernetes, such as using Config Connector or Crossplane, so you don’t need to learn a different syntax and toolchain to do that.

There are tools with a retro experience like K9s and Ktop, fun tools like xlskubectl, and tools that are both retro and fun like Kubeinvaders.

If this makes you interested in migrating to Kubernetes, you can use a tool like move2kube or kompose.

This just scratched the surface of the great tools available for Kubernetes. I view the ecosystem as more of a candy store than as a hellscape. It can take time to discover, learn, and test these tools, but overall I believe they make the Kubernetes ecosystem more productive. To develop any one of these tools yourself would require a significant time investment.

I expect new tools to continue to emerge as the use cases for Kubernetes evolve and expand. I can’t wait to see what people come up with.

By Brian Grant, Distinguished Engineer, Google Cloud Developer Experience

Kubernetes 1.30 is now available in GKE in record time

Kubernetes 1.30 is now available in the Google Kubernetes Engine (GKE) Rapid Channel less than 20 days after the OSS release! For more information about the content of Kubernetes 1.30, read the Kubernetes 1.30 Release Notes and the specific GKE 1.30 Release Notes.


Control Plane Improvements

We're excited to announce that ValidatingAdmissionPolicy graduates to GA in 1.30. This is an exciting feature that enables many admission webhooks to be replaced with policies defined using the Common Expression Language (CEL) and evaluated directly in the kube-apiserver. This feature benefits both extension authors and cluster administrators by dramatically simplifying the development and operation of admission extensions. Many existing webhooks may be migrated to validating admission policies. For webhooks not ready or able to migrate, Match Conditions may be added to webhook configurations using CEL rules to pre-filter requests to reduce webhook invocations.
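
As a hedged sketch of the GA API (the resource scope and CEL expression are illustrative), a policy and the binding that enforces it might look like this:

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: replica-limit                    # illustrative policy
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: "object.spec.replicas <= 50"
    message: "Deployments may not request more than 50 replicas."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: replica-limit-binding
spec:
  policyName: replica-limit
  validationActions: ["Deny"]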

Validation Ratcheting makes CustomResourceDefinitions even safer and easier to manage. Prior to Kubernetes 1.30, when updating a custom resource, validation was required to pass for all fields, even fields not changed by the update. Now, with this feature, only fields changed in the custom resource by an update request must pass validation. This limits validation failures on update to the changed portion of the object, and reduces the risk of controllers getting stuck when a CustomResourceDefinition schema is changed, either accidentally or as part of an effort to increase the strictness of validation.

Aggregated Discovery graduates to GA in 1.30, dramatically improving the performance of clients, particularly kubectl, when fetching the API information needed for many common operations. Aggregated discovery reduces the fetch to a single request and allows caches to be kept up-to-date by offering ETags that clients can use to efficiently poll the server for changes.


Data Plane Improvements

Dynamic Resource Allocation (DRA) is an alpha Kubernetes feature added in 1.26 that enables flexibility in configuring, selecting, and allocating specialized devices for pods. Feedback from SIG Scheduling and SIG Autoscaling revealed that the design needed revisions to reduce scheduling latency and fragility, and to support cluster autoscaling. In 1.30, the community introduced a new alpha design, DRA Structured Parameters, which takes the first step towards these goals. This is still an alpha feature with a lot of changes expected in upcoming releases. The newly formed WG Device Management has a charter to improve device support in Kubernetes - with a focus on GPUs and similar hardware - and DRA is a key component of that support. Expect further enhancements to the design in another alpha in 1.31. The working group has a goal of releasing some aspects to beta in 1.32.


Kubernetes continues the effort of eliminating perma-beta features: functionality that has long been used in production, but still wasn't marked as generally available. With this release, AppArmor support got some attention and moved closer to being marked as GA.

There are also quality-of-life improvements in the Kubernetes data plane. Many of them will only be noticeable to system administrators and not particularly visible to GKE users. In this release, however, the notable Sleep Action KEP entered beta and is available on GKE. It will now be easier to use slim images while still allowing graceful connection draining, specifically for some flavors of nginx images.
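
A hedged sketch of the sleep action (the image and duration are placeholders): the kubelet pauses before sending SIGTERM, so connections can drain even when the image ships no shell or sleep binary.

apiVersion: v1
kind: Pod
metadata:
  name: graceful-drain-demo
spec:
  containers:
  - name: web
    image: nginx:1.27-alpine   # placeholder slim image
    lifecycle:
      preStop:
        sleep:
          seconds: 5           # pause before SIGTERM so the load balancer can drain connections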

Acknowledgements

We want to thank all the Googlers that provide their time, passion, talent and leadership to keep making Kubernetes the best container orchestration platform. From the features mentioned in this blog, we would like to mention especially: Googlers Cici Huang, Joe Betz, Jiahui Feng, Alex Zielenski, Jeffrey Ying, John Belamaric, Tim Hockin, Aldo Culquicondor, Jordan Liggitt, Kuba Tużnik, Sergey Kanzhelev, and Tim Allclair.

Posted by Federico Bongiovanni – Google Kubernetes Engine

Kubernetes 1.29 is available in the Regular channel of GKE

Kubernetes 1.29 has been available in the GKE Regular Channel since January 26th, and in the Rapid Channel since January 11th, less than 30 days after the OSS release! For more information about the content of Kubernetes 1.29, read the Kubernetes 1.29 Release Notes.

New Features

Using CEL for Validating Admission Policy

Validating admission policies offer a declarative, in-process alternative to validating admission webhooks.

Validating admission policies use the Common Expression Language (CEL) to declare the validation rules of a policy. Validation admission policies are highly configurable, enabling policy authors to define policies that can be parameterized and scoped to resources as needed by cluster administrators. [source]

Validating Admission Policy graduates to beta in 1.29. We are especially excited about the work that Googlers Cici Huang, Joe Betz, and Jiahui Feng have led in this release to get to the beta milestone. As we move toward v1, we are actively working to ensure scalability and would appreciate any end-user feedback. [public doc here for those interested]

The beta ValidatingAdmissionPolicy feature can be opted into by enabling the beta APIs.

InitContainers as a Sidecar

InitContainers can now be configured as sidecar containers and kept running alongside the normal containers in a Pod. This is only supported by nodes running version 1.29 or later, so ensure all nodes in a cluster are at version 1.29 or later before using this feature in Pods. The feature was long awaited; this is evident from the fact that Istio has already tested it widely, and the Istio community is working hard to make sure it can be enabled early with minimal disruption for clusters with older nodes. You can participate in the discussion here.
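
A hedged sketch of the mechanism (names and images are placeholders): an init container marked with restartPolicy: Always is treated as a sidecar and keeps running for the life of the Pod, starting before and stopping after the main containers.

apiVersion: v1
kind: Pod
metadata:
  name: sidecar-demo
spec:
  initContainers:
  - name: log-forwarder              # placeholder sidecar
    image: fluent/fluent-bit:2.2     # placeholder image
    restartPolicy: Always            # marks this init container as a sidecar
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9 # placeholder main container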

A big driver for delivering the feature is the growing number of AI/ML workloads, which are often represented by Pods running to completion. Those Pods need infrastructure sidecars, with Istio and GCSFuse as examples, and Google recognizes this trend.

Implementation of sidecar containers is, and continues to be, a community effort. We are proud to highlight that Googler Sergey Kanzhelev is driving it via the Sidecar working group, and many other Googlers worked to make sure this KEP landed so quickly. John Howard made sure the early versions of the implementation were tested with Istio, Wojciech Tyczyński ensured a safe rollout via production readiness review, Tim Hockin spent many hours on API review of the feature, and Clayton Coleman gave advice and helped with code reviews.

New APIs

API Priority and Fairness/Flow Control

We are super excited to share that API Priority and Fairness graduated to Stable V1 / GA in 1.29! Controlling the behavior of the Kubernetes API server in an overload situation is a key task for cluster administrators, and this is what APF addresses. This ambitious project was initiated by Googler and founding API Machinery SIG lead Daniel Smith, and expanded to become a community-wide effort. Special thanks to Googler Wojciech Tyczyński and API Machinery members Mike Spreitzer from IBM and Abu Kashem from RedHat for landing this critical feature in Kubernetes 1.29 (more details in the Kubernetes publication). In GKE we tested and adopted it early; in fact, any version above 1.26.4 sets higher kubelet QPS values, trusting the API server to handle the load gracefully.
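
For a sense of how APF is configured, here is a minimal, hedged sketch of a FlowSchema that routes a hypothetical controller's list/watch traffic to the built-in workload-low priority level (the service account name and precedence are placeholders):

apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: my-controller-requests       # placeholder
spec:
  priorityLevelConfiguration:
    name: workload-low               # existing built-in priority level
  matchingPrecedence: 8000           # lower values are matched first
  distinguisherMethod:
    type: ByUser
  rules:
  - subjects:
    - kind: ServiceAccount
      serviceAccount:
        name: my-controller          # placeholder
        namespace: default
    resourceRules:
    - verbs: ["list", "watch"]
      apiGroups: [""]
      resources: ["pods"]
      clusterScope: true
      namespaces: ["*"]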

Deprecations and Removals

  • The previously deprecated v1beta2 Priority and Fairness APIs are no longer served in 1.29, so update usage to v1beta3 before upgrading to 1.29.
  • With the API Priority and Fairness graduation to v1, the v1beta3 Priority and Fairness APIs are newly deprecated in 1.29, and will no longer be served in 1.32.
  • In the Node API, take a look at the changes to the status.kubeProxyVersion field, which will not be populated starting in v1.33. The field is currently populated with the kubelet version, not the kube-proxy version, and might not accurately reflect the kube-proxy version in use. For more information, see KEP-4004.
  • 1.29 removed support for the insecure SHA1 algorithm. To prevent impact on your clusters, you must replace incompatible certificates of webhook servers and extension API servers before upgrading your clusters to version 1.29.
    • GKE will not auto-upgrade clusters with webhook backends using incompatible certificates to 1.29 until you replace the certificates or until version 1.28 reaches end of life. For more information refer to the official GKE documentation.
  • The Ceph CephFS (kubernetes.io/cephfs) and RBD (kubernetes.io/rbd) volume plugins have been deprecated since 1.28 and will be removed in a future release

Shoutout to the Production Readiness Review (PRR) team

For each new Kubernetes Release, there is a dedicated sub group of SIG Architecture, composed of very senior contributors in the Kubernetes Community, that regularly conducts Production Readiness reviews for each new release, going through each feature.

  • OSS Production Readiness Reviews (PRR) reduce toil for all the different Cloud Providers, by shifting the effort onto OSS developers.
  • OSS Production Readiness Reviews surface production safety, observability, and scalability issues with OSS features at design time, when it is still possible to affect the outcomes.
  • By ensuring feature gates, solid enable → disable → enable testing, and attention to upgrade and rollout considerations, OSS Production Readiness Reviews enable rapid mitigation of failures in new features.

As part of this group, we want to thank Googlers John Belamaric and Wojciech Tyczyński for doing this remarkable heavy lifting on unglamorous, and often invisible, work. Additionally, we'd like to congratulate Googler Joe Betz, who recently graduated as a new PRR reviewer after shadowing the process throughout 2023.

By Jordan Liggitt, Jago Macleod, Sergey Kanzhelev, and Federico Bongiovanni – Google Kubernetes Kernel team

Gateway API Graduates to Beta

For many years, Kubernetes users have wanted more advanced routing capabilities to be configurable in a Kubernetes API. With Google leadership, Gateway API has been developed to dramatically increase the number of features available. This API enables many new features in Kubernetes, including traffic splitting, header modification, and forwarding traffic to backends in different namespaces, just to name a few.
Since the project was originally proposed, Googlers have helped lead the open source efforts. Two of the top contributors to the project are from Google, while more than 10 engineers from Google have contributed to the API.

This week, the API has graduated from alpha to beta. This marks a significant milestone in the API and reflects new-found stability. There are now over a dozen implementations of the API and many are passing a comprehensive set of conformance tests. This ensures that users will have a consistent experience when using this API, regardless of environment or underlying implementation.

A Simple Example

To highlight some of the new features this API enables, it may help to walk through an example. We’ll start with a Gateway:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: Gateway
metadata:
  name: store-xlb
spec:
  gatewayClassName: gke-l7-gxlb
  listeners:
  - name: http
    protocol: HTTP
    port: 80
This Gateway uses the gke-l7-gxlb Gateway Class, which means a new external load balancer will be provisioned to serve this Gateway. Of course, we still need to tell the load balancer where to send traffic. We’ll use an HTTPRoute for this:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: store
spec:
  parentRefs:
  - name: store-xlb
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: store-svc
      port: 3080
      weight: 9
    - name: store-canary-svc
      port: 3080
      weight: 1

This simple HTTPRoute tells the load balancer to route traffic to one of the “store-svc” or “store-canary-svc” on port 3080. We’re using weights to do some basic traffic splitting here. That means that approximately 10% of requests will be routed to our canary Service.

Now, imagine that you want to provide a way for users to opt in or out of the canary service. To do that, we’ll add an additional HTTPRoute with some header matching configuration:

apiVersion: gateway.networking.k8s.io/v1beta1
kind: HTTPRoute
metadata:
  name: store-canary-option
spec:
  parentRefs:
  - name: store-xlb
  rules:
  - matches:
    - headers:
      - name: env
        value: stable
    backendRefs:
    - name: store-svc
      port: 3080
  - matches:
    - headers:
      - name: env
        value: canary
    backendRefs:
    - name: store-canary-svc
      port: 3080

This HTTPRoute works in conjunction with the first route we created. If any requests set the env header to “stable” or “canary” they’ll be routed directly to their preferred backend.

Getting Started

Unlike previous Kubernetes APIs, you don’t need to have the latest version of Kubernetes to use this API. Instead, this API is built with Custom Resource Definitions (CRDs) that can be installed in any Kubernetes cluster, as long as it is version 1.16 or newer (released almost 3 years ago).

To try this API on GKE, refer to the GKE specific installation instructions. Alternatively, if you’d like to use this API with another implementation, refer to the OSS getting started page.

What’s next for Gateway API?

As the core capabilities of Gateway API are stabilizing, new features and concepts are actively being explored. Ideas such as Route Delegation and a new GRPCRoute are deep in the design process. A new service mesh workstream has been established specifically to build consensus among mesh implementations for how this API can be used for service-to-service traffic. As with many open source projects, we’re trying to find the right balance between enabling new use cases and achieving API stability. This API has already accomplished a lot, but we’re most excited about what’s ahead.


By Rob Scott – GKE Networking

Assess the security of Google Kubernetes Engine (GKE) with InSpec for GCP

We are excited to announce that the GKE CIS 1.1.0 Benchmark InSpec profile is now available on GitHub under an open source software license, allowing you to assess Google Kubernetes Engine (GKE) clusters against the security controls recommended by CIS. You can validate the security posture of your GKE clusters using Chef InSpec™ by assessing their compliance against the Center for Internet Security (CIS) 1.1.0 benchmark for GKE.

Validating security compliance of GKE

GKE is a popular platform to run containerized applications. Many organizations have selected GKE for its scalability, self-healing, observability and integrations with other services on Google Cloud. Developer agility is one of the most compelling arguments for moving to a microservices architecture on Kubernetes, introducing configuration changes at a faster pace and demanding security checks as part of the development lifecycle.

Validating the security settings of your GKE cluster is a complex challenge and requires an analysis of multiple layers within your Cloud infrastructure:

GKE is a managed service on GCP, with controls to tweak the cluster’s behaviour that have an impact on its security posture. These Cloud resource settings can be configured and audited via Infrastructure-as-Code (IaC) frameworks such as Terraform, the gcloud command line, or the Google Cloud Console.

Application workloads are deployed on GKE by interacting via the Kubernetes (K8S) API. Kubernetes resources such as pods, deployments and services are often deployed from YAML templates using the command line tool kubectl.

Kubernetes uses configuration files (such as the kube-proxy and kubelet files), typically in YAML format, which are stored on the nodes’ file system.

InSpec for auditing GKE

InSpec is a popular DevSecOps framework that checks the configuration state of resources in virtual machines and containers, on Cloud providers such as Google Cloud, AWS, and Microsoft Azure. The InSpec GCP resource pack 1.8 (InSpec-GCP) provides a consistent way to audit GCP resources and can be used to validate the attributes of a GKE cluster against a desired state declared in code. We previously released a blog post on how to validate your Google Cloud resources with InSpec-GCP against compliance profiles such as the CIS 1.1.0 benchmark for GCP.

While you can use the InSpec-GCP resource pack to define the InSpec controls to validate resources against the Google Cloud API, it does not directly allow you to validate configurations of other relevant layers such as Kubernetes resources and config files on the nodes. Luckily, the challenge of auditing Kubernetes resources with InSpec has already been solved by the inspec-k8s resource pack. Further, files on nodes can be audited using remote access via SSH. Taken together, we can validate the security posture of GKE holistically using the inspec-gcp and inspec-k8s resource packs as well as controls using the InSpec file resource executed in an SSH session.

Running the CIS for GKE compliance profile with InSpec

With the GKE CIS 1.1.0 Benchmark InSpec Profile we have implemented the security controls to validate a GKE cluster against the recommended settings on GCP resource level, Kubernetes API level and file system level. The repository is split into three profiles (inspec-gke-cis-gcp, inspec-gke-cis-k8s and inspec-gke-cis-ssh), since each profile requires a different “target”, or -t parameter when run using the InSpec command line. For ease of use, a wrapper script run_profiles.sh has been provided in the root directory of the repository with the purpose of running all three profiles and storing the reports in the dedicated folder reports.
The script requires the cluster name (-c), ssh username (-u), private key file for ssh authentication (-k), cluster region or zone (-r or -z) and InSpec input file as required by the inspec.yml files in each profile (-i). As an example, the following line will run all three profiles to validate the compliance of cluster inspec-cluster in zone us-central1-a:

./run_profiles.sh -c inspec-cluster \
                           -u konrad \
                           -k /home/konrad/.ssh/google_compute_engine \
                           -z us-central1-a \
                           -i inputs.yml
Running InSpec profile inspec-gke-cis-gcp ...

Profile: InSpec GKE CIS 1.1 Benchmark (inspec-gke-cis-gcp)
Version: 0.1.0
Target: gcp://<service account used for InSpec>

<lots of InSpec output omitted>

Profile Summary: 16 successful controls, 10 control failures, 2 controls skipped
Test Summary: 18 successful, 11 failures, 2 skipped
Stored report in reports/inspec-gke-cis-gcp_report.
Running InSpec profile inspec-gke-cis-k8s …

Profile: InSpec GKE CIS 1.1 Benchmark (inspec-gke-cis-k8s)
Version: 0.1.0
Target: kubernetes://<IP address of K8S endpoint>:443

<lots of InSpec output omitted>

Profile Summary: 9 successful controls, 1 control failure, 0 controls skipped
Test Summary: 9 successful, 1 failure, 0 skipped
Stored report in reports/inspec-gke-cis-k8s_report.
Running InSpec profile inspec-gke-cis-ssh on node <cluster node 1> ...

Profile: InSpec GKE CIS 1.1 Benchmark (inspec-gke-cis-ssh)
Version: 0.1.0
Target: ssh://<username>@<cluster node 1>:22

<lots of InSpec output omitted>

Profile Summary: 10 successful controls, 5 control failures, 1 control skipped
Test Summary: 12 successful, 6 failures, 1 skipped
Stored report in reports/inspec-gke-cis-ssh_<cluster node 1>_report.


Analyze your scan reports

Once the wrapper script has completed successfully you should analyze the JSON or HTML reports to validate the compliance of your GKE cluster. One way to perform the analysis is to upload the collection of JSON reports of a single run from the reports folder to the open source InSpec visualization tool Heimdall Lite (GitHub) by the Mitre Corporation. An example of a compliance dashboard is shown below:
Scan Reports dashboard


Try it yourself and run the GKE CIS 1.1.0 Benchmark InSpec profile in your Google Cloud environment! Clone the repository and follow the CLI example in the Readme file to run the InSpec profiles against your GKE clusters. We also encourage you to report any issues on GitHub that you may find, suggest additional features and to contribute to the project using pull requests. Also, you can read our previous blog post on using InSpec-GCP for compliance validations of your GCP environment.

By Bakh Inamov, Security Specialist Engineer and Konrad Schieban, Infrastructure Cloud Consultant