Privileged pod escalations in Kubernetes and GKE

At the KubeCon EU 2022 conference in Valencia, security researchers from Palo Alto Networks presented research findings on “trampoline pods”—pods with an elevated set of privileges required to do their job, but that could conceivably be used as a jumping off point to gain escalated privileges.

The research mentions GKE, including how developers should look at the privileged pod problem today, what the GKE team is doing to minimize the use of privileged pods, and actions GKE users can take to protect their clusters.

Privileged pods within the context of GKE security

While privileged pods can pose a security issue, it’s important to look at them within the overall context of GKE security. To use a privileged pod as a “trampoline” in GKE, there is a major prerequisite – the attacker has to first execute a successful application compromise and container breakout attack.

Because the use of privileged pods in an attack requires a first step such as a container breakout to be effective, let’s look at two areas:
  1. features of GKE you can use to reduce the likelihood of a container breakout
  2. steps the GKE team is taking to minimize the use of privileged pods and the privileges needed in them.
Reducing container breakouts

There are a number of features in GKE along with some best practices that you can use to reduce the likelihood of a container breakout:

More information can be found in the GKE Hardening Guide.

How GKE is reducing the use of privileged pods.

While it’s not uncommon for customers to install privileged pods into their clusters, GKE works to minimize the privilege levels held by our system components, especially those that are enabled by default. However, there are limits as to how many privileges can be removed from certain features. For example, Anthos Config Management requires permissions to modify most Kubernetes objects to be able to create and manage those objects.

Some other privileges are baked into the system, such as those held by Kubelet. Previously, we worked with the Kubernetes community to build the Node Restriction and Node Authorizer features to limit Kubelet's access to highly sensitive objects, such as secrets, adding protection against an attacker with access to the Kubelet credentials.

More recently, we have taken steps to reduce the number of privileged pods across GKE and have added additional documentation on privileges used in system pods as well as information on how to improve pod isolation. Below are the steps we’ve taken:
  1. We have added an admission controller to GKE Autopilot and GKE Standard (on by default) and GKE/Anthos (opt-in) that stops attempts to run as a more privileged service account, which blocks a method of escalating privileges using privileged pods.
  2. We created a permission scanning tool that identifies pods that have privileges that could be used for escalation, and we used that tool to perform an audit across GKE and Anthos.
  3. The permission scanning tool is now integrated into our standard code review and testing processes to reduce the risk of introducing privileged pods into the system. As mentioned earlier, some features require privileges to perform their function.
  4. We are using the audit results to reduce permissions available to pods. For example, we removed “update nodes and pods” permissions from anetd in GKE.
  5. Where privileged pods are required for the operation of a feature, we’ve added additional documentation to illustrate that fact.
  6. We added documentation that outlines how to isolate GKE-managed workloads in dedicated node pools when you’re unable to use GKE Sandbox to reduce the risk of privilege escalation attacks.
In addition to the measures above, we recommend users take advantage of tools that can scan RBAC settings to detect overprivileged pods used in their applications. As part of their presentation, the Palo Alto researchers announced an open source tool, called rbac-police, that can be used for the task. So, while it only takes a single overprivileged workload to trampoline to the cluster, there are a number of actions you can take to minimize the likelihood of the prerequisite container breakout and the number of privileges used by a pod.