Tag Archives: containers

Optimizing gVisor filesystems with Directfs

gVisor is a sandboxing technology that provides a secure environment for running untrusted code. In our previous blog post, we discussed how gVisor performance improves with a root filesystem overlay. In this post, we'll dive into another filesystem optimization that was recently launched: directfs. It gives gVisor’s application kernel (the Sentry) secure direct access to the container filesystem, avoiding expensive round trips to the filesystem gofer.

Origins of the Gofer

gVisor is used internally at Google to run a variety of services and workloads. One of the challenges we faced while building gVisor was providing remote filesystem access securely to the sandbox. gVisor’s strict security model and defense in depth approach assumes that the sandbox may get compromised because it shares the same execution context as the untrusted application. Hence the sandbox cannot be given sensitive keys and credentials to access Google-internal remote filesystems.

To address this challenge, we added a trusted filesystem proxy called a "gofer". The gofer runs outside the sandbox, and provides a secure interface for untrusted containers to access such remote filesystems. For architectural simplicity, gofers were also used to serve local filesystems as well as remote.

Gofer process intermediates filesystem operations

Isolating the Container Filesystem in runsc

When gVisor was open sourced as runsc, the same gofer model was copied over to maintain the same security guarantees. runsc was configured to start one gofer process per container which serves the container filesystem to the sandbox over a predetermined protocol (now LISAFS). However, a gofer adds a layer of indirection with significant overhead.

This gofer model (built for remote filesystems) brings very few advantages for the runsc use-case, where all the filesystems served by the gofer (like rootfs and bind mounts) are mounted locally on the host. The gofer directly accesses them using filesystem syscalls.

Linux provides some security primitives to effectively isolate local filesystems. These include, mount namespaces, pivot_root and detached bind mounts1. Directfs is a new filesystem access mode that uses these primitives to expose the container filesystem to the sandbox in a secure manner. The sandbox’s view of the filesystem tree is limited to just the container filesystem. The sandbox process is not given access to anything mounted on the broader host filesystem. Even if the sandbox gets compromised, these mechanisms provide additional barriers to prevent broader system compromise.

Directfs

In directfs mode, the gofer still exists as a cooperative process outside the sandbox. As usual, the gofer enters a new mount namespace, sets up appropriate bind mounts to create the container filesystem in a new directory and then pivot_root(2)s into that directory. Similarly, the sandbox process enters new user and mount namespaces and then pivot_root(2)s into an empty directory to ensure it cannot access anything via path traversal. But instead of making RPCs to the gofer to access the container filesystem, the sandbox requests the gofer to provide file descriptors to all the mount points via SCM_RIGHTS messages. The sandbox then directly makes file-descriptor-relative syscalls (e.g. fstatat(2), openat(2), mkdirat(2), etc) to perform filesystem operations.

Sandbox directly accesses container filesystem with directfs

Earlier when the gofer performed all filesystem operations, we could deny all these syscalls in the sandbox process using seccomp. But with directfs enabled, the sandbox process's seccomp filters need to allow the usage of these syscalls. Most notably, the sandbox can now make openat(2) syscalls (which allow path traversal), but with certain restrictions: O_NOFOLLOW is required, no access to procfs and no directory FDs from the host. We also had to give the sandbox the same privileges as the gofer (for example CAP_DAC_OVERRIDE and CAP_DAC_READ_SEARCH), so it can perform the same filesystem operations.

It is noteworthy that only the trusted gofer provides FDs (of the container filesystem) to the sandbox. The sandbox cannot walk backwards (using ‘..’) or follow a malicious symlink to escape out of the container filesystem. In effect, we've decreased our dependence on the syscall filters to catch bad behavior, but correspondingly increased our dependence on Linux's filesystem isolation protections.

Performance

Making RPCs to the gofer for every filesystem operation adds a lot of overhead to runsc. Hence, avoiding gofer round trips significantly improves performance. Let's find out what this means for some of our benchmarks. We will run the benchmarks using our newly released systrap platform on bind mounts (as opposed to rootfs). This would simulate more realistic use cases because bind mounts are extensively used while configuring filesystems in containers. Bind mounts also do not have an overlay (like the rootfs mount), so all operations go through goferfs / directfs mount.

Let's first look at our stat micro-benchmark, which repeatedly calls stat(2) on a file.

Stat benchmark improvement with directfs
The stat(2) syscall is more than 2x faster! However, since this is not representative of real-world applications, we should not extrapolate these results. So let's look at some real-world benchmarks.
Stat benchmark improvement with directfs
We see a 12% reduction in the absolute time to run these workloads and 17% reduction in Ruby load time!

Conclusion

The gofer model in runsc was overly restrictive for accessing host files. We were able to leverage existing filesystem isolation mechanisms in Linux to bypass the gofer without compromising security. Directfs significantly improves performance for certain workloads. This is part of our ongoing efforts to improve gVisor performance. You can learn more about gVisor at gvisor.dev. You can also use gVisor in GKE with GKE Sandbox. Happy sandboxing!


1Detached bind mounts can be created by first creating a bind mount using mount(MS_BIND) and then detaching it from the filesystem tree using umount(MNT_DETACH).


By Ayush Ranjan, Software Engineer – Google

gVisor improves performance with root filesystem overlay

Overview

Container technology is an integral part of modern application ecosystems, making container security an increasingly important topic. Since containers are often used to run untrusted, potentially malicious code it is imperative to secure the host machine from the container.

A container's security depends on its security boundaries, such as user namespaces (which isolate security-related identifiers and attributes), seccomp rules (which restrict the syscalls available), and Linux Security Module configuration. Popular container management products like Docker and Kubernetes relax these and other security boundaries to increase usability, which means that users need additional container security tools to provide a much stronger isolation boundary between the container and the host.

The gVisor open source project, developed by Google, provides an OCI compatible container runtime called runsc. It is used in production at Google to run untrusted workloads securely. Runsc (run sandbox container) is compatible with Docker and Kubernetes and runs containers in a gVisor sandbox. gVisor sandbox has an application kernel, written in Golang, that implements a substantial portion of the Linux system call interface. All application syscalls are intercepted by the sandbox and handled in the user space kernel.

Although gVisor does not introduce large fixed overheads, sandboxing does add some performance overhead to certain workloads. gVisor has made several improvements recently that help containerized applications run faster inside the sandbox, including an improvement to the container root filesystem, which we will dive deeper into.

Costly Filesystem Access in gVisor

gVisor uses a trusted filesystem proxy process (“gofer”) to access the filesystem on behalf of the sandbox. The sandbox process is considered untrusted in gVisor’s security model. As a result, it is not given direct access to the container filesystem and its seccomp filters do not allow filesystem syscalls.

In gVisor, the container rootfs and bind mounts are configured to be served by a gofer.

Gofer mounts configuration in gVisor

When the container needs to perform a filesystem operation, it makes an RPC to the gofer which makes host system calls and services the RPC. This is quite expensive due to:

  1. RPC cost: This is the cost of communicating with the gofer process, including process scheduling, message serialization and IPC system calls.
    • To ameliorate this, gVisor recently developed a purpose-built protocol called LISAFS which is much more efficient than its predecessor.
    • gVisor is also experimenting with giving the sandbox direct access to the container filesystem in a secure manner. This would essentially nullify RPC costs as it avoids the gofer being in the critical path of filesystem operations.
  2. Syscall cost: This is the cost of making the host syscall which actually accesses/modifies the container filesystem. Syscalls are expensive, because they perform context switches into the kernel and back into userspace.
    • To help with this, gVisor heavily caches the filesystem tree in memory. So operations like stat(2) on cached files are serviced quickly. But other operations like mkdir(2) or rename(2) still need to make host syscalls.

Container Root Filesystem

In Docker and Kubernetes, the container’s root filesystem (rootfs) is based on the filesystem packaged with the image. The image’s filesystem is immutable. Any change a container makes to the rootfs is stored separately and is destroyed with the container. This way, the image’s filesystem can be shared efficiently with all containers running the same image. This is different from bind mounts, which allow containers to access the bound host filesystem tree. Changes to bind mounts are always propagated to the host and persist after the container exits.

Docker and Kubernetes both use the overlay filesystem by default to configure container rootfs. Overlayfs mounts are composed of one upper layer and multiple lower layers. The overlay filesystem presents a merged view of all these filesystem layers at its mount location and ensures that lower layers are read-only while all changes are held in the upper layer. The lower layer(s) constitute the “image layer” and the upper layer is the “container layer”. When the container is destroyed, the upper layer mount is destroyed as well, discarding the root filesystem changes the container may have made. Docker’s overlayfs driver documentation has a good explanation.

Rootfs Configuration Before

Let’s consider an example where the image has files foo and baz. The container overwrites foo and creates a new file bar. The diagram below shows how the root filesystem used to be configured in gVisor earlier. We used to go through the gofer and access/mutate the overlaid directory on the host. It also shows the state of the host overlay filesystem.

Rootfs configuration in gVisor earlier

Opportunity! Sandbox Internal Overlay

Given that the upper layer is destroyed with the container and that it is expensive to access/mutate a host filesystem from the sandbox, why keep the upper layer on the host at all? Instead we can move the upper layer into the sandbox.

The idea is to overlay the rootfs using a sandbox-internal overlay mount. We can use a tmpfs upper (container) layer and a read-only lower layer served by the gofer client. Any changes to rootfs would be held in tmpfs (in-memory). Accessing/mutating the upper layer would not require any gofer RPCs or syscalls to the host. This really speeds up filesystem operations on the upper layer, which contains newly created or copied-up files and directories.

Using the same example as above, the following diagram shows what the rootfs configuration would look like using a sandbox-internal overlay.

Rootfs configuration in gVisor with internal overlay

Host-Backed Overlay

The tmpfs mount by default will use the sandbox process’s memory to back all the file data in the mount. This can cause sandbox memory usage to blow up and exhaust the container’s memory limits, so it’s important to store all file data from tmpfs upper layer on disk. We need to have a tmpfs-backing “filestore” on the host filesystem. Using the example from above, this filestore on the host will store file data for foo and bar.

This would essentially flatten all regular files in tmpfs into one host file. The sandbox can mmap(2) the filestore into its address space. This allows it to access and mutate the filestore very efficiently, without incurring gofer RPCs or syscalls overheads.

Self-Backed Overlay

In Kubernetes, you can set local ephemeral storage limits. The upper layer of the rootfs overlay (writeable container layer) on the host contributes towards this limit. The kubelet enforces this limit by traversing the entire upper layerstat(2)-ing all files and summing up their stat.st_blocks*block_size. If we move the upper layer into the sandbox, then the host upper layer is empty and the kubelet will not be able to enforce these limits.

To address this issue, we introduced “self-backed” overlays, which create the filestore in the host upper layer. This way, when the kubelet scans the host upper layer, the filestore will be detected and its stat.st_blocks should be representative of the total file usage in the sandbox-internal upper layer. It is also important to hide this filestore from the containerized application to avoid confusing it. We do so by creating a whiteout in the sandbox-internal upper layer, which blocks this file from appearing in the merged directory.

The following diagram shows what rootfs configuration would finally look like today in gVisor.

Rootfs configuration in gVisor with self-backed internal overlay

Performance Gains

Let’s look at some filesystem-intensive workloads to see how rootfs overlay impacts performance. These benchmarks were run on a gLinux desktop with KVM platform.

Micro Benchmark

Linux Test Project provides a fsstress binary. This program performs a large number of filesystem operations concurrently, creating and modifying a large filesystem tree of all sorts of files. We ran this program on the container's root filesystem. The exact usage was:

sh -c "mkdir /test && time fsstress -d /test -n 500 -p 20 -s 1680153482 -X -l 10"

You can use the -v flag (verbose mode) to see what filesystem operations are being performed.

The results were astounding! Rootfs overlay reduced the time to run this fsstress program from 262.79 seconds to 3.18 seconds! However, note that such microbenchmarks are not representative of real-world applications and we should not extrapolate these results to real-world performance.

Real-world Benchmark

Build jobs are very filesystem intensive workloads. They read a lot of source files, compile and write out binaries and object files. Let’s consider building the abseil-cpp project with bazel. Bazel performs a lot of filesystem operations in rootfs; in bazel’s cache located at ~/.cache/bazel/.

This is representative of the real-world because many other applications also use the container root filesystem as scratch space due to the handy property that it disappears on container exit. To make this more realistic, the abseil-cpp repo was attached to the container using a bind mount, which does not have an overlay.

When measuring performance, we care about reducing the sandboxing overhead and bringing gVisor performance as close as possible to unsandboxed performance. Sandboxing overhead can be calculated using the formula overhead = (s-n)/n where ‘s’ is the amount of time taken to run a workload inside gVisor sandbox and ‘n’ is the time taken to run the same workload natively (unsandboxed). The following graph shows that rootfs overlay halved the sandboxing overhead for abseil build!

The impact of rootfs overlay on sandboxing overhead for abseil build

Conclusion

Rootfs overlay in gVisor substantially improves performance for many filesystem-intensive workloads, so that developers no longer have to make large tradeoffs between performance and security. We recently made this optimization the default in runsc. This is part of our ongoing efforts to improve gVisor performance. You can learn more about gVisor at gvisor.dev. You can also use gVisor in GKE with GKE Sandbox. Happy sandboxing!

Explore the new Learn Kubernetes with Google website!

As Kubernetes has become a mainstream global technology, with 96% of organizations surveyed by the CNCF1 using or evaluating Kubernetes for production use, it is now estimated that 31%2 of backend developers worldwide are Kubernetes developers. To add to the growing popularity, the 2021 annual report1 also listed close to 60 enhancements by special interest and working groups to the Kubernetes project. With so much information in the ecosystem, how can Kubernetes developers stay on top of the latest developments and learn what to prioritize to best support their infrastructure?

The new website Learn Kubernetes with Google brings together under one roof the guidance of Kubernetes experts—both from Google and across the industry—to communicate the latest trends in building your Kubernetes infrastructure. You can access knowledge in two formats.

One option is to participate in scheduled live events, which consist of virtual panels that allow you to ask questions to experts via a Q&A forum. Virtual panels last for an hour, and happen once quarterly. So far, we’ve hosted panels on building a multi-cluster infrastructure, the Dockershim deprecation, bringing High Performance Computing (HPC) to Kuberntes, and securing your services with Istio on Kubernetes. The other option is to pick one of the multiple on-demand series available. Series are made up of several 5-10 minute episodes and you can go through them at your own leisure. They cover different topics, including the Kubernetes Gateway API, the MCS API, Batch workloads, and Getting started with Kubernetes. You can use the search bar on the top right side of the website to look up specific topics.
ALT TEXT
As the cloud native ecosystem becomes increasingly complex, this website will continue to offer evergreen content for Kubernetes developers and users. We recently launched a new content category for ecosystem projects, which started by covering how to run Istio on Kubernetes. Soon, we will also launch a content category for developer tools, starting with Minikube.

Join hundreds of developers that are already part of the Learn Kubernetes with Google community! Bookmark the website, sign up for an event today, and be sure to check back regularly for new content.

By María Cruz, Program Manager – Google Open Source Programs Office

ko applies to become a CNCF sandbox project

Back in 2018, the team at Google working on Knative needed a faster way to iterate on Kubernetes controllers. They created a new tool dedicated to deploying Go applications to Kubernetes without having to worry about container images. That tool has proven to be indispensable to the Knative community, so in March 2019, Google released it as a stand-alone open source project named ko.

Since then, ko has gained in popularity as a simple, fast, and secure container image builder for Go applications. More recently, the ko community has added, amongst many other features, multi-platform support and automatic SBOM generation. Today, like the original team at Google, many open source and enterprise development teams depend on ko to improve their developer productivity. The ko project is also increasingly used as a solution for a number of build use-cases, and is being integrated into a variety of third party CI/CD tools.

At Google, we believe that using open source comes with a responsibility to contribute, sustain, and improve the projects that make our ecosystem better. To support the next phase of community-driven innovation, enable net-new adoption patterns, and to further raise the bar in the container tool industry, we are excited to announce today that we have submitted ko as a sandbox project to the Cloud Native Computing Foundation (CNCF).

This step begins the process of transferring the ko trademark, IP, and code to CNCF. We are excited to see how the broader open source community will continue innovating with ko.

By Mark Chmarny – Google Open Source Programs Office

Modernizing Oracle operations with Kubernetes and El Carro

Google Cloud is releasing El Carro, an open source tool to help you transform and modernize your Oracle database operations. El Carro implements the Kubernetes operator pattern to deliver automation for provisioning and ongoing operations like backups, patching, and high availability for databases running in hybrid and multi-cloud environments. And it does so using the same declarative syntax that DevOps teams are using to manage applications. With El Carro, users can choose to modernize and transform their database operations in place and benefit from a consistent management experience and hybrid and multi-cloud portability. Released under the Apache License 2.0, you are free to use El Carro in any Kubernetes environment—you are in control.

Containers and Kubernetes deliver portability on standardized infrastructure, and today Oracle supports databases running in containers; they’ve also released container build files and images and helm charts to simplify provisioning. What is missing for the next level of integration is support for lifecycle operations and an extension of the Kubernetes API to the primitives needed for database management.

In addition, fully managed or autonomous services for Oracle may not make available all the required features, such as Active Data Guard, Multitenant, and In-Memory, parameters/flags, versions, and patch levels. DBAs also find themselves locked out of many roles, including sysadmin and root. These restrictions make many cloud architects fall back to lift and shift Oracle databases onto infrastructure as a service offerings and miss out on opportunities to modernize and transform database operations. And with transactional databases growing in number and criticality, organizations are struggling to deliver innovation and modernization. Engineers are already busy keeping up with sprawl and mundane operational tasks while adhering to strict change management processes.

How do we solve this database operations gap?

El Carro solves this. It is built with scalability in mind, using the same container orchestration infrastructure, Kubernetes, that powers many businesses and is a top choice for modern architectures. Its open API allows you to manage your database configurations as declarative code, enabling CI/CD or Gitops workflows for auditability and control mechanisms. El Carro automates many database lifecycle operations, like backups, replication, and patching. And, when it distributes databases on the nodes of a cluster, it is aware of the priority and resource requirements of each database to optimize tight packing while respecting quality of service. Lastly, it helps DBAs by delivering automation without restrictions and leaving DBAs in full control over their systems. You can choose to let the operator drive for you, but you can also take over the steering wheel yourself at any time.

Because Kubernetes is now the standard for portable infrastructure automation and orchestration, engineers appreciate how Kubernetes abstracts complex problems into manageable infrastructure as code. Kubernetes can scale from small projects to large projects that support the infrastructure that powers Google products and services for billions of users around the world. Moreover, Google pioneers the next generation of infrastructure as code that we refer to as Configuration as Data to declaratively establish a contract between developer intent and the runtime operation. According to the Cloud Native Survey 2020, two-thirds of respondents were either already running stateful workloads in production or were considering doing so within the next 12 months. We expect that datastores are going to drive the next wave of enterprise Kubernetes adoption.

A number of open source operators for databases, such as PostgreSQL, MySQL, and many others, have been released, are actively maintained by the community, and are popular among developers and architects looking for a hands-off approach to manage databases with their applications. El Carro extends the list of database operators to include Oracle.

What are we building with El Carro for Oracle databases?

The operator pattern emerged in late 2016 as an extension of the Kubernetes API and control loop aimed at automating more complicated and application-specific tasks that are beyond the native Kubernetes objects.

El Carro implements a custom resource definition (CRD), which is tailored to database management. Users set and change attributes of the custom resource using the Kubernetes API the same way they do for built-in objects such as pods, deployments, or services. The El Carro controller observes changes to the CRs and compares the declared state with the current reality in the cluster, then makes the necessary changes. Those changes could either affect the Kubernetes resources used by the database such as persistent volumes or the pod itself, or may result in issuing calls via SQL or command line tools to the database to create and modify users or other database objects.

Here’s a look at how this works:
El Carro Architecture
El Carro Architecture

The diagram above shows how the major components of a database managed by the El Carro Operator interact with each other. The controller monitors the CRD for any changes made by admins. It creates and manages the cluster resources that make up the actual database deployment: persistent volumes for filesystems and data, a pod to run containers with the actual database, and a daemon that allows the controller to securely run SQL commands on the database. And lastly, a service makes sqlnet connections available to applications and end users that can either run in the same Kubernetes cluster or outside of it.

At release time, the El Carro Operator can provision Oracle databases of 12c Enterprise Edition and 18c Express Edition. It manages instance parameters, pluggable databases, and users. You can take and restore backups either using rman or storage snapshots, and we are working to add additional features.

How to get involved with El Carro?

In the development process, we collaborated with users and partners in the Oracle community to help us validate the approach. "Pythian has helped Oracle users to automate and optimize the operations of their mission-critical systems for over 20 years,” says Simon Pane, principal consultant at Pythian. “We are excited about the possibilities that El Carro brings to users on their cloud modernization journeys. We are proud to work with the community on a vision for the future of database management.".

Sean Scott covers Docker for databases on his blog oraclesean.com, and says: "There are many benefits to running Oracle databases in containers. Adding Kubernetes orchestration introduces new opportunities to bring the DevOps and Oracle communities together."

You can try out El Carro today. Follow the quick start guide and try out provisioning of instances, databases, users. Import data via Data Pump, manage instance parameters, choose between different methods for backups, and try out a restore. Have a look at how we integrate with external logging and monitoring solutions. Reach out via our Google group and leave feedback for what features you would like to see next, or even create your own patch and pull request on GitHub.

By Bjoern Rost - Product Manager and Boris Dali - Team Lead, Engineering

Container Structure Tests: Unit Tests for Docker Images

Usage of containers in software applications is on the rise, and with their increasing usage in production comes a need for robust testing and validation. Containers provide great testing environments, but actually validating the structure of the containers themselves can be tricky. The Docker toolchain provides us with easy ways to interact with the container images themselves, but no real way of verifying their contents. What if we want to ensure a set of commands runs successfully inside of our container, or check that certain files are in the correct place with the correct contents, before shipping?

The Container Tools team at Google is happy to announce the release of the Container Structure Test framework. This framework provides a convenient and powerful way to verify the contents and structure of your containers. We’ve been using this framework at Google to test all of our team’s released containers for over a year now, and we’re excited to finally share it with the public.

The framework supports four types of tests:
  • Command Tests - to run a command inside your container image and verify the output or error it produces
  • File Existence Tests - to check the existence of a file in a specific location in the image’s filesystem
  • File Content Tests - to check the contents and metadata of a file in the filesystem
  • A unique Metadata Test - to verify configuration and metadata of the container itself
Users can specify test configurations through YAML or JSON. This provides a way to abstract away the test configuration from the implementation of the tests, eliminating the need for hacky bash scripting or other solutions to test container images.

Command Tests

The Command Tests give the user a way to specify a set of commands to run inside of a container, and verify that the output, error, and exit code were as expected. An example configuration looks like this:
globalEnvVars:
- key: "VIRTUAL_ENV"
value: "/env"
- key: "PATH"
value: "/env/bin:$PATH"

commandTests:

# check that the python binary is in the correct location
- name: "python installation"
command: "which"
args: ["python"]
expectedOutput: ["/usr/bin/python\n"]

# setup a virtualenv, and verify the correct python binary is run
- name: "python in virtualenv"
setup: [["virtualenv", "/env"]]
command: "which"
args: ["python"]
expectedOutput: ["/env/bin/python\n"]

# setup a virtualenv, install gunicorn, and verify the installation
- name: "gunicorn flask"
setup: [["virtualenv", "/env"],
["pip", "install", "gunicorn", "flask"]]
command: "which"
args: ["gunicorn"]
expectedOutput: ["/env/bin/gunicorn"]
Regexes are used to match the expected output, and error, of each command (or excluded output/error if you want to make sure something didn’t happen). Additionally, setup and teardown commands can be run with each individual test, and environment variables can be specified to be set for each individual test, or globally for the entire test run (shown in the example).

File Tests

File Tests allow users to verify the contents of an image’s filesystem. We can check for existence of files, as well as examine the contents of individual files or directories. This can be particularly useful for ensuring that scripts, config files, or other runtime artifacts are in the correct places before shipping and running a container.
fileExistenceTests:

# check that the apt-packages text file exists and has the correct permissions
- name: 'apt-packages'
path: '/resources/apt-packages.txt'
shouldExist: true
permissions: '-rw-rw-r--'
Expected permissions and file mode can be specified for each file path in the form of a standard Unix permission string. As with the Command Tests’ “Excluded Output/Error,” a boolean can be provided to these tests to tell the framework to be sure a file is not present in a filesystem.

Additionally, the File Content Tests verify the contents of files and directories in the filesystem. This can be useful for checking package or repository versions, or config file contents among other things. Following the pattern of the previous tests, regexes are used to specify the expected or excluded contents.
fileContentTests:

# check that the default apt repository is set correctly
- name: 'apt sources'
path: '/etc/apt/sources.list'
expectedContents: ['.*httpredir\.debian\.org/debian jessie main.*']

# check that the retry policy is correctly specified
- name: 'retry policy'
path: '/etc/apt/apt.conf.d/apt-retry'
expectedContents: ['Acquire::Retries 3;']

Metadata Test

Unlike the previous tests which all allow any number to be specified, the Metadata test is a singleton test which verifies a container’s configuration. This is useful for making sure things specified in the Dockerfile (e.g. entrypoint, exposed ports, mounted volumes, etc.) are manifested correctly in a built container.
metadataTest:
env:
- key: "VIRTUAL_ENV"
value: "/env"
exposedPorts: ["8080", "2345"]
volumes: ["/test"]
entrypoint: []
cmd: ["/bin/bash"]
workdir: ["/app"]

Tiny Images

One interesting case that we’ve put focus on supporting is “tiny images.” We think keeping container sizes small is important, and sometimes the bare minimum in a container image might even exclude a shell. Users might be used to running something like:
`docker run -d "cat /etc/apt/sources.list && grep -rn 'httpredir.debian.org' image"`
… but this breaks without a working shell in a container. With the structure test framework, we convert images to in-memory filesystem representations, so no shell is needed to examine the contents of an image!

Dockerless Test Runs

At their core, Docker images are just bundles of tarballs. One of the major use cases for these tests is running in CI systems, and often we can't guarantee that we'll have access to a working Docker daemon in these environments. To address this, we created a tar-based test driver, which can handle the execution of all file-related tests through simple tar manipulation. Command tests are currently not supported in this mode, since running commands in a container requires a container runtime.

This means that using the tar driver, we can retrieve images from a remote registry, convert them into filesystems on disk, and verify file contents and metadata all without a working Docker daemon on the host! Our container-diff library is leveraged here to do all the image processing; see our previous blog post for more information.
structure-test -test.v -driver tar -image gcr.io/google-appengine/python:latest structure-test-examples/python/python_file_tests.yaml

Running in Bazel

Structure tests can also be run natively through Bazel, using the “container_test” rule. Bazel provides convenient build rules for building Docker images, so the structure tests can be run as part of a build to ensure any new built images are up to snuff before being released. Check out this example repo for a quick demonstration of how to incorporate these tests into a Bazel build.

We think this framework can be useful for anyone building and deploying their own containers in the wild, and hope that it can promote their usage everywhere through increasing the robustness of containers. For more detailed information on the test specifications, check out the documentation in our GitHub repository.

By Nick Kubala, Container Tools team

Introducing container-diff, a tool for quickly comparing container images

Originally posted by Nick Kubala, Colette Torres, and Abby Tisdale from the Container Tools team, on the Google Open Source Blog

The Google Container Tools team originally built container-diff, a new project to help uncover differences between container images, to aid our own development with containers. We think it can be useful for anyone building containerized software, so we're excited to release it as open source to the development community.

Containers and the Dockerfile format help make customization of an application's runtime environment more approachable and easier to understand. While this is a great advantage of using containers in software development, a major drawback is that it can be hard to visualize what changes in a container image will result from a change in the respective Dockerfile. This can lead to bloated images and make tracking down issues difficult.

Imagine a scenario where a developer is working on an application, built on a runtime image maintained by a third-party. During development someone releases a new version of that base image with updated system packages. The developer rebuilds their application and picks up the latest version of the base image, and suddenly their application stops working; it depended on a previous version of one of the installed system packages, but which one? What version was it on before? With no currently existing tool to easily determine what changed between the two base image versions, this totally stalls development until the developer can track down the package version incompatibility.

Introducing container-diff

container-diff helps users investigate image changes by computing semantic diffs between images. What this means is that container-diff figures out on a low-level what data changed, and then combines this with an understanding of package manager information to output this information in a format that's actually readable to users. The tool can find differences in system packages, language-level packages, and files in a container image.

Users can specify images in several formats - from local Docker daemon (using the prefix `daemon://` on the image path), a remote registry (using the prefix `remote://`), or a file in the .tar in the format exported by "docker save" command. You can also combine these formats to compute the diff between a local version of an image and a remote version. This can be useful when experimenting with new builds of an image that you might not be quite ready to push yet. container-diff supports image tarballs and the registry protocol natively, enabling it to run in environments without a Docker daemon.

Examples and Use Cases

Here is a basic Dockerfile that installs Python inside our Debian base image. Running container-diff on the base image and the new one with Python, users can see all the apt packages that were installed as dependencies of Python.

➜  debian_with_python cat Dockerfile
FROM gcr.io/google-appengine/debian8
RUN apt-get update && apt-get install -qq --force-yes python
➜ debian_with_python docker build -q -t debian_with_python .
sha256:be2cd1ae6695635c7041be252589b73d1539a858c33b2814a66fe8fa4b048655
➜ debian_with_python container-diff diff gcr.io/google-appengine/debian8:latest daemon://debian_with_python:latest

-----Apt-----
Packages found only in gcr.io/google-appengine/debian8:latest: None

Packages found only in debian_with_python:latest:
NAME VERSION SIZE
-file 1:5.22 15-2+deb8u3 76K
-libexpat1 2.1.0-6 deb8u4 386K
-libffi6 3.1-2 deb8u1 43K
-libmagic1 1:5.22 15-2+deb8u3 3.1M
-libpython-stdlib 2.7.9-1 54K
-libpython2.7-minimal 2.7.9-2 deb8u1 2.6M
-libpython2.7-stdlib 2.7.9-2 deb8u1 8.2M
-libsqlite3-0 3.8.7.1-1 deb8u2 877K
-mime-support 3.58 146K
-python 2.7.9-1 680K
-python-minimal 2.7.9-1 163K
-python2.7 2.7.9-2 deb8u1 360K
-python2.7-minimal 2.7.9-2 deb8u1 3.7M

Version differences: None

And below is a Dockerfile that inherits from our Python base runtime image, and then installs the mock and six packages inside of it. Running container-diff with the pip differ, users can see all the Python packages that have either been installed or changed as a result of this:

➜  python_upgrade cat Dockerfile
FROM gcr.io/google-appengine/python
RUN pip install -U six
➜ python_upgrade docker build -q -t python_upgrade .
sha256:7631573c1bf43727d7505709493151d3df8f98c843542ed7b299f159aec6f91f
➜ python_upgrade container-diff diff gcr.io/google-appengine/python:latest daemon://python_upgrade:latest --types=pip

-----Pip-----

Packages found only in gcr.io/google-appengine/python:latest: None

Packages found only in python_upgrade:latest:
NAME VERSION SIZE
-funcsigs 1.0.2 51.4K
-mock 2.0.0 531.2K
-pbr 3.1.1 471.1K

Version differences:
PACKAGE IMAGE1 (gcr.io/google-appengine/python:latest) IMAGE2 (python_upgrade:latest)
-six 1.8.0, 26.7K

This can be especially useful when it's unclear which packages might have been installed or changed incidentally as a result of dependency management of Python modules.

These are just a few examples. The tool currently has support for Python and Node.js packages installed via pip and npm, respectively, as well as comparison of image filesystems and Docker history. In the future, we'd like to see support added for additional runtime and language differs, including Java, Go, and Ruby. External contributions are welcome! For more information on contributing to container-diff, see this how-to guide.

Now that we've seen container-diff compare two images in action, it's easy to imagine how the tool may be integrated into larger workflows to aid in development:

  • Changelog generation: Given container-diff's capacity to facilitate investigation of filesystem and package modifications, it can do most of the heavy lifting in discerning changes for automatic changelog generation for new releases of an image.
  • Continuous integration: As part of a CI system, users can leverage container-diff to catch potentially breaking filesystem changes resulting from a Dockerfile change in their builds.

container-diff's default output mode is "human-readable," but also supports output to JSON, allowing for easy automated parsing and processing by users.

Single Image Analysis

In addition to comparing two images, container-diff has the ability to analyze a single image on its own. This can enable users to get a quick glance at information about an image, such as its system and language-level package installations and filesystem contents.

Let's take a look at our Debian base image again. We can use the tool to easily view a list of all packages installed in the image, along with each one's installed version and size:

➜  Development container-diff analyze gcr.io/google-appengine/debian8:latest

-----Apt-----
Packages found in gcr.io/google-appengine/debian8:latest:
NAME VERSION SIZE
-acl 2.2.52-2 258K
-adduser 3.113 nmu3 1M
-apt 1.0.9.8.4 3.1M
-base-files 8 deb8u9 413K
-base-passwd 3.5.37 185K
-bash 4.3-11 deb8u1 4.9M
-bsdutils 1:2.25.2-6 181K
-ca-certificates 20141019 deb8u3 367K
-coreutils 8.23-4 13.9M
-dash 0.5.7-4 b1 191K
-debconf 1.5.56 deb8u1 614K
-debconf-i18n 1.5.56 deb8u1 1.1M
-debian-archive-keyring 2017.5~deb8u1 137K

We could use this to verify compatibility with an application we're building, or maybe sort the packages by size in another one of our images and see which ones are taking up the most space.

For more information about this tool as well as a breakdown with examples, uses, and inner workings of the tool, please take a look at documentation on our GitHub page. Happy diffing!

Special thanks to Colette Torres and Abby Tisdale, our software engineering interns who helped build the tool from the ground up.

Introducing container-diff, a tool for quickly comparing container images

The Google Container Tools team originally built container-diff, a new project to help uncover differences between container images, to aid our own development with containers. We think it can be useful for anyone building containerized software, so we’re excited to release it as open source to the development community.

Containers and the Dockerfile format help make customization of an application’s runtime environment more approachable and easier to understand. While this is a great advantage of using containers in software development, a major drawback is that it can be hard to visualize what changes in a container image will result from a change in the respective Dockerfile. This can lead to bloated images and make tracking down issues difficult.

Imagine a scenario where a developer is working on an application, built on a runtime image maintained by a third-party. During development someone releases a new version of that base image with updated system packages. The developer rebuilds their application and picks up the latest version of the base image, and suddenly their application stops working; it depended on a previous version of one of the installed system packages, but which one? What version was it on before? With no currently existing tool to easily determine what changed between the two base image versions, this totally stalls development until the developer can track down the package version incompatibility.

Introducing container-diff

container-diff helps users investigate image changes by computing semantic diffs between images. What this means is that container-diff figures out on a low-level what data changed, and then combines this with an understanding of package manager information to output this information in a format that’s actually readable to users. The tool can find differences in system packages, language-level packages, and files in a container image.

Users can specify images in several formats - from local Docker daemon (using the prefix `daemon://` on the image path), a remote registry (using the prefix `remote://`), or a file in the .tar in the format exported by "docker save" command. You can also combine these formats to compute the diff between a local version of an image and a remote version. This can be useful when experimenting with new builds of an image that you might not be quite ready to push yet. container-diff supports image tarballs and the registry protocol natively, enabling it to run in environments without a Docker daemon.

Examples and Use Cases

Here is a basic Dockerfile that installs Python inside our Debian base image. Running container-diff on the base image and the new one with Python, users can see all the apt packages that were installed as dependencies of Python.


And below is a Dockerfile that inherits from our Python base runtime image, and then installs the mock and six packages inside of it. Running container-diff with the pip differ, users can see all the Python packages that have either been installed or changed as a result of this:


This can be especially useful when it’s unclear which packages might have been installed or changed incidentally as a result of dependency management of Python modules.

These are just a few examples. The tool currently has support for Python and Node.js packages installed via pip and npm, respectively, as well as comparison of image filesystems and Docker history. In the future, we’d like to see support added for additional runtime and language differs, including Java, Go, and Ruby. External contributions are welcome! For more information on contributing to container-diff, see this how-to guide.

Now that we’ve seen container-diff compare two images in action, it’s easy to imagine how the tool may be integrated into larger workflows to aid in development:
  • Changelog generation: Given container-diff’s capacity to facilitate investigation of filesystem and package modifications, it can do most of the heavy lifting in discerning changes for automatic changelog generation for new releases of an image.
  • Continuous integration: As part of a CI system, users can leverage container-diff to catch potentially breaking filesystem changes resulting from a Dockerfile change in their builds.
container-diff’s default output mode is “human-readable,” but also supports output to JSON, allowing for easy automated parsing and processing by users.

Single Image Analysis

In addition to comparing two images, container-diff has the ability to analyze a single image on its own. This can enable users to get a quick glance at information about an image, such as its system and language-level package installations and filesystem contents.

Let’s take a look at our Debian base image again. We can use the tool to easily view a list of all packages installed in the image, along with each one’s installed version and size:


We could use this to verify compatibility with an application we’re building, or maybe sort the packages by size in another one of our images and see which ones are taking up the most space.

For more information about this tool as well as a breakdown with examples, uses, and inner workings of the tool, please take a look at documentation on our GitHub page. Happy diffing!

Special thanks to Colette Torres and Abby Tisdale, our software engineering interns who helped build the tool from the ground up.

By Nick Kubala, Container Tools team