
Retrofitting Temporal Memory Safety on C++


Memory safety in Chrome is an ever-ongoing effort to protect our users. We are constantly experimenting with different technologies to stay ahead of malicious actors. In this spirit, this post is about our journey of using heap scanning technologies to improve memory safety of C++.



Let’s start at the beginning though. Throughout the lifetime of an application its state is generally represented in memory. Temporal memory safety refers to the problem of guaranteeing that memory is always accessed with the most up-to-date information about its structure and its type. C++ unfortunately does not provide such guarantees. While there is appetite for languages other than C++ with stronger memory safety guarantees, large codebases such as Chromium will use C++ for the foreseeable future.



auto* foo = new Foo();
delete foo;
// The memory location pointed to by foo is not representing
// a Foo object anymore, as the object has been deleted (freed).
foo->Process();



In the example above, foo is used after its memory has been returned to the underlying system. The out-of-date pointer is called a dangling pointer and any access through it results in a use-after-free (UAF) access. In the best case such errors result in well-defined crashes; in the worst case they cause subtle breakage that can be exploited by malicious actors.



UAFs are often hard to spot in larger codebases where ownership of objects is transferred between various components. The general problem is so widespread that to this date both industry and academia regularly come up with mitigation strategies. The examples are endless: C++ smart pointers of all kinds are used to better define and manage ownership on application level; static analysis in compilers is used to avoid compiling problematic code in the first place; where static analysis fails, dynamic tools such as C++ sanitizers can intercept accesses and catch problems on specific executions.



Chrome’s use of C++ is sadly no different here and the majority of high-severity security bugs are UAF issues. In order to catch issues before they reach production, all of the aforementioned techniques are used. In addition to regular tests, fuzzers ensure that there’s always new input to work with for dynamic tools. Chrome even goes further and employs a C++ garbage collector called Oilpan which deviates from regular C++ semantics but provides temporal memory safety where used. Where such deviation is unreasonable, a new kind of smart pointer called MiraclePtr was introduced recently to deterministically crash on accesses to dangling pointers when used. Oilpan, MiraclePtr, and smart-pointer-based solutions require significant adaptations of the application code.



Over the last years, another approach has seen some success: memory quarantine. The basic idea is to put explicitly freed memory into quarantine and only make it available when a certain safety condition is reached. In the Linux kernel a probabilistic approach was used where memory was eventually just recycled. A more elaborate approach uses heap scanning to avoid reusing memory that is still reachable from the application. This is similar to a garbage collected system in that it provides temporal memory safety by prohibiting reuse of memory that is still reachable. The rest of this article summarizes our journey of experimenting with quarantines and heap scanning in Chrome.



(At this point, one may ask where memory tagging fits into this picture – keep on reading!)

Quarantining and Heap Scanning, the Basics

The main idea behind assuring temporal safety with quarantining and heap scanning is to avoid reusing memory until it has been proven that there are no more (dangling) pointers referring to it. To avoid changing C++ user code or its semantics, the memory allocator providing new and delete is intercepted.

Upon invoking delete, the memory is actually put in a quarantine, where it is unavailable for being reused for subsequent new calls by the application. At some point a heap scan is triggered which scans the whole heap, much like a garbage collector, to find references to quarantined memory blocks. Blocks that have no incoming references from the regular application memory are transferred back to the allocator where they can be reused for subsequent allocations.
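
The quarantine-then-scan cycle described above can be sketched as a toy model. This is only an illustration under simplifying assumptions, not Chrome's implementation: the names ToyBlock, ToyQuarantineHeap and AddressOf are hypothetical, blocks are modeled as arrays of pointer-sized words, the scan is conservative (any word that equals a quarantined block's address counts as a reference), and memory is never actually returned to the system so that block states stay inspectable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_set>
#include <vector>

// Hypothetical toy model; none of these names come from Chrome.
enum class BlockState { kLive, kQuarantined, kFree };

struct ToyBlock {
  std::vector<std::uintptr_t> words;  // payload, scanned word by word
  BlockState state = BlockState::kLive;
};

// Address of a block as a raw word, the way a conservative scanner sees it.
inline std::uintptr_t AddressOf(const ToyBlock* block) {
  return reinterpret_cast<std::uintptr_t>(block);
}

class ToyQuarantineHeap {
 public:
  // Stands in for an intercepted `new`: returns a zero-filled live block.
  ToyBlock* New(std::size_t word_count) {
    blocks_.push_back(std::make_unique<ToyBlock>());
    blocks_.back()->words.assign(word_count, 0);
    return blocks_.back().get();
  }

  // Stands in for an intercepted `delete`: the block is only quarantined,
  // so it cannot be handed out again yet.
  void Delete(ToyBlock* block) { block->state = BlockState::kQuarantined; }

  // Conservative scan over live blocks. Quarantined blocks that no live
  // word refers to are released; returns how many were released.
  std::size_t Scan() {
    std::unordered_set<std::uintptr_t> seen_words;
    for (const auto& block : blocks_) {
      if (block->state != BlockState::kLive) continue;
      for (std::uintptr_t word : block->words) seen_words.insert(word);
    }
    std::size_t released = 0;
    for (auto& block : blocks_) {
      if (block->state != BlockState::kQuarantined) continue;
      if (seen_words.count(AddressOf(block.get())) == 0) {
        block->state = BlockState::kFree;  // safe to reuse from now on
        ++released;
      }
    }
    return released;
  }

 private:
  std::vector<std::unique_ptr<ToyBlock>> blocks_;
};
```

In this model, a deleted block stays quarantined for as long as some live block still stores its address, and is released by the first scan that runs after that word has been overwritten.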



There are various hardening options which come with a performance cost:

  • Overwrite the quarantined memory with special values (e.g. zero);

  • Stop all application threads when the scan is running or scan the heap concurrently;

  • Intercept memory writes (e.g. by page protection) to catch pointer updates;

  • Scan memory word by word for possible pointers (conservative handling) or provide descriptors for objects (precise handling);

  • Segregate application memory into safe and unsafe partitions to opt out certain objects which are either performance sensitive or can be statically proven as being safe to skip;

  • Scan the execution stack in addition to just scanning heap memory.



We call the collection of different versions of these algorithms StarScan [stɑː skæn], or *Scan for short.

Reality Check

We apply *Scan to the unmanaged parts of the renderer process and use Speedometer2 to evaluate the performance impact. 



We have experimented with different versions of *Scan. To minimize performance overhead as much as possible though, we evaluate a configuration that uses a separate thread to scan the heap and avoids clearing of quarantined memory eagerly on delete but rather clears quarantined memory when running *Scan. We opt in all memory allocated with new and don’t discriminate between allocation sites and types for simplicity in the first implementation.


Note that the proposed version of *Scan is not complete. Concretely, a malicious actor may exploit a race condition with the scanning thread by moving a dangling pointer from an unscanned to an already scanned memory region. Fixing this race condition requires keeping track of writes into blocks of already scanned memory, by e.g. using memory protection mechanisms to intercept those accesses, or stopping all application threads in safepoints from mutating the object graph altogether. Either way, solving this issue comes at a performance cost and exhibits an interesting performance and security trade-off. Note that this kind of attack is not generic and does not work for all UAF. Problems such as depicted in the introduction would not be prone to such attacks as the dangling pointer is not copied around.



Since the security benefits really depend on the granularity of such safepoints and we want to experiment with the fastest possible version, we disabled safepoints altogether.



Running our basic version on Speedometer2 regresses the total score by 8%. Bummer…



Where does all this overhead come from? Unsurprisingly, heap scanning is memory bound and quite expensive as the entire user memory must be walked and examined for references by the scanning thread.



To reduce the regression we implemented various optimizations that improve the raw scanning speed. Naturally, the fastest way to scan memory is to not scan it at all and so we partitioned the heap into two classes: memory that can contain pointers and memory that we can statically prove to not contain pointers, e.g. strings. We avoid scanning memory that cannot contain any pointers. Note that such memory is still part of the quarantine, it is just not scanned.



We extended this mechanism to also cover allocations that serve as backing memory for other allocators, e.g., zone memory that is managed by V8 for the optimizing JavaScript compiler. Such zones are always discarded at once (c.f. region-based memory management) and temporal safety is established through other means in V8.



On top, we applied several micro optimizations to speed up and eliminate computations: we use helper tables for pointer filtering; rely on SIMD for the memory-bound scanning loop; and minimize the number of fetches and lock-prefixed instructions.



We also improve upon the initial scheduling algorithm that just starts a heap scan when reaching a certain limit by adjusting how much time we spend in scanning compared to actually executing the application code (cf. mutator utilization in garbage collection literature).



In the end, the algorithm is still memory bound and scanning remains a noticeably expensive procedure. The optimizations helped to reduce the Speedometer2 regression from 8% down to 2%.



While we improved raw scanning time, the fact that memory sits in a quarantine increases the overall working set of a process. To further quantify this overhead, we use a selected set of Chrome’s real-world browsing benchmarks to measure memory consumption. *Scan in the renderer process regresses memory consumption by about 12%. It’s this increase of the working set that leads to more memory being paged in which is noticeable on application fast paths.


Hardware Memory Tagging to the Rescue

MTE (Memory Tagging Extension) is a new extension on the ARM v8.5A architecture that helps with detecting errors in software memory use. These errors can be spatial errors (e.g. out-of-bounds accesses) or temporal errors (use-after-free). The extension works as follows. Every 16 bytes of memory are assigned a 4-bit tag. Pointers are also assigned a 4-bit tag. The allocator is responsible for returning a pointer with the same tag as the allocated memory. The load and store instructions verify that the pointer and memory tags match. In case the tags of the memory location and the pointer do not match a hardware exception is raised.
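
The tag check can be illustrated with a small software model. This is a sketch under stated assumptions, not how the hardware or PartitionAlloc is programmed: ToyTaggedMemory, ToyTaggedPtr, SetTag and Load are hypothetical names, a pointer is modeled as an offset plus a tag, and the hardware exception is modeled as a C++ exception.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical software model of MTE-style tag checks.
constexpr std::size_t kToyGranuleBytes = 16;  // one tag per 16 bytes
constexpr std::uint8_t kToyTagMask = 0xF;     // tags are 4 bits wide

struct ToyTaggedPtr {
  std::size_t offset;  // byte offset into the toy memory
  std::uint8_t tag;    // 4-bit tag carried by the pointer
};

class ToyTaggedMemory {
 public:
  explicit ToyTaggedMemory(std::size_t bytes)
      : data_(bytes, 0),
        tags_((bytes + kToyGranuleBytes - 1) / kToyGranuleBytes, 0) {}

  // Allocator side: tags every granule covering [offset, offset + size)
  // and returns a pointer carrying the same tag. Requires size > 0.
  ToyTaggedPtr SetTag(std::size_t offset, std::size_t size, std::uint8_t tag) {
    tag &= kToyTagMask;
    const std::size_t first = offset / kToyGranuleBytes;
    const std::size_t last = (offset + size - 1) / kToyGranuleBytes;
    for (std::size_t g = first; g <= last; ++g) tags_.at(g) = tag;
    return ToyTaggedPtr{offset, tag};
  }

  // Load side: the access succeeds only if pointer and memory tags match.
  std::uint8_t Load(ToyTaggedPtr ptr) const {
    if (tags_.at(ptr.offset / kToyGranuleBytes) != ptr.tag) {
      throw std::runtime_error("tag mismatch");  // stands in for the fault
    }
    return data_.at(ptr.offset);
  }

 private:
  std::vector<std::uint8_t> data_;
  std::vector<std::uint8_t> tags_;
};
```

Retagging a granule after a free makes every older pointer to it fail the check, which is how the same mechanism catches both a stale pointer (temporal error) and a pointer that strays into a neighbouring, differently tagged granule (spatial error).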



MTE doesn't offer a deterministic protection against use-after-free. Since the number of tag bits is finite there is a chance that the tag of the memory and the pointer match due to overflow. With 4 bits, only 16 reallocations are enough to have the tags match. A malicious actor may exploit the tag bit overflow to get a use-after-free by just waiting until the tag of a dangling pointer matches (again) the memory it is pointing to.



*Scan can be used to fix this problematic corner case. On each delete call the tag for the underlying memory block gets incremented by the MTE mechanism. Most of the time the block will be available for reallocation as the tag can be incremented within the 4-bit range. Stale pointers would refer to the old tag and thus reliably crash on dereference. Upon overflowing the tag, the object is then put into quarantine and processed by *Scan. Once the scan verifies that there are no more dangling pointers to this block of memory, it is returned back to the allocator. This reduces the number of scans and their accompanying cost by ~16x.



The following picture depicts this mechanism. The pointer to foo initially has a tag of 0x0E which allows it to be incremented once again for allocating bar. Upon invoking delete for bar the tag overflows and the memory is actually put into quarantine of *Scan.
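
The same sequence can be written down as a minimal sketch. It assumes a per-slot 4-bit tag; ToySlot, ToyFree and ToyReleaseFromQuarantine are hypothetical names, and resetting the tag to zero once the scan releases the slot is an assumption made for illustration, since the text does not say which tag a released block receives.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of combining tag increments with a quarantine.
constexpr std::uint8_t kToyMaxTag = 0xF;  // largest 4-bit tag value

enum class ToyFreeResult { kReusable, kQuarantined };

struct ToySlot {
  std::uint8_t tag = 0;
  bool quarantined = false;
};

// Models `delete`: bump the tag while it still fits into 4 bits, so stale
// pointers keep the old tag and mismatch. On overflow, quarantine the slot
// instead of letting a tag value be reused.
inline ToyFreeResult ToyFree(ToySlot& slot) {
  if (slot.tag == kToyMaxTag) {
    slot.quarantined = true;
    return ToyFreeResult::kQuarantined;
  }
  ++slot.tag;
  return ToyFreeResult::kReusable;
}

// Models the point where a scan has found no remaining references.
// Assumption: the tag restarts at zero after release.
inline void ToyReleaseFromQuarantine(ToySlot& slot) {
  slot.tag = 0;
  slot.quarantined = false;
}
```

Starting from tag 0x0E, the first free moves the tag to 0x0F and the slot stays reusable, and the second free overflows and quarantines it. Starting from tag 0, only every sixteenth free of the same slot reaches the quarantine, which is where the roughly 16x reduction in scanning work comes from.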

We got our hands on some actual hardware supporting MTE and redid the experiments in the renderer process. The results are promising as the regression on Speedometer was within noise and we only regressed memory footprint by around 1% on Chrome’s real-world browsing stories.



Is this some actual free lunch? Turns out that MTE comes with some cost which has already been paid for. Specifically, PartitionAlloc, which is Chrome’s underlying allocator, already performs the tag management operations for all MTE-enabled devices by default. Also, for security reasons, memory should really be zeroed eagerly. To quantify these costs, we ran experiments on an early hardware prototype that supports MTE in several configurations:

  1. MTE disabled and without zeroing memory;

  2. MTE disabled but with zeroing memory;

  3. MTE enabled without *Scan;

  4. MTE enabled with *Scan;



(We are also aware that there’s synchronous and asynchronous MTE which also affects determinism and performance. For the sake of this experiment we kept using the asynchronous mode.) 

The results show that MTE and memory zeroing come with some cost, which is around 2% on Speedometer2. Note that neither PartitionAlloc nor the hardware has been optimized for these scenarios yet. The experiment also shows that adding *Scan on top of MTE comes without measurable cost.


Conclusions

C++ allows for writing high-performance applications but this comes at a price: security. Hardware memory tagging may fix some security pitfalls of C++, while still allowing high performance. We are looking forward to seeing broader adoption of hardware memory tagging in the future and suggest using *Scan on top of hardware memory tagging to fix temporal memory safety for C++. Both the used MTE hardware and the implementation of *Scan are prototypes and we expect that there is still room for performance optimizations.


Privileged pod escalations in Kubernetes and GKE



At the KubeCon EU 2022 conference in Valencia, security researchers from Palo Alto Networks presented research findings on “trampoline pods”—pods with an elevated set of privileges required to do their job, but that could conceivably be used as a jumping off point to gain escalated privileges.

The research mentions GKE, including how developers should look at the privileged pod problem today, what the GKE team is doing to minimize the use of privileged pods, and actions GKE users can take to protect their clusters.

Privileged pods within the context of GKE security

While privileged pods can pose a security issue, it’s important to look at them within the overall context of GKE security. To use a privileged pod as a “trampoline” in GKE, there is a major prerequisite – the attacker has to first execute a successful application compromise and container breakout attack.

Because the use of privileged pods in an attack requires a first step such as a container breakout to be effective, let’s look at two areas:
  1. features of GKE you can use to reduce the likelihood of a container breakout
  2. steps the GKE team is taking to minimize the use of privileged pods and the privileges needed in them.

Reducing container breakouts

There are a number of features in GKE along with some best practices that you can use to reduce the likelihood of a container breakout:

More information can be found in the GKE Hardening Guide.

How GKE is reducing the use of privileged pods

While it’s not uncommon for customers to install privileged pods into their clusters, GKE works to minimize the privilege levels held by our system components, especially those that are enabled by default. However, there are limits as to how many privileges can be removed from certain features. For example, Anthos Config Management requires permissions to modify most Kubernetes objects to be able to create and manage those objects.

Some other privileges are baked into the system, such as those held by Kubelet. Previously, we worked with the Kubernetes community to build the Node Restriction and Node Authorizer features to limit Kubelet's access to highly sensitive objects, such as secrets, adding protection against an attacker with access to the Kubelet credentials.

More recently, we have taken steps to reduce the number of privileged pods across GKE and have added additional documentation on privileges used in system pods as well as information on how to improve pod isolation. Below are the steps we’ve taken:
  1. We have added an admission controller to GKE Autopilot and GKE Standard (on by default) and GKE/Anthos (opt-in) that stops attempts to run as a more privileged service account, which blocks a method of escalating privileges using privileged pods.
  2. We created a permission scanning tool that identifies pods that have privileges that could be used for escalation, and we used that tool to perform an audit across GKE and Anthos.
  3. The permission scanning tool is now integrated into our standard code review and testing processes to reduce the risk of introducing privileged pods into the system. As mentioned earlier, some features require privileges to perform their function.
  4. We are using the audit results to reduce permissions available to pods. For example, we removed “update nodes and pods” permissions from anetd in GKE.
  5. Where privileged pods are required for the operation of a feature, we’ve added additional documentation to illustrate that fact.
  6. We added documentation that outlines how to isolate GKE-managed workloads in dedicated node pools when you’re unable to use GKE Sandbox to reduce the risk of privilege escalation attacks.

In addition to the measures above, we recommend users take advantage of tools that can scan RBAC settings to detect overprivileged pods used in their applications. As part of their presentation, the Palo Alto researchers announced an open source tool, called rbac-police, that can be used for the task. So, while it only takes a single overprivileged workload to trampoline to the cluster, there are a number of actions you can take to minimize the likelihood of the prerequisite container breakout and the number of privileges used by a pod.

I/O 2022: Android 13 security and privacy (and more!)

Every year at I/O we share the latest on privacy and security features on Android. But we know some users like to go a level deeper in understanding how we’re making the latest release safer, and more private, while continuing to offer a seamless experience. So let’s dig into the tools we’re building to better secure your data, enhance your privacy and increase trust in the apps and experiences on your devices.

Low latency, frictionless security

Regardless of whether a smartphone is used for consumer or enterprise purposes, attestation is a key underpinning to ensure the integrity of the device and apps running on the device. Fundamentally, key attestation lets a developer bind a secret or designate data to a device. This is a strong assertion of "same user, same device": as long as the key is available, a cryptographic assertion of integrity can be made.

With Android 13 we have migrated to a new model for the provisioning of attestation keys to Android devices which is known as Remote Key Provisioning (RKP). This new approach will strengthen device security by eliminating factory provisioning errors and providing key vulnerability recovery by moving to an architecture where Google takes more responsibility in the certificate management lifecycle for these attestation keys. You can learn more about RKP here.

We’re also making even more modules updatable directly through Google Play System Updates so we can automatically upgrade more system components and fix bugs, seamlessly, without you having to worry about it. We now have more than 30 components in Android that can be automatically updated through Google Play, including new modules in Android 13 for Bluetooth and ultra-wideband (UWB).

Last year we talked about how the majority of vulnerabilities in major operating systems are caused by undefined behavior in programming languages like C/C++. Rust is an alternative language that provides the efficiency and flexibility required in advanced systems programming (OS, networking) but Rust comes with the added boost of memory safety. We are happy to report that Rust is being adopted in security critical parts of Android, such as our key management components and networking stacks.

Hardening the platform doesn’t just stop with continual improvements with memory safety and expansion of anti-exploitation techniques. It also includes hardening our API surfaces to provide a more secure experience to our end users.

In Android 13 we implemented numerous enhancements to help mitigate potential vulnerabilities that app developers may inadvertently introduce. This includes making runtime receivers safer by allowing developers to specify whether a particular broadcast receiver in their app should be exported and visible to other apps on the device. On top of this, intent filters block non-matching intents which further hardens the app and its components.

For enterprise customers who need to meet certain security certification requirements, we’ve updated our security logging reporting to add more coverage and consolidate security logs in one location. This is helpful for companies that need to meet standards like Common Criteria and is useful for partners such as management solutions providers who can review all security-related logs in one place.

Privacy on your terms

Android 13 brings developers more ways to build privacy-centric apps. Apps can now implement a new Photo picker that allows the user to select the exact photos or videos they want to share without having to give another app access to their media library.

With Android 13, we’re also reducing the number of apps that require your location to function using the nearby devices permission introduced last year. For example, you won’t have to turn on location to enable Wi-Fi for certain apps and situations. We’ve also changed how storage works, requiring developers to ask for separate permissions to access audio, image and video files.

Previously, we’ve limited apps from accessing your clipboard in the background and alerted you when an app accessed it. With Android 13, we’re automatically deleting your clipboard history after a short period so apps are blocked from seeing old copied information.

In Android 11, we began automatically resetting permissions for apps you haven’t used for an extended period of time, and have since expanded the feature to devices running Android 6 and above. Since then, we’ve automatically reset over 5 billion permissions.

In Android 13, app makers can go above and beyond in removing permissions even more proactively on behalf of their users. Developers will be able to provide even more privacy by reducing the time their apps have access to unneeded permissions.

Finally, we know notifications are critical for many apps but are not always of equal importance to users. In Android 13, you’ll have more control over which apps you would like to get alerts from, as new apps on your device are required to ask you for permission by default before they can send you notifications.

Apps you can trust

Most app developers build their apps using a variety of software development kits (SDKs) that bundle in pre-packaged functionality. While SDKs provide amazing functionality, app developers typically have little visibility or control over the SDK code or insight into their performance.

We’re working with developers to make their apps more secure with a new Google Play SDK Index that helps them see SDK safety and reliability signals before they build the code into their apps. This ensures we're helping everyone build a more secure and private app ecosystem.

Last month, we also started rolling out a new Data safety section in Google Play to help you understand how apps plan to collect, share, and protect your data, before you install them. To instill even more trust in Play apps, we're enabling developers to have their apps independently validated against OWASP’s MASVS, a globally recognized standard for mobile app security.

We’re working with a small group of developers and authorized lab partners to evolve the program. Developers who have completed this independent validation can showcase this on their Data safety section.

Additional mobile security and safety

Just like our anti-malware protection Google Play Protect, which now scans 125 billion apps a day, we believe spam and phishing detection should be built in. We’re proud to announce that in a recent analyst report, Messages was the highest rated built-in messaging app for anti-phishing and scams protection.

Messages is now also helping to protect you against 1.5 billion spam messages per month, so you can avoid both annoying texts and attempts to access your data. These phishing attempts are increasingly how bad actors are trying to get your information, by getting you to click on a link or download an app, so we are always looking for ways to offer another line of defense.

Last year, we introduced end-to-end encryption in Messages to provide more security for your mobile conversations. Later this year, we’ll launch end-to-end encrypted group conversations in beta to ensure your personal messages get even more protection.

As with a lot of features we build, we try to do it in an open and transparent way. In Android 11 we announced a new platform feature that was backed by an ISO standard to enable the use of digital IDs on a smartphone in a privacy-preserving way. When you hand over your plastic license (or other credential) to someone for verification, it’s all or nothing, which means they have access to your full name, date of birth, address, and other personally identifiable information (PII). The mobile version of this allows for much more fine-grained control where the end user and/or app can select exactly what to share with the verifier. In addition, the verifier must declare whether they intend to retain the data returned. You can also present certain details of your credentials, such as age, without revealing your identity.

Over the last two Android releases we have been improving this API and making it easier for third-party organizations to leverage it for various digital identity use cases, such as driver’s licenses, student IDs, or corporate badges. We’re now announcing that Google Wallet uses Android Identity Credential to support digital IDs and driver’s licenses. We’re working with states in the US and governments around the world to bring digital IDs to Wallet later this year. You can learn more about all of the new enhancements in Google Wallet here.

Protected by Android

We don’t think your security and privacy should be hard to understand and control. Later this year, we’ll begin rolling out a new destination in settings on Android 13 devices that puts all your device security and data privacy front and center.

The new Security & Privacy settings page will give you a simple, color-coded way to understand your safety status and will offer clear and actionable guidance to improve it. The page will be anchored by new action cards that notify you of critical steps you should take to address any safety risks. In addition to notifications to warn you about issues, we’ll also provide timely recommendations on how to enhance your privacy.

We know that to feel safe and in control of your data, you need to have a secure foundation you can count on. Because if your device isn’t secure, it’s not private either. We’re working hard to make sure you’re always protected by Android. Learn more about these protections on our website.

Taking on the Next Generation of Phishing Scams

 

Every year, security technologies improve: browsers get better, encryption becomes ubiquitous on the Web, authentication becomes stronger. But phishing persistently remains a threat (as shown by a recent phishing attack on the U.S. Department of Labor) because users retain the ability to log into their online accounts, often with a simple password, from anywhere in the world. It’s why today at I/O we announced new ways we’re reducing the risks of phishing by: scaling phishing protections to Google Docs, Sheets and Slides, continuing to auto enroll people in 2-Step Verification and more. This blog will deep dive into the method of phishing and how it has evolved today.

As phishing adoption has grown, multi-factor authentication has become a particular focus for attackers. In some cases, attackers phish SMS codes directly, by following a legitimate "one-time passcode" (triggered by the attacker trying to log into the victim's account) with a spoofed message asking the victim to "reply back with the code you just received."


Left: legitimate Google SMS verification. Right: spoofed message asking victim to share verification code.


In other cases, attackers have leveraged more sophisticated dynamic phishing pages to conduct relay attacks. In these attacks, a user thinks they're logging into the intended site, just as in a standard phishing attack. But instead of deploying a simple static phishing page that saves the victim's email and password when the victim tries to login, the phisher has deployed a web service that logs into the actual website at the same time the user is falling for the phishing page.

The simplest approach is an almost off-the-shelf "reverse proxy" which acts as a "person in the middle", forwarding the victim's inputs to the legitimate page and sending the response from the legitimate page back to the victim's browser.



These attacks are especially challenging to prevent because additional authentication challenges shown to the attacker—like a prompt for an SMS code—are also relayed to the victim, and the victim's response is in turn relayed back to the real website. In this way, the attacker can count on their victim to solve any authentication challenge presented.

Traditional multi-factor authentication with PIN codes can only do so much against these attacks, and authentication with smartphone approvals via a prompt — while more secure against SIM-swap attacks — is still vulnerable to this sort of real-time interception.

The Solution Space

Over the past year, we've started to automatically enable device-based two-factor authentication for our users. This authentication not only helps protect against traditional password compromise but, with technology improvements, we can also use it to help defend against these more sophisticated forms of phishing.

Taking a broad view, most efforts to protect and defend against phishing fall into the following categories:
  • Browser UI improvements to help users identify authentic websites.
  • Password managers that can validate the identity of the web page before logging in.
  • Phishing detection, both in email—the most common delivery channel—and in the browser itself, to warn users about suspicious web pages.
  • Preventing the person-in-the-middle attacks mentioned above by preventing automated login attempts.
  • Phishing-resistant authentication using FIDO with security keys or a Bluetooth connection to your phone.
  • Hardening the Google Prompt challenge to help users identify suspicious sign-in attempts, or to ask them to take additional steps that can defeat phishing (like navigating to a new web address, or to join the same wireless network as the computer they're logging into).

Expanding phishing-resistant authentication to more users


Over the last decade we’ve been working hard with a number of industry partners on expanding phishing-resistant authentication mechanisms, as part of the FIDO Alliance. Through these efforts we introduced physical FIDO security keys, such as the Titan Security Key, which prevent phishing by verifying the identity of the website you're logging into. (This verification protects against the "person-in-the-middle" phishing described above.) Recently, we announced a major milestone with the FIDO Alliance, Apple and Microsoft by expanding our support for the FIDO Sign-in standards, helping to launch us into a truly passwordless, phishing-resistant future.

Even though security keys work great, we don't expect everyone to add one to their keyring.



Instead, to make this level of security more accessible, we're building it into mobile phones. Unlike physical FIDO security keys, which need to be connected to your device via USB, your phone uses Bluetooth to ensure it is close to the device you're logging into. Like physical security keys, this helps prevent a distant attacker from tricking you into approving a sign-in on their browser, giving us an added layer of security against the kind of "person in the middle" attacks that can still work against SMS or Google Prompt.

(But don't worry: this doesn't allow computers within Bluetooth range to login as you—it only grants that approval to the computer you're logging into. And we only use this to verify that your phone is near the device you're logging into, so you only need to have Bluetooth on during login.)

Over the next couple of months we’ll be rolling out this technology in more places, which you might notice as a request for you to enable Bluetooth while logging in, so we can perform this additional security check. If you've signed into your Google account on your Android phone, we can enroll your phone automatically—just like with Google Prompt—allowing us to give this added layer of security to many of our users without the need for any additional setup.

But unfortunately this secure login doesn't work everywhere—for example, when logging into a computer that doesn't support Bluetooth, or a browser that doesn't support security keys. That's why, if we are to offer phishing-resistant security to everyone, we have to offer backups when security keys aren't available—and those backups must also be secure enough to prevent attackers from taking advantage of them.


Hardening existing challenges against phishing

Over the past few months, we've started experimenting with making our traditional Google Prompt challenges more phishing resistant.

We already use different challenge experiences depending on the situation—for example, sometimes we ask the user to match a PIN code with what they're seeing on the screen in addition to clicking "allow" or "deny". This can help prevent static phishing pages from tricking you into approving a challenge.

We've also begun experimenting with more involved challenges for higher-risk situations, including more prominent warnings when we see you logging in from a computer that we think might belong to a phisher, or asking you to join your phone to the same Wi-Fi network as the computer you're logging into so we can be sure the two are near each other. Similar to our use of Bluetooth for Security Keys, this prevents an attacker from tricking you into logging into a "person-in-the-middle" phishing page.


Bringing it all together

Of course, while all of these options dramatically increase account security, we also know that they can be a challenge for some of our users, which is why we're rolling them out gradually, as part of a risk-based approach that also focuses on usability. If we think an account is at a higher risk, or if we see abnormal behavior, we're more likely to use these additional security measures.

Over time, as FIDO2 authentication becomes more widely available, we expect to be able to make it the default for many of our users, and to rely on stronger versions of our existing challenges like those described above to provide secure fallbacks.

All these new tools in our toolbox—detecting browser automation to prevent "person in the middle" attacks, warning users in Chrome and Gmail, making the Google Prompt more secure, and automatically enabling Android phones as easy-to-use Security Keys—work together to allow us to better protect our users against phishing.

Phishing attacks have long been seen as a persistent threat, but these recent developments give us the ability to really move the needle and help more of our users stay safer online.

The Package Analysis Project: Scalable detection of malicious open source packages

Despite open source software’s essential role in all software built today, it’s far too easy for bad actors to circulate malicious packages that attack the systems and users running that software. Unlike mobile app stores that can scan for and reject malicious contributions, package repositories have limited resources to review the thousands of daily updates and must maintain an open model where anyone can freely contribute. As a result, malicious packages like ua-parser-js and node-ipc are regularly uploaded to popular repositories despite the repositories’ best efforts, with sometimes devastating consequences for users.

Google, a member of the Open Source Security Foundation (OpenSSF), is proud to support the OpenSSF’s Package Analysis project, which is a welcome step toward helping secure the open source packages we all depend on. The Package Analysis program performs dynamic analysis of all packages uploaded to popular open source repositories and catalogs the results in a BigQuery table. By detecting malicious activities and alerting consumers to suspicious behavior before they select packages, this program contributes to a more secure software supply chain and greater trust in open source software. The program also gives insight into the types of malicious packages that are most common at any given time, which can guide decisions about how to better protect the ecosystem.

To better understand how the Package Analysis program is contributing to supply chain security, we analyzed the nearly 200 malicious packages it captured over a one-month period. Here’s what we discovered: 

Results

All signals collected are published in our BigQuery table. Using simple queries on this table, we found around 200 meaningful results from the packages uploaded to NPM and PyPI in a period of just over a month. Here are some notable examples, with more available in the repository.

PyPI: discordcmd
This Python package will attack the desktop client for Discord on Windows. It was found by spotting the unusual requests to raw.githubusercontent.com, Discord API, and ipinfo.io.

First, it downloaded a backdoor from GitHub and installed it into the Discord electron client.

Next, it looked through various local databases for the user's Discord token.


Finally, it grabbed the data associated with the token from the Discord API and exfiltrated it back to a Discord server controlled by the attacker.

NPM: @roku-web-core/ajax

During install, this NPM package exfiltrates details of the machine it is running on and then opens a reverse shell, allowing the remote execution of commands.
This package was discovered from its requests to an attacker-controlled address.

Dependency Confusion / Typosquatting

The vast majority of the malicious packages we detected are dependency confusion and typosquatting attacks.


The packages we found usually contain a simple script that runs during an install and calls home with a few details about the host. These packages are most likely the work of security researchers looking for bug bounties, since most are not exfiltrating meaningful data except the name of the machine or a username, and they make no attempt to disguise their behavior.
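
Packages of this kind tend to leave simple signals in analysis output, such as the hosts contacted during install and the version number of a first release. The sketch below shows how such signals might be combined into a filter. It is illustrative only: the `AnalysisRecord` fields, the `suspicion_reasons` helper, and the version threshold are hypothetical and do not reflect the Package Analysis schema or its actual queries.

```python
from dataclasses import dataclass, field

# Callback services named in this post. A real deny-list would be longer
# and maintained separately (assumption).
CALLBACK_DOMAINS = ("burpcollaborator.net", "pipedream.com", "interact.sh")


@dataclass
class AnalysisRecord:
    """Hypothetical, simplified summary of one analyzed package version."""
    name: str
    version: str
    previous_versions: int  # number of releases published before this one
    contacted_hosts: list = field(default_factory=list)


def major_version(version):
    """Return the leading numeric component of a version such as 'v99.10.9'."""
    head = version.lstrip("vV").split(".")[0]
    return int(head) if head.isdigit() else 0


def suspicion_reasons(record, max_first_major=3):
    """List the heuristic signals a record trips; an empty list means none."""
    reasons = []
    for host in record.contacted_hosts:
        # Match the domain itself or any subdomain of it.
        if any(host == d or host.endswith("." + d) for d in CALLBACK_DOMAINS):
            reasons.append("contacts callback domain: " + host)
    # A brand-new package that starts at a high major version is unusual.
    if record.previous_versions == 0 and major_version(record.version) > max_first_major:
        reasons.append("first release has high version: " + record.version)
    return reasons
```

Signals like these are cheap to compute but only suggestive, so a flagged package would still need human review.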


These dependency confusion attacks were discovered through the domains they used, such as burpcollaborator.net, pipedream.com, interact.sh, which are commonly used for reporting back attacks. The same domains appear across unrelated packages and have no apparent connection to the packages themselves. Many packages also used unusual version numbers that were high (e.g. v5.0.0, v99.10.9) for a package with no previous versions.

Conclusions

The short time frame and low sophistication needed for finding the results above underscore the challenge facing open source package repositories. While many of the results above were likely the work of security researchers, any one of these packages could have done far more to hurt the unfortunate victims who installed them.

These results show the clear need for more investment in vetting packages being published in order to keep users safe. This is a growing space, and having an open standard for reporting would help centralize analysis results and offer consumers a trusted place to assess the packages they’re considering using. Creating an open standard should also foster healthy competition, promote integration, and raise the overall security of open source packages.
 
Over time we hope that the Package Analysis program will offer comprehensive knowledge about the behavior and capabilities of packages across open source software, and help guide the future efforts needed to make the ecosystem more secure for everyone. To get involved, please check out the GitHub Project and Milestones for opportunities to contribute.

How we fought bad apps and developers in 2021

Providing a safe experience to billions of users continues to be one of the highest priorities for Google Play. Last year we introduced multiple privacy focused features, enhanced our protections against bad apps and developers, and improved SDK data safety. In addition, Google Play Protect continues to scan billions of installed apps each day across billions of devices to keep people safe from malware and unwanted software.

We continue to enhance our machine learning systems and review processes, and in 2021 we blocked 1.2 million policy violating apps from being published on Google Play, preventing billions of harmful installations. We also continued in our efforts to combat malicious and spammy developers, banning 190k bad accounts in 2021. In addition, we have closed around 500k developer accounts that are inactive or abandoned.

In May we announced our new Data safety section for Google Play where developers will be required to give users deeper insight into the privacy and security practices of the apps they download, and provide transparency into the data the app may collect and why. The Data safety section launched this week, and developers are required to complete this section for their apps by July 20th.

We’ve also invested in making life easier for our developers. We added the Policy and Programs section to Google Play Console to help developers manage all their app compliance issues in one central location. This includes the ability to appeal a decision and track its status from this page.

In addition, we continued to partner with SDK developers to improve app safety, limit how user data is shared, and improve lines of communication with app developers. SDKs provide functionality for app developers, but it can sometimes be tricky to know when an SDK is safe to use. Last year, we engaged with SDK developers to build a safer Android and Google Play ecosystem. As a result of this work, SDK developers have improved the safety of SDKs used by hundreds of thousands of apps impacting billions of users. This remains a huge investment area for our team, and we will continue in our efforts to make SDKs safer across the ecosystem.

Limiting access

The best way to ensure users' data stays safe is to limit access to it in the first place.

As a result of new platform protections and policies, developer collaboration and education, 98% of apps migrating to Android 11 or higher have reduced their access to sensitive APIs and user data. We've also significantly reduced the unnecessary, dangerous, or disallowed use of Accessibility APIs in apps migrating to Android 12, while preserving the functionality of legitimate use cases.

We also continued in our commitment to make Android a great place for families. Last year we disallowed the collection of Advertising ID (AAID) and other device identifiers from all users in apps solely targeting children, and gave all users the ability to delete their Advertising ID entirely, regardless of the app.

Pixel enhancements

For Pixel users, we had even more great features to help keep you safe. Our new Security hub helps protect your phone, apps, Google Account, and passwords by giving you a central view of your device’s current configuration. Security hub also provides recommendations to improve your security, helping you decide what settings best meet your needs.

In addition, Pixels now use new machine learning models that improve the detection of malware in Google Play Protect. The detection runs on your Pixel, and uses a privacy preserving technology called federated analytics to discover bad apps.

Our global teams are dedicated to keeping our billions of users safe, and look forward to many exciting announcements in 2022.

How to SLSA Part 3 – Putting it all together


In our last two posts (1,2) we introduced a fictional example of Squirrel, Oppy, and Acme learning to SLSA and covered the basics and details of how they’d use SLSA for their organizations. Today we’ll close out the series by exploring how each organization pulls together the various solutions into a heterogeneous supply chain.

As a reminder, Acme is trying to produce a container image that contains three artifacts:
  1. The Squirrel package ‘foo’
  2. The Oppy package ‘baz’
  3. A custom executable, ‘bar’, written by Acme employees.
The process starts with ‘foo’ package authors triggering a build using GitHub Actions. This results in a new version of ‘foo’ (an artifact with hash ‘abc’) being pushed to the Squirrel repo along with its SLSA provenance (signed by Fulcio) and source attestation. When Squirrel gets this push request it verifies the artifact against the specific policy for ‘foo’ which checks that it was built by GitHub Actions from the expected source repository. After the artifact passes the policy check a VSA is created and the new package, its original SLSA provenance, and the VSA are made public in the Squirrel repo, available to all users of package ‘foo’.

Next the maintainers of the Oppy ‘baz’ package trigger a new build using the Oppy Autobuilder. This results in a new version of ‘baz’ (an artifact with hash ‘def’) being pushed to a public Oppy repo with the SLSA provenance (signed by their org-specific keys) published to Rekor. When the repo gets the push request it makes the artifact available to the public. The repo does not perform any verification at this time.

An Acme employee then makes a change to their Dockerfile, sending it for review by their co-worker, who approves the change and merges the PR. This then causes the Acme builder to trigger a build. During this build:
  • bar is compiled from source code stored in the same source repo as the Dockerfile.
  • acorn install downloads ‘foo’ from the Squirrel repo, verifying the VSA, and recording the use of acorn://[email protected] and its VSA in the build.
  • acme_oppy_get install (a custom script made by Acme) downloads the latest version of the Oppy ‘baz’ package and queries its SLSA provenance and other attestations from Rekor. It then performs a full verification, checking that the package was built by ‘https://oppy.example/slsa/builder/v1’ and signed with the publicized key. Once verification is complete it records the use of oppy://[email protected] and the associated attestations in the build.
  • The build process assembles the SLSA provenance for the container by:

Once the container is ready for release the Acme verifier checks the SLSA provenance (and other data in the in-toto bundle) using the policy from their own policy repo and issues a VSA. The VSA and all associated attestations are then published to an internal Rekor instance. Acme can then create an SBOM for the container leveraging data about the build as stored in Rekor. Acme then publishes the container image, the VSA, and the SBOM on Dockerhub.

Downstream users of this Acme container can then check the Acme-issued VSA. If there are any problems, Acme can consult their internal Rekor instance to get more details on the build, allowing Acme to trace all of their dependencies back to source code and the systems used to create them.

Conclusion

With SLSA implemented in the ways described in this series, downstream users are protected from many of the threats affecting the software supply chain today. While users still need to trust certain parties, the number of systems requiring trust is much lower and users are in a much better position to investigate any issues that arise.

We’d love to see the ideas in this series implemented, refuted, or used as a foundation to build even stronger solutions. We’d also love to hear some other methods on how to solve these issues. Show us how you like to SLSA. 

How to SLSA Part 2 – The Details


In our last post we introduced a fictional example of Squirrel, Oppy, and Acme learning to use SLSA and covered the basics of what their implementations might look like. Today we’ll cover the details: where to store attestations and policies, what policies should check, and how to handle key distribution and trust.

Attestation storage

Attestations play a large role in SLSA and it’s essential that consumers of artifacts know where to find the attestations for those artifacts.

Co-located in repo

Attestations could be colocated in the repository that hosts the artifact. This is how Squirrel plans to store attestations for packages. They even want to add support to the Squirrel CLI (e.g. acorn get-attestations [email protected]).

Acme really likes this approach because the attestations are always available and it doesn’t introduce any new dependencies.

Rekor

Meanwhile, Oppy plans to store attestations in Rekor. They like being able to direct users to an existing public instance without having to maintain any new infrastructure themselves, and they value the defense in depth the transparency log provides against tampering with the attestations.

Though the latency of querying attestations from Rekor is likely too high for doing verification at time of use, Oppy isn’t too concerned since they expect users to query Rekor at install time.

Hybrid

A hybrid model is also available where the publisher stores the attestations in Rekor as well as co-located with the artifact in the repo—along with Rekor’s inclusion proof. This provides confidence the data was added to Rekor while providing the benefits of co-locating attestations in the repository.

Policy content

‘Policy’ refers to the rules used to determine if an artifact should be allowed for a use case.

Policies often use the package name as a proxy for determining the use case. For example, to find the policy to apply, you could look it up using the package name of the artifact you’re evaluating.

Policy specifics may vary based on ease of use, availability of data, risk tolerance and more. Full verification needs more from policies than delegated verification does.

Default policy

Default policies allow admission decisions without the need to create specific policies for each package. A default policy is a way of saying “anything that doesn’t have a more specific policy must comply with this policy”.

Squirrel plans to eventually implement a default policy of “any package without a more specific policy will be accepted as long as it meets SLSA 3”, but they recognize that most packages don’t support this yet. Until they achieve critical mass they’ll have a default SLSA 0 policy (all artifacts are accepted).

While Oppy is leaving verification to their users, they’ll suggest a default policy of “any package built by ‘https://oppy.example/slsa/builder/v1’”.
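
The fallback from a specific policy to a default one can be sketched as a simple lookup. This is a minimal illustration, assuming a hypothetical in-memory policy store and the `SLSA_L0` to `SLSA_L4` level names used in the policy examples in this post; `policy_for` and `is_admitted` are made-up helper names, not part of any Squirrel tooling.

```python
# Numeric ordering for the level names used in the policy examples.
SLSA_LEVELS = {"SLSA_L0": 0, "SLSA_L1": 1, "SLSA_L2": 2, "SLSA_L3": 3, "SLSA_L4": 4}

# Hypothetical policy store: package-specific policies keyed by scope.
SPECIFIC_POLICIES = {
    "acorn://foo": {"target_level": "SLSA_L4"},
}

# Squirrel's interim default: accept everything (SLSA 0).
DEFAULT_POLICY = {"target_level": "SLSA_L0"}


def policy_for(scope, specific=SPECIFIC_POLICIES, default=DEFAULT_POLICY):
    """Return the specific policy for a scope, or the default if none exists."""
    return specific.get(scope, default)


def is_admitted(scope, artifact_level):
    """Check an artifact's verified level against whichever policy applies."""
    required = policy_for(scope)["target_level"]
    return SLSA_LEVELS[artifact_level] >= SLSA_LEVELS[required]
```

Raising the default later (for example to SLSA 3) would only require changing `DEFAULT_POLICY`, which is the appeal of keeping the default separate from per-package rules.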

Specific policy

Squirrel also plans to allow users to create policies for specific packages. For example, this policy requires that package ‘foo’ must have been built by GitHub Actions, from github.com/foo/acorn-foo, and be SLSA 4.

scope: 'acorn://foo'
target_level: SLSA_L4
allow_github_actions {
  workflow: 'https://github.com/gossts/slsa-acorn/.github/workflows/[email protected]'
  source_repo: 'https://github.com/foo/acorn-foo.git'
  allow_branch: 'main'
}
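
To make the checks implied by such a policy concrete, here is a rough sketch of how a verifier might compare provenance against it. The `ProvenanceSummary` type is a hypothetical, flattened view of provenance whose signature has already been checked elsewhere, the workflow URL in the usage note is a placeholder, and none of this is Squirrel's actual verifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecificPolicy:
    """Mirrors the fields of the example policy above."""
    scope: str
    target_level: int
    workflow: str
    source_repo: str
    allow_branch: str


@dataclass(frozen=True)
class ProvenanceSummary:
    """Hypothetical flattened view of already signature-checked provenance."""
    workflow: str
    source_repo: str
    branch: str
    slsa_level: int


def policy_failures(policy, prov):
    """Return a list of human-readable mismatches; empty means the policy passes."""
    failures = []
    if prov.workflow != policy.workflow:
        failures.append("unexpected workflow: " + prov.workflow)
    if prov.source_repo != policy.source_repo:
        failures.append("unexpected source repo: " + prov.source_repo)
    if prov.branch != policy.allow_branch:
        failures.append("unexpected branch: " + prov.branch)
    if prov.slsa_level < policy.target_level:
        failures.append("level %d is below target %d" % (prov.slsa_level, policy.target_level))
    return failures
```

For instance, with a policy whose `workflow` is the placeholder `'https://example.com/workflow'`, provenance built from a fork such as `https://github.com/not-foo/acorn-foo.git` would yield a single "unexpected source repo" failure. Returning every mismatch rather than a bare boolean makes rejections easier to debug.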


Squirrel will also allow packages to create SLSA 0 policies if they’re not using SLSA compliant infrastructure.


scope: 'acorn://qux'
target_level: SLSA_L0



Policy auto generation

Squirrel has an enormous number of existing packages. It’s not feasible to get all those package maintainers to create specific policies themselves. Therefore, Squirrel plans to leverage process mining to auto generate policies for packages based on the history of the package. E.g. “The last 10 times Squirrel package foo was published it was built by GitHub Actions from github.com/foo/acorn-foo, and met SLSA 4 (this is the policy above). Let’s create a policy that requires that and send it to the maintainers to review.”
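
As a very rough stand-in for the process mining described above, the sketch below proposes a policy only when the last N releases agree on builder and source repository. The release-history record format and the `propose_policy` helper are hypothetical, and a real implementation would likely tolerate more variation than this exact-match rule does.

```python
def propose_policy(scope, history, n=10):
    """Propose a policy from the last n releases, or None if they disagree.

    `history` is a list of dicts, oldest first, each with the hypothetical
    keys 'builder', 'source_repo', and 'slsa_level' (an int).
    """
    recent = history[-n:]
    if len(recent) < n:
        return None  # not enough history to infer anything
    builders = {r["builder"] for r in recent}
    repos = {r["source_repo"] for r in recent}
    if len(builders) != 1 or len(repos) != 1:
        return None  # releases were not built consistently
    return {
        "scope": scope,
        "builder": builders.pop(),
        "source_repo": repos.pop(),
        # Require only the level that every recent release actually met.
        "target_level": min(r["slsa_level"] for r in recent),
    }
```

The proposal would then be sent to the maintainers for review rather than enforced automatically.
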
Policy add-ons

Policy evaluation could do more than just evaluate the SLSA requirements. The same policies that check SLSA requirements are well placed to check other properties that are important to organizations like “was static analysis performed”, “are there any known CVEs in this artifact”, “was integration testing successful”, etc…

Acme is really interested in some of these policy add-ons. They’d like to avoid the embarrassing situation of publishing a new container image with known CVEs. They’re not sure how to implement it yet but they’ll be on the lookout for tools that can help them do so.

Delegated policies

When using delegated verification there’s much less that actually needs to be checked, and the checks can be hard-coded directly in tooling. A minimal delegated verification policy might be “allow if trusted-party verified this artifact (identified by digest) as <package name>”. This can be tightened further by adding requirements on the SLSA levels of the artifact and its dependencies (data which is available in the VSA). For example, “allow if trusted-party verified this artifact as <package name> at SLSA 3 and it doesn’t have any dependencies less than SLSA 2”.

# Delegated verification implicitly checks that the package name we're
# checking matches the VSA's subject.name field.
allow_delegated_verification {
  trusted_verifier: 'https://delegatedverifier.com/slsa/v1'
  minimum_level: SLSA_L3
  minimum_dependency_level: SLSA_L2
}
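
The delegated policy above boils down to a handful of comparisons against the VSA. The following sketch assumes the VSA signature has already been verified and that its contents have been flattened into a plain dict; the key names (`verifier_id`, `subject_name`, `subject_sha256`, `result`, `level`, `dependency_levels`) are hypothetical simplifications, not the VSA schema.

```python
def allow_delegated(vsa, package_name, artifact_sha256,
                    trusted_verifier, minimum_level, minimum_dependency_level):
    """Return True if a (signature-checked) VSA satisfies the delegated policy."""
    return (
        vsa["verifier_id"] == trusted_verifier
        # The VSA must be about this package and this exact artifact.
        and vsa["subject_name"] == package_name
        and vsa["subject_sha256"] == artifact_sha256
        and vsa["result"] == "PASSED"
        and vsa["level"] >= minimum_level
        # Every dependency must meet the minimum dependency level.
        and all(lvl >= minimum_dependency_level for lvl in vsa["dependency_levels"])
    )
```

Because the trusted verifier has already evaluated the full provenance, the consumer never needs the builder keys or the package-specific policy, only the verifier's ID and key.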


Policy storage


When using specific (non-default) policies, verifiers need to know where to find the policy they need to evaluate.

Co-located in repo

Squirrel plans to store specific policies as a property of the package in the repository. This makes them very easy for users and their tooling to find. It also allows the maintainer of the package to easily set the policy (they already have write permissions!).

A potential downside is that the write permissions are the same as for the package itself. An attacker that compromises the developer’s credentials could also change the policy. This may not be as bad as it seems. Policies are human-readable so anyone paying attention would notice that package foo’s policy now says that it can be built from github.com/not-foo/acorn-foo. Squirrel plans to notify interested parties (including the maintainer!) when the policy changes, potentially letting them “sound the alarm” if anything nefarious happens.

A similar approach is taken in a number of contact-change workflows. For example, when you change your address with your bank, the bank will send you an email (and a letter to the old address) letting you know the address has been changed. This type of notification would alert the maintainer to a potential compromise.

Squirrel would also consider requiring a second person to review any policy changes for packages with over 10,000 users.

Public canonical Git repo

Another option might be to create a canonical git repo (e.g. github.com/slsa-framework/slsa-acorn-policies) and let people publish proposed policies there. This has the advantage of using an access control mechanism separate from the package repository itself. The disadvantages are that it is difficult to ensure the author of a policy is actually allowed to set the policy for that package, and that the approach does not scale well as the repo grows.

The approach outlined in policy auto generation could help here. Automation in the repo could just look at the last N releases of the package and determine if the proposed policy matches what’s actually been published. Proactive changes to the policy (like deciding to switch from GitHub Actions to CircleCI) would be harder to coordinate however.

Org specific repo

Acme plans to establish their own org specific repo for policy storage. This gives them a single place to store all their policies, regardless of ecosystem type, and lets them provide more specific policies for packages provided by upstream repos. Since Oppy doesn’t have any plans to provide package-specific policies this gives Acme a place to store their own policies for Oppy packages (if they ever get around to it).

Organizations can also use their policy repo to vet any upstream changes to policy and potentially add additional checks (e.g. “doesn’t have any known vulnerabilities”).


Trusted Verifier


Acme wants to use delegated verification and that relies on having trusted verifiers to make decisions for downstream users. Who are these trusted verifiers?

Public verifier

A public repo is in a great position to act as a trusted verifier for their users. Users already trust these repos and they may already be doing verification on import.

Squirrel plans to make use of this by making VSAs available for each artifact published, publicizing their verifier ID (i.e. ‘https://squirrel.example/slsa-verifier’) and the public key used to sign the VSAs. They even plan to build VSA verification directly into the Squirrel tooling, so that users can get SLSA protection by default.

Org-wide verifier

While Acme is happy to use Squirrel’s verifier (and the verification built into the tooling) they still need their own verifier so they can publish VSAs to Acme customers. So Acme plans to stand up their own verification service and publish their verifier ID (i.e. ‘https://acme.example/private-verifier’) and signing key. Acme customers can then verify the software they get from Acme.

In the future Acme could require all software used throughout the company to be verified with this verifier (instead of relying on public verifiers). They’d do the verification and generate VSAs whenever artifacts are imported into their private Artifactory instance. They could then configure this ID/key pair for use throughout Acme and be confident that any software used has been verified according to Acme policy. That’s not Acme’s highest priority at the moment, but they like having this option open to them.

Key distribution & Trust

Both full and delegated verification depend upon key distribution to the users doing the verification. Depending on the specifics and what’s getting verified this can be a difficult problem.

Org-specific keys

When using delegated verification this could be the easiest case. Squirrel can just build the key they used for delegated verification directly into the Squirrel tooling. Acme can also fairly easily configure the use of their keys throughout the company using existing configuration control mechanisms.

When using full verification this can be harder. If there are multiple builders that could be accepted the keys that sign the attestations need to be distributed to everyone that might use that builder. For Squirrel this would be really difficult since they plan to allow package maintainers to use whatever builder they want. How those keys get configured would be tricky just for Squirrel, and much more difficult if downstream Squirrel users wanted to do full verification of the Squirrel packages.

The situation is easier, however, for Oppy. That’s because Oppy plans to only accept artifacts built by their autobuilder network. Oppy can configure this network to use a single key (or a small set of keys) and then publish those keys (and the SLSA level Oppy believes it meets) for downstream users.

Fulcio

Squirrel plans to solve the problem of which keys they accept by leveraging Fulcio. Squirrel will build support for Fulcio root keys into their verifier and then express which Fulcio subject is allowed to sign attestations in the specific policy of each package. E.g. “Squirrel package ‘foo’ must have been built & signed by ‘spiffe://foobar.com/foo-builder’, from github.com/foo/acorn-foo, and be SLSA 4”.

scope: 'acorn://foo'
target_level: SLSA_L4
allow_fulcio_builder {
  id: 'spiffe://foobar.com/foo-builder'
  source_repo: 'https://github.com/foo/acorn-foo.git'
  allow_branch: 'main'
  allow_entrypoint: 'package.json'
}



The Update Framework (TUF)

The above methods could be further enhanced with TUF to allow the secure maintenance of keys. TUF metadata could include all the SLSA keys, the build services and other entities they’re valid for, and the SLSA levels they’re qualified at. Oppy is considering using TUF to let verifiers securely fetch and update keys used by the Autobuilder network. Oppy would use a TUF delegation to indicate that these keys should only be used for the builder id ‘https://oppy.example/slsa/builder/v1’. Squirrel might do something similar to allow for updating the Fulcio key in its tooling.

Recording & verifying dependencies

Acme wants to verify the dependencies that go into its container and record them in the SLSA provenance. Acme would prefer that this functionality were built into their build service, but that feature isn’t available yet. Instead they’ll need to do something themselves. They have a few options at their disposal:

Tool wrappers

Since Oppy doesn’t build SLSA into its tooling, Acme will create wrapper scripts for dependency import/installation that record and verify (using cosign) dependencies as they’re installed. Acme will update their build scripts to replace all instances of Oppy package installation with the wrapper script and then use the recorded results to help populate the materials section of the provenance.

A downside is that this approach, if run in the build itself, is not guaranteed to be complete and cannot meet the “non-falsifiable” requirement (since the results reported by the wrapper could be falsified by the build process), relegating this approach to SLSA 2. Still, it allows Acme to make progress SLSA-fying their builds and provides a starting point for achieving higher SLSA levels.
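
The recording half of such a wrapper could look something like the sketch below. It hashes an already-downloaded package file and appends a `uri`/`digest` entry of the shape used in the provenance materials section. The verification step is passed in as a callable because how Acme would invoke cosign is not specified here; `record_material` and the `verify` callback are hypothetical names.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_material(uri, path, verify, materials):
    """Verify a downloaded dependency and record it as a provenance material.

    `verify(uri, sha256_hex)` is a caller-supplied check (for example a
    wrapper around an attestation verifier) that returns True on success.
    """
    sha256_hex = sha256_of(path)
    if not verify(uri, sha256_hex):
        raise RuntimeError("verification failed for " + uri)
    entry = {"uri": uri, "digest": {"sha256": sha256_hex}}
    materials.append(entry)
    return entry
```

Because this runs inside the build, a compromised build step could still tamper with the `materials` list, which is exactly the limitation noted above.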

Built into ecosystem tooling

Since Squirrel does build verification into their tooling, Acme can just use acorn install to verify the dependencies and record what was installed. Acme can use this information to populate the Squirrel packages installed in the materials section of the provenance and it can include the attestations of those dependencies in the in-toto bundle for their container image.

As with tool wrappers, if this method is used in the build itself it cannot meet the “non-falsifiable” requirement.

Proxied verification

Acme considered creating a proxy for their existing builder’s outbound connections. This proxy could verify everything fetched and use its logs to populate the provenance. Since this proxy is trusted it would be easier to meet the “non-falsifiable” requirement. Unfortunately it’s also a lot of work for Acme so they’re going to defer this idea for now.

Next time

In the first two parts of this series, we’ve covered the basics of getting started with SLSA and the details of policy and provenance storage, policy verification, and key handling. In our next post we’ll cover how Squirrel, Oppy, and Acme put this all together to protect a heterogeneous supply chain.

How to SLSA Part 1 – The Basics


One of the great benefits of SLSA (Supply-chain Levels for Software Artifacts) is its flexibility. As an open source framework designed to improve the integrity of software packages and infrastructure, it is as applicable to small open source projects as to enterprise organizations. But with this flexibility can come a bewildering array of options for beginners—much like salsa dancing, someone just starting out might be left on the dance floor wondering how and where to jump in.

Though it’s tempting to try to establish a single standard for how to use SLSA, it’s not possible: SLSA is not a line dance where everyone does the same moves, at the same time, to the same song. It’s a varied system with different styles, moves, and flourishes. The open source community, organizations, and consumers may all implement SLSA differently, but they can still work with each other.


In this three-part series, we’ll explore how three fictional organizations would apply SLSA to meet their different needs. In doing so, we will answer some of the main questions that newcomers to SLSA have:


Part 1: The basics
  • How and when do you verify a package with SLSA?
  • How to handle artifacts without provenance?
Part 2: The details
  • Where is the provenance stored?
  • Where is the appropriate policy stored and who should verify it?
  • What should the policies check?
  • How do you establish trust & distribute keys?
Part 3: Putting it all together
  • What does a secure, heterogeneous supply chain look like?

The Situation

Our fictional example involves three organizations that want to use SLSA:

Squirrel: a package manager with a large number of developers and users

Oppy: an open source operating system with an enterprise distribution

Acme: a mid-sized enterprise

Squirrel wants to make SLSA as easy for their users as possible, even if that means abstracting some details away. Meanwhile, Oppy doesn’t want to abstract anything away from their users under the philosophy that they should explicitly understand exactly what they’re consuming.

Acme is trying to produce a container image that contains three artifacts:
  1. The Squirrel package ‘foo’
  2. The Oppy package ‘baz’
  3. A custom executable, ‘bar’, written by Acme employees
This series demonstrates one approach to using SLSA that lets Acme verify the Squirrel and Oppy packages ‘foo’ and ‘baz’ and its customers verify the container image. Though not every suggested solution is perfect, the solutions described can be a starting point for discussion and a foundation for new solutions.

Basics

In order to SLSA, Squirrel, Oppy, and Acme will all need SLSA-capable build services. Squirrel wants to give their maintainers wide latitude to pick a build service of their own. To support this, Squirrel will qualify some build services at specific SLSA levels (meaning they can produce artifacts up to that level). To start, Squirrel plans to qualify GitHub Actions using an approach like this, and hopes it can achieve SLSA 4 (pending the result of an independent audit). They’re also willing to qualify other build services as needed. Oppy, on the other hand, doesn’t need to support arbitrary build services. They plan to have everyone use their Autobuilder network, which they hope to qualify at SLSA 4 (they’ll conduct the audit/certification themselves). Finally, Acme plans to use Google Cloud Build, which they’ll self-certify at SLSA 4 (pending the result of a Google-conducted audit).

Squirrel, Oppy, and Acme will follow a similar qualification process for the source control systems they plan to support.

Verification options

Full verification

At some point, one or more of the organizations will need to do full verification of each artifact to determine if it is acceptable for a given use case. This is accomplished by checking if the artifact meets the appropriate policy.

Typically, full verification would take place with SLSA provenance, source attestations, and perhaps other specialized attestations (like vulnerability scan results). While having to coordinate this data for all of its dependencies seems like a lot of work to Acme, they’re prepared to do full verification if Squirrel and Oppy are unable to.
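As a rough illustration of what “meets the appropriate policy” can mean, here is a minimal Python sketch. The Policy and Provenance types and their fields are hypothetical simplifications, not part of SLSA itself, and the sketch assumes the provenance signature has already been checked elsewhere.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: acceptable builders, source, and minimum level."""
    allowed_builders: frozenset
    expected_source: str
    min_slsa_level: int


@dataclass(frozen=True)
class Provenance:
    """Simplified stand-in for provenance whose signature was already checked."""
    artifact_sha256: str
    builder_id: str
    source_uri: str
    builder_slsa_level: int  # level the builder was qualified at


def full_verify(artifact_sha256, prov, policy):
    """Return a list of policy violations; an empty list means the artifact passes."""
    problems = []
    if prov.artifact_sha256 != artifact_sha256:
        problems.append("provenance describes a different artifact")
    if prov.builder_id not in policy.allowed_builders:
        problems.append(f"untrusted builder: {prov.builder_id}")
    if prov.source_uri != policy.expected_source:
        problems.append(f"unexpected source: {prov.source_uri}")
    if prov.builder_slsa_level < policy.min_slsa_level:
        problems.append("builder qualified below the required SLSA level")
    return problems
```

Returning the list of violations rather than a single boolean makes it easier to explain to a maintainer why an artifact was rejected.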

Delegated verification

When Acme isn’t using full verification, they can instead use delegated verification where they check if an artifact is acceptable for a use case by checking if some other trusted party who performed a full verification (such as Squirrel or Oppy) believes the artifact is acceptable.

Delegated verification is easier to perform quickly with limited data and network connectivity. It may also suit users who mainly care whether someone they trust has verified that the artifact is good.

Squirrel likes how easy delegated verification would make things for their users and plans to support it by creating a Verification Summary Attestation (VSA) when they perform full verification.
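A minimal sketch of what issuing such a summary could look like in Python. The field names are loosely modeled on the VSA format and should be treated as illustrative; the exact schema belongs to the SLSA specification. The make_vsa helper and the verifier URL are hypothetical, and signing the resulting statement is left out.

```python
from datetime import datetime, timezone


def make_vsa(package_name, sha256, verifier_id, passed, policy_level):
    """Build an unsigned, VSA-like statement after full verification has run."""
    return {
        "_type": "https://in-toto.io/Statement/v0.1",
        "predicateType": "https://slsa.dev/verification_summary/v0.1",
        # The subject ties the summary to one exact artifact by digest.
        "subject": [{"name": package_name, "digest": {"sha256": sha256}}],
        "predicate": {
            "verifier": {"id": verifier_id},
            "time_verified": datetime.now(timezone.utc).isoformat(),
            "verification_result": "PASSED" if passed else "FAILED",
            "policy_level": policy_level,
        },
    }


# Hypothetical usage: Squirrel records the outcome for package 'foo'.
vsa = make_vsa("foo", "abc", "https://squirrel.example/verifier", True, "SLSA_LEVEL_3")
```

In practice the statement would be signed by the verifier before distribution, so that downstream users can check who issued it.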

When to verify

Verification (full or delegated) could happen at a number of different times.

On import to repo

Squirrel plans to perform full verification when an artifact is published to their repo. This will ensure that packages in the repo have met their corresponding policy. It’s also helpful because all the required data can be gathered when latency isn’t critical.

If this were the only time verification is performed, it would put the repository's storage in the trusted computing base (TCB) of its users. Squirrel’s plans to use delegated verification (and issue VSAs) can prevent this. The signature on the VSA will prevent the artifacts from being tampered with while sitting in storage, even if they’re just SLSA 0. Downstream users will just need to verify the VSA.

Acme also wants to do some sort of verification on the import to their internal repo since it simplifies their security story. They’re not quite sure what this will look like yet.

On install

Acme also wants to do verification when an artifact is actually installed since it can remove a number of intermediaries from their TCB (their repo, the network, upstream storage systems).

If they perform full verification at install then they must gather all the required information. That could be a lot of data, but it might be simplified by gathering the data from external sources and caching it in their internal repo. A larger problem is that it requires Acme to have established trust in all parties that produced that information (e.g. every builder of every package). For a complex supply chain that may be difficult.

If Acme performs delegated verification, they only need the VSA for the packages being installed and to explicitly trust a handful of parties. This allows the complex full verification to be performed once while allowing all users of that package to perform a much simpler operation.
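The install-time check can be quite small. The Python sketch below assumes the VSA signature has already been validated by some earlier step, and that the VSA is a dictionary with illustrative field names; the delegated_verify helper and the verifier URL are hypothetical.

```python
import hashlib

# Hypothetical: the handful of parties Acme explicitly trusts.
TRUSTED_VERIFIERS = frozenset({"https://squirrel.example/verifier"})


def delegated_verify(artifact, vsa, trusted_verifiers=TRUSTED_VERIFIERS):
    """Accept the artifact only if a trusted verifier passed this exact digest."""
    digest = hashlib.sha256(artifact).hexdigest()
    subject_digests = {
        s.get("digest", {}).get("sha256") for s in vsa.get("subject", [])
    }
    predicate = vsa.get("predicate", {})
    return (
        digest in subject_digests  # bytes on disk match what was verified
        and predicate.get("verifier", {}).get("id") in trusted_verifiers
        and predicate.get("verification_result") == "PASSED"
    )
```

Because the artifact is hashed locally and compared with the digest in the VSA, tampering in storage or in transit causes the check to fail without Acme needing any of the original provenance.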

Given these tradeoffs Acme prefers delegated verification at install time. Squirrel also really likes the idea and plans to build install time verification directly into the Squirrel tool.

On use

Verification could also take place each time an artifact is actually used. In this model, latency and reliability are very important (a sudden increase in site traffic may necessitate a scaling operation launching many new jobs).

Time of use verification allows the most context with which decisions can be made (“is this job allowed to run this code and is it free from vulns right now?”). It also allows policy changes to affect already built & installed software (which may or may not be desirable).

Acme wants their users to be able to verify on use without too many dependencies so they plan to provide VSAs users can use to perform delegated verification when they start the container (perhaps using something like Kyverno).

How to handle artifacts without provenance?

Inevitably a build or system may require that an artifact without ‘original’ provenance is used. In these cases it may be desirable for the importer to generate provenance that details where it got this artifact. For example, this generated provenance shows that http://example.com/foo.tgz with sha256:abc was imported by ‘auto-importer’:
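A minimal sketch of what such importer-generated provenance might contain, written as Python that builds an in-toto style statement. The layout follows the general shape of SLSA provenance, but the buildType URI and the imported_provenance helper are hypothetical, and the digest is the placeholder value from the text.

```python
import json


def imported_provenance(url, sha256, importer_id):
    """Describe an artifact that was fetched as-is rather than built."""
    return {
        "_type": "https://in-toto.io/Statement/v0.1",
        "predicateType": "https://slsa.dev/provenance/v0.2",
        "subject": [
            {"name": url.rsplit("/", 1)[-1], "digest": {"sha256": sha256}}
        ],
        "predicate": {
            # The importer takes the place of the builder.
            "builder": {"id": importer_id},
            # Hypothetical build type meaning "downloaded, not built".
            "buildType": "https://example.com/import@v1",
            # Records where the bytes came from.
            "materials": [{"uri": url, "digest": {"sha256": sha256}}],
        },
    }


statement = imported_provenance("http://example.com/foo.tgz", "abc", "auto-importer")
print(json.dumps(statement, indent=2))
```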


Such an artifact would likely not be accepted at higher SLSA levels, but the provenance can be used to: 1) prevent tampering with the artifact after it’s been imported and 2) be a data point for future analysis (e.g. should we prioritize asking for foo.tgz to be distributed with native SLSA provenance?).

Acme might be interested in taking this approach at some point, but they don’t need it at the moment.

Next time

In our next post we’ll cover specific approaches that can be used to answer questions like “where should attestations and policies be stored?” and “how do I trust the attestations that I receive?”

Improving software supply chain security with tamper-proof builds


Many of the recent high-profile software attacks that have alarmed open-source users globally were consequences of supply chain integrity vulnerabilities: attackers gained control of a build server to use malicious source files, injected malicious artifacts into a compromised build platform, and bypassed trusted builders to upload malicious artifacts.

Each of these attacks could have been prevented if there were a way to detect that the delivered artifacts diverged from the expected origin of the software. But until now, generating verifiable information that described where, when, and how software artifacts were produced (information known as provenance) was difficult. This information allows users to trace artifacts verifiably back to the source and develop risk-based policies around what they consume. Currently, provenance generation is not widely supported, and solutions that do exist may require migrating build processes to services like Tekton Chains.

This blog post describes a new method of generating non-forgeable provenance using GitHub Actions workflows for isolation and Sigstore’s signing tools for authenticity. Using this approach, projects building on GitHub runners can achieve SLSA 3 (the third of four progressive SLSA “levels”), which affirms to consumers that your artifacts are authentic and trustworthy.

Provenance


SLSA (“Supply-chain Levels for Software Artifacts”) is a framework to help improve the integrity of your project throughout its development cycle, allowing consumers to trace the final piece of software you release all the way back to the source. Achieving a high SLSA level helps to improve the trust that your artifacts are what you say they are.

This blog post focuses on build provenance, which gives users important information about the build: who performed the release process? Was the build artifact protected against malicious tampering? Source provenance describes how the source code was protected, which we’ll cover in future blog posts, so stay tuned.

Go prototype to generate non-forgeable build provenance


To create tamper-proof evidence of the build and allow consumer verification, you need to:
  1. Isolate the provenance generation from the build process;
  2. Isolate against maintainers interfering in the workflow;
  3. Provide a mechanism to identify the builder during provenance verification.

The full isolation described in the first two points allows consumers to trust that the provenance was faithfully recorded; entities that provide this guarantee are called trusted builders.

Our Go prototype solves all three challenges. It also includes running the build inside the trusted builder, which provides a strong guarantee that the build achieves SLSA 3’s ephemeral and isolated requirement.

How does it work?

The following steps create the trusted builder that is necessary to generate provenance in isolation from both the build and maintainer interference.

Step 1: Create a reusable workflow on GitHub runners

Leveraging GitHub’s reusable workflows provides the isolation mechanism from both maintainers’ caller workflows and from the build process. Within the workflow, GitHub Actions creates fresh instances of virtual machines (VMs), called runners, for each job. These separate VMs give the necessary isolation for a trusted builder, so that different VMs compile the project and generate and sign the SLSA provenance (see diagram below).

Running the workflow on GitHub-hosted runners gives the guarantee that the code run is in fact the intended workflow, which self-hosted runners do not. This prototype relies on GitHub to run the exact code defined in the workflow.

The reusable workflow also protects against possible interference from maintainers, who could otherwise try to define the workflow in a way that interferes with the builder. The only way to interact with a reusable workflow is through the input parameters it exposes to the calling workflow, which stops maintainers from altering information via environment variables, steps, services and defaults.

To protect against the possibility of one job (e.g. the build step) tampering with artifacts used by another job (the provenance step), this approach uses a trusted channel to protect the integrity of the data. We use job outputs to send hashes (due to size limitations) and then use the hashes to verify the binary received via the untrusted artifact registry.
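The idea of sending a hash over the trusted channel and the binary over the untrusted one can be sketched in a few lines of Python. The build_job and provenance_job functions below are hypothetical stand-ins for the two workflow jobs, not code from the prototype.

```python
import hashlib
import hmac


def build_job(source):
    """Pretend build: returns the binary and its SHA-256 digest.

    The digest is small enough to travel over the trusted channel
    (job outputs); the binary goes through the untrusted registry.
    """
    binary = b"compiled:" + source
    return binary, hashlib.sha256(binary).hexdigest()


def provenance_job(binary_from_registry, trusted_digest):
    """Re-hash what arrived and compare it with the trusted digest."""
    actual = hashlib.sha256(binary_from_registry).hexdigest()
    return hmac.compare_digest(actual, trusted_digest)
```

If anything modifies the binary while it sits in the registry, the recomputed digest no longer matches and the provenance job can refuse to sign.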

Step 2: Use OpenID Connect (OIDC) to prove the identity of the workflow to an external service (Sigstore)

OpenID Connect (OIDC) is a standard used across the web for identity providers (e.g., Google) to attest to the identity of a user for a third party. GitHub now supports OIDC in their workflows. Each time a workflow is run, a runner can mint a unique JWT token from GitHub’s OIDC provider. The token contains verifiable information of the workflow identity, including the caller repository, commit hash, trigger, and the current (reusable) workflow path and reference.
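To make the token contents concrete, the Python sketch below decodes the payload of a JWT and reads a few identity claims. It deliberately does not verify the signature, which a real relying party must do against the provider's published keys. The demo token is unsigned, its values are made up, and make_demo_token is a hypothetical helper; the claim names follow those GitHub documents for its OIDC tokens.

```python
import base64
import json


def b64url_decode(segment):
    """Decode base64url text, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def b64url_encode(raw):
    """Encode bytes as unpadded base64url text."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def decode_claims(jwt):
    """Return the payload claims WITHOUT verifying the signature."""
    _header, payload, _signature = jwt.split(".")
    return json.loads(b64url_decode(payload))


def make_demo_token(claims):
    """Build an unsigned token for demonstration only."""
    header = b64url_encode(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    body = b64url_encode(json.dumps(claims).encode())
    return f"{header}.{body}.demo-signature"


token = make_demo_token({
    "repository": "example-org/example-project",  # caller repository
    "sha": "0123abcd",                             # commit hash
    "event_name": "push",                          # trigger
    "job_workflow_ref": "example-org/trusted-builder/.github/workflows/builder.yml@refs/tags/v1",
})
claims = decode_claims(token)
```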

Using OIDC, the workflow proves its identity to Sigstore's Fulcio root Certificate Authority, which acts as an external verification service. Fulcio signs a short-lived certificate attesting to an ephemeral signing key generated in the runner and tying it to the workload identity. A record of signing the provenance is kept in Sigstore’s transparency log Rekor. Users can use the signing certificate as a trust anchor to verify that the provenance was authenticated and non-forgeable; it must have been created inside the trusted builder.

Verification


The consumer can verify the artifact and its signed provenance with these steps:
  1. Look up the corresponding Rekor log entry and verify the signature;
  2. Verify the trusted builder identity by extracting it from the signing certificate;
  3. Check that the provenance information matches the expected source and build.
See an example in action in the official repository.
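Steps 2 and 3 above can be sketched as follows in Python, assuming step 1 (the Rekor lookup and signature check) has already succeeded and the workflow reference has been extracted from the signing certificate. The function names, the builder path, and the flat provenance dictionary are hypothetical simplifications rather than the verifier's real interface.

```python
def parse_workflow_ref(ref):
    """Split 'owner/repo/path/to/workflow.yml@git-ref' into (path, git_ref)."""
    path, sep, git_ref = ref.partition("@")
    if not sep or not path or not git_ref:
        raise ValueError(f"malformed workflow reference: {ref!r}")
    return path, git_ref


def verify_release(cert_workflow_ref, provenance, trusted_builder_path,
                   expected_repo, expected_commit):
    """Check the builder identity, then the source recorded in the provenance."""
    path, _git_ref = parse_workflow_ref(cert_workflow_ref)
    if path != trusted_builder_path:
        return False  # signed by something other than the trusted builder
    return (
        provenance.get("source_repo") == expected_repo
        and provenance.get("commit") == expected_commit
    )
```

A stricter check would also pin the git reference of the builder workflow, so that only reviewed releases of the trusted builder are accepted.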

Performing these steps guarantees to the consumer that the binary was produced in the trusted builder at a given commit hash attested to in the provenance. They can trust that the information in the provenance was non-forgeable, allowing them to trust the build “recipe” and trace their artifact verifiably back to the source.

Extra Bonus: Keyless signing

One extra benefit of this method is that maintainers don’t need to manage or distribute cryptographic keys for signing, avoiding the notoriously difficult problem of key management. The OIDC protocol requires no hardcoded, long-term secrets be stored in GitHub's secrets, which sidesteps the potential problem of key mismanagement invalidating the SLSA provenance. Consumers simply use OIDC to verify that the binary artifact was built from a trusted builder that produced the expected provenance.

Next Steps

Utilizing the SLSA framework is a proven way to ensure software supply-chain integrity at scale. This prototype shows that achieving high SLSA levels is easier than ever thanks to the newest features of popular CI/CD systems and open-source tooling. Increased adoption of tamper-safe (SLSA 3+) build services will contribute to a stronger open-source ecosystem and help close one easily exploited gap in the current supply chain.

We encourage testing and adoption and welcome any improvements to the project. Please share feedback, comments and suggestions at the slsa-github-generator-go and slsa-verifier project repositories. We will officially release v1 in a few weeks!

In follow-up posts, we will demonstrate adding non-forgeable source provenance attesting to secure repository settings, and showcase the same techniques for other build toolchains and package managers, etc. Stay tuned!