Category Archives: Online Security Blog

The latest news and insights from Google on security and safety on the Internet

Taking on the Next Generation of Phishing Scams

 

Every year, security technologies improve: browsers get better, encryption becomes ubiquitous on the Web, authentication becomes stronger. But phishing persistently remains a threat (as shown by a recent phishing attack on the U.S. Department of Labor) because users retain the ability to log into their online accounts, often with a simple password, from anywhere in the world. It's why today at I/O we announced new ways we're reducing the risks of phishing: scaling phishing protections to Google Docs, Sheets and Slides, continuing to auto-enroll people in 2-Step Verification, and more. This post takes a deeper look at how phishing works and how it has evolved.

As phishing has become more widespread, multi-factor authentication has become a particular focus for attackers. In some cases, attackers phish SMS codes directly, by following a legitimate "one-time passcode" (triggered by the attacker trying to log into the victim's account) with a spoofed message asking the victim to "reply back with the code you just received."


Left: legitimate Google SMS verification. Right: spoofed message asking victim to share verification code.


In other cases, attackers have leveraged more sophisticated dynamic phishing pages to conduct relay attacks. In these attacks, a user thinks they're logging into the intended site, just as in a standard phishing attack. But instead of deploying a simple static phishing page that saves the victim's email and password when the victim tries to login, the phisher has deployed a web service that logs into the actual website at the same time the user is falling for the phishing page.

The simplest approach is an almost off-the-shelf "reverse proxy" which acts as a "person in the middle", forwarding the victim's inputs to the legitimate page and sending the response from the legitimate page back to the victim's browser.



These attacks are especially challenging to prevent because additional authentication challenges shown to the attacker—like a prompt for an SMS code—are also relayed to the victim, and the victim's response is in turn relayed back to the real website. In this way, the attacker can count on their victim to solve any authentication challenge presented.

Traditional multi-factor authentication with PIN codes can only do so much against these attacks, and authentication with smartphone approvals via a prompt — while more secure against SIM-swap attacks — is still vulnerable to this sort of real-time interception.

The Solution Space

Over the past year, we've started to automatically enable device-based two-factor authentication for our users. This authentication not only helps protect against traditional password compromise but, with technology improvements, we can also use it to help defend against these more sophisticated forms of phishing.

Taking a broad view, most efforts to protect and defend against phishing fall into the following categories:
  • Browser UI improvements to help users identify authentic websites.
  • Password managers that can validate the identity of the web page before logging in.
  • Phishing detection, both in email—the most common delivery channel—and in the browser itself, to warn users about suspicious web pages.
  • Preventing the person-in-the-middle attacks mentioned above by detecting and blocking automated login attempts.
  • Phishing-resistant authentication using FIDO with security keys or a Bluetooth connection to your phone.
  • Hardening the Google Prompt challenge to help users identify suspicious sign-in attempts, or asking them to take additional steps that can defeat phishing (like navigating to a new web address or joining the same wireless network as the computer they're logging into).

Expanding phishing-resistant authentication to more users


Over the last decade we've been working hard with a number of industry partners on expanding phishing-resistant authentication mechanisms as part of the FIDO Alliance. Through these efforts we introduced physical FIDO security keys, such as the Titan Security Key, which prevent phishing by verifying the identity of the website you're logging into. (This verification protects against the "person-in-the-middle" phishing described above.) Recently, we announced a major milestone with the FIDO Alliance, Apple and Microsoft by expanding our support for the FIDO sign-in standards, helping to launch us into a truly passwordless, phishing-resistant future.
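
That phishing resistance comes from the authenticator signing over the site's identity, so an assertion created for a look-alike domain will not verify for the real one. The following is a minimal sketch of the relying-party side of that check, assuming the standard WebAuthn layout in which the first 32 bytes of the authenticator data are the SHA-256 hash of the relying party ID and the client data JSON records the origin the browser actually saw; the names and values here are illustrative, not any particular library's API.

import hashlib
import json

EXPECTED_RP_ID = "accounts.example.com"          # the real site (assumed value)
EXPECTED_ORIGIN = "https://accounts.example.com"

def origin_checks_pass(authenticator_data: bytes, client_data_json: bytes) -> bool:
    # The authenticator puts SHA-256(RP ID) in the first 32 bytes of its
    # output, and the browser records the origin it actually showed the user
    # in the client data. A look-alike phishing domain can't make either match.
    rp_id_hash = authenticator_data[:32]
    if rp_id_hash != hashlib.sha256(EXPECTED_RP_ID.encode()).digest():
        return False
    client_data = json.loads(client_data_json)
    return client_data.get("origin") == EXPECTED_ORIGIN

# A real relying party also verifies the signature over
# authenticator_data + SHA-256(client_data_json) with the registered public
# key; that step is omitted here for brevity.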

Even though security keys work great, we don't expect everyone to add one to their keyring.



Instead, to make this level of security more accessible, we're building it into mobile phones. Unlike physical FIDO security keys that need to be connected to your device via USB, we use Bluetooth to ensure your phone is close to the device you're logging into. Like physical security keys, this helps prevent a distant attacker from tricking you into approving a sign-in on their browser, giving us an added layer of security against the kind of "person in the middle" attacks that can still work against SMS or Google Prompt.

(But don't worry: this doesn't allow computers within Bluetooth range to login as you—it only grants that approval to the computer you're logging into. And we only use this to verify that your phone is near the device you're logging into, so you only need to have Bluetooth on during login.)

Over the next couple of months we’ll be rolling out this technology in more places, which you might notice as a request for you to enable Bluetooth while logging in, so we can perform this additional security check. If you've signed into your Google account on your Android phone, we can enroll your phone automatically—just like with Google Prompt—allowing us to give this added layer of security to many of our users without the need for any additional setup.

But unfortunately this secure login doesn't work everywhere—for example, when logging into a computer that doesn't support Bluetooth, or a browser that doesn't support security keys. That's why, if we are to offer phishing-resistant security to everyone, we have to offer backups when security keys aren't available—and those backups must also be secure enough to prevent attackers from taking advantage of them.


Hardening existing challenges against phishing

Over the past few months, we've started experimenting with making our traditional Google Prompt challenges more phishing resistant.

We already use different challenge experiences depending on the situation—for example, sometimes we ask the user to match a PIN code with what they're seeing on the screen in addition to clicking "allow" or "deny". This can help prevent static phishing pages from tricking you into approving a challenge.
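
To illustrate why this kind of matching helps, here is a minimal sketch of the idea (not Google's actual implementation): the server generates a short code, shows it only on the screen that initiated the sign-in, and the phone prompt is approved only if the user picks that same code. A static phishing page that merely forwards an "approve" tap has no way to know which code to display.

import secrets

def start_challenge():
    # Two-digit code displayed only on the screen that initiated the sign-in,
    # plus two decoys offered on the phone prompt.
    code = f"{secrets.randbelow(100):02d}"
    options = {code}
    while len(options) < 3:
        options.add(f"{secrets.randbelow(100):02d}")
    return code, sorted(options)

def approve(code_on_screen: str, code_tapped_on_phone: str) -> bool:
    # Approve only if the user tapped the code shown on the legitimate screen.
    return secrets.compare_digest(code_on_screen, code_tapped_on_phone)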

We've also begun experimenting with more involved challenges for higher-risk situations, including more prominent warnings when we see you logging in from a computer that we think might belong to a phisher, or asking you to join your phone to the same Wi-Fi network as the computer you're logging into so we can be sure the two are near each other. Similar to our use of Bluetooth for Security Keys, this prevents an attacker from tricking you into logging into a "person-in-the-middle" phishing page.


Bringing it all together

Of course, while all of these options dramatically increase account security, we also know that they can be a challenge for some of our users, which is why we're rolling them out gradually, as part of a risk-based approach that also focuses on usability. If we think an account is at a higher risk, or if we see abnormal behavior, we're more likely to use these additional security measures.

Over time, as FIDO2 authentication becomes more widely available, we expect to be able to make it the default for many of our users, and to rely on stronger versions of our existing challenges like those described above to provide secure fallbacks.

All these new tools in our toolbox—detecting browser automation to prevent "person in the middle" attacks, warning users in Chrome and Gmail, making the Google Prompt more secure, and automatically enabling Android phones as easy-to-use Security Keys—work together to allow us to better protect our users against phishing.

Phishing attacks have long been seen as a persistent threat, but these recent developments give us the ability to really move the needle and help more of our users stay safer online.

The Package Analysis Project: Scalable detection of malicious open source packages

Despite open source software's essential role in all software built today, it's far too easy for bad actors to circulate malicious packages that attack the systems and users running that software. Unlike mobile app stores that can scan for and reject malicious contributions, package repositories have limited resources to review the thousands of daily updates and must maintain an open model where anyone can freely contribute. As a result, malicious packages like ua-parser-js and node-ipc are regularly uploaded to popular repositories despite those repositories' best efforts, sometimes with devastating consequences for users.

Google, a member of the Open Source Security Foundation (OpenSSF), is proud to support the OpenSSF’s Package Analysis project, which is a welcome step toward helping secure the open source packages we all depend on. The Package Analysis program performs dynamic analysis of all packages uploaded to popular open source repositories and catalogs the results in a BigQuery table. By detecting malicious activities and alerting consumers to suspicious behavior before they select packages, this program contributes to a more secure software supply chain and greater trust in open source software. The program also gives insight into the types of malicious packages that are most common at any given time, which can guide decisions about how to better protect the ecosystem.

To better understand how the Package Analysis program is contributing to supply chain security, we analyzed the nearly 200 malicious packages it captured over a one-month period. Here’s what we discovered: 

Results

All signals collected are published in our BigQuery table. Using simple queries on this table, we found around 200 meaningful results from the packages uploaded to NPM and PyPI in a period of just over a month. Here are some notable examples, with more available in the repository.
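
For readers who want to explore the data themselves, a query along the following lines is one way to start. The project and table name, column names, and schema below are illustrative placeholders rather than the Package Analysis project's actual BigQuery schema, so check the project documentation for the real ones.

from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical table and column names -- consult the Package Analysis docs
# for the actual schema before running this.
query = """
    SELECT package_name, ecosystem, version, dns_query
    FROM `example-project.package_analysis.results`,
         UNNEST(dns_queries) AS dns_query
    WHERE dns_query LIKE '%burpcollaborator.net%'
       OR dns_query LIKE '%interact.sh%'
    ORDER BY package_name
"""

for row in client.query(query).result():
    print(row.package_name, row.ecosystem, row.version, row.dns_query)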

PyPI: discordcmd
This Python package attacks the Discord desktop client on Windows. It was found by spotting unusual requests to raw.githubusercontent.com, the Discord API, and ipinfo.io.

First, it downloaded a backdoor from GitHub and installed it into the Discord electron client.

Next, it looked through various local databases for the user's Discord token.


Finally, it grabbed the data associated with the token from the Discord API and exfiltrated it back to a Discord server controlled by the attacker.

NPM: @roku-web-core/ajax

During install, this NPM package exfiltrates details of the machine it is running on and then opens a reverse shell, allowing the remote execution of commands.
This package was discovered from its requests to an attacker-controlled address.

Dependency Confusion / Typosquatting

The vast majority of the malicious packages we detected are dependency confusion and typosquatting attacks.


The packages we found usually contain a simple script that runs during an install and calls home with a few details about the host. These packages are most likely the work of security researchers looking for bug bounties, since most are not exfiltrating meaningful data except the name of the machine or a username, and they make no attempt to disguise their behavior.


These dependency confusion attacks were discovered through the domains they used, such as burpcollaborator.net, pipedream.com, and interact.sh, which are commonly used for reporting back attacks. The same domains appear across unrelated packages and have no apparent connection to the packages themselves. Many packages also used unusually high version numbers (e.g. v5.0.0, v99.10.9) for a package with no previous versions.

Conclusions

The short time frame and low sophistication needed for finding the results above underscore the challenge facing open source package repositories. While many of the results above were likely the work of security researchers, any one of these packages could have done far more to hurt the unfortunate victims who installed them.

These results show the clear need for more investment in vetting packages being published in order to keep users safe. This is a growing space, and having an open standard for reporting would help centralize analysis results and offer consumers a trusted place to assess the packages they’re considering using. Creating an open standard should also foster healthy competition, promote integration, and raise the overall security of open source packages.
 
Over time we hope that the Package Analysis program will offer comprehensive knowledge about the behavior and capabilities of packages across open source software, and help guide the future efforts needed to make the ecosystem more secure for everyone. To get involved, please check out the GitHub Project and Milestones for opportunities to contribute.

How we fought bad apps and developers in 2021

Providing a safe experience to billions of users continues to be one of the highest priorities for Google Play. Last year we introduced multiple privacy focused features, enhanced our protections against bad apps and developers, and improved SDK data safety. In addition, Google Play Protect continues to scan billions of installed apps each day across billions of devices to keep people safe from malware and unwanted software.

We continue to enhance our machine learning systems and review processes, and in 2021 we blocked 1.2 million policy-violating apps from being published on Google Play, preventing billions of harmful installations. We also continued our efforts to combat malicious and spammy developers, banning 190k bad accounts in 2021. In addition, we closed around 500k developer accounts that are inactive or abandoned.

In May we announced our new Data safety section for Google Play where developers will be required to give users deeper insight into the privacy and security practices of the apps they download, and provide transparency into the data the app may collect and why. The Data safety section launched this week, and developers are required to complete this section for their apps by July 20th.

We’ve also invested in making life easier for our developers. We added the Policy and Programs section to Google Play Console to help developers manage all their app compliance issues in one central location. This includes the ability to appeal a decision and track its status from this page.

In addition, we continued to partner with SDK developers to improve app safety, limit how user data is shared, and improve lines of communication with app developers. SDKs provide functionality for app developers, but it can sometimes be tricky to know when an SDK is safe to use. Last year, we engaged with SDK developers to build a safer Android and Google Play ecosystem. As a result of this work, SDK developers have improved the safety of SDKs used by hundreds of thousands of apps impacting billions of users. This remains a huge investment area for our team, and we will continue in our efforts to make SDKs safer across the ecosystem.

Limiting access

The best way to ensure users' data stays safe is to limit access to it in the first place.

As a result of new platform protections and policies, developer collaboration and education, 98% of apps migrating to Android 11 or higher have reduced their access to sensitive APIs and user data. We've also significantly reduced the unnecessary, dangerous, or disallowed use of Accessibility APIs in apps migrating to Android 12, while preserving the functionality of legitimate use cases.

We also continued in our commitment to make Android a great place for families. Last year we disallowed the collection of Advertising ID (AAID) and other device identifiers from all users in apps solely targeting children, and gave all users the ability to delete their Advertising ID entirely, regardless of the app.

Pixel enhancements

For Pixel users, we had even more great features to help keep you safe. Our new Security hub helps protect your phone, apps, Google Account, and passwords by giving you a central view of your device’s current configuration. Security hub also provides recommendations to improve your security, helping you decide what settings best meet your needs.

In addition, Pixels now use new machine learning models that improve the detection of malware in Google Play Protect. The detection runs on your Pixel, and uses a privacy preserving technology called federated analytics to discover bad apps.

Our global teams are dedicated to keeping our billions of users safe, and look forward to many exciting announcements in 2022.

How to SLSA Part 3 – Putting it all together


In our last two posts (1,2) we introduced a fictional example of Squirrel, Oppy, and Acme learning to SLSA and covered the basics and details of how they’d use SLSA for their organizations. Today we’ll close out the series by exploring how each organization pulls together the various solutions into a heterogeneous supply chain.

As a reminder, Acme is trying to produce a container image that contains three artifacts:
  1. The Squirrel package ‘foo’
  2. The Oppy package ‘baz’
  3. A custom executable, ‘bar’, written by Acme employees.
The process starts with ‘foo’ package authors triggering a build using GitHub Actions. This results in a new version of ‘foo’ (an artifact with hash ‘abc’) being pushed to the Squirrel repo along with its SLSA provenance (signed by Fulcio) and source attestation. When Squirrel gets this push request it verifies the artifact against the specific policy for ‘foo’ which checks that it was built by GitHub Actions from the expected source repository. After the artifact passes the policy check a VSA is created and the new package, its original SLSA provenance, and the VSA are made public in the Squirrel repo, available to all users of package ‘foo’.

Next the maintainers of the Oppy ‘baz’ package trigger a new build using the Oppy Autobuilder. This results in a new version of ‘baz’ (an artifact with hash ‘def’) being pushed to a public Oppy repo with the SLSA provenance (signed by their org-specific keys) published to Rekor. When the repo gets the push request it makes the artifact available to the public. The repo does not perform any verification at this time.

An Acme employee then makes a change to their Dockerfile, sending it for review by their co-worker, who approves the change and merges the PR. This then causes the Acme builder to trigger a build. During this build:
  • bar is compiled from source code stored in the same source repo as the Dockerfile.
  • acorn install downloads ‘foo’ from the Squirrel repo, verifies the VSA, and records the use of acorn://foo@abc and its VSA in the build.
  • acme_oppy_get install (a custom script made by Acme) downloads the latest version of the Oppy ‘baz’ package and queries its SLSA provenance and other attestations from Rekor. It then performs a full verification checking that it was built by ‘https://oppy.example/slsa/builder/v1’ and the publicized key. Once verification is complete it records the use of oppy://baz@def and the associated attestations in the build.
  • The build process assembles the SLSA provenance for the container (a sketch of the result follows this list) by:
    • Recording the Acme git repo the bar source and Dockerfile came from, into materials.
    • Copying the reported dependencies of acorn://foo@abc and oppy://baz@def into materials and adding their attestations to the output in-toto bundle.
    • Recording the CI/CD entrypoint as the invocation.
    • Creating a signed DSSE with the SLSA provenance and adding it to the output in-toto bundle.
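
To make the shape of that output concrete, here is a rough sketch of the in-toto statement the Acme builder might wrap in the signed DSSE envelope, modeled on the SLSA provenance predicate. The digests, URIs, builder ID, and buildType are placeholders for illustration, not values from a real build.

import json

provenance = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{"name": "acme/app-image", "digest": {"sha256": "<container-digest>"}}],
    "predicateType": "https://slsa.dev/provenance/v0.2",
    "predicate": {
        "builder": {"id": "https://acme.example/builder/v1"},
        "buildType": "https://acme.example/container-build/v1",
        "invocation": {
            "configSource": {
                "uri": "git+https://acme.example/acme/app",
                "digest": {"sha1": "<commit>"},
                "entryPoint": "Dockerfile",
            }
        },
        "materials": [
            {"uri": "git+https://acme.example/acme/app", "digest": {"sha1": "<commit>"}},
            {"uri": "acorn://foo", "digest": {"sha256": "abc"}},
            {"uri": "oppy://baz", "digest": {"sha256": "def"}},
        ],
    },
}

print(json.dumps(provenance, indent=2))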

Once the container is ready for release the Acme verifier checks the SLSA provenance (and other data in the in-toto bundle) using the policy from their own policy repo and issues a VSA. The VSA and all associated attestations are then published to an internal Rekor instance. Acme can then create an SBOM for the container leveraging data about the build as stored in Rekor. Acme then publishes the container image, the VSA, and the SBOM on Dockerhub.

Downstream users of this Acme container can then check the Acme-issued VSA, and if there are any problems Acme can consult their internal Rekor instance to get more details on the build, allowing them to trace all of their dependencies back to source code and the systems used to create them.

Conclusion

With SLSA implemented in the ways described in this series, downstream users are protected from many of the threats affecting the software supply chain today. While users still need to trust certain parties, the number of systems requiring trust is much lower and users are in a much better position to investigate any issues that arise.

We’d love to see the ideas in this series implemented, refuted, or used as a foundation to build even stronger solutions. We’d also love to hear some other methods on how to solve these issues. Show us how you like to SLSA. 

How to SLSA Part 2 – The Details


In our last post we introduced a fictional example of Squirrel, Oppy, and Acme learning to use SLSA and covered the basics of what their implementations might look like. Today we’ll cover the details: where to store attestations and policies, what policies should check, and how to handle key distribution and trust.

Attestation storage

Attestations play a large role in SLSA and it’s essential that consumers of artifacts know where to find the attestations for those artifacts.

Co-located in repo

Attestations could be colocated in the repository that hosts the artifact. This is how Squirrel plans to store attestations for packages. They even want to add support to the Squirrel CLI (e.g. acorn get-attestations [email protected]).

Acme really likes this approach because the attestations are always available and it doesn’t introduce any new dependencies.

Rekor

Meanwhile, Oppy plans to store attestations in Rekor. They like being able to direct users to an existing public instance while not having to maintain any new infrastructure themselves, and the defense in depth the transparency log provides against tampering with the attestations.

Though the latency of querying attestations from Rekor is likely too high for doing verification at time of use, Oppy isn’t too concerned since they expect users to query Rekor at install time.

Hybrid

A hybrid model is also available where the publisher stores the attestations in Rekor as well as co-located with the artifact in the repo—along with Rekor’s inclusion proof. This provides confidence the data was added to Rekor while providing the benefits of co-locating attestations in the repository.

Policy content

‘Policy’ refers to the rules used to determine if an artifact should be allowed for a use case.

Policies often use the package name as a proxy for determining the use case. For example, to find the policy to apply, you could look it up using the package name of the artifact you're evaluating.

Policy specifics may vary based on ease of use, availability of data, risk tolerance and more. Full verification needs more from policies than delegated verification does.

Default policy

Default policies allow admission decisions without the need to create specific policies for each package. A default policy is a way of saying “anything that doesn’t have a more specific policy must comply with this policy”.

Squirrel plans to eventually implement a default policy of “any package without a more specific policy will be accepted as long as it meets SLSA 3”, but they recognize that most packages don’t support this yet. Until they achieve critical mass they’ll have a default SLSA 0 policy (all artifacts are accepted).

While Oppy is leaving verification to their users, they’ll suggest a default policy of “any package built by ‘https://oppy.example/slsa/builder/v1’”.

Specific policy

Squirrel also plans to allow users to create policies for specific packages. For example, this policy requires that package ‘foo’ must have been built by GitHub Actions, from github.com/foo/acorn-foo, and be SLSA 4.

scope: 'acorn://foo'
target_level: SLSA_L4
allow_github_actions {
  workflow: 'https://github.com/gossts/slsa-acorn/.github/workflows/builder.yml@main'
  source_repo: 'https://github.com/foo/acorn-foo.git'
  allow_branch: 'main'
}


Squirrel will also allow packages to create SLSA 0 policies if they’re not using SLSA compliant infrastructure.


scope: 'acorn://qux'
target_level: SLSA_L0



Policy auto generation

Squirrel has an enormous number of existing packages. It’s not feasible to get all those package maintainers to create specific policies themselves. Therefore, Squirrel plans to leverage process mining to auto generate policies for packages based on the history of the package. E.g. “The last 10 times Squirrel package foo was published it was built by GitHub Actions from github.com/foo/acorn-foo, and met SLSA 4 (this is the policy above). Let’s create a policy that requires that and send it to the maintainers to review.”
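
A minimal sketch of that mining step might look like the following. It assumes provenance records for recent releases have already been parsed into dictionaries with 'builder', 'source_repo', and 'slsa_level' keys; those field names, and the thresholds, are ours rather than Squirrel's.

from collections import Counter

def propose_policy(recent_provenance: list[dict], min_releases: int = 10):
    # Suggest a specific policy only if recent release history is consistent.
    if len(recent_provenance) < min_releases:
        return None  # not enough history to auto-generate a policy

    builders = Counter(p["builder"] for p in recent_provenance)
    repos = Counter(p["source_repo"] for p in recent_provenance)
    if len(builders) != 1 or len(repos) != 1:
        return None  # inconsistent history; leave policy creation to maintainers

    return {
        "allow_builder": next(iter(builders)),
        "source_repo": next(iter(repos)),
        "target_level": min(p["slsa_level"] for p in recent_provenance),
    }

# The proposed policy would then be sent to the package maintainers for
# review rather than applied automatically.
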
Policy add-ons

Policy evaluation could do more than just evaluate the SLSA requirements. The same policies that check SLSA requirements are well placed to check other properties that are important to organizations like “was static analysis performed”, “are there any known CVEs in this artifact”, “was integration testing successful”, etc…

Acme is really interested in some of these policy add-ons. They’d like to avoid the embarrassing situation of publishing a new container image with known CVEs. They’re not sure how to implement it yet but they’ll be on the lookout for tools that can help them do so.

Delegated policies

When using delegated verification there's much less that actually needs to be checked, and the checks can be hard-coded directly into tooling. A minimal delegated verification policy might be "allow if trusted-party verified this artifact (identified by digest) as <package name>". This can be tightened further by adding requirements on the SLSA levels of the artifact and its dependencies (data which is available in the VSA). For example, "allow if trusted-party verified this artifact as <package name> at SLSA 3 and it doesn't have any dependencies less than SLSA 2".

# Delegated verification implicitly checks that the package name we're
# checking matches the VSA's subject.name field.
allow_delegated_verification {
  trusted_verifier: 'https://delegatedverifier.com/slsa/v1'
  minimum_level: SLSA_L3
  minimum_dependency_level: SLSA_L2
}


Policy storage


When using specific (non-default) policies, verifiers need to know where to find the policy they need to evaluate.

Co-located in repo

Squirrel plans to store specific policies as a property of the package in the repository. This makes them very easy for users and their tooling to find. It also allows the maintainer of the package to easily set the policy (they already have write permissions!).

A potential downside is that the write permissions are the same as for the package itself. An attacker that compromises the developer’s credentials could also change the policy. This may not be as bad as it seems. Policies are human-readable so anyone paying attention would notice that package foo’s policy now says that it can be built from github.com/not-foo/acorn-foo. Squirrel plans to notify interested parties (including the maintainer!) when the policy changes, potentially letting them “sound the alarm” if anything nefarious happens.

A similar approach is taken in a number of contact-change workflows. For example, when you change your address with your bank, the bank will send you an email (and a letter to the old address) letting you know the address has been changed. This type of notification would alert the maintainer to a potential compromise.

Squirrel would also consider requiring a second person to review any policy changes for packages with over 10,000 users.

Public canonical Git repo

Another option might be to create a canonical git repo (e.g. github.com/slsa-framework/slsa-acorn-policies) and let people publish proposed policies there. This has the advantage of using an ACL mechanism separate from the package repository itself, but the disadvantages that it is difficult to ensure the author of a policy is actually allowed to set the policy for that package, and that it may not scale well as the repo grows.

The approach outlined in policy auto generation could help here. Automation in the repo could just look at the last N releases of the package and determine if the proposed policy matches what’s actually been published. Proactive changes to the policy (like deciding to switch from GitHub Actions to CircleCI) would be harder to coordinate however.

Org specific repo

Acme plans to establish their own org specific repo for policy storage. This gives them a single place to store all their policies, regardless of ecosystem type, and lets them provide more specific policies for packages provided by upstream repos. Since Oppy doesn’t have any plans to provide package-specific policies this gives Acme a place to store their own policies for Oppy packages (if they ever get around to it).

Organizations can also use their policy repo to vet any upstream changes to policy and potentially add additional checks (e.g. “doesn’t have any known vulnerabilities”).


Trusted Verifier


Acme wants to use delegated verification and that relies on having trusted verifiers to make decisions for downstream users. Who are these trusted verifiers?

Public verifier

A public repo is in a great position to act as a trusted verifier for their users. Users already trust these repos and they may already be doing verification on import.

Squirrel plans to make use of this by making VSAs available for each artifact published, publicizing their verifier ID (i.e. ‘https://squirrel.example/slsa-verifier’) and the public key used to sign the VSAs. They even plan to build VSA verification directly into the Squirrel tooling, so that users can get SLSA protection by default.
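
In practice, the check built into the tooling could be as small as the sketch below. The field names follow the general shape of SLSA's Verification Summary Attestation but are simplified and may not match the real schema, and the attestation's signature is assumed to have been checked already (e.g. by the envelope layer) before this function runs.

SQUIRREL_VERIFIER_ID = "https://squirrel.example/slsa-verifier"

def _level(label: str) -> int:
    # e.g. "SLSA_LEVEL_3" -> 3 (illustrative encoding of the level)
    return int(label.rsplit("_", 1)[-1])

def vsa_allows(vsa: dict, package_name: str, artifact_sha256: str,
               minimum_level: str = "SLSA_LEVEL_3") -> bool:
    # `vsa` is the decoded, signature-checked attestation payload.
    if vsa.get("verifier", {}).get("id") != SQUIRREL_VERIFIER_ID:
        return False
    if vsa.get("verification_result") != "PASSED":
        return False
    subject_matches = any(
        s.get("name") == package_name and
        s.get("digest", {}).get("sha256") == artifact_sha256
        for s in vsa.get("subject", [])
    )
    if not subject_matches:
        return False
    return _level(vsa.get("verified_level", "SLSA_LEVEL_0")) >= _level(minimum_level)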

Org-wide verifier

While Acme is happy to use Squirrel’s verifier (and the verification built into the tooling) they still need their own verifier so they can publish VSAs to Acme customers. So Acme plans to stand up their own verification service and publish their verifier ID (i.e. ‘https://acme.example/private-verifier’) and signing key. Acme customers can then verify the software they get from Acme.

In the future Acme could require all software used throughout the company to be verified with this verifier (instead of relying on public verifiers). They’d do the verification and generate VSAs whenever artifacts are imported into their private Artifactory instance. They could then configure this ID/key pair for use throughout Acme and be confident that any software used has been verified according to Acme policy. That’s not Acme’s highest priority at the moment, but they like having this option open to them.

Key distribution & Trust

Both full and delegated verification depend upon key distribution to the users doing the verification. Depending on the specifics and what’s getting verified this can be a difficult problem.

Org-specific keys

When using delegated verification this could be the easiest case. Squirrel can just build the key used for delegated verification directly into the Squirrel tooling. Acme can also fairly easily configure the use of their keys throughout the company using existing configuration control mechanisms.

When using full verification this can be harder. If there are multiple builders that could be accepted the keys that sign the attestations need to be distributed to everyone that might use that builder. For Squirrel this would be really difficult since they plan to allow package maintainers to use whatever builder they want. How those keys get configured would be tricky just for Squirrel, and much more difficult if downstream Squirrel users wanted to do full verification of the Squirrel packages.

The situation is easier, however, for Oppy. That's because Oppy plans to only accept artifacts built by their autobuilder network. Oppy can configure this network to use a single key (or a small set of keys) and then publish those keys (and the SLSA level Oppy believes the network meets) for downstream users.

Fulcio

Squirrel plans to solve the problem of which keys they accept by leveraging Fulcio. Squirrel will build support for Fulcio root keys into their verifier and then express which Fulcio subject is allowed to sign attestations in the specific policy of each package. E.g. "Squirrel package 'foo' must have been built & signed by 'spiffe://foobar.com/foo-builder', from github.com/foo/acorn-foo, and be SLSA 4".

scope: 'acorn://foo'
target_level: SLSA_L4
allow_fulcio_builder {
  id: 'spiffe://foobar.com/foo-builder'
  source_repo: 'https://github.com/foo/acorn-foo.git'
  allow_branch: 'main'
  allow_entrypoint: 'package.json'
}



The Update Framework (TUF)

The above methods could be further enhanced with TUF to allow the secure maintenance of keys. TUF metadata could include all the SLSA keys, the build services and other entities they’re valid for, and the SLSA levels they’re qualified at. Oppy is considering using TUF to let verifiers securely fetch and update keys used by the Autobuilder network. Oppy would use a TUF delegation to indicate that these keys should only be used for the builder id ‘https://oppy.example/slsa/builder/v1’. Squirrel might do something similar to allow for updating the Fulcio key in its tooling.

Recording & verifying dependencies

Acme wants to verify the dependencies that go into its container and record them in the SLSA provenance. Acme would prefer that this functionality were built into their build service, but that feature isn't available yet. Instead they'll need to do something themselves. They have a few options at their disposal:

Tool wrappers

Since Oppy doesn't build SLSA into its tooling, Acme will create wrapper scripts for dependency import/installation that record and verify (using cosign) dependencies as they're installed. Acme will update their build scripts to replace all instances of Oppy package installation with the wrapper script and then use the recorded results to help populate the materials section of the provenance.
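
A wrapper along these lines is sketched below. The key path, record-file layout, the `oppy_install` command, and the idea that each Oppy package ships with a detached signature are assumptions for illustration, and the cosign verify-blob invocation should be checked against the documentation for the cosign version in use.

import hashlib
import json
import subprocess
import sys

OPPY_BUILDER_KEY = "/etc/acme/keys/oppy-autobuilder.pub"   # assumed key location
RECORD_FILE = "/tmp/build-materials.json"                  # later merged into provenance

def install(package: str, artifact: str, signature: str) -> None:
    # 1. Verify the artifact's signature before it touches the build.
    subprocess.run(
        ["cosign", "verify-blob", "--key", OPPY_BUILDER_KEY,
         "--signature", signature, artifact],
        check=True,
    )

    # 2. Record what was installed so it can be copied into `materials`.
    digest = hashlib.sha256(open(artifact, "rb").read()).hexdigest()
    with open(RECORD_FILE, "a") as f:
        f.write(json.dumps({"uri": f"oppy://{package}",
                            "digest": {"sha256": digest}}) + "\n")

    # 3. Hand off to the real Oppy installer (hypothetical command).
    subprocess.run(["oppy_install", artifact], check=True)

if __name__ == "__main__":
    install(*sys.argv[1:4])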

A downside is that this approach, if run in the build itself, is not guaranteed to be complete and cannot meet the “non-falsifiable” requirement (since the results reported by the wrapper could be falsified by the build process), relegating this approach to SLSA 2. Still, it allows Acme to make progress SLSA-fying their builds and provides a starting point for achieving higher SLSA levels.

Built into ecosystem tooling

Since Squirrel does build verification into their tooling, Acme can just use acorn install to verify the dependencies and record what was installed. Acme can use this information to populate the Squirrel packages installed in the materials section of the provenance and it can include the attestations of those dependencies in the in-toto bundle for their container image.

As with tool wrappers, if this method is used in the build itself it cannot meet the "non-falsifiable" requirement.

Proxied verification

Acme considered creating a proxy for their existing builder that proxies outbound connections. This proxy could verify everything fetched and use its logs to populate the provenance. Since the proxy is trusted, it would be easier to meet the "non-falsifiable" requirement. Unfortunately it's also a lot of work for Acme, so they're going to defer this idea for now.

Next time

In the first two parts of this series, we’ve covered the basics of getting started with SLSA and the details of policy and provenance storage, policy verification, and key handling. In our next post we’ll cover how Squirrel, Oppy, and Acme put this all together to protect a heterogeneous supply chain.

How to SLSA Part 1 – The Basics


One of the great benefits of SLSA (Supply-chain Levels for Software Artifacts) is its flexibility. As an open source framework designed to improve the integrity of software packages and infrastructure, it is as applicable to small open source projects as to enterprise organizations. But with this flexibility can come a bewildering array of options for beginners—much like salsa dancing, someone just starting out might be left on the dance floor wondering how and where to jump in.

Though it’s tempting to try to establish a single standard for how to use SLSA, it’s not possible: SLSA is not a line dance where everyone does the same moves, at the same time, to the same song. It’s a varied system with different styles, moves, and flourishes. The open source community, organizations, and consumers may all implement SLSA differently, but they can still work with each other.


In this three-part series, we’ll explore how three fictional organizations would apply SLSA to meet their different needs. In doing so, we will answer some of the main questions that newcomers to SLSA have:


Part 1: The basics
  • How and when do you verify a package with SLSA?
  • How to handle artifacts without provenance?
Part 2: The details

  • Where is the provenance stored?
  • Where is the appropriate policy stored and who should verify it?
  • What should the policies check?
  • How do you establish trust & distribute keys?
Part 3: Putting it all together
  • What does a secure, heterogeneous supply chain look like?

The Situation

Our fictional example involves three organizations that want to use SLSA:

Squirrel: a package manager with a large number of developers and users

Oppy: an open source operating system with an enterprise distribution

Acme: a mid-sized enterprise.

Squirrel wants to make SLSA as easy for their users as possible, even if that means abstracting some details away. Meanwhile, Oppy doesn’t want to abstract anything away from their users under the philosophy that they should explicitly understand exactly what they’re consuming.

Acme is trying to produce a container image that contains three artifacts:
  1. The Squirrel package ‘foo’
  2. The Oppy package ‘baz’
  3. A custom executable, ‘bar’, written by Acme employees
This series demonstrates one approach to using SLSA that lets Acme verify the Squirrel and Oppy packages ‘foo’ and ‘baz’ and its customers verify the container image. Though not every suggested solution is perfect, the solutions described can be a starting point for discussion and a foundation for new solutions.

Basics

In order to SLSA, Squirrel, Oppy, and Acme will all need SLSA capable build services. Squirrel wants to give their maintainers wide latitude to pick a builder service of their own. To support this, Squirrel will qualify some build services at specific SLSA levels (meaning they can produce artifacts up to that level). To start, Squirrel plans to qualify GitHub Actions using an approach like this, and hopes it can achieve SLSA 4 (pending the result of an independent audit). They’re also willing to qualify other build services as needed. Oppy on the other hand, doesn’t need to support arbitrary build services. They plan to have everyone use their Autobuilder network which they hope to qualify at SLSA 4 (they’ll conduct the audit/certification themselves). Finally, Acme plans to use Google Cloud Build which they’ll self-certify at SLSA 4 (pending the result of a Google-conducted audit).

Squirrel, Oppy, and Acme will follow a similar qualification process for the source control systems they plan to support.

Verification options

Full verification

At some point, one or more of the organizations will need to do full verification of each artifact to determine if it is acceptable for a given use case. This is accomplished by checking if the artifact meets the appropriate policy.

Typically, full verification would take place with SLSA provenance, source attestations, and perhaps other specialized attestations (like vulnerability scan results). While having to coordinate this data for all of its dependencies seems like a lot of work to Acme, they’re prepared to do full verification if Squirrel and Oppy are unable to.

Delegated verification

When Acme isn’t using full verification, they can instead use delegated verification where they check if an artifact is acceptable for a use case by checking if some other trusted party who performed a full verification (such as Squirrel or Oppy) believes the artifact is acceptable.

Delegated verification is easier to perform quickly with limited data and network connectivity. It may also appeal to users who simply want to know that someone they trust has verified the artifact.

Squirrel likes how easy delegated verification would make things for their users and plans to support it by creating a Verification Summary Attestation (VSA) when they perform full verification.

When to verify

Verification (full or delegated) could happen at a number of different times.

On import to repo

Squirrel plans to perform full verification when an artifact is published to their repo. This will ensure that packages in the repo have met their corresponding policy. It’s also helpful because all the required data can be gathered when latency isn’t critical.

If this were the only time verification is performed, it would put the repository's storage in the trusted computing base (TCB) of its users. Squirrel’s plans to use delegated verification (and issue VSAs) can prevent this. The signature on the VSA will prevent the artifacts from being tampered with while sitting in storage, even if they’re just SLSA 0. Downstream users will just need to verify the VSA.

Acme also wants to do some sort of verification on the import to their internal repo since it simplifies their security story. They’re not quite sure what this will look like yet.

On install

Acme also wants to do verification when an artifact is actually installed since it can remove a number of intermediaries from their TCB (their repo, the network, upstream storage systems).

If they perform full verification at install then they must gather all the required information. That could be a lot of data, but it might be simplified by gathering the data from external sources and caching it in their internal repo. A larger problem is that it requires Acme to have established trust in all parties that produced that information (e.g. every builder of every package). For a complex supply chain that may be difficult.

If Acme performs delegated verification, they only need the VSA for the packages being installed and to explicitly trust a handful of parties. This allows the complex full verification to be performed once while allowing all users of that package to perform a much simpler operation.

Given these tradeoffs Acme prefers delegated verification at install time. Squirrel also really likes the idea and plans to build install time verification directly into the Squirrel tool.

On use

Verification could also take place each time an artifact is actually used. In this model, latency and reliability are very important (a sudden increase in site traffic may necessitate a scaling operation launching many new jobs).

Time of use verification allows the most context with which decisions can be made (“is this job allowed to run this code and is it free from vulns right now?”). It also allows policy changes to affect already built & installed software (which may or may not be desirable).

Acme wants their users to be able to verify on use without too many dependencies so they plan to provide VSAs users can use to perform delegated verification when they start the container (perhaps using something like Kyverno).

How to handle artifacts without provenance?

Inevitably, a build or system may need to use an artifact without 'original' provenance. In these cases it may be desirable for the importer to generate provenance that details where it got the artifact. For example, this generated provenance shows that http://example.com/foo.tgz with sha256:abc was imported by 'auto-importer':
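
The original post showed this generated provenance as an image; the sketch below is a hedged reconstruction of what such a statement might contain, using the values mentioned above and otherwise illustrative field names.

# Illustrative provenance an importer might generate for an artifact that
# arrived without any; the exact predicate fields depend on the tooling.
generated_provenance = {
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{"name": "foo.tgz", "digest": {"sha256": "abc"}}],
    "predicateType": "https://slsa.dev/provenance/v0.2",
    "predicate": {
        "builder": {"id": "auto-importer"},
        "buildType": "https://example.com/import/v1",
        "materials": [{"uri": "http://example.com/foo.tgz",
                       "digest": {"sha256": "abc"}}],
    },
}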


Such an artifact would likely not be accepted at higher SLSA levels, but the provenance can be used to: 1) prevent tampering with the artifact after it’s been imported and 2) be a data point for future analysis (e.g. should we prioritize asking for foo.tgz to be distributed with native SLSA provenance?).

Acme might be interested in taking this approach at some point, but they don’t need it at the moment.

Next time

In our next post we’ll cover specific approaches that can be used to answer questions like “where should attestations and policies be stored?” and “how do I trust the attestations that I receive?”

Improving software supply chain security with tamper-proof builds


Many of the recent high-profile software attacks that have alarmed open-source users globally were consequences of supply chain integrity vulnerabilities: attackers gained control of a build server to use malicious source files, injected malicious artifacts into a compromised build platform, and bypassed trusted builders to upload malicious artifacts.

Each of these attacks could have been prevented if there were a way to detect that the delivered artifacts diverged from the expected origin of the software. But until now, generating verifiable information that described where, when, and how software artifacts were produced (information known as provenance) was difficult. This information allows users to trace artifacts verifiably back to the source and develop risk-based policies around what they consume. Currently, provenance generation is not widely supported, and solutions that do exist may require migrating build processes to services like Tekton Chains.

This blog post describes a new method of generating non-forgeable provenance using GitHub Actions workflows for isolation and Sigstore’s signing tools for authenticity. Using this approach, projects building on GitHub runners can achieve SLSA 3 (the third of four progressive SLSA “levels”), which affirms to consumers that your artifacts are authentic and trustworthy.

Provenance


SLSA ("Supply-chain Levels for Software Artifacts”) is a framework to help improve the integrity of your project throughout its development cycle, allowing consumers to trace the final piece of software you release all the way back to the source. Achieving a high SLSA level helps to improve the trust that your artifacts are what you say they are.

This blog post focuses on build provenance, which gives users important information about the build: who performed the release process? Was the build artifact protected against malicious tampering? Source provenance describes how the source code was protected, which we’ll cover in future blog posts, so stay tuned.

Go prototype to generate non-forgeable build provenance


To create tamper-proof evidence of the build and allow consumer verification, you need to:
  1. Isolate the provenance generation from the build process;
  2. Isolate against maintainers interfering in the workflow;
  3. Provide a mechanism to identify the builder during provenance verification.

The full isolation described in the first two points allows consumers to trust that the provenance was faithfully recorded; entities that provide this guarantee are called trusted builders.

Our Go prototype solves all three challenges. It also includes running the build inside the trusted builder, which provides a strong guarantee that the build achieves SLSA 3’s ephemeral and isolated requirement.

How does it work?

The following steps create the trusted builder that is necessary to generate provenance in isolation from the build and maintainer’s interference.

Step One: Create a reusable workflow on GitHub runners

Leveraging GitHub's reusable workflows provides isolation from both maintainers' caller workflows and from the build process. Within the workflow, GitHub Actions creates fresh instances of virtual machines (VMs), called runners, for each job. These separate VMs give the necessary isolation for a trusted builder, so that different VMs compile the project and generate and sign the SLSA provenance (see diagram below).

Running the workflow on GitHub-hosted runners gives the guarantee that the code run is in fact the intended workflow, which self-hosted runners do not. This prototype relies on GitHub to run the exact code defined in the workflow.

The reusable workflow also protects against possible interference from maintainers, who could otherwise try to define the workflow in a way that interferes with the builder. The only way to interact with a reusable workflow is through the input parameters it exposes to the calling workflow, which stops maintainers from altering information via environment variables, steps, services and defaults.

To protect against the possibility of one job (e.g. the build step) tampering with the other artifacts used by another job (the provenance step), this approach uses a trusted channel to protect the integrity of the data. We use job outputs to send hashes (due to size limitations) and then use the hashes to verify the binary received via the untrusted artifact registry.
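
Conceptually, the check the provenance job performs is no more than the following sketch; the artifact name and the way the expected digest is passed in are placeholders, not the actual workflow's outputs.

import hashlib
import sys

def verify_downloaded_artifact(path: str, trusted_sha256: str) -> bytes:
    # Re-hash the binary fetched from the (untrusted) artifact storage and
    # compare it to the digest received over the trusted job-output channel.
    data = open(path, "rb").read()
    actual = hashlib.sha256(data).hexdigest()
    if actual != trusted_sha256:
        sys.exit(f"artifact hash mismatch: expected {trusted_sha256}, got {actual}")
    return data

# Usage inside the provenance job (names are placeholders):
#   binary = verify_downloaded_artifact("binary-linux-amd64", sha256_from_job_output)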

Step Two: Use OpenID Connect (OIDC) to prove the identity of the workflow to an external service (Sigstore)

OpenID Connect (OIDC) is a standard used across the web for identity providers (e.g., Google) to attest to the identity of a user for a third party. GitHub now supports OIDC in their workflows. Each time a workflow is run, a runner can mint a unique JWT token from GitHub's OIDC provider. The token contains verifiable information about the workflow identity, including the caller repository, commit hash, trigger, and the current (reusable) workflow path and reference.

Using OIDC, the workflow proves its identity to Sigstore's Fulcio root Certificate Authority, which acts as an external verification service. Fulcio signs a short-lived certificate attesting to an ephemeral signing key generated in the runner and tying it to the workload identity. A record of signing the provenance is kept in Sigstore’s transparency log Rekor. Users can use the signing certificate as a trust anchor to verify that the provenance was authenticated and non-forgeable; it must have been created inside the trusted builder.

Verification


The consumer can verify the artifact and its signed provenance with these steps:
  1. Look up the corresponding Rekor log entry and verify the signature;
  2. Verify the trusted builder identity by extracting it from the signing certificate;
  3. Check that the provenance information matches the expected source and build.
See an example in action in the official repository.
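
As a rough illustration of steps 2 and 3 (step 1, the Rekor lookup and signature check, is typically handled with existing Sigstore tooling), the sketch below extracts the workflow identity from the signing certificate's subject alternative name and compares the provenance against expectations. It is a simplification of what the project's verifier actually does, and the trusted-builder prefix and source repository values are assumptions to be replaced with your own.

import json
from cryptography import x509

# Both values are assumptions for illustration; a real verifier pins the
# exact reusable-workflow identity published by the project.
TRUSTED_BUILDER_PREFIX = "https://github.com/slsa-framework/slsa-github-generator-go"
EXPECTED_SOURCE = "github.com/myorg/myrepo"

def check_certificate(cert_pem: bytes) -> None:
    # Step 2: the signing certificate's subject alternative name must identify
    # the trusted builder workflow.
    cert = x509.load_pem_x509_certificate(cert_pem)
    san = cert.extensions.get_extension_for_class(x509.SubjectAlternativeName).value
    uris = san.get_values_for_type(x509.UniformResourceIdentifier)
    if not any(uri.startswith(TRUSTED_BUILDER_PREFIX) for uri in uris):
        raise ValueError(f"unexpected signer identity: {uris}")

def check_provenance(provenance: dict, artifact_sha256: str) -> None:
    # Step 3: the provenance must cover this exact artifact and reference the
    # expected source repository.
    if not any(s["digest"]["sha256"] == artifact_sha256
               for s in provenance.get("subject", [])):
        raise ValueError("artifact digest not found in provenance subjects")
    if EXPECTED_SOURCE not in json.dumps(provenance.get("predicate", {})):
        raise ValueError("provenance does not reference the expected source repo")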

Performing these steps guarantees to the consumer that the binary was produced in the trusted builder at a given commit hash attested to in the provenance. They can trust that the information in the provenance was non-forgeable, allowing them to trust the build “recipe” and trace their artifact verifiably back to the source.

Extra Bonus: Keyless signing

One extra benefit of this method is that maintainers don't need to manage or distribute cryptographic keys for signing, avoiding the notoriously difficult problem of key management. With the OIDC protocol, no hardcoded, long-term secrets need to be stored in GitHub's secrets, which sidesteps the potential problem of key mismanagement invalidating the SLSA provenance. Consumers simply use OIDC to verify that the binary artifact was built from a trusted builder that produced the expected provenance.

Next Steps

Utilizing the SLSA framework is a proven way to ensure software supply-chain integrity at scale. This prototype shows that achieving high SLSA levels is easier than ever thanks to the newest features of popular CI/CD systems and open-source tooling. Increased adoption of tamper-safe (SLSA 3+) build services will contribute to a stronger open-source ecosystem and help close one easily exploited gap in the current supply chain.

We encourage testing and adoption and welcome any improvements to the project. Please share feedback, comments and suggestions at slsa-github-generator-go and slsa-verifier project repositories. We will officially release v1 in a few weeks!

In follow-up posts, we will demonstrate adding non-forgeable source provenance attesting to secure repository settings, and showcase the same techniques for other build toolchains and package managers, etc. Stay tuned!

Find and $eek! Increased rewards for Google Nest & Fitbit devices

At Google, we constantly invest in security research to raise the bar for our devices, keeping our users safe and building their trust in our products. In 2021, we published Google Nest security commitments, in which we committed to engage with the research community to examine our products and services and report vulnerabilities.

We are now looking to deepen this relationship and accelerate the path toward building more secure devices. Starting today, we will introduce a new vulnerability rewards structure for submissions impacting smart home (Google Nest) and wearables (Fitbit) devices through our Bug Hunters platform.

Bonus!

We are paying higher rewards retroactively for eligible Google Nest and Fitbit device reports submitted in 2021. And, starting today, for the next six months, we will double the reward amount for all new eligible reports applicable to Google Nest and Fitbit devices in scope.

We will continue to take reports on our web applications, services, and mobile apps at their existing reward levels. Please keep those coming!

An enhanced rewards program

Building on our previous programs to improve devices' embedded security posture, we’re bringing all our first-party devices under a single program, starting with Google Nest, Fitbit, and Pixel.
This program extends the Android Security Reward Program, making it easier for researchers to submit a vulnerability in first-party devices and improving consistency across our severity assignments. Refer to the Android and Google Devices Security Reward Program for more details.

What interests us?

We encourage researchers to report firmware, system software, and hardware vulnerabilities. Our wide diversity of platforms provides researchers with a smorgasbord of environments to explore.

What's next?

We will be at the Hardwear.io conference this year! The VRP team is looking forward to meeting our security peers in person. We’ll be talking about the architecture of a couple of our devices, hoping to give security researchers a head start in finding vulnerabilities. We’ll have plenty of swag, too!

We will continue to enhance the researchers' experience and participation. We intend to add training documentation and target areas that interest us as we grow the program.

A huge thanks to Sarah Jacobus, Adam Bacchus, Ankur Chakraborty, Eduardo' Vela" <Nava>, Jay Cox, and Nic Watson.

What’s up with in-the-wild exploits? Plus, what we’re doing about it.

If you are a regular reader of our Chrome release blog, you may have noticed that phrases like 'exploit for CVE-1234-567 exists in the wild' have been appearing more often recently. In this post we'll explore why there seems to be such an increase in exploits, and clarify some misconceptions in the process. We'll then share how Chrome is continuing to make it harder for attackers to achieve their goals.

How things work today

While the increase may initially seem concerning, it’s important to understand the reason behind this trend. If it's because there are many more exploits in the wild, it could point to a worrying trend. On the other hand, if we’re simply gaining more visibility into exploitation by attackers, it's actually a good thing! It’s good because it means we can respond by providing bug fixes to our users faster, and we can learn more about how real attackers operate.

So, which is it? It’s likely a little of both.

Our colleagues at Project Zero publicly track all known in-the-wild “zero day” bugs. Here’s what they’ve reported for browsers:

First, we don’t believe there was no exploitation of Chromium-based browsers between 2015 and 2018. We recognize that we don’t have full view into active exploitation, and just because we didn’t detect any zero-days during those years doesn’t mean exploitation didn’t happen. Available exploitation data suffers from sampling bias.

Teams like Google’s Threat Analysis Group are also becoming increasingly sophisticated in their efforts to protect users by discovering zero-days and in-the-wild attacks. A good example is a bug in our Portals feature that we fixed last fall. This bug was discovered by a team member in Switzerland and reported to Chrome through our bug tracker. While Chrome normally keeps each web page locked away in a box called the “renderer sandbox,” this bug allowed the code to break out, potentially allowing attackers to steal information. Working across multiple time zones and teams, we came up with a fix and rolled it out within three days, as detailed in our video on the process.


Why so many exploits?

There are a number of factors at play, from changes in vendor and attacker behavior, to changes in the software itself. Here are four in particular that we've been discussing and exploring as a team.

First, we believe we’re seeing more bugs thanks to vendor transparency. Historically, many browser makers didn’t announce that a bug was being exploited in the wild, even if they knew it was happening. Today, most major browser makers have increased transparency by publishing details in release communications, and that may account for more publicly tracked “in the wild” exploitation. These efforts have been spearheaded by both browser security teams and dedicated research groups, such as Project Zero.

Second, we believe we’re seeing more exploits due to evolved attacker focus. There are two reasons to suspect attackers might be choosing to attack Chrome more than they did in the past.

  • Flash deprecation: In 2015 and 2016, Flash was a primary exploitation target. Chrome gradually made Flash a less attractive target for attackers (for instance, requiring user clicks to activate Flash content) before finally removing it in Chrome 88 in January 2021. As Flash is no longer available, attackers have had to switch to a harder target: the browser itself.
  • Chromium popularity: Attackers go for the most popular target. In early 2020, Edge switched to using the Chromium rendering engine. If attackers can find a bug in Chromium, they can now attack a greater percentage of users.

Third, some attacks that could previously be accomplished with a single bug now require multiple bugs. Before 2015, only a single in-the-wild bug was required to steal a user’s secrets from other websites, because multiple web pages lived together in a single renderer process. If an attacker could compromise the renderer process belonging to a malicious website that a user visited, they might have been able to access the credentials for some other more sensitive website.

With Chrome’s multiyear Site Isolation project largely complete, a single bug is almost never sufficient to do anything really bad. Attackers often need to chain at least two bugs: first, to compromise the renderer process, and second, to jump into the privileged Chrome browser process or directly into the device operating system. Sometimes multiple bugs are needed to achieve one or both of these steps.

So, to achieve the same result, an attacker generally now has to use more bugs than they previously did. For exactly the same level of attacker success, we’d see more in-the-wild bugs reported over time, as we add more layers of defense that the attacker needs to bypass.

Fourth, there’s simply the fact that software has bugs. Some fraction of those bugs are exploitable. Browsers increasingly mirror the complexity of operating systems — providing access to your peripherals, filesystem, 3D rendering, GPUs — and more complexity means more bugs.

Ultimately, we believe data is an important part of the story, but the absolute number of exploited bugs isn't a sufficient measure of security risk. Since some security bugs are inevitable, how a software vendor architects their software (so that the impact of any single bug is limited) and responds to critical security bugs is often much more important than the specifics of any single bug.

How Chrome is raising the bar

The Chrome team works hard to both detect and fix bugs before releases and get bug fixes out to users as quickly as possible. We’re proud of our record at fixing serious bugs quickly, and we are continually working to do better.

For example, one area of concern for us is the risk of n-day attacks: that is, exploitation of bugs we’ve already fixed, where the fixes are visible in our open-source code repositories. We have greatly reduced our “patch gap” from 35 days in Chrome 76 to an average of 18 days in subsequent milestones, and we expect this to reduce slightly further with Chrome’s faster release cycle.

Irrespective of how quickly bugs are fixed, any in-the-wild exploitation is bad. Chrome is working hard to make it expensive and difficult for attackers to achieve their goals.

Some examples of ongoing projects:

  • We continue to strengthen Site Isolation, especially on Android.
  • The V8 heap sandbox will prevent attackers from using JavaScript just-in-time (JIT) compilation bugs to compromise the renderer process. This will require attackers to add a third bug to these exploit chains, which means increased security, but could also increase the number of in-the-wild exploits reported.
  • The MiraclePtr and *Scan projects aim to prevent exploitability of many of our largest class of browser process bugs, called “use-after-free” (see the sketch after this list). We will be applying similar systematic solutions to other classes of bugs over time.
  • Since “memory safety” bugs account for 70% of the exploitable security bugs, we aim to write new parts of Chrome in memory-safe languages.
  • We continue to work on post-exploitation mitigations such as CET and CFG.
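
To make the “use-after-free” class concrete, here is a minimal, hypothetical C sketch of the bug pattern that MiraclePtr-style mitigations target; the types and names are purely illustrative and are not Chrome code.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Hypothetical object type, for illustration only. */
    struct widget {
        char label[32];
    };

    int main(void)
    {
        struct widget *w = malloc(sizeof(*w));
        if (!w)
            return 1;
        strcpy(w->label, "hello");

        free(w);                 /* the object is destroyed here...              */

        /* ...but the dangling pointer is still used afterwards. If an attacker  */
        /* can reallocate the freed memory with data they control, this access   */
        /* (or a later write or virtual call) operates on attacker-controlled    */
        /* state: a use-after-free.                                              */
        printf("%s\n", w->label);

        return 0;
    }

In a browser, the dangling pointer typically refers to a heap-allocated object whose reallocation can be influenced by web content, which is what makes this bug class so attractive to exploit writers.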

We are well past the stage of having “easy wins” when it comes to raising the bar for security. All of these are long-term projects with significant engineering challenges. But as we've shown with Site Isolation, Chrome isn't afraid of making long-term investments in major security engineering projects. One of the major challenges is performance: all of these technologies (except memory-safe languages) could risk slowing the browser. Expect a series of blog posts over the coming months as we explore the performance vs. security trade-offs. These decisions are really hard: we do not want to make Chrome slower for billions of people, especially as this disproportionately hits users with slower devices – we strive to make Chrome secure for all our users, not just those with high-end systems.

How you can help

Above all: if Chrome is reminding you to update, please do!

If you’re an enterprise IT professional, keep your users up to date by keeping auto-update on, and familiarize yourself with the enterprise policies and controls that you can apply to Chrome within your organization. We strongly advise not focusing on zero-days when making decisions about updates, but instead assuming that any Chrome security bug is under exploitation as an n-day.

If you're a security researcher, you can report bugs you find to the Chrome Vulnerability Rewards Program — and thanks for helping us make Chrome safer for everyone!

Mitigating kernel risks on 32-bit ARM


Linux kernel support for the 32-bit ARM architecture was contributed in the late 90s, when there was little corporate involvement in Linux development, and most contributors were students or hobbyists, tinkering with development boards, often without much in the way of documentation.



Now 20+ years later, 32-bit ARM's maintainer has downgraded its support level to 'odd fixes,' while remaining active as a kernel contributor. This is a common pattern for aging and obsolete architectures: corporate funding for Linux kernel development has tremendously increased the pace of development, but only for architectures with a high return on investment. As a result, the 32-bit ARM port of Linux is essentially in maintenance-only mode, and lacks core Linux advancements such as THREAD_INFO_IN_TASK or VMAP_STACK, which protect against stack overflow attacks.

The lack of developer attention does not imply that the 32-bit ARM port has ceased to make economic sense, though. Instead, it has evolved from being one of the spearheads of Linux innovation to a stable and mature platform, and while funding its upstream development may not make sense in the long term, deploying 32-bit ARM into the field today most certainly still makes economic sense when margins are razor thin and BOM costs need to be kept to an absolute minimum. This is why 32-bit ARM is still widely used in embedded systems like set-top boxes and wireless routers.

Running 32-bit Linux on 64-bit ARM systems

Ironically, at these low price points, the DRAM is actually the dominant component in terms of BOM cost, and many of these 32-bit ARM systems incorporate a cheap ARMv8 SoC that happens to be capable of running in 64-bit mode as well. The reason for running 32-bit applications nonetheless is that these generally use less of the expensive DRAM, and can be deployed directly without the need to recompile the binaries. As 32-bit applications don't need a 64-bit kernel (which itself uses more memory due to its internal use of 64-bit pointers), the product ships with a 32-bit kernel instead.
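
As a rough illustration of why 32-bit binaries and kernels are lighter on DRAM, consider a pointer-heavy structure. The struct below is hypothetical, and the sizes in the comment reflect the common 32-bit ARM EABI and 64-bit (LP64) conventions.

    #include <stdio.h>

    /* Hypothetical pointer-heavy node, e.g. an element of a linked data structure. */
    struct node {
        struct node *next;
        struct node *prev;
        void        *payload;
        long         refcount;
    };

    int main(void)
    {
        /* On a typical 32-bit ARM build, pointers and long are 4 bytes, so
         * sizeof(struct node) is 16.  On 64-bit ARM (LP64) they are 8 bytes,
         * so the same struct takes 32 bytes -- roughly double the memory for
         * exactly the same data. */
        printf("sizeof(void *) = %zu, sizeof(struct node) = %zu\n",
               sizeof(void *), sizeof(struct node));
        return 0;
    }

Multiply that factor across every pointer-laden structure in the kernel and in long-running userspace processes, and the DRAM savings of a fully 32-bit software stack become significant at these price points.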


If you’re choosing to use a 32-bit kernel for its smaller memory footprint, it's not without risks: you are likely to run into performance limitations, unpatched vulnerabilities, and unexpected misbehaviors, such as:

  • 32-bit kernels generally cannot manage more than 1 GiB of physical memory without resorting to HIGHMEM bouncing (see the sketch after this list), and cannot provide a full 4 GiB virtual address space to user space, as 64-bit kernels can.
  • Side channels or other flaws caused by silicon errata may exist that haven't been mitigated in 32-bit kernels. For example, hardening against the Spectre and Meltdown vulnerabilities was only implemented for 32-bit-only ARMv7 CPUs, and many ARMv8 cores running in 32-bit mode may still be vulnerable (only Cortex-A73 and Cortex-A75 are handled specifically). And in general, silicon flaws in 64-bit parts that affect the 32-bit kernel are less likely to be found or documented, simply because the silicon validation teams don’t prioritize them.
  • The 32-bit ARM kernel does not implement the elaborate alternatives patching framework that is used by other architectures to implement handling of silicon errata, which are particular to certain revisions of certain CPUs. Instead, on 32-bit multiplatform kernels, we simply enable all errata workarounds that may be needed by any of the cores that may ever run the image in question, potentially affecting performance unnecessarily on cores that have no need for them.
  • Silicon vendors are phasing out 32-bit support in the longer term. Given an ecosystem containing a handful of operating systems and thousands of applications, support for 32-bit operating systems (which is more complex technically) is highly likely to be dropped first. For products with longer life cycles, long-term procurement contracts for components available today are usually much more costly than adjusting the BOM over time and using newer, cheaper parts.
  • The 32-bit kernel does not implement kernel address space randomization, and even if it did, its comparatively tiny address space simply leaves very little space for randomization. Other hardening features, such as rodata=full or hierarchical eXecute Never attributes, are missing as well on 32-bit, and are not likely to be implemented, either due to lack of support in the architecture, or because of the complexity of the 32-bit memory management code, which still supports all of the different architecture revisions dating back to the initial Linux port running on the Risc PC.
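
As a simplified, kernel-style sketch of the HIGHMEM point in the first bullet above (not code from any real driver): on a 32-bit kernel, physical pages above the lowmem limit have no permanent kernel mapping, so the kernel has to map them temporarily before it can touch their contents, and unmap them again afterwards. The helper name below is hypothetical; kmap_atomic()/kunmap_atomic() are the classic kernel interfaces for such short-lived mappings.

    #include <linux/highmem.h>
    #include <linux/string.h>

    /* Hypothetical helper: copy data out of a page that may live in HIGHMEM. */
    static void copy_out_of_high_page(struct page *page, void *dst, size_t len)
    {
        void *src = kmap_atomic(page);   /* create a short-lived kernel mapping */

        memcpy(dst, src, len);

        kunmap_atomic(src);              /* tear the mapping down again */
    }
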
Keeping the 32-bit ARM kernel secure

There are cases, though, where using the 32-bit kernel is the only option, e.g., if the CPUs are in fact 32-bit only (which is the case even for some ARMv8 cores such as Cortex-A32), or when relying on an existing 32-bit only codebase running in the kernel (drivers for legacy peripherals). Note that in such cases, it still makes sense to use the most recent kernel version compatible with the hardware, since we are in fact making an effort to enable some of the existing hardening features on 32-bit ARM as well.

  • THREAD_INFO_IN_TASK for v7 SMP cores
The v5.16 release of the Linux kernel implements support for THREAD_INFO_IN_TASK when running on ARMv7 SMP systems. This protects the kernel's per-task bookkeeping (called thread_info), which lives on the far (and normally unused) end of the stack, against stack overflows, which may occur in rare (yet sometimes exploitable) cases where the control flow of the program simply ends up accumulating more state than the stack can hold. (Note that a stack overflow is not the same as a stack buffer overflow, where the overflow happens in the opposite direction.)

By moving thread_info off the stack and into the kernel heap, and by using a special SMP CPU register to keep track of its location, we can mitigate the risk of stack overflows resulting in thread_info corruption (a simplified sketch of this layout change follows after this list). However, this does not prevent stack overflows themselves: these may still occur, and result in corruption of other data structures that happen to be adjacent to the task stack in memory.
  • THREAD_INFO_IN_TASK for other cores
For CPUs that lack this special SMP CPU register, we also proposed an implementation of THREAD_INFO_IN_TASK that is expected to land in v5.18. Instead of a special register, it uses a global variable to keep track of the location of thread_info.


  • VMAP_STACK support
Preventing stack overflows from corrupting unrelated memory contents is the goal of VMAP_STACK, which we are enabling for 32-bit ARM as well. When VMAP_STACK is enabled, kernel mode stacks are allocated from the kernel heap as before, but mapped into a different part of the kernel's address space, and surrounded by guard regions, which are guaranteed to be kept unpopulated. Given that accesses to such unpopulated regions will trigger an exception, the kernel's memory management layer can step in and terminate the program as soon as a stack overflow occurs, and prevent it from causing memory corruption.
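
As referenced above, here is a heavily simplified sketch of what THREAD_INFO_IN_TASK changes. The structures below only mirror the shape of the real kernel definitions and omit almost all of their fields; the sizes are illustrative.

    /* Heavily simplified, illustrative layouts -- not the real kernel definitions. */
    #define THREAD_SIZE 8192          /* typical kernel stack size */

    struct thread_info {
        unsigned long flags;          /* per-task bookkeeping ... */
        int           preempt_count;
    };

    /* Without THREAD_INFO_IN_TASK: thread_info shares the stack allocation,
     * sitting at the far end of it, so a stack overflow can corrupt it. */
    union thread_union {
        struct thread_info thread_info;
        unsigned long      stack[THREAD_SIZE / sizeof(unsigned long)];
    };

    /* With THREAD_INFO_IN_TASK: thread_info becomes the first member of
     * task_struct, allocated separately from the stack, and the CPU keeps
     * track of the current task via a per-CPU register on ARMv7 SMP (or a
     * global variable on cores that lack one), so an overflowing stack can
     * no longer reach it. */
    struct task_struct {
        struct thread_info thread_info;
        void              *stack;     /* the stack is now just memory */
        /* ... */
    };
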


Support for IRQ stacks

Coming up with a bounded worst case on which to base the size of the kernel stack is rather hard, especially given the fact that it is shared between the program itself and any exception handling routines that may be called on its behalf, including interrupt handlers. To mitigate the risk of a pathological worst case occurring, where an interrupt fires that needs a lot of stack space right at a time when most of the stack is already being used by the program, we are also enabling IRQ_STACKS for 32-bit ARM, which will run handlers of both hard and soft interrupts from a dedicated stack, one for each CPU. By decoupling the task and interrupt contexts like this, the likelihood that a well-behaved program needs to be terminated due to stack overflow should be all but eliminated.
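
The idea can be sketched as follows; this is an illustrative outline using assumed names and sizes, not the actual 32-bit ARM implementation.

    /* Illustrative sketch of the IRQ_STACKS idea: each CPU gets its own
     * dedicated interrupt stack, so an interrupt that needs a lot of stack
     * space no longer eats into whatever is left of the interrupted task's
     * stack.  NR_CPUS and IRQ_STACK_SIZE are placeholder values. */
    #define NR_CPUS        4
    #define IRQ_STACK_SIZE 8192

    static unsigned long irq_stack[NR_CPUS][IRQ_STACK_SIZE / sizeof(unsigned long)]
            __attribute__((aligned(8)));

    /* Conceptually, the low-level IRQ entry code does the equivalent of:
     *
     *   old_sp = sp;
     *   sp     = &irq_stack[this_cpu][IRQ_STACK_SIZE / sizeof(unsigned long)];
     *   handle_irq(...);
     *   sp     = old_sp;
     *
     * so both hard and soft interrupt handlers run on the dedicated stack. */
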

Conclusion

With these changes in place, kernel stack overflow protection will be available for all ARM systems supported by Linux, including ancient ones like the Risc PC or Netwinder, provided that they run a Linux distribution that is keeping up with the times.




However, relying on legacy hardware and software comes with a risk, and even though we try to help keep users of the 32-bit kernel as safe as we reasonably can, it is not the right choice for new designs that incorporate 64-bit capable hardware.