Category Archives: Online Security Blog

The latest news and insights from Google on security and safety on the Internet

Verifiable design in modern systems


The way we design and build software is continually evolving. Just as we now think of security as something we build into software from the start, we are also increasingly looking for new ways to minimize trust in that software. One of the ways we can do that is by designing software so that you can get cryptographic certainty of what the software has done.

In this post, we'll introduce the concept of verifiable data structures that help us get this cryptographic certainty. We'll describe some existing and new applications of verifiable data structures, and provide some additional resources we have created to help you use them in your own applications.
A verifiable data structure is a class of data structure that lets people efficiently agree, with cryptographic certainty, that the data contained within it is correct.

Merkle Trees are the most famous of these and have been used for decades because they can enable efficient verification that a particular piece of data is included among many records - as a result they also form the basis of most blockchains.
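
To make the idea of efficient inclusion verification concrete, here is a minimal Python sketch of a Merkle tree with inclusion proofs. It is an illustration only, not the implementation used by any system mentioned in this post. The function names are hypothetical, and the hashing conventions (SHA-256, a one-byte prefix to separate leaf and interior hashes, and splitting at the largest power of two) are assumptions borrowed from the general style of Certificate Transparency logs.

```python
import hashlib


def _leaf(data: bytes) -> bytes:
    # Prefix 0x00 marks a leaf hash so it cannot be confused with an interior node.
    return hashlib.sha256(b"\x00" + data).digest()


def _node(left: bytes, right: bytes) -> bytes:
    # Prefix 0x01 marks an interior node hash.
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    # Largest power of two strictly less than n (for n >= 2).
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(leaves):
    """Compute the root hash over a non-empty list of byte strings."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    if len(leaves) == 1:
        return _leaf(leaves[0])
    k = _split(len(leaves))
    return _node(merkle_root(leaves[:k]), merkle_root(leaves[k:]))


def inclusion_proof(leaves, index):
    """Return [(sibling_hash, sibling_is_left), ...] ordered from leaf to root."""
    if not 0 <= index < len(leaves):
        raise IndexError("leaf index out of range")
    if len(leaves) == 1:
        return []
    k = _split(len(leaves))
    if index < k:
        # Target is in the left subtree; the right subtree root is its sibling.
        return inclusion_proof(leaves[:k], index) + [(merkle_root(leaves[k:]), False)]
    # Target is in the right subtree; the left subtree root is its sibling.
    return inclusion_proof(leaves[k:], index - k) + [(merkle_root(leaves[:k]), True)]


def verify_inclusion(data: bytes, proof, root: bytes) -> bool:
    """Recompute the root from one record and its proof, then compare."""
    h = _leaf(data)
    for sibling, sibling_is_left in proof:
        h = _node(sibling, h) if sibling_is_left else _node(h, sibling)
    return h == root
```

A verifier that trusts only the root hash can check a single record with a proof whose length grows with the logarithm of the number of records, rather than re-reading every record.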

Although these verifiable data structures are not new, we now have a new generation of developers who have discovered them and the designs they enable -- further accelerating their adoption.
These verifiable data structures enable building a new class of software that has elements of verifiability and transparency built into the way it operates. This gives us new ways to defend against coercion, introduce accountability to existing and new ecosystems, and make it easier to demonstrate compliance to regulators, customers and partners.

Certificate Transparency is a great example of a non-blockchain use of these verifiable data structures at scale to secure core internet infrastructure. By using these patterns, we have been able to introduce transparency and accountability to an existing system used by everyone without breaking the web.
Unfortunately, despite the capabilities of verifiable data structures and the associated patterns, there are not many resources developers can use to design, build, and deploy scalable and production-quality systems based on them.

To address this gap we have generalized the platform we used to build Certificate Transparency so it can be applied to other classes of problems as well. Since this infrastructure has been used for years as part of this ecosystem it is well understood and can be deployed confidently in production systems.
This is why we have seen solutions in areas of healthcare, financial services, and supply chain leverage this platform. Beyond that, we have also applied these patterns to bring these transparency and accountability properties to other problems within our own products and services.

To this end, in 2019, we used this platform to bring supply chain integrity to the Go language ecosystem via the Go Checksum Database. This system allows developers to have confidence that the package management systems supporting the Go ecosystem can’t intentionally, arbitrarily, or accidentally start giving out the wrong code without getting caught. The reproducibility of Go builds makes this particularly powerful as it enables the developer to ensure what is in the source repository matches what is in the package management system. This solution delivers a verifiable chain all the way from the source repositories to the final compiled artifacts.

Another example of using these patterns is our recently announced partnership with the Linux Foundation on Sigstore. This project is a response to the ever-increasing influx of supply chain attacks on the Open Source ecosystem.

Supply chain attacks have been possible because there are weaknesses at every link in the chain. Components like build systems, source code management tools, and artifact repositories all need to be treated as critical production environments, because they are. To address this, we first need to make it possible to verify provenance along the entire chain and the goal of the Sigstore effort is to enable just that.

We are now working on using these patterns and tools to enable hardware-enforced supply chain integrity for device firmware, which we hope will discourage supply chain attacks on the devices, like smartphones, that we rely on every day by bringing transparency and accountability to their firmware supply chain.

In all of the above examples, we are using these verifiable data structures to ensure the integrity of artifacts in the supply chain. This enables customers, auditors, and internal security teams to be confident that each actor in the supply chain has lived up to their responsibilities. This helps earn the trust of those that rely on the supply chain, discourages insiders from using their position as it increases the chance they will get caught, introduces accountability, and enables proving the associated systems continually meet their compliance obligations.

When using these patterns the most important task is defining what data should be logged. This is why we put together a taxonomy and modeling framework which we have found to be helpful in designing verifiability into the systems we discussed above, and which we hope you will find valuable too.
Please take a look at the transparency.dev website to learn about these verifiable data structures, and the tools and guidance we have put together to help use them in your own applications.

Measuring Security Risks in Open Source Software: Scorecards Launches V2


Contributors to the Scorecards project, an automated security tool that produces a “risk score” for open source projects, have accomplished a lot since our launch last fall. Today, in collaboration with the Open Source Security Foundation community, we are announcing Scorecards v2. We have added new security checks, scaled up the number of projects being scored, and made this data easily accessible for analysis.


With so much software today relying on open-source projects, consumers need an easy way to judge whether their dependencies are safe. Scorecards helps reduce the toil and manual effort required to continually evaluate changing packages when maintaining a project’s supply chain. Consumers can automatically assess the risks that dependencies introduce and use this data to make informed decisions about accepting these risks, evaluating alternative solutions, or working with the maintainers to make improvements.


Identifying Risks


Since last fall, Scorecards’ coverage has grown; we've added several new checks, following the Know, Prevent, Fix framework proposed by Google earlier this year, to prioritize our additions:

Malicious contributors

Contributors with malicious intent or compromised accounts can introduce potential backdoors into code. Code reviews help mitigate against such attacks. With the new Branch-Protection check, developers can verify that the project enforces mandatory code review from another developer before code is committed. Currently, this check can only be run by a repository admin due to GitHub API limitations. For a third-party repository, use the less informative Code-Review check instead.

Vulnerable code

Despite best efforts by developers and peer reviews, vulnerable code can enter source control and remain undetected. That’s why it's important to enable continuous fuzzing and static code analysis to catch bugs early in the development lifecycle. We have added checks to detect if a project uses Fuzzing and SAST tools as part of their CI/CD system.

Build system compromise

A common CI/CD solution used by GitHub projects is GitHub Actions. A danger with these action workflows is that they may handle untrusted user input: an attacker can craft a malicious pull request to gain access to the privileged GitHub token, and with it the ability to push malicious code to the repo without review. To mitigate this risk, Scorecards’ Token-Permissions prevention check now verifies that the GitHub workflows follow the principle of least privilege by making GitHub tokens read-only by default.

Bad dependencies

Any software is only as secure as its weakest dependency. This may sound obvious, but the first step to knowing our dependencies is simply to declare them... and have our dependencies declare them too. Once we have this provenance information, we can assess the risks of our software and mitigate those risks. Unfortunately, there are several widely-used anti-patterns that break this provenance principle. The first of these anti-patterns is checked-in binaries -- as there's no easy way to verify or check the contents of the binary in the project. Scorecards provides the Binary-Artifacts check for testing this.


Another anti-pattern is the use of curl | bash in scripts which dynamically pulls dependencies. Cryptographic hashes let us pin our dependencies to a known value: if this value ever changes, the build system will detect it and refuse to build. Pinning dependencies is useful everywhere we have dependencies: not just during compilation, but also in Dockerfiles, CI/CD workflows, etc. Scorecards checks for these anti-patterns with the Frozen-Deps check. This check is helpful for mitigating against malicious dependency attacks such as the recent CodeCov attack.
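
As a rough illustration of hash pinning, the following Python sketch checks downloaded bytes against a pinned SHA-256 digest before anything is executed. It is not how the Frozen-Deps check itself works; the function names and the sample script contents are hypothetical.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Return the lowercase hex SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_pinned(data: bytes, pinned_sha256: str) -> None:
    """Raise ValueError unless the data hashes to the pinned digest."""
    actual = sha256_hex(data)
    # compare_digest avoids leaking where the two strings first differ.
    if not hmac.compare_digest(actual, pinned_sha256.lower()):
        raise ValueError(
            f"hash mismatch: expected {pinned_sha256}, got {actual}; refusing to continue"
        )


# Hypothetical example: the pin is recorded once, when the script is reviewed.
reviewed_script = b"#!/bin/sh\necho installing\n"
PINNED = sha256_hex(reviewed_script)

# Later, the freshly downloaded bytes are checked before they are run.
verify_pinned(reviewed_script, PINNED)
```

If the upstream content changes in any way, the digest no longer matches and the check fails, which is the behavior a build system relies on when dependencies are pinned by hash.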


Even with hash-pinning, hashes need to be updated once in a while when dependencies patch vulnerabilities. Tools like dependabot or renovatebot give us the opportunity to review and update the hashes. The Scorecards Automated-Dependency-Update check verifies that developers rely on such tools to update their dependencies.


It is important to know about vulnerabilities in a project before adopting it as a dependency. Scorecards can provide this information via the new Vulnerabilities check, without the need to subscribe to a vulnerability alert system.


Scaling the impact


To date, the Scorecards project has scaled up to evaluate security criteria for over 50,000 open source projects. In order to scale this project, we undertook a massive redesign of our architecture and used a PubSub model which achieved horizontal scalability and higher throughput. This fully automated tool periodically evaluates critical open source projects and exposes the Scorecards check information through a public BigQuery dataset which is refreshed weekly.



This data can be retrieved using the bq command line tool. The following example shows how to export data for the Kubernetes project. Substitute the url for the repo to export data from a different project:

$ bq query --nouse_legacy_sql 'SELECT Repo, Date, Checks FROM openssf.scorecardcron.scorecard_latest WHERE Repo="github.com/kubernetes/kubernetes"'
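
For scripted use, the repo substitution can be done programmatically. The following Python sketch builds the argument list for the same command; the function name `scorecard_query_argv` and the repo-name validation pattern are assumptions made for illustration, and only the flag and table name from the command above are used.

```python
import re

# Assumed shape of a repo identifier, based on the Kubernetes example above.
_REPO_RE = re.compile(r"github\.com/[A-Za-z0-9_.-]+/[A-Za-z0-9_.-]+")


def scorecard_query_argv(repo: str) -> list:
    """Return the argv for the bq command above, with the given repo substituted."""
    # Reject anything that does not look like a plain repo path, so that
    # quotes or SQL fragments cannot be smuggled into the query string.
    if not _REPO_RE.fullmatch(repo):
        raise ValueError(f"unexpected repo format: {repo!r}")
    sql = (
        "SELECT Repo, Date, Checks "
        "FROM openssf.scorecardcron.scorecard_latest "
        f'WHERE Repo="{repo}"'
    )
    return ["bq", "query", "--nouse_legacy_sql", sql]
```

The resulting list can be passed to `subprocess.run` on a machine where the `bq` tool is installed and authenticated.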


To export the latest data on all analyzed projects, see instructions here.

How does the internet measure up?

Scorecards data for available projects is now included in the recently announced Google Open Source Insights project and is also showcased in the OpenSSF Security Metrics project. The data on these sites shows that there are still important security gaps to fill, even in widely used packages like Kubernetes.


We also analyzed Scorecards data through Google Data Studio -- one of our data analysis and visualization tools. The diagram below shows a breakdown of the checks that were run and the pass/fail outcome for the 50,000 repositories:

 



As we can see, a lot needs to be done to improve the security of these critical projects. A large number of these projects are not continuously fuzzed, do not define a security policy for reporting vulnerabilities, and do not pin dependencies, to name just a few common problems. We all need to come together as an industry to drive awareness of these widespread security risks, and to make improvements that will benefit everyone.

Scorecards in Action

Several large projects have adopted Scorecards and are keeping us updated on their experiences with it. Below are some examples of Scorecards in action:

Envoy
Early on we talked about how the Envoy maintainers adopted Scorecards for their project and integrated it within their policy on introducing new dependencies. Since then, pull requests introducing new dependencies to Envoy must get approval from a dependency maintainer who uses Scorecards to evaluate the dependency against a set of criteria.

In addition, Envoy also got right to work on improving its own security health metrics according to its own Scorecards evaluation, and is now pinning C++ dependencies and requiring pip hashes for Python dependencies. GitHub Actions are also pinned in the continuous integration flow.

Previously, Envoy had created a tool that outputs Scorecards data on its dependencies as a CSV that can be used to generate a table of results:



Now with more project data, Envoy is able to automatically generate up-to-date Scorecard information about its dependencies and publish it in documentation, like the following:


Scorecards
We improved our own score for the Scorecards! For example, we are now pinning our own dependencies by hash (e.g. docker dependencies, workflow dependencies) to prevent CodeCov style attacks. We’ve also included a Security Policy based on this recommended template.

Get involved

We look forward to continuing to grow the Scorecards community. The project now has contributions from 23 developers. Thank you to Azeem, Naveen, Laurent, Asra and Chris for their work building these new features and scaling Scorecards.

If you would like to join the fun, check out these good first timer issues.

If you would like us to help you run Scorecards on specific projects, please submit a GitHub pull request to add those projects here.

Last but not least, we have a lot of ideas and many more checks we’d like to add, but we want to hear from you. Tell us which checks you would like to see in the next version of Scorecards.


What’s next?

There are a couple of big enhancements we’re especially excited about:


Thanks again to the entire Scorecards community and the OpenSSF for making this project successful. If you’re adopting and improving the score of the projects you maintain, tell us about it. Until next time, keep on improving those scores!

Announcing a unified vulnerability schema for open source


In recent months, Google has launched several efforts to strengthen open-source security on multiple fronts. One important focus is improving how we identify and respond to known security vulnerabilities without doing extensive manual work. It is essential to have a precise common data format to triage and remediate security vulnerabilities, particularly when communicating about risks to affected dependencies—it enables easier automation and empowers consumers of open-source software to know when they are impacted and make security fixes as soon as possible.

We released the Open Source Vulnerabilities (OSV) database in February with the goal of automating and improving vulnerability triage for developers and users of open source software. This initial effort was bootstrapped with a dataset of a few thousand vulnerabilities from the OSS-Fuzz project. Implementing OSV to communicate precise vulnerability data for hundreds of critical open-source projects proved the success and utility of the format, and garnered feedback to help us improve the project; for example, we dropped the Cloud API key requirement, making the database even easier to access by more users. The community response also showed that there was broad interest in extending the effort further.

Today, we’re excited to announce a new milestone in expanding OSV to several key open-source ecosystems: Go, Rust, Python, and DWF. This expansion unites and aggregates four important vulnerability databases, giving software developers a better way to track and remediate the security issues that affect them. Our effort also aligns with the recent US Executive Order on Improving the Nation’s Cybersecurity, which emphasized the need to remove barriers to sharing threat information in order to strengthen national infrastructure. This expanded shared vulnerability database marks an important step toward creating a more secure open-source environment for all users.

A simple, unified schema for describing vulnerabilities precisely

As with open source development, vulnerability databases in open source follow a distributed model, with many ecosystems and organizations creating their own database. Since each uses their own format to describe vulnerabilities, a client tracking vulnerabilities across multiple databases must handle each completely separately. Sharing of vulnerabilities between databases is also difficult.

The Google Open Source Security team, Go team, and the broader open-source community have been developing a simple vulnerability interchange schema for describing vulnerabilities that’s designed from the beginning for open-source ecosystems. After starting work on the schema a few months ago, we requested public feedback and received hundreds of comments. We have incorporated the input from readers to arrive at the current schema:

{
        "id": string,
        "modified": string,
        "published": string,
        "withdrawn": string,
        "aliases": [ string ],
        "related": [ string ],
        "package": {
                "ecosystem": string,
                "name": string,
                "purl": string,
        },
        "summary": string,
        "details": string,
        "affects": [ {
                "ranges": [ {
                        "type": string,
                        "repo": string,
                        "introduced": string,
                        "fixed": string
                } ],
                "versions": [ string ]
        } ],
        "references": [ {
                "type": string,
                "url": string
        } ],
        "ecosystem_specific": { see spec },
        "database_specific": { see spec },
}



This new vulnerability schema aims to address some key problems with managing vulnerabilities in open source. We found that there was no existing standard format which:

  • Enforces version specification that precisely matches naming and versioning schemes used in actual open source package ecosystems. For instance, matching a vulnerability such as a CVE to a package name and set of versions in a package manager is difficult to do in an automated way using existing mechanisms such as CPEs.
  • Can be used to describe vulnerabilities in any open source ecosystem, while not requiring ecosystem-dependent logic to process them.
  • Is easy to use by both automated systems and humans.

With this schema we hope to define a format that all vulnerability databases can export. A unified format means that vulnerability databases, open source users, and security researchers can easily share tooling and consume vulnerabilities across all of open source. This means a more complete view of vulnerabilities in open source for everyone, as well as faster detection and remediation times resulting from easier automation.
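
To illustrate how a client might consume a record in this shape without ecosystem-specific logic, here is a minimal Python sketch. The record below is entirely hypothetical (the ID, package name, and versions are made up), the `SEMVER` range type value is an assumption, and the version comparison is a deliberately simplified dotted-integer comparison rather than the matching rules defined by the spec.

```python
def parse_version(version: str) -> tuple:
    """Parse a dotted numeric version such as '1.4.2' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def is_affected(record: dict, version: str) -> bool:
    """Return True if the version is listed explicitly or falls in a SEMVER range."""
    parsed = parse_version(version)
    for entry in record.get("affects", []):
        # An explicit enumeration of affected versions is the simplest case.
        if version in entry.get("versions", []):
            return True
        for rng in entry.get("ranges", []):
            if rng.get("type") != "SEMVER":
                continue  # Other range types are out of scope for this sketch.
            introduced = rng.get("introduced")
            fixed = rng.get("fixed")
            if introduced is not None and parsed < parse_version(introduced):
                continue
            if fixed is not None and parsed >= parse_version(fixed):
                continue
            return True
    return False


# Hypothetical record using the field names from the schema above.
example_record = {
    "id": "EXAMPLE-2021-0001",
    "package": {"ecosystem": "PyPI", "name": "examplepkg"},
    "summary": "Illustrative entry only",
    "affects": [
        {
            "ranges": [{"type": "SEMVER", "introduced": "1.2.0", "fixed": "1.4.2"}],
            "versions": ["1.2.0", "1.3.0", "1.4.1"],
        }
    ],
}
```

In this sketch `is_affected(example_record, "1.3.5")` is true because the version falls between `introduced` and `fixed`, even though it is not in the explicit `versions` list.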

The current state


The vulnerability schema spec has gone through several iterations, and we are inviting further feedback as it gets closer to finalized. A number of public vulnerability databases today are already exporting this format, with more in the pipeline:

The OSV service has also aggregated all of these vulnerability databases, which are viewable at our web UI. They can also be queried with a single command via the same existing APIs:

  curl -X POST -d \
      '{"commit": "a46c08c533cfdf10260e74e2c03fa84a13b6c456"}' \
      "https://api.osv.dev/v1/query"

  curl -X POST -d \
      '{"version": "2.4.1", "package": {"name": "jinja2", "ecosystem": "PyPI"}}' \
      "https://api.osv.dev/v1/query"

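The same queries can be issued from a program. The following Python sketch builds the equivalent POST request using only the standard library; the helper names `build_query` and `query_osv` are hypothetical. The response format is not described in this post, so the sketch simply returns the decoded JSON.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the JSON payload, without sending it."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        OSV_QUERY_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_osv(payload: dict, timeout: float = 10.0):
    """Send the query and return the decoded JSON response (requires network access)."""
    with urllib.request.urlopen(build_query(payload), timeout=timeout) as resp:
        return json.load(resp)


# The two payloads from the curl commands above.
by_commit = {"commit": "a46c08c533cfdf10260e74e2c03fa84a13b6c456"}
by_version = {"version": "2.4.1", "package": {"name": "jinja2", "ecosystem": "PyPI"}}
```

Calling `query_osv(by_version)` would perform the second query; separating request construction from sending makes the payload easy to inspect or test offline.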


Automating vulnerability database maintenance


Producing quality vulnerability data is also difficult. In addition to OSV’s existing automation, we built more automation tools for vulnerability database maintenance, and used these tools to bootstrap the community Python advisory database. This automation takes existing feeds, accurately matches them to packages, and generates entries containing precise, validated version ranges with minimal human intervention. We plan to extend this tooling to other ecosystems for which there is no existing vulnerability database, or little support for ongoing database maintenance.


Get involved


Thank you to all the open source developers who have provided feedback and adopted this format. We’re continuing to work with open source communities to develop this further and earn more widespread adoption in all ecosystems. If you are interested in adopting this format, we’d appreciate any feedback on our public spec.

Get ready for the 2021 Google CTF



Are you ready for no sleep, no chill and a lot of hacking? Our annual Google CTF is back!


The competition kicks off on Saturday, July 17 at 00:00:01 UTC and runs through Sunday, July 18 at 23:59:59 UTC. Teams can register at http://goo.gle/ctf.


Just like last year, the top 16 teams will qualify for our Hackceler8 speed run and the chance to take home a total of $30,301.70 in prize money.



As we reminisce about last year’s event, we’d be remiss if we didn’t recognize our 2020 winning teams:


  • Plaid Parliament of Pwning
  • I Use Bing
  • pasten
  • The Flat Network Society

We are eager to see if they can defend their leet status. For those interested, we have published all 2020 Hackceler8 videos for your viewing pleasure here.


Whether you’re a seasoned CTF player or just curious about cyber security and ethical hacking, we want you to join us. Sign up to learn skills, meet new friends in the security community and even watch the pros in action. For the latest announcements, see g.co/ctf, subscribe to our mailing list or follow us on @GoogleVRP. See you there!


P.S. Curious about last year’s Google CTF challenges? We open-sourced them here.

Introducing SLSA, an End-to-End Framework for Supply Chain Integrity



Supply chain integrity attacks—unauthorized modifications to software packages—have been on the rise in the past two years, and are proving to be common and reliable attack vectors that affect all consumers of software. The software development and deployment supply chain is quite complicated, with numerous threats along the source ➞ build ➞ publish workflow. While point solutions do exist for some specific vulnerabilities, there is no comprehensive end-to-end framework that both defines how to mitigate threats across the software supply chain, and provides reasonable security guarantees. There is an urgent need for a solution in the face of the eye-opening, multi-billion dollar attacks in recent months (e.g. SolarWinds, Codecov), some of which could have been prevented or made more difficult had such a framework been adopted by software developers and consumers.


Our proposed solution is Supply chain Levels for Software Artifacts (SLSA, pronounced “salsa”), an end-to-end framework for ensuring the integrity of software artifacts throughout the software supply chain. It is inspired by Google’s internal “Binary Authorization for Borg” which has been in use for the past 8+ years and is mandatory for all of Google's production workloads. The goal of SLSA is to improve the state of the industry, particularly open source, to defend against the most pressing integrity threats. With SLSA, consumers can make informed choices about the security posture of the software they consume.

How SLSA helps

SLSA helps to protect against common supply chain attacks. The following image illustrates a typical software supply chain and includes examples of attacks that can occur at every link in the chain. Each type of attack has occurred over the past several years and, unfortunately, is increasing as time goes on.




|   | Threat | Known example | How SLSA could have helped |
|---|--------|---------------|----------------------------|
| A | Submit bad code to the source repository | Linux hypocrite commits: Researcher attempted to intentionally introduce vulnerabilities into the Linux kernel via patches on the mailing list. | Two-person review caught most, but not all, of the vulnerabilities. |
| B | Compromise source control platform | PHP: Attacker compromised PHP’s self-hosted git server and injected two malicious commits. | A better-protected source code platform would have been a much harder target for the attackers. |
| C | Build with official process but from code not matching source control | Webmin: Attacker modified the build infrastructure to use source files not matching source control. | A SLSA-compliant build server would have produced provenance identifying the actual sources used, allowing consumers to detect such tampering. |
| D | Compromise build platform | SolarWinds: Attacker compromised the build platform and installed an implant that injected malicious behavior during each build. | Higher SLSA levels require stronger security controls for the build platform, making it more difficult to compromise and gain persistence. |
| E | Use bad dependency (i.e. A-H, recursively) | event-stream: Attacker added an innocuous dependency and then updated the dependency to add malicious behavior. The update did not match the code submitted to GitHub (i.e. attack F). | Applying SLSA recursively to all dependencies would have prevented this particular vector, because the provenance would have indicated that it either wasn’t built from a proper builder or that the source did not come from GitHub. |
| F | Upload an artifact that was not built by the CI/CD system | CodeCov: Attacker used leaked credentials to upload a malicious artifact to a GCS bucket, from which users download directly. | Provenance of the artifact in the GCS bucket would have shown that the artifact was not built in the expected manner from the expected source repo. |
| G | Compromise package repository | Attacks on Package Mirrors: Researcher ran mirrors for several popular package repositories, which could have been used to serve malicious packages. | Similar to above (F), provenance of the malicious artifacts would have shown that they were not built as expected or from the expected source repo. |
| H | Trick consumer into using bad package | Browserify typosquatting: Attacker uploaded a malicious package with a similar name as the original. | SLSA does not directly address this threat, but provenance linking back to source control can enable and enhance other solutions. |


What is SLSA

In its current state, SLSA is a set of incrementally adoptable security guidelines being established by industry consensus. In its final form, SLSA will differ from a list of best practices in its enforceability: it will support the automatic creation of auditable metadata that can be fed into policy engines to give "SLSA certification" to a particular package or build platform. SLSA is designed to be incremental and actionable, and to provide security benefits at every step. Once an artifact qualifies at the highest level, consumers can have confidence that it has not been tampered with and can be securely traced back to source—something that is difficult, if not impossible, to do with most software today.

SLSA consists of four levels, with SLSA 4 representing the ideal end state. The lower levels represent incremental milestones with corresponding incremental integrity guarantees. The requirements are currently defined as follows.



SLSA 1 requires that the build process be fully scripted/automated and generate provenance. Provenance is metadata about how an artifact was built, including the build process, top-level source, and dependencies. Knowing the provenance allows software consumers to make risk-based security decisions. Though provenance at SLSA 1 does not protect against tampering, it offers a basic level of code source identification and may aid in vulnerability management.


SLSA 2 requires using version control and a hosted build service that generates authenticated provenance. These additional requirements give the consumer greater confidence in the origin of the software. At this level, the provenance prevents tampering to the extent that the build service is trusted. SLSA 2 also provides an easy upgrade path to SLSA 3.


SLSA 3 further requires that the source and build platforms meet specific standards to guarantee the auditability of the source and the integrity of the provenance, respectively. We envision an accreditation process whereby auditors certify that platforms meet the requirements, which consumers can then rely on. SLSA 3 provides much stronger protections against tampering than earlier levels by preventing specific classes of threats, such as cross-build contamination.


SLSA 4 is currently the highest level, requiring two-person review of all changes and a hermetic, reproducible build process. Two-person review is an industry best practice for catching mistakes and deterring bad behavior. Hermetic builds guarantee that the provenance’s list of dependencies is complete. Reproducible builds, though not strictly required, provide many auditability and reliability benefits. Overall, SLSA 4 gives the consumer a high degree of confidence that the software has not been tampered with.


More details on these proposed levels can be found in the GitHub repository, including the corresponding Source and Build/Provenance requirements. We are open to feedback and suggestions for changes on these requirements.
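
To give a feel for what provenance metadata and a consumer-side policy check might look like, here is a minimal Python sketch. The field names, function names, and example values are all hypothetical and do not follow the SLSA provenance format defined in the GitHub repository. The record is also unsigned, so, as with SLSA 1, it offers no protection against tampering on its own.

```python
import hashlib


def make_provenance(artifact: bytes, builder_id: str, source_uri: str,
                    source_digest: str, dependencies) -> dict:
    """Describe how an artifact was built: builder, top-level source, dependencies."""
    return {
        "artifact_sha256": hashlib.sha256(artifact).hexdigest(),
        "builder": builder_id,
        "source": {"uri": source_uri, "digest": source_digest},
        "dependencies": sorted(dependencies),
    }


def check_provenance(artifact: bytes, provenance: dict,
                     trusted_builders, expected_source_uri: str) -> list:
    """Return a list of policy violations; an empty list means the checks passed."""
    problems = []
    if hashlib.sha256(artifact).hexdigest() != provenance["artifact_sha256"]:
        problems.append("artifact digest does not match provenance")
    if provenance["builder"] not in trusted_builders:
        problems.append("built by an untrusted builder")
    if provenance["source"]["uri"] != expected_source_uri:
        problems.append("built from an unexpected source repository")
    return problems


# Hypothetical values for illustration only.
artifact = b"example build output"
provenance = make_provenance(
    artifact,
    builder_id="ci.example/builder",
    source_uri="git+https://example.com/org/repo",
    source_digest="0123abcd",
    dependencies=["libb", "liba"],
)
```

A policy engine in this style would reject an artifact whose digest, builder, or source repository differs from what the consumer expects, which is the kind of check described for threats E through G in the table above.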

Proof of Concept

Today, we are releasing a proof of concept for SLSA 1 provenance generator (repo, marketplace). This will allow a user to create and upload provenance alongside their build artifacts, thereby achieving SLSA 1. To use it, add the following snippet to your workflow:

- name: Generate provenance
  uses: slsa-framework/github-actions-demo@v0.1
  with:
    artifact_path: <path-to-artifact/directory>


Going forward, we plan to work with popular source, build, and packaging platforms to make it as easy as possible to reach higher levels of SLSA. These plans include generating provenance automatically in build systems, propagating provenance natively in package repositories, and adding security features across the major platforms. Our long-term goal is to raise the security bar across the industry so that the default expectation is higher-level SLSA security standards, with minimal effort on the part of software producers.
 
Summary

SLSA is a practical framework for end-to-end software supply chain integrity, based on a model proven to work at scale in one of the world’s largest software engineering organizations. Achieving the highest level of SLSA for most projects may be difficult, but incremental improvements recognized by lower SLSA levels will already go a long way toward improving the security of the open source ecosystem.

We look forward to working with the community on refining the levels as we begin adopting SLSA for our own open source projects. If you are a project maintainer and interested in trying to adopt and provide feedback on SLSA, please reach out or come join the discussions taking place in the OpenSSF Digital Identity Attestation Working Group.

Check out the Know, Prevent, Fix post to read more about Google’s overall approach to open source security.

Rust/C++ interop in the Android Platform

One of the main challenges of evaluating Rust for use within the Android platform was ensuring we could provide sufficient interoperability with our existing codebase. If Rust is to meet its goals of improving security, stability, and quality Android-wide, we need to be able to use Rust anywhere in the codebase that native code is required. To accomplish this, we need to provide the majority of functionality platform developers use. As we discussed previously, we have too much C++ to consider ignoring it, rewriting all of it is infeasible, and rewriting older code would likely be counterproductive as the bugs in that code have largely been fixed. This means interoperability is the most practical way forward.

Before introducing Rust into the Android Open Source Project (AOSP), we needed to demonstrate that Rust interoperability with C and C++ is sufficient for practical, convenient, and safe use within Android. Adding a new language has costs; we needed to demonstrate that Rust would be able to scale across the codebase and meet its potential in order to justify those costs. This post will cover the analysis we did more than a year ago while we evaluated Rust for use in Android. We also present a follow-up analysis with some insights into how the original analysis has held up as Android projects have adopted Rust.

Language interoperability in Android

Existing language interoperability in Android focuses on well defined foreign-function interface (FFI) boundaries, which is where code written in one programming language calls into code written in a different language. Rust support will likewise focus on the FFI boundary as this is consistent with how AOSP projects are developed, how code is shared, and how dependencies are managed. For Rust interoperability with C, the C application binary interface (ABI) is already sufficient.

Interoperability with C++ is more challenging and is the focus of this post. While both Rust and C++ support using the C ABI, it is not sufficient for idiomatic usage of either language. Simply enumerating the features of each language results in an unsurprising conclusion: many concepts are not easily translatable, nor do we necessarily want them to be. After all, we’re introducing Rust because many features and characteristics of C++ make it difficult to write safe and correct code. Therefore, our goal is not to consider all language features, but rather to analyze how Android uses C++ and ensure that interop is convenient for the vast majority of our use cases.

We analyzed code and interfaces in the Android platform specifically, not codebases in general. While this means our specific conclusions may not be accurate for other codebases, we hope the methodology can help others to make a more informed decision about introducing Rust into their large codebase. Our colleagues on the Chrome browser team have done a similar analysis, which you can find here.

This analysis was not originally intended to be published outside of Google: our goal was to make a data-driven decision on whether or not Rust was a good choice for systems development in Android. While the analysis is intended to be accurate and actionable, it was never intended to be comprehensive, and we’ve pointed out a couple of areas where it could be more complete. However, we also note that initial investigations into these areas showed that they would not significantly impact the results, which is why we decided to not invest the additional effort.

Methodology

Exported functions from Rust and C++ libraries are where we consider interop to be essential. Our goals are simple:

  • Rust must be able to call functions from C++ libraries and vice versa.
  • FFI should require a minimum of boilerplate.
  • FFI should not require deep expertise.

While making Rust functions callable from C++ is a goal, this analysis focuses on making C++ functions available to Rust so that new Rust code can be added while taking advantage of existing implementations in C++. To that end, we look at exported C++ functions and consider existing and planned compatibility with Rust via the C ABI and compatibility libraries. Types are extracted by running objdump on shared libraries to find external C++ functions they use [1] and running c++filt to parse the C++ types. This gives functions and their arguments. It does not consider return values, but a preliminary analysis [2] of those revealed that they would not significantly affect the results.
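To make the extraction step concrete, here is a minimal sketch of how argument types could be pulled out of a signature once c++filt has demangled it. It is not the script used for the analysis: the helper name argument_types and the example signatures are hypothetical, and the parser assumes a well-formed demangled signature without operator names such as operator<.

```python
def argument_types(signature: str) -> list[str]:
    """Return the top-level argument types of a demangled C++ signature."""
    close = signature.rfind(")")
    if close == -1:
        return []
    # Walk backwards from the last ")" to find its matching "(".
    depth = 0
    open_index = -1
    for i in range(close, -1, -1):
        ch = signature[i]
        if ch == ")":
            depth += 1
        elif ch == "(":
            depth -= 1
            if depth == 0:
                open_index = i
                break
    if open_index == -1:
        return []
    inner = signature[open_index + 1:close]
    # Split on commas that are not nested inside <...> or (...).
    args, current, nesting = [], [], 0
    for ch in inner:
        if ch in "<(":
            nesting += 1
        elif ch in ">)":
            nesting -= 1
        if ch == "," and nesting == 0:
            args.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    last = "".join(current).strip()
    if last:
        args.append(last)
    return args


# Hypothetical demangled signature, as c++filt might print it.
print(argument_types(
    "example::Load(std::vector<int, std::allocator<int> > const&, bool)"
))
```

Splitting only on top-level commas matters because template arguments and function-pointer parameters contain commas of their own.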

We then classify each of these types into one of the following buckets:

Supported by bindgen

These are generally simple types involving primitives (including pointers and references to them). For these types, Rust’s existing FFI will handle them correctly, and Android’s build system will auto-generate the bindings.

Supported by cxx compat crate

These are handled by the cxx crate. This currently includes std::string, std::vector, and C++ methods (including pointers/references to these types). Users simply have to define the types and functions they want to share across languages and cxx will generate the code to do that safely.

Native support

These types are not directly supported, but the interfaces that use them have been manually reworked to add Rust support. Specifically, this includes types used by AIDL and protobufs.

We have also implemented a native interface for StatsD as the existing C++ interface relies on method overloading, which is not well supported by bindgen and cxx [3]. Usage of this system does not show up in the analysis because the C++ API does not use any unique types.

Potential addition to cxx

This bucket currently consists of common data structures such as std::optional and std::chrono::duration, as well as custom string and vector implementations.

These can either be supported natively by a future contribution to cxx, or by using its ExternType facilities. We have only included types in this category that we believe are relatively straightforward to implement and have a reasonable chance of being accepted into the cxx project.

We don't need/intend to support

Some types are exposed in today’s C++ APIs that are either an implicit part of the API, not an API we expect to want to use from Rust, or are language specific. Examples of types we do not intend to support include:

  • Mutexes - we expect that locking will take place in one language or the other, rather than needing to pass mutexes between languages, as per our coarse-grained philosophy.
  • native_handle - this is a JNI interface type, so it is inappropriate for use in Rust/C++ communication.
  • std::locale& - Android uses a separate locale system from C++ locales. This type primarily appears in output due to e.g., cout usage, which would be inappropriate to use in Rust.

Overall, this category represents types that we do not believe a Rust developer should be using.

HIDL

Android is in the process of deprecating HIDL and migrating to AIDL for HALs for new services. We’re also migrating some existing implementations to stable AIDL. Our current plan is to not support HIDL, preferring to migrate to stable AIDL instead. These types thus currently fall into the “We don't need/intend to support” bucket above, but we break them out to be more specific. If there is sufficient demand for HIDL support, we may revisit this decision later.

Other

This contains all types that do not fit into any of the above buckets. It is currently mostly std::string being passed by value, which is not supported by cxx.
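The bucketing described above can be approximated with a few string rules. The sketch below is a rough illustration rather than the classifier used for the analysis: the function names, the primitive list, and the "android::hardware::hidl_" prefix test are assumptions, and the "Native support" bucket (AIDL and protobuf types) is left out because it depends on knowing which interfaces were reworked by hand.

```python
import re

# Assumed, non-exhaustive list of primitive type names.
PRIMITIVES = {"bool", "char", "int", "unsigned int", "long",
              "unsigned long", "float", "double", "void"}
# Examples the post names as types it does not intend to support.
UNSUPPORTED = {"std::mutex", "native_handle", "std::locale"}


def strip_indirection(type_name: str) -> str:
    """Drop const qualifiers, pointers, and references from a type name."""
    core = re.sub(r"\bconst\b", " ", type_name)
    core = core.replace("*", " ").replace("&", " ")
    return " ".join(core.split())


def classify(type_name: str) -> str:
    """Assign one argument type to a bucket name used in this post."""
    core = strip_indirection(type_name)
    by_value = not type_name.strip().endswith(("*", "&"))
    if core in PRIMITIVES:
        return "Supported by bindgen"
    if core == "std::string":
        # std::string by value is called out as unsupported by cxx.
        return "Other" if by_value else "Supported by cxx compat crate"
    if core.startswith("std::vector<"):
        return "Supported by cxx compat crate"
    if core.startswith(("std::optional<", "std::chrono::duration<")):
        return "Potential addition to cxx"
    if core in UNSUPPORTED:
        return "We don't need/intend to support"
    if core.startswith("android::hardware::hidl_"):  # assumed HIDL marker
        return "HIDL"
    return "Other"


for name in ["int const*", "std::string const&", "std::string"]:
    print(name, "->", classify(name))
```

Counting the bucket names over every extracted argument type would give percentages comparable in spirit to the ones reported below, though the real analysis may have drawn the boundaries differently.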

Top C++ libraries

One of the primary reasons for supporting interop is to allow reuse of existing code. With this in mind, we determined the most commonly used C++ libraries in Android: liblog, libbase, libutils, libcutils, libhidlbase, libbinder, libhardware, libz, libcrypto, and libui. We then analyzed all of the external C++ functions used by these libraries and their arguments to determine how well they would interoperate with Rust.

Overall, 81% of types are in the first three categories (which we currently fully support) and 87% are in the first four categories (which includes those we believe we can easily support). Almost all of the remaining types are those we believe we do not need to support.

Mainline modules

In addition to analyzing popular C++ libraries, we also examined Mainline modules. Supporting this context is critical as Android is migrating some of its core functionality to Mainline, including much of the native code we hope to augment with Rust. Additionally, their modularity presents an opportunity for interop support.

We analyzed 64 binaries and libraries in 21 modules. For each analyzed library, we examined the C++ functions it uses and analyzed the types of their arguments to determine how well they would interoperate with Rust, in the same way we did above for the top 10 libraries.

Here 88% of types are in the first three categories and 90% in the first four, with almost all of the remaining being types we do not need to handle.

Analysis of Rust/C++ Interop in AOSP

With almost a year of Rust development in AOSP behind us, and more than a hundred thousand lines of code written in Rust, we can now examine how our original analysis has held up based on how C/C++ code is currently called from Rust in AOSP [4].

The results largely match what we expected from our analysis, with bindgen handling the majority of interop needs. The primary difference between our original analysis and actual Rust usage is in the “Native Support” category, driven by extensive use of AIDL by the new Keystore2 service.

A few current examples of interop are:

  • Cxx in Bluetooth - While Rust is intended to be the primary language for Bluetooth, migrating from the existing C/C++ implementation will happen in stages. Using cxx allows the Bluetooth team to more easily serve legacy protocols like HIDL until they are phased out by using the existing C++ support to incrementally migrate their service.
  • AIDL in keystore - Keystore implements AIDL services and interacts with apps and other services over AIDL. Providing this functionality would be difficult to support with tools like cxx or bindgen, but the native AIDL support is simple and ergonomic to use.
  • Manually-written wrappers in profcollectd - While our goal is to provide seamless interop for most use cases, we also want to demonstrate that, even when auto-generated interop solutions are not an option, manually creating them can be simple and straightforward. Profcollectd is a small daemon that only exists on non-production engineering builds. Instead of using cxx it uses some small manually-written C wrappers around C++ libraries that it then passes to bindgen.

Conclusion

Bindgen and cxx provide the vast majority of Rust/C++ interoperability needed by Android. For some of the exceptions, such as AIDL, the native version provides convenient interop between Rust and other languages. Manually written wrappers can be used to handle the few remaining types and functions not supported by other options as well as to create ergonomic Rust APIs. Overall, we believe interoperability between Rust and C++ is already largely sufficient for convenient use of Rust within Android.

If you are considering how Rust could integrate into your C++ project, we recommend doing a similar analysis of your codebase. When addressing interop gaps, we recommend that you consider upstreaming support to existing compat libraries like cxx.

Acknowledgements

Our first attempt at quantifying Rust/C++ interop involved analyzing the potential mismatches between the languages. This led to a lot of interesting information, but was difficult to draw actionable conclusions from. Rather than enumerating all the potential places where interop could occur, Stephen Hines suggested that we instead consider how code is currently shared between C/C++ projects as a reasonable proxy for where we’ll also likely want interop for Rust. This provided us with actionable information that was straightforward to prioritize and implement. Looking back, the data from our real-world Rust usage has reinforced that the initial methodology was sound. Thanks Stephen!

Also, thanks to:

  • Andrei Homescu and Stephen Crane for contributing AIDL support to AOSP.
  • Ivan Lozano for contributing protobuf support to AOSP.
  • David Tolnay for publishing cxx and accepting our contributions.
  • The many authors and contributors to bindgen.
  • Jeff Vander Stoep and Adrian Taylor for contributions to this post.


  1. We used undefined symbols of function type as reported by objdump to perform this analysis. This means that any header-only functions will be absent from our analysis, and internal (non-API) functions which are called by header-only functions may appear in it. 

  2. We extracted return values by parsing DWARF symbols, which give the return types of functions. 

  3. Even without automated binding generation, manually implementing the bindings is straightforward. 

  4. In the case of handwritten C/C++ wrappers, we analyzed the functions they call, not the wrappers themselves. For all uses of our native AIDL library, we analyzed the types used in the C++ version of the library. 

Verifiable Supply Chain Metadata for Tekton


If you've been paying attention to the news at all lately, you've probably noticed that software supply chain attacks are rapidly becoming a big problem. Whether you're trying to prevent these attacks, responding to an ongoing one or recovering from one, you understand that knowing what is happening in your CI/CD pipeline is critical.

Fortunately, the Kubernetes-native Tekton project – an open-source framework for creating CI/CD systems – was designed with security in mind from Day One, and the new Tekton Chains project is here to help take it to the next level. Tekton Chains securely captures metadata for CI/CD pipeline executions. We made two really important design decisions early on in Tekton that make supply chain security easy: declarative pipeline definitions and explicit state transitions. This next section will explain what these mean in practice and how they make it easy to build a secure delivery pipeline.


Definitions or “boxes and arrows”
Just like everything in your high school physics class, a CI/CD pipeline can be modeled as a series of boxes. Each box has some inputs, some outputs, and some steps that happen in the middle. Even if you have one big complicated bash script that fetches dependencies, builds programs, runs tests, downloads the internet and deploys to production, you can draw boxes and arrows to represent this flow. The boxes might be really big, but you can do it.

Since the initial whiteboard sketches, the Pipeline and Task CRDs in Tekton were designed to allow users to define each step of their pipeline at a granular level. These types include support for mandatory declared inputs, outputs, and build environments. This means you can track exactly what sources went into a build, what tools were used during the build itself and what artifacts came out at the end. By breaking up a large monolithic pipeline into a series of smaller, reusable steps, you can increase visibility into the overall system. This makes it easier to understand your exposure to supply chain attacks, detect issues when they do happen and recover from them after.


Explicit transitions
After a pipeline is defined, there are two main approaches to orchestrating it: level-triggered and edge-triggered. Like most of the Kubernetes ecosystem, Tekton is designed to operate in a level-triggered fashion. This means steps are executed explicitly by a central orchestrator, which runs one task, waits for completion, then decides what to do next. In edge-triggered systems, a pipeline definition is translated into a set of events and listeners. Each step fires off events when it completes, and these events are then picked up by listeners, which run the next set of steps.

Event-based or edge-triggered systems are easy to reason about, but can be tricky to manage at scale. They also make it much harder to track an artifact as it flows through the entire system. Each step in the pipeline only knows about the one immediately before it; no step is responsible for tracking the entire execution. This can become problematic when you try to understand the security posture of your delivery pipeline.

Tekton was designed with the opposite approach in mind: level-triggered. Instead of a Rube Goldberg machine tied together with duct tape and clothespins, Tekton is more like an explicit assembly line. Level-triggered systems like Tekton are moved from state to state in a calculated manner by a central orchestrator. They require more explicit design up front, but they are easier to observe and reason about afterward. That observability makes supply chains built on systems like Tekton easier to secure.
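As a toy illustration of the central-orchestrator idea (not Tekton's implementation; the Task and Orchestrator classes and the task names are made up for this sketch), a single loop can run each task, wait for it, and record its inputs and outputs in one place:

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Task:
    name: str
    run: Callable[[dict], dict]  # takes the current state, returns outputs


@dataclass
class Orchestrator:
    log: list = field(default_factory=list)

    def execute(self, tasks: list, inputs: dict) -> dict:
        """Run tasks one at a time, recording every state transition."""
        state = dict(inputs)
        for task in tasks:
            outputs = task.run(state)  # run one task and wait for it
            self.log.append({
                "task": task.name,
                "inputs": dict(state),
                "outputs": dict(outputs),
            })
            state.update(outputs)  # then decide what runs next
        return state


pipeline = [
    Task("fetch", lambda s: {"source": "rev-" + s["revision"]}),
    Task("build", lambda s: {"image": "built-from-" + s["source"]}),
]
orchestrator = Orchestrator()
final_state = orchestrator.execute(pipeline, {"revision": "abc"})
print(final_state["image"])
print([entry["task"] for entry in orchestrator.log])
```

Because every transition passes through one loop, the log holds the complete history of the run, which is the property an edge-triggered chain of listeners makes hard to reconstruct.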


Secure delivery pipeline through chains and provenance
So how do these two design decisions combine to make supply chain security easier? Enter Tekton Chains.

By observing the execution of a Task or a Pipeline and paying careful attention to the inputs, outputs, and steps along the way, we can make it easier to track down what happened and why later on. This "observer" can be run in a separate trust domain and cryptographically sign all of this captured metadata as it's stored, leaving a tamper-evident activity ledger. This technique is called "verifiable builds." This securely generated metadata can be used in a number of ways, from audit logging to recovering from security breaches to pre-deployment policy enforcement.
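The sign-then-verify idea can be sketched with the Python standard library. This is only an illustration: it uses an HMAC (a shared-secret construction) as a stand-in for the asymmetric signature systems Chains actually plugs into, and the record fields and key are invented for the example.

```python
import hashlib
import hmac
import json


def sign_record(record: dict, key: bytes) -> str:
    """Return a hex MAC over a canonical JSON encoding of the record."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_record(record: dict, signature: str, key: bytes) -> bool:
    """Check that the record still matches the signature it was stored with."""
    return hmac.compare_digest(sign_record(record, key), signature)


# Hypothetical metadata captured by an observer in a separate trust domain.
key = b"observer-only-secret"
record = {
    "task": "build",
    "inputs": {"source": "rev-abc"},
    "outputs": {"image": "example-image"},
}
signature = sign_record(record, key)

print(verify_record(record, signature, key))  # unchanged record verifies

tampered = dict(record, outputs={"image": "something-else"})
print(verify_record(tampered, signature, key))  # edited record does not
```

Serializing with sorted keys keeps the signed bytes stable, so the same record always produces the same signature regardless of dictionary ordering.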

You can install Chains into any Tekton-enabled cluster and configure it to generate this cryptographically signed supply chain metadata for your builds. Chains supports pluggable signature systems like PGP, x509, and cloud KMS services. Payloads can be generated in a few different industry-standard formats like the Red Hat Simple Signing and the In-Toto Provenance specifications. The full documentation is available here, but you can get started quickly with something like this:


For this tutorial, you’ll need access to a GKE Kubernetes cluster and a GCR registry with push credentials. The cluster should already have Tekton Pipelines installed.


Install Tekton Chains into your cluster:

$ kubectl apply --filename https://storage.googleapis.com/tekton-releases/chains/latest/release.yaml



Next, you’ll set up registry authentication for the Tekton Chains controller, so that it can push OCI image signatures to your registry. To set up authentication, you’ll create a Service Account and download credentials:

$ export PROJECT_ID=<GCP Project ID>

$ gcloud iam service-accounts create tekton-chains

$ gcloud iam service-accounts keys create credentials.json --iam-account=tekton-chains@${PROJECT_ID}.iam.gserviceaccount.com



Now, create a Kubernetes Secret from your credentials file so the Chains controller can access it:

$ kubectl create secret docker-registry registry-credentials \
  --docker-server=gcr.io \
  --docker-username=_json_key \
  --docker-email=<your email address> \
  --docker-password="$(cat credentials.json)" \
  -n tekton-chains

$ kubectl patch serviceaccount tekton-chains-controller \
  -p "{\"imagePullSecrets\": [{\"name\": \"registry-credentials\"}]}" -n tekton-chains



We can use cosign to generate a keypair as a Kubernetes secret, which the Chains controller will use for signing. Cosign will ask for a password, which will be stored in the secret:

$ cosign generate-key-pair -k8s tekton-chains/signing-secrets


Next, you’ll need to set up authentication to your GCR registry for the kaniko task as another Kubernetes Secret.

$ export CREDENTIALS_SECRET=kaniko-credentials

$ kubectl create secret generic $CREDENTIALS_SECRET --from-file credentials.json



Now, we’ll create a kaniko-chains task which will build and push a container image to your registry. Tekton Chains will recognize that an image has been built, and sign it automatically.

$ kubectl apply -f https://raw.githubusercontent.com/tektoncd/chains/main/examples/kaniko/gcp/kaniko.yaml

$ cat <<EOF | kubectl apply -f -
apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  name: kaniko-run
spec:
  taskRef:
    name: kaniko-gcp
  params:
  - name: IMAGE
    value: gcr.io/${PROJECT_ID}/kaniko-chains
  workspaces:
  - name: source
    emptyDir: {}
  - name: credentials
    secret:
      secretName: ${CREDENTIALS_SECRET}
EOF



Wait for the TaskRun to complete, and give the Tekton Chains controller a few seconds to sign the image and store the signature. You should be able to verify the signature with cosign and your public key:

$ cosign verify -key cosign.pub gcr.io/${PROJECT_ID}/kaniko-chains


Congratulations! You’ve successfully signed and verified an OCI image with Tekton Chains and cosign.


What's Next
Within Chains, we'll be improving integration with other supply-chain security projects. This includes support for Binary Transparency and Verifiable Builds through integrations with the Sigstore and In-Toto projects. We'll also be improving and providing a set of well-designed, highly secure Tasks and Pipeline definitions in the TektonCD Catalog.

In Tekton Pipelines, we plan on finishing up TEP-0025 (Hermekton) to enable support for hermetic build execution. If you want to play around with it now, hermekton can be run as an alpha feature in experimental mode. When hermekton is enabled, a build runs in a locked-down environment without network connectivity. Hermetic builds guarantee that all inputs have been explicitly declared ahead of time, providing a more auditable supply chain. Hermetic builds and Chains align well, because the hermeticity of a build is recorded in the full build provenance captured by Chains. Chains can generate and attest to metadata specifying exactly which sections of a build had network access.

This means policy can be defined around exactly which build tools are allowed to access the network and which ones are not. This metadata can be used in policies at build time (banning compilers with security vulnerabilities) or stored and used by policy engines at deploy time (only code-reviewed and verifiably built containers are allowed to run).
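A policy of this kind could be expressed as a small check over the captured metadata. The sketch below assumes a made-up provenance layout (a "steps" list with "name" and "network_access" fields); it is not the format Chains emits, only an illustration of the idea.

```python
def network_violations(provenance: dict, allowed: set[str]) -> list[str]:
    """List steps that used the network without being on the allow-list."""
    return [
        step["name"]
        for step in provenance["steps"]
        if step["network_access"] and step["name"] not in allowed
    ]


# Hypothetical provenance for one build.
provenance = {
    "steps": [
        {"name": "fetch-dependencies", "network_access": True},
        {"name": "compile", "network_access": True},
        {"name": "test", "network_access": False},
    ]
}

violations = network_violations(provenance, allowed={"fetch-dependencies"})
print(violations)
```

A deploy-time policy engine could refuse any image whose provenance yields a non-empty list here.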

We believe supply-chain security must be built-in and by default. No task orchestrator can promise perfect supply-chain security, but TektonCD was designed with unique features in mind that make it easier to do the right thing. We're always looking for feedback on the design, goals and requirements. You can reach out on GitHub or the #chains Slack channel.

Announcing New Abuse Research Grants Program

Our Abuse Bug Bounty program has proved tremendously successful in the three years since its introduction, thanks to our incredibly engaged community of researchers. Their contributions resulted in more than 1,000 valid bugs, helping us raise the bar in combating product abuse.

As a result of this continued success, today we are announcing a new experimental Abuse Research Grants Program in addition to the already existing Vulnerability Research Grants. Similar to other Research Grant Programs, these grants are up-front awards that our top researchers will receive before they ever submit a bug.

Last year, we increased our rewards to recognize the important work of our community. The growth of this program would not have been possible without partners like David (@xdavidhu), Zohar (ehpus.com), and Ademar (@nowaskyjr) who, on top of becoming our top research experts in Product Abuse, regularly contribute to transparency by sharing their work, further inspiring and influencing our community of researchers.

Despite the growth and success of this program, there remains more work to be done.

With our new Abuse Research Grants Program, we hope to bring even more awareness to product abuse by connecting more closely with our experienced researchers – so we can all work together to overcome these challenges, prevent product abuse and keep our users safe. Here’s how the program works:
  • We invite our top abuse researchers to the program.
  • We award grants immediately before research begins, no strings attached.
  • Bug Hunters apply for the targets we share with them and start their research.
  • On top of the grant, researchers are eligible for regular rewards for the bugs they discover in scope of our Bug Bounty program.
To learn more about this and other grant programs, visit our rules page.

New protections for Enhanced Safe Browsing users in Chrome

In 2020 we launched Enhanced Safe Browsing, which you can turn on in your Chrome security settings, with the goal of substantially increasing safety on the web. These improvements are being built on top of existing security mechanisms that already protect billions of devices. Since the initial launch, we have continuously worked behind the scenes to improve our real-time URL checks and apply machine learning models to warn on previously-unknown attacks. As a result, Enhanced Safe Browsing users are successfully phished 35% less than other users. Starting with Chrome 91, we will roll out new features to help Enhanced Safe Browsing users better choose their extensions, as well as offer additional protections against downloading malicious files on the web.

Chrome extensions - Better protection before installation

Every day millions of people rely on Chrome extensions to help them be more productive, save money, shop or simply improve their browser experience. This is why it is important for us to continuously improve the safety of extensions published in the Chrome Web Store. For instance, through our integration with Google Safe Browsing in 2020, the number of malicious extensions that Chrome disabled to protect users grew by 81%. This comes on top of a number of improvements for more peace of mind when it comes to privacy and security.

Enhanced Safe Browsing will now offer additional protection when you install a new extension from the Chrome Web Store. A dialog will inform you if an extension you’re about to install is not a part of the list of extensions trusted by Enhanced Safe Browsing.

Any extension built by a developer who follows the Chrome Web Store Developer Program Policies will be considered trusted by Enhanced Safe Browsing. For new developers, it will take at least a few months of respecting these conditions to become trusted. Eventually, we strive for all developers with compliant extensions to reach this status upon meeting these criteria. Today, this represents nearly 75% of all extensions in the Chrome Web Store and we expect this number to keep growing as new developers become trusted.

Improved download protection

Enhanced Safe Browsing will now offer you even better protection against risky files.

[Image: Chrome download warning reading “bad_file.exe may be dangerous. Send to Google for scanning?”]

When you download a file, Chrome performs a first-level check with Google Safe Browsing using metadata about the downloaded file, such as the digest of the contents and the source of the file, to determine whether it’s potentially suspicious. For any downloads that Safe Browsing deems risky, but not clearly unsafe, Enhanced Safe Browsing users will be presented with a warning and the ability to send the file to be scanned for a more in-depth analysis (pictured above).

If you choose to send the file, Chrome will upload it to Google Safe Browsing, which will scan it using its static and dynamic analysis classifiers in real time. After a short wait, if Safe Browsing determines the file is unsafe, Chrome will display a warning. As always, you can bypass the warning and open the file without scanning. Uploaded files are deleted from Safe Browsing a short time after scanning.

Introducing Security By Design

Integrating security into your app development lifecycle can save a lot of time, money, and risk. That’s why we’ve launched Security by Design on Google Play Academy to help developers identify, mitigate, and proactively protect against security threats.

The Android ecosystem, including Google Play, has many built-in security features that help protect developers and users. The course Introduction to app security best practices takes these protections one step further by helping you take advantage of additional security features to build into your app. For example, Jetpack Security helps developers properly encrypt their data at rest and provides only safe and well known algorithms for encrypting Files and SharedPreferences. The SafetyNet Attestation API is a solution to help identify potentially dangerous patterns in usage. There are several common design vulnerabilities that are important to look out for, including using shared or improper file storage, using insecure protocols, unprotected components such as Activities, and more. The course also provides methods to test your app in order to help you keep it safe after launch. Finally, you can set up a Vulnerability Disclosure Program (VDP) to engage security researchers to help.

In the next course, you can learn how to integrate security at every stage of the development process by adopting the Security Development Lifecycle (SDL). The SDL is an industry-standard process, and in this course you’ll learn the fundamentals of setting up a program, getting executive sponsorship, and integrating it into your development lifecycle.

Threat modeling is part of the Security Development Lifecycle, and in this course you will learn to think like an attacker to identify, categorize, and address threats. By doing so early in the design phase of development, you can identify potential threats and start planning for how to mitigate them at a much lower cost and create a more secure product for your users.

Improving your app’s security is a never-ending process. Sign up for the Security by Design module, where in a few short courses you will learn how to integrate security into your app development lifecycle, model potential threats, and build app security best practices into your app, as well as avoid potential design pitfalls.