Category Archives: Open Source Blog

News about Google’s open source projects and programs

Assess the security of Google Kubernetes Engine (GKE) with InSpec for GCP

We are excited to announce the GKE CIS 1.1.0 Benchmark InSpec profile under an open source software license is now available on GitHub, which allows you to assess Google Kubernetes Engine (GKE) clusters against security controls recommended by CIS. You can validate the security posture of your GKE clusters using Chef InSpec™ by assessing their compliance against the Center for Internet Security (CIS) 1.1.0 benchmark for GKE.

Validating security compliance of GKE

GKE is a popular platform to run containerized applications. Many organizations have selected GKE for its scalability, self-healing, observability and integrations with other services on Google Cloud. Developer agility is one of the most compelling arguments for moving to a microservices architecture on Kubernetes, introducing configuration changes at a faster pace and demanding security checks as part of the development lifecycle.

Validating the security settings of your GKE cluster is a complex challenge and requires an analysis of multiple layers within your Cloud infrastructure:

GKE is a managed service on GCP, with controls to tweak the cluster’s behaviour which have an impact on its security posture. These Cloud resource configurations can be configured and audited via Infrastructure-as-Code (IaC) frameworks such as Terraform, the gcloud command line or the Google Cloud Console.

Application workloads are deployed on GKE by interacting via the Kubernetes (K8S) API. Kubernetes resources such as pods, deployments and services are often deployed from yaml templates using the command line tool kubectl.

Kubernetes uses configuration files (such as the kube-proxy and kubelet file) typically in yaml format which are stored on the nodes’ file system.

InSpec for auditing GKE

InSpec is a popular DevSecOps framework that checks the configuration state of resources in virtual machines and containers, on Cloud providers such as Google Cloud, AWS, and Microsoft Azure. The InSpec GCP resource pack 1.8 (InSpec-GCP) provides a consistent way to audit GCP resources and can be used to validate the attributes of a GKE cluster against a desired state declared in code. We previously released a blog post on how to validate your Google Cloud resources with InSpec-GCP against compliance profiles such as the CIS 1.1.0 benchmark for GCP.

While you can use the InSpec-GCP resource pack to define the InSpec controls to validate resources against the Google Cloud API, it does not directly allow you to validate configurations of other relevant layers such as Kubernetes resources and config files on the nodes. Luckily, the challenge to audit Kubernetes resources with InSpec has already been solved by the inspec-k8s resource pack. Further, files on nodes can be audited using remote access via SSH. All together, we can validate the security posture of GKE holistically using the inspec-gcp and inspec-k8s resource packs as well as controls using the InSpec file resource executed in an ssh session.

Running the CIS for GKE compliance profile with InSpec

With the GKE CIS 1.1.0 Benchmark InSpec Profile we have implemented the security controls to validate a GKE cluster against the recommended settings on GCP resource level, Kubernetes API level and file system level. The repository is split into three profiles (inspec-gke-cis-gcp, inspec-gke-cis-k8s and inspec-gke-cis-ssh), since each profile requires a different “target”, or -t parameter when run using the InSpec command line. For ease of use, a wrapper script run_profiles.sh has been provided in the root directory of the repository with the purpose of running all three profiles and storing the reports in the dedicated folder reports.
The script requires the cluster name (-c), ssh username (-u), private key file for ssh authentication (-k), cluster region or zone (-r or -z) and InSpec input file as required by the inspec.yml files in each profile (-i). As an example, the following line will run all three profiles to validate the compliance of cluster inspec-cluster in zone us-central1-a:

./run_profiles.sh -c inspec-cluster \
                           -u konrad \
                           -k /home/konrad/.ssh/google_compute_engine \
                           -z us-central1-a \
                           -i inputs.yml
Running InSpec profile inspec-gke-cis-gcp ...

Profile: InSpec GKE CIS 1.1 Benchmark (inspec-gke-cis-gcp)
Version: 0.1.0
Target: gcp://<service account used for InSpec>

<lots of InSpec output omitted>

Profile Summary: 16 successful controls, 10 control failures2 controls skipped
Test Summary: 18 successful, 11 failures, 2 skipped
Stored report in reports/inspec-gke-cis-gcp_report.
Running InSpec profile inspec-gke-cis-k8s …

Profile: InSpec GKE CIS 1.1 Benchmark (inspec-gke-cis-k8s)
Version: 0.1.0
Target: kubernetes://<IP address of K8S endpoint>:443

<lots of InSpec output omitted>

Profile Summary: 9 successful controls, 1 control failure, 0 controls skipped
Test Summary: 9 successful, 1 failure, 0 skipped
Stored report in reports/inspec-gke-cis-gcp_report.
Running InSpec profile inspec-gke-cis-ssh on node <cluster node 1> ...

Profile: InSpec GKE CIS 1.1 Benchmark (inspec-gke-cis-ssh)
Version: 0.1.0
Target: ssh://<username>@<cluster node 1>:22

<lots of InSpec output omitted>

Profile Summary: 10 successful controls, 5 control failures, 1 control skipped
Test Summary: 12 successful, 6 failures, 1 skipped
Stored report in reports/inspec-gke-cis-ssh_<cluster node 1>_report.


Analyze your scan reports

Once the wrapper script has completed successfully you should analyze the JSON or HTML reports to validate the compliance of your GKE cluster. One way to perform the analysis is to upload the collection of JSON reports of a single run from the reports folder to the open source InSpec visualization tool Heimdall Lite (GitHub) by the Mitre Corporation. An example of a compliance dashboard is shown below:
Scan Reports dashboard


Try it yourself and run the GKE CIS 1.1.0 Benchmark InSpec profile in your Google Cloud environment! Clone the repository and follow the CLI example in the Readme file to run the InSpec profiles against your GKE clusters. We also encourage you to report any issues on GitHub that you may find, suggest additional features and to contribute to the project using pull requests. Also, you can read our previous blog post on using InSpec-GCP for compliance validations of your GCP environment.

By Bakh Inamov, Security Specialist Engineer and Konrad Schieban, Infrastructure Cloud Consultant

The API Registry API

We’ve found that many organizations are challenged by the increasing number of APIs that they make and use. APIs become harder to track, which can lead to duplication rather than reuse. Also, as APIs expand to cover an ever-broadening set of topics, they can proliferate different design styles, at times creating frustrating inefficiencies.

To address this, we’ve designed the Registry API, an experimental approach to organizing information about APIs. The Registry API allows teams to upload and share machine-readable descriptions of APIs that are in use and in development. These descriptions include API specifications in standard formats like OpenAPI, the Google API Discovery Service Format, and the Protocol Buffers Language.

An organized collection of API descriptions can be the foundation for a wide range of tools and services that make APIs better and easier to use.
  • Linters verify that APIs follow standard patterns
  • Documentation generators provide documentation in consistent, easy-to-read, accessible formats
  • Code generators produce API clients and server scaffolding
  • Searchable online catalogs make everything easier to find
But perhaps most importantly, bringing everything about APIs together into one place can accelerate the consistency of an API portfolio. With organization-wide visibility, many find they need less explicit governance even as their APIs become more standardized and easy to use.

The Registry API is a gRPC service that is formally described by Protocol Buffers and that closely follows the Google API Design Guidelines at aip.dev. The Registry API description is annotated to support gRPC HTTP/JSON transcoding, which allows it to be automatically published as a JSON REST API using a proxy. Proxies also enable gRPC web, which allows gRPC calls to be directly made from browser-based applications, and the project includes an experimental GraphQL interface.

We’ve released a reference implementation that can be run locally or deployed in a container with Google Cloud Run or other container-based services. It stores data using the Google Cloud Datastore API or a configurable relational interface layer that currently supports PostgreSQL and SQLite.

Following AIP-181, we’ve set the Registry API’s stability level as "alpha," but our aim is to make it a stable base for API lifecycle applications. We’ve open-sourced our implementation to share progress and gather feedback. Please tell us about your experience if you use it.

By Tim Burks, Tech Lead – Apigee API Lifecycle and Governance

Season of Docs announces results of 2020 program

Season of Docs has announced the 2020 program results for standard-length projects. You can view a list of successfully completed technical writing projects on the website along with their final project reports.

Seasons of docs graphic
During the program, technical writers spend a few months working closely with an open source community. They bring their technical writing expertise to improve the project's documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project's technology.

For standard-length technical writing projects in Season of Docs, the doc development phase is September 14, 2020 – November 30, 2020. However, some technical writers may apply for a long-running project. The technical writer makes this decision under consultation with the open source organization, based on the expectations for their project. For a long-running project, the doc development phase is September 14, 2020 – March 1, 2021.

64 technical writers successfully completed their standard-length technical writing projects. There are 18 long-running projects in progress that are expected to finish in March.
  • 80% of the mentors had a positive experience and want to mentor again in future Season of Docs cycles
  • 96% of the technical writers had a positive experience
  • 96% plan to continue contributing to open source projects
  • 94% of the technical writers said that Season of Docs helped improved their knowledge of code and/or open source
Take a look at the list of successful projects to see the wide range of subjects covered!

What is next?

The long-running projects are still in progress and finish in March 2021. Technical writers participating in these long-running projects submit their project reports before March 8th, and the writer and mentor evaluations are due by March 12th. Successfully completed long-running technical writing projects will be published on the results page on March 15, 2021.

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

Stay tuned for information about Season of Docs 2021—watch for posts in this blog and sign up for the announcements email list.

By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Using MicroK8s with Anthos Config Management in the world of IoT

When dealing with large scale Kubernetes deployments, managing configuration and policy is often very complicated. We discussed why Kubernetes’ declarative approach to configuration as data has become the most popular choice for most users a few weeks ago. Today, we will discuss bringing this approach to your MicroK8 deployments using Anthos Config Management.
Image of Anthos Config Management + Cloud Source Repositories + MicroK8s
Anthos Config Management helps you easily create declarative security and operational policies and implement them at scale for your Kubernetes deployments across hybrid and multi-cloud environments. At a high level, you represent the desired state of your deployment as code committed to a central Git repository. Anthos Config Management will ensure the desired state is achieved and also maintained across all your registered clusters.

You can use Anthos Config Management for both your Kubernetes Engine (GKE) clusters as well as on Anthos attached clusters. Anthos attached clusters is a deployment option that extends Anthos’ reach into Kubernetes clusters running in other clouds as well as edge devices and the world of IoT, the Internet of Things. In this blog you will learn by experimenting with attached clusters with MicroK8s, a conformant Kubernetes platform popular in IoT and edge environments.

Consider an organization with a large number of distributed manufacturing facilities or laboratories that use MicroK8s to provide services to IoT devices. In such a deployment, Anthos can help you manage remote clusters directly from the Anthos Console rather than investing engineering resources to build out a multitude of custom tools.

Consider the diagram below.

Diagram of Anthos Config Management with MicroK8s on the Factory Floor with IoT
This diagram shows a set of “N” factory locations each with a MicroK8s cluster supporting IoT devices such as lights, sensors, or even machines. You register each of the MicroK8s clusters in an Anthos environ: a logical collection of Kubernetes clusters. When you want to deploy the application code to the MicroK8s clusters, you commit the code to the repository and Anthos Config Management takes care of the deployment across all locations. In this blog we will show you how you can quickly try this out using a MicroK8s test deployment.

We will use the following Google Cloud services:
  • Compute Engine provides an Ubuntu instance for a single-node MicroK8s cluster. Ubuntu will use cloud-init to install MicroK8s and generate shell scripts and other files to save time.
  • Cloud Source Repositories will provide the Git-based repository to which we will commit our workload.
  • Anthos Config Management will perform the deployment from the repository to the MicroK8s cluster.

Let’s start with a picture

Here’s a diagram of how these components fit together.

Diagram of how Anthos Config Management works together with MicroK8s
  • A workstation instance is created from which Terraform is used to deploy four components: (1) an IAM service account, (2) a Google Compute Engine Instance with MicroK8s using permissions provided by the service account, (3) a Kubernetes configuration repo provided by Cloud Source Repositories, and (4) a public/private key pair.
  • The GCE instance will use the service account key to register the MicroK8s cluster with an Anthos environ.
  • The public key from the public/ private key pair will be registered to the repository while the private key will be registered with the MicroK8s cluster.
  • Anthos Config Management will be configured to point to the repository and branch to poll for updates.
  • When a Kubernetes YAML document is pushed to the appropriate branch of the repository, Anthos Config Management will use the private key to connect to the repository, detect that a commit has been made against the branch, fetch the files and apply the document to the MicroK8s cluster.
Anthos Config Management enables you to deploy code from a Git repository to Kubernetes clusters that have been registered with Anthos. Google Cloud officially supports GKE, AKS, and EKS clusters, but you can use other conformant clusters such as MicroK8s in accordance with your needs. The repository below shows you how to register a single MicroK8s cluster to receive deployments. You can also scale this to larger numbers of clusters all of which can receive updates from commitments to the repository. If your organization has large numbers of IoT devices supported by Kubernetes clusters you can update all of them from the Anthos console to provide for consistent deployments across the organization regardless of the locations of the clusters, including the IoT edge. If you would like to learn more, you can build this project yourself. Please check out this Git repository and learn firsthand about how Anthos can help you manage Kubernetes deployments in the world of IoT.

By Jeff Levine, Customer Engineer – Google Cloud

Finding Critical Open Source Projects

Comic graphic of modern digital infrastructure
Open source software (OSS) has long suffered from a "tragedy of the commons" problem. Most organizations, large and small, make use of open source software every day to build modern products, but many OSS projects are struggling for the time, resources and attention they need. This is a resource allocation problem and Google, as part of Open Source Security Foundation (OpenSSF), can help solve it together. We need ways to connect critical open source projects we all rely on, with organizations that can provide them with adequate support.

Criticality of an open source project is difficult to define; what might be a critical dependency for one consumer of open source software may be entirely absent for another. However, arriving at a shared understanding and framework allows us to have productive conversations about our dependencies. Simply put, we define criticality to be the influence and importance of a project.

In order for OpenSSF to fund these critical open source projects, they need to be identified first. For this purpose, we are releasing a new project - “Criticality Score” under the OpenSSF. Criticality score indicates a project’s criticality (a number between 0 and 1) and is derived from various project usage metrics in a fully automated way. Our initial evaluation metrics include a project’s age, number of individual contributors and organizations involved, user involvement (in terms of new issue requests and updates), and a rough estimate of its dependencies using commit mentions. We also provide a way to add your own metric(s). For example, you can add internal project usage data to re-adjust a project's criticality score for individualized prioritization needs.

Identifying these critical projects is only the first step in making security improvements. OpenSSF is also exploring ways to provide maintainers of these projects with the resources they need. If you're a maintainer of a critical software package and are interested in getting help, funding, or infrastructure to run your project, reach out to the OpenSSF’s Securing Critical Projects working group here.

Check out the Criticality Score project on GitHub, including a list of critical open source projects. Though we have made some progress on this problem, we have not solved it and are eager for the community’s help in refining these metrics to identify critical open source projects.

By Abhishek Arya, Kim Lewandowski, Dan Lorenc and Julia Ferraioli – Google Open Source

Expanding Fuchsia’s open source model

Fuchsia is a long-term project to create a general-purpose, open source operating system, and today we are expanding Fuchsia’s open source model to welcome contributions from the public.

Fuchsia is designed to prioritize security, updatability, and performance, and is currently under active development by the Fuchsia team. We have been developing Fuchsia in the open, in our git repository for the last four years. You can browse the repository history at https://fuchsia.googlesource.com to see how Fuchsia has evolved over time. We are laying this foundation from the kernel up to make it easier to create long-lasting, secure products and experiences.

Starting today, we are expanding Fuchsia's open source model to make it easier for the public to engage with the project. We have created new public mailing lists for project discussions, added a governance model to clarify how strategic decisions are made, and opened up the issue tracker for public contributors to see what’s being worked on. As an open source effort, we welcome high-quality, well-tested contributions from all. There is now a process to become a member to submit patches, or a committer with full write access.

In addition, we are also publishing a technical roadmap for Fuchsia to provide better insights for project direction and priorities. Some of the highlights of the roadmap are working on a driver framework for updating the kernel independently of the drivers, improving file systems for performance, and expanding the input pipeline for accessibility.

Fuchsia is an open source project that is inclusive by design, from the architecture of the platform itself, to the open source community that we’re building. The project is still evolving rapidly, but the underlying principles and values of the system have remained relatively constant throughout the project. More information about the core architectural principles are available in the documentation: secure, updatable, inclusive, and pragmatic.

Fuchsia is not ready for general product development or as a development target, but you can clone, compile, and contribute to it. It has support for a limited set of x64-based hardware, and you can also test it with Fuchsia’s emulator. You can download and build the source code by following the getting started guide.

Fuchsia emulator startup with fx emu
If you would like to learn more about Fuchsia, join our mailing lists and browse the documentation at fuchsia.dev. You can now be part of the project and help build the future of this operating system. We are looking forward to receiving contributions from the community as we grow Fuchsia together.

By Wayne Piekarski, Developer Advocate for Fuchsia

OpenTitan at one year: the open source journey to secure silicon

During the past year, OpenTitan has grown tremendously as an open source project and is on track to provide transparent, trustworthy, and cost-free security to the broader silicon ecosystem. OpenTitan, the industry’s first open source silicon root of trust, has rapidly increased engineering contributions, added critical new partners, selected our first tapeout target, and published a comprehensive logical security model for the OpenTitan silicon, among other accomplishments.

OpenTitan by the Numbers

OpenTitan has doubled many metrics in the year since our public launch: in design size, verification testing, software test suites, documentation, and unique collaborators at least. Crucially, this growth has been both in the design verification collateral required for high volume production-quality silicon, as well as the digital design itself, a first for any open source silicon project.
  • More than doubled the number of commits at launch: from 2,500 to over 6,100 (across OpenTitan and the Ibex RISC-V core sub-project).
  • Grew to over 141K lines of code (LOC) of System Verilog digital design and verification.
  • Added 13 new IP blocks to grow to a total to 29 distinct hardware units.
  • Implemented 14 Device Interface Functions (DIFs) for a total 15 KLOC of C11 source code and 8 KLOC of test software.
  • Increased our design verification suite to over 66,000 lines of test code for all IP blocks.
  • Expanded documentation to over 35,000 lines of Markdown.
  • Accepted contributions from 52 new unique contributors, bringing our total to 100.
  • Increased community presence as shown by an aggregate of over 1,200 Github stars between OpenTitan and Ibex.
Chart that shows: One year of OpenTitan and Ibex growth on GitHub: the total number of commits grew from 2,500 to over 6,100
One year of OpenTitan and Ibex growth on GitHub: the total number of commits grew from 2,500 to over 6,100.
High quality development is one of OpenTitan’s core principles. Besides our many style guides, we require thorough documentation and design verification for each IP block. Each piece of hardware starts with auto-generated documentation to ensure consistency between documentation and design, along with extensive, progressively improving, design verification as it advances through the OpenTitan hardware stages to reach tapeout readiness.
One year of growth in Design Verification: from 30,000 to over 65,000 lines of testing source code. Each color represents design verification for an individual IP block.

Innovating for Open Silicon Development

Besides writing code, we have made significant advances in developing processes and security framework for high quality, secure open source silicon development. Design success is not just measured by the hardware, highly functional software and a firm contract between the two, with well-defined interfaces and well-understood behavior, play an important role.

OpenTitan’s hardware-software contract is realized by our DIF methodology, yet another way in which we ensure hardware IP quality. DIFs are a form of hardware-software co-design and the basis of our chip-level design verification testing infrastructure. Each OpenTitan IP block requires a style guide-compliant DIF, and this year we implemented 14 DIFs for a total 15 KLOC of C11 source code and 8 KLOC of tests.

We also reached a major milestone by publishing an open Security Model for a silicon root of trust, an industry first. This comprehensive guidance demonstrates how OpenTitan provides the core security properties required of a secure root of trust. It covers provisioning, secure boot, device identity, and attestation, and our ownership transfer mechanism, among other topics.

Expanding the OpenTitan Ecosystem

Besides engineering effort and methodology development, the OpenTitan coalition added two new Steering Committee members in support of lowRISC as an open source not-for-profit organization. Seagate, a leader in storage technology, and Giesecke and Devrient Mobile Security, a major producer of certified secure systems. We also chartered our Technical Committee to steer technical development of the project. Technical Committee members are drawn from across our organizational and individual contributors, approving 9 technical RFCs and adding 11 new project committers this past year.

On the strength of the OpenTitan open source project’s engineering progress, we are excited to announce today that Nuvoton and Google are collaborating on the first discrete OpenTitan silicon product. Much like the Linux kernel is itself not a complete operating system, OpenTitan’s open source design must be instantiated in a larger, complete piece of silicon. We look forward to sharing more on the industry’s first open source root of trust silicon tapeout in the coming months.

Onward to 2021

OpenTitan’s future is bright, and as a project it fully demonstrates the potential for open source design to enable collaboration across disparate, geographically far flung teams and organizations, to enhance security through transparency, and enable innovation in the open. We could not do this without our committed project partners and supporters, to whom we owe all this progress: Giesecke and Devrient Mobile Security, Western Digital, Seagate, the lowRISC CIC, Nuvoton, ETH Zürich, and many independent contributors.

Interested in contributing to the industry's first open source silicon root of trust? Contact us here.

By Dominic Rizzo, OpenTitan Lead – Google Cloud

Announcing the Atheris Python Fuzzer

Fuzz testing is a well-known technique for uncovering programming errors. Many of these detectable errors have serious security implications. Google has found thousands of security vulnerabilities and other bugs using this technique. Fuzzing is traditionally used on native languages such as C or C++, but last year, we built a new Python fuzzing engine. Today, we’re releasing the Atheris fuzzing engine as open source.

What can Atheris do?

Atheris can be used to automatically find bugs in Python code and native extensions. Atheris is a “coverage-guided” fuzzer, which means that Atheris will repeatedly try various inputs to your program while watching how it executes, and try to find interesting paths.

One of the best uses for Atheris is for differential fuzzers. These are fuzzers that look for differences in behavior of two libraries that are intended to do the same thing. One of the example fuzzers packaged with Atheris does exactly this to compare the Python “idna” package to the C “libidn2” package. Both of these packages are intended to decode and resolve internationalized domain names. However, the example fuzzer idna_uts46_fuzzer.py shows that they don’t always produce the same results. If you ever decided to purchase a domain containing (Unicode codepoints [U+0130, U+1df9]), you’d discover that the idna and libidn2 libraries resolve that domain to two completely different websites.

In general, Atheris is useful on pure Python code whenever you have a way of expressing what the “correct” behavior is - or at least expressing what behaviors are definitely not correct. This could be as complex as custom code in the fuzzer that evaluates the correctness of a library’s output, or as simple as a check that no unexpected exceptions are raised. This last case is surprisingly useful. While the worst outcome from an unexpected exception is typically denial-of-service (by causing a program to crash), unexpected exceptions tend to reveal more serious bugs in libraries. As an example, the one YAML parsing library we tested Atheris on says that it will only raise YAMLErrors; however, yaml_fuzzer.py detects numerous other exceptions, such as ValueError from trying to interpret “-_” as an integer, or TypeError from trying to use a list as a key in a dict. (Bug report.) This indicates flaws in the parser.

Finally, Atheris supports fuzzing native Python extensions, using libFuzzer. libFuzzer is a fuzzing engine integrated into Clang, typically used for fuzzing C or C++. When using libFuzzer with Atheris, Atheris can still find all the bugs previously described, but can also find memory corruption bugs that only exist in native code. Atheris supports the Clang sanitizers Address Sanitizer and Undefined Behavior Sanitizer. These make it easy to detect corruption when it happens, rather than far later. In one case, the author of this document found an LLVM bug using an Atheris fuzzer (now fixed).

What does Atheris support?

Atheris supports fuzzing Python code and native extensions in Python 2.7 and Python 3.3+. When fuzzing Python code, using Python 3.8+ is strongly recommended, as it allows for much better coverage information. When fuzzing native extensions, Atheris can be used in combination with Address Sanitizer or Undefined Behavior Sanitizer.

OSS-Fuzz is a fuzzing service hosted by Google, where we execute fuzzers on open source code free of charge. OSS-Fuzz will soon support Atheris!

How can I get started?

Take a look at the repo, in particular the examples. For fuzzing pure Python, it’s as simple as:

pip3 install atheris

And then, just define a TestOneInput function that runs the code you want to fuzz:

import atheris
import sys


def TestOneInput(data):
    if data == b"bad":
        raise RuntimeError("Badness!")


atheris.Setup(sys.argv, TestOneInput)
atheris.Fuzz()

That’s it! Atheris will repeatedly invoke TestOneInput and monitor the execution flow, until a crash or exception occurs.

For more details, including how to fuzz native code, see the README.


By Ian Eldred Pudney, Google Information Security

From MLPerf to MLCommons: moving machine learning forward

Today, the community of machine learning researchers and engineers behind the MLPerf benchmark is launching an open engineering consortium called MLCommons. For us, this is the next step in a journey that started almost three years ago.


Early in 2018, we gathered a group of industry researchers and academics who had published work on benchmarking machine learning (ML), in a conference room to propose the creation of an industry standard benchmark to measure ML performance. Everyone had doubts: creating an industry standard is challenging under the best conditions and ML was (and is) a poorly understood stochastic process running on extremely diverse software and hardware. Yet, we all agreed to try.

Together, along with a growing community of researchers and academics, we created a new benchmark called MLPerf. The effort took off. MLPerf is now an industry standard with over 2,000 submitted results and multiple benchmarks suites that span systems from smartphones to supercomputers. Over that time, the fastest result submitted to MLPerf for training the classic ML network ResNet improved by over 13x.

We created MLPerf because we believed in three principles:
  • Machine learning has tremendous potential: Already, machine learning helps billions of people find and understand information through tools like Google’s search engine and translation service. Active research in machine learning could one day save millions of lives through improvements in healthcare and automotive safety.
  • Transforming machine learning from promising research into wide-spread industrial practice requires investment in common infrastructure -- especially metrics: Much like computing in the ‘80s, real innovation is mixed with hype and adopting new ideas is slow and cumbersome. We need good metrics to identify the best ideas, and good infrastructure to make adoption of new techniques fast and easy.
  • Developing common infrastructure is best done by an open, fast-moving collaboration: We need the vision of academics and the resources of industry. We need the agility of startups and the scale of leading tech companies. Working together, a diverse community can develop new ideas, launch experiments, and rapidly iterate to arrive at shared solutions.
Our belief in the principles behind MLPerf has only gotten stronger, and we are excited to be part of the next step for the MLPerf community with the launch of MLCommons.

MLCommons aims to accelerate machine learning to benefit everyone. MLCommons will build a a common set of tools for ML practitioners including:
  • Benchmarks to measure progress: MLCommons will leverage MLPerf to measure speed, but also expand benchmarking other aspects of ML such as accuracy and algorithmic efficiency. ML models continue to increase in size and consequently cost. Sustaining growth in capability will require learning how to do more (accuracy) with less (efficiency).
  • Public datasets to fuel research: MLCommons new People’s Speech project seeks to develop a public dataset that, in addition to being larger than any other public speech dataset by more than an order of magnitude, better reflects diverse languages and accents. Public datasets drive machine learning like nothing else; consider ImageNet’s impact on the field of computer vision. 
  • Best practices to accelerate development: MLCommons will make it easier to develop and deploy machine learning solutions by fostering consistent best practices. For instance, MLCommons’ MLCube project provides a common container interface for machine learning models to make them easier to share, experiment with (including benchmark), develop, and ultimately deploy.
Google believes in the potential of machine learning, the importance of common infrastructure, and the power of open, collaborative development. Our leadership in co-founding, and deep support in sustaining, MLPerf and MLCommons has echoed our involvement in other efforts like TensorFlow and NNAPI. Together with the MLCommons community, we can improve machine learning to benefit everyone.

Want to get involved? Learn more at mlcommons.org.


By Peter Mattson – ML Metrics, Naveen Kumar – ML Performance, and Cliff Young – Google Brain

Best practices for accessibility for virtual events

As everyone knows, most of our open source events have transformed from in-person to digital this year. However, due to the sudden change, not everything is accessible. We took this issue seriously and decided to work with one of our accessibility experts, Neighborhood Access, to share best practices for our community. We hope this will help you organize your digital events!


Virtual events can be a wonderful way to make information more accessible to people, including attendees with disabilities, but hosts often fall short in providing the appropriate logistical support needed for this audience to fully engage in the programming. In this article, we’ll go over some best practices for hosting virtual events to make them accessible to the d/Deaf/Hard of Hearing, Blind/Low Vision, Developmentally Disabled, and Neurodivergent communities. Since some of these accommodations require a bit of planning ahead of the event, we’ll use a timeline format to discuss what needs to be arranged and when.

Initial Planning Stages

Selecting a Platform
When selecting a video conferencing platform for your virtual event, you will want to make sure the platform has the following features:
  1. Ability to dial into the event via phone (see Accessible Event Links for more info)
  2. Ability to assign captioning to someone in the call or via a third party within the platform
Working With Your Speakers
Once you read through the rest of this guide and decide what accessibility tools you’ll use, you should brief your speakers on those practices so everyone is on the same page. For example, if you are going to do an image description of yourself (as detailed in Incorporating Audio Cues), make sure your guests do the same to provide a consistently accessible experience.

At Least 2 Weeks Before the Event

Disability Accommodation Requests
If you are hosting a large event with many attendees, you should provide accommodations like interpreters and audio description without people having to request them. If you are hosting a smaller event (<20 guests), the RSVP form for your event should link to a separate form where you will track accommodation requests. This should be a simple form that collects no identifying information about the person filling it out. A person should not have to disclose their disability in order to request accommodations; an anonymous form will get you all the information you need. The one question you need to ask is: “Are there any disability accommodations you need us to provide in order for you to fully participate in this event?” You should aim to collect this information at least two weeks before the event, so that you have ample time to arrange necessary accommodations.

Accessible Event Links
When sending a link to join a video call, be sure to include the part of the link that includes the dial-in number if you will not be providing an interpreter for the event. d/Deaf/HoH people who can access the Video Relay System (VRS) will need this number to be able to dial their interpreter into the meeting, and it is much easier for these attendees to have this ahead of time rather than trying to ask for it on the day of the event.

American Sign Language (ASL) Interpreters
This section applies to events that have a mostly-US-based audience. If you have an international audience, you will need to ask attendees what nationality of sign language is needed. Interpreters are booked to do everything from attending doctor’s appointments with d/Deaf/HoH patients to interpreting for larger scale events. Therefore, they are quite busy, and require booking a few weeks before you need their services. One of the easiest ways to find an interpreter is through your state or city’s Deaf and Hard of Hearing Services provider. Most agencies are called “(State name) Deaf and Hard of Hearing Services”, and if you search this term you should find the right agency. Most agencies will have you fill out a short form with a few details about your event, and then they will work to connect you with an available interpreter. Once connected, the interpreter may ask you a few additional questions about the event, so they can have any field-specific signs and names prepared ahead of time.

Live Captioning
There are several companies that offer live captioning (sometimes referred to as CART, or Communication Access Real-time Translation) for virtual events. If you are able to hire one of these services, they are your best bet—they know the ins and outs of the technology—and this gives you one less thing to stress about on the day of your event. They will usually schedule a time to perform a test run with you a few days ahead of the event, so you both can be sure things are in working order.

If you are unable to hire a service to provide live captions, most online meeting platforms give you the option to assign captioning to someone. You can have a volunteer transcribe the event, and the captions will show up for anyone who opts to see them.

Interpreting vs Live Captioning
There is a hierarchy in terms of what services to utilize in the event that you must choose between them. The best case scenario, and what you should absolutely strive for, is to have both an interpreter and live captioning. There are many people who can benefit from captions who may not benefit from sign language interpreting, and vise versa. In the event that you have to choose either an interpreter or live captioning, go with live captioning, as more people will be able to benefit from it.

One Week Before the Event

Make Your Visual Materials Accessible
There are a few steps you’ll need to take to ensure that your visual materials (slideshows and pictures) are accessible to Blind/Low Vision attendees. Many Blind/Low Vision people use screen readers, which read aloud the text that is on a screen. Since you cannot use screen readers during a presentation (both because the reader cannot read text in video format and because attendees will want to hear what you’re saying), you will need to provide a copy of your visual materials to attendees either before or after the event so they can review them. If you are unable to share the exact materials, try to share a version of them that has the same text. This ensures that everyone has full access to all event materials, albeit possibly at different times. Before sharing visual materials, be sure of two things:
  1. You are not sharing an image file of text. If you are sharing a slideshow, for example, do not send a PDF printout of the slides (screen readers will not recognize these as text). Instead, download and share a version of the slideshow and send that directly.
  2. You must add alt text or captions to any pictures. How you will do this varies from software to software, but instructions can usually be found on the software’s website or help forum. Here is a guide on how to do this in Google Slides..
Format Visual Elements
People with visual processing disorders, such as dyslexia, may find some fonts harder to read than others. While needs can vary from person to person, it is generally agreed that sans serif fonts (Arial) are easier to read than serif fonts (Times New Roman). Use sans serif fonts wherever possible in your visual components.

Timing the Event
Most people can benefit from a five minute break every half hour or so (the 30:5 Rule), but this is especially true for people with disabilities that impact their need to go to the bathroom, and also people who struggle with focus. Following the 30:5 Rule ensures that people have time to take care of their needs throughout the event, and that everyone will be continually focused and refreshed. Schedule these five minute breaks into your event timeline.

During the Event

Incorporating Audio Cues
When you first introduce yourself at the event, provide a verbal description of yourself and your surroundings. This allows Blind/Low Vision attendees to learn key visual characteristics about each presenter, much like how a sighted person might remember someone by their statement necklace or unique hairstyle. Here is a format you can follow:
“Hi, I’m (name). I’m going to do a quick image description of myself for any Blind/Low-Vision attendees. I’m a (race) (gender), and I’m wearing (color of shirt, notable accessories). Behind me is (color of wall, clock, etc).”
Additionally, when switching between speakers, it is crucial that you briefly state your name before talking. Many peoples’ voices sound similar, and this practice is helpful to Blind/Low Vision people who need to know who is speaking. This is also important for d/Deaf/HoH attendees who are utilizing their own interpreter via VRS, because the interpreter needs to be able to sign to them the name of the person speaking.

Content Warnings
If your presentation includes very loud noises, flashing lights, or rapidly-transitioning imagery, you need to give attendees a 10-second warning before presenting that content. This is a safety measure for people who are prone to seizures and others who are sensitive to these elements.

Verbally Highlighting Key Visual Features
If you are sharing an important image with attendees (a chart or graph, for example), be sure to verbally describe all of the important information the image relays: general trends, names of data groups, axis titles, etc. Additionally, be sure to at least summarize the text on screen (if applicable)--not everyone is able to read and listen at the same time. Blind/Low Vision attendees will not be able to fully participate in your event if they are not getting the same access to information as sighted people are.

Be Prepared to Answer Questions from the Interpreter
Sometimes, an interpreter may need to ask for clarification on something you said, either because your audio cut out, or because you used a term they have not heard before and they want to make sure they are signing it correctly. Answer interpreter questions right away, so d/Deaf/HoH attendees are able to keep up with what you’re saying.

By following these guidelines, you will ensure that your event is inclusive and engaging for all attendees. If you have questions on how to implement some of these measures, or about how your organization can benefit from becoming more accessible, visit neighborhoodaccess.org to chat with our Accessibility Consulting Team. Let’s work together to create virtual events that work for everyone!

Introduction by Teresa Terasaki, Google Open Source Programs Office
Guidelines by guest author Juliana Good, Founder and Consulting Lead – Neighborhood Access