Announcing the First Group of Google Open Source Peer Bonus winners in 2021!

The Google Open Source Peer Bonus program is designed to reward external open source contributors nominated by Googlers for their exceptional contributions to open source. We are very excited to announce our first group of winners in 2021!

Our current winners have contributed to a wide range of projects, including Apache Beam, Kubernetes, Tekton, and many others. We reward open source enthusiasts not only for their code contributions, but also for community work, documentation, mentorship, and other types of engagement.

We have award recipients from 25 countries all over the world: Austria, Canada, China, Cyprus, Denmark, Finland, France, Germany, India, Isle of Man, Italy, Japan, Korea, Netherlands, Norway, Russia, Singapore, Spain, Sweden, Switzerland, Uganda, Taiwan, Ukraine, United Kingdom, and the United States.

Open source encourages innovation through collaboration, and our modern world, along with the technology we rely on, wouldn’t be the same without you: the contributors, who in many cases are volunteers. We would like to thank you for your hard work and congratulate you on receiving this award!

Below is the list of current winners who gave us permission to thank them publicly:

Winner | Project
Kashyap Jois | Android FHIR SDK
David Allison | AnkiDroid
Chad Dombrova | Apache Beam
Jeff Klukas | Apache Beam
Steve Niemitz | Apache Beam
Yoshiki Obata | Apache Beam
Jaskirat Singh | CHAOSS - Community Health Analytics Open Source Software
Eric Amorde | CocoaPods
Subrata Banik | coreboot
Ned Batchelder | Coverage.py & related CPython internals
Matthew Bryant | CursedChrome
Simon Legner | devdocs.io
Dmitry Gutov | Emacs/company-mode
Brian Jost | Firebase
Joe Hinkle | Firebase iOS SDK
Lorenzo Fiamigo | Firebase iOS SDK
Mike Gerasymenko | Firebase iOS SDK
Morten Bek Ditlevsen | Firebase iOS SDK
Angel Pons | Flashrom
Ole André Vadla Ravnås | Frida
Junegunn Choi | fzf
Alex Saveau | Gradle Play Publisher
Nate Graham | KDE
Amit Sagtani | KDE Community
Niklas Hansson | Kubeflow Pipelines
William Teo | Kubeflow Pipelines
Antonio Ojea | Kubernetes
Dan Mangum | Kubernetes
Jian Zeng | Kubernetes
Darrell Commander | libjpeg-turbo
James (purpleidea) | mgmt
Kareem Ergawy | MLIR
Lily Ballard | Nix / Fish
Eelco Dolstra | Nix, NixOS, Nixpkgs
Samuel Dionne-Riel | NixOS
Dmitry Demensky | Open source TypeScript definitions for Google Maps Platform
Kay Williams | OpenSSF
Hassan Kibirige | plotnine
Henry Schreiner | pybind11
Paul Moore | Python 'pip' project
Tzu-ping Chung | Python 'pip' project
Alex Grönholm | Python 'wheel' project
Ramon Santamaria | raylib
Alexander Weiss | restic
Michael Eischer | restic
Ben Lesh | rxjs
Takeshi Nakatani | s3fs
Daniel Wee Soong Lim | SymbiFlow
Unai Martinez-Corral | SymbiFlow, Surelog, Verible, more
Andrea Frittoli | Tekton
Priti Desai | Tekton
Vincent Demeester | Tekton
Chengyu Zhang | testsmt & testsmt/yinyang
Dominik Winterer | testsmt & testsmt/yinyang
Tom Rini | U-Boot

Thank you for your contributions to open source!

By Maria Tabak — Google Open Source Programs Office

Analyzing genomic data in families with deep learning

The Genomics team at Google Health is excited to share our latest extension of DeepVariant: DeepTrio.

First released in 2017, DeepVariant is an open source tool that enables researchers and clinicians to analyze an individual’s genome sequencing data and identify genetic variants, such as those that may cause disease. Our continued work on DeepVariant has been recognized for its top-of-class accuracy. With DeepTrio, we have expanded DeepVariant to be able to consider the genetic variants in the sequence data of a mother-father-child trio.

Humans are diploid organisms, carrying two copies of the human genome. Every individual inherits one copy of the genome from their mother and the other from their father. Parental inheritance informs the analysis of traits and diseases that follow Mendelian inheritance. DeepTrio learns to use the properties of Mendelian inheritance directly from sequencing data in order to identify genetic variants more accurately when parent and child samples can be analyzed together.
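
DeepTrio learns these constraints from data rather than hard-coding them, but the underlying rule is easy to state. The following is a minimal sketch of an explicit Mendelian consistency check for one biallelic, autosomal site, assuming genotypes are encoded as the number of alternate alleles (0 = homozygous reference, 1 = heterozygous, 2 = homozygous variant). The class and method names are hypothetical and are not part of DeepVariant or DeepTrio.

```java
/**
 * Hypothetical Mendelian consistency check for a single biallelic, autosomal site.
 * Genotypes are alternate-allele counts: 0 = hom-ref, 1 = het, 2 = hom-alt.
 * Not DeepTrio code: DeepTrio learns this behavior from data.
 */
final class Mendel {

    /** Alleles a parent can pass on: 0 -> {0}, 1 -> {0, 1}, 2 -> {1}. */
    private static int[] transmissible(int genotype) {
        switch (genotype) {
            case 0: return new int[] {0};
            case 1: return new int[] {0, 1};
            case 2: return new int[] {1};
            default: throw new IllegalArgumentException("genotype must be 0, 1 or 2: " + genotype);
        }
    }

    /** True if the child's genotype can arise from one allele of each parent. */
    static boolean isConsistent(int mother, int father, int child) {
        for (int m : transmissible(mother)) {
            for (int f : transmissible(father)) {
                if (m + f == child) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

For example, two homozygous-reference parents with a heterozygous child, `isConsistent(0, 0, 1)`, is a Mendelian violation, which is exactly the pattern of a de novo variant.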

Modifying DeepVariant to analyze trio samples

DeepVariant learns to classify positions in a genome as reference or variant using representations of data similar to the “genome browser” which experts use in analysis. “Improving the Accuracy of Genomic Analysis with DeepVariant 1.0” provides a good overview.

DeepVariant receives data as a window of the genome centered on a candidate variant which it is asked to classify as either reference (no variant), heterozygous (one copy of a variant) or homozygous (both copies are variant). DeepVariant sees the sequence evidence as channels representing features of the data (see: “Looking through DeepVariant’s eyes” for a deeper explanation).

To build DeepTrio, we modified DeepVariant to represent the sequence data from a trio in a single image, with a fixed height for each sample and the child in the middle. Using gold standard samples from NIST Genome in a Bottle for truth labels, we train one model to call variants in the child and another to call variants in the top parent. To call both parents, we flip the positions of the parent samples.
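
The layout described above can be sketched as follows, assuming each sample's pileup for one feature channel is a 2-D `int` array (rows are reads, columns are positions in the window). Each sample is padded with empty rows or truncated to a fixed height, then the three are stacked with the child in the middle. `TrioImage`, `fitHeight`, and `stack` are hypothetical names; this is not DeepTrio's implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a trio "image" for one feature channel: each sample is
 * fitted to a fixed height and the samples are stacked top parent / child / bottom parent.
 */
final class TrioImage {

    /** Pads with all-zero rows or truncates so the sample occupies exactly {@code height} rows. */
    static int[][] fitHeight(int[][] rows, int height, int width) {
        int[][] out = new int[height][width];
        for (int r = 0; r < Math.min(height, rows.length); r++) {
            out[r] = Arrays.copyOf(rows[r], width); // also pads or truncates columns to width
        }
        return out;
    }

    /** Stacks the three samples into one (3 * height) x width image, child in the middle. */
    static int[][] stack(int[][] topParent, int[][] child, int[][] bottomParent,
                         int height, int width) {
        int[][][] samples = {
            fitHeight(topParent, height, width),
            fitHeight(child, height, width),
            fitHeight(bottomParent, height, width)
        };
        int[][] image = new int[3 * height][];
        for (int s = 0; s < samples.length; s++) {
            System.arraycopy(samples[s], 0, image, s * height, height);
        }
        return image;
    }
}
```

Calling the other parent then only requires swapping the first and last arguments, for example `stack(father, child, mother, h, w)` instead of `stack(mother, child, father, h, w)`, so the same "top parent" model can be reused.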

Figure 1. (top) An image of 4 of the channels that DeepTrio uses in classification (these and 4 other channels are shown as a stack). (bottom) Conceptual schematic of how trio files are used to create examples, which are then called by DeepTrio.

Measuring DeepTrio’s improved accuracy

We show that DeepTrio is more accurate than DeepVariant for both parent and child variant detection, with an especially pronounced advantage at lower coverages. This enables researchers to either analyze samples at higher accuracy, or to maintain comparable accuracy at a substantially reduced expense.

To assess DeepTrio, we compare its accuracy with that of DeepVariant using extensively characterized gold standards made available by NIST Genome in a Bottle. To have an evaluation dataset that is never seen in training, we exclude chromosome 20 from training and perform evaluations on chromosome 20.

We train DeepVariant and DeepTrio for sequencing data from two different instruments, Illumina and Pacific Biosciences (PacBio). For more information on the differences between these technologies, please see our previous blog post. Both sequencers randomly sample the genome in an error-prone manner, so to analyze a genome accurately, the same region needs to be sampled repeatedly. The depth of sampling at a position is called coverage. The cost of sequencing grows approximately linearly with coverage, which often forces trade-offs between cost, accuracy, and the number of samples sequenced. As a result, parents in trios are often sequenced at lower depth.
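
The linear relationship between coverage and sequencing effort can be made concrete with the usual mean-coverage estimate: coverage = (number of reads × read length) / genome length. The sketch below assumes an approximately 3.1 Gb human genome and 150 bp reads purely for illustration; neither figure comes from the post, and `Coverage` is a hypothetical helper.

```java
/**
 * Back-of-the-envelope coverage arithmetic: mean depth = bases sequenced / genome length.
 * The genome size and read length used in examples are illustrative assumptions.
 */
final class Coverage {

    /** Mean coverage given read count, read length (bp) and genome length (bp). */
    static double meanCoverage(long reads, int readLength, long genomeLength) {
        return (double) reads * readLength / genomeLength;
    }

    /** Reads needed to reach a target mean coverage, rounded up. */
    static long readsNeeded(double targetCoverage, int readLength, long genomeLength) {
        return (long) Math.ceil(targetCoverage * genomeLength / readLength);
    }
}
```

Under these assumptions, `readsNeeded(30, 150, 3_100_000_000L)` is 620 million reads, while 15x coverage needs half as many, which is why sequencing parents at lower depth is a common way to cut cost.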

In the charts below, we plot the accuracy of DeepTrio and DeepVariant across a range of coverages.

DeepTrio child accuracy

DeepTrio parent accuracy

Figure 2. F1-score for DeepTrio (solid line) and DeepVariant (dashed line) on a child sample (top) and a parent sample (bottom), sequenced with an Illumina (blue) and PacBio (black) instrument. F1 is measured for all types of small variants on chromosome 20, across samples with a range of sequencing coverage (x-axis).
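
F1 combines precision (the fraction of reported variants that are real) and recall (the fraction of real variants that are reported) into a single number: F1 = 2PR / (P + R), equivalently 2TP / (2TP + FP + FN). A minimal sketch from raw counts of true positives, false positives, and false negatives; `F1Score` is a hypothetical helper, not part of the evaluation tooling used for the figure.

```java
/** Precision, recall and F1 from variant-calling counts. */
final class F1Score {

    static double precision(long tp, long fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    static double recall(long tp, long fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    /** Harmonic mean of precision and recall; 0 when both are 0. */
    static double f1(long tp, long fp, long fn) {
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        return p + r == 0 ? 0.0 : 2 * p * r / (p + r);
    }
}
```

For example, 80 true positives, no false positives, and 20 false negatives give precision 1.0 and recall 0.8, so F1 = 160 / 180 ≈ 0.889.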

DeepTrio’s performance on de novo variants

Each individual has roughly 5 million variants relative to the human reference genome. The overwhelming majority of these are inherited from their parents. A small number, around 100, are new (referred to as de novo), due to copying errors during DNA replication. We demonstrate that DeepTrio substantially reduces false positives for de novo variants. For Illumina data, this comes with a smaller decrease in recovery of true positives, while for PacBio data, this trade-off does not occur.

To assess accuracy, we analyzed sites where both parents are called as non-variant but the child is called as a heterozygous variant. We observe that DeepTrio is more reluctant to call a variant as de novo, much as a human expert would require a higher level of evidence for sites that violate Mendelian inheritance. This results in a much lower false positive rate for these de novo variants, but a slightly lower recall for DeepTrio on Illumina data. Usually when this occurs, the child is still called as a variant, but the parents are given a “no-call” (the classifier is not confident enough to make a call).
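
The site selection described above can be sketched as a simple filter over trio genotype calls. This assumes calls are encoded as alternate-allele counts (0, 1, 2) with a no-call encoded as -1; that encoding and the `DeNovo` class are assumptions of the sketch, not DeepTrio's output format. A parent no-call keeps a site out of the de novo set, which is how a more cautious caller trades a little recall for fewer false positives.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Selects candidate de novo sites: both parents called homozygous reference (0)
 * and the child called heterozygous (1). NO_CALL is this sketch's encoding of a no-call.
 */
final class DeNovo {
    static final int NO_CALL = -1;

    static boolean isCandidate(int mother, int father, int child) {
        return mother == 0 && father == 0 && child == 1;
    }

    /** Returns the indexes of candidate sites; each row is {mother, father, child}. */
    static List<Integer> candidates(int[][] trioCalls) {
        List<Integer> sites = new ArrayList<>();
        for (int i = 0; i < trioCalls.length; i++) {
            int[] t = trioCalls[i];
            if (isCandidate(t[0], t[1], t[2])) {
                sites.add(i);
            }
        }
        return sites;
    }
}
```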

Figure 3. Accuracy on de novo calls (child called heterozygous variant, both parents called reference): recall of true de novo events (top) and false positive de novo calls (bottom) for DeepTrio (solid line) and DeepVariant (dashed line) on Illumina (blue) and PacBio (black). Accuracy is measured on chromosome 20, across samples with a range of sequencing coverage (x-axis).

Contributing to rare disease research

By releasing DeepTrio as open source software, we hope to improve the analysis of genomic data by allowing scientists to analyze samples more accurately. We hope this will enable research and clinical pipelines, leading to better resolution of rare disease cases and improved development of therapeutics.

In addition to the release of DeepTrio’s code as open source, we have also released the sequencing data that we generated in order to train these models. That data is described in our pre-print “An Extensive Sequence Dataset of Gold-Standard Samples for Benchmarking and Development”. By releasing both this production model and the data required to train models of similar complexity, we hope to contribute to methods development by the genomics community.

By Andrew Carroll, Product Lead Genomics and Howard Yang, Program Manager Genomics — Google Health

Lyra – enabling voice calls for the next billion users

The past year has shown just how vital online communication is to our lives. Never before has it been more important to clearly understand one another online, regardless of where you are and whatever network conditions are available. That’s why in February we introduced Lyra: a revolutionary new audio codec using machine learning to produce high-quality voice calls.

As part of our efforts to make the best codecs universally available, we are open sourcing Lyra, allowing other developers to power their communications apps and take Lyra in powerful new directions. This release provides the tools needed for developers to encode and decode audio with Lyra, optimized for the 64-bit ARM Android platform, with development on Linux. We hope to expand this codebase and develop improvements and support for additional platforms in tandem with the community.

The Lyra Architecture

Lyra’s architecture is separated into two pieces, the encoder and decoder. When someone talks into their phone the encoder captures distinctive attributes from their speech. These speech attributes, also called features, are extracted in chunks of 40ms, then compressed and sent over the network. It is the decoder’s job to convert the features back into an audio waveform that can be played out over the listener’s phone speaker. The features are decoded back into a waveform via a generative model. Generative models are a particular type of machine learning model well suited to recreate a full audio waveform from a limited number of features. The Lyra architecture is very similar to traditional audio codecs, which have formed the backbone of internet communication for decades. Whereas these traditional codecs are based on digital signal processing (DSP) techniques, the key advantage for Lyra comes from the ability of the generative model to reconstruct a high-quality voice signal.
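
The framing arithmetic behind this design can be sketched as follows. The 40 ms feature window and the 3 kbps bitrate come from this post; the 16 kHz input sample rate is an assumption of the sketch. At those settings each 40 ms frame covers 640 samples and has a budget of 120 bits (15 bytes) of compressed features. `SpeechFraming` is hypothetical and is not Lyra's C++ API.

```java
/**
 * Framing arithmetic for a low-bitrate speech codec. FRAME_MS and BITRATE_BPS follow
 * the figures in the post; SAMPLE_RATE_HZ is an assumed input sample rate.
 */
final class SpeechFraming {
    static final int SAMPLE_RATE_HZ = 16_000; // assumption, not stated in the post
    static final int FRAME_MS = 40;           // feature extraction window
    static final int BITRATE_BPS = 3_000;     // target bitrate

    static int samplesPerFrame() {
        return SAMPLE_RATE_HZ * FRAME_MS / 1000;
    }

    /** Bit budget available for one frame's compressed features. */
    static int bitsPerFrame() {
        return BITRATE_BPS * FRAME_MS / 1000;
    }

    /** Splits PCM samples into whole frames; a trailing partial frame is zero-padded. */
    static short[][] frames(short[] pcm) {
        int n = samplesPerFrame();
        int count = (pcm.length + n - 1) / n;
        short[][] out = new short[count][n];
        for (int i = 0; i < count; i++) {
            int start = i * n;
            System.arraycopy(pcm, start, out[i], 0, Math.min(n, pcm.length - start));
        }
        return out;
    }
}
```

One second of audio therefore becomes 25 frames, each of which the encoder must describe in about 15 bytes; the generative decoder is what makes such a small budget sufficient.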

Lyra Architecture Chart

The Impact

While mobile connectivity has steadily increased over the past decade, the explosive growth of on-device compute power has outstripped access to reliable high-speed wireless infrastructure. For regions where this contrast exists, in particular developing countries where the next billion internet users are coming online, the promise that technology will enable people to be more connected has remained elusive. Even in areas with highly reliable connections, the emergence of work-from-anywhere and telecommuting has further strained mobile data limits. Lyra compresses raw audio down to 3 kbps with quality that compares favourably to other codecs, such as Opus. It is not intended as a complete replacement, but it can save meaningful bandwidth in these kinds of scenarios.
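
To put 3 kbps in perspective against a data cap, the volume of a constant-bitrate stream is simply bitrate × duration / 8. The 24 kbps comparison below is a hypothetical figure chosen for illustration, not a measurement of Opus or any other codec.

```java
/** Data volume of a constant-bitrate voice stream. */
final class CallVolume {

    /** Bytes sent for a call of the given length (seconds) at the given bitrate (bits/s). */
    static long bytes(long bitsPerSecond, long seconds) {
        return bitsPerSecond * seconds / 8;
    }
}
```

A one-hour call at 3 kbps is `bytes(3_000, 3_600)` = 1,350,000 bytes (about 1.35 MB), while the same hour at a hypothetical 24 kbps would be eight times as much.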

These trends provided motivation for Lyra and are the reason our open source library focuses on its potential for real-time voice communication. We also recognize other applications for which Lyra may be uniquely well suited, from archiving large amounts of speech and saving battery by leveraging the computationally cheap Lyra encoder, to alleviating network congestion in emergency situations where many people are trying to make calls at once. We are excited to see the creativity the open source community is known for applied to Lyra to come up with even more unique and impactful applications.

The Open Source Release

The Lyra code is written in C++ for speed, efficiency, and interoperability, using the Bazel build framework with Abseil and the GoogleTest framework for thorough unit testing. The core API provides an interface for encoding and decoding at the file and packet levels. The complete signal processing toolchain is also provided, which includes various filters and transforms. Our example app integrates with the Android NDK to show how to integrate the native Lyra code into a Java-based Android app. We also provide the weights and vector quantizers that are necessary to run Lyra.

We are releasing Lyra as a beta version today because we wanted to enable developers and get feedback as soon as possible. As a result, we expect the API and bitstream to change as it is developed. All of the code for running Lyra is open sourced under the Apache license, except for a math kernel, for which a shared library is provided until we can implement a fully open solution over more platforms. We look forward to seeing what people do with Lyra now that it is open sourced. Check out the code and demo on GitHub, let us know what you think, and how you plan to use it!

By Andrew Storus and Michael Chinen – Chrome

Acknowledgements

The following people helped make the open source release possible:
Yero Yeh, Alejandro Luebs, Jamieson Brettle, Tom Denton, Felicia Lim, Bastiaan Kleijn, Jan Skoglund, Yaowu Xu, Jim Bankoski (Chrome), Chenjie Gu, Zach Gleicher, Tom Walters, Norman Casagrande, Luis Cobo, Erich Elsen (DeepMind).

Introducing TestParameterInjector: A JUnit4 parameterized test runner

When writing unit tests, you may want to run the same or a very similar test for different inputs or input/output pairs. In Java, as in most programming languages, the best way to do this is by using a parameterized test framework.

JUnit4 has a number of such frameworks available, such as junit.runners.Parameterized and JUnitParams. A couple of years ago, a few Google engineers found the existing frameworks lacking in functionality and simplicity, and decided to create their own alternative. After a lot of tweaks, fixes, and feature additions based on feedback from clients all over Google, we arrived at what TestParameterInjector is today.

As can be seen in the graph below, TestParameterInjector is now the most used framework for new tests in the Google codebase:

Graph of the different parameterized test frameworks in Google

How does TestParameterInjector work?

TestParameterInjector exposes two annotations: @TestParameter and @TestParameters. The following code snippet shows how the former works:


import com.google.testing.junit.testparameterinjector.TestParameter;
import com.google.testing.junit.testparameterinjector.TestParameterInjector;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(TestParameterInjector.class)
public class MyTest {

  @TestParameter boolean isDryRun;

  @Test public void test1(@TestParameter boolean enableFlag) {
    // This method is run 4 times for all combinations of isDryRun and enableFlag
  }

  @Test public void test2(@TestParameter MyEnum myEnum) {
    // This method is run 6 times for all combinations of isDryRun and myEnum
  }

  enum MyEnum { VALUE_A, VALUE_B, VALUE_C }
}


Annotated fields (such as isDryRun) will cause each test method to run for all possible values while annotated method parameters (such as enableFlag) will only impact that test method. Note that the generated test names will typically be helpful but concise, for example: MyTest#test2[isDryRun=true, VALUE_A].

The other annotation, @TestParameters, can be seen at work in this snippet:

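import com.google.testing.junit.testparameterinjector.TestParameterInjector;
import com.google.testing.junit.testparameterinjector.TestParameters;
import org.junit.Test;
import org.junit.runner.RunWith;

@RunWith(TestParameterInjector.class)
public class MyTest {

  @Test
  @TestParameters({
    "{age: 17, expectIsAdult: false}",
    "{age: 22, expectIsAdult: true}",
  })
  public void personIsAdult(int age, boolean expectIsAdult) {
    // This method is run 2 times
  }
}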


In contrast to the first example, which tests all combinations, a @TestParameters-annotated method runs once for each test case specified.
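
The difference between the two annotations comes down to how parameter values are combined: orthogonal values are expanded as a cartesian product, while correlated values are run exactly as listed. The plain-Java sketch below reproduces the run counts from the two examples (4, 6, and 2). It only illustrates the counting; it is not how TestParameterInjector is implemented, and `ParameterCombinations` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrates orthogonal vs. correlated parameter combination (not library code). */
final class ParameterCombinations {

    /** Orthogonal: every combination of the given value lists (cartesian product). */
    static List<List<Object>> orthogonal(List<?>... valueLists) {
        List<List<Object>> result = new ArrayList<>();
        result.add(new ArrayList<>()); // start from a single empty combination
        for (List<?> values : valueLists) {
            List<List<Object>> next = new ArrayList<>();
            for (List<Object> prefix : result) {
                for (Object v : values) {
                    List<Object> combo = new ArrayList<>(prefix);
                    combo.add(v);
                    next.add(combo);
                }
            }
            result = next;
        }
        return result;
    }

    /** Correlated: only the explicitly listed cases are run, unchanged. */
    static List<List<?>> correlated(List<?>... explicitCases) {
        return new ArrayList<>(Arrays.asList(explicitCases));
    }
}
```

With `isDryRun` and `enableFlag` each taking `true` and `false`, `orthogonal` yields 4 combinations; pairing `isDryRun` with three enum values yields 6; the two explicit `@TestParameters` cases stay at 2.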

How does TestParameterInjector compare to other frameworks?

To our knowledge, the table below summarizes the features of the different frameworks in use at Google:

Framework | Documentation | Considers sets of parameters correlated or orthogonal
TestParameterInjector | GitHub | both are supported
junit.runners.Parameterized | GitHub | correlated¹
JUnitParams | GitHub | correlated¹
Burst | GitHub | orthogonal²
DataProvider | GitHub | correlated¹
Jukito | GitHub | orthogonal²
Theories | junit.org | orthogonal²


Learn more

Our GitHub README at https://github.com/google/TestParameterInjector gives an overview of the possibilities of this framework. Let us know on GitHub if you have any questions, comments, or feature requests!

By Jens Nyman, TestParameterInjector team

¹ Parameters are considered dependent: you specify the explicit combinations to be run.
² Parameters are considered independent: the framework runs all possible combinations of parameters.

Student applications for Google Summer of Code 2021 are now open!

Student applications for Google Summer of Code (GSoC) 2021 are now open!

Google Summer of Code introduces students from around the world to open source communities. The program exposes students to real-world software development scenarios, helps them develop their technical skills, and introduces them to our enthusiastic and generous community of GSoC mentors. Since 2005, GSoC has brought over 16,000 student developers from 111 countries into 715 open source communities!

Now in our 17th consecutive year, the GSoC program has made some exciting changes for 2021. Students will now focus on a 175-hour project over a 10-week coding period (entirely online) and receive stipends based on the successful completion of their project milestones. We are also opening up the program to students 18 years of age and older who are enrolled in post-secondary academic programs (including university, master's, and PhD programs, licensed coding schools, community colleges, etc.) or have graduated from such a program between December 1, 2020 and May 17, 2021.

Ready to apply? The first step is to browse the list of 2021 GSoC organizations and look for project ideas that appeal to you. Next, reach out to the organization to introduce yourself and determine if your skills and interests are a good fit. Since spots are limited, we recommend writing a strong proposal and submitting a draft early so you can communicate with the organization and get their feedback to increase your odds of being selected. We recommend reading through the student guide and advice for students for important tips on preparing your proposal. Students may register and submit project proposals on the GSoC site from now until Tuesday, April 13th at 18:00 UTC.

You can find more information on our website, which includes a full timeline of important dates, GSoC videos, FAQs, and Program Rules.

Good luck to all of the student applicants!

By Romina Vicente, Project Coordinator for Google Open Source Programs Office

Season of Docs announces the successful 2020 long-running projects

And, that’s a wrap! Season of Docs has announced the 2020 program results for long-running projects. You can view a list of successfully completed technical writing projects on the website along with their final project reports.

15 technical writers successfully completed their long-running technical writing projects. During the program, technical writers spent a few months working closely with an open source community. They brought their technical writing expertise to improve the project's documentation while the open source projects provided mentors to introduce the technical writers to open source tools, workflows, and the project's technology.

Congratulations to the technical writers and organization mentors on these successful projects!

What’s next?

Program participants should expect an email in the next few weeks about how to get their Season of Docs 2020 t-shirt!

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.

If you’re interested in participating in a future Season of Docs, we’re currently accepting organization applications for the 2021 program. Be sure to sign up for the announcements email list to stay informed!

By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Fuzzing Java in OSS-Fuzz

Posted by Jonathan Metzman, Google Open Source Security Team

OSS-Fuzz, Google’s open source fuzzing service, now supports fuzzing applications written in Java and other Java Virtual Machine (JVM) based languages, such as Kotlin and Scala. Open source projects written in JVM-based languages can add their project to OSS-Fuzz by following our documentation.

The Google Open Source Security team partnered with Code Intelligence to integrate their Jazzer fuzzer with OSS-Fuzz. Thanks to their integration, open source projects written in JVM-based languages can now use OSS-Fuzz for continuous fuzzing.

OSS-Fuzz has found more than 25,000 bugs in open source projects using fuzzing. We look forward to seeing how this technique can help secure and improve code written in JVM-based languages.

What can Jazzer do?

Jazzer allows users to fuzz code written in JVM-based languages with libFuzzer, as they already can for code written in C/C++. It does this by providing code coverage feedback from JVM bytecode to libFuzzer. Jazzer already supports important libFuzzer features such as:

  • FuzzedDataProvider for fuzzing code that doesn’t accept an array of bytes.
  • Evaluation of code coverage based on 8-bit edge counters.
  • Value profile.
  • Minimization of crashing inputs.
The intent for Jazzer is to support all libFuzzer features eventually.

What Does Jazzer Support?

Jazzer supports all languages that compile to JVM bytecode, since instrumentation is done on the bytecode level. This includes:
  • Java
  • Kotlin
  • Scala
  • Clojure
Jazzer can also provide coverage feedback from native code that is executed through JNI. This can uncover interesting memory corruption vulnerabilities in memory unsafe native code.

Why Fuzz Java/JVM-based Code?

As discussed in our post on Atheris, fuzzing code written in memory safe languages, such as JVM-based languages, is useful for finding bugs where code behaves incorrectly or crashes. Incorrect behavior can be just as dangerous as memory corruption. For example, Jazzer was used to find CVE-2021-23899 in json-sanitizer which could be exploited for cross-site scripting (XSS). Bugs causing crashes or incorrect exceptions can sometimes be used for denial of service. For example, OSS-Fuzz recently found a denial of service issue that could have been used to take “a major part of the ethereum network offline”.

When fuzzing memory-safe code, you can use the same classic approach as for fuzzing memory-unsafe code: passing mutated input to code and waiting for crashes. Or you can take a more unit-test-like approach, where your fuzzer verifies that the code is behaving correctly (example).
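
The unit-test-like approach can be sketched with a round-trip property: whatever bytes come in, Base64-encoding and then decoding them must return the original bytes. To keep the sketch self-contained, the driver is a plain seeded random loop rather than Jazzer's coverage-guided mutation; with Jazzer, the body of `checkOne` would instead go in a `fuzzerTestOneInput(byte[] data)` method. `RoundTripFuzz` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.Random;

/**
 * Property-style fuzz check: random inputs plus an assertion that the code behaves
 * correctly. Uses a seeded random loop, not coverage-guided fuzzing.
 */
final class RoundTripFuzz {

    /** Throws if encoding then decoding does not reproduce the original bytes. */
    static void checkOne(byte[] data) {
        byte[] decoded = Base64.getDecoder().decode(Base64.getEncoder().encode(data));
        if (!Arrays.equals(data, decoded)) {
            throw new IllegalStateException("round-trip mismatch for " + Arrays.toString(data));
        }
    }

    /** Runs the check on {@code iterations} random inputs of 0 to 64 bytes. */
    static int run(long seed, int iterations) {
        Random random = new Random(seed);
        for (int i = 0; i < iterations; i++) {
            byte[] data = new byte[random.nextInt(65)];
            random.nextBytes(data);
            checkOne(data);
        }
        return iterations;
    }
}
```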

Another way fuzzing can find interesting bugs in JVM-based code is differential fuzzing. With differential fuzzing, the fuzzer passes the same mutated input to multiple library implementations that should have the same functionality, then compares the results from each library to find differences.

Check out our documentation to get started. We will explore this more during our OSS-Fuzz talk at FuzzCon Europe.
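
Differential fuzzing, as described above, can be sketched with two JDK parsers that should agree on every input: `Long.parseLong` and `new BigInteger(s).longValueExact()`. Each random string is fed to both, any exception is normalized to "rejected" (the two parsers throw different exception types for out-of-range values), and disagreements are counted. As with the previous sketch, the input generator is a seeded random loop, not Jazzer, and `ParseDiffFuzz` is a hypothetical name.

```java
import java.math.BigInteger;
import java.util.Random;

/** Differential fuzzing sketch: two parsers that should agree, same random inputs. */
final class ParseDiffFuzz {
    private static final String ALPHABET = "0123456789+-";

    /** Parsed value, or null if Long.parseLong rejects the input. */
    static Long viaLong(String s) {
        try {
            return Long.parseLong(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    /** Parsed value, or null if the input is malformed or outside the long range. */
    static Long viaBigInteger(String s) {
        try {
            return new BigInteger(s).longValueExact();
        } catch (NumberFormatException | ArithmeticException e) {
            return null;
        }
    }

    /** Returns the number of random inputs on which the two implementations disagree. */
    static int run(long seed, int iterations) {
        Random random = new Random(seed);
        int mismatches = 0;
        for (int i = 0; i < iterations; i++) {
            StringBuilder sb = new StringBuilder();
            int len = random.nextInt(21);
            for (int j = 0; j < len; j++) {
                sb.append(ALPHABET.charAt(random.nextInt(ALPHABET.length())));
            }
            String input = sb.toString();
            Long a = viaLong(input);
            Long b = viaBigInteger(input);
            if (a == null ? b != null : !a.equals(b)) {
                mismatches++;
                System.out.println("mismatch on \"" + input + "\": " + a + " vs " + b);
            }
        }
        return mismatches;
    }
}
```

A non-zero count would point at an input worth minimizing and reporting, since at least one of the implementations handles it incorrectly.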

Introducing sigstore: Easy Code Signing & Verification for Supply Chain Integrity



One of the fundamental security issues with open source is that it’s difficult to know where the software comes from or how it was built, making it susceptible to supply chain attacks. Recent examples include the dependency confusion attack and a malicious RubyGems package designed to steal cryptocurrency.

Today we welcome the announcement of sigstore, a new project in the Linux Foundation that aims to solve this issue by improving software supply chain integrity and verification.

Installing most open source software today is equivalent to picking up a random thumb-drive off the sidewalk and plugging it into your machine. To address this, we need to make it possible to verify the provenance of all software, including open source packages. We talked about the importance of this in our recent Know, Prevent, Fix post.

The mission of sigstore is to make it easy for developers to sign releases and for users to verify them. You can think of it as Let’s Encrypt for code signing. Just as Let’s Encrypt provides free certificates and automation tooling for HTTPS, sigstore provides free certificates and tooling to automate signing source code and verifying signatures. Sigstore also has the added benefit of being backed by transparency logs, which means that all the certificates and attestations are globally visible, discoverable, and auditable.
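
At its core, code signing rests on a sign/verify primitive: the maintainer signs the artifact's bytes with a private key, and users check the signature against the matching public key. The sketch below shows only that primitive, using the JDK's built-in ECDSA with a freshly generated P-256 key pair standing in for a short-lived key. It is not the sigstore API: sigstore additionally binds the public key to an OpenID Connect identity through a certificate and records the signature in a transparency log, both of which this sketch omits. `ArtifactSigning` is a hypothetical name.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PublicKey;
import java.security.Signature;

/** Sign/verify primitive with a throwaway ECDSA key (no certificates, no transparency log). */
final class ArtifactSigning {

    /** Generates a fresh P-256 key pair, standing in for a short-lived signing key. */
    static KeyPair ephemeralKeyPair() throws GeneralSecurityException {
        KeyPairGenerator generator = KeyPairGenerator.getInstance("EC");
        generator.initialize(256);
        return generator.generateKeyPair();
    }

    static byte[] sign(byte[] artifact, KeyPair keys) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withECDSA");
        signer.initSign(keys.getPrivate());
        signer.update(artifact);
        return signer.sign();
    }

    static boolean verify(byte[] artifact, byte[] signature, PublicKey key)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withECDSA");
        verifier.initVerify(key);
        verifier.update(artifact);
        return verifier.verify(signature);
    }
}
```

Because the private key never needs to outlive a single signing session, there is nothing to store or rotate long-term; the hard part that sigstore adds is proving to users *who* held that key.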

Sigstore is designed with open source maintainers, for open source maintainers. We understand long-term key management is hard, so we've taken a unique approach of issuing short-lived certificates based on OpenID Connect grants. Sigstore also stores all activity in transparency logs, backed by Trillian, so that we can more easily detect compromises and recover from them when they do occur. Key distribution is notoriously difficult, so we've designed away the need for it by building a special root CA just for code signing, which will be made available for free.
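
Transparency logs make tampering detectable by committing to every entry with a single Merkle tree root: changing any past entry changes the root. The sketch below computes the Merkle tree hash defined in RFC 6962 (Certificate Transparency), the style of append-only log that Trillian supports: leaves are hashed as SHA-256(0x00 || entry), interior nodes as SHA-256(0x01 || left || right), splitting at the largest power of two smaller than the number of entries. It is for intuition only, not Trillian's code, and `MerkleLog` is a hypothetical name.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** RFC 6962-style Merkle tree hash over log entries (illustrative, not Trillian code). */
final class MerkleLog {

    private static MessageDigest sha256() {
        try {
            return MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Leaf hash: SHA-256(0x00 || entry). */
    static byte[] leafHash(byte[] entry) {
        MessageDigest md = sha256();
        md.update((byte) 0x00);
        md.update(entry);
        return md.digest();
    }

    /** Root over entries[from, to), splitting at the largest power of two below n. */
    static byte[] root(List<byte[]> entries, int from, int to) {
        int n = to - from;
        if (n == 0) {
            return sha256().digest(); // hash of the empty string
        }
        if (n == 1) {
            return leafHash(entries.get(from));
        }
        int k = Integer.highestOneBit(n - 1); // largest power of two strictly less than n
        MessageDigest md = sha256();
        md.update((byte) 0x01);
        md.update(root(entries, from, from + k));
        md.update(root(entries, from + k, to));
        return md.digest();
    }

    static byte[] root(List<byte[]> entries) {
        return root(entries, 0, entries.size());
    }
}
```

Auditors who have recorded an earlier root can detect a rewritten history, because recomputing the root over the altered entries produces a different 32-byte value.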

We have a working prototype and proofs of concept that we're excited to share for feedback. Our goal is to make it seamless and easy to sign and verify code.


It has been fun collaborating with the folks from Red Hat and the open source community on this project. Luke Hinds, one of the lead developers on sigstore and Security Engineering Lead at Red Hat says, "I am very excited about sigstore and what this means for improving the security of software supply chains. sigstore is an excellent example of an open source community coming together to collaborate and develop a solution to ease the adoption of software signing in a transparent manner." We couldn’t agree more.

Mike Malone, the CEO of Smallstep, helped with the overall design of sigstore. He adds, “In less than a generation, open source has grown from a niche community to a critical ecosystem that powers our global economy and institutions of society and culture. We must ensure the security of this ecosystem without undermining the open, decentralized collaboration that makes it work. By building on a clever composition of existing technologies that respect privacy and work at scale, sigstore is the core infrastructure we need to solve this fundamental problem. It’s an ambitious project with potential for global impact. I’m impressed by the rapid progress that’s been made by Google, Red Hat, and Linux Foundation over the past few months, and I’m excited to hear feedback from the broader community.”

While we are happy with the progress that has been made, we know there is still work to be done before this can be widely relied upon. Upcoming plans for sigstore include: hardening the system, adding support for other OpenID Connect providers, updating documentation and responding to community feedback.

Sigstore is in its early days, but we're really excited about its future. Now is a great time to provide feedback, try out the tooling and get involved with the project as design details are still being refined.