Category Archives: Open Source Blog

News about Google’s open source projects and programs

Java zPages for OpenTelemetry

What is OpenTelemetry?

OpenTelemetry is an open source project aimed at improving the observability of applications. It is a collection of cloud monitoring libraries and services for capturing distributed traces and metrics, and it integrates naturally with external observability tools such as Prometheus and Zipkin. As of this writing, OpenTelemetry is in beta and supports several languages.

What are zPages?

zPages are a set of dynamically generated HTML web pages that display trace and metrics data from the running application. The term zPages was coined at Google, where similar pages are used to view basic diagnostic data from a particular host or service. For our project, we built the Java /tracez and /traceconfigz zPages, which focus on collecting and displaying trace spans.

TraceZ

The /tracez zPage displays span data from the instrumented application. Spans are split into two groups: spans that are still running and spans that have completed.

TraceConfigZ

The /traceconfigz zPage displays the currently active tracing configuration and allows users to change the tracing parameters. Examples of such parameters include the sampling probability and the maximum number of attributes.

Using the zPages

This section describes how to start and use the Java zPages.

Add the dependencies to your project

First, you need to add OpenTelemetry as a dependency to your Java application.

Maven

For Maven, add the following to your pom.xml file:
<dependencies>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-api</artifactId>
        <version>0.7.0</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk</artifactId>
        <version>0.7.0</version>
    </dependency>
    <dependency>
        <groupId>io.opentelemetry</groupId>
        <artifactId>opentelemetry-sdk-extension-zpages</artifactId>
        <version>0.7.0</version>
    </dependency>
</dependencies>

Gradle

For Gradle, add the following to your build.gradle dependencies:
implementation 'io.opentelemetry:opentelemetry-api:0.7.0'
implementation 'io.opentelemetry:opentelemetry-sdk:0.7.0'
implementation 'io.opentelemetry:opentelemetry-sdk-extension-zpages:0.7.0'

Register the zPages

To set up the zPages, call startHttpServerAndRegisterAllPages(int port) from the ZPageServer class in your main method:
import io.opentelemetry.sdk.extensions.zpages.ZPageServer;

public class MyMainClass {
    public static void main(String[] args) throws Exception {
        ZPageServer.startHttpServerAndRegisterAllPages(8080);
        // ... do work
    }
}
Note that the package com.sun.net.httpserver is required to use the default zPages setup. Please make sure your version of the JDK includes this package if you plan to use the default server.

Alternatively, you can call registerAllPagesToHttpServer(HttpServer server) to register the zPages to a shared server:
import com.sun.net.httpserver.HttpServer;
import io.opentelemetry.sdk.extensions.zpages.ZPageServer;

import java.net.InetSocketAddress;

public class MyMainClass {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8000), 10);
        ZPageServer.registerAllPagesToHttpServer(server);
        server.start();
        // ... do work
    }
}

Access the zPages

View all available zPages on the index page

The index page (at /) lists all available zPages with a link and description.


View trace spans on the /tracez zPage

The /tracez zPage displays information about running and completed spans, with completed spans further organized into latency and error buckets. The data is aggregated into a summary-level table.


You can click each count in the table cells to access the corresponding span details, for example the details of the ChildSpan latency sample (row 1, column 4).


View and update the tracing configuration on the /traceconfigz zPage

The /traceconfigz zPage provides an interface for users to modify the current tracing parameters.


Design

This section goes into the underlying design of our code.

Frontend


The frontend consists of two main parts: HttpHandler and HttpServer. The HttpHandler is responsible for rendering the HTML content, with each zPage implementing its own ZPageHandler. The HttpServer, on the other hand, is responsible for listening to incoming requests, obtaining the requested data, and then invoking the aforementioned ZPageHandlers. The HttpServer class from com.sun.net.httpserver is used to construct the default server and to handle HTTP requests on different routes.

Backend





The backend consists of two components as well: SpanProcessor and DataAggregator. The SpanProcessor watches the lifecycle of each span, invoking functions each time a span starts or ends. The DataAggregator, on the other hand, restructures the data from the SpanProcessor into an accessible format for the frontend to display. The TracezDataAggregator constructor requires a TracezSpanProcessor instance, so that the aggregator can access the spans collected by that specific processor. The frontend only needs to call functions on the DataAggregator to obtain the information required for the web page.

Conclusion

We hope that this blog post has given you some insight into the development and use cases of OpenTelemetry’s Java zPages. The zPages themselves are lightweight performance monitoring tools that allow users to troubleshoot and better understand their applications. Once OpenTelemetry is officially released, we hope you will try out and use the /tracez and /traceconfigz zPages!

By William Hu and Terry Wang – Software Engineering Interns, Core Compute Observability

Google Summer of Code 2020 Statistics: Part 2

With the program nearing the end of the summer, it’s time for another round of updates!

Universities

The 1,198 students accepted into the GSoC 2020 program came from 550 universities, 114 of which have students participating in GSoC for the first time.

Schools with the most accepted students for GSoC 2020:
University | # of Accepted Students
Indian Institute of Technology, Roorkee | 48
Indian Institute of Technology, Kanpur | 27
International Institute of Information Technology, Hyderabad | 24
National Institute of Technology Karnataka, Surathkal | 23
Birla Institute of Technology and Science, Pilani (BITS Pilani) | 13
Indian Institute of Technology, Kharagpur | 13
Indian Institute of Technology (BHU), Varanasi | 11
University of Moratuwa | 11
National Institute of Technology, Hamirpur | 10
Amrita Vishwa Vidyapeetham, Amritapuri Campus | 10
University of Tokyo | 10
University Of Colombo School Of Computing (UCSC) | 10

Mentors

Each year we pore over gobs of data to extract some interesting statistics about the GSoC mentors. Here’s a quick synopsis of our 2020 crew:
  • Registered mentors: 3,592
  • Mentors with assigned student projects: 2,156
  • Mentors who have participated in GSoC for 10 or more years: 78
  • Mentors who have been a part of GSoC for 5 years or more: 199
  • Mentors who are former GSoC students: 533 (24.7%)
  • Mentors who have also been involved in the Google Code-in program: 405 (18.8%)
  • Percentage of new mentors: 34.18%
GSoC 2020 had international representation, with mentors from 67 countries around the world!

The global pandemic, COVID-19, brought additional challenges to this year’s GSoC program. Whether living with the virus, adjusting to shifting school and work schedules, or pivoting to a remote lifestyle, students and mentors have had to prioritize their safety and delicately balance their new way of life. Despite these unprecedented times, our students continue to push on and our mentors fully support our students by sharing their passion for open source, listening to their concerns and providing them with valuable advice. For that commitment, we would like to acknowledge and give thanks to all students and mentors in the GSoC 2020 program. Not even a pandemic can dampen your enthusiasm and tireless contributions to the open source community!

By Stephanie Taylor – Program Manager, Google Open Source Programs Office

Supporting Wikipedia with more tools for editors

Google is committed to making information more accessible to people around the world, while ensuring that the information on the web is accurate and reflects the diversity of its users. Just as Google supports open source software development, WikiLoop is one of our explorations in contributing to the open knowledge movement. To this day, Wikipedia, a Wikimedia movement project, remains a reliable information source that is also available with an open license, which makes it possible for Google’s Knowledge Engine—as well as knowledge graph systems operated by others—to draw excerpts from it for Search features and other apps. With the project’s sustainability in mind, Google has contributed back to the Wikimedia movement in a number of ways since 2018. Building on this commitment, Google created the umbrella program WikiLoop in 2019, which hosts several tools for editors that focus on content quality, like WikiLoop DoubleCheck.
WikiLoop is led by Zainan Zhou—a Googler for the last 7 years, and a Wikipedian for the last 5—who works as a software engineer on the Knowledge Engine team at Google. When he joined the free encyclopedia as an editor, he wondered how his company could contribute to this open source project. Zainan involved the Wikipedia communities in every step of the development of WikiLoop, connecting with editors from different parts of the world at Wikimedia events throughout 2019, like WikidataCon, Wikimania, Wiki Conference North America, and WikiDevSummit. The most recent engagement with the community of Wikipedia editors included a consultation and vote to change the name of the most popular WikiLoop artifact, DoubleCheck, a tool for peer review of Wikipedia articles.

In the past few months, we focused on raising awareness of WikiLoop DoubleCheck. This tool allows registered and unregistered users to mark new edits with the tags “looks good”, “not sure”, and “should revert”, a peer review system that editors can use to approve or revert new content on Wikipedia. Since its launch, the tool has seen 309% quarter-over-quarter growth in tags added, and over 1,000 editors have used it to review Wikipedia content. With the help of volunteer translators and machine translation, WikiLoop DoubleCheck is now available in 25 languages, and we hope to continue serving more Wikipedia editors in the months to come. In order for Google’s Knowledge Engine to organize the world’s information, the knowledge source needs to be healthy. While peer review on Wikipedia is an established process that has been going on for years, tools like WikiLoop DoubleCheck support the thousands of volunteers who dedicate their time to this task by making information verification more accessible.


The WikiLoop program was originally conceived as a virtuous circle: providing data and tools to enhance human editors’ productivity, and making Wikipedia editorial input more machine-readable for open knowledge institutions, academia, and researchers interested in advancing machine learning technology.

WikiLoop leverages Google’s software development talent to contribute to the accuracy of Wikipedia content globally by enhancing the existing suite of Wikimedia and community tools for content validation at scale. While WikiLoop is a Google contribution to the Wikipedia communities, Google and the Wikimedia Foundation have partnered in other areas as well. Learn more about Google’s partnership with the Wikimedia Foundation on the partnership’s page on Meta-Wikimedia.

While probably the most popular in the set, DoubleCheck is not the only tool under the WikiLoop umbrella. We are also building data sets and tools and continue to explore other opportunities to contribute to the open knowledge movement. Learn more about tools and other initiatives, like the Coalition call, on the program’s page on Meta-Wikimedia.

By María Cruz – Program Manager, Google Open Source Programs Office

Introducing TensorFlow Recorder

When training computer vision machine learning models, data loading can often be a performance bottleneck, causing your GPU or TPU resources to be underutilized while they wait for data to be loaded into the model. Storing your dataset in the efficient TensorFlow Record (TFRecord) format is a great way to solve this problem, but creating TFRecords can unfortunately require a great deal of complex code.

Last week we open sourced the TensorFlow Recorder project (also known as TFRecorder), which makes it possible for data scientists, data engineers, or AI/ML engineers to create image based TFRecords with just a few lines of code. Using TFRecords is incredibly important for creating efficient TensorFlow ML pipelines, but until now they haven’t been so easy to create. Before TFRecorder, in order to create TFRecords at scale you would have had to write a data pipeline that parsed your structured data, loaded images from storage, and serialized the results into the TFRecord format. TFRecorder allows you to write TFRecords directly from a Pandas dataframe or CSV without writing any complicated code.

You can see an example of TFRecorder below, but first let’s talk about some of the specific advantages of TFRecords.

How TFRecords Can Help

The TFRecord file format stores your data as a set of files, each containing a sequence of protocol buffers serialized as binary records that can be read very efficiently. This helps reduce the data loading bottleneck mentioned above.

Data loading performance can be further improved by combining the TFRecord format with prefetching and parallel interleave. Prefetching reduces the time of each training step by fetching the data for the next step while the model executes the current one. Parallel interleave allows you to read from multiple TFRecord shards (pieces of a TFRecord file) and preprocess those interleaved data streams in parallel. This reduces the latency required to read a training batch and is especially helpful when reading data from the network.
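
For illustration, here is a minimal tf.data input pipeline sketch that applies both techniques to a set of TFRecord shards. The file pattern is a made-up example, and the parsing step (tf.io.parse_single_example with the feature spec TFRecorder writes) is omitted for brevity:
import tensorflow as tf

# Hypothetical shard location; TFRecord output is typically split
# across several files matching a pattern like this.
files = tf.data.Dataset.list_files("gs://my/bucket/tfrecords/train-*")

# Parallel interleave: read several shards concurrently and mix
# their records into a single stream.
dataset = files.interleave(
    tf.data.TFRecordDataset,
    cycle_length=4,
    num_parallel_calls=tf.data.experimental.AUTOTUNE)

# Prefetching: prepare the next batch while the current training
# step is still running on the accelerator.
dataset = dataset.shuffle(10_000).batch(32).prefetch(
    tf.data.experimental.AUTOTUNE)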

Using TensorFlow Recorder

Creating a TFRecord using TFRecorder requires only a few lines of code. Here’s how it works.
import pandas as pd
import tfrecorder
df = pd.read_csv(...)
df.tensorflow.to_tfrecord(output_dir="gs://my/bucket")

TFRecorder currently expects data to be in the same format as Google AutoML Vision.

This format looks like a pandas dataframe or CSV formatted as:
split | image_uri | label
TRAIN | gs://my/bucket/image1.jpg | cat

Where:
  • split can take on the values TRAIN, VALIDATION, and TEST
  • image_uri specifies a local or Google Cloud Storage location for the image file.
  • label can be either a text-based label that will be integerized or an integer
In the future, we hope to extend TensorFlow Recorder to work with data in any format.
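
Concretely, a minimal dataframe in this format could be built and converted like so (the image paths here are made up for illustration):
import pandas as pd
import tfrecorder  # registers the .tensorflow dataframe accessor

# Same schema as the CSV above: split, image_uri, label.
df = pd.DataFrame({
    "split": ["TRAIN", "VALIDATION", "TEST"],
    "image_uri": [
        "gs://my/bucket/image1.jpg",
        "gs://my/bucket/image2.jpg",
        "gs://my/bucket/image3.jpg",
    ],
    "label": ["cat", "dog", "cat"],
})

df.tensorflow.to_tfrecord(output_dir="gs://my/bucket")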

While this example would work well to convert a few thousand images into TFRecords, it probably wouldn’t scale well if you have millions of images. To scale up to huge datasets, TensorFlow Recorder provides connectivity with Google Cloud Dataflow, which is a serverless Apache Beam pipeline runner. Scaling up to Dataflow requires only a little bit more configuration.
df.tensorflow.to_tfrecord(
    output_dir="gs://my/bucket",
    runner="DataflowRunner",
    project="my-project",
    region="us-central1")

What’s next?

We’d love for you to try out TensorFlow Recorder. You can get it from GitHub or simply pip install tfrecorder. TensorFlow Recorder is very new, and we’d greatly appreciate your feedback, suggestions, and pull requests.

By Mike Bernico and Carlos Ezequiel, Google Cloud AI Engineers

Open source by the numbers at Google

At Google, open source is at the core of our infrastructure, processes, and culture. As such, participation in these communities is vital to our productivity. Within OSPO (Open Source Programs Office), our mission is to bring the value of open source to Google and the resources of Google to open source. To ensure our actions match our commitment, in this post we will explore a variety of metrics intended to increase context, transparency, and accountability across all of the communities we engage with.

Why we contribute: Open source has become a pervasive component in modern software development, and Google is no exception. We use thousands of open source projects across our internal infrastructure and products. As participants in the ecosystem, our intentions are twofold: give back to the communities we depend on as well as expand support for open source overall. We firmly believe in open source and its ability to bring together users, contributors, and companies alike to deliver better software.

The majority of Google’s open source work is done within one of two hosting platforms: GitHub and git-on-borg, Google’s production Git service which integrates with Gerrit for code review and access control. While we also allow individual usage of Bitbucket, GitLab, Launchpad, and other platforms, this analysis will focus on GitHub and git-on-borg. We will continue to explore how best to incorporate activity across additional channels.

A little context about the numbers you’ll read below:
  • Business and personal: While git-on-borg hosts both internal and external Google created repos, GitHub is a mixture of Google projects, experimental efforts and personal projects created by Googlers.
  • Driven by humans: We have created many automated bots and systems that can propose changes on both hosting platforms. We have intentionally filtered these data to ensure we are only showing human initiated activities.
  • GitHub data: We are using GH Archive as the primary source for GitHub data, which is currently available as a public dataset on BigQuery (see the query sketch after this list). Google activity within GitHub is identified by self-registered accounts, which we anticipate underreports actual usage as employees acclimate to our policies.
  • Active counts: Where possible, we will show ‘active users’ and ‘active repositories’ defined by logged activity within each specified timeframe (for GH archive data, that’s any event type logged in the public GitHub event stream).
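
To illustrate how such numbers can be pulled from the GH Archive public dataset, here is a hedged sketch; the hand-labeled account list is a placeholder for illustration, not our actual methodology:
from google.cloud import bigquery

client = bigquery.Client()  # assumes application-default credentials

# Count 2019 public GitHub events by type for a hypothetical list of
# self-registered accounts (placeholder logins, not real data).
query = """
    SELECT type, COUNT(*) AS events
    FROM `githubarchive.year.2019`
    WHERE actor.login IN ('example-googler-1', 'example-googler-2')
    GROUP BY type
    ORDER BY events DESC
"""

for row in client.query(query).result():
    print(row.type, row.events)
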
As numbers mean nothing without scale, let’s start by defining our applicable community: in 2019, more than 9% of Alphabet’s full-time employees actively contributed to public repositories on git-on-borg and GitHub. While in the single digits, this percentage represents a portion of all full-time Alphabet employees—from engineers to marketers to admins, across every business unit in Alphabet—and does not include those who contribute to open source projects in ways other than code. As our population has grown, so has our registered contributor base:
This chart shows the aggregate per year counts of Googlers active on public repositories hosted on GitHub and git-on-borg

What we create: As mentioned above, our contributing population works across a variety of Google, personal, and external repositories. Over the years, Google has released thousands of open source projects (many of which span multiple repositories) and ~2,600 are still active. Today, Google hosts over 8,000 public repositories on GitHub and more than 1,000 public repositories on git-on-borg. Over the last five years, we have doubled the number of public repos, growing our footprint by an average of 25% per year.

What we work on: In addition to our own repositories, we contribute to a wide pool of external projects. In 2019, Googlers were active in over 70,000 repositories on GitHub, pushing commits and/or opening pull requests on over 40,000 repositories. Note that more than 75% of the repos with Googler-opened pull requests were outside of Google-managed organizations (on GitHub).
This chart shows per-year counts of activities initiated by Googlers on GitHub

What we contribute: For contribution volume on GitHub, we chose to focus on push events and opened and merged pull requests rather than commits, as commit counts on their own are difficult to contextualize. Note that push events and pull requests typically include one or more commits per event. In 2019, Googlers created over 570,000 issues, opened over 150,000 pull requests, and created more than 36,000 push events on GitHub. Since 2015, we have doubled our annual counts of issues created and push events, and more than tripled the number of opened pull requests. Over the last five years, more than 80% of pull requests opened by Googlers have been closed and merged into active repositories.

How we spend our time: Combining these two classes of metrics—contributions and repos—provides context on how our contributors focus their time. On GitHub: in 2015, about 40% of our opened pull requests were concentrated in just 25 repositories. However, over the next four years, our activity became more distributed across a larger set of projects, with the top 25 repos claiming about 20% of opened pull requests in 2019. For us, this indicates a healthy expansion and diversification of interests, especially given that this activity represents both Google, as well as a community of contributors that happen to work at Google.
This chart splits the total per-year counts of Googler-created pull requests on GitHub into the top 25 repos vs. the remainder, ranked by the number of opened pull requests per repo per year.

Open source contribution is about more than code

Every day, Google relies on the health and continuing availability of open source, and as such we actively invest in the security and sustainability of open source and its supply chain in three key areas:
  • Security: In addition to building security projects like OpenTitan and gVisor, Google’s OSS-Fuzz project aims to help other projects identify programming errors in software. As of the end of 2019, over 250 projects were using OSS-Fuzz, and it had filed over 16,000 bugs, including 3,500 security vulnerabilities.
  • Community: Open source projects depend on communities of diverse individuals. We are committed to improving community sustainability and growth with programs like Google Summer of Code and Season of Docs. Over the last 15 years, about 15,000 students from over 105 countries have participated in Google Summer of Code, along with 25,000 mentors in more than 115 countries working on more than 680 open source projects.
  • Research: At the end of 2019, Google invested $1 million in open source research, partnering with researchers at UVM, with the goal to deepen understanding of how people, teams and organizations thrive in technology-rich settings, especially in open-source projects and communities.
Learn more about our open source initiatives at opensource.google.

By Sophia Vargas – Researcher, Google Open Source Programs Office

Google joins the Open Source Security Foundation

In modern software development, much of the code developers use originates outside their organization and is open source. While the cloud and internet ecosystem depends on an open source foundation, the sheer scale and dependency chain of the libraries and packages we all use make it difficult to validate and verify the origin of the code you’re ingesting, that it’s up to date on recent patches, and that it comes from projects following security best practices. To continue deriving benefits from open source, we need to ensure that as a community we are building on the strongest possible foundation.

At Google, security is always top of mind, and we have developed robust systems and security tools—including open source ones—to protect our internal systems and our customers. We believe the more we share what we’ve learned about open source security, and the more we work with those who face similar challenges, the more we can improve the state of open source security for everyone.

We’re happy to announce that Google is joining the Open Source Security Foundation (OpenSSF) to work alongside the broader industry on this journey of improving the state of security of open source projects we all depend on. Google has key areas in open source security we want to work on, and we’re excited to share our ideas with the OpenSSF community and work together. Some of our key areas are:

Shared schemas and metadata that enable automation for enforcing security best practices along the entire software supply chain.

Dependency management and risk assessments through tooling and data. We want to make it easy to map vulnerabilities back to specific versions of code that are affected and take action.

Verifiable builds through trusted build systems so that we know artifacts haven’t been tampered with. The Tekton project has been exploring this idea, and we’re excited to share some of these ideas with OpenSSF.

A developer identity system to help associate code changes back to their original author and help code reviewers have developer authentication as part of their commit and PR review process.

Securing critical OSS projects and helping projects respond to vulnerabilities. If you’re a maintainer who’s interested in getting help with vulnerability response or security engineering efforts, watch this space!

Security challenges are never going to disappear, and we must work together to maintain the security of the open source software we collectively depend on. If you’re interested in getting involved in the OpenSSF initiatives, visit openssf.org or OpenSSF on GitHub. You can be a part of how the OpenSSF serves the open source community and the world!

By Kim Lewandowski, Product Security Team, and Dan Lorenc, Infrastructure Security Team, Google

Announcing a new kind of open source organization

Google has deep roots in open source. We're proud of our 20 years of contributions and community collaboration. The scale and tenure of Google’s open source participation has taught us what works well, what doesn’t, and where the corner cases are that challenge projects.

One of the places we’ve historically seen projects stumble is in managing their trademarks—their project’s name and logo. How project trademarks are used is different from how their code is used, as trademarks are a method of quality assurance. This includes the assurance that the code in question has an open source license. When trademarks are properly managed, project maintainers can define their identity, provide assurances to downstream users of the quality of their offering, and give others in the community certainty about the free and fair use of the brand.

In collaboration with academic leaders, independent contributors, and SADA Systems, today we are announcing the Open Usage Commons, an organization focused on extending the philosophy and definition of open source to project trademarks. The mission of the Open Usage Commons is to help open source projects assert and manage their project identity through programs specific to trademark management and conformance testing. Creating a neutral, independent ownership for these trademarks gives contributors and consumers peace of mind regarding their use of project names in a fair and transparent way.

Understanding and managing trademarks is critical for the long-term sustainability of projects, particularly with the increasing number of enterprise products based on open source. Trademarks sit at the juncture of the rule of law and the philosophy of open source, a complicated space; for this reason, we consider it to be the next challenge for open source, one we want to help with.

To get the Open Usage Commons started, Google has contributed initial funding, and the trademarks of Angular, a web application framework for mobile and desktop; Gerrit, a web-based team code-collaboration tool; and Istio, an open platform to connect, manage, and secure microservices, will be joining the Open Usage Commons. If you currently use a trademark of one of these projects, you can continue to use those marks, following any current guidance from the project. As the Open Usage Commons is focused on trademark management, the contributor communities and technical roadmaps of these projects are not changed by joining the Commons, although we hope this new model encourages anyone who has stood on the sidelines until now to participate in these projects.

As the Open Usage Commons board wrote in their announcement, this is uncharted territory, and the Commons intends to “walk before they run,” so you can expect more information and activity from the organization in the coming months.

Learn more about the role of trademarks in open source and the Open Usage Commons at openusage.org.

By Chris DiBona, Director, Open Source at Google

Expanding our Differential Privacy Library

All developers have a responsibility to treat data with care and respect. Differential privacy helps organizations derive insights from data while simultaneously ensuring that those results do not allow any individual's data to be distinguished or re-identified. This principled approach supports data computation and analysis across many of Google’s core products and features.
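
As a concrete illustration of the underlying idea, here is a minimal, library-agnostic sketch of the Laplace mechanism, a classic building block for differentially private counts. This is for intuition only; it is not the API of the library discussed below:
import numpy as np

def dp_count(true_count, epsilon, sensitivity=1.0):
    # Adding or removing one person's data changes a count by at most
    # `sensitivity` (1 for a simple count), so Laplace noise with scale
    # sensitivity/epsilon yields an epsilon-differentially-private result.
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

print(dp_count(1000, epsilon=0.5))  # the released value varies run to run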

Last summer, Google open sourced our foundational differential privacy library so developers and organizations around the world can benefit from this technology. Today, we’re announcing the addition of Go and Java to our library; an end-to-end solution for differential privacy, Privacy on Beam; and new tools to help developers implement this technology effectively.

We’ve listened to feedback from our developer community and, as of today, developers can now perform differentially private analysis in Java and Go. We’re working to bring these two libraries to full feature parity with C++.

We want all developers to have access to differential privacy, regardless of their level of expertise. Our new Privacy on Beam framework captures years of Googler developer experience and efficiency improvements in a comprehensive and easy-to-use solution that handles computation end-to-end. Built on Apache Beam, Privacy on Beam can reduce implementation mistakes, and take care of all the steps that are essential to differential privacy, including noise addition, partition selection, and contribution bounding. If you’re new to Apache Beam or differential privacy, our codelab can get you started.

Tracking privacy budgets is another challenge developers face when implementing differential privacy. So we’re also releasing a new Privacy Loss Distribution tool for tracking privacy budgets. With this tool, developers can maintain an accurate estimate of the total cost to user privacy for collections of differentially private queries, and better evaluate the overall impact of their pipelines. Privacy Loss Distribution supports widely used mechanisms (such as Laplace, Gaussian, and randomized response) and can scale to hundreds of compositions.
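
For intuition on what budget tracking means: under the basic composition theorem, the epsilons of successive queries simply add up, which is the pessimistic baseline a naive tracker would compute; Privacy Loss Distribution accounting can certify tighter totals than this. A toy sketch of the naive baseline (not the tool’s API):
def naive_privacy_budget(epsilons):
    # Basic composition: running k queries with privacy parameters
    # eps_1..eps_k is at most (eps_1 + ... + eps_k)-differentially private.
    return sum(epsilons)

# Ten queries at epsilon = 0.1 each cost at most epsilon = 1.0 overall;
# PLD-based accounting often proves a smaller effective total.
print(naive_privacy_budget([0.1] * 10))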

We hope these new languages, tools, and features unlock differential privacy for even more developers. Continue to share your stories and suggestions with us at [email protected]—your feedback will help inform our future differential privacy launches and updates.

Acknowledgements

Software Engineers: Yurii Sushko, Daniel Simmons-Marengo, Christoph Dibak, Damien Desfontaines, Maria Telyatnikova
Research Scientists: Pasin Manurangsi, Ravi Kumar, Sergei Vassilvitskii, Alex Kulesza, Jenny Gillenwater, Kareem Amin


By: Miguel Guevara, Mirac Vuslat Basaran, Sasha Kulankhina, and Badih Ghazi – Google Privacy Team and Google Research

Welcoming 1,000+ Interns to Open Source at Google

One of the core tenets of open source is finding ways for people to build great things by working together, regardless of location. This summer, through our intern program we’re gathering incredible talent from schools around the world, Googlers with a passion for open source, and project maintainers both inside and outside of Google to see what we can build together.

Onboarding that many interns and turning them into new open source contributors was no easy task. So in partnership with the Intern Programs team and engineering teams across Google, we’ve grounded our planning by answering four key questions. 

How can we make our internship program a force for good in the open source ecosystem?

We knew that having more than a thousand interns contribute to open source projects could have a huge impact; however, many projects aren’t set up to onboard dozens of new contributors at once, and many maintainers can’t take on hundreds of new pull requests. Early on, we established best practices for intern placement and support. We committed to:
  • Aligning interns’ work with project priorities to advance the project while also allowing the interns to learn and grow their skills.
  • Proactively communicating with project maintainers and contributors, keeping them in the loop on timelines and logistics.
  • Looking beyond Google. We prioritized projects that have full-time Google engineering support, including Google-owned projects like Go, TensorFlow, and Chromium, as well as Google-created projects we invest heavily in, such as Kubernetes, Apache Beam, and Tekton. But Google also has full-time engineers working on outside projects we rely on, so our interns are also working on projects like Envoy, Rust, and Apache Maven.

How can we introduce the interns to open source at Google?

We are determined to support and empower the interns as they become lifelong contributors to open source. Every Noogler in engineering learns about using and contributing to open source in a training run by our Open Source Programs Office. With an unprecedented number of interns working on open source projects, we are also providing additional resources, from a platform for questions and office hours to enrichment talks and partnerships with external open source organizations.

How can we learn from our interns about the experience of contributing to open source at Google and beyond?

We see a huge opportunity to listen to our interns this summer. By meeting with interns and hosts—as well as surveying the entire class of interns at the end of the summer—we can look for ways to improve open source at Google and the contributor experience for projects they’re working on. We’re excited to learn from the internship program and from interns’ perspectives working in and contributing to open source.

How can we have an impact on these students that carries on throughout their careers?

One of my favorite questions to ask Googlers who are active in open source is how they were first introduced to open source. There’s a well-trodden path of a developer fixing an annoying bug, then a few more bugs, then adding small features, becoming a core contributor, and eventually a project maintainer. That process requires persistence and patience, and projects lose a lot of great developers along the way.

But... What if your first experience with open source is being welcomed into a large and thriving community of contributors? What if you get to contribute to open source full time, mentored by creators and maintainers of the project you’re working on, collaborating across organizations and across time zones? Our hope is that this kind of experience will leave a lasting impression on this summer’s interns and that they’ll continue to contribute to open source for a long time to come.

By Jen Phillips, Google Open Source