
Open source by the numbers at Google

At Google, open source is at the core of our infrastructure, processes, and culture. As such, participation in these communities is vital to our productivity. Within OSPO (Open Source Programs Office), our mission is to bring the value of open source to Google and the resources of Google to open source. To ensure our actions match our commitment, in this post we will explore a variety of metrics intended to increase context, transparency, and accountability across all of the communities we engage with.

Why we contribute: Open source has become a pervasive component in modern software development, and Google is no exception. We use thousands of open source projects across our internal infrastructure and products. As participants in the ecosystem, our intentions are twofold: give back to the communities we depend on as well as expand support for open source overall. We firmly believe in open source and its ability to bring together users, contributors, and companies alike to deliver better software.

The majority of Google’s open source work is done within one of two hosting platforms: GitHub and git-on-borg, Google’s production Git service which integrates with Gerrit for code review and access control. While we also allow individual usage of Bitbucket, GitLab, Launchpad, and other platforms, this analysis will focus on GitHub and git-on-borg. We will continue to explore how best to incorporate activity across additional channels.

A little context about the numbers you’ll read below:
  • Business and personal: While git-on-borg hosts both internal and external Google-created repos, GitHub is a mixture of Google projects, experimental efforts and personal projects created by Googlers.
  • Driven by humans: We have created many automated bots and systems that can propose changes on both hosting platforms. We have intentionally filtered these data to ensure we are only showing human-initiated activities.
  • GitHub data: We are using GH Archive as the primary source for GitHub data, which is currently available as a public dataset on BigQuery. Google activity within GitHub is identified by self-registered accounts, an approach we expect to underreport actual usage while employees acclimate to our policies.
  • Active counts: Where possible, we will show ‘active users’ and ‘active repositories’ defined by logged activity within each specified timeframe (for GH Archive data, that’s any event type logged in the public GitHub event stream).
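The human-only filter and the ‘active’ definitions above can be sketched as follows. This is a minimal illustration, not OSPO’s actual pipeline. It assumes events shaped like GH Archive rows (`type`, `actor.login`, `repo.name`, `created_at`). It treats a login as automated if it ends in `[bot]`, the suffix GitHub gives GitHub App accounts, or if it appears in a hypothetical `KNOWN_BOTS` deny list. A hypothetical `googler_logins` set stands in for the self-registered accounts.

```python
from datetime import datetime, timezone

# Hypothetical deny list of automation accounts; a real pipeline would maintain its own.
KNOWN_BOTS = {"example-ci-bot"}


def is_human(login):
    """Treat GitHub App accounts (suffix "[bot]") and known bots as automated."""
    return not login.endswith("[bot]") and login not in KNOWN_BOTS


def active_counts(events, googler_logins, start, end):
    """Return (active users, active repos) with any logged event in [start, end).

    Each event is a dict shaped like a GH Archive row (assumed field names).
    """
    users, repos = set(), set()
    for event in events:
        login = event["actor"]["login"]
        # fromisoformat() accepts "+00:00" offsets; GH Archive's "Z" suffix needs Python 3.11+.
        ts = datetime.fromisoformat(event["created_at"])
        if login in googler_logins and is_human(login) and start <= ts < end:
            users.add(login)
            repos.add(event["repo"]["name"])
    return len(users), len(repos)


def ev(kind, login, repo, created_at):
    """Build a minimal GH Archive-style event for the example (hypothetical data)."""
    return {"type": kind, "actor": {"login": login},
            "repo": {"name": repo}, "created_at": created_at}


sample = [
    ev("PushEvent", "alice", "google/foo", "2019-03-01T12:00:00+00:00"),
    ev("IssuesEvent", "alice", "someone/bar", "2019-05-01T12:00:00+00:00"),
    ev("PushEvent", "example-ci-bot", "google/foo", "2019-06-01T12:00:00+00:00"),
    ev("PushEvent", "bob", "google/baz", "2018-12-31T23:59:59+00:00"),
]
year_2019 = (datetime(2019, 1, 1, tzinfo=timezone.utc),
             datetime(2020, 1, 1, tzinfo=timezone.utc))
print(active_counts(sample, {"alice", "bob", "example-ci-bot"}, *year_2019))  # (1, 2)
```

Because ‘active’ means any event type in the window, a single issue comment is enough to count a user and a repo. That is why these counts run higher than counts based on commits alone.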
As numbers mean nothing without scale, let’s start by defining our applicable community: in 2019, more than 9% of Alphabet’s full-time employees actively contributed to public repositories on git-on-borg and GitHub. Although this is a single-digit percentage, it is measured against all full-time Alphabet employees (engineers, marketers, admins and others across every business unit in Alphabet), and it does not include those who contribute to open source projects in ways other than code. As our population has grown, so has our registered contributor base:
This chart shows the aggregate per-year counts of Googlers active on public repositories hosted on GitHub and git-on-borg.

What we create: As mentioned above, our contributing population works across a variety of Google, personal, and external repositories. Over the years, Google has released thousands of open source projects (many of which span multiple repositories) and ~2,600 are still active. Today, Google hosts over 8,000 public repositories on GitHub and more than 1,000 public repositories on git-on-borg. Over the last five years, we have doubled the number of public repos, growing our footprint by an average of 25% per year.
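The post does not say how the 25% average was computed. An arithmetic mean of year-over-year rates and a compound annual growth rate (CAGR) can differ noticeably for the same series, which helps when reading "doubled" next to "25% per year". Below is a minimal sketch using hypothetical yearly counts, not Google’s actual figures, that double over five data points:

```python
def yoy_growth(counts):
    """Year-over-year growth rates for a list of yearly totals."""
    return [(curr - prev) / prev for prev, curr in zip(counts, counts[1:])]


def mean_growth(counts):
    """Arithmetic mean of the year-over-year rates."""
    rates = yoy_growth(counts)
    return sum(rates) / len(rates)


def cagr(counts):
    """Compound annual growth rate between the first and last year."""
    periods = len(counts) - 1
    return (counts[-1] / counts[0]) ** (1 / periods) - 1


# Hypothetical yearly repo counts (illustrative only) that double overall.
repos = [100, 150, 120, 180, 200]
print(f"mean YoY growth: {mean_growth(repos):.1%}")  # mean YoY growth: 22.8%
print(f"CAGR:            {cagr(repos):.1%}")         # CAGR:            18.9%
```

The arithmetic mean is never lower than the compound rate, and uneven years widen the gap. Both numbers describe the same doubling.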

What we work on: In addition to our own repositories, we contribute to a wide pool of external projects. In 2019, Googlers were active in over 70,000 repositories on GitHub, pushing commits and/or opening pull requests on over 40,000 repositories. Note that more than 75% of the repos with Googler-opened pull requests were outside of Google-managed organizations (on GitHub).
This chart shows per-year counts of activities initiated by Googlers on GitHub.
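The 75% figure is a ratio over distinct repositories. Below is a minimal sketch of that calculation. It assumes repo names in the `owner/name` form used by GH Archive’s `repo.name` field. A hypothetical `GOOGLE_ORGS` set stands in for the real list of Google-managed organizations.

```python
# Hypothetical stand-in for the list of Google-managed GitHub organizations.
GOOGLE_ORGS = {"google", "googleapis", "GoogleCloudPlatform"}


def external_share(pr_repo_names, managed_orgs=GOOGLE_ORGS):
    """Fraction of distinct repos ("owner/name") whose owner is not a managed org."""
    managed = {org.lower() for org in managed_orgs}  # GitHub logins are case-insensitive
    repos = set(pr_repo_names)  # each repo counts once, however many PRs it received
    if not repos:
        return 0.0
    external = [r for r in repos if r.split("/", 1)[0].lower() not in managed]
    return len(external) / len(repos)


# Hypothetical repos in which Googlers opened pull requests.
prs_2019 = ["google/example", "someone/project-a", "another/project-b",
            "google/example", "third/project-c"]
print(f"{external_share(prs_2019):.0%} of repos are outside managed orgs")  # 75% of repos are outside managed orgs
```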

What we contribute: For contribution volume on GitHub, we chose to focus on push events and on opened and merged pull requests rather than commits, because a commit count on its own is difficult to contextualize. Note that each push event or pull request typically includes one or more commits. In 2019, Googlers created over 570,000 issues, opened over 150,000 pull requests, and created more than 36,000 push events on GitHub. Since 2015, we have doubled our annual counts of issues created and push events, and more than tripled the number of opened pull requests. Over the last five years, more than 80% of pull requests opened by Googlers have been closed and merged into active repositories.
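Below is a minimal sketch of tallying these contribution types from an event stream. It is an illustration, not the query behind the numbers above. Payload fields follow the GitHub Events API: a `PushEvent` payload carries `size` (the number of commits in the push), and a `PullRequestEvent` payload carries `action` and `pull_request.merged`. The sample events are hypothetical.

```python
from collections import Counter


def contribution_tally(events):
    """Tally push events, commits inside them, opened PRs and merged PRs."""
    tally = Counter()
    for event in events:
        payload = event.get("payload", {})
        if event["type"] == "PushEvent":
            tally["push_events"] += 1
            # A single push can carry many commits; `size` is the commit count.
            tally["commits_in_pushes"] += payload.get("size", 0)
        elif event["type"] == "PullRequestEvent":
            action = payload.get("action")
            merged = payload.get("pull_request", {}).get("merged", False)
            if action == "opened":
                tally["prs_opened"] += 1
            elif action == "closed" and merged:
                # Merges show up as "closed" events with merged set to true.
                tally["prs_merged"] += 1
    return tally


def pr(action, merged=False):
    """Build a minimal PullRequestEvent for the example (hypothetical data)."""
    return {"type": "PullRequestEvent",
            "payload": {"action": action, "pull_request": {"merged": merged}}}


sample = [
    {"type": "PushEvent", "payload": {"size": 3}},
    {"type": "PushEvent", "payload": {"size": 1}},
    pr("opened"), pr("opened"),
    pr("closed", merged=True), pr("closed", merged=False),
]
tally = contribution_tally(sample)
print(tally["push_events"], tally["commits_in_pushes"])  # 2 4
print(f"merged/opened: {tally['prs_merged'] / tally['prs_opened']:.0%}")  # merged/opened: 50%
```

Two push events here carry four commits, which shows why event counts and commit counts tell different stories. A multi-year merge rate like the one quoted above would also need to match each merge to its opening pull request, because a PR opened in one year may be merged in the next. This sketch does not attempt that matching.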

How we spend our time: Combining these two classes of metrics, contributions and repos, provides context on how our contributors focus their time. On GitHub in 2015, about 40% of our opened pull requests were concentrated in just 25 repositories. Over the next four years, however, our activity became more distributed across a larger set of projects, with the top 25 repos claiming about 20% of opened pull requests in 2019. For us, this indicates a healthy expansion and diversification of interests, especially given that this activity represents both Google and a community of contributors who happen to work at Google.
This chart splits the total per-year counts of Googler-created pull requests on GitHub into the top 25 repos versus the remainder, with repos ranked by number of opened pull requests per repo per year.
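The concentration metric in the chart can be computed by ranking repos by opened pull requests for one year and dividing the top repos’ total by the overall total. The sketch below assumes you already have one repo name per opened pull request. The `top_n_share` helper and the sample data are illustrative, and `n` is set to 2 so the toy data is meaningful; the post uses 25.

```python
from collections import Counter


def top_n_share(pr_repo_names, n=25):
    """Share of opened PRs that land in the n repos receiving the most PRs."""
    counts = Counter(pr_repo_names)  # opened PRs per repo
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Hypothetical data: one entry per opened pull request in a given year.
opened = ["a", "a", "a", "b", "b", "c", "d", "e"]
print(f"top-2 share: {top_n_share(opened, n=2):.1%}")  # top-2 share: 62.5%
```

A falling top-N share, like the drop from about 40% to about 20% described above, means the same volume of work is spread across more repositories.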

Open source contribution is about more than code

Every day, Google relies on the health and continuing availability of open source, and as such we actively invest in the security and sustainability of open source and its supply chain in three key areas:
  • Security: In addition to building security projects like OpenTitan and gVisor, Google’s OSS-Fuzz project aims to help other projects identify programming errors in software. As of the end of 2019, more than 250 projects were using OSS-Fuzz, and it had filed over 16,000 bugs, including 3,500 security vulnerabilities.
  • Community: Open source projects depend on communities of diverse individuals. We are committed to improving community sustainability and growth with programs like Google Summer of Code and Season of Docs. Over the last 15 years, about 15,000 students from over 105 countries have participated in Google Summer of Code, along with 25,000 mentors in more than 115 countries working on more than 680 open source projects.
  • Research: At the end of 2019, Google invested $1 million in open source research, partnering with researchers at the University of Vermont (UVM) with the goal of deepening understanding of how people, teams and organizations thrive in technology-rich settings, especially in open source projects and communities.
Learn more about our open source initiatives at opensource.google.

By Sophia Vargas – Researcher, Google Open Source Programs Office

Google Open Source Report Card

Open source software enables Google to build things quickly and efficiently without reinventing the wheel, allowing us to focus on solving new problems. We stand on the shoulders of giants and we know it. This is why we support open source and make it easy for Googlers to release the projects they’re working on internally as open source.

Today we’re sharing our first Open Source Report Card, highlighting our most popular projects, sharing a few statistics and detailing some of the projects we’ve released in 2016.

We’ve open sourced over 20 million lines of code to date and you can find a listing of some of our best known project releases on our website. Here are some of our most popular projects:
  • Android - a software stack for mobile devices that includes an operating system, middleware and key applications.
  • Chromium - a project encompassing Chromium, the software behind Google Chrome, and Chromium OS, the software behind Google Chrome OS devices.
  • Angular - a web application framework for JavaScript and Dart focused on developer productivity, speed and testability.
  • TensorFlow - a library for numerical computation using data flow graphs with support for scalable machine learning across platforms from data centers to embedded devices.
  • Go - a statically typed and compiled programming language that is expressive, concise, clean and efficient.
  • Kubernetes - a system for automating deployment, operations and scaling of containerized applications.
  • Polymer - a lightweight library built on top of Web Components APIs for building encapsulated re-usable elements in web applications.
  • Protobuf - an extensible, language-neutral and platform-neutral mechanism for serializing structured data.
  • Guava - a set of Java core libraries that includes new collection types (such as multimap and multiset), immutable collections, a graph library, functional types, an in-memory cache, and APIs/utilities for concurrency, I/O, hashing, primitives, reflection, string processing and much more.
  • Yeoman - a robust and opinionated set of scaffolding tools including libraries and a workflow that can help developers quickly build beautiful and compelling web applications.
While it’s difficult to measure the full scope of open source at Google, we can use the subset of projects that are on GitHub to gather some interesting data. Today our GitHub footprint includes over 84 organizations and 3,499 repositories, 773 of which were created this year.

Googlers use countless languages from Assembly to XSLT, but what are their favorites? GitHub flags the most heavily used language in a repository and we can use that to find out. A survey of GitHub repositories shows us these are some of the languages Googlers use most often:
  • JavaScript
  • Java
  • C/C++
  • Go
  • Python
  • TypeScript
  • Dart
  • PHP
  • Objective-C
  • C#
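The ranking above counts one language per repository: the primary language GitHub reports for it. Below is a minimal sketch of that tally. It assumes repository metadata has already been fetched, for example the `language` field of GitHub REST API repository objects, which is `null` when no language is detected. The sample data is hypothetical.

```python
from collections import Counter


def language_ranking(repos, top=10):
    """Rank primary languages by how many repositories list them.

    Each repo is a dict with a "language" key, as in GitHub REST API repository
    objects; repos without a detected language (None) are skipped.
    """
    counts = Counter(repo["language"] for repo in repos if repo.get("language"))
    return [lang for lang, _ in counts.most_common(top)]


# Hypothetical repository metadata, not Google's actual data.
repos_meta = [
    {"name": "a", "language": "Java"}, {"name": "b", "language": "Go"},
    {"name": "c", "language": "Java"}, {"name": "d", "language": None},
    {"name": "e", "language": "Python"}, {"name": "f", "language": "Go"},
    {"name": "g", "language": "Java"},
]
print(language_ranking(repos_meta))  # ['Java', 'Go', 'Python']
```

Because only the primary language counts, a repo that mixes several languages contributes to just one entry in this ranking.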
Many things can be gleaned using the public GitHub dataset on BigQuery, like usage of tabs versus spaces and the most popular Go packages. What about how many times Googlers have committed to open source projects on GitHub? We can search for google.com email addresses to get a baseline number of Googler commits. Here’s our query:


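#standardSQL
-- Count commits in the public GitHub dataset authored from google.com addresses
SELECT COUNT(*) AS n
FROM `bigquery-public-data.github_repos.commits`
WHERE committer.date > TIMESTAMP '2016-01-01 00:00:00'
  -- REGEXP_EXTRACT returns the text after the last "@" in the author email
  AND REGEXP_EXTRACT(author.email, r'.*@(.*)') = 'google.com'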


With this, we learn that Googlers have made 142,527 commits to open source projects on GitHub since the start of the year. This dataset goes back to 2011 and we can tweak this query to find out that Googlers have made 719,012 commits since then. Again, this is just a baseline number as it doesn’t count commits made with other email addresses.
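To make the query’s logic concrete, below is a local Python sketch of the same filter applied to hypothetical commit records. The flattened field names (`author_email`, `committer_date`) are assumptions for the example; the BigQuery table nests them as `author.email` and `committer.date`. The sketch also shows one reason the result is only a baseline: the comparison is case-sensitive, so an address like `Dev@Google.com` is not counted.

```python
import re
from datetime import datetime

# Mirrors REGEXP_EXTRACT(author.email, r'.*@(.*)'): greedy, so it captures after the last "@".
DOMAIN_RE = re.compile(r".*@(.*)")


def email_domain(email):
    """Return the part of an email address after its last "@", or None."""
    match = DOMAIN_RE.search(email)
    return match.group(1) if match else None


def count_commits(commits, domain="google.com", since=datetime(2016, 1, 1)):
    """Count commits committed after `since` whose author email domain equals `domain`.

    Like the SQL, the match is case-sensitive and sees only one address per commit.
    """
    return sum(
        1
        for c in commits
        if c["committer_date"] > since and email_domain(c["author_email"]) == domain
    )


# Hypothetical commit records for illustration.
commits = [
    {"author_email": "dev@google.com", "committer_date": datetime(2016, 3, 1)},
    {"author_email": "dev@google.com", "committer_date": datetime(2015, 12, 31)},
    {"author_email": "dev@example.org", "committer_date": datetime(2016, 3, 1)},
    {"author_email": "Dev@Google.com", "committer_date": datetime(2016, 3, 1)},
]
print(count_commits(commits))                             # 1
print(count_commits(commits, since=datetime(2011, 1, 1)))  # 2
```

Widening the date range, as in the second call, corresponds to the query tweak that produces the since-2011 total.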

Looking back at the projects we’ve open-sourced in 2016 there’s a lot to be excited about. We have released open source software, hardware and datasets. Let’s take a look at some of this year’s releases.

Seesaw
Seesaw is a Linux Virtual Server (LVS) based load balancing platform developed in Go by our Site Reliability Engineers. Seesaw, like many projects, was built to scratch our own itch.

From our blog post announcing its release: “We needed the ability to handle traffic for unicast and anycast VIPs, perform load balancing with NAT and DSR (also known as DR), and perform adequate health checks against the backends. Above all we wanted a platform that allowed for ease of management, including automated deployment of configuration changes.”

Vendor Security Assessment Questionnaire (VSAQ)
We assess the security of hundreds of vendors every year and have developed a process to automate much of the initial information gathering with VSAQ. Many vendors found our questionnaires intuitive and flexible, so we decided to share them. The VSAQ Framework includes four extensible questionnaire templates covering web applications, privacy programs, infrastructure, and physical and data center security. You can learn more about it in our announcement blog post.

OpenThread
OpenThread, released by Nest, is a complete implementation of the Thread protocol for connected devices in the home. This is especially important because of the fragmentation we’re seeing in this space. Development of OpenThread is supported by ARM, Microsoft, Qualcomm, Texas Instruments and other major vendors.

Magenta
Can we use machine learning to create compelling art and music? That’s the question that animates Magenta, a project from the Google Brain team based on TensorFlow. The aim is to advance the state of the art in machine intelligence for music and art generation and build a collaborative community of artists, coders and machine learning researchers. Read the release announcement for more information.

Omnitone
Virtual reality (VR) isn’t nearly as immersive without spatial audio and much of VR development is taking place on proprietary platforms. Omnitone is an open library built by members of the Chrome Team that brings spatial audio to the browser. Omnitone builds on standard Web Audio APIs to deliver an immersive experience and can be used alongside projects like WebVR. Find out more in our blog post announcing the project’s release.

Science Journal
Today’s smartphones are packed with sensors that can tell us interesting things about the world around us. We launched Science Journal to help educators, students and citizen scientists tap into those sensors. You can learn more about the project in our announcement blog post.

Cartographer
Cartographer is a library for real-time simultaneous localization and mapping (SLAM) in 2D and 3D with Robot Operating System (ROS) support. Combining data from a variety of sensors, this library computes positioning and maps surroundings. This is a key element of self-driving cars, UAVs and robotics as well as efforts to map the insides of famous buildings. More information on Cartographer can be found in our blog post announcing its release.

This is just a small sampling of what we’ve released this year. Follow the Google Open Source Blog to stay apprised of Google’s open source software, hardware and data releases.

By Josh Simmons, Open Source Programs Office