Tag Archives: contributor

Metrics, spikes, and uncertainty: Open source contribution during a global pandemic

Welcome to the second edition of our Open Source Programs Office’s (OSPO) annual open source transparency report. In last year's report on 2019 open source activity, we focused on discovering baselines and trends for Alphabet’s open source activities. However, this past year was unlike any other in recent history. While many continue to investigate the impact of the global pandemic on work, productivity, and behavior, we wanted to understand the pandemic’s impact on Alphabet’s participation in open source.

Our mission within OSPO is to bring the value of open source to Google and the resources of Google to open source. While open source software remains a critical component of our infrastructure, products, and services, in 2020 we increased our focus on connecting with peers and supporting our extended communities across open source ecosystems. In addition to numerous Alphabet-led initiatives and programs, our open source community provided resources, funding, and technical support for projects and communities impacted by the global pandemic.

Before we jump into the data, we want to acknowledge that broad generalizations will never capture the complete context or complexities of personal experience. With these limitations in mind, we will attempt to aggregate what we learned from this past year and explore how our priorities, programs, and adjustments may have affected our measurements and reporting. For more details on the data source and methodology, see the “about this data” section below.

Open source engagement increased as employees moved to their homes

In March 2020, Alphabet closed our offices and required most employees to work from home. In addition to changing workplaces, we adapted our internship program for virtual participation, focusing many technical projects on open source. This inflection point directly impacted our open source contributor behavior, as observed by monthly active user trends—defined as users that logged any activity in a given month:
  • Before March 2020, our GitHub monthly active user counts were relatively stable: In any given month during 2019, about 45% of our yearly active contributing population logged activity on GitHub. Per month in 2019, this value was fairly consistent, with a relative standard deviation of 3%.
  • More GitHub users were active after March 2020: Starting in March 2020, our monthly active users grew by more than 20% and then continued to grow into April through July with the arrival of our interns. In addition to growth, activity fluctuated more dramatically with a relative standard deviation of 19%. Removing interns, this value dropped to 13%—still significantly higher than 2019.
  • Git-on-borg user patterns remained stable: On git-on-borg—our internal production Git service (more details below), more than 50% of users counted in this analysis were active per month. Activity levels were fairly stable in 2020 with a relative standard deviation of 3%, indicating that our behavior on git-on-borg was less impacted by pandemic-related changes. Note that less than 10% of our 2020 open source interns were active on git-on-borg as most worked on GitHub.
To identify more context behind this change in behavior, we explored our population, projects, and programs, in and around open source.
This chart of monthly active GitHub users shows a bump of activity starting in March 2020 and then continuing April through July with the arrival of interns.
This chart shows Alphabet’s monthly active users on GitHub, split by total, full-time employees, and interns.

Population: Our population of contributors grew as our composition shifted

In 2020, more than 10% of Alphabet full-time employees (FTEs) actively contributed to open source projects. This percentage has remained roughly consistent over the last five years, indicating that our open source contribution has scaled with the growth of Alphabet.

In addition to our FTEs, some of Alphabet's vendors, independent contractors, temporary staff, and interns have also contributed to open source during their tenures. From 2015-2019, this group represented about 3-5% of our total population of open source contributors. In 2020, this ratio doubled to 10% as many interns shifted to focus on open source. As a result, interns represented about 9% of our overall open source contributing population in 2020.
In 2020, more than 10% of Alphabet full-time employees (FTEs) actively contributed to open source projects. In addition to our FTEs, Alphabet's vendors, independent contractors, temporary staff, and interns have also contributed to open source during their tenures. From 2015-2019, this group represented about 3-5% of our total population of open source contributors. However in 2020, this ratio doubled to 10%.
This chart shows the aggregate per year counts of Alphabet employees, vendors, contractors, temps, and interns contributing to open source.

Scope: We created and interacted with more repositories and projects

Within Google-managed organizations, we created more than 2,000 new public repositories on GitHub, bringing our total active public repositories to over 9,000 on GitHub and over 1,500 on git-on-borg. While many of these new repositories were created within existing projects or to extend functionality of our products, more than 20% of our new GitHub repositories were created to host our interns’ open source projects. Moving forward, we anticipate that our total public repositories under management will stabilize or even shrink as we refine our depreciation and archival policies. In addition to supporting our own projects:
  • We engaged with more repositories on GitHub: In 2020, contributors at Alphabet interacted with more than 90,000 repositories on GitHub, pushing commits and/or opening pull requests on over 50,000 repositories. Removing passive interactions (WatchEvents or “stars”), we actively engaged with over 75,000 repositories in 2020.
  • We surpassed our growth rates from 2019. Across all metrics listed above, we engaged with 25% more repositories than in 2019—a growth rate significantly higher than last year’s growth rate of 15%-18%. These rates are not impacted by removing the repositories that supported our interns.
  • We continue to invest time in projects outside of Google: Consistent with our 2019 report, on GitHub more than 75% of repositories with pull requests opened by Alphabet contributors were outside of Google-managed organizations.

Behavior: Contribution activities increased, elevated by our interns

To take a closer look at our behavior, we explored all event types across GitHub Archive, grouping events into the following categories:

Category groups

GitHub Event Types

Code

PushEvent, PullRequestEvent, ForkEvent

Code Review

PullRequestReviewEvent, PullRequestReviewCommentEvent, CommitCommentEvent

Issue

IssuesEvent, IssueCommentEvent

Maintenance and administration

MemberEvent, CreateEvent, DeleteEvent, ReleaseEvent, PublicEvent

Wiki/Doc

GollumEvent

Star

WatchEvent

Exploring trends across event types, we found that:
  • GitHub activity grew across all event types: This is not surprising given our growth in the contributing population and repository counts described above. More specifically, in 2020, contributors at Alphabet created more than 780,000 issue comments, and opened over 240,000 pull requests on GitHub. Compared to 2019, we generated 32% more issue comments and opened 50% more pull requests in 2020. Removing WatchEvents, in 2020 our overall activity on GitHub grew by more than 35%.
  • Interns bolstered our growth on GitHub: While in previous years, full-time Alphabet employees were responsible for over 97% of all reported activity on GitHub, in 2020 interns opened more than 10% of Alphabet’s total pull requests on this platform.
  • git-on-borg’s growth rate was consistent with 2019: Where our GitHub activity growth rates increased, our submitted and reviewed changes on git-on-borg grew by 17%, consistent with our 2018-2019 year-over-year growth on this platform and on GitHub. This consistent trajectory once again implies that individuals working on git-on-borg did not significantly change their behavior as a result of the global pandemic. Please note, that the activity pulled from git-on-borg for this analysis was only from Google managed projects where GitHub logs also included non-Google organizations and personal activity.
This chart of grouped GitHub events shows spikes of activity in July 2020 and October 2020, with the largest concentration of activity around code creation.
This chart shows per-month counts of activities initiated by the Alphabet community on GitHub.
Note: not showing “PullRequestReviewEvent”, which GitHub Archive started collecting in August 2020.

Changes: What drove this change in behavior?

While 2020 behavior cannot be separated from the impact of the global pandemic, we were curious if we could isolate specific programs and externalities that would explain the uptick in monthly active users and spikes in logged activities. Again, acknowledging the limitations of aggregate analysis, we found evidence that these measurements were impacted by:
  • Intern hosts: In May-Sept, we welcomed more than 1000 interns and set them to work on open source projects. In addition to intern-driven activities, teams that hosted interns had to interact with these projects in public channels, which contributed to additional individuals logging actions on GitHub between April and September.
  • Tenured employees. To investigate other drivers of the March 2020 uplift in GitHub monthly active users, we filtered out interns and individuals that were new to Alphabet in 2020, which led us to believe that this increase could mostly be attributed to existing employees increasing their time on GitHub.
  • Hacktoberfest: During Hacktoberfest (October 2020), we saw a significant spike in activity with the largest uptick concentrated in issue-related activities, as open source contributors at Alphabet responded to activities initiated during this event.
We also interviewed open source contributors around the organization to understand how their professional and personal open source activity may have been impacted due to COVID-19. Although each case was unique, common themes were:
  • Remote work: With most teams working remotely, some reported that they relied more heavily on asynchronous tooling for collaboration and code review, which would yield additional logged activities on hosting platforms.
  • Open source as a personal outlet: For others, open source provided a place to create and socialize outside of work. This trend was also reported in GitHub’s Octoberverse report on productivity which showed an uptick in open source activity outside of traditional work hours.
Please note, that Alphabet’s aggregate experience does not translate to behavioral or productivity trends in specific projects that we work on. For example, leading up to Kubernetes’ 1.19 release in May 2020, community leaders reported declining engagement, measured by a 15% decline in daily pull request reviews across Kubernetes organizations compared to the 2019 average.

Beyond code: We continue to invest in all aspects of open source

Alphabet relies on the health and availability of open source projects, and as such we continue to invest in security and sustainability across the supply chain, from respectful language updates in our own projects to:
  • Mentorship and community engagement: In its 16th year of the program, Google’s 2020 Summer of Code program had 1,106 students from 65 countries successfully complete the program under the guidance of over 2,000 mentors. In its second year, Season of Docs sponsored 87 technical writers working on 48 projects with the support of over 100 mentors. And with in-person events postponed until further notice, we launched the Google Open Source Live monthly series to connect with our extended community, hosting 5 events last year, 7 so far in 2021, and more planned in the final quarters of 2021.
  • Improving open source stability and security: Security challenges are never going to disappear, and we must work together to maintain the security of the open source software we collectively depend on. In 2020, Google co-founded the OpenSSF to collaborate on tools and frameworks to improve open source security. As part of this community, we released Criticality Score and provided significant contributions to project Scorecards to help users, contributors, companies, and communities generate relative criticality metrics for projects that they depend on. Additionally, in 2020 the OSS-Fuzz project nearly doubled the number of supported projects to more than 400 projects, and identified more than 25,000 bugs. In addition to the main effort, the Fuzz team hosted interns, launched the Atheris Python Fuzzer, and ramped up a FuzzBench service to help academic researchers run large scale experiments on their fuzzing tools.
Despite perpetual uncertainty, we will continue to invest in the open source ecosystem as we value the connection, collaboration and community even when we are kept apart by a global pandemic. Learn more about our open source initiatives at opensource.google.

About the data:

  • Data source: These data represent activities on repositories hosted on GitHub and our internal production Git service git-on-borg. These sources represent a subset of open source activity currently tracked by our OSPO.
    • GitHub: We continue to use GitHub Archive as the primary source for GitHub data, which is available as a public dataset on BigQuery. Alphabet activity within GitHub is identified by self-registered accounts, which we estimate underreports actual activity. This year we decided to generate this report from Monthly Tables instead of Yearly Tables in order to explore contribution patterns within the year.
    • git-on-borg: This is our primary platform for internal projects and some of our larger, long running public projects like Android and Chromium. While we continue to develop on this platform, most of our open source activity has moved to GitHub to increase exposure and encourage community growth.
    • Distinct event types: Note that git-on-borg and GitHub APIs produce distinct sets of events—as such we will report activity metrics per platform. Where GitHub Event logs capture a wide range of activity from code creation and review to issue creation and comments, the Gerrit Event stream (used by git-on-borg) only captures code changes and reviews.
  • Driven by humans: We have created many automated bots and systems that can propose changes on various hosting platforms. We have intentionally filtered these data to focus on human-initiated activities.
  • Business and personal: Activity on GitHub reflects a mixture of Alphabet projects, third party projects, experimental efforts, and personal projects. Our metrics report on all of the above unless otherwise specified.
  • Alphabet contributors: Please note that unless additional detail is specified, activity counts attributed to Alphabet open source contributors will include our full-time employees as well as our extended Alphabet community (temps, vendors, contractors, and interns).
  • Active counts: Where possible, we will show ‘active users’ defined by logged activity within a specified timeframe (i.e. in month, year, etc) and ‘active repositories’ as those that have not been archived.
  • Activity types: This year we explore GitHub activity types in more detail. Note that in some cases we have removed “Watch Events” or articulated this as passive engagement. Additionally, GitHub added an event type “PullRequestReviewEvent” that started logging activity in August 2020, but we chose to remove this from our charts and aggregate counts as it invalidates year over year comparisons.
By Sophia Vargas, Research Analyst – Google Open Source Programs Office

Peer Bonus Experiences: The many ways in which you can contribute to open source

Recently, I was awarded a Google Open Source Peer Bonus, which I’m grateful for, as it proved to me that one can contribute value to open source projects, and build a career in it, without extensive experience coding. So how can someone with limited coding skills like me contribute to open source in a meaningful way?

Documentation

Documentation is important across open source and especially helpful to those who are new to a project! Developers and maintainers of projects are often focused on fixing bugs and improving the software. Therefore, documentation is harder to prioritize, so contributions to documentation are highly appreciated. Being experienced with applications won’t always help you in writing the documentation, since familiarity can cause you to miss a step when creating the doc. This is why, as a beginner, you are in an excellent position to ensure that instructions and step-by-step guides are easy to follow, don’t skip vital steps, and don’t use off-putting language.

If you have the opportunity to get involved in programs like Season of Docs as a mentor or a participant, as I did in 2019, the experience is hugely rewarding!

Events and Conferences

If you can help with mailing lists or organizing events, you can get involved in the community! In 2006, I became involved with the nascent Open Source Geospatial Foundation (OSGeo), where I was persuaded to set up a local chapter in the United Kingdom (going strong 14 years later!). It was one of the best things I could’ve done. This year we hosted a global conference (FOSS4G) and several UK events, including an online-only event. We’ve also managed to financially support a number of open source projects by providing an annual sponsorship, or by contributing to the funding of a specific improvement. I’ve met so many great people through my involvement in OSGeo, some of which have become colleagues and good friends.
The group meeting at FOSS4G 2013 in NottinghamAdd caption

If you’re interested in writing case studies, you can always speak about your experiences at conferences. Evidence that particular packages can be used successfully in real-world situations are incredibly valuable, and can help others put together business cases for considering an open source solution.

Assisting others

Sometimes the problems you face with technology can be experienced by money, and by open-sourcing your solution you could be impacting a lot of people. When I first started using open source software, the packages I needed were often hard to install and configure on Windows, having to be started using the command prompt, which can be intimidating for beginners. To scratch a problem-solving itch, I packaged them up onto a USB stick, added some batch files to make them load properly from an external drive, added a little menu for starting them, and Portable GIS was born. After 12 years, a few iterations, a website and a GitLab repository, it has been downloaded thousands of times, and is used in situations such as disaster relief, where installing lots of software rapidly on often old PCs is not really an option.

Mentoring Others

Once you are proficient in something, use your knowledge to help others. Some existing platforms for software use and development (online repositories like GitHub or GitLab) are extremely intimidating to new users, and create barriers to participation. If you can help people get over the fear-inducing first pull request, you will empower them to keep on contributing. My first pull request was a contribution to the Vaguely Rude Place-names map back in 2013 and since then I’ve run few training events along a similar line at conferences.

Open source is now fundamental to my career—16 years after learning about it—and something I am truly passionate about. It has shaped my life in many ways. I hope that my experiences might help someone who isn’t versed in code to get involved, realizing that their contributions are equally as valuable as bug fixes and patches.

By Jo Cook, Astun Technology—Guest Author

Open source by the numbers at Google

At Google, open source is at the core of our infrastructure, processes, and culture. As such, participation in these communities is vital to our productivity. Within OSPO (Open Source Programs Office), our mission is to bring the value of open source to Google and the resources of Google to open source. To ensure our actions match our commitment, in this post we will explore a variety of metrics intended to increase context, transparency, and accountability across all of the communities we engage with.

Why we contribute: Open source has become a pervasive component in modern software development, and Google is no exception. We use thousands of open source projects across our internal infrastructure and products. As participants in the ecosystem, our intentions are twofold: give back to the communities we depend on as well as expand support for open source overall. We firmly believe in open source and its ability to bring together users, contributors, and companies alike to deliver better software.

The majority of Google’s open source work is done within one of two hosting platforms: GitHub and git-on-borg, Google’s production Git service which integrates with Gerrit for code review and access control. While we also allow individual usage of Bitbucket, GitLab, Launchpad, and other platforms, this analysis will focus on GitHub and git-on-borg. We will continue to explore how best to incorporate activity across additional channels.

A little context about the numbers you’ll read below:
  • Business and personal: While git-on-borg hosts both internal and external Google created repos, GitHub is a mixture of Google projects, experimental efforts and personal projects created by Googlers.
  • Driven by humans: We have created many automated bots and systems that can propose changes on both hosting platforms. We have intentionally filtered these data to ensure we are only showing human initiated activities.
  • GitHub data: We are using GH Archive as the primary source for GitHub data, which is currently available as a public dataset on BigQuery. Google activity within GitHub is identified by self registered accounts, which we anticipate under reports actual usage as employees acclimate to our policies.
  • Active counts: Where possible, we will show ‘active users’ and ‘active repositories’ defined by logged activity within each specified timeframe (for GH archive data, that’s any event type logged in the public GitHub event stream).
As numbers mean nothing without scale, let’s start by defining our applicable community: In 2019, more than 9% of Alphabet’s full time employees actively contributed to public repositories on git-on-borg and GitHub. While single digit, this percentage represents a portion of all full time Alphabet employees—from engineers to marketers to admins, across every business unit in Alphabet—and does not include those who contribute to open source projects outside of code. As our population has grown, so has our registered contributor base:
This chart shows the aggregate per year counts of Googlers active on public repositories hosted on GitHub and git-on-borg

What we create: As mentioned above, our contributing population works across a variety of Google, personal, and external repositories. Over the years, Google has released thousands of open source projects (many of which span multiple repositories) and ~2,600 are still active. Today, Google hosts over 8,000 public repositories on GitHub and more than 1,000 public repositories on git-on-borg. Over the last five years, we have doubled the number of public repos, growing our footprint by an average of 25% per year.

What we work on: In addition to our own repositories, we contribute to a wide pool of external projects. In 2019, Googlers were active in over 70,000 repositories on GitHub, pushing commits and/or opening pull requests on over 40,000 repositories. Note that more than 75% of the repos with Googler-opened pull requests were outside of Google-managed organizations (on GitHub).
This charts shows per year counts of activities initiated by Googlers on GitHub

What we contribute: For contribution volume on GitHub, we chose to focus on push events, opened, and merged pull requests instead of commits as this metric on its own is difficult to contextualize. Note that push events and pull requests typically include one or more commits per event. In 2019, Googlers created over 570,000 issues, opened over 150,000 pull requests, and created more than 36,000 push events on GitHub. Since 2015, we have doubled our annual counts of issues created and push events, and more than tripled the number of opened pull requests. Over the last five years, more than 80% of pull requests opened by Googlers have been closed and merged into active repositories.

How we spend our time: Combining these two classes of metrics—contributions and repos—provides context on how our contributors focus their time. On GitHub: in 2015, about 40% of our opened pull requests were concentrated in just 25 repositories. However, over the next four years, our activity became more distributed across a larger set of projects, with the top 25 repos claiming about 20% of opened pull requests in 2019. For us, this indicates a healthy expansion and diversification of interests, especially given that this activity represents both Google, as well as a community of contributors that happen to work at Google.
This chart splits the total per year counts of Googler created pull requests on GitHub by Top 25 repos vs the remainder ranked by number of opened pull requests per repo per year.

Open source contribution is about more than code

Every day, Google relies on the health and continuing availability of open source, and as such we actively invest in the security and sustainability of open source and its supply chain in three key areas:
  • Security: In addition to building security projects like OpenTitan and gVisor, Google’s OSS-Fuzz project aims to help other projects identify programming errors in software. As of the end of 2019, OSS-Fuzz had over 250 projects using the project, filed over 16,000 bugs, including 3,500 security vulnerabilities.
  • Community: Open source projects depend on communities of diverse individuals. We are committed to improving community sustainability and growth with programs like Google Summer of Code and Season of Docs. Over the last 15 years, about 15,000 students from over 105 countries have participated in Google Summer of Code, along with 25,000 mentors in more than 115 countries working on more than 680 open source projects.
  • Research: At the end of 2019, Google invested $1 million in open source research, partnering with researchers at UVM, with the goal to deepen understanding of how people, teams and organizations thrive in technology-rich settings, especially in open-source projects and communities.
Learn more about our open source initiatives at opensource.google.

By Sophia Vargas – Researcher, Google Open Source Programs Office

Paving the way for a more diverse open source landscape: The First OSS Contributor Summit in Mexico

“I was able to make my first contribution yesterday, and today it was merged. I'm so excited about my first steps in open source", a participant said about the First Summit for Open Source Contributors, which took place this September in Guadalajara, México.
How do you involve others in open source? How can we make this space more inclusive for groups with low representation in the field?

With these questions in mind and the call to contribute to software that is powering the world's favorite products, Google partnered with Software Guru magazine, Wizeline Academy, OSOM (a consortium started by Googler, Griselda Cuevas, to engage more Mexican developers in open source), IBM, Intel, Salesforce and Indeed to organize the First Summit for Open Source Contributors in Mexico. The Apache Software Foundation and the CNCF were some of the organizations that sponsored the conference. The event consisted of two days of training and presentations on a selection of open source projects, including Apache Beam, Gnome, Node JS, Istio, Kubernetes, Firefox, Drupal, and others. Through 19 workshops, participants were able to learn about the state of open source in Latin America, and also get dedicated coaching and hands-on practice to become active contributors in OSS. While unpaid, these collaborations represent the most popular way of learning to code and building a portfolio for young professionals, or people looking to do a career shift towards tech.


As reported by many advocacy groups in the past few years, diversity remains a big debt in the tech industry. Only an average of 8.4% of employees in ten of the leading tech companies are Latinx(1). The gap is even bigger in open source software, where only 2.6% of committers to Apache projects are Latinx(2). Diversity in tech is not just the right thing to do, it is also good business: bringing more diverse participation in software development will result in more inclusive and successful products, that serve a more comprehensive set of use cases and needs in any given population.


While representation numbers in the creation of software are still looking grim, the use of OSS is growing fast: It is estimated that Cloud and big-data OSS technologies will grow five times by 2025 in Latin America. The main barrier for contributing? Language. 

The First Summit for Open Source Contributors set out to close this fundamental gap between tech users and its makers. To tackle this problem, we created, in partnership with other companies, 135 hours of content in Spanish for 481 participants, which produced over 200 new contributors across 19 open source projects. When asked why contributions from the region are so low, 41% of participants said it was due to lack of awareness, and 34% said they thought their contributions were not valuable. After the event, 47% of participants reported that the workshops and presentations provided them with information or guidance on how to contribute to specific projects, and 39% said the event helped them to lose fear and contribute. Almost 100% of participants stated that they plan to continue contributing to Open Source in the near future… and if they do, they would raise representation of Latinx in Open Source to 10%.
Organizing Team
This event left us with a lot of hope for the future of diversity and inclusion in open source. Going forward, we hope to continue supporting this summit in Latin America, and look for ways of reproducing this model in other regions of the world, as well as designing proactive outreach campaigns in other formats.

View more pictures of the event here.
View some of the recorded presentations here.


By: María Cruz for Google Open Source

(1) Aggregate data from Tech Crunch: https://techcrunch.com/2019/06/17/the-future-of-diversity-and-inclusion-in-tech/
(2) Data from the last Apache Software Foundation Committer Survey, applied in 2016, 765 respondents (13% of committers)