Author Archives: Open Source Programs Office

A new chapter for OSS-Fuzz

Cross-posted on the Google Security Blog.

Open source software (OSS) is extremely important to Google, and we rely on OSS in a variety of customer-facing and internal projects. We also understand both the difficulty and the importance of securing the open source ecosystem, and we are continuously looking for ways to make that work simpler.

For the OSS community, we currently provide OSS-Fuzz, a free continuous fuzzing infrastructure hosted on the Google Cloud Platform. OSS-Fuzz uncovers security vulnerabilities and stability issues, and reports them directly to developers. Since launching in December 2016, OSS-Fuzz has reported over 9,000 bugs directly to open source developers.

In addition to OSS-Fuzz, Google's security team maintains several internal tools for identifying bugs in both Google internal and open source code. Until recently, these issues were manually reported to various public bug trackers by our security team and then monitored until they were resolved. Unresolved bugs were eligible for the Patch Rewards Program. While this reporting process had some success, it was overly complex. Now, by unifying and automating our fuzzing tools, we have been able to consolidate our processes into a single workflow, based on OSS-Fuzz. Projects integrated with OSS-Fuzz will benefit from being reviewed by both our internal and external fuzzing tools, thereby increasing code coverage and discovering bugs faster.

We are committed to helping open source projects benefit from integrating with our OSS-Fuzz fuzzing infrastructure. In the coming weeks, we will reach out via email to critical projects that we believe would be a good fit, and we will continue to support the community at large. Projects that integrate are eligible for rewards ranging from $1,000 (initial integration) up to $20,000 (ideal integration); more details are available here. These rewards are intended to help offset the cost and effort required to properly configure fuzzing for OSS projects. If you would like to integrate your project with OSS-Fuzz, please submit your project for review. Our goal is to admit as many OSS projects as possible and to ensure that they are continuously fuzzed.

When we contact you, we may provide a sample fuzz target for easy integration. Many of these fuzz targets are generated with new technology that understands how library APIs are used appropriately. Watch this space for more details on how Google plans to further automate fuzz target creation, so that even more open source projects can benefit from continuous fuzzing.

Thank you for your continued contributions to the open source community. Let’s work together on a more secure and stable future for open source software.

By Matt Ruhstaller, TPM and Oliver Chang, Software Engineer, Google Security Team

The big reveal: Google Code-in 2018 winners and finalists

Google Code-in (GCI) 2018, the 9th consecutive year of the contest, ended in mid-December. It was a very, very busy seven weeks for everyone – we had 3,124 students from 77 countries completing 15,323 tasks with a record 27 open source organizations!

Today, we are pleased to announce the Google Code-in 2018 Grand Prize Winners and Finalists with each organization. The 54 Grand Prize Winners from 19 countries completed an impressive 1,668 tasks between them while also helping other students during the contest.

Each of the Grand Prize Winners is invited on a four-day trip to Google’s main campus and San Francisco offices in Northern California, where they’ll meet Google engineers and one of the mentors they worked with during the contest, and enjoy some fun in California with the other winners. We look forward to seeing everyone later this year!
Grand Prize Winners by country:
Cameroon: 1
Canada: 1
Czech Republic: 1
Georgia: 1
India: 18
Indonesia: 1
Macedonia: 1
Netherlands: 1
Philippines: 1
Poland: 4
Romania: 1
Russian Federation: 1
Singapore: 1
South Africa: 1
Spain: 2
Sri Lanka: 1
Ukraine: 2
United Kingdom: 6
United States: 9

Finalists

And a big congratulations to our 108 Finalists from 26 countries, who completed over 2,350 tasks during the contest. The Finalists will all receive a special hoodie to commemorate their achievements in the contest. This year, one student was named a Finalist with two different organizations!

A breakdown of the countries represented by our finalists can be found below. 
Canada: 6
China: 2
Czech Republic: 1
Germany: 1
India: 48
Indonesia: 2
Israel: 1
Kazakhstan: 1
Luxembourg: 1
Mauritius: 2
Mexico: 1
Nepal: 1
Pakistan: 2
Philippines: 1
Poland: 15
Russian Federation: 2
Serbia: 1
Singapore: 2
South Korea: 1
Spain: 1
Sri Lanka: 2
Taiwan: 1
Thailand: 1
United Kingdom: 3
United States: 8
Uruguay: 1

Mentors

This year we had 790 mentors dedicate their time and invaluable expertise to helping thousands of teenage students learn about open source by welcoming them into their communities. These mentors are the heart of GCI and the reason the contest continues to thrive. Mentors spend hundreds of hours answering questions, reviewing submitted tasks, and teaching students the basics and, in many cases, more advanced aspects of contributing to open source. GCI would not be possible without their enthusiasm and commitment.

We will post more statistics and fun stories that came from GCI 2018 here on the Google Open Source Blog over the next few months, so please stay tuned.

Congratulations to our Grand Prize Winners, Finalists, and all of the students who spent the last couple of months learning about, and contributing to, open source. We hope they will continue their journey in open source!

By Stephanie Taylor, Google Open Source

Wrapping up Google Code-in 2018

We are excited to announce the conclusion of the 9th annual Google Code-in (GCI), our global online contest introducing teenagers to the world of open source development. Over the years the contest has not only grown bigger, but also helped find and support talented young people around the world.

Here are some initial statistics about this year’s program:
  • Total number of students completing tasks: 3,123*
  • Total number of countries represented by students: 77
  • Percentage of girls among students: 17.9% 
Below you can see the total number of tasks completed by students year over year:
*These numbers will increase as mentors finish reviewing the final work submitted by students this morning.
Mentors from each of the 27 open source organizations are now busy reviewing the final work submitted by participants. We look forward to sharing more statistics about the program, including the countries and schools with the most student participants, in an upcoming blog post.

The mentors for each organization will spend the next couple of weeks selecting four Finalists (who will receive a hoodie too!) and their two Grand Prize Winners. Grand Prize Winners will be flown to Northern California to visit Google’s headquarters, enjoy a day of adventure in San Francisco, meet their mentors and hear talks from Google engineers.

Hearty congratulations to all the student participants for challenging themselves and making contributions to open source in the process!

Further, we’d like to thank the mentors and the organization administrators for GCI 2018. They are the heart of this program, volunteering countless hours creating tasks, reviewing student work, and helping bring students into the world of open source. Mentors teach young students about the many facets of open source development, from community standards and communicating across time zones to version control and testing. We couldn’t run this program without you! Thank you!

Stay tuned, we’ll be announcing the Grand Prize Winners and Finalists on January 7, 2019!

By Saranya Sampat, Google Open Source

Knative momentum continues, hits another adoption milestone

Released just four months ago by Google Cloud in collaboration with several vendors, Knative, an open source platform based on Kubernetes that provides the building blocks for serverless workloads, has already gained broad support.

The number of contributors has doubled, more than a dozen companies have contributed each month, and community contributions have increased over 45% since the 0.1 release. It’s an encouraging signal that validates the need for such a project, and suggests that ongoing development will be driven by healthy discussions among users and contributors.

Knative 0.2 Release 

In the recent 0.2 release, the first major release since the project’s launch in July, we incorporated 323 pull requests from eight different companies. Knative 0.2 added a new Eventing resource model to complement the Serving and Build components. There were also lots of improvements under the hood, such as the implementation of pluggable routing and better support for autoscaling.

KubeCon North America

Continuing the theme, there are 10 sessions about Knative by speakers from seven different companies this week at KubeCon in Seattle. The sessions cover a variety of topics, ranging from introductory overviews to advanced autoscaler customization. The number of companies represented by speakers illustrates the breadth of the growing Knative community.

Growing Ecosystem 

Another sign of Knative’s momentum is the growing ecosystem. A number of enterprise platform developers have begun using Knative to create serverless solutions on Kubernetes for their own hybrid cloud use cases. Their use of the Knative API makes for a consistent developer experience and enables workload portability. For example, Pivotal, a top contributor to the Knative project, has adopted Knative alongside Kubernetes, which helps them dedicate more resources higher in the stack:
"Since the release of Knative, we've been collaborating on an open functions platform to help companies run their new workloads on every cloud. That’s why we’re excited to launch the alpha of Pivotal Function Service." – Onsi Fakhouri, SVP of Engineering at Pivotal
Similarly, TriggerMesh has launched a hosted serverless management platform that runs on top of Knative, enabling developers to deploy and manage their functions from a central console.
"Knative provides us with the critical building blocks we need to create our serverless management platform." – Sebastien Goasguen, Co-founder, TriggerMesh
We’re excited by the speed with which Knative is being adopted and the broad cross section of the industry that is already contributing to the project. If you haven’t already jumped in, we invite you to get involved! Come visit github.com/knative and join the growing Knative community.

By Mark Chmarny, Knative Team

Google joins the OpenChain Project for license compliance

Google is thrilled to announce that we are joining the OpenChain Project as Platinum Members. OpenChain is an effort to make open source license compliance simpler and more consistent. We will also join the OpenChain board and are excited that Facebook and Uber will be fellow board members.

Over the last 14 years, the Open Source Programs Office (OSPO) at Google has developed rigorous policies and processes so that we can do open source license compliance correctly, and at scale. This helps us use free and open source software extensively across the company and makes it easier to upstream our work. For us, it’s a matter of legal compliance as well as showing respect for the amazing communities that create and maintain the software.

Until now, there’s been no commonly accepted standard for open source compliance within an organization. Most organizations, like Google, have had to invent and cobble together policies and processes, occasionally comparing notes and hoping we haven’t forgotten anything.

The OpenChain Project is changing that by defining the core requirements of a quality compliance program and developing curriculum to help with training and management. It’s hard to overstate the importance of this work now that open source is a critical input at every step in the supply chain, both in hardware and software.

Google believes in this mission and is excited for the opportunity to use what we’ve learned to pave the way for the rest of the industry. We can help guide the development of standards that are rigorous, clear, and easy to follow for companies both large and small.

By Max Sills and Josh Simmons, Google Open Source

TF-Ranking: a scalable TensorFlow library for learning-to-rank

Cross-posted from the Google AI Blog.

Ranking, the process of ordering a list of items in a way that maximizes the utility of the entire list, is applicable in a wide range of domains, from search engines and recommender systems to machine translation, dialogue systems and even computational biology. In applications like these (and many others), researchers often utilize a set of supervised machine learning techniques called learning-to-rank. In many cases, these learning-to-rank techniques are applied to datasets that are prohibitively large — scenarios where the scalability of TensorFlow could be an advantage. However, there is currently no out-of-the-box support for applying learning-to-rank techniques in TensorFlow. To the best of our knowledge, there are also no other open source libraries that specialize in applying learning-to-rank techniques at scale.

Today, we are excited to share TF-Ranking, a scalable TensorFlow-based library for learning-to-rank. As described in our recent paper, TF-Ranking provides a unified framework that includes a suite of state-of-the-art learning-to-rank algorithms, and supports pairwise or listwise loss functions, multi-item scoring, ranking metric optimization, and unbiased learning-to-rank.

TF-Ranking is fast and easy to use, and creates high-quality ranking models. The unified framework gives ML researchers, practitioners and enthusiasts the ability to evaluate and choose among an array of different ranking models within a single library. Moreover, we strongly believe that a key to a useful open source library is not only providing sensible defaults, but also empowering our users to develop their own custom models. Therefore, we provide flexible APIs, within which users can define and plug in their own customized loss functions, scoring functions and metrics.

Existing Algorithms and Metrics Support

The objective of a learning-to-rank algorithm is to minimize a loss function defined over a list of items, optimizing the utility of the list ordering for the given application. TF-Ranking supports a wide range of standard pointwise, pairwise and listwise loss functions, as described in prior work. This ensures that researchers using the TF-Ranking library are able to reproduce and extend previously published baselines, and practitioners can make the most informed choices for their applications. Furthermore, TF-Ranking can handle sparse features (like raw text) through embeddings and scales to hundreds of millions of training instances. Thus, anyone who is interested in building real-world, data-intensive ranking systems, such as web search or news recommendation, can use TF-Ranking as a robust, scalable solution.
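To make the three families concrete, here is a small illustrative sketch in plain NumPy (not TF-Ranking’s actual implementation) that computes a pointwise, a pairwise, and a listwise loss for a single query’s scores and graded relevance labels:

import numpy as np

def pointwise_loss(scores, labels):
    # Treat each item independently: squared error against its relevance label.
    return float(np.mean((scores - labels) ** 2))

def pairwise_logistic_loss(scores, labels):
    # Penalize every pair of items whose scores are ordered against the labels.
    losses = [np.log1p(np.exp(-(scores[i] - scores[j])))
              for i in range(len(scores)) for j in range(len(scores))
              if labels[i] > labels[j]]
    return float(np.mean(losses)) if losses else 0.0

def listwise_softmax_loss(scores, labels):
    # Compare the whole list at once: cross-entropy between the normalized
    # labels and the softmax of the scores (ListNet-style).
    probs = np.exp(scores - np.max(scores))
    probs /= probs.sum()
    target = labels / labels.sum()
    return float(-np.sum(target * np.log(probs)))

scores = np.array([2.0, 1.0, 0.5])  # model outputs for one query
labels = np.array([1.0, 0.0, 2.0])  # graded relevance judgments
print(pointwise_loss(scores, labels),
      pairwise_logistic_loss(scores, labels),
      listwise_softmax_loss(scores, labels))

TF-Ranking provides TensorFlow implementations of losses in each of these families.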

Empirical evaluation is an important part of any machine learning or information retrieval research. To ensure compatibility with prior work, we support many of the commonly used ranking metrics, including Mean Reciprocal Rank (MRR) and Normalized Discounted Cumulative Gain (NDCG). We also make it easy to visualize these metrics at training time on TensorBoard, an open source TensorFlow visualization dashboard.
An example of the NDCG metric (Y-axis) along the training steps (X-axis) displayed in TensorBoard. It shows the overall progress of the metric during training. Different methods can be compared directly on the dashboard, and the best models can be selected based on the metric.
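For intuition about these metrics, here is a minimal NumPy sketch of how MRR and NDCG can be computed for a single ranked list; TF-Ranking’s own implementations are TensorFlow ops, so treat this as illustrative only:

import numpy as np

def mrr(ranked_labels):
    # Reciprocal rank of the first relevant item (label > 0), or 0 if none.
    for rank, label in enumerate(ranked_labels, start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0

def dcg(labels):
    # Discounted cumulative gain with the common 2^relevance - 1 gain function.
    labels = np.asarray(labels, dtype=float)
    discounts = np.log2(np.arange(2, len(labels) + 2))
    return float(np.sum((2.0 ** labels - 1.0) / discounts))

def ndcg(ranked_labels):
    # Normalize DCG by the DCG of the ideal (label-sorted) ordering.
    ideal = dcg(sorted(ranked_labels, reverse=True))
    return dcg(ranked_labels) / ideal if ideal > 0 else 0.0

print(mrr([0, 1, 0]))   # 0.5: the first relevant item sits at rank 2
print(ndcg([1, 0, 2]))  # about 0.69: the most relevant item is ranked last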

Multi-Item Scoring

TF-Ranking supports a novel scoring mechanism wherein multiple items (e.g., web pages) can be scored jointly, an extension of the traditional scoring paradigm in which single items are scored independently. One challenge in multi-item scoring is inference, where items have to be grouped and scored in subgroups; the scores are then accumulated per item and used for sorting. To make these complexities transparent to the user, TF-Ranking provides a List-In-List-Out (LILO) API to wrap all this logic in the exported TF models.
The TF-Ranking library supports a multi-item scoring architecture, an extension of traditional single-item scoring.
As we demonstrate in recent work, multi-item scoring is competitive with state-of-the-art learning-to-rank models such as RankNet, MART, and LambdaMART on a public LETOR benchmark.

Ranking Metric Optimization

An important research challenge in learning-to-rank is direct optimization of ranking metrics (such as the previously mentioned NDCG and MRR). These metrics, while able to measure the performance of ranking systems better than standard classification metrics like Area Under the Curve (AUC), have the unfortunate property of being either discontinuous or flat. Therefore, standard stochastic gradient descent optimization of these metrics is problematic.

In recent work, we proposed a novel method, LambdaLoss, which provides a principled probabilistic framework for ranking metric optimization. In this framework, metric-driven loss functions can be designed and optimized by an expectation-maximization procedure. The TF-Ranking library integrates the recent advances in direct metric optimization and provides an implementation of LambdaLoss. We are hopeful that this will encourage and facilitate further research advances in the important area of ranking metric optimization.

Unbiased Learning-to-Rank

Prior research has shown that, given a ranked list of items, users are much more likely to interact with the first few results, regardless of their relevance. This observation has inspired research interest in unbiased learning-to-rank, and led to the development of unbiased evaluation methods and several unbiased learning algorithms based on re-weighting training instances. The TF-Ranking library implements metrics for unbiased evaluation and losses for unbiased learning, natively supporting the re-weighting needed to overcome the inherent biases in user interaction datasets.

Getting Started with TF-Ranking

TF-Ranking implements the TensorFlow Estimator interface, which greatly simplifies machine learning programming by encapsulating training, evaluation, prediction and export for serving. TF-Ranking is well integrated with the rich TensorFlow ecosystem. As described above, you can use TensorBoard to visualize ranking metrics like NDCG and MRR, as well as to pick the best model checkpoints using these metrics. Once your model is ready, it is easy to deploy it in production using TensorFlow Serving.
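As a rough sketch of what this looks like in code, the snippet below assembles a ranking Estimator. The function names follow the library’s early public examples and may change, so treat them as assumptions and check the GitHub repo for the current API:

import tensorflow as tf
import tensorflow_ranking as tfr

def score_fn(context_features, group_features, mode, params, config):
    # Score a group of items; with group_size=1 this is single-item scoring.
    input_layer = tf.concat(
        [tf.layers.flatten(t) for t in group_features.values()], axis=1)
    hidden = tf.layers.dense(input_layer, units=64, activation=tf.nn.relu)
    return tf.layers.dense(hidden, units=1)

# A pairwise logistic loss and an NDCG@5 metric, bundled into a ranking head.
ranking_head = tfr.head.create_ranking_head(
    loss_fn=tfr.losses.make_loss_fn(
        tfr.losses.RankingLossKey.PAIRWISE_LOGISTIC_LOSS),
    eval_metric_fns={
        "metric/ndcg@5": tfr.metrics.make_ranking_metric_fn(
            tfr.metrics.RankingMetricKey.NDCG, topn=5),
    },
    optimizer=tf.train.AdagradOptimizer(learning_rate=0.1))

# The resulting model_fn plugs into the standard Estimator
# train/evaluate/export workflow.
estimator = tf.estimator.Estimator(
    model_fn=tfr.model.make_groupwise_ranking_fn(
        group_score_fn=score_fn,
        group_size=1,
        transform_fn=None,
        ranking_head=ranking_head))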

If you’re interested in trying TF-Ranking for yourself, please check out our GitHub repo, and walk through the tutorial examples. TF-Ranking is an active research project, and we welcome your feedback and contributions. We are excited to see how TF-Ranking can help the information retrieval and machine learning research communities.

By Xuanhui Wang and Michael Bendersky, Software Engineers, Google AI

Acknowledgements

This project was only possible thanks to the members of the core TF-Ranking team: Rama Pasumarthi, Cheng Li, Sebastian Bruch, Nadav Golbandi, Stephan Wolf, Jan Pfeifer, Rohan Anil, Marc Najork, Patrick McGregor and Clemens Mewald‎. We thank the members of the TensorFlow team for their advice and support: Alexandre Passos, Mustafa Ispir, Karmel Allison, Martin Wicke, and others. Finally, we extend our special thanks to our collaborators, interns and early adopters: Suming Chen, Zhen Qin, Chirag Sethi, Maryam Karimzadehgan, Makoto Uchida, Yan Zhu, Qingyao Ai, Brandon Tran, Donald Metzler, Mike Colagrosso, and many others at Google who helped in evaluating and testing the early versions of TF-Ranking.

Introducing a Web Component and Data API for Quick, Draw!


Over the past couple of years, the Creative Lab, in collaboration with the Handwriting Recognition team, has released a few experiments in the realm of “doodle” recognition. First, in 2016, there was Quick, Draw!, which uses a neural network to guess what you’re drawing. Since Quick, Draw! launched, we have collected over 1 billion drawings across 345 categories. In the wake of that popularity, we open sourced a collection of 50 million drawings, giving developers around the world access to the data set and the ability to conduct research with it.

"The different ways in which people draw are like different notes in some universally human scale" - Ian Johnson, UX Engineer @ Google

Since the initial dataset was released, it has been incredible to see how graphs, t-SNE clusters, and simply overlapping millions of these doodles have given us the ability to infer interesting human behaviors across different cultures. One example, from the Quartz study, is that 86% of Americans (from a sample of 50,000) draw their circles counterclockwise, while 80% of Japanese (from a sample of 800) draw them clockwise. Part of this pattern in behavior can be attributed to the strict stroke order in Japanese writing, which proceeds from the top left to the bottom right.


It’s also interesting to see how the data looks when it’s overlaid by country, as Kyle McDonald did, when he discovered that some countries draw their chairs in perspective while others draw them straight on.


On the more fun, artistic spectrum, there are some simple but clever uses of the data like Neil Mendoza’s face tracking experiment and Deborah Schmidt’s letter collages.
See the video here of Neil Mendoza mapping Quick, Draw! facial features to your own face in front of a webcam


See the process video here of Deborah Schmidt packing QuickDraw data into letters using OpenFrameworks
Some handy tools have also been released by the community since all this data became available, and one that we’re releasing now is a Polymer component that allows you to display a doodle in your web-based project with one line of markup:

The Polymer component is coupled with a Data API that sits on top of a massive file directory (50 million files) and returns a JSON object or an HTML canvas rendering for each drawing. Without downloading all the data, you can start prototyping your ideas right away. We’ve also provided instructions for how to host the data and API yourself on Google Cloud Platform (for more serious projects that demand a higher request limit).
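As a quick illustration, a call to the hosted Data API from Python might look something like the sketch below. The endpoint path, query parameters, and response fields here are assumptions for illustration only; check the GitHub repo for the actual interface:

import json
import urllib.request

API_KEY = "YOUR_API_KEY"  # obtained by joining the Google Group described below

# Hypothetical endpoint: fetch one recognized "cat" drawing as JSON.
url = ("https://quickdrawfiles.appspot.com/drawing/cat"
       "?format=json&key=" + API_KEY)

with urllib.request.urlopen(url) as response:
    drawing = json.loads(response.read())

# Each drawing is a list of strokes; each stroke holds arrays of x and y points.
print(len(drawing["drawing"]), "strokes")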

One really handy tool when hosting an API on Google Cloud is Cloud Endpoints.  It allowed us to launch a demo API with a quota limit and authentication via an API key.  

By defining an OpenAPI specification (here is the Quick, Draw! Data API spec) and adding these three lines to our app.yaml file, an Extensible Service Proxy (ESP) gets deployed with our API backend code (more instructions here):
endpoints_api_service:
  # The service name must match the "host" field in the OpenAPI spec.
  name: quickdrawfiles.appspot.com
  # Let Endpoints manage configuration rollouts automatically on deploy.
  rollout_strategy: managed
Based on the OpenAPI spec, documentation is also automatically generated for you:


We used a public Google Group as an access control list, so anyone who joins can then have the API available in their API library.
The Google Group used as an Access Control List
This component and Data API will make it easier for creatives out there to manipulate the data for their own research. Looking to the future, a potential next step for the project could be to store everything in a single database for more complex queries (e.g., “give me a recognized drawing from China in March 2017”). Feedback is always welcome, and we hope this inspires even more types of projects using the data! More details on the project, and the incredible research projects done using it, can be found on our GitHub repo.

By Nick Jonas, Creative Technologist, Creative Lab

Editor's Note: Some may notice that this isn’t the only dataset we’ve open sourced recently! You can find many more datasets in our open source project directory.

Google Summer of Code: 15 years strong!

Google Open Source is proud to announce Google Summer of Code (GSoC) 2019 – the 15th year of the program! We look forward to introducing the 15th batch of student developers to the world of open source and matching them with open source projects.

Over the last 14 years GSoC has provided over 14,000 university students from 109 countries with an opportunity to hone their skills by contributing to open source projects during their summer break. Participants gain invaluable experience working directly with mentors on open source projects, and earn a stipend upon successful completion of their project.

We’re excited to keep the tradition going! Applications for interested open source project organizations open on January 15, 2019, and student applications open March 25.

Are you an open source project interested in learning more? Visit the program site and read the mentor guide to learn more about what it means to be a mentor organization and how to prepare your community and your application. We welcome all types of organizations – large and small – and are eager to involve first time projects. Each year, about 20% of the organizations we accept are completely new to GSoC.

Are you a university student keen on learning about how to prepare for the 2019 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. You should read the student guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in March. You can also get inspired by checking out the 200+ organizations that participated in Google Summer of Code 2018 as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

By Stephanie Taylor, GSoC Program Lead

Google Code-in 2018 contest for teenagers begins today

Today marks the start of the 9th consecutive year of Google Code-in (GCI). This is the biggest and best contest ever and we hope you’ll join us for the fun!

What is Google Code-in?

Our global, online contest introducing students to open source development. The contest runs for 7 weeks until December 12, 2018.

Who can register?

Pre-university students ages 13-17 who have their parent or guardian’s permission to register for the contest.

How do students register and participate?

Students can register for the contest beginning today at g.co/gci. Once students have registered and the parental consent form has been submitted and approved by Program Administrators, students can choose which contest “task” they want to work on first. Students choose the tasks they find interesting from a list of thousands of available tasks created by the 27 participating open source organizations. Tasks take an average of 3-5 hours to complete. There are even beginner tasks that are a wonderful way for students to get started in the contest.

The task categories are:
  • Coding
  • Design
  • Documentation/Training
  • Outreach/Research
  • Quality Assurance

Why should students participate?

Students not only have the opportunity to work on a real open source software project, thus gaining invaluable skills and experience, but they also have the opportunity to be a part of the open source community. Mentors are readily available to help answer their questions while they work through the tasks.

Google Code-in is a contest, so there are prizes! Complete one task and receive a digital certificate; complete three tasks and you’ll also get a fun Google t-shirt. Finalists earn the coveted hoodie. Grand Prize Winners (2 from each organization) will receive a trip to Google headquarters in California!

Details

Over the last 8 years, more than 8,100 students from 107 countries have successfully completed over 40,000 tasks in GCI. Curious? Learn more about GCI by checking out the Contest Rules and FAQs. And please visit our contest site and read the Getting Started Guide.

Teachers, if you are interested in getting your students involved in Google Code-in we have resources available to help you get started.

By Stephanie Taylor, Google Open Source

Building great open source documentation

When developers use, choose, and contribute to open source software, effective documentation can make all the difference between a positive experience and a negative one.

In fact, a 2017 GitHub survey found that “Incomplete or Confusing Documentation” is the top complaint about open source software. As TechRepublic put it, “Ask a developer what her primary gripes with open source are, and documentation gets top billing, and by a wide margin.”

Technical writers across Google are addressing this issue, starting with open source projects in Google Cloud. We'll be sharing what we learn along the way and are excited to offer this brief guide as a starting point.

Open source software documentation

Providing effective documentation for open source software can build strong, inclusive communities and increase the usage of your product. The same factors that encourage collaborative development in open source projects can produce equally positive results for documentation. Open source software also provides unique ways to create effective documentation.

When software is open sourced, users are regarded as contributors and can access the source code and the documentation. They’re encouraged to submit additions, fix code, report bugs, and update documentation. Having more contributors can increase the rate at which software and documentation evolve.


The best way to accelerate software adoption is to describe its benefits and demonstrate how to use it. The benefits of timely, effective, and accurate content are numerous. Documentation can save enough time and money to pay for itself. It:
  • Helps to create inclusive communities
  • Makes for a better software product
  • Promotes product adoption 
  • Reduces cost of ownership
  • Reduces the user’s learning curve
  • Makes for happy users
  • Improves the user interface
When documentation is inadequate, the opposite can occur. Incorrect, old, or missing documentation can:
  • Waste time
  • Cause errors and destroy data
  • Turn away customers
  • Increase support costs
  • Shorten a product’s life span

Why supplying effective documentation can be such a challenge

Useful documentation takes time to write and must be updated with the software. Did the installation process change? Are there better ways to configure performance? Was the user interface modified? Were new features introduced? Updates like these must be explained to new and existing customers.

It’s not enough to add words to a file and call it documentation.

How many times have you tried to use documentation that was five years out-of-date? How long did it take you to find another solution? (Not long, right?) Using old documentation can be like hiking an overgrown trail. The prospects of rogue branches, poison ivy, and getting lost suggest you are unlikely to emerge unscathed.

At first blush, writing documentation may seem optional. However, as a developer, you can be so close to your product that you take its features and purpose for granted. Your customers don’t know what you know, or how to apply what you know to address their business challenges. And time is money.

Remember the swimmer who got halfway across the English Channel only to say, “I’m tired,” before turning around and swimming back to shore? Ignoring the need for documentation is like that. Developing a software product gets you partway to your goal; helpful documentation takes you the rest of the way.

Momentum matters a lot. If you can settle into a rhythm of implementing new features, fixing bugs in the code, as well as writing and updating documentation, you can propel yourself to success.

Create an inclusive open source community

In the same GitHub survey referenced above, 95% of the respondents were men, but evidence suggests that clear documentation:
  • Can contribute to inclusive communities
  • Is more highly valued by those groups who are typically under-represented
Documentation that effectively explains a project’s processes, such as contribution guides and codes of conduct, is valued more by groups that are underrepresented in the open source community, such as women.

Factors that can lead to ineffective documentation

Conditions like these almost always compromise the quality of documentation:
  • Belief that it’s enough for open source software to just work.
  • Notion that a specification is as good as instructional text.
  • Idea that casual unreviewed contributions are sufficient.
  • Belief that unscheduled and unspecified software updates are acceptable.
  • Minimal to no style guidelines.

Conquer your documentation challenges

Use these best practices to provide helpful and timely documentation:

1. Identify common terminology

Define and influence the usage and adoption of terminology for your open source project. Use the same terminology in the guides and in the product. Involving a writer early in product development can lead to a natural synergy between the user interface, guides, and training materials.

With clear definitions of terms and consistent usage in the documentation, you can teach your community to speak a common language. As a result, everyone in your product’s ecosystem can communicate more effectively with each other.

2. Provide contribution guidelines

Opensource.com describes exactly why contribution guidelines are so important. Consider how Kubernetes describes the types of documentation contributions your users can make and how to make them.

3. Create a documentation template

To make sure your contributors provide details in a consistent format, consider providing a document template (a sketch follows the list below) to capture details for common topics such as:
  • Overview
  • Prerequisites
  • Procedural steps
  • What’s next
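A skeletal version of such a template might look like this (a sketch; adapt the headings to your project):

Overview
  One or two sentences on what the feature does and who should use it.
Prerequisites
  Everything the reader must install, configure, or know before starting.
Procedural steps
  1. Numbered, testable steps, with one action per step.
  2. The expected output or result after key steps.
What’s next
  Links to related tasks, reference material, and troubleshooting.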

4. Document new and updated features

When a feature is added or updated, ask that it be documented. You can even provide guidelines to capture key information. Initially, some may think it cumbersome to require that instructions be provided early in the development process. However, documentation is like testing: nobody really wants to do it, but things work so much better when you do. Sufficient testing and teaching are good for quality and momentum.

Code reviewers and maintainers of open source software have power. Code reviewers can (and should) withhold approval until documentation is sufficient.

Remember, not all changes require documentation updates. Here’s a good rule of thumb:

If an update to a project will require users to change their behavior, then documentation updates may be required.

If not, how will your customers find the new feature you worked so hard to implement? Said another way, if a change doesn’t require tests, it probably doesn’t require docs, either. Use your best judgment. For example, code refactoring and experimental tweaks need not be tested or documented.

As always, simplify and automate this process as much as possible. At Google, teams can enforce a presubmit check that looks for a flag indicating a doc update isn’t necessary (presubmit checks for style issues can prevent a lot of arguments, too). We also allow owners of a file to submit changes without a review.
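A minimal version of such a check might look like the sketch below; the flag name, branch, and path conventions here are hypothetical and would need to match your project:

import subprocess
import sys

SKIP_FLAG = "NO_DOC_CHANGE"  # hypothetical opt-out flag for the commit message

def main():
    # List the files changed on this branch relative to main.
    changed = subprocess.run(
        ["git", "diff", "--name-only", "origin/main...HEAD"],
        capture_output=True, text=True, check=True).stdout.splitlines()
    # Read the latest commit message to look for the opt-out flag.
    message = subprocess.run(
        ["git", "log", "-1", "--pretty=%B"],
        capture_output=True, text=True, check=True).stdout

    touches_code = any(f.endswith((".py", ".go", ".java")) for f in changed)
    touches_docs = any(f.startswith("docs/") or f.endswith(".md") for f in changed)

    if touches_code and not touches_docs and SKIP_FLAG not in message:
        print("Code changed without a docs update. Update the docs or add "
              + SKIP_FLAG + " to the commit message to explain why not.")
        sys.exit(1)

if __name__ == "__main__":
    main()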

If your team balks at this requirement, remind them that simple project documentation is about sharing the information you have in your head so that many others can access it later without bothering you.

Documentation updates aren't typically onerous! The size of your documentation change scales with the size of your pull request (PR). If your PR contains a thousand lines of code, you may need to write a few hundred lines of documentation. If the PR contains a one-line change, you may need to change a word or two.

Finally, remember that documentation needn’t be perfect, but instead fit for use. What's most important is that key information is clearly conveyed.

5. Conduct regular freshness reviews

At least every quarter, review and update your content. Many hands make for many voices, many typos, and many inconsistencies. Don’t let content persist unchecked for years without periodically confirming it’s still useful.

In conclusion, success breeds success. By effectively documenting open source software, everybody wins.

We hope you'll put this guidance to work and help your open source project become even more successful! We'd love to hear from you if you do, or if you have questions or useful advice to share.

By Janet Davies, Google OpenDocs