Category Archives: Open Source Blog

News about Google’s open source projects and programs

Google’s initiative for more inclusive language in open source projects

Certain terms in open source projects reinforce negative associations and unconscious biases. At Google, we want our language to be inclusive. The Google Open Source Programs Office (OSPO) created and posted a policy for new Google-run projects to remove the terms “slave,” “whitelist,” and “blacklist,” and replace them with more inclusive alternatives, such as “replica,” “allowlist,” and “blocklist.” OSPO required that new projects follow this policy beginning October 2020, and has plans to enforce these changes on more complex, established projects beginning in 2021. 
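As a rough illustration of how a project might audit its own text for the terms named above, here is a minimal Python sketch. The term mapping mirrors the examples in the policy; the function name `find_noninclusive_terms` and the sample text are hypothetical and are not part of any Google tooling.

```python
import re

# Terms and suggested alternatives, taken from the policy examples above.
REPLACEMENTS = {
    "slave": "replica",
    "whitelist": "allowlist",
    "blacklist": "blocklist",
}

# Case-insensitive match anchored at the start of a word, so that
# variants such as "whitelisted" are also reported.
_PATTERN = re.compile(r"\b(" + "|".join(REPLACEMENTS) + r")", re.IGNORECASE)


def find_noninclusive_terms(text):
    """Return (line_number, term, suggestion) tuples for each match in text."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for match in _PATTERN.finditer(line):
            term = match.group(1).lower()
            findings.append((line_number, term, REPLACEMENTS[term]))
    return findings


if __name__ == "__main__":
    sample = "Add the host to the whitelist.\nPromote the slave node.\n"
    for line_number, term, suggestion in find_noninclusive_terms(sample):
        print(f"line {line_number}: consider '{suggestion}' instead of '{term}'")
```

A scan like this only reports candidates. Renaming identifiers in an established project can be a breaking change, which is why the policy treats new and existing projects differently.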


To ensure this policy was implemented in a timely manner, a small team within OSPO and Developer Relations orchestrated tool and policy updates and an open-source specific fix-it, a virtual event where Google engineers dedicate time to fixing a project. The fix-it focused on existing projects and non-breaking changes, but also served as a reminder that inclusivity is an important part of our daily work. Now that the original fix-it is over, the policy remains and the projects continue.

For more information on why inclusive language matters to us, you can check out the Google Developer Documentation Style Guide, which contains a section on word choice with useful, clearer alternatives. Regardless of the phrases used, it is necessary to understand that certain terms reinforce biases and that replacing them is a positive step, both in creating a more welcoming atmosphere for everyone and in being more technically accurate. In short, words matter.


By Erin Balabanian, Open Source Compliance.

Security scorecards for open source projects

When developers or organizations introduce a new open source dependency into their production software, there’s no easy indication of how secure that package is.

Some organizations, including Google, have systems and processes in place that engineers must follow when introducing a new open source dependency, but that process can be tedious, manual, and error-prone. Furthermore, many of these projects and developers are resource-constrained, and security often ends up a low priority on the task list. This leads to critical projects not following security best practices and becoming vulnerable to exploits. These issues are what inspired us to work on a new project called “Scorecards,” announced last week by the Open Source Security Foundation (OpenSSF).

Scorecards is one of the first projects being released under the OpenSSF since its inception in August 2020. The goal of the Scorecards project is to auto-generate a “security score” for open source projects to help users as they decide the trust, risk, and security posture for their use case. Scorecards defines an initial set of evaluation criteria that will be used to generate a scorecard for an open source project in a fully automated way. Every scorecard check is actionable. Some of the evaluation metrics used include a well-defined security policy, a code review process, and continuous test coverage with fuzzing and static code analysis tools. Each security check returns a boolean along with a confidence score. Over time, Google will be improving upon these metrics with community contributions through the OpenSSF.
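To make the shape of a check result concrete, here is a minimal Python sketch of a pass/fail boolean paired with a confidence score. It is not the Scorecards implementation: the `CheckResult` class, the check names, the 0 to 10 confidence scale, and the `failing_checks` helper are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One security check: a pass/fail boolean plus a confidence score."""
    name: str
    passed: bool
    confidence: int  # assumed scale: 0 (no confidence) to 10 (certain)


def failing_checks(results, min_confidence=7):
    """Return names of checks that failed with at least min_confidence."""
    return [
        r.name
        for r in results
        if not r.passed and r.confidence >= min_confidence
    ]


if __name__ == "__main__":
    # Hypothetical results for an imaginary project.
    results = [
        CheckResult("security_policy", passed=True, confidence=10),
        CheckResult("code_review", passed=False, confidence=9),
        CheckResult("fuzzing", passed=False, confidence=3),
    ]
    print(failing_checks(results))
```

Keeping the confidence next to the boolean lets a consumer separate a confident failure from an inconclusive one, instead of treating every failed check as equally reliable.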

Check out the Security Scorecards project on GitHub and provide feedback. This is just the first step of many, and we look forward to continuing to improve open source security with the community.

By Kim Lewandowski, Dan Lorenc, and Abhishek Arya, Google Security team


Releasing the Healthcare Text Annotation Guidelines

The Healthcare Text Annotation Guidelines are blueprints for capturing a structured representation of the medical knowledge stored in digital text. In order to automatically map the textual insights to structured knowledge, the annotations generated using these guidelines are fed into a machine learning algorithm that learns to systematically extract the medical knowledge in the text. We’re pleased to release to the public the Healthcare Text Annotation Guidelines as a standard.

Google Cloud recently launched AutoML Entity Extraction for Healthcare, a low-code tool used to build information extraction models for healthcare applications. There remains a significant execution roadblock on AutoML DIY initiatives caused by the complexity of translating the human cognitive process into machine-readable instructions. Today, this translation occurs thanks to human annotators who annotate text for relevant insights. Yet, training human annotators is a complex endeavor which requires knowledge across fields like linguistics and neuroscience, as well as a good understanding of the business domain. With AutoML, Google wanted to democratize who can build AI. The Healthcare Text Annotation Guidelines are a starting point for annotation projects deployed for healthcare applications.

The guidelines provide a reference for training annotators in addition to explicit blueprints for several healthcare annotation tasks. The annotation guidelines cover the following:
  • The task of medical entity extraction with examples from medical entity types like medications, procedures, and body vitals.
  • Additional tasks with defined examples, such as entity relation annotation and entity attribute annotation. For instance, the guidelines specify how to relate a medical procedure entity to the source medical condition entity, or how to capture the attributes of a medication entity like dosage, frequency, and route of administration.
  • Guidance for annotating an entity’s contextual information like temporal assessment (e.g., current, family history, clinical history), certainty assessment (e.g., unlikely, somewhat likely, likely), and subject (e.g., patient, family member, other).
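
The annotation tasks listed above can be pictured as a small data model. The Python sketch below is one possible representation, not the format defined by the guidelines: the `Entity` and `Relation` classes, their field names, the `annotate` helper, the `performed_for` relation label, and the sample note are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """A text span labeled with a medical entity type and its context."""
    entity_id: str
    entity_type: str            # e.g. "medication", "procedure", "condition"
    start: int                  # character offset where the span begins
    end: int                    # character offset just past the span
    attributes: dict = field(default_factory=dict)  # e.g. dosage, frequency
    temporal: str = "current"   # e.g. current, family history, clinical history
    certainty: str = "likely"   # e.g. unlikely, somewhat likely, likely
    subject: str = "patient"    # e.g. patient, family member, other


@dataclass
class Relation:
    """A directed link between two annotated entities."""
    source_id: str
    target_id: str
    relation_type: str


def annotate(text, phrase, entity_id, entity_type, **context):
    """Build an Entity for the first occurrence of phrase in text."""
    start = text.index(phrase)
    return Entity(entity_id, entity_type, start, start + len(phrase), **context)


if __name__ == "__main__":
    note = ("Appendectomy performed for acute appendicitis. "
            "Ibuprofen 400 mg by mouth twice daily.")
    procedure = annotate(note, "Appendectomy", "e1", "procedure")
    condition = annotate(note, "acute appendicitis", "e2", "condition")
    medication = annotate(
        note, "Ibuprofen", "e3", "medication",
        attributes={"dosage": "400 mg", "frequency": "twice daily",
                    "route": "by mouth"},
    )
    # Relate the procedure to the condition that motivated it.
    relation = Relation(procedure.entity_id, condition.entity_id, "performed_for")
    print(medication)
    print(relation)
```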

Google consulted with industry experts and academic institutions in the process of assembling the Healthcare Text Annotation Guidelines. We took inspiration from other open source and research projects like i2b2 and added context to the guidelines to support information extraction needs for industry applications like Healthcare Effectiveness Data and Information Set (HEDIS) quality reporting. The data types contained in the Healthcare Text Annotation Guidelines are a common denominator across information extraction applications. Each industry application can have additional information extraction needs that are not captured in the current version of the guidelines. We chose to open source this asset so the community can tailor this project to their needs.

We’re thrilled to open source this project. We hope the community will contribute to the refinement and expansion of the Healthcare Text Annotation Guidelines, so they mirror the ever-evolving nature of healthcare.

By Andreea Bodnari, Product Manager and Mikhail Begun, Program Manager—Google Cloud AI

Peer Bonus Experiences: The many ways in which you can contribute to open source

Recently, I was awarded a Google Open Source Peer Bonus, which I’m grateful for, as it proved to me that one can contribute value to open source projects, and build a career in it, without extensive experience coding. So how can someone with limited coding skills like me contribute to open source in a meaningful way?

Documentation

Documentation is important across open source and especially helpful to those who are new to a project! Developers and maintainers of projects are often focused on fixing bugs and improving the software. Therefore, documentation is harder to prioritize, so contributions to documentation are highly appreciated. Being experienced with applications won’t always help you in writing the documentation, since familiarity can cause you to miss a step when creating the doc. This is why, as a beginner, you are in an excellent position to ensure that instructions and step-by-step guides are easy to follow, don’t skip vital steps, and don’t use off-putting language.

If you have the opportunity to get involved in programs like Season of Docs as a mentor or a participant, as I did in 2019, the experience is hugely rewarding!

Events and Conferences

If you can help with mailing lists or organizing events, you can get involved in the community! In 2006, I became involved with the nascent Open Source Geospatial Foundation (OSGeo), where I was persuaded to set up a local chapter in the United Kingdom (going strong 14 years later!). It was one of the best things I could’ve done. This year we hosted a global conference (FOSS4G) and several UK events, including an online-only event. We’ve also managed to financially support a number of open source projects by providing an annual sponsorship, or by contributing to the funding of a specific improvement. I’ve met so many great people through my involvement in OSGeo, some of whom have become colleagues and good friends.
The group meeting at FOSS4G 2013 in Nottingham

If you’re interested in writing case studies, you can always speak about your experiences at conferences. Evidence that particular packages can be used successfully in real-world situations is incredibly valuable, and can help others put together business cases for considering an open source solution.

Assisting others

Sometimes the problems you face with technology are experienced by many, and by open-sourcing your solution you could be helping a lot of people. When I first started using open source software, the packages I needed were often hard to install and configure on Windows, and had to be started from the command prompt, which can be intimidating for beginners. To scratch a problem-solving itch, I packaged them up onto a USB stick, added some batch files to make them load properly from an external drive, added a little menu for starting them, and Portable GIS was born. After 12 years, a few iterations, a website and a GitLab repository, it has been downloaded thousands of times, and is used in situations such as disaster relief, where installing lots of software rapidly on often old PCs is not really an option.

Mentoring Others

Once you are proficient in something, use your knowledge to help others. Some existing platforms for software use and development (online repositories like GitHub or GitLab) are extremely intimidating to new users, and create barriers to participation. If you can help people get over the fear-inducing first pull request, you will empower them to keep on contributing. My first pull request was a contribution to the Vaguely Rude Place-names map back in 2013, and since then I’ve run a few training events along similar lines at conferences.

Open source is now fundamental to my career, 16 years after learning about it, and something I am truly passionate about. It has shaped my life in many ways. I hope that my experiences might help someone who isn’t versed in code to get involved, realizing that their contributions are as valuable as bug fixes and patches.

By Jo Cook, Astun Technology—Guest Author

Google Summer of Code 2021 will bring some changes

Google Open Source is pleased to announce the 2021 cycle of the Google Summer of Code (GSoC) program, which will be our 17th consecutive year bringing students into open source communities. Over the past 16 years Google Summer of Code has brought over 16,000 student developers from 111 countries into 715 open source organizations big and small.

Some exciting changes are coming to the 2021 GSoC as we make adjustments to add more flexibility into the program for students and mentors alike.
  • With the pandemic straining folks’ time, we are changing the size of the projects and the time students are expected to commit to them. Starting in 2021, students will be focused on a 175-hour project over a 10-week coding period.
  • As students are learning in many different educational formats in 2020, we are opening up the 2021 program to students 18 years and older who:
    1. Are enrolled in post-secondary academic programs (including college, university, master’s program, PhD program and/or undergraduate program, or licensed coding school, etc.) as of May 17, 2021; or
    2. Have graduated from a post-secondary academic program between December 1, 2020 and May 17, 2021.

We’re excited that GSoC will be able to continue to thrive as we welcome more students from around the world into open source in 2021! Applications for interested open source project organizations open on January 29th, and student applications open March 29, 2021.

Does your open source project want to learn more about how to apply to be a mentoring organization? This is a mentorship program so having mentors excited about teaching students how to be a part of your community and ready to guide students is key.

Visit the program site and read the mentor guide to learn more about what it means to be a mentor organization, how to prepare your community (hint: have plenty of enthusiastic mentors!), how to create appropriate project ideas, and how to prepare your application. We welcome all types of organizations, large and small, and are very eager to involve first-time projects. For 2021, we hope to welcome more organizations than ever before and are looking to accept at least 40 into their first GSoC.

Are you a student interested in learning how to prepare for the 2021 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. Read through the student guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in late March. You can also get inspired by checking out the 198 organizations that participated in Google Summer of Code 2020, as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

Please spread the word to your friends as we hope these changes will help more excited folks apply to be students and mentoring organizations in GSoC 2021!

By Stephanie Taylor, Program Manager—Google Open Source

Peer Bonus Experiences: Building tiny models for the ML community with TensorFlow

Almost all the current state-of-the-art machine learning (ML) models take quite a lot of disk space. This makes them particularly inefficient in production situations. A bulky machine learning model can be exposed as a REST API and hosted on cloud services, but that same bulk may lead to hefty infrastructure costs. And some applications may need to operate in low-bandwidth environments, making cloud-hosted models less practical.

In a perfect world, your models would live alongside your application, saving data transfer costs and complying with any regulatory requirements restricting what data can be sent to the cloud. But storing multi-gigabyte models on today’s devices just isn’t practical. The field of on-device ML is dedicated to the development of tools and techniques to produce tiny—yet high performing!—ML models. Progress has been slow, but steady!

There has never been a better time to learn about on-device ML and successfully apply it in your own projects. With frameworks like TensorFlow Lite, you have an exceptional toolset to optimize your bulky models while retaining as much performance as possible. TensorFlow Lite also makes it very easy for mobile application developers to integrate ML models with tools like metadata and ML Model Binding, Android codegen, and others.

What is TensorFlow Lite?

“TensorFlow Lite is a production ready, cross-platform framework for deploying ML on mobile devices and embedded systems.” - TensorFlow YouTube

TensorFlow Lite provides first-class support for native Android and iOS integrations (with many additional features, such as delegates). TensorFlow Lite also supports other tiny computing platforms, such as microcontrollers. TensorFlow Lite’s optimization APIs produce small, fast, and well-performing machine learning models.
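To give a feel for why optimization can shrink a model, here is a minimal sketch of 8-bit affine quantization, one common size-reduction technique, written in plain Python with no TensorFlow dependency. It is not TensorFlow Lite’s implementation: the `quantize` and `dequantize` functions and the sample weights are assumptions made purely for illustration.

```python
import struct


def quantize(weights, num_bits=8):
    """Map floats onto integers in [0, 2**num_bits - 1] with an affine scale."""
    qmax = 2 ** num_bits - 1
    # Widen the range to include 0.0 so that zero maps to an exact integer.
    w_min = min(min(weights), 0.0)
    w_max = max(max(weights), 0.0)
    scale = (w_max - w_min) / qmax or 1.0  # fall back to 1.0 for all-zero input
    zero_point = round(-w_min / scale)
    quantized = [
        max(0, min(qmax, round(w / scale) + zero_point)) for w in weights
    ]
    return quantized, scale, zero_point


def dequantize(quantized, scale, zero_point):
    """Recover approximate float values from the quantized integers."""
    return [(q - zero_point) * scale for q in quantized]


if __name__ == "__main__":
    weights = [-1.2, -0.3, 0.0, 0.45, 0.9]  # made-up example values
    quantized, scale, zero_point = quantize(weights)

    float32_bytes = len(struct.pack(f"{len(weights)}f", *weights))
    int8_bytes = len(bytes(quantized))
    print(f"float32 storage: {float32_bytes} bytes, 8-bit storage: {int8_bytes} bytes")
    print("restored:", dequantize(quantized, scale, zero_point))
```

Each weight is stored in one byte instead of four, at the cost of a small rounding error per value. That is the basic size versus accuracy trade-off that on-device optimization has to manage.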

Venturing into TensorFlow Lite

Last year, I started playing around with TensorFlow Lite while developing projects for Raspberry Pi for Computer Vision, using the official documentation and this course to fuel my initial learning. Following this interest, I decided to join a voluntary working group focused on creating sample applications, writing tutorials, and creating tiny models. This working group consists of individuals from different backgrounds who are passionate about teaching on-device machine learning to others. The group is coordinated by Khanh LeViet (TensorFlow Lite team) and Hoi Lam (Android ML team). This is by far one of the most active working groups I have ever seen. And, back in our starting days, Khanh proposed a few different state-of-the-art machine learning models that were great fits for on-device machine learning.

These ideas were enough for us to start spinning up Jupyter notebooks and VSCode. After months of work, we now have strong collaborations between machine learning GDEs and a bunch of different TensorFlow Lite models, sample applications, and tutorials for the community to learn from. Our collaborations have been fueled by the power of open source and all the tiny models that we have built together are available on TensorFlow Hub. There are numerous open source applications that we have built that demonstrate how to use these models.
The Cartoonizer model cartoonizes uploaded images

Margaret and I co-authored an end-to-end tutorial that was published on the official TensorFlow blog, and we published the TensorFlow Lite models on TensorFlow Hub. So far, the response we have received for this work has been truly mesmerizing. I’ve also shared my experiences with TensorFlow Lite in these blog posts and conference talks:

A Tale of Model Quantization in TF Lite
Plunging into Model Pruning in Deep Learning
A few good stuff in TF Lite
Doing more with TF Lite
Model Optimization 101

The power of collaboration

The working group is a tremendous opportunity for machine learning GDEs, Googlers, and passionate community individuals to collaborate and learn. We get to learn together, create together, and celebrate the joy of teaching others. I am immensely thankful, grateful, and humbled to be a part of this group. Lastly, I would like to wholeheartedly thank Khanh for being a pillar of support to us and for nominating me for the Google Open Source Peer Bonus Award.

By Sayak Paul, PyImageSearch—Guest Author