Author Archives: Open Source Programs Office

Announcing the 2020 first quarter Google Open Source Peer Bonus winners

We are very pleased to announce the latest Google Open Source Peer Bonus winners and their projects.

The Google Open Source Peer Bonus rewards external open source contributors nominated by Googlers for their exceptional contributions to open source. Historically, the program focused primarily on rewarding developers. Over the years it has evolved to reward not just software engineers but all types of contributors, including technical writers, user experience and graphic designers, community managers and marketers, mentors and educators, and ops and security experts.

In support of diversity, equity, and inclusion initiatives worldwide, we decided to devote this cycle to amazing women in open source, especially since it coincided with International Women’s Day on March 8. We are very excited and pleased to share the following statistics with you.

We have 56 winners this cycle representing 17 countries all over the world: Australia, Belgium, Canada, Estonia, France, Germany, India, Italy, Japan, Republic of Korea, Netherlands, Russia, Sweden, Switzerland, Ukraine, United Kingdom, and the United States.

Even though the cycle was open to ALL contributors, the share of female nominees rose from 8% to 25% compared with the previous cycle. That’s an amazing number celebrating amazing women!

Also, we are very pleased to see the number of docs contributors increase from 7% to 15%. Documentation is the #1 factor for project adoption, so this shift is very important and encouraging. To strengthen this trend and emphasize the importance of documentation in open source, the next cycle will be devoted (but not limited!) to docs contributors.

Below is the list of current winners who gave us permission to thank them publicly:
Winner – Project
Matt Mower – AMP HTML
Sergey Zakharov – Android Open Source Project
Pawel Kozlowski – Angular
Jakob Homan – Apache Airflow, Apache Kafka, Apache Hadoop
Chad Dombrova – Apache Beam
Myrle Krantz – Apache Software Foundation - Diversity and Inclusion committee + board
Katia Rojas – Apache Software Foundation Outreachy Program
Greg Hesp – assistant-relay
Beka Westberg – Blockly
Siebrand Mazeland – Blockly Games
Dave Mielke – BRLTTY
Vijay Hiremath – Chromium; platform/ec
Daniel Stenberg – curl / libcurl
Simon Binder – Dart build system
Aloïs Deniel – device_preview
Fatima Sarah Khalid – Drupal
Gregory Popovitch – Filament
Amr Yousef – Flutter
Remi Rousselet – Flutter
Pooja Bhaumik – Flutter
Elijah Newren – Git
Roger Peppe – Go
Oleksandr Porunov – JanusGraph
Tim Bannister – Kubernetes
June Yi – Kubernetes
Karen Bradshaw – Kubernetes
James Le Cuirot – leptonica
Stefan Weil – leptonica
Egor Pugin – leptonica
Bert Frees – LibLouis
Christian Egli – LibLouis
Richard Hughes – Linux Vendor Firmware Service (LVFS)
James (purpleidea) – mgmt
Mike Ryan – NgRx
Stefano Bonicatti – osquery
Alyssa Rosenzweig – panfrost
Carol Willing – Project Jupyter
Mariatta Wijaya – Python programming language
Alexander Neumann – restic
Nicholas Jamieson – rxjs (core member), rxjs-tslint-rules, rxjs-etc, ts-action
Kate Temkin – Several, mostly educational (see in Reasons)
Alyssa Ross – SpectrumOS / Nix
Rosalind Benoit – Spinnaker
Brian Le – Spinnaker
Vincent Demeester – Tekton
Chmouel Boudjnah – Tekton
Andrea Frittoli – Tekton
Simon Kaegi – Tekton
Cameron Shorter – The Good Docs Project
Ando Saabas – TreeInterpreter
Daz Wilkin – Trillian, Prometheus Exporter for GCP, KeyTransparency, OpenCensus
Gerrit Birkeland – typedoc
Wilson Snyder – Verilator
Thomas Oster – VisiCut
Koen Kanters – zigbee2mqtt
Jia Li – Zone.js
Congratulations to our winners! We look forward to your continued support and contributions to open source!

By Maria Tabak, Google Open Source

A milestone to celebrate: 10 years of GCI!

 
This year we celebrated the best of program milestones—10 years of bringing together 13-17 year old students from around the world into open source software development with our Google Code-in (GCI) contest. The contest wrapped up in January with our largest numbers ever; 3,566 students from 76 countries completed an impressive 20,840 tasks during the 7-week contest!

Students spent their time working online with mentors from 29 open source organizations that provided help to answer questions and guide students throughout the contest. The students wrote code, edited and created documentation, designed UI elements and logos, and conducted research. Additionally, they developed videos to teach others about open source software and found (and fixed!) hundreds of bugs.

Overview

  • 2,605 students completed three or more tasks (earning a Google Code-in 2019 t-shirt)
  • 18.5% of students were girls
  • 79.8% of students were first-time participants in GCI (the same percentage as in 2018; weird!)
  • We saw very large increases in the number of students from Japan, Mongolia, Sri Lanka, and Taiwan.

Student Age


Participating Schools

Students from 1,900 schools (yes, exactly 1,900!) competed in this year’s contest; plus, 273 students were homeschooled. Many students learn about GCI from their friends or teachers and continue to spread the word to their classmates. This year the top five schools that had the most students with completed tasks were:

School Name – Student Participants – Country
Dunman High School – 138 – Singapore
Liceul Teoretic ''Aurel Vlaicu'' – 47 – Romania
Indus E.M High School – 46 – India
Sacred Heart Convent Senior Secondary School – 34 – India
Ananda College – 29 – Sri Lanka

Countries

The chart below displays the top 10 countries with students who completed at least 1 task.

We are thrilled that Google Code-in was so popular this year!

Thank you again to the people who make this program possible: the 895 mentors, from 59 countries, who guided students through the program and welcomed them into their open source communities.

By Stephanie Taylor, Google Open Source

Free Universal Sound Separation

We are happy to announce the release of FUSS: the Free Universal Sound Separation dataset.

Audio recordings often contain a mixture of different sound sources. Universal sound separation is the ability to separate such a mixture into its component sounds, regardless of the types of sound present. Previous sound separation work has focused on separating mixtures of a small number of sound types, such as "speech" versus "nonspeech", or different instances of the same type of sound, such as speaker #1 versus speaker #2. Often in such work, the number of sounds in a mixture is also assumed to be known a priori. The FUSS dataset shifts focus to the more general problem of separating a variable number of arbitrary sounds from one another.

One major hurdle to training models in this domain is that even if you have high-quality recordings of sound mixtures, you can't easily annotate these recordings with ground truth. High-quality simulation is one approach to overcome this limitation. To achieve good results, you need a diverse set of sounds, a realistic room simulator, and code to mix these elements together for realistic, multi-source, multi-class audio with ground truth. With FUSS, we are releasing all three of these.

FUSS relies on Creative Commons licensed audio clips from freesound.org. We filtered these by license type, then using a pre-release of FSD50k [1], further filtered out sounds that aren't separable by humans when mixed together. We were left with about 23 hours of audio, consisting of 12,377 sounds useful for mixing (7,237 train, 2,883 validation, 2,257 eval). Using these clips, we created 20,000 training mixtures, 1,000 validation mixtures, and 1,000 eval mixtures.

We developed our own room simulator, implemented in TensorFlow, which generates the impulse response of a box-shaped room with frequency-dependent reflective properties given a sound source location and a mic location. As part of the dataset release, we provide the pre-calculated room impulse responses used for each audio sample along with mixing code, so the research community can simulate novel audio without running the computationally expensive room simulator. Future work may include releasing the code for our room simulator and extending the simulator capabilities to address more extensive acoustic properties of rooms, materials with different reflective properties, novel room shapes, etc.
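The released mixing code has its own interfaces, but the core idea (convolve each dry source with its precomputed room impulse response, then sum the reverberated sources) can be sketched in a few lines. The function and variable names below are illustrative and not taken from the FUSS repository:

```python
# Illustrative sketch only: reverberate dry sources with precomputed room
# impulse responses (RIRs) and sum them into a mixture with ground truth.
import numpy as np
from scipy.signal import fftconvolve

def make_reverberant_mixture(sources, rirs):
    """sources: list of 1-D float arrays (dry clips at the same sample rate).
    rirs: list of 1-D float arrays, one impulse response per source."""
    reverberated = []
    for dry, rir in zip(sources, rirs):
        wet = fftconvolve(dry, rir, mode="full")[:len(dry)]  # apply the room
        reverberated.append(wet)
    mixture = np.sum(reverberated, axis=0)  # the observed mixed signal
    # The reverberated sources serve as ground-truth targets for separation.
    return mixture, reverberated
```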

Finally, we have released a masking-based separation model, based on an improved time-domain convolutional network (TDCN++), described in our recent publications [2, 3]. On the eval set, this model achieves 12.5 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source mixtures with 37.6 dB absolute SI-SNR.
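For reference, scale-invariant SNR is computed by projecting the estimate onto the reference signal and measuring the energy ratio between that projection and the residual. A minimal NumPy version (not the exact evaluation code used for FUSS) looks like this:

```python
# Minimal sketch of scale-invariant SNR (SI-SNR) in decibels.
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    # Remove means so the measure ignores DC offsets.
    estimate = estimate - np.mean(estimate)
    reference = reference - np.mean(reference)
    # Project the estimate onto the reference (optimal scaling of the target).
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    residual = estimate - target
    return 10 * np.log10((np.sum(target**2) + eps) / (np.sum(residual**2) + eps))
```

The improvement figure (SI-SNRi) is then the SI-SNR of the separated estimate minus the SI-SNR of the unprocessed mixture, both measured against the same reference source.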

Source audio, reverb impulse responses, reverberated mixtures and sources created by the mixing code, and a baseline model checkpoint are available for download. Code for reverberating and mixing the audio data and for training the released model is available on our GitHub page.

The dataset will also be used in the DCASE challenge, as a component of the Sound Event Detection and Separation task. The released model will serve as a baseline for this competition, and a benchmark to demonstrate progress against in future experiments.

Our hope is this dataset will lower the barrier to new research, and particularly will allow for fast iteration and application of novel techniques from other machine learning domains to the sound separation challenge.

By John Hershey, Scott Wisdom, and Hakan Erdogan, Google Research

References:
[1] Eduardo Fonseca, Jordi Pons, Xavier Favory, Frederic Font Corbera, Dmitry Bogdanov, Andrés Ferraro, Sergio Oramas, Alastair Porter, and Xavier Serra. "Freesound Datasets: A Platform for the Creation of Open Audio Datasets." International Society for Music Information Retrieval Conference (ISMIR), pp. 486–493. Suzhou, China, 2017.
[2] Ilya Kavalerov, Scott Wisdom, Hakan Erdogan, Brian Patton, Kevin Wilson, Jonathan Le Roux, and John R. Hershey. "Universal Sound Separation." IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pp. 175-179. New Paltz, NY, USA, 2019.
[3] Efthymios Tzinis, Scott Wisdom, John R. Hershey, Aren Jansen, and Daniel P. W. Ellis. "Improving Universal Sound Separation Using Sound Classification." IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP), 2020.

Code Search for Google open source projects

We are pleased to launch Code Search for Google open source projects. Code Search is one of Google’s most popular internal tools, and now we have a version (same binary, different flags) targeted to open source communities.

Googlers use Code Search every day to help understand the codebase: they search for half-remembered functions and usages; jump through the codebase to figure out what calls the function they are viewing; and try to identify when and why a particular line of code changed.

The Code Search tool gives a rich code browsing experience. For example, the blame button shows which user last changed each line and you can display history on the same page as the file contents. In addition, it supports a powerful search language and, for some repositories, cross-references.

Suggest-as-you-type in any search box annotates suggestions with the type of code object, the repository and the path, helping users find what they want faster.


The search language supports regular expressions and a number of helpful search atoms. For example, a user looking for a function foo in a Go file can search for lang:go function:foo, limiting results to Go files where foo is a function rather than a struct or a word in a comment, instead of sifting through thousands of results that merely contain foo.

One example is finding a file using only part of the name. The query file:KytheURI.java goes directly to the file, since there is only one such file.

See the quick reference for more information.

In addition to text search, some of the open source repositories have cross-references powered by Kythe. Kythe is a Google open source project that includes tools to help understand code. Project owners instrument a build of their repository to output compilation information for Kythe. Kythe tools convert this data to a graph. This graph connects definitions to declarations and code references to the abstract objects they represent (described by a graph schema). Google then runs an internal pipeline that combines these graphs for the different languages, prunes unnecessary pieces, and optimizes the result for serving cross-references. The whole process runs several times per day to keep the data fresh.

Open source communities use a broader set of build systems than Google. In order to support cross-references, Kythe added drop-in support for Bazel, CMake, Maven, and Go. Projects using other build systems can use Kythe-provided wrappers for clang and javac to instrument their builds; these are used by Chromium and Android AOSP to provide compilation information for Kythe.

Because Kythe is based on the build, Kythe cross-references include links to files generated as part of the build process, such as Java files generated for AutoValues (example here) or protos. For repositories where cross-references are enabled, clicking on a symbol will take you to a definition of that symbol.

Clicking on the definition of a symbol will open a cross-reference panel, showing all the places where that symbol is referenced. For example, clicking on toVName below, we can see the places that reference this method. One of the callers is parseVName, and clicking on that shows the callers of that method.



At this time, we only provide search on the repositories listed below, but we plan to add more over time:
  • Angular
  • Bazel (with cross-references)
  • Dart
  • ExoPlayer
  • Firebase SDK
  • Flutter
  • Go (with cross-references)
  • gVisor (with cross-references)
  • Kythe (with cross-references)
  • Nomulus (with cross-references)
  • Outline
  • TensorFlow (with cross-references)
We are also investing in making the application keyboard-navigable and usable with a screen reader.

We hope you find this tool useful!

By Kris Hildrum, Code Search Team

Kpt: Packaging up your Kubernetes configuration with git and YAML since 2014

Kubernetes configuration manifests have become an industry standard for deploying both custom and off-the-shelf applications (as well as for infrastructure). Manifests are combined into bundles to create higher-level deployable systems as well as reusable blueprints (such as a product offering, off the shelf software, or customizable starting point for a new application).

However, most teams lack the expertise or desire to create bespoke bundles of configuration from scratch and instead either 1) fork them from another bundle, or 2) use some packaging solution which generates manifests from code.

Teams quickly discover they need to customize, validate, audit, and re-publish their forked or generated bundles for their environment. Most packaging solutions to date are tightly coupled to some format written as code (e.g. templates, DSLs, etc.). This introduces a number of challenges when trying to extend, build on top of, or integrate them with other systems. For example, how does one update a forked template from upstream, or how does one apply custom validation?

Packaging is the foundation of building reusable components, but it also incurs a productivity tax on the users of those components.

Today we’d like to introduce kpt, an OSS tool for Kubernetes packaging, which uses a standard format to bundle, publish, customize, update, and apply configuration manifests.

Kpt is built around an “as data” architecture: it bundles Kubernetes resource configuration, a format that both humans and machines can read and write. The ability for tools to read and write the package contents using standardized data structures enables powerful new capabilities:
  • Any existing directory in a Git repo with configuration files can be used as a kpt package.
  • Packages can be arbitrarily customized and later pull in updates from upstream by merging them.
  • Tools and automation can perform high-level operations by transforming and validating package data on behalf of users or systems.
  • Organizations can develop their own tools and automation which operate against the package data.
  • Existing tools and automation that work with resource configuration “just work” with kpt.
  • Existing solutions that generate configuration (e.g. from templates or DSLs) can emit kpt packages which enable the above capabilities for them.

Example workflow with kpt

Now that we’ve established the benefits of using kpt for managing your packages of Kubernetes config, let’s walk through how an enterprise might leverage kpt to package, share, and use their best practices for Kubernetes across the organization.


First, a team within the organization may build and contribute to a repository of best practices (pictured in blue) for managing a certain type of application, for example a microservice (called “app”). As the best practices are developed within an organization, downstream teams will want to consume and modify configuration blueprints based on them. These blueprints provide a blessed starting point which adheres to organization policies and conventions.

The downstream team will get their own copy of a package by downloading it to their local filesystem (pictured in red) using kpt pkg get. This clones the git subdirectory, recording upstream metadata so that it can be updated later.

They may decide to update the number of replicas to fit their scaling requirements or may need to alter part of the image field to be the image name for their app. They can directly modify the configuration using a text editor (as would be done before). Alternatively, the package may define setters, allowing fields to be set programmatically using kpt cfg set. Setters streamline workflows by providing user and automation friendly commands to perform common operations.

Once the modifications have been made to the local filesystem, the team will commit and push their package to an app repository owned by them. From there, a CI/CD pipeline will kick off and the deployment process will begin. As a final customization before the package is deployed to the cluster, the CI/CD pipeline will inject the digest of the image it just built into the image field (using kpt cfg set). When the image digest has been set, the CI/CD pipeline can send the manifests to the cluster using kpt live apply. Kpt live operates like kubectl apply, providing additional functionality to prune resources deleted from the configuration and block on rollout completion (reporting status of the rollout back to the user).

Now that we’ve walked through how you might use kpt in your organization, we’d love it if you’d try it out, read the docs, or contribute.

One more thing

There’s still a lot to the story we didn’t cover here. Expect to hear more from us about:
  • Using kpt with GitOps
  • Building custom logic with functions
  • Writing effective blueprints with kpt and kustomize
By Phillip Wittrock, Software Engineer and Vic Iglesias, Cloud Solutions Architect

OpenTelemetry is now beta!

OpenTelemetry and OpenCensus have been a critical part of our goal of making platforms like Kubernetes more observable and more manageable. This has been a multi-year journey for us, from creating OpenCensus and growing it into a core part of major web services’ observability stack, to our announcement of OpenTelemetry last year and the rapid growth of the OpenTelemetry community.

Beta is a big milestone for OpenTelemetry, as developers can now use the SDKs, integrations, and Collector to capture distributed traces and metrics from their applications and send them to backends like Prometheus, Jaeger, Cloud Monitoring, Cloud Trace, and others for analysis. This is a great time to try out OpenTelemetry and get involved in the observability community, whether you’re looking to improve your visibility into production services, to give your users performance data from client libraries that you maintain, or to join a rapidly-growing open source project!

To learn more, please read our official community announcement, which is copied below:

Co-authored by maintainers, community contributors, and members of the OpenTelemetry governance committee.

OpenTelemetry has just begun its first wave of beta releases, starting with the Collector and the Erlang, Go, Java, JavaScript, and Python SDKs, followed by the .Net SDK and Java auto-instrumentation agent. This means that you can begin integrating OpenTelemetry into your applications and client libraries to capture app-level metrics and distributed traces.

If you’re not already familiar with OpenTelemetry, the project provides a single set of language-specific APIs, SDKs, agents, and other components that you can use to collect distributed traces, metrics, and related metadata from your applications. In addition to its core capabilities, much of OpenTelemetry’s utility comes from integrations for HTTP and RPC libraries, storage clients, etc. that allow developers to capture critical observability data from their applications with almost zero effort. After capturing these signals, each OpenTelemetry component can export them to your backends of choice, including Prometheus, Jaeger, Zipkin, Azure Monitor, Dynatrace, Google Cloud Monitoring + Trace, Lightstep, New Relic, and Splunk.
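As a taste of what integrating looks like, here is a minimal tracing sketch with the OpenTelemetry Python SDK. Exact module and class names have shifted between releases (for example, the simple span processor was renamed after the early betas), so treat this as illustrative rather than authoritative:

```python
# Minimal tracing sketch with the OpenTelemetry Python SDK (illustrative;
# check your release, as some class names differed in early betas).
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import ConsoleSpanExporter, SimpleSpanProcessor

# Install an SDK TracerProvider that prints finished spans to the console.
trace.set_tracer_provider(TracerProvider())
trace.get_tracer_provider().add_span_processor(
    SimpleSpanProcessor(ConsoleSpanExporter()))

tracer = trace.get_tracer(__name__)

# Create a parent span with a nested child span around application work.
with tracer.start_as_current_span("handle-request"):
    with tracer.start_as_current_span("query-database"):
        pass  # application work goes here
```

Swapping the console exporter for one of the backend-specific exporters sends the same spans to Prometheus-compatible metrics backends, Jaeger, Zipkin, and the other systems listed above.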

This first beta release includes:
  • APIs and SDKs for Erlang, Go, Java, JavaScript, and Python, which include the interfaces and implementations that you need to define and create distributed traces and metrics, manage sampling and context propagation, etc. The .Net API + SDK will follow shortly.
  • Language-specific API integrations for at least one popular HTTP framework, gRPC, and at least one popular storage client, which can be enabled with one line of code, and will automatically capture relevant traces and metrics and handle context propagation.
  • Language-specific exporters that allow SDKs to send captured traces and metrics to any supported backends.
  • The OpenTelemetry Collector, which can receive data from OpenTelemetry SDKs and other sources, and then export this telemetry to any supported backend.
  • Auto-Instrumentation for Java that captures telemetry from 47 Java libraries and frameworks without requiring any modification to your application.
  • Documentation for each component including getting started guides.
As these and subsequent OpenTelemetry components enter beta (requirements and release plan), we are declaring that they are ready to start integrating with. This means that service developers can begin to include OpenTelemetry in their applications and that maintainers of storage, RPC, etc. clients should start testing the OpenTelemetry APIs to provide better observability to their users.

However, this does come with some caveats:
  • Each OpenTelemetry component will likely undergo several beta releases in the coming weeks — this is simply the first.
  • While functional, beta components have not gone through thorough testing or benchmarking and they are not intended for production workloads.
  • While we aim to avoid any major changes to the OpenTelemetry APIs between beta and GA release candidates, we cannot guarantee that there will not be any changes during this period.
  • Some functionality is still missing from the first beta and will be added in subsequent releases; this is documented in each component’s GitHub repository.
In the coming weeks, you can expect additional beta releases from the first wave of OpenTelemetry components and others. In particular, we expect the API + SDK for .Net and the Java auto-instrumentation agent to be ready soon. Eventually, components will reach a level of maturity and testing where we’ll feel confident in naming them a release candidate (RC), after which we will not make any breaking changes to the APIs for that component.

This beta milestone is a huge accomplishment for the OpenTelemetry community, and every contributor should be proud of the fact that OpenTelemetry is now working and ready to integrate with. This is a great opportunity for the maintainers of client libraries to begin integrating with the OpenTelemetry APIs, for end-users to start integrating it into their services, and for anyone interested in contributing to join our rapidly growing community by joining our mailing lists, Gitter chats, and the monthly community meeting!

By Morgan McLean, Product Manager

Semantic Reactor: A tool for experimenting with NLU models

Companies are using natural language understanding (NLU) to create digital personal assistants, customer service bots, and semantic search engines for reviews, forums and the news.

However, the perception that using NLU and machine learning is costly and time-consuming prevents a lot of potential users from exploring its benefits.

To dispel some of the intimidation of using NLU, and to demonstrate how it can be easily used with pre-trained, generic models, we have released a tool, the Semantic Reactor, and open-sourced example code, The Mystery of the Three Bots.

The Semantic Reactor

The Semantic Reactor is a Google Sheets Add-On that allows the user to sort lines of text in a sheet using a variety of machine-learning models. It is released as a whitelisted experiment, so if you would like to check it out, fill out this application at the Google Cloud AI Workshop. Once approved, you’ll be emailed instructions on how to install it.

The tool offers ranking methods that determine how the list will be sorted. With the semantic similarity method, the lines more similar in meaning to the input will be ranked higher.



With the input-response method, the lines that are the most appropriate conversational responses are ranked higher.

Why use the Semantic Reactor?

There are a lot of interesting things you can do with the Semantic Reactor, but let’s look at the following two:
  • Writing dialogue for a bot that exists within a well-defined environment and has a clear purpose (like a customer service bot) using semantic similarity.
  • Searching within large collections of text, like from a message board. For that, we will use input-response.

Writing Dialogue for a Bot Using Semantic Similarity

For the sake of an example, let’s say you are writing dialogue for a bot that answers questions about a product, in this case, cookies.

If you’ve been running a cookie hotline for a while, you probably can list the most common cookie questions. With that data, you can create your cookie bot. Start by opening a Google Sheet and writing the common questions and answers (questions in the A column, answers in the B).

Here is the start of what that Sheet might look like. Make a copy of the Sheet, which will allow you to use the Semantic Reactor Add-on. Use the tool to experiment with new QA pairs and how each model reacts to them.

Here are a few queries to try, using the semantic similarity rank method:

Query: What are cookie ingredients?
Returns: What are cookies made of?

Query: Are cookies biscuits?
Returns: Are cookies also called biscuits?

Query: What should I serve with cookies?
Returns: What drinks go well with cookies?
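Under the hood, matches like these come from comparing sentence embeddings. The following sketch (not the Semantic Reactor's own code) ranks the same candidate lines against a query using the publicly available Universal Sentence Encoder from TensorFlow Hub and cosine similarity:

```python
# Illustrative semantic-similarity ranking with the Universal Sentence Encoder.
import numpy as np
import tensorflow_hub as hub

encoder = hub.load("https://tfhub.dev/google/universal-sentence-encoder/4")

query = "What are cookie ingredients?"
candidates = [
    "What are cookies made of?",
    "Are cookies also called biscuits?",
    "What drinks go well with cookies?",
]

# Embed everything, normalize, and score candidates by cosine similarity.
embeddings = encoder([query] + candidates).numpy()
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)
scores = embeddings[1:] @ embeddings[0]
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {candidates[i]}")
```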



Of course, that small list of responses won’t cover many of the questions people will ask your cookie bot. What the Reactor allows you to do is quickly add new QA pairs as you learn about what your users want to ask.

For example, maybe people are asking a lot about cookie calories.

You’d write the new question in column A, and the new answer in column B, and then test a few different phrasings with the Reactor. You might need to tweak the target response a few times to make sure it matches a wide variety of phrasings. You should also experiment with the three different models to see which one performs the best.

For instance, let’s say the new target question you want the model to match to is: “How many calories does a typical cookie have?”

That question might be phrased by users as:
  • Are cookies caloric?
  • A lot of calories in a cookie?
  • Will cookies wreck my diet?
  • Are cookies fattening?


The more you test with live users, the more you’ll find that they phrase their questions in ways you don’t expect. As with all things based on machine learning, constant data refreshes, testing, and improvement are all part of the process.

Searching Through Text Using Input-Response

Sometimes you can’t anticipate what users are going to ask, and sometimes you might be dealing with a lot of potential responses, maybe thousands. In cases like that, you should use the input-response ranking method. That means the model will examine the list of potential responses and then rank each one according to what it thinks is the most likely response.

Here is a Sheet containing a list of simple conversational responses. Using the input-response ranking method, try a few generic conversational openers like “Hello” or “How’s it going?”

Note that in input-response mode, the model is predicting the most likely conversational response to an input and not the most semantically similar response.

Note that “Hello,” in input-response mode, returns “Nice to meet you.” In semantic similarity mode, “Hello” returns what the model thinks is semantically closest to “Hello,” which is “What’s up?”
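The same distinction can be reproduced outside the Sheet. The multilingual question/answer variant of the Universal Sentence Encoder exposes separate question and response encoders, and ranking candidate responses by their dot product with an encoded input approximates the input-response method. This is an illustrative sketch, not the add-on's internal code:

```python
# Illustrative input-response ranking with the multilingual QA encoder.
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text  # noqa: F401  (registers ops the multilingual model needs)

module = hub.load(
    "https://tfhub.dev/google/universal-sentence-encoder-multilingual-qa/3")

query = "Hello"
responses = ["Nice to meet you.", "What's up?", "See you later."]

# Encode the input with the question encoder and the candidates with the
# response encoder (which also accepts optional surrounding context).
q_emb = module.signatures["question_encoder"](tf.constant([query]))["outputs"]
r_emb = module.signatures["response_encoder"](
    input=tf.constant(responses),
    context=tf.constant(responses))["outputs"]

scores = np.inner(q_emb.numpy(), r_emb.numpy())[0]
for i in np.argsort(-scores):
    print(f"{scores[i]:.3f}  {responses[i]}")
```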

Now try your own! Add potential responses. Switch between the models and ranking methods to see how it changes the results (be sure to hit the “reload” button every time you add new responses).

Example Code

One of the models available on TensorFlow Hub is the Universal Sentence Encoder Lite. It’s only 1.6MB and is suitable for use within websites and on-device applications.

An open-sourced sample game that uses USE Lite is Mystery of the Three Bots on GitHub. It’s a simple demonstration that shows how you can use a small semantic ML model to drive conversations with game characters. The corpora the game uses were created and tested using the Semantic Reactor.

You can play a running version of the game here. You can experiment with the corpora of two of the characters, the Maid and the Butler, contained within this Sheet. Be sure to make a copy of the Sheet so you can edit and add new QA pairs.

Where To Get The Models Used Within The Semantic Reactor

All of the models used in the Semantic Reactor are published and available online.
  • Local – Minified TensorFlow.js version of the Universal Sentence Encoder.
  • Basic Online – Basic version of the Universal Sentence Encoder.
  • Multilingual Online – Universal Sentence Encoder trained on question/answer pairs in 16 languages.

Final Thoughts

These language models are far from perfect. They use their training to give a best estimate on what to return based on the list of responses you gave it. Machine learning is about calculation, prediction, and training. Models can be improved over time with more data and tuning, and in turn, be made more accurate.

Also, because conversational models are trained on dialogue between people, and because people are biased, the models will display biases that exist in the data that they were trained on, sometimes in ways you can’t predict. For more on model bias, and more detail about how these models were trained, see the Semantic Experiences for Developers page.

By Ben Pietrzak, Steve Pucci, Aaron Cohen — Google AI  

A Season of Docs story

Lack of clear and reliable documentation is one of the main shortcomings of many open source projects. Last year, Google set out to help change that by announcing the first ever Season of Docs.

Season of Docs is an initiative that brings together technical writers and open source projects to collaborate for a few months, benefitting both the communities and writers.

This is the story of Audrey Tavares, one of the writers who signed up for Season of Docs.

Turning incipient curiosity into an opportunity

In 2019, Audrey was completing the Technical and Professional Communication program at Glendon College, exploring technical writing out of curiosity. One of Google’s technical writers, Nicola Yap, completed the same program and visited Audrey’s class in March to talk about her career. It was an enlightening experience, showing technical writing as an attractive alternative with plenty of opportunities, and introducing Audrey to Season of Docs.

For Audrey, this experience meant stepping into unknown territory—she knew nothing about open source software. Naturally, the first step was to familiarize herself with the communities and understand the software development paradigm. After spending time learning, she submitted her Technical Writer application—which was accepted—and was assigned to Oppia, an online educational platform.

Main challenges

Audrey had two mentors to help her on her journey: one in India and the other in the United States. As you can imagine, this revealed the first challenge: time zones. The first few days were stressful, as navigating schedules across time zones was a daunting task, but with a little work they soon came up with an arrangement that worked for everyone.

The second challenge was learning the tools. For most of us, writing a document involves opening a word processor and typing some text. However, as Audrey was about to find out, things are a bit more intricate when it comes to documenting code.

When presented with the choice of a documentation tool set, Audrey decided on Write the Docs. It seemed like a very popular tool among open source communities. How hard can it be to use, right? Well, it’s not so much about how difficult it is, but how different it is for someone unfamiliar with a common software development workflow, since it entails learning a few new tools and conventions along the way.

Audrey was not dismayed. She pushed forward and gradually learned these new tools. Both mentors were always available, willing to help, and answered all of her questions. Their mentorship was key to her success.

Every end is a new beginning

After Season of Docs was over, Audrey decided to remain part of the Oppia community, actively contributing to making the platform even better.

The experience allowed Audrey to walk away from Season of Docs with a new set of technical skills, communication skills with software engineers, an extended professional network, and a new item in her résumé. She now works as a technical writer for a software company in Toronto.

Applications for Season of Docs 2020 start on April 13 for open source organizations and on May 11 for technical writers. Check the official announcement to learn how to participate.

By Geri Ochoa, Google Cloud

Announcing Season of Docs 2020

Google Open Source is delighted to announce Season of Docs 2020!

Season of Docs brings technical writers and open source projects together for a few months to work on open source documentation. 2019 was the first year of Season of Docs, bringing together open source organizations and technical writers to create 44 successful documentation projects!

Docs are key to open source success

Survey after survey shows the importance of good documentation in how developers choose and use open source:
  • 72% of surveyed developers say “Established policies and documentation” is a key decision factor when choosing open source
  • 93% of surveyed developers say “Incomplete or outdated documentation is a pervasive problem” in open source
  • “Lack of documentation” was the top reason developers gave for deciding against using an open source project
Open source communities know this, and still struggle to produce good documentation. Why? Because creating documentation is hard. But...

There are people who know how to do docs well. Technical writers know how to structure a documentation site so that people can find and understand the content. They know how to write docs that fit the needs of their audience. Technical writers can also help optimize a community’s processes for open source contribution and onboarding new contributors.

Season of Docs brings open source projects and technical writers together with the shared goal of creating great documentation. The writers bring their expertise to the projects, and the project mentors help the technical writers learn more about open source and new technologies. Communities gain new docs contributors and technical writers gain valuable open source skills.

Together the technical writers and mentors build a new doc set, improve the structure of the existing docs, develop a much-needed tutorial, or improve contribution processes and guides. See more ideas for technical writing projects.

By working together in Season of Docs we raise awareness of open source, docs, and technical writing.

How does it work?

April 13 – May 4: Open source organizations apply to take part in Season of Docs
May 11: Google publishes the list of accepted mentoring organizations, along with their ideas for documentation projects
May 11 – July 9: Technical writers choose the project they’d like to work on and submit their proposals to Season of Docs
August 10: Google announces the accepted technical writer projects
August 11 – September 11: Community bonding: technical writers get to know mentors and the open source community, and refine their projects in collaboration with their mentors
September 11 – December 6: Technical writers work with open source mentors on the accepted projects, and submit their work at the end of the period
January 7, 2021: Google publishes the list of successfully completed projects.
See the timeline for details, including the provision for projects that run longer than three months.

Join us

Explore the Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and get ready to apply!

By Erin McKean, Google Open Source

Google and Binomial partner to open source high quality Basis Universal

Today, Google and Binomial are excited to announce the high quality update to the original Basis Universal release.

Basis Universal allows you to have state-of-the-art web performance with your images, keeping images compressed even on the GPU. Older systems like JPEG and PNG may look small in storage size, but once they hit the GPU they are processed as uncompressed data! The original Basis Universal codec created images that were 6-8 times smaller than JPEG on the GPU while maintaining a similar storage size.
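As an illustrative back-of-the-envelope example (not a measurement from the Basis release): a 1024×1024 image decoded to uncompressed RGBA occupies about 4 MB of GPU memory, while the same image in a typical 4-bits-per-pixel GPU texture format occupies about 0.5 MB, which is where savings of that magnitude come from.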

Today we release a high quality Basis Universal codec that utilizes the highest quality formats modern GPUs support, finally bringing the web up to modern GPU texture standards—with cross platform support. The textures are larger in storage size and GPU compressed size, but are still 3-4 times smaller than sending a JPEG or PNG file to be processed on the GPU, and can transcode to a lower quality format for older GPUs.
Original Image by Erol Ahmed from Unsplash.com
Visual comparison of Basis Universal High Quality

Best of all, we are actively working on standardizing Basis Universal with the Khronos Group.

Since our original release in Summer 2019 we’ve seen widespread adoption of Basis Universal in engines like three.js, Babylon.js, Godot, and more, changing what is possible for people to create on the web. Now that a high quality option is available, we expect to see even more adoption and groundbreaking applications created with it.

Please feel free to join our community on GitHub and check out the full demo there as well. You can also follow standardization efforts via Khronos Group events and forums.

By Stephanie Hurlburt, Binomial and Jamieson Brettle, Chrome Media