Category Archives: Open Source Blog

News about Google’s open source projects and programs

Google Summer of Code 2022 mentoring orgs revealed!


After reviewing over 350 mentoring organization applications, we are excited to announce that 203 open source projects have been selected for Google Summer of Code (GSoC) 2022. This year we are welcoming 32 new organizations to mentor GSoC contributors.

Visit our new program site to view the complete list of GSoC 2022 accepted mentoring organizations. You can drill down into the details for each organization on their program page, including reading more about the project ideas they are looking for GSoC contributors to work on this year.

Are you a developer new to open source interested in participating in GSoC?
If you are a new or beginner open source contributor over 18 years old, we welcome you to apply for GSoC 2022! Contributor applications will open on Monday, April 4, 2022 at 18:00 UTC with Tuesday, April 19, 2022 18:00 UTC being the deadline to submit your application (which includes your project proposal).

The most successful applications come from students who start preparing now. We can’t say this enough—if you want to significantly increase your chances of being selected as a 2022 GSoC Contributor, we recommend you to prepare early. Below are some tips for prospective contributors to accomplish before the application period begins in early April:

  • Watch our short videos: What is GSoC? and Being a GSoC Contributor
  • Check out the Contributor/ Student Guide and Advice for Applying to GSoC doc.
  • Review the list of accepted organizations and find two to four that interest you and read through their Project Ideas lists.
  • When you see an idea that piques your interest, reach out to the organization via their preferred communication methods (listed on their org page on the GSoC program site).
  • Talk with the mentors and community to determine if this project idea is something you would enjoy working on during the program. Find a project that motivates you, otherwise it may be a challenging summer for you and your mentor.
  • Use the information you received during your communications with the mentors and other org community members to write up your proposal.
You can find more information about the program on our website which includes a full timeline of important dates. We also highly recommend reading the FAQ and Program Rules and watching some of our other videos with more details about GSoC for contributors and mentors.

A hearty welcome—and thank you—to all of our mentor organizations! We look forward to working with all of you during Google Summer of Code 2022.

By Stephanie Taylor – Google Open Source

The 2022 Season of Docs application for organizations is open!

Organization applications for the 2022 Season of Docs are now open!

Through Season of Docs, Google awards grants to open source projects and organizations to hire technical writers to work on documentation projects. Participating organizations hire and pay the technical writers directly (we use Open Collective to help transfer grant funds). Organizations have up to six months to complete their documentation project. At the end of the program, organizations submit a case study outlining the results of their documentation projects, including the metrics they used to evaluate the success of their new or improved documentation. The case studies from the 2021 Season of Docs program are available online, and we will be releasing a summary report for the 2021 Season of Docs shortly—join our Season of Docs announcements list to be notified when it’s available! 

How does my organization apply to take part in Season of Docs?


Organization applications are now open! The deadline to apply is March 25, 2022 at 18:00 UTC.

To apply, first read the guidelines for creating an organization application on the Season of Docs website.

Take a look at the examples of project ideas, then create a project proposal based on your open source project’s actual documentation needs. Your goal is to attract technical writers to your organization, making them feel comfortable about approaching the organization and excited about what they can achieve.

We strongly recommend reading through the proposals and case studies submitted by organizations participating in the 2021 Season of Docs.

Organizations can submit their applications here: https://goo.gle/3dRyD7P. Organization applications close on March 25th at 18:00 UTC.

How do technical writers take part in Season of Docs?


Technical writers interested in working with accepted open source organizations can share their contact information via the Season of Docs GitHub repository; or they may submit proposals directly to the organizations using the contact information shared on the organization project page. Technical writers do not submit a formal application through Season of Docs.

Technical writers interested in participating in the 2022 Season of Docs should read our guide for technical writers on the Season of Docs website. Please note that technical writer recruiting began on February 3, 2022.

If you have any questions about the program, please email us at [email protected].

General timeline

February 23 - March 25

Open source organizations apply to take part in Season of Docs

April 14

Google publishes the list of accepted organizations, along with their project proposals and doc development can begin.

June 15

Organization administrators begin to submit monthly evaluations to report on the status of their project.

November 30

Organization administrators submit their case study and final project evaluation.

December 14

Google releases submitted case studies. 

May 2, 2023

Organizations begin to participate in post-program followup surveys.


See the timeline for details.

Join us

Explore the Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and apply now!

By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

A New Library for Network Optimization

Networks are all around us from the electrical circuits inside our computers to the multitude of internet servers that route packets of data around the globe. Even the web itself is a network of pages connected to each other by a myriad of blue links.

A network of rubber bands on a wooden pegboard


A network's structure is referred to as its topology. Network topologies can be physical or logical, centralized or decentralized, and fully or partially connected. Given a network with n nodes, the number of possible topologies grows exponentially with n; even just a dozen nodes admit nearly a trillion trillion possible configurations!

The network-opt logo


We are pleased to announce the open source release of network-opt, a C++ library that supports the optimization of network topologies. Using sophisticated techniques for combinatorial search, this algorithm can efficiently construct instances from a family of so-called series-parallel networks that commonly arise in electrical and telecommunications applications. For example, given 15 1-Ω resistors and a target resistance of π Ω, our tool can produce a circuit that achieves six digits of precision:

A 15-element circuit composed of 1-ohm resistors


For more details, refer to our paper: "Search Strategies for Topological Network Optimization," appearing this month at the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI-22).


By Michael D. Moffitt, Core Enterprise Machine Learning

FPGA Interchange format to enable interoperable FPGA tooling

Field Programmable Gate Arrays (FPGAs) have been around for decades and historically, the development of their specific toolchains happened in separate ecosystems that were driven by the vendors themselves. This has changed in recent years with the development of vendor-neutral open source toolchains, but this has revealed a new problem: the need for an abstraction layer to describe and define FPGA architectures through a standard format.

FPGA toolchains are not trivial, but roughly speaking, you can divide the process of “compiling” FPGA-targeted code in a Hardware Description Language (HDL) into three stages: synthesis, place and route, and bitstream generation. A standard format provides a common description of the architectures and serves as a bridge between the open source and closed proprietary tools responsible for various parts of the process.This includes the open source Yosys for synthesis, VtR and nextpnr for place and route, and vendor tooling from Xilinx, Intel, Lattice, QuickLogic, etc.

The introduction of a common format enables a shared methodology where specific building blocks are interchangeable. With that in mind, Google and Antmicro started the FPGA Interchange format project, providing a unified framework to lower the entry barriers for developers to swiftly move from one tool to another. In collaboration, Antmicro and other CHIPS Alliance members are developing the Interchange format definition and related tools aiming to become a development standard that the FPGA industry needs.

Components

The Interchange format provides three key descriptions to describe an FPGA and to interact with the various tools involved:
  • Device resources: defines the FPGA internal structure and the technological cell libraries describing FPGA logic blocks (basic blocks like flip-flops and complex like DSP cells).
  • Logical netlist: post-synthesized netlist compatible with the Interchange.
  • Physical netlist: collection of all placement locations and physical routing of the nets and resources produced by the place and route tool.
A challenge behind the creation of a standard format lies in the definition of the line between generalization and specificity of an FPGA architecture. By focusing on the only architecture type in mainstream use on the market today, namely island-based (also called tile-based) FPGAs, the standard reaches a level of universality and conciseness, which makes it easy to work with, adopt, and implement.

How it works

As previously mentioned, the FPGA Interchange format aims at lowering the barriers and building bridges between different place and route tools that can read and write using the same convention.

To achieve this, Antmicro and Google chose nextpnr as the first place and route tool to adopt the Interchange format. In the past few months, Antmicro extended nextpnr with FPGA Interchange format capabilities to place and route basic designs for the Xilinx 7-series and Lattice Nexus FPGA families.

To achieve initial support for Xilinx devices, the vendor’s own interesting RapidWright framework has also been introduced to the flow in collaboration with Xilinx’s research team. It is specifically used to write the device database in the Interchange format, consisting of all the device information.

Additionally, RapidWright can read and write the physical netlists to generate design checkpoint files that can be opened in Xilinx’s Vivado tool.

Example flow

The default open source flow for Xilinx devices uses Yosys to synthesize the design and VPR or nextpnr for place and route. The last step, bitstream generation, uses the open source FPGA Assembly FASM format to generate the file used for programming the FPGA. VPR supported this format natively, and nextpnr has been extended to support it as a part of the interchange format support work.


Now, by using the interchange format, you can create your flow from building blocks with the possibility to use a different tool (either open source or proprietary) for each step. A sample which illustrates this mix-and-match nature of interchange-capable tooling may look as depicted in the diagram above.

For processing any design, you need the FPGA device description files. These are generated in the following matter:
  • RapidWright generates the device description in the Interchange format.
  • The device description is translated by a dedicated script into the data-format suitable for nextpnr. The script will be eventually integrated into nextpnr enabling it to read the interchange format device description natively.
The device description has to be generated only once, and will normally be distributed with the toolchain installation package so that the user will not have to bother with this part. With the device architecture in place, a digital design can be processed with the toolchain.

This flow example shows how the Interchange creates a bridge between an open source flow with Yosys and nextpnr, and a closed source one using Vivado, demonstrating the possibility of interchanging tools thanks to a shared format.

To push forward the adoption of the format, the effort was recently onboarded into CHIPS Alliance, whose goal is to build an open source ASIC/FPGA ecosystem—including cores, I/O IPs, interconnect standards as well as digital and analog tooling—to radically transform the ASIC/FPGA design landscape.

Apart from allowing various existing tools to interoperate and share development efforts today, the Interchange format is a natural addition to the CHIPS Alliance in that it opens up smart ways to rapidly design and prototype new FPGA architectures, reduce the iteration times to implement, or add support to a place and route tool for a new architecture.

To further the collaboration around the Interchange Format, Xilinx has just joined the CHIPS Alliance to participate in the newly established FPGA workgroup of the Alliance which will include this and other FPGA-related projects that CHIPS Alliance is hosting.

Plans for the coming months

Besides nextpnr, there are other open source place and route tools slated to adopt the Interchange format as well, such as the Versatile Place and Route (VPR) from the Verilog-to-Routing project (VtR).

VtR can be used to place and route designs on FPGAs such as the Xilinx 7-series and QuickLogic’s eFPGA. This can only be done using the VPR data model and device description for now as it doesn’t support the Interchange format.

Antmicro is now implementing native support of the format within VPR, therefore enabling the use of place and route using different tools interchangeably; e.g. jumping from VPR placement output to nextpnr routing, allowing for faster improvements in algorithms.

Those benefits will extend to not only VPR and nextpnr, but to any other closed source tools, or new open source ones that adopt and implement the Interchange format.

Having a standard Interchange format at the tooling developers’ disposal lowers the barriers to developing new open source tools in this area. As example use cases, it enables new approaches to partial dynamic reconfiguration and the exploration of different place and route algorithms.

Customers will benefit from a wider range of flexible and customizable tools and ultimately get more control over their products. Antmicro provides end-to-end custom FPGA services, which involve reusable open source IP blocks, fast and reconfigurable SoCs, innovative tooling, and helping customers adopt the latest advances in software-driven FPGA development methodologies.


By Alessandro Comodi and Tom Michalak – Antmicro

Google Summer of Code 2022 is open for mentor organization applications!

We are excited to announce that open source projects and organizations can now apply to participate as mentoring organizations in the 2022 Google Summer of Code (GSoC) program. Applications for organizations will close Monday, February 21 at 10am PT.

As 2022 begins, so does our 18th edition of Google Summer of Code! With our new updates to the program, we look forward to welcoming not just students, but new and beginner open source contributors over 18 years old into our GSoC community. With increased flexibility in the length of the projects—now offering 175 and 350-hour projects—and the ability to extend the program from the standard 12 weeks to 22 weeks, we hope to spur the interest of more potential GSoC contributors.

Does your open source project want to learn more about becoming a mentor organization? Visit the program site and read the mentor guide to learn what it means to be a mentoring organization and how to prepare your community (hint: have plenty of excited mentors and well thought out project ideas!).

We welcome all types of organizations and are very eager to involve first-timers with a 2022 goal of welcoming 30+ new orgs into GSoC. We encourage new organizations to get a referral from experienced organizations that think they would be a good fit to participate in GSoC.

The open source projects that participate in GSoC as mentor organizations do all kinds of interesting work in security, cloud, development tools, science, medicine, data, and media for example. Projects can be relatively new (about 2 years old) to well established projects that started over 20 years ago. We welcome open source projects big and small and everything in between.

One thing to remember is that open source projects wishing to apply need to have a solid community. While you don’t have to have 50+ community members, the project also can’t have as few as 3 people; the goal of GSoC is to bring new contributors into communities and there should be an established community for them to join.

You can apply to be a mentor organization for GSoC starting today on the program site. The deadline to apply is February 21st at 10am PT. We will publicly announce the organizations chosen for GSoC 2022 on March 7th.

Please visit the program site for more information on how to apply and review the detailed timeline for important deadlines. We also encourage you to check out the Mentor Guide and our short video on why open source projects are excited to be a part of the GSoC program.

Good luck to all open source mentoring organization applicants!

By Stephanie Taylor, Google Open Source

Announcing Season of Docs 2022

Google Open Source is delighted to announce Season of Docs 2022!

The Season of Docs program supports better documentation in open source and provides opportunities for skilled technical writers to gain open source experience. 

Participating projects receive funds to create, improve, or expand their documentation, while contributing to our knowledge of effective metrics for evaluating open source documentation through their shared case studies.

About the program

Season of Docs allows open source organizations to apply for a grant based on their documentation needs. If selected, the open source organizations use their grant to hire a technical writer directly to complete their documentation project. Organizations have up to six months to complete their documentation project.


Participating organizations help broaden our understanding of effective documentation practices and metrics in open source by submitting a final case study upon completion of the program. The case study should outline the problem the documentation project was intended to solve, what metrics were used to judge the effectiveness of the documentation, and what the organization learned for the future. All project case studies will be published on the Season of Docs site at the end of the program.

Organizations: start your exploration engines

2022 Season of Docs applications open February 23, 2022. We strongly suggest that organizations take the time to complete the steps in the exploration phase before the application process begins, including:

  • Creating a project page to gauge community and technical writer interest in participating (see our project ideas page for examples)
  • Publicizing your interest in participating in Season of Docs through your project channels and adding your project to our list of interested projects on GitHub
  • Lining up community members who are interested in mentoring or helping to onboard technical writers to your project
  • Brainstorming requirements for technical writers to work on your project (Will they need to be able to test code? Work with video? Have prior experience with your project or related technologies?)

On your mark, get set, project page!

Every Season of Docs project begins with a project page. Your project page serves as an overview of your documentation project, and it should be publicly visible. A good project page includes:
  • A statement of the problem your project needs to solve (“users on Windows don’t have clear guidance of how to install our project”)
  • The documentation that might solve this problem (“We want to create a quickstart doc and installation guide for Windows users”)
  • How you’ll measure the success of your documentation (“With a good quickstart, we expect to see 50% fewer issues opened about Windows installation problems.”)
  • What skills your technical writer would need (break down into “must have” and “nice to have” categories. “Must have: access Windows machine to test instructions”)
  • What volunteer help is needed from community members (“need help onboarding technical writer to our discussion groups”) and links to where the community can discuss the proposal
  • Most importantly, include a way for interested technical writers to reach you and ask questions!

Technical writers: express your interest

Technical writers interested in working with accepted open source organizations can share their contact information via the Season of Docs GitHub repository; or they may submit proposals directly to the organizations using the contact information shared on the organization project page. Technical writers do not submit a formal application through Season of Docs.

General timeline

February 23 - March 25

Open source organizations apply to take part in Season of Docs.

April 14

Google publishes the list of accepted organizations, along with their project proposals and doc development can begin.

June 15

Organization administrators begin to submit monthly evaluations to report on the status of their project.

November 30

Organization administrators submit their case study and final project evaluation.

December 14

Google publishes the 2021 case studies and aggregate project data.

May 2, 2023

Organizations begin to participate in post-program followup surveys.    

See the full program timeline for more details.

Join us

Explore the Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and get ready to apply!


By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Introducing Ephemeral Containers

Ephemeral containers in Kubernetes started with a simple question: is it feasible to run a service on Kubernetes without bundling a Linux distribution userland with every binary?

It was early 2016. Kubernetes had just released version 1.2, and my SRE team was evaluating using Google Kubernetes Engine for internal workloads. Docker and Kubernetes examples always seemed to build images on top of Linux distributions like Debian or CentOS, but our build system produced a binary with its minimum set of library dependencies, so that's what I wanted to deploy as a container image.

This minimal container image worked fine, but only if I never made a mistake. Since the container image had no shell to use with kubectl exec, I had to log into the node with administrator privileges to interactively troubleshoot any problems. This produced an unfortunate debugging experience and was unacceptable from a security perspective.

What's more, kubectl exec had changed little from docker exec even though Kubernetes introduced new abstractions such as a Pod, where multiple containers share resources. How should Kubernetes native troubleshooting work?

Debugging on Borg

Providing userspace utilities for cluster applications wasn't a new problem for Google. Google's existing cluster orchestration system, Borg, provides a common userland for processes. Rather than including system and debugging utilities with the application binary, Borg provides a basic set of userland utilities that applications can expect in their runtime environment. Another team maintains and updates these utilities independent from application binaries.

There are downsides to this approach: Updates to the common utilities can take weeks or months to roll out, application owners can't specify which utilities they need, and the utilities needed at run time may be completely different from the ones needed at debug time. We could do better for Kubernetes.

Extensibility for Kubernetes

I wanted a solution that felt native for Kubernetes and gave users the freedom to customize to their use case, but I was still new to Kubernetes. I reached out to SIG Node and discovered a welcoming, helpful open source community.

Together we considered different ways of deploying tools to a Pod at debugging time. Implementing the feature entirely on the client side would be easiest, but solutions such as copying binaries into the running container image didn't make debugging feel like a feature of the platform. Kubernetes deploys binaries using containers, so it's natural to use containers for troubleshooting as well.

Existing container types were tied to the Pod lifecycle, though. Containers and Init Containers run when a Pod starts, and neither may be added after a Pod is created. For administrative actions we needed a lifecycle more like kubectl exec. We needed a new type of container: the Ephemeral Container.

What are Ephemeral Containers?

Ephemeral containers are a new type of container that are part of the Kubernetes core API. An Ephemeral Container may be added to an existing Pod for administrative actions like debugging, it runs until it exits, and it won't be restarted. An ephemeral container runs within the Pod's existing resource allocation and shares common container namespaces.

How are Ephemeral Containers used?

Here are some debugging scenarios that are made easier using ephemeral containers.

Troubleshooting Clusters

I run a service named "apples" that consists of a Go binary running in a distroless container image. One of its pods is suddenly having trouble connecting to a backend service, but since it's a distroless image I can't use kubectl exec to troubleshoot:

% kubectl exec -it apples-57bcf49487-ddmpn -- sh OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown

We can use kubectl debug to add an ephemeral container and test the backend service:

% kubectl debug -it --image=busybox apples-57bcf49487-ddmpn -- sh Defaulting debug container name to debugger-5wvgc. / # ps ax PID USER TIME COMMAND 1 65535 0:00 /pause 7 root 0:00 /app 19 root 0:00 sh 26 root 0:00 ps ax / # wget -S -O - http://bananas:8080Connecting to bananas:8080 (10.0.0.237:8080) HTTP/1.1 500 Internal Server Error wget: server returned error: HTTP/1.1 500 Internal Server Error

Technical Support

To make this easier to detect next time, I'll add this check to my operation team's autodiagnose script. The ops team doesn't have access to attach to production pods, but they have access to run the autodiagnose image which attaches its logs to a bug report:

% kubectl debug --image=gcr.io/apples/autodiagnose apples-57bcf49487-ddmpn --

--bug=1234

What's Next for Ephemeral Containers?

Ephemeral Containers are available as a beta feature in Kubernetes 1.23, but we still have lots of work to polish the rough edges and improve kubectl debug to support more debugging journeys, such as configuring the container security context to allow attaching a debugger.

Try out Ephemeral Containers and let us know in the Ephemeral Containers and kubectl debug enhancements how they work for you.

Contributor Experience

Working with the Kubernetes community has been incredibly rewarding. When I started I didn't know nearly enough to contribute something like this, but I discovered a community that works hard to welcome contributions at all levels.

I want to thank the community, and especially Dawn Chen, Yu-Ju Hong, Jordan Liggitt‎, Clayton Coleman, Maciej Szulik, Tim Hockin‎, for providing the support and guidance that made this feature possible.

Kubernetes will welcome your contribution as well! See kubernetes.dev for how to get started.


By Lee Verberne, Site Reliability Engineer – Google Cloud Platform

Life after Season of Docs


My journey to technical writing involved a long, windy, and non-linear career path. Before I became a technical writer, I spent years working in finance jobs with a stint of teaching in between. Seeking a career change, I went back to university where I came across the Technical Writing certificate program. It was the perfect fit for my skills and interests.

One of Google’s technical writers, Nicole Yap, visited my class to talk about her career, and introduce Season of Docs—a program that brings technical writers and open source projects together to work on open source documentation. My interest was piqued as it seemed like a great opportunity for a new graduate. With no real world experience of writing documentation, I applied and was accepted into Season of Docs. I worked with Oppia, an online learning platform for a 3-month project, where I created a user guide with video tutorials.

During that time, I had to quickly become familiar with many new concepts:
  • Open source philosophy
  • Writing docs-as-code
  • Command-line basics
  • Submitting and amending pull requests on GitHub, and much more!
In the course of the Season of Docs program, I got my first full time job as a technical writer at a software company in Toronto. Juggling the demands of the project and my new job was challenging, but I was grateful for the experience as I could transfer the skills I learned to the new role. 

Opening doors to new experiences

I had such a positive experience working with my mentors1 at Oppia that we mutually agreed to extend our relationship. Over the next year, I continued to work with Oppia in different capacities—copywriting, editing, helping write math lessons—while getting to know the network of international volunteers who contribute to this incredible organization.

I also had the opportunity to present a talk at a Write the Docs Toronto meetup which was a great way to plug Season of Docs, and demonstrate what I had learnt during the program. There was quite a bit of interest from the audience as many hadn’t even heard of the program before.

My Season of Docs experience also helped me with my day job as a technical writer. After experiencing the steep learning curve with Oppia, I was able to hit the ground running with learning the new job processes at the software company. I was also able to fall back on my Season of Docs experience as I created marketing and technical videos in my new job as well.

A new opportunity

At the start of 2021, I had the opportunity to apply for a technical writing position at Google. I had the notion that a company like Google would require years of tech writing experience before they would even consider my application, but that turned out not to be true. I’ve been a technical writer at Google for four months now, and it still feels a bit surreal!

As a newcomer in the tech world, I find that everything I learned during the Season of Docs program has come in handy in helping me understand my job a little better. Getting into Season of Docs as a new entrant to the field of technical writing was a confidence-booster for me, and the path it led to has been challenging yet gratifying. I’m excited to continue learning every single day from the sea of talent around me.


By Audrey Tavares – Google Cloud


  1. The current Season of Docs program format does not have a defined mentor role, but technical writers in the program work closely with project contributors to learn open source skills.  




DeepNull: an open-source method to improve the discovery power of genetic association studies

In our paper “DeepNull models non-linear covariate effects to improve phenotypic prediction and association power,” we proposed a new method, DeepNull, to model the complex relationship between covariate effects on phenotypes to improve Genome-wide association studies (GWAS) results. We have released DeepNull as open source software, with a Colab notebook tutorial for its use.

Human Genetics 101

Each individual’s genetic data carries health information such as why certain individuals have a lower risk of developing skin cancer compared to others or why certain drugs differ in effectiveness between individuals. Genetic data is encoded in the human genome—a DNA sequence—composed of a 3 billion long chain built from four possible nucleotides (A, C, G, and T). Only a small subset of the genome (~4-5 million positions) varies between two individuals. One of the goals of genetic studies is to detect variants that are associated with different phenotypes (e.g., risk of diseases such as Glaucoma or observed phenotypic values such as high-density lipoprotein (HDL), low-density lipoproteins (LDL), height, etc).

Genome-wide association studies

GWAS are used to associate genetic variants with complex traits and diseases. To more accurately determine an association strength between genotype and phenotype, the interactions between phenotypes (such as age and sex) and principal components (PCs) of genotypes, must be adjusted for as covariates. Covariate adjustment in GWAS can increase precision and correct for confounding. In the linear model setting, adjustment for a covariate will improve precision (i.e., statistical power) if the distribution of the phenotype differs across levels of the covariate. For example, when performing GWAS on height, males and females have different means. All state of the art methods (e.g., BOLT-LMM, regenie) perform GWAS assuming that the effect of genotypes and covariates to phenotype is linear and additive. However, we know that the assumption of linear and additive contributions of covariates often does not reflect underlying biology, so we sought a method to more comprehensively model and adjust for the interactions between phenotypes for GWAS.

DeepNull method overview

We proposed a new method, DeepNull, to relax the linear assumption of covariate effects on phenotypes. DeepNull trains a deep neural network (DNN) to predict phenotype using all covariates in a 5-fold cross-validation. After training the DeepNull model, we make phenotype predictions for all individuals and add this prediction as one additional covariate in the association test. Major advantages of DeepNull are its simplicity to use and that it requires only a minimal change to existing GWAS pipeline implementations. In other words, to use DeepNull, we just need to add one additional covariate, which is computed by DeepNull, to the existing pipeline to perform GWAS.

DeepNull improves statistical power

We simulated data under different genetic architectures (genetic conditions) to first check that DeepNull controls type I error and then compare DeepNull statistical power with current state of the art methods (hereafter referred to as “Baseline”). First, we simulated data under genetic architectures where covariates have a linear effect on phenotype and observed that both Baseline and DeepNull have tight control of type I error. It is interesting that DeepNull power does not decrease compared to Baseline under a setting in which covariates have only a linear effect on phenotype. Next, we simulated data under genetic architectures where covariates have non-linear effects on phenotype. Both Baseline and DeepNull have tight control of type I error while DeepNull increases the statistical power depending on the genetic architecture. We observed that for certain genetic architectures, DeepNull increases the statistical power up to 20%. Below, we compare the -log p-value of test statistics computed from DeepNull versus Baseline for Apolipoprotein B (ApoB) levels obtained from UK Biobank:
Figure 1. Significance level comparison of DeepNull vs Baseline. X-axis is the -log p-value of Baseline and Y-axis is the -log p-value of DeepNull. The orange dots indicate variants that are significant for Baseline but not significant for DeepNull and green dots indicate variants that are significant for DeepNull but not significant for Baseline.

DeepNull improves phenotype prediction

We applied DeepNull to predict phenotypes by utilizing polygenic risk score (PRS) and existing covariates such as age and sex. We considered 10 phenotypes obtained from UK Biobank. We observed that DeepNull on average increased the phenotype prediction (R2 where R is Pearson correlation) by 23%. More strikingly, in the case of Glaucoma, referral probability that is computed from the fundus images (Phene et al. Ophthalmology 2019, Alipanahi et al AJHG 2021), DeepNull improves the phenotype prediction by 83.4% and in the case of LDL, DeepNull improves the phenotype prediction by 40.3%. The summary of DeepNull results versus Baseline are shown in figure 2 below:

 
 

Figure 2. DeepNull improves phenotype prediction compared to Baseline. The Y-axis is the R2 where R is the Pearson’s correlation between true and predicted value of phenotypes. Phenotypic abbreviations: alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), apolipoprotein B (ApoB), glaucoma referral probability (GRP), LDLcholesterol (LDL), sex hormone-binding globulin (SHBG), and triglycerides (TG).

Conclusion

We proposed a new framework, DeepNull, that can model the nonlinear effect of covariates on phenotypes when such nonlinearity exists. We show that DeepNull can substantially improve phenotype prediction. In addition, we show that DeepNull achieves results similar to a standard GWAS when the effect of covariate on the phenotype is linear and can significantly outperform a standard GWAS when the covariate effects are nonlinear. DeepNull is open source and is available for download from GitHub or installation via PyPI.

By Farhad Hormozdiari and Andrew Carroll – Genomics team in HealthAI

Acknowledgments

This blog summarizes the work of the following Google contributors, who we would like to thank: Zachary R. McCaw, Thomas Colthurst, Ted Yun, Nick Furlotte, Babak Alipanahi, and Cory Y. McLean. In addition, we would like to thank Alkes Price, Babak Behsaz, and Justin Cosentino for their invaluable comments and suggestions.

Season of Docs announces results of 2021 program


Season of Docs has announced the 2021 program results for all projects. You can view a list of successfully completed projects on the website along with their case studies.
In 2021, the Season of Docs program allowed open source organizations to apply for a grant based on their documentation needs. Selected open source organizations then used their grant to hire a technical writer directly to complete their desired documentation project. Organizations then had six months to complete their documentation project. (In previous years, Google matched technical writers to projects and paid the technical writers directly.)

The 2021 Season of Docs documentation development phase began on April 16 and ended November 16, 2021 for all projects:
  • 30 open source organizations finished their projects (100% completion)
  • 93% of organizations had a positive experience
  • 96% of the technical writers had a positive experience
Take a look at the list of completed projects to see the wide range of subjects covered!

What is next?

Stay tuned for information about Season of Docs 2022—watch for posts on this blog and sign up for the announcements email list. We’ll also be sharing information about best practices in open source technical writing derived from the Season of Docs case studies.

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.


By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office