Category Archives: Open Source Blog

News about Google’s open source projects and programs

Introducing Ephemeral Containers

Ephemeral containers in Kubernetes started with a simple question: is it feasible to run a service on Kubernetes without bundling a Linux distribution userland with every binary?

It was early 2016. Kubernetes had just released version 1.2, and my SRE team was evaluating using Google Kubernetes Engine for internal workloads. Docker and Kubernetes examples always seemed to build images on top of Linux distributions like Debian or CentOS, but our build system produced a binary with its minimum set of library dependencies, so that's what I wanted to deploy as a container image.

This minimal container image worked fine, but only if I never made a mistake. Since the container image had no shell to use with kubectl exec, I had to log into the node with administrator privileges to interactively troubleshoot any problems. This produced an unfortunate debugging experience and was unacceptable from a security perspective.

What's more, kubectl exec had changed little from docker exec even though Kubernetes introduced new abstractions such as a Pod, where multiple containers share resources. How should Kubernetes native troubleshooting work?

Debugging on Borg

Providing userspace utilities for cluster applications wasn't a new problem for Google. Google's existing cluster orchestration system, Borg, provides a common userland for processes. Rather than including system and debugging utilities with the application binary, Borg provides a basic set of userland utilities that applications can expect in their runtime environment. Another team maintains and updates these utilities independent from application binaries.

There are downsides to this approach: Updates to the common utilities can take weeks or months to roll out, application owners can't specify which utilities they need, and the utilities needed at run time may be completely different from the ones needed at debug time. We could do better for Kubernetes.

Extensibility for Kubernetes

I wanted a solution that felt native for Kubernetes and gave users the freedom to customize to their use case, but I was still new to Kubernetes. I reached out to SIG Node and discovered a welcoming, helpful open source community.

Together we considered different ways of deploying tools to a Pod at debugging time. Implementing the feature entirely on the client side would be easiest, but solutions such as copying binaries into the running container image didn't make debugging feel like a feature of the platform. Kubernetes deploys binaries using containers, so it's natural to use containers for troubleshooting as well.

Existing container types were tied to the Pod lifecycle, though. Containers and Init Containers run when a Pod starts, and neither may be added after a Pod is created. For administrative actions we needed a lifecycle more like kubectl exec. We needed a new type of container: the Ephemeral Container.

What are Ephemeral Containers?

Ephemeral containers are a new type of container in the Kubernetes core API. An ephemeral container may be added to an existing Pod for administrative actions like debugging. It runs until it exits and won't be restarted. An ephemeral container runs within the Pod's existing resource allocation and shares common container namespaces.

How are Ephemeral Containers used?

Here are some debugging scenarios that are made easier using ephemeral containers.

Troubleshooting Clusters

I run a service named "apples" that consists of a Go binary running in a distroless container image. One of its pods is suddenly having trouble connecting to a backend service, but since it's a distroless image I can't use kubectl exec to troubleshoot:

% kubectl exec -it apples-57bcf49487-ddmpn -- sh
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown

We can use kubectl debug to add an ephemeral container and test the backend service:

% kubectl debug -it --image=busybox apples-57bcf49487-ddmpn -- sh
Defaulting debug container name to debugger-5wvgc.
/ # ps ax
PID   USER     TIME  COMMAND
    1 65535     0:00 /pause
    7 root      0:00 /app
   19 root      0:00 sh
   26 root      0:00 ps ax
/ # wget -S -O - http://bananas:8080
Connecting to bananas:8080 (10.0.0.237:8080)
  HTTP/1.1 500 Internal Server Error
wget: server returned error: HTTP/1.1 500 Internal Server Error

Technical Support

To make this easier to detect next time, I'll add this check to my operation team's autodiagnose script. The ops team doesn't have access to attach to production pods, but they have access to run the autodiagnose image which attaches its logs to a bug report:

% kubectl debug --image=gcr.io/apples/autodiagnose apples-57bcf49487-ddmpn -- --bug=1234

What's Next for Ephemeral Containers?

Ephemeral Containers are available as a beta feature in Kubernetes 1.23, but we still have lots of work to polish the rough edges and improve kubectl debug to support more debugging journeys, such as configuring the container security context to allow attaching a debugger.

Try out Ephemeral Containers and let us know in the Ephemeral Containers and kubectl debug enhancements how they work for you.

Contributor Experience

Working with the Kubernetes community has been incredibly rewarding. When I started I didn't know nearly enough to contribute something like this, but I discovered a community that works hard to welcome contributions at all levels.

I want to thank the community, and especially Dawn Chen, Yu-Ju Hong, Jordan Liggitt, Clayton Coleman, Maciej Szulik, and Tim Hockin, for providing the support and guidance that made this feature possible.

Kubernetes will welcome your contribution as well! See kubernetes.dev for how to get started.


By Lee Verberne, Site Reliability Engineer – Google Cloud Platform

Life after Season of Docs


My journey to technical writing involved a long, winding, and non-linear career path. Before I became a technical writer, I spent years working in finance jobs with a stint of teaching in between. Seeking a career change, I went back to university where I came across the Technical Writing certificate program. It was the perfect fit for my skills and interests.

One of Google’s technical writers, Nicole Yap, visited my class to talk about her career and introduce Season of Docs, a program that brings technical writers and open source projects together to work on open source documentation. My interest was piqued as it seemed like a great opportunity for a new graduate. With no real-world experience of writing documentation, I applied and was accepted into Season of Docs. I worked with Oppia, an online learning platform, on a 3-month project, where I created a user guide with video tutorials.

During that time, I had to quickly become familiar with many new concepts:
  • Open source philosophy
  • Writing docs-as-code
  • Command-line basics
  • Submitting and amending pull requests on GitHub, and much more!
In the course of the Season of Docs program, I got my first full time job as a technical writer at a software company in Toronto. Juggling the demands of the project and my new job was challenging, but I was grateful for the experience as I could transfer the skills I learned to the new role. 

Opening doors to new experiences

I had such a positive experience working with my mentors [1] at Oppia that we mutually agreed to extend our relationship. Over the next year, I continued to work with Oppia in different capacities (copywriting, editing, helping write math lessons) while getting to know the network of international volunteers who contribute to this incredible organization.

I also had the opportunity to present a talk at a Write the Docs Toronto meetup which was a great way to plug Season of Docs, and demonstrate what I had learnt during the program. There was quite a bit of interest from the audience as many hadn’t even heard of the program before.

My Season of Docs experience also helped me with my day job as a technical writer. After experiencing the steep learning curve with Oppia, I was able to hit the ground running with learning the new job processes at the software company. I was also able to fall back on my Season of Docs experience as I created marketing and technical videos in my new job as well.

A new opportunity

At the start of 2021, I had the opportunity to apply for a technical writing position at Google. I had the notion that a company like Google would require years of tech writing experience before they would even consider my application, but that turned out not to be true. I’ve been a technical writer at Google for four months now, and it still feels a bit surreal!

As a newcomer in the tech world, I find that everything I learned during the Season of Docs program has come in handy in helping me understand my job a little better. Getting into Season of Docs as a new entrant to the field of technical writing was a confidence-booster for me, and the path it led to has been challenging yet gratifying. I’m excited to continue learning every single day from the sea of talent around me.


By Audrey Tavares – Google Cloud


  1. The current Season of Docs program format does not have a defined mentor role, but technical writers in the program work closely with project contributors to learn open source skills.  




DeepNull: an open-source method to improve the discovery power of genetic association studies

In our paper “DeepNull models non-linear covariate effects to improve phenotypic prediction and association power,” we proposed a new method, DeepNull, to model the complex relationship between covariates and phenotypes and thereby improve genome-wide association study (GWAS) results. We have released DeepNull as open source software, with a Colab notebook tutorial for its use.

Human Genetics 101

Each individual’s genetic data carries health information such as why certain individuals have a lower risk of developing skin cancer compared to others or why certain drugs differ in effectiveness between individuals. Genetic data is encoded in the human genome, a DNA sequence composed of a chain of 3 billion nucleotides drawn from four possible types (A, C, G, and T). Only a small subset of the genome (~4-5 million positions) varies between two individuals. One of the goals of genetic studies is to detect variants that are associated with different phenotypes (e.g., risk of diseases such as glaucoma or observed phenotypic values such as high-density lipoprotein (HDL), low-density lipoprotein (LDL), height, etc.).

Genome-wide association studies

GWAS are used to associate genetic variants with complex traits and diseases. To more accurately determine an association strength between genotype and phenotype, the interactions between phenotypes (such as age and sex) and principal components (PCs) of genotypes must be adjusted for as covariates. Covariate adjustment in GWAS can increase precision and correct for confounding. In the linear model setting, adjustment for a covariate will improve precision (i.e., statistical power) if the distribution of the phenotype differs across levels of the covariate. For example, when performing GWAS on height, males and females have different means. All state-of-the-art methods (e.g., BOLT-LMM, regenie) perform GWAS assuming that the effect of genotypes and covariates on phenotype is linear and additive. However, we know that the assumption of linear and additive contributions of covariates often does not reflect underlying biology, so we sought a method to more comprehensively model and adjust for the interactions between phenotypes for GWAS.

DeepNull method overview

We proposed a new method, DeepNull, to relax the linear assumption of covariate effects on phenotypes. DeepNull trains a deep neural network (DNN) to predict phenotype using all covariates in a 5-fold cross-validation. After training the DeepNull model, we make phenotype predictions for all individuals and add this prediction as one additional covariate in the association test. Major advantages of DeepNull are its simplicity to use and that it requires only a minimal change to existing GWAS pipeline implementations. In other words, to use DeepNull, we just need to add one additional covariate, which is computed by DeepNull, to the existing pipeline to perform GWAS.
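One way to picture the "one additional covariate" step is as a pair of regression models. This is only a sketch in notation chosen here for illustration, not taken from the paper: it assumes a linear association test on a quantitative phenotype, with y_i the phenotype of individual i, g_i the genotype at the tested variant, x_i the covariate vector, and \hat{f}(x_i) the DNN's cross-validated prediction for that individual.

```latex
\begin{align}
\text{Baseline:} \quad & y_i = g_i \beta + x_i^{\top} \gamma + \epsilon_i \\
\text{DeepNull:} \quad & y_i = g_i \beta + x_i^{\top} \gamma + \alpha \, \hat{f}(x_i) + \epsilon_i
\end{align}
```

In both models the association test concerns \beta. The extra term \alpha \hat{f}(x_i) lets the model absorb non-linear covariate effects that the linear term x_i^{\top} \gamma cannot capture, which is why the rest of the pipeline can stay unchanged.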

DeepNull improves statistical power

We simulated data under different genetic architectures (genetic conditions) to first check that DeepNull controls type I error and then compare DeepNull statistical power with current state-of-the-art methods (hereafter referred to as “Baseline”). First, we simulated data under genetic architectures where covariates have a linear effect on phenotype and observed that both Baseline and DeepNull have tight control of type I error. It is interesting that DeepNull power does not decrease compared to Baseline under a setting in which covariates have only a linear effect on phenotype. Next, we simulated data under genetic architectures where covariates have non-linear effects on phenotype. Both Baseline and DeepNull have tight control of type I error, while DeepNull increases the statistical power depending on the genetic architecture. We observed that for certain genetic architectures, DeepNull increases the statistical power by up to 20%. Below, we compare the -log p-value of test statistics computed from DeepNull versus Baseline for Apolipoprotein B (ApoB) levels obtained from UK Biobank:

Figure 1. Significance level comparison of DeepNull vs Baseline. X-axis is the -log p-value of Baseline and Y-axis is the -log p-value of DeepNull. The orange dots indicate variants that are significant for Baseline but not significant for DeepNull and green dots indicate variants that are significant for DeepNull but not significant for Baseline.

DeepNull improves phenotype prediction

We applied DeepNull to predict phenotypes by utilizing polygenic risk score (PRS) and existing covariates such as age and sex. We considered 10 phenotypes obtained from UK Biobank. We observed that DeepNull on average increased the phenotype prediction (R², where R is the Pearson correlation) by 23%. More strikingly, in the case of glaucoma referral probability, which is computed from fundus images (Phene et al. Ophthalmology 2019, Alipanahi et al. AJHG 2021), DeepNull improves the phenotype prediction by 83.4%, and in the case of LDL, DeepNull improves the phenotype prediction by 40.3%. The summary of DeepNull results versus Baseline is shown in Figure 2 below:

 
 

Figure 2. DeepNull improves phenotype prediction compared to Baseline. The Y-axis is the R², where R is the Pearson correlation between true and predicted values of phenotypes. Phenotypic abbreviations: alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), apolipoprotein B (ApoB), glaucoma referral probability (GRP), LDL cholesterol (LDL), sex hormone-binding globulin (SHBG), and triglycerides (TG).

Conclusion

We proposed a new framework, DeepNull, that can model the nonlinear effect of covariates on phenotypes when such nonlinearity exists. We show that DeepNull can substantially improve phenotype prediction. In addition, we show that DeepNull achieves results similar to a standard GWAS when the effect of covariate on the phenotype is linear and can significantly outperform a standard GWAS when the covariate effects are nonlinear. DeepNull is open source and is available for download from GitHub or installation via PyPI.

By Farhad Hormozdiari and Andrew Carroll – Genomics team in HealthAI

Acknowledgments

This blog summarizes the work of the following Google contributors, who we would like to thank: Zachary R. McCaw, Thomas Colthurst, Ted Yun, Nick Furlotte, Babak Alipanahi, and Cory Y. McLean. In addition, we would like to thank Alkes Price, Babak Behsaz, and Justin Cosentino for their invaluable comments and suggestions.

Season of Docs announces results of 2021 program


Season of Docs has announced the 2021 program results for all projects. You can view a list of successfully completed projects on the website along with their case studies.
In 2021, the Season of Docs program allowed open source organizations to apply for a grant based on their documentation needs. Selected open source organizations then used their grant to hire a technical writer directly to complete their desired documentation project. Organizations then had six months to complete their documentation project. (In previous years, Google matched technical writers to projects and paid the technical writers directly.)

The 2021 Season of Docs documentation development phase began on April 16 and ended November 16, 2021 for all projects:
  • 30 open source organizations finished their projects (100% completion)
  • 93% of organizations had a positive experience
  • 96% of the technical writers had a positive experience
Take a look at the list of completed projects to see the wide range of subjects covered!

What is next?

Stay tuned for information about Season of Docs 2022—watch for posts on this blog and sign up for the announcements email list. We’ll also be sharing information about best practices in open source technical writing derived from the Season of Docs case studies.

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.


By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Boosting Machine Learning with tailored accelerators: Custom Function Units in Renode


Development of Machine Learning algorithms which enable new and exciting applications is progressing at a breakneck pace, and given the long turnaround time of hardware development, the designers of dedicated hardware accelerators are struggling to keep up. FPGAs offer an interesting alternative to ASICs, enabling a much faster and more flexible environment for such HW-SW co-development, and with projects such as the FPGA interchange format (now part of CHIPS Alliance), Google and Antmicro have been making the FPGA ecosystem ever more open and software driven.

The open RISC-V ISA was built with Machine Learning in mind, with its configurable and adaptable nature, flexible vector extensions and a rich ecosystem of open source implementations which can serve as an excellent starting point for new R&D projects.

Given their wide-ranging interests in edge AI, both Google and Antmicro have embraced RISC-V as Founding members as far back as 2015. Among many other open source tools and building blocks that Antmicro is creating, we have invested heavily into enabling HW/SW co-development of ML solutions using RISC-V in our open source simulation framework, Renode.

RISC-V is also excellent for FPGA-based ML development. It offers a multitude of FPGA-friendly softcore options, such as VexRiscv, and specialized ML-oriented extensions called CFUs, which you can experiment with on cheap, easily accessible hardware and in Renode, using Verilator co-simulation capabilities.

In this note, we will describe the CFU and the CFU playground ML experimentation project that Antmicro and Google have been collaborating on to push forward FPGA acceleration of AI, and how to get started quickly with your very own hardware-assisted ML pipeline.

About the CFU

A “CFU”, or a “Custom Function Unit,” is an accelerator tightly coupled with the CPU. It adds a custom instruction to the ISA using a standardized format defined by the CFU working group of RISC-V International.

CFUs are easy to design, write, and experiment with given the reprogrammable nature of FPGAs. When working with a CFU, you are encouraged to identify blocks to be accelerated iteratively, measure your payload after each iteration and, above all, prepare custom CFUs for each payload (potentially using the capabilities of most FPGAs to be reprogrammed on the fly, or just holding several CFUs in store side by side, to be executed depending on the payload in question).

CFU execution is triggered by one of the standard instructions, with arguments passed via registers. The CPU can handle many different CFUs with various functions; their IDs are retrieved from the `funct7` and `funct3` operands of the decoded instruction. The only interaction between the CPU and the CFU is via registers and immediate values provided in the instruction itself. There is no direct memory access, nor any interaction between different CFUs.
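To make the role of `funct7` and `funct3` concrete, here is a minimal C++ sketch of how these fields can be pulled out of a 32-bit R-type instruction word. The bit positions follow the standard RISC-V R-type layout. Everything else is an illustrative assumption rather than part of Renode or CFU Playground: the helper names (`RType`, `decode_rtype`, `encode_rtype`, `cfu_function_id`), the use of the `custom-0` major opcode, and the packing of the two fields into a single 10-bit function ID.

```cpp
#include <cassert>
#include <cstdint>

// Fields of a RISC-V R-type instruction (standard base ISA layout).
struct RType {
    uint32_t opcode;  // bits [6:0]
    uint32_t rd;      // bits [11:7]
    uint32_t funct3;  // bits [14:12]
    uint32_t rs1;     // bits [19:15]
    uint32_t rs2;     // bits [24:20]
    uint32_t funct7;  // bits [31:25]
};

// custom-0 major opcode (0b0001011), reserved by the RISC-V spec for
// custom extensions. Assumption: the CFU instruction uses this opcode.
constexpr uint32_t kCustom0 = 0x0B;

// Split a 32-bit instruction word into its R-type fields.
RType decode_rtype(uint32_t insn) {
    RType f;
    f.opcode = insn & 0x7F;
    f.rd     = (insn >> 7) & 0x1F;
    f.funct3 = (insn >> 12) & 0x07;
    f.rs1    = (insn >> 15) & 0x1F;
    f.rs2    = (insn >> 20) & 0x1F;
    f.funct7 = (insn >> 25) & 0x7F;
    return f;
}

// Inverse of decode_rtype; handy for building test instructions.
uint32_t encode_rtype(const RType &f) {
    assert(f.opcode < 128 && f.rd < 32 && f.funct3 < 8);
    assert(f.rs1 < 32 && f.rs2 < 32 && f.funct7 < 128);
    return (f.funct7 << 25) | (f.rs2 << 20) | (f.rs1 << 15) |
           (f.funct3 << 12) | (f.rd << 7) | f.opcode;
}

// Hypothetical packing of the two fields into one 10-bit function ID:
// funct7 in the upper seven bits, funct3 in the lower three.
uint16_t cfu_function_id(const RType &f) {
    return static_cast<uint16_t>((f.funct7 << 3) | f.funct3);
}
```

For example, an instruction built with `funct7 = 5` and `funct3 = 3` decodes back to those values, and under the packing assumed here its function ID is `(5 << 3) | 3`, i.e. 43. The two source operands are read from registers `rs1` and `rs2` and the result is written to `rd`, which matches the register-only interface described above.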

Figure 1

CFU Playground

Google’s CFU Playground provides an open source framework which offers a handy methodology for reasoning about ML acceleration and developing your own Custom Function Units using FPGAs and simulation. Various CFU examples and demos are available, and you can also add a project with your sources and modified TFLite Micro code (one of the results of our collaboration with the TF Lite Micro team). An overlay mechanism lets you override any part of the code that you need.

A CFU may be written in Verilog or any language/framework that outputs Verilog. In the CFU Playground demos, CFUs are mostly written in nMigen, which allows you to write code in Python and then generates Verilog output. The Python-based flow simplifies development for software engineers who may not be familiar with writing Verilog code. Since it’s generated from Python, it is also very easy to upgrade in small steps in a structured way until you reach your expected acceleration targets.

Co-simulation in Renode

Renode has supported co-simulation of various buses since the 1.7.1 release, and support for CFU was added recently. CFU support is provided via the Renode Integration Layer plugin. It essentially consists of two parts: first, a C# class called `CFUVerilatedPeripheral`, which manages the Verilator simulation process, and second, an integration library written in C++. The integration library, alongside the ‘verilated’ hardware code (i.e. HDL compiled into C++ via Verilator), is then built into a binary, which in turn is imported by the `CFUVerilatedPeripheral`. It is possible to install up to four different CFUs under one RISC-V CPU. Each of them will be executed based on the opcode received from the CPU.

Since the hardware is translated into C++ via Verilator, you can also enable tracing which dumps CFU waveforms into a file to later analyze.

How to ‘verilate’ your own CFU

Basic examples of verilated CFUs are available on Antmicro’s GitHub. You can use this repository to ‘verilate’ your own custom CFU.

In the `main.cpp` of your verilated model, you need to include C++ headers from the Renode Verilator Integration Library.

#include "src/renode_cfu.h"
#include "src/buses/cfu.h"

Next, you need to initialize the `RenodeAgent` and the model’s `top` instance along with the `eval()` function that will evaluate the model during simulation.

RenodeAgent *cfu;
Vcfu *top = new Vcfu;

void eval() {
    top->eval();
}

Now add an `Init()` function that will initialize a bus along with its signals, and the `eval()` function. It should also initialize and return the `RenodeAgent` connected to a bus.

RenodeAgent *Init() {
    Cfu* bus = new Cfu();

    //=================================================
    // Init CFU signals
    //=================================================
    bus->req_valid = &top->cmd_valid;
    bus->req_ready = &top->cmd_ready;
    bus->req_func_id = (uint16_t *)&top->cmd_payload_function_id;
    bus->req_data0 = (uint32_t *)&top->cmd_payload_inputs_0;
    bus->req_data1 = (uint32_t *)&top->cmd_payload_inputs_1;
    bus->resp_valid = &top->rsp_valid;
    bus->resp_ready = &top->rsp_ready;
    bus->resp_ok = &top->rsp_payload_response_ok;
    bus->resp_data = (uint32_t *)&top->rsp_payload_outputs_0;
    bus->rst = &top->reset;
    bus->clk = &top->clk;

    //=================================================
    // Init eval function
    //=================================================
    bus->evaluateModel = &eval;

    //=================================================
    // Init peripheral
    //=================================================
    cfu = new RenodeAgent(bus);

    return cfu;
}

To compile your project, you must first export three environment variables:
  • `RENODE_ROOT`: path to Renode source directory
  • `VERILATOR_ROOT`: path to the directory where Verilator is located (this is not needed if Verilator is installed system-wide)
  • `SRC_PATH`: path to the directory containing your `main.cpp`
With the variables above now set, go to `SRC_PATH` and build your CFU:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release "$SRC_PATH"
make libVtop

If you need more details about creating your own ‘verilated’ peripheral, visit the chapter in Renode documentation about co-simulation.

To attach a verilated CFU to a Renode platform, add `CFUVerilatedPeripheral` to your `RISC-V` CPU.

cpu: CPU.VexRiscv @ sysbus
    cpuType: "rv32im"

cfu0: Verilated.CFUVerilatedPeripheral @ cpu 0
    frequency: 100000000


As the last step, provide a path to the compiled verilated CFU. You can do it either in the `.repl` platform file as a CFU constructor argument or in a `.resc` script.

cpu.cfu0 SimulationFilePath @libVtop.so

To see how it works without building your own project, run the built-in Renode demo script called litex_vexriscv_verilated_cfu.resc in Renode’s monitor CLI:

(monitor) s @scripts/single-node/litex_vexriscv_verilated_cfu.resc

CFU Playground Integration

CFU Playground makes use of a Continuous Integration mechanism to make sure new changes don’t break anything. Since the project is targeted mostly at real hardware, a simulator like Antmicro’s open source Renode framework is indispensable. A large number of varied tests are executed with every change in the mainline CFU Playground repository, building the CFU software and then running it in Renode with hardware co-simulation or with a software CFU reimplementation.

In the CI tests, Renode uses scripts which are generated for each specific build target. This makes it possible to generate the exact same scripts locally and run them in Renode to enable a step-by-step assessment of what is happening in the code.

What’s next?

CFU integration in Renode is already used in practice, among other places in the EU-funded project called VEDLIoT, for which Antmicro also implemented the Kenning framework. VEDLIoT will use Renode to develop and test a soft-SoC based system aimed to drive Tiny ML workloads.

Renode’s use in CFU Playground is yet another outcome of Antmicro’s long partnership with Google. Along with the testing and development work we did for the TensorFlow Lite Micro team, this shows that Renode is and will continue to be a go-to framework for embedded ML developers.


By guest author Michael Gielda – Antmicro

#BazelCon 2021 Wrap Up


The apps, platforms, and systems that the Bazel community builds with Bazel touch the lives of people around the world in ways we couldn’t have imagined. Through BazelCon, we aim to connect Bazel enthusiasts, the Bazel team, maintainers, contributors, users, and friends in an inclusive and welcoming environment. At BazelCon, the community demonstrates its global user impact with quirky and carefully crafted talks, a readout on the State-of-Bazel, an upfront discussion on “Implicit Bias Mitigation,” and community sharing events that remind us that we are not alone in our efforts to build a better world, one line of code at a time.


At BazelCon, the community shared over 24 technical sessions with the 1400+ registrants, which you can watch here at your own pace. Make sure you check out:
  • “Reproducible builds with Bazel”: stories about the meaning of "hermetic" and how to achieve it in the context of builds, and a meditation on the aesthetic aspects of build reproducibility.
  • “Streamlining VMware's Open Source License Compliance” — Solving the complexities of identifying and tracking open-source software (OSS) to comply with license requirements by using Bazel to create an accurate bill of materials containing OSS and third-party packages during a build.
Attendees were able to interact with the community and engage with the Bazel team through a series of “Birds of a Feather” (BoF) sessions and a live Q&A session. You can find all of the BoF presentations and notes here.

As announced, soon we will be releasing Bazel 5.0, the updated version of our next generation, multi-language, multi-platform build functionality that includes a new external dependency system, called bzlmod, for you to try out.

We’d like to thank everyone who helped make BazelCon a success: presenters, organizers, Google Developer Studios, contributors, and attendees. If you have any questions about BazelCon, you can reach out to [email protected].

We hope that you enjoyed #BazelCon and "Building Better with Bazel".


By Joe Hicks, Product Manager, Core Developer

Knative applies to become a CNCF incubating project

In 2018, the Knative project was founded and released by Google, and was subsequently developed in close partnership with IBM, Red Hat, VMware, and SAP. The project provides a serverless experience layer on Kubernetes, providing the building blocks you need to build and deploy modern, container-based serverless applications. Over the last three years, Knative has become the most widely-installed serverless layer on Kubernetes. More recently, Knative 1.0 was released, reaching an important milestone that was made possible thanks to the contributions and collaboration of over 600 developers in the community.

Google has worked closely with key maintainers and partners on the evolution of Knative, including conformance definition and stability ahead of the 1.0 milestone. To enable the next phase of community-driven innovation in Knative, today we have submitted Knative to the Cloud Native Computing Foundation (CNCF) for consideration as an incubating project, which begins the process to donate the Knative trademark, IP, and code.

As a leader in serverless computing, we’re committed to the future of Knative and to offering Knative 1.0-conformant Cloud Run and Cloud Run for Anthos products. Finding a home in the CNCF secures Knative’s long-term future and encourages continued, open innovation. This donation recognizes the community's adoption of and investment in Knative, and will encourage further multi-vendor innovation and broader education and training.

At Google, we believe that using open source comes with a responsibility to contribute, sustain, and improve the projects that help drive innovation and make better software. We are excited to see how developers will continue to build and innovate in serverless using Knative.

By Alexandra Bush and Edd Wilder-James, Google Open Source

Open source DDR controller framework for mitigating Rowhammer

Rowhammer is a hardware vulnerability that affects DRAM chips and can be exploited to modify memory contents, potentially providing root access to the system. It occurs because dynamic RAM consists of many memory cells packed tightly together, and specific access patterns can cause unwanted effects that propagate to nearby cells and cause bit-flips in cells that the attacker has not accessed.

The problem has been known for several years, but recent research from Google, performed with the open source platform that Antmicro developed and that we describe in this note, shows that it has yet to be completely solved. The trend in DRAM manufacturing is toward denser chips that pack more memory into the same area, which inevitably increases the interdependency between memory cells and makes Rowhammer an ongoing problem.

Diagram of Rowhammer attack principle

Solutions like TRR (Target Row Refresh), introduced in newer memory chips, mitigate the issue only in part, and attack methods like Half-Double or TRRespass keep emerging. To go beyond the all-too-often used “security through obscurity” approach, Antmicro has been helping build open source platforms which give security researchers full control over the entire technology stack and enable them to find new solutions to emerging threats.

The Rowhammer Tester platform

The Rowhammer Tester platform was developed for and with Google, who, just like Antmicro, believe that open source, well-documented technical infrastructure is critical in speeding up research and increasing collaboration with the industry. In this case, we wanted to give memory security researchers and manufacturers access to a flexible platform for experimenting with new types of attacks and finding better Rowhammer mitigation techniques.

Current Rowhammer test methods involve using chip-specific MBIST (Memory Built-in Self-Test) or costly ATE (Automated Test Equipment), which means that the existing approaches are either costly, inflexible, or both. MBIST modules are specialized IP cores that test memory chips for errors. Although effective, they lack flexibility, because the testing algorithms are hardcoded into the IP core. ATE devices are usually used at foundries to run various tests on wafers. Access to these devices is limited and expensive; chip vendors have to rely on DFT (Design for Test) software to produce compressed test patterns, which require less ATE access time while ensuring high test coverage.

The main goal of the project was to address those limitations by providing an FPGA-based Rowhammer testing platform that enables full control over the commands sent to the DRAM chip. This is important because DRAM requires specialized hardware controllers, so any software-based testing approach has to communicate with the DRAM indirectly via the controller, which pulls researchers away from their main subject: the behaviour of the DRAM chip itself.

Platform architecture

Diagram of platform architecture

The Rowhammer Tester consists of two parts: the FPGA gateware that is loaded onto the hardware platform and a set of Python scripts used to communicate with the FPGA system from the user’s PC. Internally, all the important modules of the FPGA system are connected to a shared WishBone bus. We use an EtherBone bridge to interface with the FPGA WishBone bus from the host PC. EtherBone is a protocol that allows regular WishBone transactions to be performed over Ethernet. This way, all of the communication between the user's PC and the FPGA runs efficiently over an Ethernet cable.

The FPGA gateware has four main parts: a Bulk transfer module, a Payload Executor, the LiteDRAM controller, and a VexRiscv CPU. The Bulk transfer module provides an efficient way of filling and testing the whole memory contents. It supports user-configurable access and data patterns, using high-performance DMA to make use of the full bandwidth offered by the LiteDRAM controller. When using the Bulk transfer module, LiteDRAM handles all the required DRAM logic, including row activation and refreshing, and ensures that all DRAM timings are met.
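
The fill-and-check idea behind the Bulk transfer module can be illustrated with a small host-side model in Python. This is only a sketch operating on an ordinary list of integers; it is not the project's scripts or the gateware interface, and the helper names `fill_pattern` and `find_bitflips` as well as the 32-bit word width are assumptions made for illustration.

```python
WORD_BITS = 32  # assumed word width, for this illustration only


def fill_pattern(num_words, pattern):
    """Return a list modelling memory filled with a repeating data pattern."""
    return [pattern[i % len(pattern)] for i in range(num_words)]


def find_bitflips(memory, pattern):
    """Compare memory against the expected pattern and report flipped bits.

    Returns a list of (word_index, bit_positions) for every word that differs.
    """
    mask = (1 << WORD_BITS) - 1
    flips = []
    for i, actual in enumerate(memory):
        expected = pattern[i % len(pattern)]
        diff = (actual ^ expected) & mask
        if diff:
            bits = [b for b in range(WORD_BITS) if (diff >> b) & 1]
            flips.append((i, bits))
    return flips


pattern = [0xAAAAAAAA, 0x55555555]
memory = fill_pattern(8, pattern)
memory[3] ^= 1 << 5  # simulate a single bit-flip in word 3
print(find_bitflips(memory, pattern))  # [(3, [5])]
```

On the real platform the fill and read-back happen in hardware via DMA; the model above only shows the comparison logic that turns a read-back into a list of bit-flip locations.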

If more fine-grained control is required, our Rowhammer Tester provides the Payload Executor module. Payload Executor can be thought of as a simple processor that executes our custom instruction set. Most of the instructions map directly to DRAM commands, with minimal control flow provided by the LOOP instruction. A user can compile a “program” and load it into the Rowhammer Tester’s instruction SRAM, from which it will then be executed. To execute a program, Payload Executor disconnects the LiteDRAM controller and sends the requested command sequences directly to the DRAM chip via the PHY’s DFI interface. After execution, the LiteDRAM controller is reconnected and the contents of the memory can be inspected to search for potential bit-flips.
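
To make the idea of a program built from DRAM commands plus a LOOP instruction more concrete, here is a toy model in Python. It is a sketch under stated assumptions, not the actual Payload Executor instruction set or encoding: the tuple format, the mnemonics `ACT` and `PRE`, the `expand` helper, and the semantics chosen for `LOOP` (jump back a number of instructions a given number of times) are all hypothetical, and DRAM timing is ignored entirely.

```python
def expand(program):
    """Flatten a toy program into the command sequence it would issue.

    Each instruction is a tuple. ("LOOP", count, jump) is modelled here as
    "jump back `jump` instructions, `count` times"; every other tuple is
    treated as a DRAM command and emitted unchanged.
    """
    commands = []
    remaining = {}  # loop counters keyed by the LOOP instruction's index
    pc = 0
    while pc < len(program):
        instr = program[pc]
        if instr[0] == "LOOP":
            _, count, jump = instr
            if jump > pc:
                raise ValueError("LOOP jumps before the start of the program")
            left = remaining.get(pc, count)
            if left > 0:
                remaining[pc] = left - 1
                pc -= jump
                continue
            remaining.pop(pc, None)  # reset so an outer loop can re-enter
        else:
            commands.append(instr)
        pc += 1
    return commands


# Alternate activations of two rows on either side of a victim row.
hammer = [
    ("ACT", 10), ("PRE",),
    ("ACT", 12), ("PRE",),
    ("LOOP", 2, 4),  # repeat the four commands above two more times
]
sequence = expand(hammer)
print(len(sequence))  # 12
print(sequence[:4])
```

The point of the sketch is the shape of such a program: a short body of commands and a loop that repeats it, which is what lets a small instruction memory describe a very long command sequence.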

In our platform, we use LiteDRAM, an open source controller that we have been using in multiple projects. It is part of the wider LiteX ecosystem, which is also a very popular choice for many of our FPGA projects. The controller supports different memory types (SDR, DDR, DDR2, DDR3, DDR4, …), as well as many FPGA platforms (Lattice ECP5, Xilinx Series 6, 7, UltraScale, UltraScale+, …). Since it is an open source FPGA IP core, we have complete control over its internals. That means two things: firstly, we were able to easily integrate it with the rest of our system and contribute back to improve LiteDRAM itself. Secondly, and perhaps even more importantly, groups researching new memory attack methods can modify the controller in order to expose existing vulnerabilities. The results of such experiments should motivate vendors to work on mitigating the uncovered flaws, rather than rely on a “security by obscurity” approach.

Our Rowhammer Tester is fully open source. We provide an extensive set of Python scripts for controlling the board, performing Rowhammer attacks and harvesting the results. For more complex testing you can use the so-called Playbook, a framework that lets you describe complex testing scenarios using JSON files and provides some predefined attack configurations.

Antmicro is actively collaborating with Google and memory makers to help study the Rowhammer vulnerability, contributing to standardization efforts under the JEDEC initiative. The platform has already been used with considerable success in state-of-the-art Rowhammer research (such as the discovery of a new type of Rowhammer attack called Half-Double, as mentioned previously).

New DRAM PHYs

Initially, our Rowhammer Tester targeted two easily available and price-optimized boards: Digilent Arty (DDR3, Xilinx Series 7 FPGA) and Xilinx ZCU104 (DDR4, Xilinx UltraScale+ FPGA). They were a good starting point, as DDR3 and DDR4 PHYs for these boards were already supported by LiteDRAM. After the initial version of the Rowhammer Tester was ready and tested on these boards, proving the validity of the concept, the next step was to cover more memory types, some of which find their way into many devices that we interact with daily. A natural target was LPDDR4 DRAM, a relatively new type of memory designed for low-power operation with throughputs up to 3200 MT/s. To this end, we designed our dedicated LPDDR4 Test Board, which has already been covered in a previous blog note.

LPDDR4 Test Board

The design is quite interesting because we decided to put the LPDDR4 memory chips on a module, which goes against the usual practice of putting LPDDR4 directly on the PCB, as close as possible to the CPU/FPGA, to minimize trace impedance. The reason was simple: we needed the platform to be able to test many memory types interchangeably without having to desolder and resolder parts or resort to complicated interposers or other niche techniques. The platform is supposed to be open and approachable to all.

Alongside the hardware platform we had to develop a new LPDDR4 PHY IP as LiteDRAM didn’t have support for LPDDR4 at that time, resolving problems related to the differences between LPDDR4 and previously supported DRAM types, such as new training modes. After a phase of verification and testing on our hardware, the newly implemented PHY has been contributed back to LiteDRAM.

What’s next?

The project does not stop there; we are already working on an LPDDR5 PHY for next-gen low-power memory support. This latest low-power memory standard published by JEDEC poses some new and interesting challenges, including a new clocking architecture and operation at an even lower voltage. As a bleeding-edge technology, LPDDR5 chips are hardly available on the market today, but we are continuing our work to prepare LPDDR5 support for our future hardware platform in simulation, using custom and vendor-provided simulation models.

The fact that our platform has already been successfully used to demonstrate new types of Rowhammer attacks proves that open source test platforms can make a difference, and we are pleased to see a growing collaborative ecosystem around the project in a joint effort to ensure that we find robust and transparent mitigation techniques for all variants of Rowhammer for the foreseeable future.

Ultimately, our work with the Rowhammer Tester platform shows that by using open source, vendor-neutral IP, tools and hardware, we can create better platforms for more effective research and product development. In the future, building on the success of the FPGA version, our work as part of the CHIPS Alliance will most likely lead to demonstrating the LiteDRAM controller in ASIC form, unlocking even more performance based on the same solid platform.

If you are interested in state of the art, high-speed FPGA I/O and extreme customizability that open source FPGA blocks can offer, get in touch with Antmicro at [email protected] to hire development services to develop your next product.

Originally posted on the Antmicro blog.


By guest author Michael Gielda, Antmicro

Expanding Google Summer of Code in 2022

We are pleased to announce that in 2022 we’re broadening the scope of Google Summer of Code (GSoC) with exciting new updates to the program.

For 17 years, GSoC has focused on bringing new open source contributors into OSS communities big and small. GSoC has brought over 18,000 university students from 112 countries together with over 17,000 mentors from 746 open source organizations.

At its heart, GSoC is a mentorship program where people interested in learning more about open source are welcomed into our open source communities by excited mentors ready to help them learn and grow as developers. The goal is to have these new contributors stay involved in open source communities long after their Google Summer of Code program is over.

Over the course of GSoC’s 17 years, open source has grown and evolved, and we’ve realized that the program needs to evolve as well. With that in mind, we have several major updates to the program coming in 2022, aimed at better meeting the needs of our open source communities and providing more flexibility to both projects and contributors so that people from all walks of life can find, join and contribute to great open source communities.

Expanding eligibility

Beginning in 2022, we are opening the program up to all newcomers to open source who are 18 years and older. The program will no longer be solely focused on university students or recent graduates. We realize that many folks who could benefit from the GSoC program are at various stages of their careers: recent career changers, the self-taught, those returning to the workforce, and so on. We wanted to give these folks the opportunity to participate in GSoC.

We expect many students to continue applying to the program (which we encourage!), yet we wanted to provide excited individuals who want to get into open source—but weren’t sure how to get started or whether open source communities would welcome their newbie contributions—with a place to start.

Many people can benefit from mentorship programs like GSoC and we want to welcome more folks into open source.

Multiple Sizes of Projects

This year we introduced the concept of a medium-sized project in response to the many distractions folks were dealing with during the pandemic. This adjustment was beneficial for many participants and organizations, but we also heard feedback that the larger, more complex projects were a better fit for others. In the spirit of flexibility, we are going to support both medium-sized projects (~175 hours) and large projects (~350 hours) in 2022.

One of our goals is to find ways to get more people from different backgrounds into open source, which means meeting people where they are and understanding that not everyone can devote an entire summer to coding.

Increased Flexibility of Timing for Projects

For 2022, we are allowing for considerable flexibility in the timing for the program. You can spread the project out over a longer period of time and you can even switch to a longer timeframe mid-program if life happens. Rather than a mandatory 12-week program that runs from June – August with everyone required to finish their projects by the end of the 12th week, we are opening it up so mentors and their GSoC Contributors can decide together if they want to extend the deadline for the project up to 22 weeks.

Interested in Applying to GSoC?

We will announce the GSoC 2022 program timeline soon.

Open Source Organizations

Does your open source project want to learn more about how to apply to be a mentoring organization? This is a mentorship program focused on welcoming new contributors into your community and helping them learn best practices that will help them be long term OSS contributors. A key factor is having plenty of mentors excited about teaching newcomers about open source.

Read the mentor guide to learn more about what it means to be a mentor organization, how to prepare your community, how to create appropriate project ideas (175-hour and 350-hour projects), and tips for preparing your application.

Want to be a GSoC Contributor?

Are you a potential GSoC Contributor interested in learning how to prepare for the 2022 GSoC program? It’s never too early to start thinking about your proposal or about what type of open source organization you may want to work with. Read through the student/contributor guide for important tips on preparing your proposal and what to consider if you wish to apply for the program in 2022. You can also get inspired by checking out the 199 organizations that participated in Google Summer of Code 2021, as well as the projects that students worked on.

We encourage you to explore other resources and you can learn more on the program website.

Please spread the word to your friends as we hope these updates to the program will help more excited folks apply to be GSoC Contributors and mentoring organizations in GSoC 2022!


By Stephanie Taylor, Program Manager, Google Open Source