Tag Archives: Open source

Announcing Season of Docs 2022

Google Open Source is delighted to announce Season of Docs 2022!

The Season of Docs program supports better documentation in open source and provides opportunities for skilled technical writers to gain open source experience. 

Participating projects receive funds to create, improve, or expand their documentation, while contributing to our knowledge of effective metrics for evaluating open source documentation through their shared case studies.

About the program

Season of Docs allows open source organizations to apply for a grant based on their documentation needs. If selected, the open source organizations use their grant to hire a technical writer directly to complete their documentation project. Organizations have up to six months to complete their documentation project.


Participating organizations help broaden our understanding of effective documentation practices and metrics in open source by submitting a final case study upon completion of the program. The case study should outline the problem the documentation project was intended to solve, what metrics were used to judge the effectiveness of the documentation, and what the organization learned for the future. All project case studies will be published on the Season of Docs site at the end of the program.

Organizations: start your exploration engines

2022 Season of Docs applications open February 23, 2022. We strongly suggest that organizations take the time to complete the steps in the exploration phase before the application process begins, including:

  • Creating a project page to gauge community and technical writer interest in participating (see our project ideas page for examples)
  • Publicizing your interest in participating in Season of Docs through your project channels and adding your project to our list of interested projects on GitHub
  • Lining up community members who are interested in mentoring or helping to onboard technical writers to your project
  • Brainstorming requirements for technical writers to work on your project (Will they need to be able to test code? Work with video? Have prior experience with your project or related technologies?)

On your mark, get set, project page!

Every Season of Docs project begins with a project page. Your project page serves as an overview of your documentation project, and it should be publicly visible. A good project page includes:
  • A statement of the problem your project needs to solve (“users on Windows don’t have clear guidance on how to install our project”)
  • The documentation that might solve this problem (“We want to create a quickstart doc and installation guide for Windows users”)
  • How you’ll measure the success of your documentation (“With a good quickstart, we expect to see 50% fewer issues opened about Windows installation problems.”)
  • What skills your technical writer would need (break down into “must have” and “nice to have” categories. “Must have: access Windows machine to test instructions”)
  • What volunteer help is needed from community members (“need help onboarding technical writer to our discussion groups”) and links to where the community can discuss the proposal
  • Most importantly, include a way for interested technical writers to reach you and ask questions!

Technical writers: express your interest

Technical writers interested in working with accepted open source organizations can share their contact information via the Season of Docs GitHub repository; or they may submit proposals directly to the organizations using the contact information shared on the organization project page. Technical writers do not submit a formal application through Season of Docs.

General timeline

February 23 - March 25

Open source organizations apply to take part in Season of Docs.

April 14

Google publishes the list of accepted organizations along with their project proposals, and doc development can begin.

June 15

Organization administrators begin to submit monthly evaluations to report on the status of their project.

November 30

Organization administrators submit their case study and final project evaluation.

December 14

Google publishes the 2022 case studies and aggregate project data.

May 2, 2023

Organizations begin to participate in post-program follow-up surveys.

See the full program timeline for more details.

Join us

Explore the Season of Docs website at g.co/seasonofdocs to learn more about participating in the program. Use our logo and other promotional resources to spread the word. Check out the timeline and FAQ, and get ready to apply!


By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office

Introducing Ephemeral Containers

Ephemeral containers in Kubernetes started with a simple question: is it feasible to run a service on Kubernetes without bundling a Linux distribution userland with every binary?

It was early 2016. Kubernetes had just released version 1.2, and my SRE team was evaluating using Google Kubernetes Engine for internal workloads. Docker and Kubernetes examples always seemed to build images on top of Linux distributions like Debian or CentOS, but our build system produced a binary with its minimum set of library dependencies, so that's what I wanted to deploy as a container image.

This minimal container image worked fine, but only if I never made a mistake. Since the container image had no shell to use with kubectl exec, I had to log into the node with administrator privileges to interactively troubleshoot any problems. This produced an unfortunate debugging experience and was unacceptable from a security perspective.

What's more, kubectl exec had changed little from docker exec even though Kubernetes introduced new abstractions such as a Pod, where multiple containers share resources. How should Kubernetes native troubleshooting work?

Debugging on Borg

Providing userspace utilities for cluster applications wasn't a new problem for Google. Google's existing cluster orchestration system, Borg, provides a common userland for processes. Rather than including system and debugging utilities with the application binary, Borg provides a basic set of userland utilities that applications can expect in their runtime environment. Another team maintains and updates these utilities independent from application binaries.

There are downsides to this approach: Updates to the common utilities can take weeks or months to roll out, application owners can't specify which utilities they need, and the utilities needed at run time may be completely different from the ones needed at debug time. We could do better for Kubernetes.

Extensibility for Kubernetes

I wanted a solution that felt native for Kubernetes and gave users the freedom to customize to their use case, but I was still new to Kubernetes. I reached out to SIG Node and discovered a welcoming, helpful open source community.

Together we considered different ways of deploying tools to a Pod at debugging time. Implementing the feature entirely on the client side would be easiest, but solutions such as copying binaries into the running container image didn't make debugging feel like a feature of the platform. Kubernetes deploys binaries using containers, so it's natural to use containers for troubleshooting as well.

Existing container types were tied to the Pod lifecycle, though. Containers and Init Containers run when a Pod starts, and neither may be added after a Pod is created. For administrative actions we needed a lifecycle more like kubectl exec. We needed a new type of container: the Ephemeral Container.

What are Ephemeral Containers?

Ephemeral containers are a new type of container that is part of the Kubernetes core API. An Ephemeral Container may be added to an existing Pod for administrative actions like debugging, it runs until it exits, and it won't be restarted. An ephemeral container runs within the Pod's existing resource allocation and shares common container namespaces.

How are Ephemeral Containers used?

Here are some debugging scenarios that are made easier using ephemeral containers.

Troubleshooting Clusters

I run a service named "apples" that consists of a Go binary running in a distroless container image. One of its pods is suddenly having trouble connecting to a backend service, but since it's a distroless image I can't use kubectl exec to troubleshoot:

% kubectl exec -it apples-57bcf49487-ddmpn -- sh
OCI runtime exec failed: exec failed: container_linux.go:380: starting container process caused: exec: "sh": executable file not found in $PATH: unknown

We can use kubectl debug to add an ephemeral container and test the backend service:

% kubectl debug -it --image=busybox apples-57bcf49487-ddmpn -- sh
Defaulting debug container name to debugger-5wvgc.
/ # ps ax
PID   USER     TIME  COMMAND
    1 65535     0:00 /pause
    7 root      0:00 /app
   19 root      0:00 sh
   26 root      0:00 ps ax
/ # wget -S -O - http://bananas:8080
Connecting to bananas:8080 (10.0.0.237:8080)
HTTP/1.1 500 Internal Server Error
wget: server returned error: HTTP/1.1 500 Internal Server Error

Technical Support

To make this easier to detect next time, I'll add this check to my operations team's autodiagnose script. The ops team doesn't have access to attach to production pods, but they have access to run the autodiagnose image, which attaches its logs to a bug report:

% kubectl debug --image=gcr.io/apples/autodiagnose apples-57bcf49487-ddmpn -- --bug=1234

What's Next for Ephemeral Containers?

Ephemeral Containers are available as a beta feature in Kubernetes 1.23, but we still have lots of work to do to polish the rough edges and improve kubectl debug to support more debugging journeys, such as configuring the container security context to allow attaching a debugger.

Try out Ephemeral Containers and let us know in the Ephemeral Containers and kubectl debug enhancement issues how they work for you.

Contributor Experience

Working with the Kubernetes community has been incredibly rewarding. When I started I didn't know nearly enough to contribute something like this, but I discovered a community that works hard to welcome contributions at all levels.

I want to thank the community, and especially Dawn Chen, Yu-Ju Hong, Jordan Liggitt, Clayton Coleman, Maciej Szulik, and Tim Hockin, for providing the support and guidance that made this feature possible.

Kubernetes will welcome your contribution as well! See kubernetes.dev for how to get started.


By Lee Verberne, Site Reliability Engineer – Google Cloud Platform

Life after Season of Docs


My journey to technical writing involved a long, windy, and non-linear career path. Before I became a technical writer, I spent years working in finance jobs with a stint of teaching in between. Seeking a career change, I went back to university where I came across the Technical Writing certificate program. It was the perfect fit for my skills and interests.

One of Google’s technical writers, Nicole Yap, visited my class to talk about her career and introduce Season of Docs—a program that brings technical writers and open source projects together to work on open source documentation. My interest was piqued, as it seemed like a great opportunity for a new graduate. With no real-world experience of writing documentation, I applied and was accepted into Season of Docs. I worked with Oppia, an online learning platform, on a 3-month project in which I created a user guide with video tutorials.

During that time, I had to quickly become familiar with many new concepts:
  • Open source philosophy
  • Writing docs-as-code
  • Command-line basics
  • Submitting and amending pull requests on GitHub, and much more!
In the course of the Season of Docs program, I got my first full time job as a technical writer at a software company in Toronto. Juggling the demands of the project and my new job was challenging, but I was grateful for the experience as I could transfer the skills I learned to the new role. 

Opening doors to new experiences

I had such a positive experience working with my mentors[1] at Oppia that we mutually agreed to extend our relationship. Over the next year, I continued to work with Oppia in different capacities—copywriting, editing, helping write math lessons—while getting to know the network of international volunteers who contribute to this incredible organization.

I also had the opportunity to present a talk at a Write the Docs Toronto meetup which was a great way to plug Season of Docs, and demonstrate what I had learnt during the program. There was quite a bit of interest from the audience as many hadn’t even heard of the program before.

My Season of Docs experience also helped me with my day job as a technical writer. After experiencing the steep learning curve with Oppia, I was able to hit the ground running with learning the new job processes at the software company. I was also able to fall back on my Season of Docs experience as I created marketing and technical videos in my new job as well.

A new opportunity

At the start of 2021, I had the opportunity to apply for a technical writing position at Google. I had the notion that a company like Google would require years of tech writing experience before they would even consider my application, but that turned out not to be true. I’ve been a technical writer at Google for four months now, and it still feels a bit surreal!

As a newcomer in the tech world, I find that everything I learned during the Season of Docs program has come in handy in helping me understand my job a little better. Getting into Season of Docs as a new entrant to the field of technical writing was a confidence-booster for me, and the path it led to has been challenging yet gratifying. I’m excited to continue learning every single day from the sea of talent around me.


By Audrey Tavares – Google Cloud


  [1] The current Season of Docs program format does not have a defined mentor role, but technical writers in the program work closely with project contributors to learn open source skills.




DeepNull: an open-source method to improve the discovery power of genetic association studies

In our paper “DeepNull models non-linear covariate effects to improve phenotypic prediction and association power,” we proposed a new method, DeepNull, to model the complex relationship between covariate effects on phenotypes to improve Genome-wide association studies (GWAS) results. We have released DeepNull as open source software, with a Colab notebook tutorial for its use.

Human Genetics 101

Each individual’s genetic data carries health information, such as why certain individuals have a lower risk of developing skin cancer than others, or why certain drugs differ in effectiveness between individuals. Genetic data is encoded in the human genome: a DNA sequence composed of a chain of roughly 3 billion nucleotides, each drawn from four possibilities (A, C, G, and T). Only a small subset of the genome (~4-5 million positions) varies between two individuals. One of the goals of genetic studies is to detect variants that are associated with different phenotypes (e.g., risk of diseases such as glaucoma, or observed phenotypic values such as high-density lipoprotein (HDL), low-density lipoprotein (LDL), and height).

Genome-wide association studies

GWAS are used to associate genetic variants with complex traits and diseases. To more accurately determine the strength of association between genotype and phenotype, covariates such as age, sex, and principal components (PCs) of the genotypes must be adjusted for. Covariate adjustment in GWAS can increase precision and correct for confounding. In the linear model setting, adjusting for a covariate improves precision (i.e., statistical power) if the distribution of the phenotype differs across levels of the covariate. For example, when performing GWAS on height, males and females have different means. Current state-of-the-art methods (e.g., BOLT-LMM, regenie) perform GWAS assuming that the effects of genotypes and covariates on the phenotype are linear and additive. However, the assumption of linear and additive covariate contributions often does not reflect the underlying biology, so we sought a method to more comprehensively model and adjust for the interactions between covariates and phenotypes in GWAS.

DeepNull method overview

We proposed a new method, DeepNull, to relax the linear assumption of covariate effects on phenotypes. DeepNull trains a deep neural network (DNN) to predict the phenotype from all covariates under 5-fold cross-validation. After training the DeepNull model, we make phenotype predictions for all individuals and add this prediction as one additional covariate in the association test. The major advantages of DeepNull are its simplicity and the fact that it requires only a minimal change to existing GWAS pipeline implementations: to use DeepNull, we just add one additional covariate, computed by DeepNull, to the existing pipeline.
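To make the workflow concrete, here is a minimal sketch of the DeepNull idea. This is not the released implementation: scikit-learn's MLPRegressor stands in for DeepNull's TensorFlow DNN, and the column names are illustrative.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neural_network import MLPRegressor

def add_deepnull_covariate(df, covariates, phenotype, n_folds=5, seed=0):
    """Adds an out-of-fold DNN prediction of the phenotype to a DataFrame."""
    X = df[covariates].to_numpy()
    y = df[phenotype].to_numpy()
    preds = np.empty(len(y))
    for train_idx, test_idx in KFold(n_folds, shuffle=True,
                                     random_state=seed).split(X):
        model = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=500,
                             random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        # Each individual's prediction comes from a model that never saw them.
        preds[test_idx] = model.predict(X[test_idx])
    out = df.copy()
    out['deepnull_pred'] = preds
    return out

# Downstream, the GWAS runs exactly as before, with 'deepnull_pred' appended
# to the usual covariate list (age, sex, genotype PCs, ...).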

DeepNull improves statistical power

We simulated data under different genetic architectures to first check that DeepNull controls type I error, and then to compare DeepNull's statistical power with the current state of the art (hereafter referred to as “Baseline”). First, we simulated data under genetic architectures where covariates have a linear effect on phenotype and observed that both Baseline and DeepNull have tight control of type I error. Notably, DeepNull's power does not decrease compared to Baseline even in this setting, where covariates have only a linear effect on phenotype. Next, we simulated data under genetic architectures where covariates have non-linear effects on phenotype. Again, both Baseline and DeepNull have tight control of type I error, while DeepNull increases the statistical power depending on the genetic architecture. We observed that for certain genetic architectures, DeepNull increases the statistical power by up to 20%. Below, we compare the -log p-values of test statistics computed from DeepNull versus Baseline for Apolipoprotein B (ApoB) levels obtained from UK Biobank:
Figure 1. Significance level comparison of DeepNull vs Baseline. X-axis is the -log p-value of Baseline and Y-axis is the -log p-value of DeepNull. The orange dots indicate variants that are significant for Baseline but not significant for DeepNull and green dots indicate variants that are significant for DeepNull but not significant for Baseline.

DeepNull improves phenotype prediction

We applied DeepNull to predict phenotypes by utilizing polygenic risk scores (PRS) and existing covariates such as age and sex. We considered 10 phenotypes obtained from UK Biobank. We observed that DeepNull on average increased phenotype prediction (R2, where R is the Pearson correlation) by 23%. More strikingly, in the case of glaucoma referral probability, which is computed from fundus images (Phene et al., Ophthalmology 2019; Alipanahi et al., AJHG 2021), DeepNull improves phenotype prediction by 83.4%, and in the case of LDL, DeepNull improves phenotype prediction by 40.3%. A summary of DeepNull results versus Baseline is shown in Figure 2 below:

Figure 2. DeepNull improves phenotype prediction compared to Baseline. The Y-axis is the R2, where R is the Pearson correlation between the true and predicted values of phenotypes. Phenotypic abbreviations: alkaline phosphatase (ALP), alanine aminotransferase (ALT), aspartate aminotransferase (AST), apolipoprotein B (ApoB), glaucoma referral probability (GRP), LDL cholesterol (LDL), sex hormone-binding globulin (SHBG), and triglycerides (TG).

Conclusion

We proposed a new framework, DeepNull, that can model the nonlinear effect of covariates on phenotypes when such nonlinearity exists. We show that DeepNull can substantially improve phenotype prediction. In addition, we show that DeepNull achieves results similar to a standard GWAS when the covariate effects on the phenotype are linear, and can significantly outperform a standard GWAS when the covariate effects are nonlinear. DeepNull is open source and is available for download from GitHub or installation via PyPI.

By Farhad Hormozdiari and Andrew Carroll – Genomics team in HealthAI

Acknowledgments

This blog summarizes the work of the following Google contributors, who we would like to thank: Zachary R. McCaw, Thomas Colthurst, Ted Yun, Nick Furlotte, Babak Alipanahi, and Cory Y. McLean. In addition, we would like to thank Alkes Price, Babak Behsaz, and Justin Cosentino for their invaluable comments and suggestions.

“Disapproved Ads Auditor” tool

Posted by Nir Kalush, Dvir Kalev, Chen Yoveg, Elad Ben-David

Ads Review Tool

This tool flags (and optionally deletes) policy-violating ads across your accounts. Advertisers can learn from the output to ensure their ads are compliant with Google Ads Policies at all times.

Business Challenge:

Advertisers operating at scale need a solution to holistically review policy-violating ads across their accounts so they can ensure compliance with Google’s Ads Policies. As Google introduces more policies and enforcement mechanisms, advertisers need to keep checking their accounts for compliance.

Solution Overview:

“Disapproved Ads Auditor” is a tool that enables advertisers to review, at scale, all disapproved ads across their Google Ads accounts. This view allows advertisers to proactively audit their accounts, analyze ad disapprovals holistically, and identify learnings to reduce submission of potentially policy-violating ads.

The tool is based on a Python script, which can be run in either of the following modes:

  • “Audit Mode”: exports an output of disapproved ads across your accounts.
  • “Remove Mode”: deletes currently disapproved ads and logs their details.

There are a few output files (see here) which are saved locally under the “output” folder, and there is an optional feature to export the results to BigQuery for further data analysis (the “Disapproved Ads Auditor” dataset).
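For a sense of what “Audit Mode” does conceptually, here is a hedged sketch that streams disapproved ads for a single account via the Google Ads API Python client. The actual tool handles account hierarchies, output files, and the BigQuery export (see the GitHub repo); the configuration file path and customer ID below are placeholders.

from google.ads.googleads.client import GoogleAdsClient

QUERY = """
    SELECT
      campaign.id,
      ad_group_ad.ad.id,
      ad_group_ad.policy_summary.approval_status,
      ad_group_ad.policy_summary.policy_topic_entries
    FROM ad_group_ad
    WHERE ad_group_ad.policy_summary.approval_status = 'DISAPPROVED'
"""

def audit_disapproved_ads(client, customer_id):
    # Stream results to handle large accounts without paging manually.
    ga_service = client.get_service("GoogleAdsService")
    for batch in ga_service.search_stream(customer_id=customer_id, query=QUERY):
        for row in batch.results:
            print(row.campaign.id, row.ad_group_ad.ad.id,
                  row.ad_group_ad.policy_summary.approval_status)

client = GoogleAdsClient.load_from_storage("google-ads.yaml")  # placeholder path
audit_disapproved_ads(client, customer_id="1234567890")        # placeholder ID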

Skills Required:

Google Products Used:

  • Google Ads
  • BigQuery

Estimated time to implement the solution: ~2h

Implementation instructions: View on GitHub

Impact:

The “Disapproved Ads Auditor” tool automates auditing ads across your accounts to provide insights into non-compliant ads. You can take the learnings from the output to ensure ads are compliant with Google Ads Policies and avoid creating non-compliant ads. Moreover, you can optionally remove disapproved ads.

Season of Docs announces results of 2021 program


Season of Docs has announced the 2021 program results for all projects. You can view a list of successfully completed projects on the website along with their case studies.
In 2021, the Season of Docs program allowed open source organizations to apply for a grant based on their documentation needs. Selected open source organizations then used their grant to hire a technical writer directly to complete their desired documentation project. Organizations then had six months to complete their documentation project. (In previous years, Google matched technical writers to projects and paid the technical writers directly.)

The 2021 Season of Docs documentation development phase began on April 16 and ended November 16, 2021 for all projects:
  • 30 open source organizations finished their projects (100% completion)
  • 93% of organizations had a positive experience
  • 96% of the technical writers had a positive experience
Take a look at the list of completed projects to see the wide range of subjects covered!

What is next?

Stay tuned for information about Season of Docs 2022—watch for posts on this blog and sign up for the announcements email list. We’ll also be sharing information about best practices in open source technical writing derived from the Season of Docs case studies.

If you were excited about participating, please do write social media posts. See the promotion and press page for images and other promotional materials you can include, and be sure to use the tag #SeasonOfDocs when promoting your project on social media. To include the tech writing and open source communities, add #WriteTheDocs, #techcomm, #TechnicalWriting, and #OpenSource to your posts.


By Kassandra Dhillon and Erin McKean, Google Open Source Programs Office


Boosting Machine Learning with tailored accelerators: Custom Function Units in Renode


Development of Machine Learning algorithms which enable new and exciting applications is progressing at a breakneck pace, and given the long turnaround time of hardware development, the designers of dedicated hardware accelerators are struggling to keep up. FPGAs offer an interesting alternative to ASICs, enabling a much faster and more flexible environment for such HW-SW co-development, and with projects such as the FPGA Interchange Format (now part of CHIPS Alliance), Google and Antmicro have been making the FPGA ecosystem ever more open and software driven.

The open RISC-V ISA was built with Machine Learning in mind, with its configurable and adaptable nature, flexible vector extensions and a rich ecosystem of open source implementations which can serve as an excellent starting point for new R&D projects.

Given their wide-ranging interests in edge AI, both Google and Antmicro embraced RISC-V early, as founding members, as far back as 2015. Among many other open source tools and building blocks that Antmicro is creating, we have invested heavily in enabling HW/SW co-development of ML solutions using RISC-V in our open source simulation framework, Renode.

RISC-V is also excellent for FPGA-based ML development. It offers a multitude of FPGA-friendly softcore options, such as VexRiscv, and specialized ML-oriented extensions called CFUs, which you can experiment with on cheap, easily accessible hardware and in Renode, using its Verilator co-simulation capabilities.

In this note, we will describe the CFU and the CFU Playground, an ML experimentation project that Antmicro and Google have been collaborating on to push forward FPGA acceleration of AI, and show how to get started quickly with your very own hardware-assisted ML pipeline.

About the CFU

A “CFU”, or a “Custom Function Unit,” is an accelerator tightly coupled with the CPU. It adds a custom instruction to the ISA using a standardized format defined by the CFU working group of RISC-V International.

CFUs are easy to design, write, and experiment with given the reprogrammable nature of FPGAs. When working with a CFU, you are encouraged to identify blocks to be accelerated iteratively, measure your payload after each iteration and, above all, prepare custom CFUs for each payload (potentially using the capabilities of most FPGAs to be reprogrammed on the fly, or just holding several CFUs in store side by side, to be executed depending on the payload in question).

CFU execution is triggered by one of the standard instructions, with arguments passed via registers. The CPU can handle many different CFUs with various functions; their IDs are retrieved from the `funct7` and `funct3` operands of the decoded instruction. The only interaction between the CPU and the CFU is via registers and immediate values provided in the instruction itself—there is no direct memory access nor any interaction between different CFUs.
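To make the encoding concrete, here is a plain-Python software model (not HDL) of that dispatch: it extracts `funct3` and `funct7` from a 32-bit R-type custom instruction and routes the two register operands to the matching operation. The operations themselves are invented for illustration.

def decode_r_type(instr):
    """Extracts funct3/funct7 from a 32-bit R-type RISC-V instruction."""
    funct7 = (instr >> 25) & 0x7F
    funct3 = (instr >> 12) & 0x07
    return funct3, funct7

CFU_OPS = {
    # (funct3, funct7) -> operation applied to the two source registers
    (0, 0): lambda a, b: (a + b) & 0xFFFFFFFF,  # 32-bit wrapping add
    (1, 0): lambda a, b: min(a, b),             # element-wise minimum
}

def cfu_execute(instr, rs1_value, rs2_value):
    """Dispatches to the CFU operation selected by the instruction fields."""
    funct3, funct7 = decode_r_type(instr)
    return CFU_OPS[(funct3, funct7)](rs1_value, rs2_value)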


CFU Playground

Google’s CFU Playground provides an open source framework which offers a handy methodology for reasoning about ML acceleration and developing your own Custom Function Units using FPGAs and simulation. Various CFU examples and demos are available, and you can also add a project with your own sources and modified TFLite Micro code (one of the results of our collaboration with the TF Lite Micro team). An overlay mechanism lets you override any part of the code you need.

A CFU may be written in Verilog or any language/framework that outputs Verilog. In the CFU Playground demos, CFUs are mostly written in nMigen, which lets you write code in Python and then generate Verilog output. The Python-based flow simplifies development for software engineers who may not be familiar with writing Verilog code. Since the CFU is generated from Python, it is also very easy to upgrade it in small, structured steps until you reach your expected acceleration targets.
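As a taste of that flow, here is a minimal, hypothetical nMigen module and its conversion to Verilog; real CFUs implement the CFU handshake signals rather than a bare adder.

from nmigen import Elaboratable, Module, Signal
from nmigen.back import verilog

class AddUnit(Elaboratable):
    def __init__(self):
        self.a = Signal(32)
        self.b = Signal(32)
        self.result = Signal(32)

    def elaborate(self, platform):
        m = Module()
        # Purely combinational: result tracks a + b.
        m.d.comb += self.result.eq(self.a + self.b)
        return m

top = AddUnit()
# Emit synthesizable Verilog for the module and its ports.
print(verilog.convert(top, ports=[top.a, top.b, top.result]))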

Co-simulation in Renode

Renode has supported co-simulation of various buses since the 1.7.1 release, and support for CFUs was also added recently. CFU support is provided via the Renode Integration Layer plugin. It essentially consists of two parts: a C# class called `CFUVerilatedPeripheral`, which manages the Verilator simulation process, and an integration library written in C++. The integration library, alongside the ‘verilated’ hardware code (i.e., HDL compiled into C++ via Verilator), is then built into a binary, which in turn is imported by the `CFUVerilatedPeripheral`. It is possible to install up to four different CFUs under one RISC-V CPU. Each of them will be executed based on the opcode received from the CPU.

Since the hardware is translated into C++ via Verilator, you can also enable tracing which dumps CFU waveforms into a file to later analyze.

How to ‘verilate’ your own CFU

Basic examples of verilated CFUs are available on Antmicro’s GitHub. You can use this repository to ‘verilate’ your own custom CFU.

In the `main.cpp` of your verilated model, you need to include C++ headers from the Renode Verilator Integration Library.

#include "src/renode_cfu.h"
#include "src/buses/cfu.h"

Next, you need to initialize the `RenodeAgent` and the model’s `top` instance along with the `eval()` function that will evaluate the model during simulation.

RenodeAgent *cfu;
Vcfu *top = new Vcfu;

void eval() {
    top->eval();
}

Now add an `Init()` function that will initialize a bus along with its signals, and the `eval()` function. It should also initialize and return the `RenodeAgent` connected to a bus.

RenodeAgent *Init() {
    Cfu* bus = new Cfu();

    //=================================================
    // Init CFU signals
    //=================================================
    bus->req_valid = &top->cmd_valid;
    bus->req_ready = &top->cmd_ready;
    bus->req_func_id = (uint16_t *)&top->cmd_payload_function_id;
    bus->req_data0 = (uint32_t *)&top->cmd_payload_inputs_0;
    bus->req_data1 = (uint32_t *)&top->cmd_payload_inputs_1;
    bus->resp_valid = &top->rsp_valid;
    bus->resp_ready = &top->rsp_ready;
    bus->resp_ok = &top->rsp_payload_response_ok;
    bus->resp_data = (uint32_t *)&top->rsp_payload_outputs_0;
    bus->rst = &top->reset;
    bus->clk = &top->clk;

    //=================================================
    // Init eval function
    //=================================================
    bus->evaluateModel = &eval;

    //=================================================
    // Init peripheral
    //=================================================
    cfu = new RenodeAgent(bus);

    return cfu;
}

To compile your project, you must first export three environment variables:
  • `RENODE_ROOT`: path to Renode source directory
  • `VERILATOR_ROOT`: path to the directory where Verilator is located (this is not needed if Verilator is installed system-wide)
  • `SRC_PATH`: path to the directory containing your `main.cpp`
With the variables above now set, go to `SRC_PATH` and build your CFU:

mkdir build && cd build
cmake -DCMAKE_BUILD_TYPE=Release "$SRC_PATH"
make libVtop

If you need more details about creating your own ‘verilated’ peripheral, visit the chapter in Renode documentation about co-simulation.

To attach a verilated CFU to a Renode platform, add `CFUVerilatedPeripheral` to your `RISC-V` CPU.

cpu: CPU.VexRiscv @ sysbus
    cpuType: "rv32im"

cfu0: Verilated.CFUVerilatedPeripheral @ cpu 0
    frequency: 100000000


As the last step, provide a path to the compiled verilated CFU. You can do this either in the `.repl` platform file, as a CFU constructor argument, or in the `.resc` script.

cpu.cfu0 SimulationFilePath @libVtop.so

To see how it works without building your own project, run the built-in Renode demo script called litex_vexriscv_verilated_cfu.resc in Renode’s monitor CLI:

(monitor) s @scripts/single-node/litex_vexriscv_verilated_cfu.resc

CFU Playground Integration

CFU Playground makes use of a Continuous Integration mechanism to make sure new changes don’t break anything. Since the project is targeted mostly at real hardware, a simulator like Antmicro’s open source Renode framework is indispensable. A large number of varied tests are executed with every change in the mainline CFU Playground repository, building the CFU software and then running it in Renode, either with hardware co-simulation or with a software CFU reimplementation.

In the CI tests, Renode uses scripts which are generated for each specific build target. This makes it possible to generate the exact same scripts locally and run them in Renode to enable a step-by-step assessment of what is happening in the code.

What’s next?

CFU integration in Renode is already used in practice, among other places in the EU-funded project called VEDLIoT, for which Antmicro also implemented the Kenning framework. VEDLIoT will use Renode to develop and test a soft-SoC based system aimed at driving TinyML workloads.

Renode’s use in CFU Playground is yet another outcome of Antmicro’s long partnership with Google. Along with the testing and development work we did for the TensorFlow Lite Micro team, this shows that Renode is and will continue to be a go-to framework for embedded ML developers.


By guest author Michael Gielda – Antmicro

#BazelCon 2021 Wrap Up


The apps, platforms, and systems that the Bazel community builds with Bazel touch the lives of people around the world in ways we couldn’t have imagined. Through BazelCon, we aim to connect Bazel enthusiasts, the Bazel team, maintainers, contributors, users, and friends in an inclusive and welcoming environment. This year, the community demonstrated its global impact with some quirky and carefully crafted talks, a readout on the State of Bazel, an upfront discussion on “Implicit Bias Mitigation,” and community sharing events that remind us that we are not alone in our efforts to build a better world, one line of code at a time.


At BazelCon, the community shared over 24 technical sessions with the 1400+ registrants, which you can watch here at your own pace. Make sure you check out:
  • “Reproducible builds with Bazel” — Stories about the meaning of "hermetic" and how to achieve it in the context of builds and a meditation on the aesthetic aspects of build reproducibility.
  • “Streamlining VMware's Open Source License Compliance” — Solving the complexities of identifying and tracking open-source software (OSS) to comply with license requirements by using Bazel to create an accurate bill of materials containing OSS and third-party packages during a build.
Attendees were able to interact with the community and engage with the Bazel team through a series of “Birds of a Feather” (BoF) sessions and a live Q&A session. You can find all of the BoF presentations and notes here.

As announced at BazelCon, we will soon release Bazel 5.0, the updated version of our next-generation, multi-language, multi-platform build system, which includes a new external dependency system, called bzlmod, for you to try out.

We’d like to thank everyone who helped make BazelCon a success: presenters, organizers, Google Developer Studios, contributors, and attendees. If you have any questions about BazelCon, you can reach out to [email protected].

We hope that you enjoyed #BazelCon and "Building Better with Bazel".


By Joe Hicks, Product Manager, Core Developer

RLDS: An Ecosystem to Generate, Share, and Use Datasets in Reinforcement Learning

Most reinforcement learning (RL) and sequential decision making algorithms require an agent to generate training data through large numbers of interactions with its environment to achieve optimal performance. This is highly inefficient, especially when generating those interactions is difficult, such as when collecting data with a real robot or interacting with a human expert. This issue can be mitigated by reusing external sources of knowledge, for example, the RL Unplugged Atari dataset, which includes data from a synthetic agent playing Atari games.

However, very few such datasets exist, and sequential decision making covers a wide variety of tasks and ways of generating data (e.g., expert data or noisy demonstrations, human or synthetic interactions), so it is unrealistic, and not even desirable, for the whole community to work on a small number of datasets, because these will never be representative enough. Moreover, some of these datasets are released in a form that only works with certain algorithms, which prevents researchers from reusing the data. For example, rather than including the sequence of interactions with the environment, some datasets provide a set of random interactions, making it impossible to reconstruct the temporal relation between them, while others are released in slightly different formats, which can introduce subtle bugs that are very difficult to identify.

In this context, we introduce Reinforcement Learning Datasets (RLDS), and release a suite of tools for recording, replaying, manipulating, annotating and sharing data for sequential decision making, including offline RL, learning from demonstrations, or imitation learning. RLDS makes it easy to share datasets without any loss of information (e.g., keeping the sequence of interactions instead of randomizing them) and to be agnostic to the underlying original format, enabling users to quickly test new algorithms on a wider range of tasks. Additionally, RLDS provides tools for collecting data generated by either synthetic agents (EnvLogger) or humans (RLDS Creator), as well as for inspecting and manipulating the collected data. Ultimately, integration with TensorFlow Datasets (TFDS) facilitates the sharing of RL datasets with the research community.

With RLDS, users can record interactions between an agent and an environment in a lossless and standard format. Then, they can use and transform this data to feed different RL or Sequential Decision Making algorithms, or to perform data analysis.

Dataset Structure
Algorithms in RL, offline RL, or imitation learning may consume data in very different formats, and, if the format of the dataset is unclear, it's easy to introduce bugs caused by misinterpretations of the underlying data. RLDS makes the data format explicit by defining the contents and the meaning of each field of the dataset, and provides tools to re-align and transform this data to fit the format required by any algorithm implementation. In order to define the data format, RLDS takes advantage of the inherently standard structure of RL datasets — i.e., sequences (episodes) of interactions (steps) between agents and environments, where agents can be, for example, rule-based/automation controllers, formal planners, humans, animals, or a combination of these. Each step contains the current observation, the action applied to that observation, the reward obtained as a result of applying the action, and the discount obtained together with the reward. Steps also include additional information to indicate whether the step is the first or last of the episode, or whether the observation corresponds to a terminal state. Each step and episode may also contain custom metadata that can be used to store environment-related or model-related data.
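As a sketch of that nested structure, the snippet below builds a toy two-step episode with tf.data. The field names match the standard step fields described above; the values are fabricated for illustration.

import tensorflow as tf

# The steps of one episode, as a dataset of dictionaries.
steps = tf.data.Dataset.from_tensor_slices({
    "observation": [[0.0, 1.0], [0.5, 0.5]],
    "action": [1, 0],
    "reward": [0.0, 1.0],
    "discount": [1.0, 0.0],
    "is_first": [True, False],
    "is_last": [False, True],
    "is_terminal": [False, True],
})

# An RLDS dataset is a dataset of episodes; each episode holds its steps as a
# nested dataset plus optional episode-level metadata.
episode = {"steps": steps, "episode_id": 0}

for step in episode["steps"]:
    print(step["observation"].numpy(), step["reward"].numpy())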

Producing the Data
Researchers produce datasets by recording an agent's interactions with an environment. To maintain its usefulness, raw data is ideally stored in a lossless format: recording all the information that is produced, keeping the temporal relation between data items (e.g., the ordering of steps and episodes), and without making any assumption about how the dataset will be used in the future. For this, we release EnvLogger, a software library to log agent-environment interactions in an open format.

EnvLogger is an environment wrapper that records agent–environment interactions and saves them in long-term storage. Although EnvLogger is seamlessly integrated in the RLDS ecosystem, we designed it to be usable as a stand-alone library for greater modularity.
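Usage follows the wrapper pattern described above; the sketch below is based on the EnvLogger README of the time, with the environment constructor, agent, and output directory as placeholders.

import envlogger

env = make_my_dm_env()  # placeholder: any dm_env.Environment
with envlogger.EnvLogger(env, data_directory="/tmp/agent_logs") as env:
    # Every reset() and step() below is transparently recorded to disk.
    timestep = env.reset()
    while not timestep.last():
        action = agent.select_action(timestep)  # placeholder agent
        timestep = env.step(action)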

As in most machine learning settings, collecting human data for RL is a time consuming and labor intensive process. The common approach to address this is to use crowd-sourcing, which requires user-friendly access to environments that may be difficult to scale to large numbers of participants. Within the RLDS ecosystem, we release a web-based tool called RLDS Creator, which provides a universal interface to any human-controllable environment through a browser. Users can interact with the environments, e.g., play the Atari games online, and the interactions are recorded and stored such that they can be loaded back later using RLDS for analysis or to train agents.

Sharing the Data
Datasets are often onerous to produce, and sharing them with the wider research community not only enables reproducibility of former experiments, but also accelerates research, as it makes it easier to run and validate new algorithms on a range of scenarios. For that purpose, RLDS is integrated with TensorFlow Datasets (TFDS), an existing library for sharing datasets within the machine learning community. Once a dataset is part of TFDS, it is indexed in the global TFDS catalog, making it accessible to any researcher by using tfds.load(name_of_dataset), which loads the data in either TensorFlow or NumPy formats.
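Loading then looks like any other TFDS dataset; the dataset name below is a placeholder for any RLDS dataset in the catalog.

import tensorflow_datasets as tfds

# "name_of_dataset" is a placeholder; episodes_ds is a dataset of episodes,
# each holding its nested "steps" dataset.
episodes_ds = tfds.load("name_of_dataset", split="train")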

TFDS is independent of the underlying format of the original dataset, so any existing dataset with RLDS-compatible format can be used with RLDS, even if it was not originally generated with EnvLogger or RLDS Creator. Also, with TFDS, users keep ownership and full control over their data and all datasets include a citation to credit the dataset authors.

Consuming the Data
Researchers can use the datasets to analyze and visualize the data, or to train a variety of machine learning algorithms, which, as noted above, may consume data in a different format than the one it was stored in. For example, some algorithms, like R2D2 or R2D3, consume full episodes; others, like Behavioral Cloning or ValueDice, consume batches of randomized steps. To enable this, RLDS provides a library of transformations for RL scenarios. These transformations have been optimized, taking into account the nested structure of RL datasets, and they include auto-batching to accelerate some of these operations. Using those optimized transformations, RLDS users have full flexibility to easily implement high-level functionality, and the pipelines they develop are reusable across RLDS datasets. Example transformations include statistics across the full dataset for selected step fields (or sub-fields) or flexible batching that respects episode boundaries. You can explore the existing transformations in this tutorial and see more complex real examples in this Colab.
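As an illustration of the randomized-steps pattern, here is a sketch using plain tf.data rather than the optimized RLDS transformation library; `episodes_ds` is assumed to be an RLDS-format dataset of episodes, as loaded above.

import tensorflow as tf

def random_step_batches(episodes_ds, batch_size=256):
    # flat_map drops episode boundaries, yielding one long stream of steps.
    steps = episodes_ds.flat_map(lambda episode: episode["steps"])
    # Shuffle before batching to produce the randomized step batches that
    # algorithms like Behavioral Cloning consume.
    return steps.shuffle(buffer_size=10_000).batch(batch_size)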

Available Datasets
At the moment, a number of RLDS-compatible datasets are already available in TFDS; see the global TFDS catalog for the current list.

Our team is committed to quickly expanding this list in the near future and external contributions of new datasets to RLDS and TFDS are welcomed.

Conclusion
The RLDS ecosystem not only improves reproducibility of research in RL and sequential decision making problems, but also enables new research by making it easier to share and reuse data. We hope the capabilities offered by RLDS will initiate a trend of releasing structured RL datasets, holding all the information and covering a wider range of agents and tasks.

Acknowledgements
Besides the authors of this post, this work was done by Google Research teams in Paris and Zurich in collaboration with DeepMind, in particular by Sertan Girgin, Damien Vincent, Hanna Yakubovich, Daniel Kenji Toyama, Anita Gergely, Piotr Stanczyk, Raphaël Marinier, Jeremiah Harmsen, Olivier Pietquin, and Nikola Momchev. We also want to thank the other engineers and researchers who provided feedback and contributed to the project, in particular George Tucker, Sergio Gomez, Jerry Li, Caglar Gulcehre, Pierre Ruyssen, Etienne Pot, Anton Raichuk, Gabriel Dulac-Arnold, Nino Vieillard, Matthieu Geist, Alexandra Faust, Eugene Brevdo, Tom Granger, Zhitao Gong, Toby Boyd, and Tom Small.

Source: Google AI Blog