Tag Archives: Open source

Announcing KataOS and Sparrow


As we find ourselves increasingly surrounded by smart devices that collect and process information from their environment, it's more important now than ever that we have a simple solution to build verifiably secure systems for embedded hardware. If the devices around us can't be mathematically proven to keep data secure, then the personally-identifiable data they collect—such as images of people and recordings of their voices—could be accessible to malicious software.

Unfortunately, system security is often treated as a software feature that can be added to existing systems or solved with an extra piece of ASIC hardware—this generally is not good enough. Our team in Google Research has set out to solve this problem by building a provably secure platform that's optimized for embedded devices that run ML applications. This is an ongoing project with plenty left to do, but we're excited to share some early details and invite others to collaborate on the platform so we can all build intelligent ambient systems that have security built-in by default.

To begin collaborating with others, we've open sourced several components for our secure operating system, called KataOS, on GitHub, as well as partnered with Antmicro on their Renode simulator and related frameworks. As the foundation for this new operating system, we chose seL4 as the microkernel because it puts security front and center; it is mathematically proven secure, with guaranteed confidentiality, integrity, and availability. Through the seL4 CAmkES framework, we're also able to provide statically-defined and analyzable system components. KataOS provides a verifiably-secure platform that protects the user's privacy: it is logically impossible for applications to breach the kernel's hardware security protections, and the system components are verifiably secure. KataOS is also implemented almost entirely in Rust, which provides a strong starting point for software security, since it eliminates entire classes of bugs, such as off-by-one errors and buffer overflows.

The current GitHub release includes most of the KataOS core pieces, including the frameworks we use for Rust (such as the sel4-sys crate, which provides seL4 syscall APIs), an alternate rootserver written in Rust (needed for dynamic system-wide memory management), and the kernel modifications to seL4 that can reclaim the memory used by the rootserver. And we've collaborated with Antmicro to enable GDB debugging and simulation for our target hardware with Renode.

Internally, KataOS is also able to dynamically load and run third-party applications built outside of the CAmkES framework. At the moment, the code on GitHub does not include the required components to run these applications, but we hope to publish these features in the near future.

To prove out a secure ambient system in its entirety, we're also building a reference implementation for KataOS called Sparrow, which combines KataOS with a secured hardware platform. So in addition to the logically-secure operating system kernel, Sparrow includes a logically-secure root of trust built with OpenTitan on a RISC-V architecture. However, for our initial release, we're targeting a more standard 64-bit ARM platform running in simulation with QEMU.

Our goal is to open source all of Sparrow, including all hardware and software designs. For now, we're just getting started with an early release of KataOS on GitHub. So this is just the beginning, and we hope you will join us in building a future where intelligent ambient ML systems are always trustworthy.

By Sam, Scott, and June – AmbiML Developers

Flutter SLSA Progress & Identity and Access Management through Infrastructure As Code

We are excited to announce several new achievements in Dart and Flutter's mission to harden security. We have achieved Supply Chain Levels for Software Artifacts (SLSA) Level 2 security on Flutter’s Cocoon application, reduced our Identity and Access Management permissions to the minimum required access, and implemented Infrastructure-as-Code to manage permissions for some of our applications. These achievements follow our recent success in enabling Allstar and Security Scorecards.

Highlights

Achieving Flutter’s Cocoon SLSA Level 2: The Cocoon application provides continuous integration orchestration for Flutter infrastructure. Cocoon also helps integrate several CI services with GitHub and provides tools to make GitHub development easier. Achieving SLSA Level 2 for Cocoon means we have addressed all the security concerns of levels 1 and 2 across the application. Under SLSA Level 2, Cocoon has “extra resistance to specific threats” to its supply chain. The Google Open Source Security team has audited and validated our achievement of SLSA Level 2 for Cocoon.


Implementing Identity & Access Management (IAM) via Infrastructure-as-Code: We have implemented additional security hardening features by onboarding docs-flutter-dev, master-docs-flutter-dev, and flutter-dashboard to use Identity and Access Management through an Infrastructure-as-Code system. These projects host applications, provide public documentation for Flutter, and contain a dashboard website for Flutter build status.

Using our Infrastructure-as-Code approach, security permission changes require code changes, ensuring approval is granted before the change is made. This also means that changes to security permissions are audited through source control and contain associated reasoning for the change. Existing IAM roles for these applications have been pared so that the applications follow the Principle of Least Privilege.

Advantages

  • Achieving SLSA Level 2 for Cocoon means we have addressed all the security concerns of levels 1 and 2 across the application. Under SLSA Level 2, Cocoon has “extra resistance to specific threats” to its supply chain.
  • Provenance is now generated for both the flutter-dashboard and auto-submit artifacts through Cocoon’s automated build process. Provenance on these artifacts shows proof of their code source and tamper-proof build evidence. This work helps harden the security on the multiple tools used during the Cocoon build process: Google Cloud Platform, Cloud Build, App Engine, and Artifact Registry.
  • Overall we addressed 83% of all SLSA requirements across all levels for the Cocoon application. We have identified the work across the application which will need to be completed for each level and category of SLSA compliance. Because of this, we know we are well positioned to continue future work toward SLSA Level 4.

Learnings and Best Practices

  1. Relatively small changes to the Cocoon application’s build process significantly increased the security of its supply chain. Google Cloud Build made this simple, since provenance metadata is created automatically during the Cloud Build process.
  2. Regulating IAM permissions through code changes adds many additional benefits and can make granting first-time access simpler.
  3. Upgrading the SLSA level of an application sometimes requires varying efforts depending on the different factors of the application build process. Working towards SLSA level 4 will likely necessitate different configuration and code changes than required for SLSA level 2.

Coming Soon

Since this is the beginning of the Flutter and Dart journey toward greater SLSA level accomplishments, we hope to apply our learnings to more applications. We hope to begin work toward SLSA level 2 and beyond for more complex repositories like Flutter/flutter. Also, we hope to achieve an even higher level of SLSA compliance for the Cocoon application.

References

Supply Chain Levels for Software Artifacts (SLSA) is a security framework which outlines levels of supply chain security for an application as a checklist.

By Jesse Seales, Software Engineer – Dart and Flutter Security Working Group

Announcing the second group of Open Source Peer Bonus winners in 2022



We’re excited to announce our second group of Open Source Peer Bonus winners in 2022! The Google Open Source Peer Bonus program is designed to recognize external open source contributors nominated by Googlers for their open source contributions. This cycle, we are pleased to announce a total of 141 winners across 110+ projects, residing in 36 countries.

All open source contributors external to Google are eligible to be nominated. Whether you’re a software engineer, technical writer, community advocate, mentor, user experience designer, security expert, or educator, you can be nominated for a peer bonus.

Our awards often come as a surprise to some while also providing motivation to others to responsibly contribute to open source. Learn more about what the Google Open Source Peer Bonus program means to our winners from this cycle:

“It was a very nice surprise to receive the Open Source Peer Bonus notification. I hope it can help lift contributors off, not only for their code contributions but for community contributions too.” – Oriol Abril Pla, ArviZ, PyMC

“The Kubernetes and CNCF ecosystem is massive. So, there are tons of opportunities to carve out your own niche in them. One of my key goals has been to make the project(s) more secure than how they were when I joined them. These awards are a welcome sprinkle of motivation to keep being a responsible open source contributor.” – Pushkar Joglekar, Kubernetes and CNCF

“I’m very pleased and proud to receive a Google Open Source Peer Bonus award. I was nominated for my contributions to The Good Docs Project where we are creating technical writing templates to help other projects create high-quality documentation. I’m passionate about the work we’re doing there, and have been hanging around the project since its inception in 2019. This is a friendly, inclusive community creating a safe space for folk to dip their toe into open source. We are global, and new folk are always welcome.” – Felicity Brand, The Good Docs Project

“I've been actively working on open source projects since my time at NIST with the FDS project starting in 2006. More recently with The Good Docs Project (TGDP) since 2020. It's been a very rewarding experience to contribute to TGDP, with such an amazing diversity of participants, perspectives and interests involved. To be given recognition through the OSPB program was a pleasant and unexpected surprise. While it's not at all what I am participating in the project for, it feels great to have someone else in the project bring my name up for this award. Thank you to TGDP and to Google for this honor.” – Bryan Klein, The Good Docs Project

“The Open Source Peer Bonus program is more than an appreciation for our contribution to the open source world. It encourages people to share their talent. To be the hero of the ones who are benefiting from your work, put your codes in the open source world.” – Nan YE, Orange Innovation China

“The TFX team and community is by far the most responsive, helpful and knowledgeable open-source project that I have worked on. It's a great feeling to be a part of the democratizing of productionised ML workflows, and being officially recognised on your efforts and contributions is the cherry on top.” – Jens Wiren, Analytical Impact Solutions

“The HTTP Archive team is welcoming to contributors and happily showed me the ropes until I got going. The project is invaluable to the web community, and working on the Web Almanac allowed me to work with domain experts on several topics, including Performance, JavaScript, and Third Parties.” – Kevin Farrugia, HTTP Archive

“Participating in these projects has been a great learning experience and has given me the opportunity to connect with a lot of great people. I am humble and grateful for the recognition and appreciation this program gives to the contributions made to these projects.” – Ole Markus With, kOps/etcdadm

“Google has been very generous in recognising VertFlow, which is a tool still in its infancy after the idea popped into my head a few months ago in conversation with a Google Cloud Customer Engineer. I hope this will encourage users to adopt VertFlow to reduce their carbon footprint when using GCP.” – Jack Lockyer-Stevens, VertFlow

Below is the list of current winners who gave us permission to thank them publicly:

Project – Winner

abap2xlsx – Gregor Wolf
ABC A System for Sequential Synthesis and Verification – Alan Mishchenko
Accelerated HW Synthesis – Zihao Li
Agones – Daniel Oliveira
Android, Pithus, Exodus Privacy, PiRogue, Frida – Esther Onfroy
AndroidX Jetpack – Michał Zieliński
Angular – Dario Piotrowicz
Angular Language Service – Ivan Wan
Apache Airflow – Elad Kalif
Apache Beam – Alex Van Boxel
Apache Beam – Austin Bennett
Apache Beam – Moritz Mack
Apache Hop – Matt Casters
aroman – Avi Romanoff
ArviZ and PyMC – Oriol Abril Pla
Babel – Nicolò Ribaudo
Bazel – Fabian Meumertzheim
Beam – Alex Kosolapov
Blockly – Johnny Oshika
BRLTTY – Dave Mielke
Bun – Jarred Sumner
cargo-make – Sagie Gur-Ari
Chrome DevTools Frontend – Percy Ley
Chromium – Juba Borgohain
Chromium – David Sanders
Chromium – Amos Lim
ClangBuiltLinux – Nathan Chancellor
cloud-data-quality – Amandeep Singh
CNCF – Ragashree M C
Contibuting.today Open Source meetup – Floor Drees
CoreDNS and Kubernetes – Chris O'Haver
cpu_features – Mykola Hohsadze
DartPad – Tim Maffett
dbus – Simon McVittie
Dill – Mike McKerns
distroless – Ole-Martin Bratteng
Don't kill my app and merge to Google Android CTS – Petr Nálevka
ecma262 – Richard Gibson
Firebase Admin .NET SDK – Levi Muriuki
Firebase Admin Node.js SDK – Igor Savin
Firebase Admin Node.js SDK – Aras Abbasi
Firebase Apple SDK – Mike Hardy
Firebase Apple SDK – Jake Krog
Firebase Apple SDK – Alex Zchut
Firebase Arduino Client Library for ESP8266 and ESP32 – Suwatchai Klakerdpol
Firebase Crashlytics – Sergio Campamá
firebase-ios-sdk – Fumito Ito
firebase-ios-sdk – Tito Ciuro
firebase-js-sdk – Andi Pätzold
fish-shell – Peter Ammon
Flashrom – Thomas Heijligen
Flashrom – Felix Singer
FreeCAD – Lei Zheng
Fuchsia – Alexander Popov
Git – Jorawar Singh
git and openssh – Fabian Stelzer
GNU Guix – Ludovic Courtès
GNU Mes – Janneke Nieuwenhuizen
go-clean-arch – Iman Tumorang
golang/protobuf – Cassondra Foesch
google-cloud-pricing-cost-calculator – Nils Knieling
gopls – Ruslan Nigmatullin
GrapheneOS – Daniel Micay
GSYVideoPlayer – Asher Guo
Hello World gRPC-Gateway – Rajiv Singh
Lichess – Thibault Duplessis
JRuby – Charles Nutter
Keras – Sayak Paul
KernelWireguard – Jason Donenfeld
Knative – Mahamed Ali
Knative – Gabriel Freites
Kubernetes, CNCF – Pushkar Joglekar
Kubernetes (kOps, etcdadm etc.) – Ciprian Hacman
Kubernetes (particularly kOps / etcdadm) – Ole Markus With
Kubernetes (particularly kOps / etcdadm) – Peter Rifel
Kubernetes Gateway API – Keith Mattix
KUnit/Linux kernel – Shuah Khan
Leaflet – Volodymyr Agafonkin
libyuv – Yuan Tong
lnav – Tim Stack
Log4J – Ralph Goers
Magit – Jonas Bernoulli
medium_stats – Oliver Tosky
Mockk – Oleksii Pylypenko
moja global – Harsh Bardhan Mishra
mvt (Mobile Verification Toolkit) – Claudio Guarnieri
OSS educator and collaborator – José Luis Chiquete
notcurses – nick black
Nudge – Erik Gomez
OpenSSF Allstar – Yori Yano
Oppia – Om Khandade
Oppia – Chantel Chan
OR-Tools – Xiang Chen
pcileech (and LeechCore subproject) – Ulf Frisk
Project Jupyter – Min Ragan-Kelley
Protocol Buffers – Yannic Bonenberger
pyinfra – Nick Mills-Barrett
PyPI – Jack Lockyer-Stevens
PyTorch / XLA – Ronghang Hu
QGIS – Nyall Dawson
react-native-firebase – Minsik Kim
Rich, Textualize – Will McGugan
Rust for Linux – Björn Roy Baron
sableangle – Miki Huang
Samba – David Mulder
Scorecards – Varun Sharma
Scorecards – Naveen Srinivasan
SimpleWebAuthn – Matthew Miller
SLSA – Michael Lieberman
Spock – Leonard Brünings
SQLAlchemy – Michael Bayer
stage0 – Jeremiah Orians
styler – Lorenz Walthert
Surelog – Alain Dargelas
Svelte – Rich Harris
TC39 – Jordan Harband
Tekton – Parth Patel
Tekton – Andrew Bayer
TensorFlow – Stefano Fabri
TensorFlow – Jason Zaman
TensorFlow Lite Examples - Android – Nan Ye
TFX – Ukjae Jeong
TFX – Jens Wiren
TFX-Addons – Gerard Casas Saez
TFX-Addons – Hannes Hapke
TFX-BSL – Martin Bomio
tfx-helper – Tomasz Mackowiak
The Good Docs Project – Aaron Peters
The Good Docs Project – Felicity Brand
The Good Docs Project – Ian Nguyen
The Good Docs Project – Bryan Klein
The Good Docs Project – Serena Jolley
Tow-Boot – Samuel Dionne-Riel
Trivy – Teppei Fukuda
TUF, CNCF – Marina Moore
V8 – Ao Wang
ViSQOL – Feargus O'Gorman
W3C WebGPU standard – Mehmet Oguz Derin
wdi5 – Volker Buzek
Web Almanac – Kevin Farrugia
WebRTC – Byoungchan Lee


Congratulations to our winners above and thank you for your open source contributions. We look forward to your continued support and efforts in the open source communities. Additionally, thank you to all of the Googlers who submitted nominations and our review committee members for reviewing nominations.

By Joe Sylvanovich – Google Open Source Programs Office

Lyra V2 – a better, faster, and more versatile speech codec

Since we open sourced the first version of Lyra on GitHub last year, we have been delighted to see a vibrant community grow around it, with thousands of stars, hundreds of forks, and many comments and pull requests. There are people who fixed and formatted our code, built continuous integration for the project, and even added support for WebAssembly.

We are incredibly grateful for all these contributions, and we have also heard the community's feedback asking us to improve Lyra. Developers wanted to run Lyra on more platforms and build applications in more languages, and asked for a model that computes faster, offers more bitrate options and lower latency, and produces better audio quality with fewer artifacts.

That's why we are now releasing Lyra V2, with a new architecture that enjoys wider platform support, provides scalable bitrate capabilities, has better performance, and generates higher quality audio. With this release, we hope to continue to evolve with the community, and with its collective creativity, see new applications being developed and new directions emerging.

New Architecture

Lyra V2 is based on an end-to-end neural audio codec called SoundStream. The architecture has a residual vector quantizer (RVQ) sitting before and after the transmission channel, which quantizes the encoded information into a bitstream and reconstructs it on the decoder side.

Lyra V2's SoundStream architecture
The integration of RVQ into the architecture allows changing the bitrate of Lyra V2 at any time by selecting the number of quantizers to use. When more quantizers are used, higher quality audio is generated (at the cost of a higher bitrate). In Lyra V2, we support three different bitrates: 3.2 kbps, 6 kbps, and 9.2 kbps. This enables developers to choose a bitrate most suitable for their network conditions and quality requirements.
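
As a quick back-of-the-envelope check (plain arithmetic, not Lyra API code), the per-frame bit budget implied by each bitrate follows from the 20 ms frame duration discussed below:

# Rough bit-budget arithmetic for Lyra V2's supported bitrates,
# assuming 20 ms frames (50 frames per second) as described in this post.
FRAMES_PER_SECOND = 1000 / 20  # 20 ms frames

for bitrate_bps in (3200, 6000, 9200):
    bits_per_frame = bitrate_bps / FRAMES_PER_SECOND
    print(f"{bitrate_bps / 1000} kbps -> {bits_per_frame:.0f} bits per frame")

# Output:
#   3.2 kbps -> 64 bits per frame
#   6.0 kbps -> 120 bits per frame
#   9.2 kbps -> 184 bits per frame
# Selecting more RVQ quantizers spends more bits per frame, raising quality.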

Lyra V2's model is exported in TensorFlow Lite, TensorFlow's lightweight cross-platform solution for mobile and embedded devices, which supports various platforms and hardware acceleration. The code is tested on Android phones and Linux, with experimental Mac and Windows support. Operation on iOS and other embedded platforms is not currently supported, although we expect it is possible with additional effort. Moreover, this paradigm opens Lyra to any future platform supported by TensorFlow Lite.

Better Performance

With the new architecture, the delay is reduced from 100 ms with the previous version to 20 ms. In this regard, Lyra V2 is comparable to Opus, the most widely used audio codec for WebRTC, which has typical delays of 26.5 ms, 46.5 ms, and 66.5 ms.

Lyra V2 also encodes and decodes five times faster than the previous version. On a Pixel 6 Pro phone, Lyra V2 takes 0.57 ms to encode and decode a 20 ms audio frame, which is 35 times faster than real time. The reduced complexity means that more phones can run Lyra V2 in real time than V1, and that the overall battery consumption is lowered.

Higher Quality

Driven by advances in machine learning research over the years, the quality of the generated audio is also improved. Our listening tests show that the audio quality (measured in MUSHRA score, an indication of subjective quality) of Lyra V2 at 3.2 kbps, 6 kbps, and 9.2 kbps measures up to Opus at 10 kbps, 13 kbps, and 14 kbps respectively.

Lyra vs. Opus at various bitrates

[Audio samples for two clips comparing Original, Opus @6 kbps, Lyra V1, Opus @10 kbps, Lyra V2 @3.2 kbps, Opus @13 kbps, Lyra V2 @6 kbps, Opus @14 kbps, and Lyra V2 @9.2 kbps are available in the original post.]

This makes Lyra V2 a competitive alternative to other state-of-the-art telephony codecs. While Lyra V1 already compares favorably to the Adaptive Multi-Rate (AMR-NB) codec, Lyra V2 further outperforms Enhanced Voice Services (EVS) and Adaptive Multi-Rate Wideband (AMR-WB), and is on par with Opus, all the while using only 50% - 60% of their bandwidth.

Lyra vs. state-of-the-art codecs

[Audio samples for two clips comparing Original, AMR-NB, Lyra V1, EVS, AMR-WB, Opus @13 kbps, and Lyra V2 @6 kbps are available in the original post.]

This means more devices can be connected in bandwidth-constrained environments, or that additional information can be sent over the network to reduce voice choppiness through forward error correction and packet loss concealment.

Open Source Release

Lyra V2 continues to provide what is already in Lyra V1 (the build tools, the testing frameworks, the C++ encoding and decoding API, the signal processing toolchain, and the example Android app). Developers who have experience with the Lyra V1 API will find that the V2 API looks familiar, but with a few changes. For example, now it's possible to change bitrates during encoding (more information is available in the release notes). In addition, the model definitions and weights are included as .tflite files. As with V1, this release is a beta version and the API and bitstream are expected to change. The code for running Lyra is open sourced under the Apache license. We can’t wait to see what innovative applications people will create with the new and improved Lyra!

By Hengchin Yeh - Chrome

Acknowledgements

The following people helped make the open source release possible. From Chrome: Alejandro Luebs, Michael Chinen, Andrew Storus, Tom Denton, Felicia Lim, Bastiaan Kleijn, Jan Skoglund, Yaowu Xu, Jamieson Brettle, Omer Osman, Matt Frost, and Jim Bankoski; and from Google Research: Neil Zeghidour and Marco Tagliasacchi.

TensorStore for High-Performance, Scalable Array Storage

Many exciting contemporary applications of computer science and machine learning (ML) manipulate multidimensional datasets that span a single large coordinate system, for example, weather modeling from atmospheric measurements over a spatial grid or medical imaging predictions from multi-channel image intensity values in a 2d or 3d scan. In these settings, even a single dataset may require terabytes or petabytes of data storage. Such datasets are also challenging to work with as users may read and write data at irregular intervals and varying scales, and are often interested in performing analyses using numerous machines working in parallel.

Today we are introducing TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data that:

  • Provides a uniform API for reading and writing multiple array formats, including zarr and Neuroglancer precomputed.
  • Natively supports multiple storage systems, including Google Cloud Storage, local and network filesystems, and in-memory storage.
  • Supports read/writeback caching and transactions, with strong atomicity, isolation, consistency, and durability (ACID) guarantees.
  • Supports safe, efficient access from multiple processes and machines via optimistic concurrency.

TensorStore has already been used to solve key engineering challenges in scientific computing (e.g., management and processing of large datasets in neuroscience, such as peta-scale 3d electron microscopy data and “4d” videos of neuronal activity). TensorStore has also been used in the creation of large-scale machine learning models such as PaLM by addressing the problem of managing model parameters (checkpoints) during distributed training.

Familiar API for Data Access and Manipulation
TensorStore provides a simple Python API for loading and manipulating large array data. In the following example, we create a TensorStore object that represents a 56 trillion voxel 3d image of a fly brain and access a small 100x100 patch of the data as a NumPy array:

>>> import tensorstore as ts
>>> import numpy as np

# Create a TensorStore object to work with fly brain data.
>>> dataset = ts.open({
... 'driver':
... 'neuroglancer_precomputed',
... 'kvstore':
... 'gs://neuroglancer-janelia-flyem-hemibrain/v1.1/segmentation/',
... }).result()

# Create a 3-d view (remove singleton 'channel' dimension):
>>> dataset_3d = dataset[ts.d['channel'][0]]
>>> dataset_3d.domain
{ "x": [0, 34432), "y": [0, 39552), "z": [0, 41408) }

# Convert a 100x100x1 slice of the data to a numpy ndarray
>>> sliced = np.array(dataset_3d[15000:15100, 15000:15100, 20000])

Crucially, no actual data is accessed or stored in memory until the specific 100x100 slice is requested; hence arbitrarily large underlying datasets can be loaded and manipulated without having to store the entire dataset in memory, using indexing and manipulation syntax largely identical to standard NumPy operations. TensorStore also provides extensive support for advanced indexing features, including transforms, alignment, broadcasting, and virtual views (data type conversion, downsampling, lazily on-the-fly generated arrays).
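
As a minimal sketch of what such lazy views look like in practice (reusing the dataset_3d object opened above; the downsample factors and indices are illustrative), note that nothing is fetched until read() is called:

>>> # Lazily downsample by 4x along each dimension (mean pooling) and
>>> # convert the values to float32; both are virtual views, so no
>>> # voxel data is fetched yet.
>>> downsampled = ts.downsample(dataset_3d, [4, 4, 4], method='mean')
>>> as_float = ts.cast(downsampled, ts.float32)

>>> # Only this small patch is actually read from storage.
>>> patch = as_float[1000:1100, 1000:1100, 5000].read().result()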

The following example demonstrates how TensorStore can be used to create a zarr array, and how its asynchronous API enables higher throughput:

>>> import tensorstore as ts
>>> import numpy as np

>>> # Create a zarr array on the local filesystem
>>> dataset = ts.open({
... 'driver': 'zarr',
... 'kvstore': 'file:///tmp/my_dataset/',
... },
... dtype=ts.uint32,
... chunk_layout=ts.ChunkLayout(chunk_shape=[256, 256, 1]),
... create=True,
... shape=[5000, 6000, 7000]).result()

>>> # Create two numpy arrays with example data to write.
>>> a = np.arange(100*200*300, dtype=np.uint32).reshape((100, 200, 300))
>>> b = np.arange(200*300*400, dtype=np.uint32).reshape((200, 300, 400))

>>> # Initiate two asynchronous writes, to be performed concurrently.
>>> future_a = dataset[1000:1100, 2000:2200, 3000:3300].write(a)
>>> future_b = dataset[3000:3200, 4000:4300, 5000:5400].write(b)

>>> # Wait for the asynchronous writes to complete
>>> future_a.result()
>>> future_b.result()

Safe and Performant Scaling
Processing and analyzing large numerical datasets requires significant computational resources. This is typically achieved through parallelization across numerous CPU or accelerator cores spread across many machines. Therefore a fundamental goal of TensorStore has been to enable parallel processing of individual datasets that is both safe (i.e., avoids corruption or inconsistencies arising from parallel access patterns) and high performance (i.e., reading and writing to TensorStore is not a bottleneck during computation). In fact, in a test within Google’s datacenters, we found nearly linear scaling of read and write performance as the number of CPUs was increased:

Read and write performance for a TensorStore dataset in zarr format residing on Google Cloud Storage (GCS) accessed concurrently using a variable number of single-core compute tasks in Google data centers. Both read and write performance scales nearly linearly with the number of compute tasks.

Performance is achieved by implementing core operations in C++, extensive use of multithreading for operations such as encoding/decoding and network I/O, and partitioning large datasets into much smaller units through chunking to enable efficiently reading and writing subsets of the entire dataset. TensorStore also provides configurable in-memory caching (which reduces slower storage system interactions for frequently accessed data) and an asynchronous API that enables a read or write operation to continue in the background while a program completes other work.

Safety of parallel operations when many machines are accessing the same dataset is achieved through the use of optimistic concurrency, which maintains compatibility with diverse underlying storage layers (including Cloud storage platforms, such as GCS, as well as local filesystems) without significantly impacting performance. TensorStore also provides strong ACID guarantees for all individual operations executing within a single runtime.
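
As a minimal sketch of the transactional API (reusing the zarr dataset created above; the indices and values are illustrative):

>>> # Group several writes so they commit atomically.
>>> txn = ts.Transaction()
>>> dataset.with_transaction(txn)[10:20, 20:30, 30:40] = 42
>>> dataset.with_transaction(txn)[50:60, 60:70, 70:80] = 43

>>> # Neither write is visible to other readers until the commit succeeds.
>>> txn.commit_async().result()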

To make distributed computing with TensorStore compatible with many existing data processing workflows, we have also integrated TensorStore with parallel computing libraries such as Apache Beam (example code) and Dask (example code).

Use Case: Language Models
An exciting recent development in ML is the emergence of more advanced language models such as PaLM. These neural networks contain hundreds of billions of parameters and exhibit some surprising capabilities in natural language understanding and generation. These models also push the limits of computational infrastructure; in particular, training a language model such as PaLM requires thousands of TPUs working in parallel.

One challenge that arises during this training process is efficiently reading and writing the model parameters. Training is distributed across many separate machines, but parameters must be regularly saved to a single object (“checkpoint”) on a permanent storage system without slowing down the overall training process. Individual training jobs must also be able to read just the specific set of parameters they are concerned with in order to avoid the overhead that would be required to load the entire set of model parameters (which could be hundreds of gigabytes).

TensorStore has already been used to address these challenges. It has been applied to manage checkpoints associated with large-scale (“multipod”) models trained with JAX (code example) and has been integrated with frameworks such as T5X (code example) and Pathways. Model parallelism is used to partition the full set of parameters, which can occupy more than a terabyte of memory, over hundreds of TPUs. Checkpoints are stored in zarr format using TensorStore, with a chunk structure chosen to allow the partition for each TPU to be read and written independently in parallel.

When saving a checkpoint, each model parameter is written using TensorStore in zarr format using a chunk grid that further subdivides the grid used to partition the parameter over TPUs. The host machines write in parallel the zarr chunks for each of the partitions assigned to TPUs attached to that host. Using TensorStore's asynchronous API, training proceeds even while the data is still being written to persistent storage. When resuming from a checkpoint, each host reads only the chunks that make up the partitions assigned to that host.
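
A simplified sketch of that pattern using the TensorStore API shown earlier (the bucket path, shapes, and partition bounds here are hypothetical; the real integrations live in the linked JAX and T5X code examples):

>>> # Open one model parameter as a zarr array whose chunk grid subdivides
>>> # the grid used to partition the parameter over TPUs.
>>> param = ts.open({
...     'driver': 'zarr',
...     'kvstore': 'gs://my-bucket/checkpoint/param0/',  # hypothetical path
... },
...     dtype=ts.float32,
...     chunk_layout=ts.ChunkLayout(chunk_shape=[1024, 1024]),
...     create=True,
...     shape=[16384, 16384]).result()

>>> # Each host writes only its own partition; local_shard is this host's
>>> # in-memory array. Training can proceed while the write completes.
>>> write_future = param[0:4096, :].write(local_shard)

>>> # Block only when the checkpoint must be durable.
>>> write_future.result()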

Use Case: 3D Brain Mapping
The field of synapse-resolution connectomics aims to map the wiring of animal and human brains at the detailed level of individual synaptic connections. This requires imaging the brain at extremely high resolution (nanometers) over fields of view of up to millimeters or more, which yields datasets that can span petabytes in size. In the future these datasets may extend to exabytes as scientists contemplate mapping entire mouse or primate brains. However, even current datasets pose significant challenges related to storage, manipulation, and processing; in particular, even a single brain sample may require millions of gigabytes with a coordinate system (pixel space) of hundreds of thousands of pixels in each dimension.

We have used TensorStore to solve computational challenges associated with large-scale connectomic datasets. Specifically, TensorStore has managed some of the largest and most widely accessed connectomic datasets, with Google Cloud Storage as the underlying object storage system. For example, it has been applied to the human cortex “h01” dataset, which is a 3d nanometer-resolution image of human brain tissue. The raw imaging data is 1.4 petabytes (roughly 500,000 * 350,000 * 5,000 pixels) and is further associated with additional content such as 3d segmentations and annotations that reside in the same coordinate system. The raw data is subdivided into individual chunks 128x128x16 pixels large and stored in the “Neuroglancer precomputed” format, which is optimized for web-based interactive viewing and can be easily manipulated from TensorStore.

A fly brain reconstruction for which the underlying data can be easily accessed and manipulated using TensorStore.

Getting Started
To get started using the TensorStore Python API, you can install the tensorstore PyPI package using:

pip install tensorstore

Refer to the tutorials and API documentation for usage details. For other installation options and for using the C++ API, refer to installation instructions.

Acknowledgements
Thanks to Tim Blakely, Viren Jain, Yash Katariya, Jan-Matthis Luckmann, Michał Januszewski, Peter Li, Adam Roberts, Brain Williams, and Hector Yee from Google Research, and Davis Bennet, Stuart Berg, Eric Perlman, Stephen Plaza, and Juan Nunez-Iglesias from the broader scientific community for valuable feedback on the design, early testing and debugging.

Source: Google AI Blog


Co-simulating ML with Springbok using Renode

The landscape of Machine Learning software libraries and models is evolving rapidly, and to satisfy the ever-increasing demand for memory and compute while managing latency, power and security considerations, hardware must be developed in an iterative process alongside the workloads it is meant to run.

With its open architecture, custom instructions support and flexible vector extensions, the RISC-V ISA offers an unprecedented capacity for such co-design. And by energizing the open hardware ecosystem, RISC-V has supercharged research and innovation into how to improve chipmaking itself to better leverage the methods and suit the needs of software. Initiatives such as Google’s OpenMPW Shuttle show how a more open and software-focused approach to building hardware is key to enabling a new wave of more powerful and transparent ML-focused solutions.

A RISC-V-based ML accelerator with a HW/SW co-design flow

Over the past months, Google Research has joined efforts with Antmicro to work on a silicon project that can serve as a template for efficient hardware-software co-design. For their secure ML solution, the Google Research team, supported by Antmicro, has been developing a completely open source, rapid pre-silicon ML development flow using Renode, Antmicro’s open source simulation framework.

This builds on the result of a cooperation from last year in which Antmicro implemented Renode support for the RISC-V Vector extensions, which are used in the Google team’s RISC-V based ML accelerator codenamed Springbok. To provide a more well-rounded developer experience, as part of the project Antmicro is also working on improving support for the underlying SoC and a large number of user-oriented features such as OS-aware debugging, performance optimizations, payload profiling, and performance measurement capabilities.

Springbok is part of Google’s AmbiML project that aims to create an open source ML development ecosystem centered on privacy and security. By using the RISC-V Vector extensions, the Google Research team has a standard but flexible way to parallelize the matrix multiply and accumulate operations that are universal in ML payloads. And thanks to Renode, the team can make informed choices as to how exactly to leverage RISC-V’s flexibility by analyzing tradeoffs between speed, complexity and specialization in a practical, iterative fashion using data generated by Renode and the text-based configuration capabilities that let them play around with hardware composition and functionality in a matter of minutes, not days.

Diagram of a RISC-V-based ML accelerator with a HW/SW co-design flow

On the ML software side, the ecosystem revolves around IREE—Google’s research project developing an open source ML compiler and runtime for constrained devices, based on LLVM MLIR.

IREE allows you to load models from typical ML frameworks such as TensorFlow or TensorFlow Lite and then convert them to an intermediate representation (MLIR), which goes through graph-level optimizations and then through an LLVM compilation flow to produce the best-fitted runtime for a specific target. When it comes to deploying models on target devices, IREE provides APIs for both the C and Python programming languages, as well as a TFLite C API that follows the same conventions as TFLite for model loading, tensor management, and inference invocation.

Using these runtimes, the model can be deployed and tested, debugged, benchmarked and executed on the target device or in a simulation environment like Renode.

Demoing the flow at Spring 2022 RISC-V Week

In the build-up to the Spring 2022 RISC-V Week in Paris, the first such large open hardware meeting in years, an initial version of the AmbiML bare-metal ML flow was released as open source. This includes both the ability to run interactively and an example CI setup using Antmicro’s GitHub Renode Action, showing how such a workflow can be tested automatically on each commit. As a Google Cloud partner, Antmicro is currently working with Google Cloud to make Renode available for massive-scale CI testing and deployments for scenarios similar to this one.

In a joint talk at the Paris event, Antmicro and Google presented the software co-development flow, together with a demo of a heterogeneous multi-core solution, with one core running the AmbiML Springbok payload and another core running Zephyr.

In the presented scenario the Springbok core, acting as a ML compute offload unit to the main CPU, executed inference on the MobileNetv1 network and reported the work done to the application core via a RISC-V custom instruction. Adding and modifying custom instructions is trivial in Renode, either via a single line of Python, C#, or even co-simulated in RTL.
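
For illustration, here is a hypothetical Renode monitor one-liner in that style (the opcode pattern and log message are made up; the handler body is the "single line of Python" mentioned above):

cpu InstallCustomInstructionHandlerFromString "10110011" "cpu.DebugLog('Springbok: inference finished')"

When the CPU executes an instruction matching the bit pattern, Renode runs the Python handler instead of raising an illegal-instruction fault, which is one way the demo's "work done" notification could be wired up.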

Renode helps ML developers and silicon designers not only to run and test their solutions, but also to learn more about what their software is actually doing. As part of the Paris demonstration, Antmicro and Google showed how you can count executed instructions and how often specific opcodes are used to measure how well your solution is performing. These features, accompanied by execution metrics analysis, executed functions logging, and recently developed execution trace generation, give you great insight into every detail of your emulated ML environment.

These capabilities join the wide arsenal of hardware/software co-development solutions in Renode, such as RTL co-simulation which Antmicro has been developing with Microchip and support for verilated custom instructions developed with another ML-focused Google team responsible for RISC-V Custom Function Units and also used in the EU-funded VEDLIoT project.

Future plans

This is just the beginning of a wider effort by the Google Research team Antmicro is working with to release software and hardware components, as well as tools supporting a collaborative co-design ecosystem for secure ML development. If you think Renode, RISC-V, and co-development could help in building your next ML-focused product, go ahead and try the AmbiML flow yourself!

Visit the iree-rv32-springbok repository on GitHub, clone it locally and follow the instructions from README.md.


You can also grab Renode from the official repository and start playing with the available demos, or head to the Renode documentation to read up on features helpful for ML acceleration development such as Verilator co-simulation.

By Peter Zierhoffer – Antmicro

Google and NIST partner on nanotechnology development platform

We’re proud to announce Google’s cooperative research and development agreement with the U.S. National Institute of Standards and Technology (NIST) to develop an open source testbed for nanotechnology research and development for American universities. NIST—a bureau of the U.S. Department of Commerce—will start by migrating their existing planarized wafer designs to an open source framework, which can be manufactured in the U.S. on SkyWater Technologies’ open source 130nm process (SKY130). The physical wafers and source code will be available in the coming months. Together, NIST, Google, and the open source community will develop designs to facilitate research into both basic and applied science, including technology transfer into production with U.S. manufacturers.

Furthering Google’s goals to improve access to semiconductor technology, this agreement will provide academic researchers with unprecedented resources from a semiconductor foundry to enhance research into the physics of semiconductors and nanodevices. This includes their chemistry, defects, electrical properties, high frequency operation, and switching behavior, while reducing overall costs through economies of scale. Most importantly, this access enhances the technology transfer process by enabling researchers to develop new and emerging technologies using foundry resources that can then be seamlessly transitioned into mass production, since universities will already be using an industrially relevant platform. This will greatly improve scientists’ ability to move their technologies through the tech-transfer “valley of death” and into practical use.

Nanotechnology research has benefitted in a unique way from silicon wafers that are normally used for chip manufacturing. Instead of being turned into packaged microchips, the wafers serve as a substrate for building and testing nanoscale structures, thanks to their smooth, planarized surface. This likewise helps test their transition into mass production.

Picture of a full wafer using the SKY130 open source PDK.


The wafer for this platform has a number of different metrology structures, from parametric test structures based on simple transistor arrays—which can be probed in a probe station—to thousands of complex measurements that users can operate using synthesized digital circuits. Critically, the wafers will be available to universities in a 200 mm form factor, as mid-production planarized wafers with less than a single nanometer of surface roughness. Smooth, flat surfaces are critical for advanced manufacturing at small sizes.

NIST researchers are also ensuring that the wafers have photolithographic and electron beam alignment marks commonly found in university nanofabrication facilities, allowing the foundry silicon to be used directly by university researchers with ease. Metal pads on the surface will allow scientists to access the semiconductor transistors from the surface.

NIST scientists anticipate the nanotechnology accelerator platform will enhance scientific investigations into a diverse set of technologies, including memory devices (resistive switches, magnetic tunnel junctions, flash memories), artificial intelligence, plasmonics, semiconductor bioelectronics, thin film transistors and even quantum information science.

Picture of a development die from Google's OpenMPW program for the nanotechnology accelerator developed by NIST and the University of Michigan.

This program also benefits from Google’s previous contributions and support of the GDSFactory and OpenFASOC open source projects that help automate and shorten the construction of these important measuring devices from months to days. Ahead of the full wafer tapeout in 2023, NIST scientists, working with partners at the University of Michigan, Carnegie Mellon, University of Maryland, The George Washington University, and Brown University, have been using Google's OpenMPW program to develop and test preliminary circuits which they expect to include in the nanotechnology accelerator. Preliminary testing will help ensure the program’s goals are met with working circuits that best serve the scientific community.

A key factor in cutting-edge research is reproducibility, or the ability for researchers from different institutions to repeat each other’s experiments and improve upon them. By migrating to an open source framework, researchers can more easily share reproducible results, contribute to the creation of open source datasets to enhance future simulation, and advance the scientific community’s state of the art of nanotechnology and semiconductor manufacturing.

NIST and Google will distribute the first production run of wafers to leading U.S. universities. Post-program, American scientists will be able to purchase the wafers directly from SkyWater without license requirements, giving them the freedom to pursue their research without any restrictions. Since wafers are hundreds of times cheaper than full mask-sets or the cost of designing integrated circuits from scratch, scientists will have a much easier time getting and using this powerful industrial technology. Longer term, working with NIST to develop future platforms on the recently announced SKY90FD open source PDK will further expand this R&D ecosystem.

To kick off this research effort, NIST is organizing the "NIST Integrated Circuits for Metrology Workshop" from September 20–21, 2022. This workshop will be held online, with a series of presentations and panel discussions on the first day. During the second day, a working group of researchers, scientists, and engineers will focus on the creation of parametric test structures for monolithic integration using open source silicon technology. Visit the event website to get more details about this program and register to attend or learn more about presenting.

By Ethan Mahintorabi and Johan Euphrosine, Software Engineers – Hardware Toolchains Team, and Aaron Cunningham, Technical Program Manager – Google Open Source Programs Office

Accelerate your models to production with Google Cloud and PyTorch

We believe in the power of choice for Machine Learning development, and continue to invest resources to make it easy for ML practitioners to train, deploy, and orchestrate models from a single unified data and AI cloud platform. We’re excited to announce our role as a founding member of the newly formed PyTorch Foundation, which will better position Google Cloud to make meaningful contributions to the PyTorch community. As a member of the board, we will deepen our open source investment to deliver on the Foundation’s mission to drive adoption of AI tooling by building an ecosystem of open source projects with PyTorch. We strongly believe in choice and will continue to invest in frameworks such as JAX and TensorFlow and support integrations with other OSS projects including Spark, Airflow, XGBoost, and others.

In this blog, we provide an overview of existing resources to help you get started with PyTorch on Google Cloud. We also talk about how ML practitioners can leverage our end-to-end ML platform to train, tune, and deploy PyTorch models.

PyTorch on Google Cloud

Open source in the cloud is important because it gives you flexibility and control over where you train and deploy your ML workloads. PyTorch is extensively used in the research space and in recent years it has gained immense traction in the industry due to its ease of use and deployment. In fact, according to a survey of Kaggle users, PyTorch is the fastest growing ML framework today.

ML practitioners using PyTorch tell us that it can be challenging to advance their ML project past experimentation. This is why Google Cloud has built integrations with PyTorch that make it easier to train, deploy, and orchestrate models in production. Some examples are:

  • PyTorch integrates directly with Vertex AI, a fully managed ML platform that provides the tools you need to take a model from PyTorch to production, like the PyTorch DL containers or the Vertex AI Workbench PyTorch one-click JupyterLab environment.
  • PyTorch/XLA, an open source library, uses the XLA deep learning compiler to enable PyTorch to run on Cloud TPUs. Cloud TPUs are custom accelerators designed by Google, optimized for performance/TCO with large-scale ML workloads. PyTorch/XLA also enables XLA-driven optimizations on GPUs.
  • TorchX provides an adapter to run and orchestrate TorchX components as part of Kubeflow Pipelines that you can easily scale on Vertex AI Pipelines.
  • With our OSS contributions to Apache Beam, we have made PyTorch models easy to deploy in batch or streaming data processing pipelines (see the sketch after this list). Running on Google Dataflow, these pipelines scale to very large workloads in a fully managed and simple-to-maintain environment.
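
As a minimal sketch of that Beam integration (RunInference and PytorchModelHandlerTensor ship with Apache Beam; the model class, checkpoint path, and inputs below are hypothetical):

import apache_beam as beam
import torch
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor

# Hypothetical model; any torch.nn.Module with a saved state_dict works.
class LinearModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(10, 1)

    def forward(self, x):
        return self.linear(x)

model_handler = PytorchModelHandlerTensor(
    state_dict_path='gs://my-bucket/model.pt',  # hypothetical checkpoint
    model_class=LinearModel,
    model_params={})

# Runs locally here; pass Dataflow pipeline options to scale out on Google Cloud.
with beam.Pipeline() as pipeline:
    _ = (pipeline
         | beam.Create([torch.rand(10) for _ in range(4)])
         | RunInference(model_handler)
         | beam.Map(print))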

To learn more and start using PyTorch on Google Cloud, check out the resources below:

PyTorch on Vertex AI Resources

  1. How To train and tune PyTorch models on Vertex AI: Learn how to use Vertex AI Training to build and train a sentiment text classification model using PyTorch and Vertex AI Hyperparameter Tuning to tune hyperparameters of PyTorch models.
  2. How to deploy PyTorch models on Vertex AI: Walk through the deployment of a Pytorch model using TorchServe as a custom container, by deploying the model artifacts to a Vertex Prediction service.
  3. Orchestrating PyTorch ML Workflows on Vertex AI Pipelines: See how to build and orchestrate ML pipelines for training and deploying PyTorch models on Google Cloud Vertex AI using Vertex AI Pipelines.
  4. Scalable ML Workflows using PyTorch on Kubeflow Pipelines and Vertex Pipelines: Take a look at examples of PyTorch-based ML workflows on two pipelines frameworks: OSS Kubeflow Pipelines, part of the Kubeflow project, and Vertex AI Pipelines. We share new PyTorch built-in components added to the Kubeflow Pipelines.

PyTorch/XLA and Cloud TPU/GPU

  1. Scaling deep learning workloads with PyTorch / XLA and Cloud TPU VM: Describes the challenges associated with scaling deep learning jobs to distributed training settings using the Cloud TPU VM, and shows how to stream training data from Google Cloud Storage (GCS) to PyTorch / XLA models running on Cloud TPU Pod slices.
  2. PyTorch/XLA: Performance debugging on Cloud TPU VM: Part I: In the first part of the performance debugging series on Cloud TPU, we lay out the conceptual framework for PyTorch/XLA in the context of training performance. We introduce a case study to make sense of preliminary profiler logs and identify corrective actions.
  3. PyTorch/XLA: Performance debugging on Cloud TPU VM: Part II: In the second part, we dive deeper into the analysis of the performance debugging to discover more performance improvement opportunities.
  4. PyTorch/XLA: Performance debugging on Cloud TPU VM: Part III: In the final part of the performance debugging series, we introduce user-defined code annotations and visualize these annotations in the form of a trace.
  5. Train ML models with Pytorch Lightning on TPUs: Learn how easy it is to start training models with PyTorch Lightning on TPUs with its built-in TPU support.

Other resources

  1. Increase your productivity using PyTorch Lightning: Learn how to use PyTorch Lightning on Vertex AI Workbench (previously Notebooks).

By Erwin Huizing and Grace Reed – Cloud AI and ML

TestParameterInjector gets JUnit5 support

In March 2021, we announced the open source release of TestParameterInjector: A parameterized test runner for JUnit4 (see GitHub page).

Over a year later, Google-internal usage of TestParameterInjector has continued to grow rapidly, and it is now by far the most popular parameterized test framework within Google.
Graph of the different parameterized test frameworks in Google
Guava's philosophy frames it nicely: "When trying to estimate the ubiquity of a feature, we frequently use the Google internal code base as a reference." We also believe that TestParameterInjector usage in Google is a decent proxy for its utility elsewhere.

As you can see on the graph above, not only did TestParameterInjector reduce the usage of the other frameworks, but it also caused a drastic increase in the total number of parameterized tests. This suggests that TestParameterInjector lowered the threshold for parameterizing a regular unit test, and that Googlers are actively using the tool to improve the quality of their tests.

JUnit5 (Jupiter) support

At Google, we use JUnit4 exclusively, but some developers outside of Google have moved on to JUnit5 (Jupiter). For those users, we have now expanded the scope of TestParameterInjector.

We've kept the API the same as much as possible:

// **************** JUnit4 **************** //

@RunWith(TestParameterInjector.class)
public class MyTest {

  @TestParameter boolean isDryRun;

  @Test public void test1(@TestParameter boolean enableFlag) { ... }

  @Test public void test2(@TestParameter MyEnum myEnum) { ... }

  enum MyEnum { VALUE_A, VALUE_B, VALUE_C }
}


// **************** JUnit5 (Jupiter) **************** //

class MyTest {

  @TestParameter boolean isDryRun;

  @TestParameterInjectorTest
  void test1(@TestParameter boolean enableFlag) {
    // This method is run 4 times for all combinations of isDryRun and enableFlag
  }

  @TestParameterInjectorTest
  void test2(@TestParameter MyEnum myEnum) {
    // This method is run 6 times for all combinations of isDryRun and myEnum
  }

  enum MyEnum { VALUE_A, VALUE_B, VALUE_C }
}

The only differences are that @RunWith / @ExtendWith are not necessary and that every test method needs a @TestParameterInjectorTest annotation.

The other features of TestParameterInjector work in a similar way with Jupiter:

class MyTest {

  // **************** Defining sets of parameters **************** //

  @TestParameterInjectorTest
  @TestParameters(customName = "teenager", value = "{age: 17, expectIsAdult: false}")
  @TestParameters(customName = "young adult", value = "{age: 22, expectIsAdult: true}")
  void personIsAdult_success(int age, boolean expectIsAdult) {
    assertThat(personIsAdult(age)).isEqualTo(expectIsAdult);
  }

  // **************** Dynamic parameter generation **************** //

  @TestParameterInjectorTest
  void matchesAllOf_throwsOnNull(
      @TestParameter(valuesProvider = CharMatcherProvider.class) CharMatcher charMatcher) {
    assertThrows(NullPointerException.class, () -> charMatcher.matchesAllOf(null));
  }

  private static final class CharMatcherProvider implements TestParameterValuesProvider {
    @Override
    public List<CharMatcher> provideValues() {
      return ImmutableList.of(
          CharMatcher.any(), CharMatcher.ascii(), CharMatcher.whitespace());
    }
  }
}

Other things we've been working on

Custom names for @TestParameters
When running the following parameterized test:

@Test
@TestParameters("{age: 17, expectIsAdult: false}")
@TestParameters("{age: 22, expectIsAdult: true}")
public void withRepeatedAnnotation(int age, boolean expectIsAdult){ ... }

the generated test names will be:

MyTest#withRepeatedAnnotation[{age: 17, expectIsAdult: false}]
MyTest#withRepeatedAnnotation[{age: 22, expectIsAdult: true}]

This is fine for small parameter sets, but when the number of @TestParameters or parameters within the YAML string gets large, it quickly becomes hard to figure out what each parameter set is supposed to represent.

For those cases, we added the option to add customName:

@Test
@TestParameters(customName = "teenager", value = "{age: 17, expectIsAdult: false}")
@TestParameters(customName = "young adult", value = "{age: 22, expectIsAdult: true}")
public void personIsAdult(int age, boolean expectIsAdult){...}

To allow this API change, we had to allow @TestParameters to be used in a different way: the original way of specifying @TestParameters sets was as a list of YAML strings inside a single @TestParameters annotation. We considered multiple options for specifying the custom name inside these YAML strings, such as a magic _name key and an extra YAML mapping layer where the keys would be the test names. But we eventually settled on the aforementioned API, which makes @TestParameters a repeated annotation, because it results in the least complex code and clearly separates the different parameter sets.

It should be noted that the original API (a list of YAML strings in a single annotation) still works, but it is now discouraged in favor of multiple @TestParameters annotations with a single YAML string each, even when customName isn't used. The main arguments for this recommendation are:
  • Consistency with the customName case, which needs a single YAML string per @TestParameters annotation
  • We believe it presents the list of parameter sets (especially when it's long) in a more readable way

Integration with RobolectricTestRunner

Recently, we built an internal version of RobolectricTestRunner that supports TestParameterInjector annotations. A significant amount of work remains to open source this, and we are now considering when and how to do so.

Learn more

Our GitHub README provides an overview of the framework. Let us know on GitHub if you have any questions, comments, or feature requests!

By Jens Nyman – TestParameterInjector