Tag Archives: Open source

Announcing KataOS and Sparrow

As we find ourselves increasingly surrounded by smart devices that collect and process information from their environment, it's more important now than ever that we have a simple solution to build verifiably secure systems for embedded hardware. If the devices around us can't be mathematically proven to keep data secure, then the personally-identifiable data they collect—such as images of people and recordings of their voices—could be accessible to malicious software.

Unfortunately, system security is often treated as a software feature that can be added to existing systems or solved with an extra piece of ASIC hardware— this generally is not good enough. Our team in Google Research has set out to solve this problem by building a provably secure platform that's optimized for embedded devices that run ML applications. This is an ongoing project with plenty left to do, but we're excited to share some early details and invite others to collaborate on the platform so we can all build intelligent ambient systems that have security built-in by default.

To begin collaborating with others, we've open sourced several components for our secure operating system, called KataOS, on GitHub, as well as partnered with Antmicro on their Renode simulator and related frameworks. As the foundation for this new operating system, we chose seL4 as the microkernel because it puts security front and center; it is mathematically proven secure, with guaranteed confidentiality, integrity, and availability. Through the seL4 CAmkES framework, we're also able to provide statically-defined and analyzable system components. KataOS provides a verifiably-secure platform that protects the user's privacy because it is logically impossible for applications to breach the kernel's hardware security protections and the system components are verifiably secure. KataOS is also implemented almost entirely in Rust, which provides a strong starting point for software security, since it eliminates entire classes of bugs, such as off-by-one errors and buffer overflows.

The current GitHub release includes most of the KataOS core pieces, including the frameworks we use for Rust (such as the sel4-sys crate, which provides seL4 syscall APIs), an alternate rootserver written in Rust (needed for dynamic system-wide memory management), and the kernel modifications to seL4 that can reclaim the memory used by the rootserver. And we've collaborated with Antmicro to enable GDB debugging and simulation for our target hardware with Renode.

Internally, KataOS also is able to dynamically load and run third-party applications built outside of the CAmkES framework. At the moment, the code on Github does not include the required components to run these applications, but we hope to publish these features in the near future.

To prove-out a secure ambient system in its entirety, we're also building a reference implementation for KataOS called Sparrow, which combines KataOS with a secured hardware platform. So in addition to the logically-secure operating system kernel, Sparrow includes a logically-secure root of trust built with OpenTitan on a RISC-V architecture. However, for our initial release, we're targeting a more standard 64-bit ARM platform running in simulation with QEMU.

Our goal is to open source all of Sparrow, including all hardware and software designs. For now, we're just getting started with an early release of KataOS on GitHub. So this is just the beginning, and we hope you will join us in building a future where intelligent ambient ML systems are always trustworthy.

By Sam, Scott, and June – AmbiML Developers

Source: Google Open Source Blog

Flutter SLSA Progress & Identity and Access Management through Infrastructure As Code

We are excited to announce several new achievements in Dart and Flutter's mission to harden security. We have achieved Supply Chain Levels for Software Artifacts (SLSA) Level 2 security on Flutter’s Cocoon application, reduced our Identity and Access Management permissions to the minimum required access, and implemented Infrastructure-as-Code to manage permissions for some of our applications. These achievements follow our recent success to enable Allstar and Security Scorecards.

Highlights

Achieving Flutter’s Cocoon SLSA level 2: Cocoon application provides continuous integration orchestration for Flutter Infrastructure. Cocoon also helps integrate several CI services with Github and provides tools to make Github development easier. Achieving SLSA Level 2 for Cocoon means we have addressed all the security concerns of levels 1 and 2 across the application. Under SLSA Level 2, Cocoon has “extra resistance to specific threats” to its supply chain. The Google Open Source Security team has audited and validated our achievement of SLSA Level 2 for Cocoon.

Implementing Identity & Access Management (IAM) via Infrastructure-as-Code: We have implemented additional security hardening features by onboarding docs-flutter-dev, master-docs-flutter-dev, and flutter-dashboard to use Identity and Access Management through an Infrastructure-as-Code system. These projects host applications, provide public documentation for Flutter, and contain a dashboard website for Flutter build status.

Using our Infrastructure-as-Code approach, security permission changes require code changes, ensuring approval is granted before the change is made. This also means that changes to security permissions are audited through source control and contain associated reasoning for the change. Existing IAM roles for these applications have been pared so that the applications follow the Principle of Least Privilege.

Advantages

Achieving SLSA Level 2 for Cocoon means we have addressed all the security concerns of levels 1 and 2 across the application. Under SLSA Level 2, Cocoon has “extra resistance to specific threats” to its supply chain.

Provenance is now generated for both, flutter-dashboard and auto-submit, artifacts through Cocoon’s automated build process. Provenance on these artifacts shows proof of their code source and tamper-proof build evidence. This work helps harden the security on the multiple tools used during the Cocoon build process: Google Cloud Platform, Cloudbuild, App Engine, and Artifact Registry.

Overall we addressed 83% of all SLSA requirements across all levels for the Cocoon application. We have identified the work across the application which will need to be completed for each level and category of SLSA compliance. Because of this, we know we are well positioned to continue future work toward SLSA Level 4.

Learnings and Best Practices

Relatively small changes to the Cocoon application’s build process significantly increased the security of its supply chain. Google Cloud Build made this simple, since provenance metadata is created automatically during the Cloud Build process.

Regulating IAM permissions through code changes adds many additional benefits and can make granting first time access simpler.

Upgrading the SLSA level of an application sometimes requires varying efforts depending on the different factors of the application build process. Working towards SLSA level 4 will likely necessitate different configuration and code changes than required for SLSA level 2.

Coming Soon

Since this is the beginning of the Flutter and Dart journey toward greater SLSA level accomplishments, we hope to apply our learnings to more applications. We hope to begin work toward SLSA level 2 and beyond for more complex repositories like Flutter/flutter. Also, we hope to achieve an even higher level of SLSA compliance for the Cocoon application.

References

Supply Chain Levels for Software Artifacts (SLSA) is a security framework which outlines levels of supply chain security for an application as a checklist.

By Jesse Seales, Software Engineer – Dart and Flutter Security Working Group

Source: Google Open Source Blog

Announcing the second group of Open Source Peer Bonus winners in 2022

We’re excited to announce our second group of Open Source Peer Bonus winners in 2022! The Google Open Source Peer Bonus program is designed to recognize external open source contributors nominated by Googlers for their open source contributions. This cycle, we are pleased to announce a total of 141 winners across 110+ projects, residing in 36 countries.

All open source contributors external to Google are eligible to be nominated. Whether you’re a software engineer, technical writer, community advocate, mentor, user experience designer, security expert, or educator, etc. you can be nominated for a peer bonus

Our awards often come as a surprise to some while also providing motivation to others to responsibly contribute to open source. Learn more about what the Google Open Source Peer Bonus program means to our winners from this cycle:

“It was a very nice surprise to receive the Open Source Peer Bonus notification. I hope it can help lift contributors off, not only for their code contributions but for community contributions too.” – Oriol Abril Pla, ArviZ, PyMC

“The Kubernetes and CNCF ecosystem is massive. So, there are tons of opportunities to carve out your own niche in them. One of my key goals has been to make the project(s) more secure than how they were when I joined them. These awards are a welcome sprinkle of motivation to keep being a responsible open source contributor.” – Pushkar Joglekar, Kubernetes and CNCF

“I’m very pleased and proud to receive a Google Open Source Peer Bonus award. I was nominated for my contributions to The Good Docs Project where we are creating technical writing templates to help other projects create high-quality documentation. I’m passionate about the work we’re doing there, and have been hanging around the project since its inception in 2019. This is a friendly, inclusive community creating a safe space for folk to dip their toe into open source. We are global, and new folk are always welcome.” – Felicity Brand, The Good Docs Project

“I've been actively working on open source projects since my time at NIST with the FDS project starting in 2006. More recently with The Good Docs Project (TGDP) since 2020. It's been a very rewarding experience to contribute to TGDP, with such an amazing diversity of participants, perspectives and interests involved. To be given recognition through the OSPB program was a pleasant and unexpected surprise. While it's not at all what I am participating in the project for, it feels great to have someone else in the project bring my name up for this award. Thank you to TGDP and to Google for this honor.” – Bryan Klein, The Good Docs Project

“The Open Source Peer Bonus program is more than an appreciation for our contribution to the open source world. It encourages people to share their talent. To be the hero of the ones who are benefiting from your work, put your codes in the open source world.” – Nan YE, Orange Innovation China

“The TFX team and community is by far the most responsive, helpful and knowledgeable open-source project that I have worked on. It's a great feeling to be a part of the democratizing of productionised ML workflows, and being officially recognised on your efforts and contributions is the cherry on top.” – Jens Wiren, Analytical Impact Solutions

“The HTTP Archive team is welcoming to contributors and happily showed me the ropes until I got going. The project is invaluable to the web community, and working on the Web Almanac allowed me to work with domain experts on several topics, including Performance, JavaScript, and Third Parties.” – Kevin Farrugia, HTTP Archive

“Participating in these projects has been a great learning experience and has given me the opportunity to connect with a lot of great people. I am humble and grateful for the recognition and appreciation this program gives to the contributions made to these projects.” – Ole Markus With, kOps/etcdadm

“Google has been very generous in recognising VertFlow, which is a tool still in its infancy after the idea popped into my head a few months ago in conversation with a Google Cloud Customer Engineer. I hope this will encourage users to adopt VertFlow to reduce their carbon footprint when using GCP.” – Jack Lockyer-Stevens, VertFlow

Below is the list of current winners who gave us permission to thank them publicly:

Project	Winner
abap2xlsx	Gregor Wolf
ABC A System for Sequential Synthesis and Verification	Alan Mishchenko
Accelerated HW Synthesis	Zihao Li
Agones	Daniel Oliveira
Android, Pithus, Exodus Privacy, PiRogue, Frida	Esther Onfroy
AndroidX Jetpack	Michał Zieliński
Angular	Dario Piotrowicz
Angular Language Service	Ivan Wan
Apache Airflow	Elad Kalif
Apache Beam	Alex Van Boxel
Apache Beam	Austin Bennett
Apache Beam	Moritz Mack
Apache Hop	Matt Casters
aroman	Avi Romanoff
ArviZ and PyMC	Oriol Abril Pla
Babel	Nicolò Ribaudo
Bazel	Fabian Meumertzheim
Beam	Alex Kosolapov
Blockly	Johnny Oshika
BRLTTY	Dave Mielke
Bun	Jarred Sumner
cargo-make	Sagie Gur-Ari
Chrome DevTools Frontend	Percy Ley
Chromium	Juba Borgohain
Chromium	David Sanders
Chromium	Amos Lim
ClangBuiltLinux	Nathan Chancellor
cloud-data-quality	Amandeep Singh
CNCF	Ragashree M C
Contibuting.today Open Source meetup	Floor Drees
CoreDNS and Kubernetes	Chris O'Haver
cpu_features	Mykola Hohsadze
DartPad	Tim Maffett
dbus	Simon McVittie
Dill	Mike McKerns
distroless	Ole-Martin Bratteng
Don't kill my app and merge to Google Android CTS	Petr Nálevka
ecma262	Richard Gibson
Firebase Admin .NET SDK	Levi Muriuki
Firebase Admin Node.js SDK	Igor Savin
Firebase Admin Node.js SDK	Aras Abbasi
Firebase Apple SDK	Mike Hardy
Firebase Apple SDK	Jake Krog
Firebase Apple SDK	Alex Zchut
Firebase Arduino Client Library for ESP8266 and ESP32.	Suwatchai Klakerdpol
Firebase Crashlytics	Sergio Campamá
firebase-ios-sdk	Fumito Ito
firebase-ios-sdk	Tito Ciuro
firebase-js-sdk	Andi Pätzold
fish-shell	Peter Ammon
Flashrom	Thomas Heijligen
Flashrom	Felix Singer
FreeCAD	Lei Zheng
Fuchsia	Alexander Popov
Git	Jorawar Singh
git and openssh	Fabian Stelzer
GNU Guix	Ludovic Courtès
GNU Mes	Janneke Nieuwenhuizen
go-clean-arch	Iman Tumorang
golang/protobuf	Cassondra Foesch
google-cloud-pricing-cost-calculator	Nils Knieling
gopls	Ruslan Nigmatullin
GrapheneOS	Daniel Micay
GSYVideoPlayer	Asher Guo
Hello World gRPC-Gateway	Rajiv Singh
Lichess	Thibault Duplessis
JRuby	Charles Nutter
Keras	Sayak Paul
Kernel, Wireguard	Jason Donenfeld
Knative	Mahamed Ali
Knative	Gabriel Freites
Kubernetes, CNCF	Pushkar Joglekar
Kubernetes (kOps, etcdadm etc)	Ciprian Hacman
Kubernetes (particularly kOps / etcdadm)	Ole Markus With
Kubernetes (particularly kOps / etcdadm)	Peter Rifel
Kubernetes Gateway API	Keith Mattix
KUnit/Linux kernel	Shuah Khan
Leaflet	Volodymyr Agafonkin
libyuv	Yuan Tong
lnav	Tim Stack
Log4J	Ralph Goers
Magit	Jonas Bernoulli
medium_stats	Oliver Tosky
Mockk	Oleksii Pylypenko
moja global	Harsh Bardhan Mishra
mvt (Mobile Verification Toolkit)	Claudio Guarnieri
OSS educator and collaborator	José Luis Chiquete
notcurses	nick black
Nudge	Erik Gomez
OpenSSF Allstar	Yori Yano
Oppia	Om Khandade
Oppia	Chantel Chan
OR-Tools	Xiang Chen
pcileech (and LeechCore subproject)	Ulf Frisk
Project Jupyter	Min Ragan-Kelley
Protocol Buffers	Yannic Bonenberger
pyinfra	Nick Mills-Barrett
PyPI	Jack Lockyer-Stevens
PyTorch / XLA	Ronghang Hu
QGIS	Nyall Dawson
react-native-firebase	Minsik Kim
Rich, Textualize	Will McGugan
Rust for Linux	Björn Roy Baron
sableangle	Miki Huang
Samba	David Mulder
Scorecards	Varun Sharma
Scorecards	Naveen Srinivasan
SimpleWebAuthn	Matthew Miller
SLSA	Michael Lieberman
Spock	Leonard Brünings
SQLAlchemy	Michael Bayer
stage0	Jeremiah Orians
styler	Lorenz Walthert
Surelog	Alain Dargelas
Svelte	Rich Harris
TC39	Jordan Harband
Tekton	Parth Patel
Tekton	Andrew Bayer
TensorFlow	Stefano Fabri
TensorFlow	Jason Zaman
TensorFlow Lite Examples - Android	Nan Ye
TFX	Ukjae Jeong
TFX	Jens Wiren
TFX-Addons	Gerard Casas Saez
TFX-Addons	Hannes Hapke
TFX-BSL	Martin Bomio
tfx-helper	Tomasz Mackowiak
The Good Docs Project	Aaron Peters
The Good Docs Project	Felicity Brand
The Good Docs Project	Ian Nguyen
The Good Docs Project	Bryan Klein
The Good Docs Project	Serena Jolley
Tow-Boot	Samuel Dionne-Riel
Trivy	Teppei Fukuda
TUF, CNCF	Marina Moore
V8	Ao Wang
ViSQOL	Feargus O'Gorman
W3C WebGPU standard	Mehmet Oguz Derin
wdi5	Volker Buzek
Web Almanac	Kevin Farrugia
WebRTC	Byoungchan Lee

Congratulations to our winners above and thank you for your open source contributions. We look forward to your continued support and efforts in the open source communities. Additionally, thank you to all of the Googlers who submitted nominations and our review committee members for reviewing nominations.

By Joe Sylvanovich – Google Open Source Programs Office

Source: Google Open Source Blog

Lyra V2 – a better, faster, and more versatile speech codec

Since we open sourced the first version of Lyra on GitHub last year, we are delighted to see a vibrant community growing around it, with thousands of stars, hundreds of forks, and many comments and pull requests. There are people who fixed and formatted our code, built continuous integration for the project, and even added support for Web Assembly.

We are incredibly grateful for all these contributions, and we also heard the community's feedback, asking us to improve Lyra. Some examples of what developers wanted were to run Lyra on more platforms, develop applications in more languages; and for a model that computes faster with more bitrate options and lower latency, and better audio quality with fewer artifacts.

That's why we are now releasing Lyra V2, with a new architecture that enjoys a wider platform support, provides scalable bitrate capabilities, has better performance, and generates higher quality audio. With this release, we hope to continue to evolve with the community, and with its collective creativity, see new applications being developed and new directions emerging.

New Architecture

Lyra V2 is based on an end-to-end neural audio codec called SoundStream. The architecture has a residual vector quantizer (RVQ) sitting before and after the transmission channel, which quantizes the encoded information into a bitstream and reconstructs it on the decoder side.

The integration of RVQ into the architecture allows changing the bitrate of Lyra V2 at any time by selecting the number of quantizers to use. When more quantizers are used, higher quality audio is generated (at a cost of a higher bitrate). In Lyra V2, we support three different bitrates: 3.2 kps, 6 kbps, and 9.2 kbps. This enables developers to choose a bitrate most suitable for their network condition and quality requirements.

Lyra V2's model is exported in TensorFlow Lite, TensorFlow's lightweight cross-platform solution for mobile and embedded devices, which supports various platforms and hardware accelerations. The code is tested on Android phones and Linux, with experimental Mac and Windows support. Operation on iOS and other embedded platforms is not currently supported, although we expect it is possible with additional effort. Moreover, this paradigm opens Lyra to any future platform supported by TensorFlow Lite.

Better Performance

With the new architecture, the delay is reduced from 100 ms with the previous version to 20 ms. In this regard, Lyra V2 is comparable to the most widely used audio codec Opus for WebRTC, which has a typical delay of 26.5 ms, 46.5 ms, and 66.5 ms.

Lyra V2 also encodes and decodes five times faster than the previous version. On a Pixel 6 Pro phone, Lyra V2 takes 0.57 ms to encode and decode a 20 ms audio frame, which is 35 times faster than real time. The reduced complexity means that more phones can run Lyra V2 in real time than V1, and that the overall battery consumption is lowered.

Higher Quality

Driven by the advance of machine learning research over the years, the quality of the generated audio is also improved. Our listening tests show that the audio quality (measured in MUSHRA score, an indication of subjective quality) of Lyra V2 at 3.2 kbps, 6 kbps, and
9.2 kbps measures up to Opus at 10 kbps, 13 kbps, and 14 kbps respectively.

	Sample 1	Sample 2
Original
Opus @6kbps
LyraV1
Opus @10kbps
LyraV2 @3.2kbps
Opus @13k
LyraV2 @6kbps
Opus @14kbps
LyraV2 @9.2kbps

This makes Lyra V2 a competitive alternative to other state-of-the-art telephony codecs. While Lyra V1 already compares favorably to the Adaptive Multi-Rate (AMR-NB) codec, Lyra V2 further outperforms Enhanced Voice Services (EVS) and Adaptive Multi-Rate Wideband (AMR-WB), and is on par with Opus, all the while using only 50% - 60% of their bandwidth.

	Sample 1	Sample 2
Original
AMR-NB
LyraV1
EVS
AMR-WB
Opus @13kbps
LyraV2 @6kbps

This means more devices can be connected in bandwidth-constrained environments, or that additional information can be sent over the network to reduce voice choppiness through forward error correction and packet loss concealment.

Open Source Release

Lyra V2 continues to provide what is already in Lyra V1 (the build tools, the testing frameworks, the C++ encoding and decoding API, the signal processing toolchain, and the example Android app). Developers who have experience with the Lyra V1 API will find that the V2 API looks familiar, but with a few changes. For example, now it's possible to change bitrates during encoding (more information is available in the release notes). In addition, the model definitions and weights are included as .tflite files. As with V1, this release is a beta version and the API and bitstream are expected to change. The code for running Lyra is open sourced under the Apache license. We can’t wait to see what innovative applications people will create with the new and improved Lyra!

By Ｈengchin Yeh - Chrome

Acknowledgements

The following people helped make the open source release possible: from Chrome: Alejandro Luebs, Michael Chinen, Andrew Storus, Tom Denton, Felicia Lim, Bastiaan Kleijn, Jan Skoglund, Yaowu Xu, Jamieson Brettle, Omer Osman, Matt Frost, Jim Bankoski; and from Google Research: Neil Zeghidour, Marco Tagliasacchi

Source: Google Open Source Blog

TensorStore for High-Performance, Scalable Array Storage

Posted by Jeremy Maitin-Shepard and Laramie Leavitt, Software Engineers, Connectomics at Google

Many exciting contemporary applications of computer science and machine learning (ML) manipulate multidimensional datasets that span a single large coordinate system, for example, weather modeling from atmospheric measurements over a spatial grid or medical imaging predictions from multi-channel image intensity values in a 2d or 3d scan. In these settings, even a single dataset may require terabytes or petabytes of data storage. Such datasets are also challenging to work with as users may read and write data at irregular intervals and varying scales, and are often interested in performing analyses using numerous machines working in parallel.

Today we are introducing TensorStore, an open-source C++ and Python software library designed for storage and manipulation of n-dimensional data that:

Provides a uniform API for reading and writing multiple array formats, including zarr and N5.
Natively supports multiple storage systems, including Google Cloud Storage, local and network filesystems, HTTP servers, and in-memory storage.
Supports read/writeback caching and transactions, with strong atomicity, isolation, consistency, and durability (ACID) guarantees.
Supports safe, efficient access from multiple processes and machines via optimistic concurrency.
Offers an asynchronous API to enable high-throughput access even to high-latency remote storage.
Provides advanced, fully composable indexing operations and virtual views.

TensorStore has already been used to solve key engineering challenges in scientific computing (e.g., management and processing of large datasets in neuroscience, such as peta-scale 3d electron microscopy data and “4d” videos of neuronal activity). TensorStore has also been used in the creation of large-scale machine learning models such as PaLM by addressing the problem of managing model parameters (checkpoints) during distributed training.

Familiar API for Data Access and Manipulation
TensorStore provides a simple Python API for loading and manipulating large array data. In the following example, we create a TensorStore object that represents a 56 trillion voxel 3d image of a fly brain and access a small 100x100 patch of the data as a NumPy array:

>>> import tensorstore as ts
>>> import numpy as np

# Create a TensorStore object to work with fly brain data.
>>> dataset = ts.open({
...     'driver':
...         'neuroglancer_precomputed',
...     'kvstore':
...         'gs://neuroglancer-janelia-flyem-hemibrain/v1.1/segmentation/',
... }).result()

# Create a 3-d view (remove singleton 'channel' dimension):
>>> dataset_3d = dataset[ts.d['channel'][0]]
>>> dataset_3d.domain
{ "x": [0, 34432), "y": [0, 39552), "z": [0, 41408) }

# Convert a 100x100x1 slice of the data to a numpy ndarray
>>> slice = np.array(dataset_3d[15000:15100, 15000:15100, 20000])

Crucially, no actual data is accessed or stored in memory until the specific 100x100 slice is requested; hence arbitrarily large underlying datasets can be loaded and manipulated without having to store the entire dataset in memory, using indexing and manipulation syntax largely identical to standard NumPy operations. TensorStore also provides extensive support for advanced indexing features, including transforms, alignment, broadcasting, and virtual views (data type conversion, downsampling, lazily on-the-fly generated arrays).

The following example demonstrates how TensorStore can be used to create a zarr array, and how its asynchronous API enables higher throughput:

>>> import tensorstore as ts
>>> import numpy as np

>>> # Create a zarr array on the local filesystem
>>> dataset = ts.open({
...     'driver': 'zarr',
...     'kvstore': 'file:///tmp/my_dataset/',
... },
... dtype=ts.uint32,
... chunk_layout=ts.ChunkLayout(chunk_shape=[256, 256, 1]),
... create=True,
... shape=[5000, 6000, 7000]).result()

>>> # Create two numpy arrays with example data to write.
>>> a = np.arange(100*200*300, dtype=np.uint32).reshape((100, 200, 300))
>>> b = np.arange(200*300*400, dtype=np.uint32).reshape((200, 300, 400))

>>> # Initiate two asynchronous writes, to be performed concurrently.
>>> future_a = dataset[1000:1100, 2000:2200, 3000:3300].write(a)
>>> future_b = dataset[3000:3200, 4000:4300, 5000:5400].write(b)

>>> # Wait for the asynchronous writes to complete
>>> future_a.result()
>>> future_b.result()

Safe and Performant Scaling
Processing and analyzing large numerical datasets requires significant computational resources. This is typically achieved through parallelization across numerous CPU or accelerator cores spread across many machines. Therefore a fundamental goal of TensorStore has been to enable parallel processing of individual datasets that is both safe (i.e., avoids corruption or inconsistencies arising from parallel access patterns) and high performance (i.e., reading and writing to TensorStore is not a bottleneck during computation). In fact, in a test within Google’s datacenters, we found nearly linear scaling of read and write performance as the number of CPUs was increased:

Read and write performance for a TensorStore dataset in zarr format residing on Google Cloud Storage (GCS) accessed concurrently using a variable number of single-core compute tasks in Google data centers. Both read and write performance scales nearly linearly with the number of compute tasks.

Performance is achieved by implementing core operations in C++, extensive use of multithreading for operations such as encoding/decoding and network I/O, and partitioning large datasets into much smaller units through chunking to enable efficiently reading and writing subsets of the entire dataset. TensorStore also provides configurable in-memory caching (which reduces slower storage system interactions for frequently accessed data) and an asynchronous API that enables a read or write operation to continue in the background while a program completes other work.

Safety of parallel operations when many machines are accessing the same dataset is achieved through the use of optimistic concurrency, which maintains compatibility with diverse underlying storage layers (including Cloud storage platforms, such as GCS, as well as local filesystems) without significantly impacting performance. TensorStore also provides strong ACID guarantees for all individual operations executing within a single runtime.

To make distributed computing with TensorStore compatible with many existing data processing workflows, we have also integrated TensorStore with parallel computing libraries such as Apache Beam (example code) and Dask (example code).

Use Case: Language Models
An exciting recent development in ML is the emergence of more advanced language models such as PaLM. These neural networks contain hundreds of billions of parameters and exhibit some surprising capabilities in natural language understanding and generation. These models also push the limits of computational infrastructure; in particular, training a language model such as PaLM requires thousands of TPUs working in parallel.

One challenge that arises during this training process is efficiently reading and writing the model parameters. Training is distributed across many separate machines, but parameters must be regularly saved to a single object (“checkpoint”) on a permanent storage system without slowing down the overall training process. Individual training jobs must also be able to read just the specific set of parameters they are concerned with in order to avoid the overhead that would be required to load the entire set of model parameters (which could be hundreds of gigabytes).

TensorStore has already been used to address these challenges. It has been applied to manage checkpoints associated with large-scale (“multipod”) models trained with JAX (code example) and has been integrated with frameworks such as T5X (code example) and Pathways. Model parallelism is used to partition the full set of parameters, which can occupy more than a terabyte of memory, over hundreds of TPUs. Checkpoints are stored in zarr format using TensorStore, with a chunk structure chosen to allow the partition for each TPU to be read and written independently in parallel.

When saving a checkpoint, each model parameter is written using TensorStore in zarr format using a chunk grid that further subdivides the grid used to partition the parameter over TPUs. The host machines write in parallel the zarr chunks for each of the partitions assigned to TPUs attached to that host. Using TensorStore's asynchronous API, training proceeds even while the data is still being written to persistent storage. When resuming from a checkpoint, each host reads only the chunks that make up the partitions assigned to that host.

Use Case: 3D Brain Mapping
The field of synapse-resolution connectomics aims to map the wiring of animal and human brains at the detailed level of individual synaptic connections. This requires imaging the brain at extremely high resolution (nanometers) over fields of view of up to millimeters or more, which yields datasets that can span petabytes in size. In the future these datasets may extend to exabytes as scientists contemplate mapping entire mouse or primate brains. However, even current datasets pose significant challenges related to storage, manipulation, and processing; in particular, even a single brain sample may require millions of gigabytes with a coordinate system (pixel space) of hundreds of thousands pixels in each dimension.

We have used TensorStore to solve computational challenges associated with large-scale connectomic datasets. Specifically, TensorStore has managed some of the largest and most widely accessed connectomic datasets, with Google Cloud Storage as the underlying object storage system. For example, it has been applied to the human cortex “h01” dataset, which is a 3d nanometer-resolution image of human brain tissue. The raw imaging data is 1.4 petabytes (roughly 500,000 * 350,000 * 5,000 pixels large, and is further associated with additional content such as 3d segmentations and annotations that reside in the same coordinate system. The raw data is subdivided into individual chunks 128x128x16 pixels large and stored in the “Neuroglancer precomputed” format, which is optimized for web-based interactive viewing and can be easily manipulated from TensorStore.

A fly brain reconstruction for which the underlying data can be easily accessed and manipulated using TensorStore.

Getting Started
To get started using the TensorStore Python API, you can install the tensorstore PyPI package using:

pip install tensorstore

Refer to the tutorials and API documentation for usage details. For other installation options and for using the C++ API, refer to installation instructions.

Acknowledgements
Thanks to Tim Blakely, Viren Jain, Yash Katariya, Jan-Matthis Luckmann, Michał Januszewski, Peter Li, Adam Roberts, Brain Williams, and Hector Yee from Google Research, and Davis Bennet, Stuart Berg, Eric Perlman, Stephen Plaza, and Juan Nunez-Iglesias from the broader scientific community for valuable feedback on the design, early testing and debugging.

Source: Google AI Blog

GSoC 2022: The first phase of completed projects

Source: Google Open Source Blog

Co-simulating ML with Springbok using Renode

The landscape of Machine Learning software libraries and models is evolving rapidly, and to satisfy the ever-increasing demand for memory and compute while managing latency, power and security considerations, hardware must be developed in an iterative process alongside the workloads it is meant to run.

With its open architecture, custom instructions support and flexible vector extensions, the RISC-V ISA offers an unprecedented capacity for such co-design. And by energizing the open hardware ecosystem, RISC-V has supercharged research and innovation into how to improve chipmaking itself to better leverage the methods and suit the needs of software. Initiatives such as Google’s OpenMPW Shuttle show how a more open and software-focused approach to building hardware, are key to enabling a new wave of more powerful and transparent ML-focused solutions.

A RISC-V-based ML accelerator with a HW/SW co-design flow

In the past months, Google Research has joined efforts with Antmicro to work on a silicon project that can serve as a template for efficient hardware-software co-design. For their secure ML solution, the Google Research team supported by Antmicro has been developing a completely open source, rapid pre-silicon ML development flow using Renode, Antmicro’s open source simulation framework.

This builds on the result of cooperation from last year in which Antmicro implemented Renode support for RISC-V Vector extensions, which are used in the Google team’s RISC-V based ML accelerator codenamed Springbok. To allow a more well-rounded developer experience, as part of the project Antmicro is also working on improving the support for the underlying SoC and a large number of user oriented features such as OS-aware debugging, performance optimizations, payload profiling and performance measurement capabilities.

Springbok is part of Google’s AmbiML project that aims to create an open source ML development ecosystem centered on privacy and security. By using the RISC-V Vector extensions, the Google Research team has a standard but flexible way to parallelize the matrix multiply and accumulate operations that are universal in ML payloads. And thanks to Renode, the team can make informed choices as to how exactly to leverage RISC-V’s flexibility by analyzing tradeoffs between speed, complexity and specialization in a practical, iterative fashion using data generated by Renode and the text-based configuration capabilities that let them play around with hardware composition and functionality in a matter of minutes, not days.

Diagram of A RISC-V-based ML accelerator with a HW/SW co-design flow

On the ML software side, the ecosystem revolves around IREE—Google’s research project developing an open source ML compiler and runtime for constrained devices, based on LLVM MLIR.

IREE allows you to load models from typical ML frameworks such as TensorFlow or TensorFlow Lite and then convert them to Intermediate Representation (MLIR), which later goes through optimizations on graph level and then through an LLVM compilation flow to get the best-fitted runtime for a specific target. When it comes to deploying models on target devices, IREE provides APIs for both the C and Python programming languages as well as a TFLite C API which provides the same convention as TFLite for model loading, tensor management and inference invoking.

Using these runtimes, the model can be deployed and tested, debugged, benchmarked and executed on the target device or in a simulation environment like Renode.

Demoing the flow at Spring 2022 RISC-V Week

In the build up to the Spring 2022 RISC-V Week in Paris, the first such large open hardware meeting in years, an initial version of the AmbiML bare metal ML flow was released as open source. This includes both the ability to run interactively and an example CI using Antmicro’s GitHub Renode Action showing how such a workflow can be tested automatically on each commit. As a Google Cloud partner, Antmicro is currently working with Google Cloud to make Renode available for massive scale CI testing and deployments for scenarios similar to this one.

In a joint talk at the Paris event, Antmicro and Google presented the software co-development flow, together with a demo of a heterogeneous multi-core solution, with one core running the AmbiML Springbok payload and another core running Zephyr.

In the presented scenario the Springbok core, acting as a ML compute offload unit to the main CPU, executed inference on the MobileNetv1 network and reported the work done to the application core via a RISC-V custom instruction. Adding and modifying custom instructions is trivial in Renode, either via a single line of Python, C#, or even co-simulated in RTL.

Renode helps ML developers and silicon designers not only to run and test their solutions, but also to learn more about what their software is actually doing. As part of the Paris demonstration, Antmicro and Google showed how you can count executed instructions and how often specific opcodes are used to measure how well your solution is performing. These features, accompanied by execution metrics analysis, executed functions logging, and recently developed execution trace generation, give you great insight into every detail of your emulated ML environment.

These capabilities join the wide arsenal of hardware/software co-development solutions in Renode, such as RTL co-simulation which Antmicro has been developing with Microchip and support for verilated custom instructions developed with another ML-focused Google team responsible for RISC-V Custom Function Units and also used in the EU-funded VEDLIoT project.

Future plans

This is just the beginning of a wider activity from the Google Research team Antmicro is working with to release software and hardware components as well as tools supporting a collaborative co-design ecosystem for secure ML development. If you think Renode, RISC-V and co-development could help in building your next ML-focused product, go ahead and try the AmbiML flow yourself!

Visit the iree-rv32-springbok repository on GitHub, clone it locally and follow the instructions from README.md.

You can also grab Renode from the official repository and start playing with the available demos, or head to the Renode documentation to read up on features helpful for ML acceleration development such as Verilator co-simulation.

By Peter Zierhoffer – Antmicro

Source: Google Open Source Blog

Google and NIST partner on nanotechnology development platform

We’re proud to announce Google’s cooperative research and development agreement with the U.S. National Institute of Standards and Technology (NIST) to develop an open source testbed for nanotechnology research and development for American universities. NIST—a bureau of the U.S. Department of Commerce—will start by migrating their existing planarized wafer designs to an open source framework, which can be manufactured in the U.S. on SkyWater Technologies’ open source 130nm process (SKY130). The physical wafers and source code will be available in the coming months. Together, NIST, Google, and the open source community will develop designs to facilitate research into both basic and applied science, including technology transfer into production with U.S. manufacturers.

Furthering Google’s goals to improve access to semiconductor technology, this agreement will provide academic researchers with unprecedented resources from a semiconductor foundry to enhance research into the physics of semiconductors and nanodevices. This includes their chemistry, defects, electrical properties, high frequency operation, and switching behavior, while reducing overall costs through economies of scale. Most importantly, this access enhances the technology transfer process by enabling researchers to develop new and emerging technologies using foundry resources, that can then be seamlessly transitioned into mass production since universities will already be using an industrially relevant platform. This will greatly improve scientist’s ability to move their technologies through the tech-transfer “valley of death” and into practical use.

Nanotechnology research has benefitted from silicon wafers that are normally used for chip manufacturing in a unique way. Instead of turning them into packaged microchips, their smooth, planarized surface makes a great substrate for building and testing nanoscale structures. This likewise helps test their transition into mass production.

Picture of a full wafer using the SKY130 open source PDK.

The wafer for this platform has a number of different metrology structures, from parametric test structures based on simple transistor arrays—which can be probed in a probe station—to thousands of complex measurements that users can operate using synthesized digital circuits. Critically, the wafers will be available to universities in a 200 mm form factor, and mid-production planarized wafers with less than a single nanometer of surface roughness. Smooth, flat surfaces are critical for advanced manufacturing at small sizes.

NIST researchers are also ensuring that the wafers have photolithographic and electron beam alignment marks commonly found in university nanofabrication facilities, allowing the foundry silicon to be used directly by university researchers with ease. Metal pads on the surface will allow scientists to access the semiconductor transistors from the surface.

NIST scientists anticipate the nanotechnology accelerator platform will enhance scientific investigations into a diverse set of technologies, including memory devices (resistive switches, magnetic tunnel junctions, flash memories), artificial intelligence, plasmonics, semiconductor bioelectronics, thin film transistors and even quantum information science.

Picture of a development die from Google 's OpenMPW program for the nanotechnology accelerator developed by NIST and the University of Michigan

This program also benefits from Google’s previous contributions and support of the GDSFactory and OpenFASOC open source projects that help automate and shorten the construction of these important measuring devices from months to days. Ahead of the full wafer tapeout in 2023, NIST scientists, working with partners at the University of Michigan, Carnegie Mellon, University of Maryland, The George Washington University, and Brown University have been using Google's OpenMPW program to develop and test preliminary circuits which they expect to include in the nanotechnology accelerator. Preliminary testing will help ensure the program’s goals are met with working circuits that best serve the scientific community.

A key factor in cutting-edge research is reproducibility, or the ability for researchers from different institutions to repeat each other’s experiments and improve upon them. By migrating to an open source framework, researchers can more easily share reproducible results, contribute to the creation of open source datasets to enhance future simulation, and advance the scientific community’s state of the art of nanotechnology and semiconductor manufacturing.

NIST and Google will distribute the first production run of wafers to leading U.S. universities. Post-program, American scientists will be able to directly purchase the wafers from Skywater without license requirements, giving them the freedom to pursue their research without any restrictions. Since wafers are hundreds of times cheaper than full mask-sets or the cost of designing integrated circuits from scratch, scientists will have a much easier time getting and using this powerful industrial technology. Longer term, working with NIST to develop future platforms on the recently announced SKY90FD open source PDK will further expand this R&D ecosystem.

To kick off this research effort NIST is organizing the "NIST Integrated Circuits for Metrology Workshop" from September 20–21, 2022. This workshop will be held online with a series of presentations and panel discussions on the first day. During the second day, a working group of researchers, scientists and engineers will work to focus on the creation of parametric test structures for monolithic integration using open source silicon technology. Visit the event website to get more details about this program and register to attend or learn more about presenting.

By Ethan Mahintorabi and Johan Euphrosine, Software Engineers – Hardware Toolchains Team, and Aaron Cunningham, Technical Program Manager – Google Open Source Programs Office

Source: Google Open Source Blog

Accelerate your models to production with Google Cloud and PyTorch

We believe in the power of choice for Machine Learning development, and continue to invest resources to make it easy for ML practitioners to train, deploy, and orchestrate models from a single unified data and AI cloud platform. We’re excited to announce our role as a founding member of the newly formed PyTorch Foundation, which will better position Google Cloud to make meaningful contributions to the PyTorch community. As a member of the board, we will deepen our open source investment to deliver on the Foundation’s mission to drive adoption of AI tooling by building an ecosystem of open source projects with PyTorch. We strongly believe in choice and will continue to invest in frameworks such as JAX and Tensorflow and support integrations with other OSS Projects including Spark, Airflow, XGBoost, and others.

In this blog, we provide an overview of existing resources to help you get started with PyTorch on Google Cloud. We also talk about how ML practitioners can leverage our end-to-end ML platform to train, tune, and deploy PyTorch models.

PyTorch on Google Cloud

Open source in the cloud is important because it gives you flexibility and control over where you train and deploy your ML workloads. PyTorch is extensively used in the research space and in recent years it has gained immense traction in the industry due to its ease of use and deployment. In fact, according to a survey of Kaggle users, PyTorch is the fastest growing ML framework today.

ML practitioners using PyTorch tell us that it can be challenging to advance their ML project past experimentation. This is why Google Cloud has built integrations with PyTorch that make it easier to train, deploy, and orchestrate models in production. Some examples are:

PyTorch integrates directly with Vertex AI, a fully managed ML platform that provides the tools you need to take a model from PyTorch to production, like the Pytorch DL containers or the Vertex AI workbench PyTorch one-click JupyterLab environment.
PyTorch/XLA, an open source library, uses the XLA deep learning compiler to enable PyTorch to run on Cloud TPUs. Cloud TPUs are custom accelerators designed by Google, optimized for perf/TCO with large scale ML workload PyTorch/XLA also enables XLA driven optimizations on GPUs.
TorchX provides an adapter to run and orchestrate TorchX components as part of Kubeflow Pipelines that you can easily scale on Vertex AI Pipelines.
With our OSS contributions to Apache Beam, we have made PyTorch models easy to deploy in batch or stream, data processing pipelines. Running on Google Dataflow, these pipelines will scale to very large workloads in a fully managed and simple to maintain environment.

To learn more and start using PyTorch on Google Cloud, check out the resources below:

PyTorch on Vertex AI Resources

How To train and tune PyTorch models on Vertex AI: Learn how to use Vertex AI Training to build and train a sentiment text classification model using PyTorch and Vertex AI Hyperparameter Tuning to tune hyperparameters of PyTorch models.
How to deploy PyTorch models on Vertex AI: Walk through the deployment of a Pytorch model using TorchServe as a custom container, by deploying the model artifacts to a Vertex Prediction service.
Orchestrating PyTorch ML Workflows on Vertex AI Pipelines: See how to build and orchestrate ML pipelines for training and deploying PyTorch models on Google Cloud Vertex AI using Vertex AI Pipelines.
Scalable ML Workflows using PyTorch on Kubeflow Pipelines and Vertex Pipelines: Take a look at examples of PyTorch-based ML workflows on two pipelines frameworks: OSS Kubeflow Pipelines, part of the Kubeflow project, and Vertex AI Pipelines. We share new PyTorch built-in components added to the Kubeflow Pipelines.

PyTorch/XLA and Cloud TPU/GPU

Scaling deep learning workloads with PyTorch / XLA and Cloud TPU VM: Describes the challenges associated with scaling deep learning jobs to distributed training settings, using the Cloud TPU VM and shows how to stream training data from Google Cloud Storage (GCS) to PyTorch / XLA models running on Cloud TPU Pod slices.
PyTorch/XLA: Performance debugging on Cloud TPU VM: Part I: In the first part of the performance debugging series on Cloud TPU, we lay out the conceptual framework for PyTorch/XLA in the context of training performance. We introduced a case study to make sense of preliminary profiler logs and identify the corrective actions.
PyTorch/XLA: Performance debugging on Cloud TPU VM: Part II: In the second part, we deep dive into further analysis of the performance debugging to discover more performance improvement opportunities.
PyTorch/XLA: Performance debugging on Cloud TPU VM: Part III: In the final part of the performance debugging series, we introduce user defined code annotation and visualize these annotations in the form of a trace.
Train ML models with Pytorch Lightning on TPUs: Learn how easy it is to start training models with PyTorch Lightning on TPUs with its built-in TPU support.

Other resources

Increase your productivity using PyTorch Lightning: Learn how to use PyTorch Lightning on Vertex AI Workbench (was previously Notebooks).

By Erwin Huizing and Grace Reed – Cloud AI and ML

Source: Google Open Source Blog

TestParameterInjector gets JUnit5 support

In March 2021, we announced the open source release of TestParameterInjector: A parameterized test runner for JUnit4 (see GitHub page).

Over a year later, the Google-internal usage of TestParameterInjector has continued to rapidly grow, and is now by far the most popular parameterized test framework.

Graph of the different parameterized test frameworks in Google

Guava's philosophy frames it nicely: "When trying to estimate the ubiquity of a feature, we frequently use the Google internal code base as a reference." We also believe that TestParameterInjector usage in Google is a decent proxy for its utility elsewhere.

As you can see on the graph above, not only did TestParameterInjector reduce the usage of the other frameworks, but it also caused a drastic increase of the total amount of parameterized tests. This suggests that TestParameterInjector reduced the threshold for parameterizing a regular unit test and Googlers are more actively using this tool to improve the quality of their tests.

JUnit5 (Jupiter) support

At Google, we use JUnit4 exclusively, but some developers outside of Google have moved on to JUnit5 (Jupiter). For those users, we have now expanded the scope of TestParameterInjector.

We've kept the API the same as much as possible:

// **************** JUnit4 **************** //

@RunWith(TestParameterInjector.class)

public class MyTest {

@TestParameter boolean isDryRun;

@Test public void test1(@TestParameter boolean enableFlag) { ... }

@Test public void test2(@TestParameter MyEnum myEnum) { ... }

enum MyEnum { VALUE_A, VALUE_B, VALUE_C }

}

// **************** JUnit5 (Jupiter) **************** //

class MyTest {

@TestParameter boolean isDryRun;

@TestParameterInjectorTest

void test1(@TestParameter boolean enableFlag) {

// This method is run 4 times for all combinations of isDryRun and enableFlag

}

@TestParameterInjectorTest

void test2(@TestParameter MyEnum myEnum) {

// This method is run 6 times for all combinations of isDryRun and myEnum

}

enum MyEnum { VALUE_A, VALUE_B, VALUE_C }

}

The only differences are that @RunWith / @ExtendWith are not necessary and that every test method needs a @TestParameterInjectorTest annotation.

The other features of TestParameterInjector work in a similar way with Jupiter:

class MyTest {

// **************** Defining sets of parameters **************** //

@TestParameterInjectorTest

@TestParameters(customName = "teenager", value = "{age: 17, expectIsAdult: false}")

@TestParameters(customName = "young adult", value = "{age: 22, expectIsAdult: true}")

void personIsAdult_success(int age, boolean expectIsAdult) {

assertThat(personIsAdult(age)).isEqualTo(expectIsAdult);

}

// **************** Dynamic parameter generation **************** //

@TestParameterInjectorTest

void matchesAllOf_throwsOnNull(

@TestParameter(valuesProvider = CharMatcherProvider.class) CharMatcher charMatcher) {

assertThrows(NullPointerException.class, () -> charMatcher.matchesAllOf(null));

}

private static final class CharMatcherProvider implements TestParameterValuesProvider {

@Override

public List<CharMatcher> provideValues() {

return ImmutableList.of(

CharMatcher.any(), CharMatcher.ascii(), CharMatcher.whitespace());

}

Other things we've been working on

Custom names for @TestParameters
When running a the following parameterized test:

@Test

@TestParameters("{age: 17, expectIsAdult: false}")

@TestParameters("{age: 22, expectIsAdult: true}")

public void withRepeatedAnnotation(int age, boolean expectIsAdult){ ... }

the generated test names will be:

MyTest#withRepeatedAnnotation[{age: 17, expectIsAdult: false}]

MyTest#withRepeatedAnnotation[{age: 22, expectIsAdult: true}]

This is fine for small parameter sets, but when the number of @TestParameters or parameters within the YAML string gets large, it quickly becomes hard to figure out what each parameter set is supposed to represent.

For those cases, we added the option to add customName:

@Test

@TestParameters(customName = "teenager", value = "{age: 17, expectIsAdult: false}")

@TestParameters(customName = "young adult", value = "{age: 22, expectIsAdult: true}")

public void personIsAdult(int age, boolean expectIsAdult){...}

To allow this API change, we had to allow @TestParameters to be used in a different way: The original way of specifying @TestParameters sets was to specify them as a list of YAML strings inside a single @TestParameters annotation. We considered multiple options of specifying the custom name inside of these YAML strings, such as a magic _name key and an extra YAML mapping layer where the keys would be the test names. But we eventually settled on the aforementioned API, which makes @TestParameters a repeated annotation because it results in the least complex code and clearly separates the different parameter sets.

It should be noted that the original API (list of YAML strings in single annotation) still works, but it is now discouraged in favor of multiple @TestParameters annotations with a single YAML string, even when customName isn't used. The main arguments for this recommendation are:

Consistency with the customName case, which needs a single YAML string per @TestParameters annotation
We believe it structures the list of parameters (especially when it's long) in a more structured way

Integration with RobolectricTestRunner

Recently, we've managed to internally make a version of RobolectricTestRunner that supports TestParameterInjector annotations. There is a significant amount of work left to open source this, and we are now considering when and how to do this.

Learn more

Our GitHub README provides an overview of the framework. Let us know on GitHub if you have any questions, comments, or feature requests!

By Jens Nyman – TestParameterInjector