Tag Archives: Open source

Rax: Composable Learning-to-Rank Using JAX

Ranking is a core problem across a variety of domains, such as search engines, recommendation systems, or question answering. As such, researchers often utilize learning-to-rank (LTR), a set of supervised machine learning techniques that optimize for the utility of an entire list of items (rather than a single item at a time). A noticeable recent focus is on combining LTR with deep learning. Existing libraries, most notably TF-Ranking, offer researchers and practitioners the necessary tools to use LTR in their work. However, none of the existing LTR libraries work natively with JAX, a new machine learning framework that provides an extensible system of function transformations that compose: automatic differentiation, JIT-compilation to GPU/TPU devices and more.

Today, we are excited to introduce Rax, a library for LTR in the JAX ecosystem. Rax brings decades of LTR research to the JAX ecosystem, making it possible to apply JAX to a variety of ranking problems and combine ranking techniques with recent advances in deep learning built upon JAX (e.g., T5X). Rax provides state-of-the-art ranking losses, a number of standard ranking metrics, and a set of function transformations to enable ranking metric optimization. All this functionality is provided with a well-documented and easy to use API that will look and feel familiar to JAX users. Please check out our paper for more technical details.

Learning-to-Rank Using Rax
Rax is designed to solve LTR problems. To this end, Rax provides loss and metric functions that operate on batches of lists, not batches of individual data points as is common in other machine learning problems. An example of such a list is the multiple potential results from a search engine query. The figure below illustrates how tools from Rax can be used to train neural networks on ranking tasks. In this example, the green items (B, F) are very relevant, the yellow items (C, E) are somewhat relevant and the red items (A, D) are not relevant. A neural network is used to predict a relevancy score for each item, then these items are sorted by these scores to produce a ranking. A Rax ranking loss incorporates the entire list of scores to optimize the neural network, improving the overall ranking of the items. After several iterations of stochastic gradient descent, the neural network learns to score the items such that the resulting ranking is optimal: relevant items are placed at the top of the list and non-relevant items at the bottom.

Using Rax to optimize a neural network for a ranking task. The green items (B, F) are very relevant, the yellow items (C, E) are somewhat relevant and the red items (A, D) are not relevant.

Approximate Metric Optimization
The quality of a ranking is commonly evaluated using ranking metrics, e.g., the normalized discounted cumulative gain (NDCG). An important objective of LTR is to optimize a neural network so that it scores highly on ranking metrics. However, ranking metrics like NDCG can present challenges because they are often discontinuous and flat, so stochastic gradient descent cannot directly be applied to these metrics. Rax provides state-of-the-art approximation techniques that make it possible to produce differentiable surrogates to ranking metrics that permit optimization via gradient descent. The figure below illustrates the use of rax.approx_t12n, a function transformation unique to Rax, which allows for the NDCG metric to be transformed into an approximate and differentiable form.

Using an approximation technique from Rax to transform the NDCG ranking metric into a differentiable and optimizable ranking loss (approx_t12n and gumbel_t12n).

First, notice how the NDCG metric (in green) is flat and discontinuous, making it hard to optimize using stochastic gradient descent. By applying the rax.approx_t12n transformation to the metric, we obtain ApproxNDCG, an approximate metric that is now differentiable with well-defined gradients (in red). However, it potentially has many local optima — points where the loss is locally optimal, but not globally optimal — in which the training process can get stuck. When the loss encounters such a local optimum, training procedures like stochastic gradient descent will have difficulty improving the neural network further.

To overcome this, we can obtain the gumbel-version of ApproxNDCG by using the rax.gumbel_t12n transformation. This gumbel version introduces noise in the ranking scores which causes the loss to sample many different rankings that may incur a non-zero cost (in blue). This stochastic treatment may help the loss escape local optima and often is a better choice when training a neural network on a ranking metric. Rax, by design, allows the approximate and gumbel transformations to be freely used with all metrics that are offered by the library, including metrics with a top-k cutoff value, like recall or precision. In fact, it is even possible to implement your own metrics and transform them to obtain gumbel-approximate versions that permit optimization without any extra effort.

Ranking in the JAX Ecosystem
Rax is designed to integrate well in the JAX ecosystem and we prioritize interoperability with other JAX-based libraries. For example, a common workflow for researchers that use JAX is to use TensorFlow Datasets to load a dataset, Flax to build a neural network, and Optax to optimize the parameters of the network. Each of these libraries composes well with the others and the composition of these tools is what makes working with JAX both flexible and powerful. For researchers and practitioners of ranking systems, the JAX ecosystem was previously missing LTR functionality, and Rax fills this gap by providing a collection of ranking losses and metrics. We have carefully constructed Rax to function natively with standard JAX transformations such as jax.jit and jax.grad and various libraries like Flax and Optax. This means that users can freely use their favorite JAX and Rax tools together.

Ranking with T5
While giant language models such as T5 have shown great performance on natural language tasks, how to leverage ranking losses to improve their performance on ranking tasks, such as search or question answering, is under-explored. With Rax, it is possible to fully tap this potential. Rax is written as a JAX-first library, thus it is easy to integrate it with other JAX libraries. Since T5X is an implementation of T5 in the JAX ecosystem, Rax can work with it seamlessly.

To this end, we have an example that demonstrates how Rax can be used in T5X. By incorporating ranking losses and metrics, it is now possible to fine-tune T5 for ranking problems, and our results indicate that enhancing T5 with ranking losses can offer significant performance improvements. For example, on the MS-MARCO QNA v2.1 benchmark we are able to achieve a +1.2% NDCG and +1.7% MRR by fine-tuning a T5-Base model using the Rax listwise softmax cross-entropy loss instead of a pointwise sigmoid cross-entropy loss.

Fine-tuning a T5-Base model on MS-MARCO QNA v2.1 with a ranking loss (softmax, in blue) versus a non-ranking loss (pointwise sigmoid, in red).

Conclusion
Overall, Rax is a new addition to the growing ecosystem of JAX libraries. Rax is entirely open source and available to everyone at github.com/google/rax. More technical details can also be found in our paper. We encourage everyone to explore the examples included in the github repository: (1) optimizing a neural network with Flax and Optax, (2) comparing different approximate metric optimization techniques, and (3) how to integrate Rax with T5X.

Acknowledgements
Many collaborators within Google made this project possible: Xuanhui Wang, Zhen Qin, Le Yan, Rama Kumar Pasumarthi, Michael Bendersky, Marc Najork, Fernando Diaz, Ryan Doherty, Afroz Mohiuddin, and Samer Hassan.

Source: Google AI Blog


Rax: Composable Learning-to-Rank Using JAX

Ranking is a core problem across a variety of domains, such as search engines, recommendation systems, or question answering. As such, researchers often utilize learning-to-rank (LTR), a set of supervised machine learning techniques that optimize for the utility of an entire list of items (rather than a single item at a time). A noticeable recent focus is on combining LTR with deep learning. Existing libraries, most notably TF-Ranking, offer researchers and practitioners the necessary tools to use LTR in their work. However, none of the existing LTR libraries work natively with JAX, a new machine learning framework that provides an extensible system of function transformations that compose: automatic differentiation, JIT-compilation to GPU/TPU devices and more.

Today, we are excited to introduce Rax, a library for LTR in the JAX ecosystem. Rax brings decades of LTR research to the JAX ecosystem, making it possible to apply JAX to a variety of ranking problems and combine ranking techniques with recent advances in deep learning built upon JAX (e.g., T5X). Rax provides state-of-the-art ranking losses, a number of standard ranking metrics, and a set of function transformations to enable ranking metric optimization. All this functionality is provided with a well-documented and easy to use API that will look and feel familiar to JAX users. Please check out our paper for more technical details.

Learning-to-Rank Using Rax
Rax is designed to solve LTR problems. To this end, Rax provides loss and metric functions that operate on batches of lists, not batches of individual data points as is common in other machine learning problems. An example of such a list is the multiple potential results from a search engine query. The figure below illustrates how tools from Rax can be used to train neural networks on ranking tasks. In this example, the green items (B, F) are very relevant, the yellow items (C, E) are somewhat relevant and the red items (A, D) are not relevant. A neural network is used to predict a relevancy score for each item, then these items are sorted by these scores to produce a ranking. A Rax ranking loss incorporates the entire list of scores to optimize the neural network, improving the overall ranking of the items. After several iterations of stochastic gradient descent, the neural network learns to score the items such that the resulting ranking is optimal: relevant items are placed at the top of the list and non-relevant items at the bottom.

Using Rax to optimize a neural network for a ranking task. The green items (B, F) are very relevant, the yellow items (C, E) are somewhat relevant and the red items (A, D) are not relevant.

Approximate Metric Optimization
The quality of a ranking is commonly evaluated using ranking metrics, e.g., the normalized discounted cumulative gain (NDCG). An important objective of LTR is to optimize a neural network so that it scores highly on ranking metrics. However, ranking metrics like NDCG can present challenges because they are often discontinuous and flat, so stochastic gradient descent cannot directly be applied to these metrics. Rax provides state-of-the-art approximation techniques that make it possible to produce differentiable surrogates to ranking metrics that permit optimization via gradient descent. The figure below illustrates the use of rax.approx_t12n, a function transformation unique to Rax, which allows for the NDCG metric to be transformed into an approximate and differentiable form.

Using an approximation technique from Rax to transform the NDCG ranking metric into a differentiable and optimizable ranking loss (approx_t12n and gumbel_t12n).

First, notice how the NDCG metric (in green) is flat and discontinuous, making it hard to optimize using stochastic gradient descent. By applying the rax.approx_t12n transformation to the metric, we obtain ApproxNDCG, an approximate metric that is now differentiable with well-defined gradients (in red). However, it potentially has many local optima — points where the loss is locally optimal, but not globally optimal — in which the training process can get stuck. When the loss encounters such a local optimum, training procedures like stochastic gradient descent will have difficulty improving the neural network further.

To overcome this, we can obtain the gumbel-version of ApproxNDCG by using the rax.gumbel_t12n transformation. This gumbel version introduces noise in the ranking scores which causes the loss to sample many different rankings that may incur a non-zero cost (in blue). This stochastic treatment may help the loss escape local optima and often is a better choice when training a neural network on a ranking metric. Rax, by design, allows the approximate and gumbel transformations to be freely used with all metrics that are offered by the library, including metrics with a top-k cutoff value, like recall or precision. In fact, it is even possible to implement your own metrics and transform them to obtain gumbel-approximate versions that permit optimization without any extra effort.

Ranking in the JAX Ecosystem
Rax is designed to integrate well in the JAX ecosystem and we prioritize interoperability with other JAX-based libraries. For example, a common workflow for researchers that use JAX is to use TensorFlow Datasets to load a dataset, Flax to build a neural network, and Optax to optimize the parameters of the network. Each of these libraries composes well with the others and the composition of these tools is what makes working with JAX both flexible and powerful. For researchers and practitioners of ranking systems, the JAX ecosystem was previously missing LTR functionality, and Rax fills this gap by providing a collection of ranking losses and metrics. We have carefully constructed Rax to function natively with standard JAX transformations such as jax.jit and jax.grad and various libraries like Flax and Optax. This means that users can freely use their favorite JAX and Rax tools together.

Ranking with T5
While giant language models such as T5 have shown great performance on natural language tasks, how to leverage ranking losses to improve their performance on ranking tasks, such as search or question answering, is under-explored. With Rax, it is possible to fully tap this potential. Rax is written as a JAX-first library, thus it is easy to integrate it with other JAX libraries. Since T5X is an implementation of T5 in the JAX ecosystem, Rax can work with it seamlessly.

To this end, we have an example that demonstrates how Rax can be used in T5X. By incorporating ranking losses and metrics, it is now possible to fine-tune T5 for ranking problems, and our results indicate that enhancing T5 with ranking losses can offer significant performance improvements. For example, on the MS-MARCO QNA v2.1 benchmark we are able to achieve a +1.2% NDCG and +1.7% MRR by fine-tuning a T5-Base model using the Rax listwise softmax cross-entropy loss instead of a pointwise sigmoid cross-entropy loss.

Fine-tuning a T5-Base model on MS-MARCO QNA v2.1 with a ranking loss (softmax, in blue) versus a non-ranking loss (pointwise sigmoid, in red).

Conclusion
Overall, Rax is a new addition to the growing ecosystem of JAX libraries. Rax is entirely open source and available to everyone at github.com/google/rax. More technical details can also be found in our paper. We encourage everyone to explore the examples included in the github repository: (1) optimizing a neural network with Flax and Optax, (2) comparing different approximate metric optimization techniques, and (3) how to integrate Rax with T5X.

Acknowledgements
Many collaborators within Google made this project possible: Xuanhui Wang, Zhen Qin, Le Yan, Rama Kumar Pasumarthi, Michael Bendersky, Marc Najork, Fernando Diaz, Ryan Doherty, Afroz Mohiuddin, and Samer Hassan.

Source: Google AI Blog


El Carro for Oracle: Data migration and improved backups

In May 2021, we released El Carro to make it easier to run Oracle databases on Kubernetes. Our following blog dove deeper into El Carro’s features, announced support for Oracle 19c, and detailed more flexibility for building database images. Today we’re excited to open source two new features to enhance El Carro and make it easier to manage your Oracle deployments: Data Migration and Point-in-Time Recovery. Automated Data Migration makes it much easier to re-platform to El Carro and Point-in-Time Recovery is a standard feature that database professionals come to expect because it enables you to drastically reduce RPO and worry less about backup frequency.

Data Migration

The Data Migration feature of El Carro enables users to migrate data from their existing database to an El Carro database running on Kubernetes. This functionality allows users to re-platform to El Carro with minimal disruption. The two most common pathways shown in the image below are to 1) modernize in place by simply migrating your database to Kubernetes so you can leverage the automation of El Carro and to 2) migrate to Kubernetes in the cloud.
Migrate your database in place or to the cloud
Migrate your database in place or to the cloud

Typical migration sources include AWS (RDS, EKS, EC2), Azure (AKS, VMs), GCP (GKE, GCE, BMS) or on-premises deployments. Typical migration targets include any Kubernetes installation on GCP (GKE), AWS (EKS), Azure (AKS), or on-premises.

The Data Migration feature offers two automated migration flows and two manual ones.

Category

Options

Migration Downtime

Complexity

Automated

Data Guard with physical standby

minimal

lowest

Data pump

long

low

Manual

RMAN-based migration

long

medium

Data pump

long

medium

  • Migration Downtime: required downtime to migrate source database into El Carro without data loss. Minimal means less downtime, long means more downtime.
  • Complexity: summarizes the difficulty and complexity of the migration journey.

Point-in-Time Recovery

Since its release, El Carro has provided users the ability to take backups and restore via RMAN or storage-based snapshots. Today we’re excited to release a new Point-in-Time Recovery feature to enhance El Carro’s backup functionality by automatically backing up archive redo logs to a GCS (Google Cloud Storage) bucket and allowing users to seamlessly restore their databases to any point in time within a user configurable window. This optional feature provides an additional layer of protection and enhanced restore granularity without interfering with manual backups or affecting database performance.

The diagram below contrasts the new versus old functionality. Previously, there were discrete restore points (shown in green on the top arrow) which represented limited opportunities to restore. With Point-in-Time Recovery, the entire arrow is green, meaning the recovery functionality is continuous, with restore points at any time along the green arrow.

With Point-in-Time Recovery you can restore to any point after the first backup
With Point-in-Time Recovery you can restore to any point after the first backup

Conclusion

As always, you can try the open source El Carro operator for free (Apache 2.0 license) wherever you run Oracle databases. Follow the quick start guide and try out provisioning of instances, databases, users. Import data via Data Pump, manage instance parameters, choose between different methods for backups, and try out a restore. Have a look at how we integrate with external logging and monitoring solutions. Reach out via our Google group and leave feedback for what features you would like to see next, or even create your own patch, issue or pull request on GitHub.

By Kyle Meggs, Product Manager and Ash Gbadamassi, Software Engineer – Cloud Databases

Efficient Sequence Modeling for On-Device ML

The increasing demand for machine learning (ML) model inference on-device (for mobile devices, tablets, etc.) is driven by the rise of compute-intensive applications, the need to keep certain data on device for privacy and security reasons, and the desire to provide services when a network connection may not be available. However, on-device inference introduces a myriad of challenges, ranging from modeling to platform support requirements. These challenges relate to how different architectures are designed to optimize memory and computation, while still trying to maintain the quality of the model. From a platform perspective, the issue is identifying operations and building on top of them in a way that can generalize well across different product use cases.

In previous research, we combined a novel technique for generating embeddings (called projection-based embeddings) with efficient architectures like QRNN (pQRNN) and proved them to be competent for a number of classification problems. Augmenting these with distillation techniques provides an additional bump in end-to-end quality. Although this is an effective approach, it is not scalable to bigger and more extensive vocabularies (i.e., all possible Unicode or word tokens that can be fed to the model). Additionally, the output from the projection operation itself doesn’t contain trainable weights to take advantage of pre-training the model.

Token-free models presented in ByT5 are a good starting point for on-device modeling that can address pre-training and scalability issues without the need to increase the size of the model. This is possible because these approaches treat text inputs as a stream of bytes (each byte has a value that ranges from 0 to 255) that can reduce the vocabulary size for the embedding tables from ~30,000 to 256. Although ByT5 presents a compelling alternative for on-device modeling, going from word-level representation to byte stream representation increases the sequence lengths linearly; with an average word length of four characters and a single character having up to four bytes, the byte sequence length increases proportionally to the word length. This can lead to a significant increase in inference latency and computational costs.

We address this problem by developing and releasing three novel byte-stream sequence models for the SeqFlowLite library (ByteQRNN, ByteTransformer and ByteFunnelTransformer), all of which can be pre-trained on unsupervised data and can be fine-tuned for specific tasks. These models leverage recent innovations introduced by Charformer, including a fast character Transformer-based model that uses a gradient-based subword tokenization (GBST) approach to operate directly at the byte level, as well as a “soft” tokenization approach, which allows us to learn token boundaries and reduce sequence lengths. In this post, we focus on ByteQRNN and demonstrate that the performance of a pre-trained ByteQRNN model is comparable to BERT, despite being 300x smaller.

Sequence Model Architecture
We leverage pQRNN, ByT5 and Charformer along with platform optimizations, such as in-training quantization (which tracks minimum and maximum float values for model activations and weights for quantizing the inference model) that reduces model sizes by one-fourth, to develop an end-to-end model called ByteQRNN (shown below). First, we use a ByteSplitter operation to split the input string into a byte stream and feed it to a smaller embedding table that has a vocabulary size of 259 (256 + 3 additional meta tokens).

The output from the embedding layer is fed to the GBST layer, which is equipped with in-training quantization and combines byte-level representations with the efficiency of subword tokenization while enabling end-to-end learning of latent subwords. We “soft” tokenize the byte stream sequences by enumerating and combining each subword block length with scores (computed with a quantized dense layer) at each strided token position (i.e., at token positions that are selected at regular intervals). Next, we downsample the byte stream to manageable sequence length and feed it to the encoder layer.

The output from the GBST layer can be downsampled to a lower sequence length for efficient encoder computation or can be used by an encoder, like Funnel Transformer, which pools the query length and reduces the self-attention computation to create the ByteFunnelTransformer model. The encoder in the end-to-end model can be replaced with any other encoder layer, such as the Transformer from the SeqFlowLite library, to create a ByteTransformer model.

A diagram of a generic end-to-end sequence model using byte stream input. The ByteQRNN model uses a QRNN encoder from the SeqFlowLite library.

In addition to the input embeddings (i.e., the output from the embedding layer described above), we go a step further to build an effective sequence-to-sequence (seq2seq) model. We do so by taking ByteQRNN and adding a Transformer-based decoder model along with a quantized beam search (or tree exploration) to go with it. The quantized beam search module reduces the inference latency when generating decoder outputs by computing the most likely beams (i.e., possible output sequences) using the logarithmic sum of previous and current probabilities and returns the resulting top beams. Here the system uses a more efficient 8-bit integer (uint8) format, compared to a typical single-precision floating-point format (float32) model.

The decoder Transformer model uses a merged attention sublayer (MAtt) to reduce the complexity of the decoder self-attention from quadratic to linear, thereby lowering the end-to-end latency. For each decoding step, MAtt uses a fixed-size cache for decoder self-attention compared to the increasing cache size of a traditional transformer decoder. The following figure illustrates how the beam search module interacts with the decoder layer to generate output tokens on-device using an edge device (e.g., mobile phones, tablets, etc.).

A comparison of cloud server decoding and on-device (edge device) implementation. Left: Cloud server beam search employs a Transformer-based decoder model with quadratic time self-attention in float32, which has an increasing cache size for each decoding step. Right: The edge device implementation employs a quantized beam search module along with a fixed-size cache and a linear time self-attention computation.

Evaluation
After developing ByteQRNN, we evaluate its performance on the civil_comments dataset using the area under the curve (AUC) metric and compare it to a pre-trained ByteQRNN and BERT (shown below). We demonstrate that the fine-tuned ByteQRNN improves the overall quality and brings its performance closer to the BERT models, despite being 300x smaller. Since SeqFlowLite models support in-training quantization that reduces model sizes by one-fourth, the resulting models scale well to low-compute devices. We chose multilingual data sources that related to the task for pre-training both BERT and byte stream models to achieve the best possible performance.

Comparison of ByteQRNN with fine-tuned ByteQRNN and BERT on the civil_comments dataset.

Conclusion
Following up on our previous work with pQRNN, we evaluate byte stream models for on-device use to enable pre-training and thereby improve model performance for on-device deployment. We present an evaluation for ByteQRNN with and without pre-training and demonstrate that the performance of the pre-trained ByteQRNN is comparable to BERT, despite being 300x smaller. In addition to ByteQRNN, we are also releasing ByteTransformer and ByteFunnelTransformer, two models which use different encoders, along with the merged attention decoder model and the beam search driver to run the inference through the SeqFlowLite library. We hope these models will provide researchers and product developers with valuable resources for future on-device deployments.

Acknowledgements
We would like to thank Khoa Trinh, Jeongwoo Ko, Peter Young and Yicheng Fan for helping with open-sourcing and evaluating the model. Thanks to Prabhu Kaliamoorthi for all the brainstorming and ideation. Thanks to Vinh Tran, Jai Gupta and Yi Tay for their help with pre-training byte stream models. Thanks to Ruoxin Sang, Haoyu Zhang, Ce Zheng, Chuanhao Zhuge and Jieying Luo for helping with the TPU training. Many thanks to Erik Vee, Ravi Kumar and the Learn2Compress leadership for sponsoring the project and their support and encouragement. Finally, we would like to thank Tom Small for the animated figure used in this post.

Source: Google AI Blog


GlobalFoundries joins Google’s open source silicon initiative

Over the last year we have been busy planning the expansion of our free open source silicon design and manufacturing program to further grow the community of developers and companies building custom silicon, and build a thriving ecosystem around open source hardware.

Today, we’re excited to announce an expansion of this program and our partnership with GlobalFoundries. Together, we're releasing the Process Design Kit (PDK) for the GlobalFoundries 180MCU technology platform under the Apache 2.0 license, along with a no-cost silicon realization program to manufacture open source designs on the Efabless platform. This open source PDK is the first result of our ongoing partnership with GF. Based on the scale and breadth of GF’s technology and manufacturing expertise, we expect to do more together to further access and innovation in semiconductor development and manufacturing.
GF180MCU 1P5M 5 metals stack-up, 9kA top metal, with MIM between M3 and M4 layers.
Google started this program with SkyWater Technologies, by releasing one of their PDKs under the Apache 2.0 license. We sponsored six shuttle runs over the course of two years, allowing the open source community to submit more than 350 unique designs of which around 240 were manufactured at no-cost.
We cannot understate the milestone that this new partnership represents in the foundry ecosystem market.

Over the past few years, the world has experienced an unprecedented acceleration of adoption of digital capabilities—driven by the pandemic, and technology megatrends that have shifted every aspect of human life. According to GlobalFoundries, this has led to roughly 73% of foundry revenue being associated with high growth markets such as mobile, IoT, and automotive. This transition has not only given rise to a “New Golden Age” of semiconductors but also a tectonic shift in how we define and deliver innovation as an industry.  

Specifically, applications using 180nm are at a global capacity of 16+ million wafers a year and bound to grow to 22+ million wafers in 2026, according to GlobalFoundries.

The 180nm application space continues to see strong market traction in motor controller, RFID, general purpose MCUs and PMIC, along with emerging applications such as IoT Sensors, Dual Frequency RFID and Motor Drive.

The collaboration between GlobalFoundries and Google will help drive innovation for the application and silicon engineers designing in these high growth areas, and is an unambiguous affirmation of the viability of the open source model for the foundry ecosystem.

The GF 180nm technology platform offers open source silicon designers new capabilities for high volume production, affordability, and more voltage options. This PDK includes the following standard cells
  • Digital standard cells libraries (7-track and 9-track)
  • Low (3.3V), Medium (5V, 6V) and High (10V) voltage devices
  • SRAM macros (64x8, 128x8, 256x8, 512x8)
  • I/O and primitives (Resistors, Capacitors, Transistors, eFuses) cells libraries
Open sourcing more PDKs is a critical step in the development of the open source silicon ecosystem:
  • Open source EDA tools can now add support for multiple process technologies.
  • Researchers can produce fully-reproducible designs against multiple technology baselines.
  • Popular open source IP blocks can be ported to different process technologies.
We cannot build this on our own, we need you: software developers and hardware engineers, researchers and undergrad students, hobbyists and industry veterans, new startups and industry players alike, to bring your fresh ideas and your proven experiences to help us grow the open source silicon ecosystem.

We encourage you to:
By Johan Euphrosine and Ethan Mahintorabi – Hardware Toolchains Team

SkyWater and Google expand open source program to new 90nm technology

Today, Google is announcing the expansion of our partnership with SkyWater Technology. We are working together to release an open source process design kit (PDK) for SKY90-FD, SkyWater’s commercial 90nm fully depleted silicon on insulator (FDSOI) CMOS process technology. SKY90-FD is based on MIT Lincoln Laboratory’s 90 nm commercial FDSOI technology, and enables designers to create complex integrated circuits for a diverse range of applications.

Over the last two years, Google and SkyWater Technology have partnered to make building open silicon accessible to all developers, starting with the open source release of the SKY130 PDK and continuing with a series of no-cost manufacturing shuttles for developers in the open source hardware ecosystem. To date, Google has sponsored six shuttles on the Efabless platform, manufacturing 240 designs from over 364 community submissions. This is the first partnership of its type ever launched, and the results to date have been impressive.
The latest MPW-6 shuttle received 90 submissions from a diverse community across 24 different countries:

Over the coming months, we'll work closely with SkyWater Technology to release their new SKY90-FD PDK under the Apache 2.0 license and organize additional Open MPW shuttles to manufacture open source designs for this new 90nm FDSOI technology, through the Efabless platform.

We believe that having access to different technologies through open source PDKs is critical to grow and strengthen the open silicon ecosystem:
  • Developers can go beyond the constraints of their familiar process nodes and explore different performance, power and area trade offs with existing or new designs.
  • Researchers can reproduce their research on different technologies to produce diverse figures of merit.
  • Tool maintainers can generalize their technologies' backends to support more than one process.
  • The community can refine the ways we structure, distribute and maintain these PDKs.
SKY90-FD is a 90nm FDSOI process. Unlike a traditional CMOS BULK process, SKY90-FD features a thin layer of insulator material between the substrate and the upper silicon layer. This thin oxide process allows the transistor to be significantly thinner than in the BULK process, allowing the device to be “fully depleted,” and simplifying the fabrication process. This extra insulation greatly reduces parasitic current leakage and lowers junction capacitances, providing improved speed and power performance under various environmental conditions.
The SKY90-FD process stack topology features 5x thin Copper base metal layers for the main interconnect and two extra thicker Al (Aluminum) metal layers capable of conducting higher current.
Google is excited about the new range of applications this open source PDK will enable once it's released later this year, and we can't wait to hear from you and watch the growing stream of innovative project ideas originating from the open silicon community.

In the meantime, make sure to check https://developers.google.com/silicon for resources and pointers to start your open silicon journey!


By Johan Euphrosine and Ethan Mahintorabi – Hardware Toolchains Team

Cirq Turns 1.0



Today we are excited to announce the first full version release of the open source quantum programming framework Cirq: Cirq 1.0. Cirq is a Python framework for writing, running, and analyzing the results of quantum computer programs. It was designed for near-term quantum computers, those with a few hundred qubits and few thousands of quantum gates. The significance of the 1.0 release is that Cirq has support for the vast majority of workflows for these systems and is considered to be a stable API that we will only update with breaking changes at major version numbers.

Getting to Cirq 1.0 is the culmination of a large amount of hard work by hundreds of contributors from Google, industry, and academia. We have been running a weekly meeting, called the “Cirq Cync”, for over four years where community members gather to discuss work on Cirq, bugs, and to generally tell terrible but amusing quantum programming jokes. We’re proud of this inclusive community, and we’ve been particularly happy to see the growth of many software developers into quantum computing experts, and quantum computing experts into solid software developers. One of our contributors, Victory Omole, won the 2021 Witteck Quantum Prize for Open Source Software. Way to go Victory!

The first commit to Cirq on GitHub (an internal version of Cirq at Google existed prior to this) was on Dec 19, 2017 by Craig Gidney, and we publicly announced Cirq in July of 2018. 3,200+ commits later to the GitHub repo, in the hands of the team at Google and the Cirq community, we’ve seen Cirq help accomplish some amazing things:
  • Cirq is the lingua franca that Google’s hardware team uses to write quantum programs that run on Google’s quantum computing hardware. Because of this, we have been able to post open source code in our ReCirq repo for these experiments for anyone to examine and extend. A few highlights of the past few years:
    • “Realizing topologically ordered states on a quantum processor”, K. J. Satzinger et al., Science 374 6572, 1237-1241 (2021) [paper] [ReCirq code]
    • “Information scrambling in quantum circuits”, X. Mi, P. Roushan, C. Quintana et al, Science 374, 6574 1479-1483 (2021) [paper] [ReCirq code]
    • “Hartree-Fock on a superconducting qubit quantum computer”, F. Arute et al., Science 369, 6507 1084--1089 (2020) [paper] [ReCirq code]
  • A healthy community of libraries have now been built on top of Cirq, enabling different quantum computing research areas. These libraries include:
    • TensorFlow Quantum: a tool for exploring quantum machine learning. Using TensorFlow Quantum researchers trained a machine learning model on 30 qubits at a rate of 1.1 petaflops per second (1.1 x 1015 operations per second).
    • OpenFermion: an open source tool for quantum computations involved in chemistry simulations.
    • Pytket (pytkey-cirq): an open source Python tool for optimizing and manipulating quantum circuits.
    • Mitiq: an open source library developed by the non-profit Unitary fund for error mitigation techniques developed by the non-profit Unitary fund.
    • Qsim: a high performance state vector simulator written using AVX/FMA vectorized instructions with optional GPU acceleration. qsimcirq is the Cirq interface one can use to access qsim from Cirq.
  • Numerous quantum computing cloud services from companies in the industry have also integrated/standardized Cirq. Programs written in Cirq can be used to run through AQT, IonQ, Pascal, Rigetti, and IQM vendors. In addition, Cirq can be used on Azure Quantum to run on the hardware supported by Azure Quantum. Finally, one can get realistic noise simulations of Google’s quantum computing hardware using our newly released Quantum Virtual Machine.
  • Cirq is not just for stuffy research. Cirq has also been used to help develop Quantum Chess, a version of chess that uses superposition and entanglement. This notebook shows you how the game of Quantum Chess can be programmed using Cirq.
Cirq moving to its first full version does not just come with new features (see 1.0 release notes), but also with more guarantees about stability. Cirq uses semantic versioning, which means that future point release of Cirq will be compatible with the full version release. For example, version 1.1 of Cirq will not introduce breaking changes to Cirq’s interfaces from version 1.0; only at major version bumps (from 1.x to 2.0, for example) will breaking changes occur.

When we began working on Cirq, quantum computers consisted of only a few qubits and a few quantum gates on these qubits. Building Cirq and the supporting software for these custom systems and having them start to scale to hundreds of qubits over the past (nearly) five years has taught us many lessons. One key takeaway from these lessons is that: As quantum computing hardware continues to grow in scale and complexity, we expect that making software to support this growth will be essential to continue meaningful research and progress. In the next five years, with hardware expected to reach hundreds or even thousands of qubits, the software that is developed for quantum computing will need to have a careful eye set on supporting these bigger and bigger systems. Going forward we will need an ever wider set of frameworks, programming languages, and libraries to achieve quantum computing’s promise.

Acknowledgements

We are indebted to all 169 contributors to the Cirq github repo, and the many more who have filed issues and used Cirq in their own software. A particular shout out to the original lead of Cirq, Craig Gidney, to Cirq’s second lead, ‪Bálint Pató who guided Cirq through its middle ages, and to Alan Ho and Catherine Vollgraff Heidweiller for product wisdom. A special thanks to the core Cirq contributors including Doug Strain, Matthew Neely, Tanuj Khatter, Dax Fohl, Adam Zalcman, Kevin Sung, Matt Harrigan, Casey Duckering, Orion Martin, Smit Sanghavi, Bryan O'Gorman, Wojciech Mruczkiewicz, Ryan LaRose, Tony Bruguier, Victory Omole, and Cheng Xing, and our documentarians Auguste Hirth and Abe Asfaw.


By Dave Bacon and Michael Broughton – Quantum AI Team

Gateway API Graduates to Beta

For many years, Kubernetes users have wanted more advanced routing capabilities to be configurable in a Kubernetes API. With Google leadership, Gateway API has been developed to dramatically increase the number of features available. This API enables many new features in Kubernetes, including traffic splitting, header modification, and forwarding traffic to backends in different namespaces, just to name a few.
Since the project was originally proposed, Googlers have helped lead the open source efforts. Two of the top contributors to the project are from Google, while more than 10 engineers from Google have contributed to the API.

This week, the API has graduated from alpha to beta. This marks a significant milestone in the API and reflects new-found stability. There are now over a dozen implementations of the API and many are passing a comprehensive set of conformance tests. This ensures that users will have a consistent experience when using this API, regardless of environment or underlying implementation.

A Simple Example

To highlight some of the new features this API enables, it may help to walk through an example. We’ll start with a Gateway:

apiVersion: gateway.networking.k8s.io/v1beta1

kind: Gateway

metadata:

  name: store-xlb

spec:

  gatewayClassName: gke-l7-gxlb

  listeners:  

  - name: http

    protocol: HTTP

    port: 80

This Gateway uses the gke-l7-gxlb Gateway Class, which means a new external load balancer will be provisioned to serve this Gateway. Of course, we still need to tell the load balancer where to send traffic. We’ll use an HTTPRoute for this:

apiVersion: gateway.networking.k8s.io/v1beta1

kind: HTTPRoute

metadata:

  name: store

spec:

  parentRefs:

  - name: store-xlb

  rules:

  - matches:

    - path:

        type: PathPrefix

        value: /

    backendRefs:

    - name: store-svc

      port: 3080

      weight: 9

    - name: store-canary-svc

      port: 3080

      weight: 1

This simple HTTPRoute tells the load balancer to route traffic to one of the “store-svc” or “store-canary-svc” on port 3080. We’re using weights to do some basic traffic splitting here. That means that approximately 10% of requests will be routed to our canary Service.

Now, imagine that you want to provide a way for users to opt in or out of the canary service. To do that, we’ll add an additional HTTPRoute with some header matching configuration:

apiVersion: gateway.networking.k8s.io/v1beta1

kind: HTTPRoute

metadata:

  name: store-canary-option

spec:

  parentRefs:

  - name: store-xlb

  rules:

  - matches:

    - header:

        name: env

        value: stable

    backendRefs:

    - name: store-svc

      port: 3080

  - matches:

    - header:

        name: env

        value: canary

    backendRefs:

    - name: store-canary-svc

      port: 3080

This HTTPRoute works in conjunction with the first route we created. If any requests set the env header to “stable” or “canary” they’ll be routed directly to their preferred backend.

Getting Started

Unlike previous Kubernetes APIs, you don’t need to have the latest version of Kubernetes to use this API. Instead, this API is built with Custom Resource Definitions (CRDs) that can be installed in any Kubernetes cluster, as long as it is version 1.16 or newer (released almost 3 years ago).

To try this API on GKE, refer to the GKE specific installation instructions. Alternatively, if you’d like to use this API with another implementation, refer to the OSS getting started page.

What’s next for Gateway API?

As the core capabilities of Gateway API are stabilizing, new features and concepts are actively being explored. Ideas such as Route Delegation and a new GRPCRoute are deep in the design process. A new service mesh workstream has been established specifically to build consensus among mesh implementations for how this API can be used for service-to-service traffic. As with many open source projects, we’re trying to find the right balance between enabling new use cases and achieving API stability. This API has already accomplished a lot, but we’re most excited about what’s ahead.


By Rob Scott – GKE Networking

Revisiting Mask Transformer from a Clustering Perspective

Panoptic segmentation is a computer vision problem that serves as a core task for many real-world applications. Due to its complexity, previous work often divides panoptic segmentation into semantic segmentation (assigning semantic labels, such as “person” and “sky”, to every pixel in an image) and instance segmentation (identifying and segmenting only countable objects, such as “pedestrians” and “cars”, in an image), and further divides it into several sub-tasks. Each sub-task is processed individually, and extra modules are applied to merge the results from each sub-task stage. This process is not only complex, but it also introduces many hand-designed priors when processing sub-tasks and when combining the results from different sub-task stages.

Recently, inspired by Transformer and DETR, an end-to-end solution for panoptic segmentation with mask transformers (an extension of the Transformer architecture that is used to generate segmentation masks) was proposed in MaX-DeepLab. This solution adopts a pixel path (consisting of either convolutional neural networks or vision transformers) to extract pixel features, a memory path (consisting of transformer decoder modules) to extract memory features, and a dual-path transformer for interaction between pixel features and memory features. However, the dual-path transformer, which utilizes cross-attention, was originally designed for language tasks, where the input sequence consists of dozens or hundreds of words. Nonetheless, when it comes to vision tasks, specifically segmentation problems, the input sequence consists of tens of thousands of pixels, which not only indicates a much larger magnitude of input scale, but also represents a lower-level embedding compared to language words.

In “CMT-DeepLab: Clustering Mask Transformers for Panoptic Segmentation”, presented at CVPR 2022, and “kMaX-DeepLab: k-means Mask Transformer”, to be presented at ECCV 2022, we propose to reinterpret and redesign cross-attention from a clustering perspective (i.e., grouping pixels with the same semantic labels together), which better adapts to vision tasks. CMT-DeepLab is built upon the previous state-of-the-art method, MaX-DeepLab, and employs a pixel clustering approach to perform cross-attention, leading to a more dense and plausible attention map. kMaX-DeepLab further redesigns cross-attention to be more like a k-means clustering algorithm, with a simple change on the activation function. We demonstrate that CMT-DeepLab achieves significant performance improvements, while kMaX-DeepLab not only simplifies the modification but also further pushes the state-of-the-art by a large margin, without test-time augmentation. We are also excited to announce the open-source release of kMaX-DeepLab, our best performing segmentation model, in the DeepLab2 library.

Overview
Instead of directly applying cross-attention to vision tasks without modifications, we propose to reinterpret it from a clustering perspective. Specifically, we note that the mask Transformer object query can be considered cluster centers (which aim to group pixels with the same semantic labels), and the process of cross-attention is similar to the k-means clustering algorithm, which adopts an iterative process of (1) assigning pixels to cluster centers, where multiple pixels can be assigned to a single cluster center, and some cluster centers may have no assigned pixels, and (2) updating the cluster centers by averaging pixels assigned to the same cluster center, the cluster centers will not be updated if no pixel is assigned to them).

In CMT-DeepLab and kMaX-DeepLab, we reformulate the cross-attention from the clustering perspective, which consists of iterative cluster-assignment and cluster-update steps.

Given the popularity of the k-means clustering algorithm, in CMT-DeepLab we redesign cross-attention so that the spatial-wise softmax operation (i.e., the softmax operation that is applied along the image spatial resolution) that in effect assigns cluster centers to pixels is instead applied along the cluster centers. In kMaX-DeepLab, we further simplify the spatial-wise softmax to cluster-wise argmax (i.e., applying the argmax operation along the cluster centers). We note that the argmax operation is the same as the hard assignment (i.e., a pixel is assigned to only one cluster) used in the k-means clustering algorithm.

Reformulating the cross-attention of the mask transformer from the clustering perspective significantly improves the segmentation performance and simplifies the complex mask transformer pipeline to be more interpretable. First, pixel features are extracted from the input image with an encoder-decoder structure. Then, a set of cluster centers are used to group pixels, which are further updated based on the clustering assignments. Finally, the clustering assignment and update steps are iteratively performed, with the last assignment directly serving as segmentation predictions.

To convert a typical mask Transformer decoder (consisting of cross-attention, multi-head self-attention, and a feed-forward network) into our proposed k-means cross-attention, we simply replace the spatial-wise softmax with cluster-wise argmax.

The meta architecture of our proposed kMaX-DeepLab consists of three components: pixel encoder, enhanced pixel decoder, and kMaX decoder. The pixel encoder is any network backbone, used to extract image features. The enhanced pixel decoder includes transformer encoders to enhance the pixel features, and upsampling layers to generate higher resolution features. The series of kMaX decoders transform cluster centers into (1) mask embedding vectors, which multiply with the pixel features to generate the predicted masks, and (2) class predictions for each mask.

The meta architecture of kMaX-DeepLab.

Results
We evaluate the CMT-DeepLab and kMaX-DeepLab using the panoptic quality (PQ) metric on two of the most challenging panoptic segmentation datasets, COCO and Cityscapes, against MaX-DeepLab and other state-of-the-art methods. CMT-DeepLab achieves significant performance improvement, while kMaX-DeepLab not only simplifies the modification but also further pushes the state-of-the-art by a large margin, with 58.0% PQ on COCO val set, and 68.4% PQ, 44.0% mask Average Precision (mask AP), 83.5% mean Intersection-over-Union (mIoU) on Cityscapes val set, without test-time augmentation or using an external dataset.

Method PQ
MaX-DeepLab 51.1% (-6.9%)
MaskFormer 52.7% (-5.3%)
K-Net 54.6% (-3.4%)
CMT-DeepLab 55.3% (-2.7%)
kMaX-DeepLab 58.0%
Comparison on COCO val set.
Method PQ APmask mIoU
Panoptic-DeepLab 63.0% (-5.4%) 35.3% (-8.7%) 80.5% (-3.0%)
Axial-DeepLab 64.4% (-4.0%) 36.7% (-7.3%) 80.6% (-2.9%)
SWideRNet 66.4% (-2.0%) 40.1% (-3.9%) 82.2% (-1.3%)
kMaX-DeepLab 68.4% 44.0% 83.5%
Comparison on Cityscapes val set.

Designed from a clustering perspective, kMaX-DeepLab not only has a higher performance but also a more plausible visualization of the attention map to understand its working mechanism. In the example below, kMaX-DeepLab iteratively performs clustering assignments and updates, which gradually improves mask quality.

kMaX-DeepLab’s attention map can be directly visualized as a panoptic segmentation, which gives better plausibility for the model working mechanism (image credit: coco_url, and license).

Conclusions
We have demonstrated a way to better design mask transformers for vision tasks. With simple modifications, CMT-DeepLab and kMaX-DeepLab reformulate cross-attention to be more like a clustering algorithm. As a result, the proposed models achieve state-of-the-art performance on the challenging COCO and Cityscapes datasets. We hope that the open-source release of kMaX-DeepLab in the DeepLab2 library will facilitate future research on designing vision-specific transformer architectures.

Acknowledgements
We are thankful to the valuable discussion and support from Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Florian Schroff, Hartwig Adam, and Alan Yuille.

Source: Google AI Blog


Enabling Creative Expression with Concept Activation Vectors

Advances in computer vision and natural language processing continue to unlock new ways of exploring billions of images available on public and searchable websites. Today’s visual search tools make it possible to search with your camera, voice, text, images, or multiple modalities at the same time. However, it remains difficult to input subjective concepts, such as visual tones or moods, into current systems. For this reason, we have been working collaboratively with artists, photographers, and image researchers to explore how machine learning (ML) might enable people to use expressive queries as a way of visually exploring datasets.

Today, we are introducing Mood Board Search, a new ML-powered research tool that uses mood boards as a query over image collections. This enables people to define and evoke visual concepts on their own terms. Mood Board Search can be useful for subjective queries, such as “peaceful”, or for words and individual images that may not be specific enough to produce useful results in a standard search, such as “abstract details in overlooked scenes” or “vibrant color palette that feels part memory, part dream". We developed, and will continue to develop, this research tool in alignment with our AI Principles.

Search Using Mood Boards
With Mood Board Search, our goal is to design a flexible and approachable interface so people without ML expertise can train a computer to recognize a visual concept as they see it. The tool interface is inspired by mood boards, commonly used by people in creative fields to communicate the “feel” of an idea using collections of visual materials.

With Mood Board Search, users can train a computer to recognize visual concepts in image collections.

To get started, simply drag and drop a small number of images that represent the idea you want to convey. Mood Board Search returns the best results when the images share a consistent visual quality, so results are more likely to be relevant with mood boards that share visual similarities in color, pattern, texture, or composition.

It’s also possible to signal which images are more important to a visual concept by upweighting or downweighting images, or by adding images that are the opposite of the concept. Then, users can review and inspect search results to understand which part of an image best matches the visual concept. Focus mode does this by revealing a bounding box around part of the image, while AI crop cuts in directly, making it easier to draw attention to new compositions.

Supported interactions, like AI crop, allow users to see which part of an image best matches their visual concept.

Powered by Concept Activation Vectors (CAVs)
Mood Board Search takes advantage of pre-trained computer vision models, such as GoogLeNet and MobileNet, and a machine learning approach called Concept Activation Vectors (CAVs).

CAVs are a way for machines to represent images (what we understand) using numbers or directions in a neural net’s embedding space (which can be thought of as what machines understand). CAVs can be used as part of a technique, Testing with CAVs (TCAV), to quantify the degree to which a user-defined concept is important to a classification result; e.g., how sensitive a prediction of "zebra" is to the presence of stripes. This is a research approach we open-sourced in 2018, and the work has since been widely applied to medical applications and science to build ML applications that can provide better explanations for what machines see. You can learn more about embedding vectors in general in this Google AI blog post, and our approach to working with TCAVs in Been Kim’s Keynote at ICLR.

In Mood Board Search, we use CAVs to find a model's sensitivity to a mood board created by the user. In other words, each mood board creates a CAV — a direction in embedding space — and the tool searches an image dataset, surfacing images that are the closest match to the CAV. However, the tool takes it one step further, by segmenting each image in the dataset in 15 different ways, to uncover as many relevant compositions as possible. This is the approach behind features like Focus mode and AI crop.

Three artists created visual concepts to share their way of seeing, shown here in an experimental app by design invention studio, Nord Projects.

Because embedding vectors can be learned and re-used across models, tools like Mood Board Search can help us express our perspective to other people. Early collaborations with creative communities have shown value in being able to create and share subjective experiences with others, resulting in feelings of being able to “break out of visually-similar echo chambers” or “see the world through another person’s eyes”. Even misalignment between model and human understanding of a concept frequently resulted in unexpected and inspiring connections for collaborators. Taken together, these findings point towards new ways of designing collaborative ML systems that embrace personal and collective subjectivity.

Conclusions and Future Work
Today, we’re open-sourcing the code to Mood Board Search, including three visual concepts made by our collaborators, and a Mood Board Search Python Library for people to tap the power of CAVs directly into their own websites and apps. While these tools are early-stage prototypes, we believe this capability can have a wide-range of applications from exploring unorganized image collections to externalizing ways of seeing into collaborative and shareable artifacts. Already, an experimental app by design invention studio Nord Projects, made using Mood Board Search, investigates the opportunities for running CAVs in camera, in real-time. In future work, we plan to use Mood Board Search to learn about new forms of human-machine collaboration and expand ML models and inputs — like text and audio — to allow even deeper subjective discoveries, regardless of medium.

If you’re interested in a demo of this work for your team or organization, email us at [email protected].

Acknowledgments
This blog presents research by (in alphabetical order): Kira Awadalla, Been Kim, Eva Kozanecka, Alison Lentz, Alice Moloney, Emily Reif, and Oliver Siy, in collaboration with design invention studio Nord Projects. We thank our co-author, Eva Kozanecka, our artist collaborators, Alexander Etchells, Tom Hatton, Rachel Maggart, the Imaging team at The British Library for their participation in beta previews, and Blaise Agüera y Arcas, Jess Holbrook, Fernanda Viegas, and Martin Wattenberg for their support of this research project.

Source: Google AI Blog