Today, we are expanding our spam policies to address a deceptive practice known as "back button hijacking", which will become an explicit violation of the "malicious practices" of spam policies, leading to potential spam actions.
Google Workspace Updates Weekly Recap – April 10, 2026
Book Google Workspace resources from third-party calendars
Greater control and error visibility for Google Sheets formulas
Migration update on restricted access items
Speech translation in Google Meet is now rolling out to mobile devices
Edit your AI-generated scripts when you convert Slides to Vids
Gmail end-to-end encryption now available on mobile devices
Expanding access to longer musical tracks in the Gemini app
Source: Google Workspace Updates
Leveraging CPU memory for faster, cost-efficient TPU LLM training
Host offloading with JAX on Intel® Xeon® processors
As Large Language Models (LLMs) continue to scale into the hundreds of billions of parameters, device memory capacity has become a big limiting factor in training, as intermediate activations from every layer in the forward pass are needed in the backward pass. To reduce device memory pressure, these activations can be rematerialized during the backward pass, trading memory for recomputation. While rematerialization enables larger models to fit within limited device memory, it significantly increases training time and cost.
Intel® Xeon® processors (5th and 6th Gen) with Advanced Matrix Extensions (AMX) enable practical host offloading of selected memory- and compute-intensive components in JAX training workflows. This approach can help teams train larger models, relieve accelerator memory pressure, improve end-to-end throughput, and reduce total cost of ownership—particularly on TPU-based Google Cloud instances.
By publishing these results and implementation details, Google and Intel aim to promote transparency and share practical guidance with the community. This post describes how to enable activation offloading for JAX on TPU platforms and outlines considerations for building scalable, cost-aware hybrid CPU–accelerator training workflows.
Host offloading
Traditional LLM training is usually done on device accelerators alone. However, modern host machines have far more memory than accelerators (512GB or more) and can offer extra compute power, e.g., TFLOPS-scale matrix throughput on an Intel® Xeon® Scalable Processor with AMX capability. Leveraging host resources can be a great alternative to rematerialization. Host offloading selectively moves computation or data between host and device to optimize performance and memory usage.
Host memory offloading keeps frequently-accessed tensors on the device and spills the rest to CPU memory as an extra level of cache. Activation offloading transfers activations computed on-device in the forward pass to the host, stores them in the host memory, and brings them back to the device in the backward pass for gradient computation. This unlocks the ability to train larger models, use bigger batch sizes, and improve throughput.
In this blog post, we provide a practical guide to offload activations through JAX to efficiently train larger models on TPUs with an Intel® Xeon® Scalable Processor.
Enabling memory offloading in JAX
JAX offers multiple strategies for offloading activations, model parameters, and optimizer states to the host. Users can call checkpoint_name() to attach a name to a tensor so that a checkpoint policy can refer to it later. The snippet below labels the tensor x with the name "x":
from jax.ad_checkpoint import checkpoint_name

def layer_name(x, w):
    w1, w2 = w
    x = checkpoint_name(x, "x")
    y = x @ w1
    return y @ w2, None
Users can then choose a policy from jax.checkpoint_policies to select the appropriate memory optimization strategy for intermediate values. There are three strategies:
- Recomputing during backward pass (default behavior)
- Storing on device
- Offloading to host memory after forward pass and loading back during backward pass
The code below moves x from device to the pinned host memory after the forward pass.
from jax import checkpoint_policies as cp

policy = cp.save_and_offload_only_these_names(
    names_which_can_be_saved=[],         # No values stored on device
    names_which_can_be_offloaded=["x"],  # Offload activations labeled "x"
    offload_src="device",                # Move from device memory
    offload_dst="pinned_host"            # To pinned host memory
)
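The two snippets above define a label and a policy, but a policy only takes effect once it is attached to a function. A minimal sketch of that step, assuming the policy is applied through jax.checkpoint: to stay runnable on any backend, it uses save_only_these_names (the "store on device" strategy) rather than the offload policy, which could be passed in the same way on a TPU host. The names layer, layer_remat, and loss_with are hypothetical.

```python
import jax
import jax.numpy as jnp
from jax import checkpoint_policies as cp
from jax.ad_checkpoint import checkpoint_name


def layer(x, w):
    w1, w2 = w
    x = checkpoint_name(x, "x")  # label the activation so a policy can select it
    y = jnp.tanh(x @ w1)
    return y @ w2


# Keep only the tensor labeled "x" on device; other intermediates are
# recomputed during the backward pass.
policy = cp.save_only_these_names("x")
layer_remat = jax.checkpoint(layer, policy=policy)

x = jnp.ones((4, 8))
w = (jnp.full((8, 16), 0.1), jnp.full((16, 2), 0.1))


def loss_with(fn):
    # Build a scalar loss over the weights for a given layer implementation.
    return lambda weights: jnp.sum(fn(x, weights) ** 2)


# The policy changes what is stored, not the result of differentiation.
grads_plain = jax.grad(loss_with(layer))(w)
grads_remat = jax.grad(loss_with(layer_remat))(w)
```

Because the policy only controls which residuals are kept, the gradients from both versions should agree; what changes is peak memory and step time.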
Measuring Host Offloading Benefits on TPU v5p
We examined TPU host offloading with JAX on both fine-tuning and training workloads. All our experiments were run on Google Cloud Platform, using a single v5p-8 TPU instance with a single-host 4th Gen Intel® Xeon® Scalable Processor.
Fine-tuning PaliGemma2: Using the base PaliGemma2 28B model for vision-language tasks, we fine-tuned the attention layers of the language model (Gemma2 27B) while keeping all other parameters frozen. During fine-tuning, we set the LLM sequence length to 256 and the batch size to 256.
The default checkpoint policy is nothing_saveable, which does not keep any activations on-device during the forward pass. The activations are rematerialized during the backward pass for gradient computation. While this approach reduces memory pressure on the TPU, it increases compute time. To apply host offloading, we offload the Q, K, and V projection activations using save_and_offload_only_these_names. These activations are transferred to host memory (D2H) during the forward pass and fetched back during the backward pass (H2D), so the device neither stores nor recomputes them. Figure 2 shows a 10% reduction in training time from host offloading. This translates directly into a similar reduction in TPU core-hours, yielding meaningful cost savings. The complete fine-tuning recipe is available at [JAX host offloading].
(Bottom) Memory analysis with and without host offloading.
Training Llama2-13B using MaxText: MaxText offers several rematerialization strategies that can be specified in the training configuration file. We used the policy remat_policy: 'qkv_proj_offloaded' to offload the Q, K, and V projection activations. Figure 3 shows a ~5% reduction in per-step training time compared to fully rematerializing all activations (remat_policy: 'full').
The step time was 5% faster with host offloading.
When to offload activations
Activation offloading is beneficial when the time to transfer activations between host and device is lower than the time to recompute them. The timing depends on multiple factors such as PCIe bandwidth, model size, batch size, sequence length, activation tensor sizes, and the compute capabilities of the device. An additional factor is how much of the data movement can be overlapped with computation to keep the device busy. Figure 4 demonstrates an efficient overlap of the host-to-device transfer with compute during the backward pass in PaliGemma2 28B training.
Host-to-device transfers overlap effectively with compute during the backward pass.
Smaller model variants such as PaliGemma2 3B and 9B did not see benefits from host offloading because it is faster to rematerialize all tensors than to transfer them to and from the host. Therefore, identifying the appropriate workload and offloading policy is crucial to realizing performance gains from host offloading.
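The trade-off described in this section can be written as a back-of-the-envelope check. The sketch below is a simplified model, not the method used in the experiments: it assumes one device-to-host and one host-to-device copy per activation, a fixed link bandwidth, and a single overlap fraction. All names and the numbers in the usage lines are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class OffloadEstimate:
    transfer_s: float          # round-trip copy time (D2H + H2D)
    recompute_s: float         # time to rematerialize the same activations
    exposed_transfer_s: float  # copy time not hidden behind compute

    @property
    def worthwhile(self) -> bool:
        # Offloading pays off when the visible copy cost is below the
        # recompute cost it replaces.
        return self.exposed_transfer_s < self.recompute_s


def estimate_offload(activation_bytes, link_bytes_per_s,
                     recompute_flops, device_flops_per_s,
                     overlap_fraction=0.0):
    """Compare exposed transfer time with recompute time for one step."""
    if not 0.0 <= overlap_fraction <= 1.0:
        raise ValueError("overlap_fraction must be in [0, 1]")
    transfer_s = 2 * activation_bytes / link_bytes_per_s
    recompute_s = recompute_flops / device_flops_per_s
    exposed = transfer_s * (1.0 - overlap_fraction)
    return OffloadEstimate(transfer_s, recompute_s, exposed)


# Hypothetical numbers, chosen only to show both outcomes.
large = estimate_offload(2 * 2**30, 32e9, 4e13, 2e14)
small = estimate_offload(256 * 2**20, 32e9, 2e12, 2e14)
small_overlapped = estimate_offload(256 * 2**20, 32e9, 2e12, 2e14,
                                    overlap_fraction=0.5)
```

In practice the inputs would come from profiling rather than from nominal hardware figures, since achieved bandwidth and overlap vary by workload.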
Call to Action
If you train on TPUs and are limited by device memory, consider evaluating activation offloading. Start by labeling candidate activations (for example, Q/K/V projections) and compare step time, memory headroom, and overall cost across representative workloads.
In our experiments, we observed up to ~10% improvement in end-to-end training time for larger workloads, which can reduce total cost of ownership (TCO) by shortening time-to-train or enabling the same workload on smaller instances.
Acknowledgments
Emilio Cota and Karlo Basioli from Google, and Eugene Zhulenev (formerly at Google).
Source: Google Open Source Blog
Chrome Beta for iOS Update
Hi everyone! We've just released Chrome Beta 148 (148.0.7778.8) for iOS; it'll become available on App Store in the next few days.
You can see a partial list of the changes in the Git log. If you find a new issue, please let us know by filing a bug.
Chrome Release Team
Google Chrome
Source: Google Chrome Releases
Bringing Rust to the Pixel Baseband
Google is continuously advancing the security of Pixel devices. We have been focusing on hardening the cellular baseband modem against exploitation. Recognizing the risks associated with the complex modem firmware, Pixel 9 shipped with mitigations against a range of memory-safety vulnerabilities. For Pixel 10, Google is advancing its proactive security measures further. Following our previous discussion on "Deploying Rust in Existing Firmware Codebases", this post shares a concrete application: integrating a memory-safe Rust DNS (Domain Name System) parser into the modem firmware. The new Rust-based DNS parser significantly reduces our security risk by mitigating an entire class of vulnerabilities in a risky area, while also laying the foundation for broader adoption of memory-safe code in other areas.
Here we share our experience of working on it, and we hope it can inspire the use of more memory-safe languages in low-level environments.
Why Modem Memory Safety Can’t Wait
In recent years, we have seen increasing interest in the cellular modem from attackers and security researchers. For example, Google's Project Zero gained remote code execution on Pixel modems over the Internet. The Pixel modem has tens of megabytes of executable code. Given the complexity and remote attack surface of the modem, other critical memory safety vulnerabilities may remain in the predominantly memory-unsafe firmware code.
Why DNS?
The DNS protocol is most commonly known in the context of browsers finding websites. With the evolution of cellular technology, modern cellular communications have migrated to digital data networks; consequently, even basic operations such as call forwarding rely on DNS services.
DNS is a complex protocol and requires parsing of untrusted data, which can lead to vulnerabilities, particularly when implemented in a memory-unsafe language (example: CVE-2024-27227). Implementing the DNS parser in Rust offers value by decreasing the attack surfaces associated with memory unsafety.
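To make the parsing risk concrete, here is a small sketch of one step every DNS parser performs: decoding a domain name that may use compression pointers (RFC 1035). It is illustrative only and is neither the modem code nor hickory-proto's implementation; decode_name and max_jumps are hypothetical names. Each explicit check marks a place where a memory-unsafe implementation that omits it could read out of bounds or loop forever.

```python
def decode_name(msg: bytes, offset: int, max_jumps: int = 16):
    """Decode a DNS name at `offset`; return (name, offset after the name)."""
    labels = []
    pos = offset
    end = None        # where parsing resumes after the first pointer
    jumps = 0
    total = 0         # encoded length so far (labels plus length octets)
    while True:
        if pos >= len(msg):
            raise ValueError("name runs past end of message")
        length = msg[pos]
        if length & 0xC0 == 0xC0:
            # Compression pointer: 14-bit offset into the same message.
            if pos + 1 >= len(msg):
                raise ValueError("truncated compression pointer")
            if end is None:
                end = pos + 2
            jumps += 1
            if jumps > max_jumps:
                raise ValueError("too many compression pointers")
            pos = ((length & 0x3F) << 8) | msg[pos + 1]
            continue
        if length & 0xC0:
            raise ValueError("unsupported label type")
        pos += 1
        if length == 0:
            break  # root label terminates the name
        if pos + length > len(msg):
            raise ValueError("label runs past end of message")
        total += length + 1
        if total + 1 > 255:
            # RFC 1035 caps an encoded name at 255 octets.
            raise ValueError("name too long")
        labels.append(msg[pos:pos + length].decode("ascii", errors="replace"))
        pos += length
    return ".".join(labels), (end if end is not None else pos)


# A 12-byte dummy header, then "example.com", then "www" + pointer to offset 12.
message = bytes(12) + b"\x07example\x03com\x00" + b"\x03www\xc0\x0c"
first = decode_name(message, 12)
second = decode_name(message, 25)
```

In Python a missed check surfaces as an exception; in C the same omission can become memory corruption, which is the class of bug a memory-safe parser rules out.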
Picking a DNS library
DNS already has a level of support in the open-source Rust community. We evaluated multiple open-source crates that implement DNS. Based on criteria shared in earlier posts, we identified hickory-proto as the best candidate. It is well maintained, has over 75% test coverage, and is widely adopted in the Rust community. This pervasiveness suggests it is becoming the de facto DNS choice and is likely to enjoy long-term support. Although hickory-proto initially lacked no_std support, which is needed for bare-metal environments (see our previous post on this topic), we were able to add support to it and its dependencies.
Adding no_std support
Enabling no_std for hickory-proto was mostly mechanical, and we described the process in a previous post. We modified hickory-proto and its dependencies to enable no_std support. The upstream no_std work also produced a no_std URL parser, which benefits other projects.
- https://github.com/hickory-dns/hickory-dns/pull/2104
- https://github.com/servo/rust-url/pull/831
- https://github.com/krisprice/ipnet/pull/58
The above PRs are great examples of how to extend no_std support to existing std-only crates.
Code size study
Code size is one of the factors that we evaluated when picking the DNS library to use.
| Code size by category | Size |
|---|---|
| Rust-implemented shim that calls hickory-proto on receiving a DNS response | 4KB |
| core, alloc, compiler_builtins (reusable, one-time cost) | 17KB |
| hickory-proto library and dependencies | 350KB |
| Sum | 371KB |
We built prototypes and measured size with size-optimized settings. As expected, hickory-proto is not designed with embedded use in mind and is not optimized for size. As the Pixel modem is not tightly memory constrained, we prioritized community support and code quality, leaving code size optimizations as future work.
However, the additional code size may be a blocker for other embedded systems. This could be addressed in the future by adding feature flags to conditionally compile only the required functionality. Implementing this modularity would be valuable future work.
Hook up Rust to modem firmware
Before building the Rust DNS library, we defined several Rust unit tests to cover basic arithmetic, dynamic allocations, and FFI to verify the integration of Rust with the existing modem firmware code base.
Compile Rust code to staticlib
While using cargo is the default choice for compilation in the Rust ecosystem, it presents challenges when integrating it into existing build systems. We evaluated two options:
- Using cargo to build a staticlib before the modem builds, then adding the produced staticlib to the linking step.
- Working directly with rustc and integrating the Rust compilation steps into the existing modem build system.
Option #1 does not scale if we are going to add more Rust components in the future, as linking multiple staticlibs may cause duplicated symbol errors. We chose option #2 as it scales more easily and allows tighter integration into our existing build system. Our existing C/C++ codebase uses Pigweed to drive the primary build system. Pigweed supports Rust targets (example) with direct calls to rustc through rust tools defined in GN.
We compiled all the Rust crates, including hickory-proto, its dependencies, and core, compiler_builtins, and alloc, to rlib. Then, we created a staticlib target with a single lib.rs file that references all the rlib crates using extern crate declarations.
Build core, alloc, and compiler_builtins
Android’s Rust Toolchain distributes source code of core, alloc, and compiler_builtins, and we leveraged this for the modem. They can be included in the build graph by adding a GN target with crate_root pointing to the root lib.rs of each crate.
Pixel modem firmware already has a well-tested and specialized global memory allocation system to support some dynamic memory allocations. alloc support was added by implementing the GlobalAlloc trait with FFI calls to the allocator's C APIs:
use core::alloc::{GlobalAlloc, Layout};

extern "C" {
    fn mem_malloc(size: usize, alignment: usize) -> *mut u8;
    fn mem_free(ptr: *mut u8, alignment: usize);
}

struct MemAllocator;

unsafe impl GlobalAlloc for MemAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        mem_malloc(layout.size(), layout.align())
    }
    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        mem_free(ptr, layout.align());
    }
}

#[global_allocator]
static ALLOCATOR: MemAllocator = MemAllocator;
Pixel modem firmware already implements a backend for the Pigweed crash facade as the global crash handler. Exposing it into Rust panic_handler through FFI unifies the crash handling for both Rust and C/C++ code.
#![no_std]
use core::panic::PanicInfo;

extern "C" {
    pub fn PwCrashBackend(signature: *const i8, file_name: *const i8, line: u32);
}

#[panic_handler]
fn panic(panic_info: &PanicInfo) -> ! {
    let mut filename = "";
    let mut line_number: u32 = 0;
    if let Some(location) = panic_info.location() {
        filename = location.file();
        line_number = location.line();
    }
    let mut cstr_buffer = [0u8; 128];
    // Never writes to the last byte to make sure `cstr_buffer` is always zero
    // terminated.
    let (_, writer) = cstr_buffer.split_last_mut().unwrap();
    for (place, ch) in writer.iter_mut().zip(filename.bytes()) {
        *place = ch;
    }
    unsafe {
        PwCrashBackend(
            "Rust panic\0".as_ptr() as *const i8,
            cstr_buffer.as_ptr() as *const i8,
            line_number,
        );
    }
    loop {}
}
Link Rust staticlib
The Pixel modem firmware linking has a step that calls the linker to link all the objects generated from C/C++ code. By using llvm-ar -x to extract object files from the Rust combined staticlib and supplying them to the linker, the Rust code appears in the final modem image.
We experienced a performance issue caused by weak symbols during linking. Including the Rust core and compiler_builtins crates caused unexpected power and performance regressions on various tests. Upon analysis, we realized that the modem-optimized implementations of memset and memcpy provided by the modem firmware were accidentally being replaced by those defined in compiler_builtins. This appears to happen because both the compiler_builtins crate and the existing codebase define these symbols as weak, so the linker has no way to decide which one should win. We fixed the regression by stripping the compiler_builtins objects from the staticlib before linking, using a one-line shell script.
llvm-ar -t <rust staticlib> | grep compiler_builtins | xargs llvm-ar -d <rust staticlib>
Integrating hickory-proto
Expose Rust API and call back to C++
For the DNS parser, we declared the DNS response parsing API in C and then implemented the same API in Rust.
int32_t process_dns_response(uint8_t*, int32_t);
The Rust function returns an integer error code. The DNS answers in the response must be written into in-memory data structures that are coupled with the original C implementation, so we reuse the existing C functions for that step and dispatch to them from the Rust implementation.
use core::slice;

// `#[no_mangle]` keeps the symbol name that the C declaration above refers to.
#[no_mangle]
pub unsafe extern "C" fn process_dns_response(
    dns_response: *const u8,
    response_len: i32,
) -> i32 {
    //... validate inputs `dns_response` and `response_len`.
    // SAFETY:
    // It is safe because `dns_response` is null checked above. `response_len`
    // is passed in, safe as long as it is set correctly by vendor code.
    match process_response(unsafe {
        // `from_raw_parts` takes a `usize` length; the validation above is
        // assumed to have rejected negative values.
        slice::from_raw_parts(dns_response, response_len as usize)
    }) {
        Ok(()) => 0,
        Err(err) => err.into(),
    }
}

fn process_response(response: &[u8]) -> Result<()> {
    let response = hickory_proto::op::Message::from_bytes(response)?;
    let response = hickory_proto::xfer::DnsResponse::from_message(response)?;
    for answer in response.answers() {
        match answer.record_type() {
            hickory_proto::RecordType:... => {
                // SAFETY:
                // It is safe because the callback function does not store
                // reference of the inputs or their members.
                unsafe {
                    callback_to_c_function(...)?;
                }
            }
            // ... more match arms omitted.
        }
    }
    Ok(())
}
In our case, the DNS response parsing API is simple enough for us to write by hand, while the callbacks to C functions for handling the response involve complex data type conversions. Therefore, we leveraged bindgen to generate FFI code for the callbacks.
Build third-party crates
Even with all features disabled, hickory-proto introduces more than 30 dependent crates. With manually written build rules, correctness is hard to guarantee, and the rules scale poorly when dependencies are upgraded to new versions.
Fuchsia has developed cargo-gnaw to support building their third-party Rust crates. Cargo-gnaw works by invoking cargo metadata to resolve dependencies, then parsing the result and generating GN build rules. This ensures correctness and ease of maintenance.
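The general idea of turning resolved dependency metadata into build rules can be sketched in a few lines. This is not cargo-gnaw's implementation or output format: the rust_library template name, the crate names, paths, and the sample data are all hypothetical, and only the field names (packages, targets, resolve, nodes, deps) follow the JSON that cargo metadata --format-version 1 emits.

```python
# Hypothetical, hand-written stand-in for the parsed JSON output of
# `cargo metadata --format-version 1`.
SAMPLE_METADATA = {
    "packages": [
        {
            "id": "dns_shim 0.1.0 (path+file:///src/dns_shim)",
            "name": "dns_shim",
            "targets": [{"kind": ["lib"], "src_path": "//src/dns_shim/lib.rs"}],
        },
        {
            "id": "wire_util 0.2.0 (path+file:///src/wire_util)",
            "name": "wire_util",
            "targets": [{"kind": ["lib"], "src_path": "//src/wire_util/lib.rs"}],
        },
    ],
    "resolve": {
        "nodes": [
            {
                "id": "dns_shim 0.1.0 (path+file:///src/dns_shim)",
                "deps": [{"name": "wire_util",
                          "pkg": "wire_util 0.2.0 (path+file:///src/wire_util)"}],
            },
            {"id": "wire_util 0.2.0 (path+file:///src/wire_util)", "deps": []},
        ]
    },
}


def gn_rules(metadata):
    """Emit one GN-style target per library crate in the resolved graph."""
    by_id = {pkg["id"]: pkg for pkg in metadata["packages"]}
    rules = []
    for node in metadata["resolve"]["nodes"]:
        pkg = by_id[node["id"]]
        lib = next((t for t in pkg["targets"] if "lib" in t["kind"]), None)
        if lib is None:
            continue  # skip packages without a library target
        name = pkg["name"]
        root = lib["src_path"]
        deps = sorted(by_id[d["pkg"]]["name"] for d in node["deps"])
        lines = ['rust_library("%s") {' % name,
                 '  crate_root = "%s"' % root]
        if deps:
            dep_list = ", ".join('":%s"' % d for d in deps)
            lines.append("  deps = [ %s ]" % dep_list)
        lines.append("}")
        rules.append("\n".join(lines))
    return "\n\n".join(rules)


generated = gn_rules(SAMPLE_METADATA)
```

Generating rules from the resolver's output means a dependency upgrade becomes a regeneration step rather than a manual edit of each rule.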
Conclusion
The Pixel 10 series of phones marks a pivotal moment as the first Pixel devices to integrate a memory-safe language into their modem.
While replacing one piece of risky attack surface is itself valuable, this project lays the foundation for future integration of memory-safe parsers and code into the cellular baseband, ensuring the baseband’s security posture will continue to improve as development continues.
Special thanks to Armando Montanez, Bjorn Mellem, Boky Chen, Cheng-Yu Tsai, Dominik Maier, Erik Gilling, Ever Rosales, Hungyen Weng, Ivan Lozano, James Farrell, Jeffrey Vander Stoep, Jiacheng Lu, Jingjing Bu, Min Xu, Murphy Stein, Ray Weng, Shawn Yang, Sherk Chung, Stephan Chen, Stephen Hines.
Source: Google Online Security Blog
6 easy ways to study for finals with Gemini
Learn how to use Gemini as your personal study partner — from turning messy lecture notes into podcasts to testing your knowledge with custom quizzes.
Source: The Official Google Blog
Booking restaurants in the UK just got easier with AI in Search
We’re bringing new agentic capabilities to AI Mode in Search to help you book restaurant reservations.
Source: The Official Google Blog
Dev Channel Update for ChromeOS / ChromeOS Flex
The Dev channel is being updated to OS version 16640.2.0 (Browser version 148.0.7778.6) for most ChromeOS devices.
If you find new issues, please let us know in one of the following ways:
Visit our ChromeOS communities
General: Chromebook Help Community
Beta Specific: ChromeOS Beta Help Community
Interested in switching channels? Find out how.
Andy Wu,
Google ChromeOS
Source: Google Chrome Releases
Expanding access to longer musical tracks in the Gemini app
Getting started
- Admins: The Gemini app and related in-app tools are controlled by the Generative AI settings in the Workspace Admin console. Music generation in Gemini is subject to these existing controls. Visit the Help Center to learn more about turning the Gemini app on or off.
- End users: End users will receive access to full-length songs automatically. To get started, select “Create music” from the tools menu. Visit the Help Center to learn more about limits.
Rollout pace
- Rapid Release and Scheduled Release domains: Full rollout (1–3 days for feature visibility) started on April 8, 2026
Availability
- Lyria 3 Pro is now available to the following Google Workspace customers and users with personal accounts who are 18 years or older and signed in to the Gemini app:
- Business: Business Starter
- Enterprise: Enterprise Starter
- Education: Education Fundamentals, Standard, and Plus
- Consumer: Google AI Plus
- Other Editions: Frontline Starter, Standard, and Plus; Nonprofits
- Lyria 3 Pro is already available to the following Google Workspace customers and users with personal accounts who are 18 years or older and signed in to the Gemini app:
- Business: Business Standard and Plus
- Enterprise: Enterprise Standard and Plus
- AI Add-ons: Google AI Pro for Education, AI Expanded Access, AI Ultra Access
- Consumer: Google AI Pro and Ultra
Resources
- Keyword Blog: Lyria 3 Pro: Create longer tracks in more Google products
- Google Workspace Updates: Create longer musical tracks in the Gemini app with Lyria 3 Pro
- Google Workspace Admin Help: Turn the Gemini app on or off
- Google Help: Generate music with Gemini Apps
