The Workspace Policy API provides a centralized, comprehensive view of your security settings, eliminating the need to navigate to numerous pages in the Admin console.
With our latest update, we are introducing mutate endpoints (Create, Update, Delete) alongside existing read-only capabilities (Get, List) for data loss prevention (DLP) rules and detectors. This allows super admins to programmatically manage and fully automate the entire lifecycle of their DLP policies, from initial creation to real-time activation and deactivation.
Note this is an API-only launch for capabilities currently supported in the Admin console.
About DLP
DLP lets Workspace admins control external file sharing to prevent sensitive information leaks. It scans files for violations, triggering incidents and protective actions like content blocking.
How DLP works:
Admins define rules for sensitive content across Drive, Gmail, Chat, and Chrome.
DLP scans content for DLP rule violations that trigger DLP incidents.
DLP enforces the rules you defined and violations trigger actions, such as alerts.
Admins are alerted for DLP rule violations.
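The four steps above can be pictured with a toy model. This is only an illustrative sketch: the `Rule`, `Incident`, and `scan` names, the regular-expression detector, and the action strings are all hypothetical and are not part of the Policy API or of Workspace DLP itself.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A hypothetical DLP rule: a named detector pattern plus an action."""
    name: str
    pattern: str  # regular-expression detector (illustrative only)
    action: str   # e.g. "alert" or "block"


@dataclass
class Incident:
    """A hypothetical record of one rule violation."""
    rule_name: str
    action: str
    match: str


def scan(content: str, rules: list[Rule]) -> list[Incident]:
    """Return one incident per rule whose detector matches the content."""
    incidents = []
    for rule in rules:
        found = re.search(rule.pattern, content)
        if found:
            incidents.append(Incident(rule.name, rule.action, found.group(0)))
    return incidents


# Step 1: an admin defines a rule. Steps 2-4: content is scanned, a violation
# produces an incident, and the incident carries the action to enforce.
rules = [Rule("us-ssn", r"\b\d{3}-\d{2}-\d{4}\b", "block")]
incidents = scan("Employee SSN: 123-45-6789", rules)
```

In this sketch, each incident names the rule that fired and the action to take, which mirrors the rule, scan, incident, action sequence described above.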
Summary of capabilities supported by mutate endpoints for DLP
Getting started
Admins: You must be a super admin to use the Policy API. See our developer documentation to learn more about the Policy API. You can also use GAM, an open source tool for managing Workspace, which now supports the Policy API.
Unlocking TPU performance: Deep kernel profiling with XProf
As machine learning workloads scale to unprecedented heights, developers are increasingly writing highly specialized Tensor Processing Unit (TPU) kernels using frameworks like Pallas, Mosaic, and Triton to maximize hardware performance.
However, customizing high-performance kernels has historically introduced a major engineering challenge: optimization blind spots. To legacy performance profilers, custom compilation paths appear as opaque execution paths. Developers are left with single, massive execution blocks in their trace captures, lacking granular visibility into what is actually occurring inside the chip's internal components. Did a vector processing instruction stall? Was matrix math idle due to data loading bottlenecks?
Traditional profiling relies heavily on compile-time static cost models to estimate kernel efficiency. While helpful for standard operations, these models cannot capture dynamic runtime realities like instruction execution stalls, memory subsystem congestion, or hardware scheduling conflicts.
To open this opaque execution path, we are excited to introduce the Kernel Profiling suite in XProf—a low-level hardware debugging suite engineered specifically for Pallas kernel authoring and optimization on Google TPUs. By combining static compilation tracking with dynamic, sub-microsecond hardware telemetry, XProf Kernel provides the deep transparency required to optimize high-scale ML workloads.
Deep visibility: HLO Graphs & MLIR Inspection
The first step in debugging any custom kernel is understanding how your high-level code is translated by the compiler. When compiling a JAX or PyTorch model, the compiler generates a High-Level Optimizer (HLO) graph. Previously, custom calls inside these graphs remained completely obscured.
XProf's updated Graph Viewer resolves this by exposing the internal compilation logic of these custom regions directly. To unlock this deep visibility, developers must pass the appropriate debug flags to the XLA compilation environment:
--xla_enable_custom_call_region_trace=true --xla_xprof_register_llo_debug_info=true
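One common way to hand flags to XLA from a Python program is the `XLA_FLAGS` environment variable. The sketch below assumes that this variable is the right channel for these two flags in your setup, which the text above does not confirm; some TPU runtimes take flags through other mechanisms. The helper name `with_kernel_debug_flags` is hypothetical.

```python
import os

# The two flags quoted in the text above.
KERNEL_DEBUG_FLAGS = (
    "--xla_enable_custom_call_region_trace=true",
    "--xla_xprof_register_llo_debug_info=true",
)


def with_kernel_debug_flags(existing: str = "") -> str:
    """Append the debug flags to an existing flag string, skipping duplicates."""
    flags = existing.split()
    for flag in KERNEL_DEBUG_FLAGS:
        if flag not in flags:
            flags.append(flag)
    return " ".join(flags)


# Assumption: set this before JAX initializes its backend so the flags are seen.
os.environ["XLA_FLAGS"] = with_kernel_debug_flags(os.environ.get("XLA_FLAGS", ""))
```

Preserving any flags that are already set avoids silently dropping configuration supplied by a launcher script.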
Once these flags are active, any trace captured via XProf includes comprehensive compiler metadata. In the XProf Graph Viewer, clicking on a custom-call block reveals an interactive panel titled "Custom Call Text." This displays the raw, lowered MLIR (Multi-Level Intermediate Representation) code generated by the compiler.
Figure 1: XProf interface displaying an HLO graph, with a "Custom Call Text" panel to reveal raw MLIR code
By displaying the MLIR text side-by-side with high-level source-code representations, developers can immediately verify whether the compiler is correctly fusing operations and structuring memory tiles as intended.
To provide cycle-level execution visibility, XProf exposes Low-Level Operations (LLO) bundle data directly inside the Trace Viewer. An LLO bundle represents the actual machine instructions issued to the TPU core's functional units during every clock cycle.
Through dynamic instrumentation, XProf inserts hardware markers exactly when an LLO bundle region executes. Within the Trace Viewer, this manifests as dedicated, time-aligned execution tracks representing the TPU bundle's slot utilization metrics from static analysis:
While static analysis effectively verifies instruction counts or vector store logic, it remains detached from the dynamic realities of runtime execution. To bridge this gap, XProf introduces fine-grained, periodic performance counter sampling—available starting with TPU v7 (Ironwood). This capability empowers developers to move beyond static estimation and measure precisely how hardware blocks are utilized in real-time, providing the empirical ground truth needed to identify whether compute units are truly active or stalled by memory subsystems.
Consider the optimization of a tiled matrix multiplication (Matmul) kernel. While a static trace might indicate a logically perfect sequence of operations, real-world performance often falters if the Matrix Multiply Unit (MXU) sits idle while awaiting data from High-Bandwidth Memory (HBM). To diagnose and resolve such bottlenecks, developers can utilize a structured three-step profiling workflow:
Set up the Profiling Environment: Configure the TPU v7 (Ironwood) runtime by defining specific hardware counters—such as scalar issues or synchronization waits.
Capture a Kernel Profile: Use the XProf request interface to capture fine-grained performance counters, which can then be visualized as a time-series within the Trace Viewer.
Interpret the Data: Analyze the resulting counters to distinguish between a Memory-Bound Scenario (characterized by massive spikes in sync_wait) and an Optimized Scenario. For instance, implementing triple buffering to overlap memory loads with MXU compute can reduce runtime from 125.5µs to 88µs—a ~30% performance gain validated by a drastic reduction in synchronization events.
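As a quick check of the figures quoted in the last step, the arithmetic can be written out. The two runtimes come from the text; the variable names are only illustrative. The "~30%" figure is a reduction in runtime, which corresponds to a somewhat larger speedup factor.

```python
baseline_us = 125.5   # runtime before triple buffering (from the text)
optimized_us = 88.0   # runtime after triple buffering (from the text)

# Share of the original runtime that was eliminated.
time_saved_fraction = (baseline_us - optimized_us) / baseline_us
# How many times faster the optimized kernel runs.
speedup = baseline_us / optimized_us

print(f"runtime reduction: {time_saved_fraction:.1%}")  # 29.9%
print(f"speedup: {speedup:.2f}x")                       # 1.43x
```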
Shifting from static code inspection to empirical runtime telemetry lets hardware behavior validate optimization strategies directly, so that every cycle on the silicon is spent productively. For a hands-on example of these techniques, explore our Pallas Matmul w/ Perf Counters demo.
Figure 3: XProf timeline highlighting a comparison between a detailed "Runtime Perf Counter" section sampling at a 1-microsecond frequency and a "Static LLO Region" track below it
Visualizing the "Utilization Gap"
This dynamic tracking exposes the significant gap left by traditional static analysis tools. A static tool analyzes instructions linearly, completely ignoring time. It might flag an MXU instruction block as "100% Utilized."
In contrast, XProf plots actual hardware execution over time. You might discover that a long-running Scalar ALU operation is stalling the entire execution pipeline, leaving the powerful MXU completely idle. By visualizing these temporal idle gaps, developers can adjust data shapes, memory alignments, and instruction sequencing to maximize compute density.
Figure 4: The UI shows the active TPU Core functional unit tracks (MXU, Scalar ALU, Vector ALU, and memory data pipelines) aligned side-by-side with the active framework Ops, exposing exact execution times and real-time idle cycles.
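To make the utilization gap concrete, here is a minimal sketch that computes how much of a time window a functional unit was actually busy. The interval data is invented for illustration and `busy_fraction` is a hypothetical helper, not an XProf API.

```python
def busy_fraction(busy_intervals, window_start, window_end):
    """Fraction of [window_start, window_end) covered by busy intervals.

    Intervals are (start, end) pairs in the same time unit. Overlapping
    intervals are merged so that no time is counted twice.
    """
    covered = 0.0
    cursor = window_start
    for start, end in sorted(busy_intervals):
        start = max(start, cursor)
        end = min(end, window_end)
        if end > start:
            covered += end - start
            cursor = end
    return covered / (window_end - window_start)


# Hypothetical 10 µs window: the MXU runs only after a long scalar operation.
scalar_alu_busy = [(0.0, 7.0)]
mxu_busy = [(7.0, 9.0)]
print(busy_fraction(mxu_busy, 0.0, 10.0))         # 0.2
print(busy_fraction(scalar_alu_busy, 0.0, 10.0))  # 0.7
```

A tool that ignores time would see only that every MXU instruction in this block was issued. The time-based view shows the unit idle for 80% of the window.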
Overall Utilization from Performance Counters
Navigating profiling metrics can be daunting. Relying on metrics calculated via compile-time cost models often misrepresents performance when applied to custom compilation paths. To solve this, XProf establishes a clear Hierarchy of Trust:
The Absolute Ground Truth (100% Trustworthy): Metrics derived directly from physical hardware registers (HBM utilization, TPO metrics, unprivileged hardware stats). When profiling custom kernels, these represent physical reality and should be your primary optimization anchors.
Estimated Metrics (Use with Caution): Metrics like "Compared to program optimal FLOPS" or "Goodput efficiency" rely on XLA cost models. Because custom compilation paths bypass standard passes, these metrics can be highly skewed or outright non-functional.
For the unvarnished truth, XProf exposes the Perf Counters View, providing direct, tabular access to over 16,000 raw hardware counters read straight from the TPU silicon.
Figure 6: XProf Perf Counters Tabular View
Understanding Trace Tracks: The height of a trace track does not represent a normalized 0-100% value. It represents the maximum raw counter value observed in that interval. For example, if a counter increments by 100 cycles over a 500-nanosecond trace window (1,000 clock cycles on a 2.0 GHz core), it indicates 10% physical utilization of that unit.
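The conversion in that example can be sketched as a small function. The name `counter_utilization` and its parameters are illustrative. The clock frequency is an input you would need to supply for your own hardware.

```python
def counter_utilization(counter_delta, window_ns, clock_ghz):
    """Utilization = counted cycles / total cycles elapsed in the window."""
    total_cycles = window_ns * clock_ghz  # nanoseconds * cycles per nanosecond
    return counter_delta / total_cycles


# The example from the text: 100 counted cycles in 500 ns on a 2.0 GHz core.
print(counter_utilization(counter_delta=100, window_ns=500, clock_ghz=2.0))  # 0.1
```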
To configure and profile the runtime performance counters sampling method, please follow the instructions from <openxla.org/xprof/kernel-profiling.html>.
Advanced Sampling: Event-Triggered Profiling
Previously, dynamic capturing was limited to Periodic Sampling Mode—polling counters based on a host-level timer, which hit a physical resolution floor of 1 microsecond.
To capture lightning-fast hardware cycles, XProf now supports External Event-Triggered Mode. The dynamic sampler intercepts physical TPU trace instructions and boundary triggers (such as entering/exiting custom call scopes), allowing for sub-microsecond capture latency and precise attribution.
Developers can configure up to 28 hardware counters per core, distributed across up to four active SparseCores, creating a 4 x 28 profiling matrix that maximizes data variety while protecting workload performance.
Activating this is straightforward via the standard JAX profiler options:
import jax

options = jax.profiler.ProfileOptions()
# Example request for externally triggered collection
options.advanced_configuration = {
    "tpu_enable_periodic_counter_sampling": True,
    "tpu_tc_perf_counter_sampling_options": (
        "is_external_trigger:true scaling:0 counter_size_bits:1 "
        "indices:10 indices:11 indices:56 indices:57 indices:58"
    ),
}
# For periodic sampling, please use interval_us instead of is_external_trigger.
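Because the sampling options are passed as one space-separated string, a small helper can make the two modes easier to switch between. This is a sketch only: `sampling_options` is a hypothetical helper, and the `interval_us:<value>` form used for periodic mode is an assumption inferred from the comment above rather than a documented format.

```python
def sampling_options(indices, interval_us=None, scaling=0, counter_size_bits=1):
    """Build the space-separated key:value string shown in the example above.

    With interval_us=None the request is externally triggered. Otherwise the
    (assumed) interval_us field replaces is_external_trigger.
    """
    if interval_us is None:
        mode = "is_external_trigger:true"
    else:
        mode = f"interval_us:{interval_us}"  # assumed field format
    fields = [mode, f"scaling:{scaling}", f"counter_size_bits:{counter_size_bits}"]
    fields += [f"indices:{index}" for index in indices]
    return " ".join(fields)


counter_indices = [10, 11, 56, 57, 58]  # the indices used in the example above
external = sampling_options(counter_indices)
periodic = sampling_options(counter_indices, interval_us=1)
```

With the default arguments, `external` reproduces the exact string from the example, so it can be assigned to `tpu_tc_perf_counter_sampling_options` unchanged.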
Getting Started
Ready to transition from guessing performance to measuring and optimizing the physical limits of your ML silicon? Explore these open-source resources to get started with XProf Kernel today:
Building on our October launch, Gemini in Google Classroom can now help educators more easily convert rubric files and images into Google Classroom rubrics, right within the assignment creation workflow. Educators can now upload more file types, such as .jpeg and .png files. For example, by uploading a photo of a physical rubric or using existing files, Gemini in Classroom can help educators quickly generate structured, interactive rubrics within the Classroom interface. They can then make edits to the converted rubric before saving it. This Gemini-powered automation reduces manual data entry and helps educators maintain consistent grading standards across their assignments.
With this launch, rubric conversion will be controlled by the Gemini in Classroom setting in the Admin console. If Gemini in Classroom is disabled for your organization, you’ll no longer be able to convert rubrics from documents or images.
This feature is only available in English for users over age 18.
Getting started
Admins: This feature will be available by default if Gemini in Classroom is enabled. Visit the Help Center to learn more about managing access to Gemini in Classroom.
Google Drive is introducing alignment approvals, a lightweight mechanism that allows teams to request and record document sign-offs without file changes resetting the approval flow. When a document is in a partially approved state, collaborators can continue making edits without resetting any recorded approver decisions.
Alignment approvals can be initiated via a new checkbox within the standard request dialog across web clients.
When checked, "Require all approvers to review the same content" resets pending approvals if the file content changes. This is the default behavior.
When unchecked, changes to the file content don't reset pending approvals.
This feature serves use cases where strict content locking is unnecessary, making it easier for teams to maintain momentum on fluid projects. While an approval remains pending, individual approvers retain the flexibility to manually reset their approved status back to pending if subsequent content edits no longer match their expectations.
Admins: Approval requests are enabled by default and can be disabled at the domain, OU, and group level. There is no admin setting that controls alignment approvals specifically; users can access them if they have access to the broader approval request feature. Visit the Help Center to learn more about managing Drive approvals.
Posted by Alice Yuan, Developer Relations Engineer at Google, Arti Arutiunov, Product Manager at Datadog and Nikita Ogorodnikov, Staff Software Engineer at Datadog
Performance regressions are notoriously hard to reproduce, which makes them a major bottleneck for mobile developers. Although signals like ANR rates indicate which issues occur in production, pinpointing the specific line of code behind a performance issue has historically required exhaustive manual reproduction or speculative trial and error.
Datadog collaborated with Google to mitigate this frustration by integrating the ProfilingManager API (available on Android 15+ devices) into its Real User Monitoring (RUM) and Continuous Profiling platforms. This integration transforms the debugging workflow, allowing developers to move beyond surface-level symptoms and uncover the why behind a performance bottleneck.
By leveraging this system-level API, Datadog now processes millions of production profiles weekly across the globe, according to Datadog internal data as of June 2026. It provides engineering teams with a new level of visibility into real-world performance, all while maintaining a low runtime overhead for production-scale performance monitoring.
The impact of ProfilingManager
ProfilingManager is a system service introduced in Android 15 that enables apps to programmatically collect performance data such as call stack samples, field traces and memory heap dumps directly from production environments. This capability shifts the engineering paradigm from reactive manual reproduction to proactive field analysis.
For example, a Google communications app used field traces to investigate why its cold start times were slower on newer, more powerful hardware. By diving into the field-collected traces and comparing traces across different device types, the engineer discovered a hidden scheduling issue: a background text-to-speech service was unnecessarily being prewarmed during app startup. The traces revealed that this background process was monopolizing the device's highest-performing big CPU core, forcing the app's main thread to sleep while the prewarm occurred.
Solving the Android code-level visibility challenge
Prior to the implementation of ProfilingManager, Datadog’s Real User Monitoring (RUM) focused on high-level application health and session-level telemetry to assess the user journey. Engineering teams could monitor Android performance signals like time to initial display, ANR rates, CPU load, and frozen frames. These insights extended to granular interactions, such as network latency, touch events, and main thread hangs. However, while this data effectively highlighted which performance bottlenecks were surfacing in the field, it provided no clear path to identifying the root cause of these failures.
To address this, Datadog needed a profiling engine capable of capturing Android traces directly from devices in production with minimal performance impact. After evaluating alternative approaches, such as writing their own trace processor using Android Debug APIs, the team selected ProfilingManager because it was the most performant of the profiling options they evaluated and because it offloads the overhead of sampling decisions to the OS.
ProfilingManager supports a wide range of collection methods, including CPU traces, call stack sampling, and memory analysis through Java heap dumps and native heap profiles. It enables developers to profile production builds, upload trace files to external storage, and review them in the Perfetto trace analyzer UI. As a SaaS provider, Datadog uploads, visualizes, and analyzes these profiles collected via its SDK, providing a unified view of application health.
By centralizing high-fidelity telemetry within a unified observability API, ProfilingManager empowers Datadog and its clients to proactively monitor, investigate, and remediate complex Android performance regressions through key technical advantages:
Granular session diagnostics: ProfilingManager enhances debuggability by delivering direct OS-level trace data, overcoming the visibility and alignment challenges typical of custom logging with system services. To dive deeper, developers can download these traces from Datadog to investigate further in visualization tools like the Perfetto UI.
Automated telemetry triggers: By leveraging native system events to initiate trace recordings at key optimization points, Datadog reduces the need to build custom collection logic. While the initial rollout focuses on the APP_FULLY_DRAWN signal, there are already plans to expand this observability to include ANR, OOM, and COLD_START triggers.
Proactive trace snapshots: By interfacing directly with the system-level Perfetto service (traced), ProfilingManager utilizes a proactive background recording model designed to capture unpredictable issues. This ensures that developers receive a precise visualization of the events leading up to a performance anomaly, offering a level of insight that exceeds what is possible through manual instrumentation.
Bottleneck detection at scale: Datadog can synthesize telemetry from across its global customer base to uncover regressions that only emerge under unique hardware configurations and variable network environments.
System-enforced resource stability: The API relies on sampling-based trace collection to keep the impact on performance and user experience unnoticeable.
On-device data controls: ProfilingManager filters out irrelevant information from other processes on-device before the profile is delivered to the app. This minimizes file sizes and ensures that only data relevant to the app's processes is provided.
Processing millions of weekly profiles to optimize real-world apps
An example of Datadog's time to initial display measurement with stack sampling powered by ProfilingManager
Integrating a system-level profiling API into a global monitoring SDK required solving infrastructure challenges. Because ProfilingManager generates highly detailed performance traces, the Datadog engineering team had to build a pipeline capable of parsing and analyzing these profiles on the server side at scale. Beyond profile collection, Datadog also emphasizes the importance of balancing sampling frequency with collecting enough data to generate meaningful insights about your application. Datadog relies on ProfilingManager’s built-in rate limiting as a critical stability safeguard, preventing excessive telemetry requests from overburdening user devices.
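One generic way to balance sampling frequency against data volume is to profile a fixed fraction of sessions, chosen deterministically from a session identifier. The sketch below illustrates that general technique only. It is not Datadog's implementation and is unrelated to ProfilingManager's built-in rate limiting. The `should_profile` name, the session ID format, and the 10% rate are hypothetical.

```python
import hashlib


def should_profile(session_id: str, sample_rate: float) -> bool:
    """Deterministically select roughly `sample_rate` of sessions for profiling."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate


# Hypothetical usage: profile about 10% of sessions.
selected = [sid for sid in (f"session-{n}" for n in range(10_000))
            if should_profile(sid, 0.10)]
```

Hashing the identifier rather than drawing a random number means a given session is either always or never profiled, which keeps the decision stable across app restarts within that session.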
The team has been profiling Datadog's own native Android application and a number of early adopters’ applications for months, gathering millions of profiles to ensure a fast, error-free launch experience and to refine their performance-detection algorithms. Today, the production integration seamlessly scales across a variety of Android devices.
Conclusion
By integrating Android’s ProfilingManager API, Datadog successfully closed the visibility gap between backend systems and mobile client applications for their customers. By processing millions of profiles weekly with negligible device overhead, Datadog equips Android developers with the code-level insights necessary to diagnose complex performance bugs instantly, helping developers build smoother applications and improve their app’s performance signals in the Play Store. To adopt the ProfilingManager API directly into your performance observability framework, check out our documentation.
In the future, Datadog aims to make Android profiling data a first-class input for coding agents to autonomously resolve performance bottlenecks, closing the feedback loop between detection and remediation. Datadog is working toward making Android profiling broadly accessible to developers.
Google has announced the Google Colab Command-Line Interface (CLI), a new tool that allows developers and AI agents to connect local terminals to remote Colab runtimes for frictionless execution. The lightweight CLI enables users to easily request high-powered GPUs, run local Python scripts remotely, and seamlessly retrieve artifact logs or models like fine-tuned Gemma 3 adapters. By integrating directly into standard terminal environments, the tool is highly programmable and ready to be used by AI agents such as Antigravity or Claude Code to manage complex machine learning pipelines.