For Google Meet Hardware, ‘Auto-Update Expiration (AUE)’ will now be referred to as ‘end of Meet support’

What’s changing

We’re making changes to the terminology that refers to when a Meet hardware device is no longer supported:

  • ‘End of Meet support date’ replaces Auto-Update Expiration (AUE) in the Meet hardware Help Center.
  • The Admin console will show ‘end of Meet support’ instead of ‘EOL (end of life)’ on both the device information and fleet overview pages.
  • The field name for “end of life” in the Admin console’s CSV downloads will change from “eolDate” to “endOfMeetSupportDate”.
End of Meet support as indicated in the device information page


End of Meet support as indicated in the Google Meet hardware fleet overview


Additional details

The end of Meet support date for Intel 10th generation devices has been extended from June 2028 to June 2029 to reflect their continued availability.


Getting started


Rollout pace


Availability

  • This update impacts all Google Workspace customers with Meet hardware devices. 


Resources


Prompt users to update to your latest app version

Posted by Lidia Gaymond – Product Manager, Google Play

For years, Google Play has helped users enjoy the latest versions of your app through auto-updates or in-app updates. While most users update their apps this way, some may still be stuck on outdated, unsupported, or broken versions of your app.

Today, we are introducing a new tool that will prompt these users to update, bringing them closer to the app experience you intended to deliver.

Play recovery tools allow you to prompt users running specific versions of your app to update every time they restart the app.

Image of side by side mobile device screens showing how the prompt to update may look to users
Note: Images are examples and subject to change

To use this new feature, log in to Google Play Console and head to your Releases or the App Bundle Explorer page, where you can select the app versions to which you want to deliver the prompts. Alternatively, the feature is also available via the Play Developer API, and will soon be extended to allow you to target multiple app versions at once. Please note that the version you want to deploy the prompt to needs to be built as an app bundle.

You can then narrow your targeting criteria by country or Android version (if required), with no prior integration necessary.

Currently, over 50% of users are responding to the prompts, enabling more users to get the best experience of your apps.

After prompting users to update, you can use Play Console's recovery tools to edit your update configuration, view its progress, or cancel the recovery action altogether. Learn more about the feature here and start using it today!

January 2024 update to Display & Video 360 API

Today we’re announcing the January 2024 update to the Display & Video 360 API. This update includes the following features:

More detailed information about this update can be found in the Display & Video 360 API release notes. Before using these new features, make sure to update your client library to the latest version.

If you need help with these new features, please contact us using our support contact form.

Google Workspace Updates Weekly Recap – January 26, 2024

2 New updates

Unless otherwise indicated, the features below are available to all Google Workspace customers, and are fully launched or in the process of rolling out. Rollouts should take no more than 15 business days to complete if launching to both Rapid and Scheduled Release at the same time. If not, each stage of rollout should take no more than 15 business days to complete.


Updated List / Grid toggle now available in Google Drive 
To make the List / Grid toggle more visible in Google Drive, we’ve introduced a new split-button toggle with the updated Google Material Design 3. This toggle is now available across all Drive web pages, making it easier for you to switch between viewing modes based on your preference. | The List / Grid toggle is now available to all Google Workspace customers, Google Workspace Individual subscribers, and users with personal Google accounts. | Learn more about viewing and reordering your files and folders.
new grid view in drive

Jump to where your collaborator is while editing in Google Sites
When working on content in Google Docs or Slides, you can go to wherever your collaborator is working within the document or presentation by clicking their avatar in the top right corner. This week, we’re excited to announce a similar feature in Google Sites. Now, when a Site editor clicks on a collaborator’s avatar, they will jump to the tile that collaborator is currently working on. If the collaborator is on a different page, the site editor will be taken to that page. | Rolling out to Rapid Release domains now; launch to Scheduled Release domains planned for February 12, 2024. | Available to all Google Workspace customers, Google Workspace Individual subscribers, and users with personal Google accounts. | Learn more about following a collaborator on Google Sites.



Previous announcements

The announcements below were published on the Workspace Updates blog earlier this week. Please refer to the original blog posts for complete details.


Manage reported Google Chat content from the new moderation tool in the Admin console 
We’re introducing a centralized location for reviewing and taking action on reported Google Chat content in the Admin console under Apps > Google Workspace > Moderation, alongside the email quarantine tool for Gmail. | Google Chat content reporting and moderation is available to Google Workspace Enterprise Plus, Education Standard and Education Plus customers only. | Learn more about reported Google Chat content. 

Closed caption support in Google Meet expands to an additional thirty-one languages 
We’ve expanded support for closed captioning to include additional languages. | Learn more about captions & translated captions in Google Meet. 

Improving space creation in Google Chat 
We are making two improvements to the space creation flow in Google Chat: preventing duplicate space names and providing a method to configure spaces for specific use cases. | Learn more about space creation in Google Chat.

Completed rollouts

The features below completed their rollouts to Rapid Release domains, Scheduled Release domains, or both. Please refer to the original blog posts for additional details.

Rapid Release Domains: 

Scheduled Release Domains: 
Rapid and Scheduled Release Domains: 

For a recap of announcements in the past six months, check out What’s new in Google Workspace (recent releases).  

Mixed-input matrix multiplication performance optimizations

AI-driven technologies are weaving themselves into the fabric of our daily routines, with the potential to enhance our access to knowledge and boost our overall productivity. The backbone of these applications lies in large language models (LLMs). LLMs are memory-intensive and typically require specialized hardware accelerators to efficiently deliver tens of exaflops of computing power. This blog post shows how we can start addressing the computational challenges by utilizing memory more effectively.

The bulk of an LLM’s memory and compute is consumed by weights in matrix multiplication operations. Using narrower data types reduces memory consumption. For example, storing weights in the 8-bit integer (i.e., U8 or S8) data type reduces the memory footprint by 4× relative to single-precision (F32) and 2× relative to half-precision (F16) or bfloat16 (BF16). Furthermore, previous work has shown that running LLM matrix multiplications with weights in S8 and input in F16 (preserving the higher precision of the user input) is an effective method for increasing efficiency with acceptable trade-offs in accuracy. This technique is known as weight-only quantization and requires an efficient implementation of matrix multiplication with mixed inputs, e.g., half-precision input multiplied with an 8-bit integer. Hardware accelerators, including GPUs, support a fixed set of data types, and thus mixed-input matrix multiplication requires software transformations to map to the hardware operations.
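As a concrete back-of-the-envelope example (weights only, ignoring activations and runtime buffers), the footprint is simply the parameter count times the bytes per parameter; for a 175B-parameter model:

    175 × 10^9 params × 4 bytes (F32)      ≈ 700 GB
    175 × 10^9 params × 2 bytes (F16/BF16) ≈ 350 GB
    175 × 10^9 params × 1 byte  (U8/S8)    ≈ 175 GB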

To that end, in this blog we focus on mapping mixed-input matrix multiplication onto the NVIDIA Ampere architecture. We present software techniques addressing data type conversion and layout conformance to map mixed-input matrix multiplication efficiently onto hardware-supported data types and layouts. Our results show that the overhead of additional work in software is minimal and enables performance close to the peak hardware capabilities. The software techniques described here are released in the open-source NVIDIA/CUTLASS repository.

Memory footprint of a 175B-parameter LLM with various data type formats.

The matrix-multiply-accumulate operation

Modern AI hardware accelerators such as Google’s TPU and NVIDIA’s GPU multiply matrices natively in the hardware by targeting Tensor Cores, which are specialized processing elements that accelerate matrix operations, particularly for AI workloads. In this blog, we focus on NVIDIA Ampere Tensor Cores, which provide the matrix-multiply-accumulate (mma) operation; for the rest of the blog, mma refers to the Ampere Tensor Core operation. The supported data types, shapes, and data layouts of the two input matrices (called operands) for the mma operation are fixed in hardware. This means that matrix multiplications with various data types and larger shapes are implemented in software by tiling the problem onto hardware-supported data types, shapes, and layouts.

The Tensor Core mma operation is defined by specifying two input matrices (e.g., A & B, shown below) to produce a result matrix, C. The mma operation natively supports mixed-precision. Mixed-precision Tensor Cores allow mixing input (A and B) data type with the result (C) data type. In contrast, mixed-input matrix multiplication involves mixing the input data types, and it is not supported by the hardware, so it needs to be implemented in the software.

Tensor Core operation of M-by-N-by-K on input matrix A of M-by-K and matrix B of K-by-N produces output matrix C of M-by-N.
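To make the warp-level programming model concrete, the sketch below uses the CUDA C++ WMMA API, a wrapper over the underlying mma instructions, to have one warp compute a single hardware-supported 16x16x16 tile with F16 inputs and an F32 accumulator. This is a minimal illustration of mixed-precision Tensor Core usage, not the CUTLASS code used in this work:

    #include <cuda_fp16.h>
    #include <mma.h>
    using namespace nvcuda;

    // One warp computes C (16x16, F32) += A (16x16, F16) * B (16x16, F16).
    // M = N = K = 16 is one of the natively supported Tensor Core shapes.
    // Launch with a single warp, e.g., wmma_f16_tile<<<1, 32>>>(A, B, C);
    __global__ void wmma_f16_tile(const half* A, const half* B, float* C) {
      wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
      wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
      wmma::fragment<wmma::accumulator, 16, 16, 16, float> c_frag;

      wmma::fill_fragment(c_frag, 0.0f);      // zero the accumulator
      wmma::load_matrix_sync(a_frag, A, 16);  // distribute A across the warp
      wmma::load_matrix_sync(b_frag, B, 16);  // distribute B across the warp
      wmma::mma_sync(c_frag, a_frag, b_frag, c_frag);  // Tensor Core mma
      wmma::store_matrix_sync(C, c_frag, 16, wmma::mem_row_major);
    }

Note that the fragment types fix the data types, tile shape, and register layout in the type system; this is exactly why a mixed-input product such as F16 * U8 cannot be expressed directly and must be handled in software.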

Challenges of mixed-input matrix multiplication

To simplify the discussion, we restrict it to a specific example of mixed-input matrix multiplication: F16 for the user input and U8 for the model weights (written as F16 * U8). The techniques described here work for various combinations of mixed-input data types.

A GPU programmer can access a hierarchy of memory, including global memory, shared memory, and registers, which are arranged in order of decreasing capacity but increasing speed. NVIDIA Ampere Tensor Core mma operations consume input matrices from registers. Furthermore, input and output matrices are required to conform to a layout of data within a group of 32 threads known as a warp. The supported data type and layout within a warp are fixed for an mma operation, so to implement mixed-input multiplication efficiently, it is necessary to solve the challenges of data type conversion and layout conformance in software.


Data type conversion

The mma operation requires two input matrices with the same data type. Thus, mixed-input matrix multiplication, where one of the operands is stored in U8 in global memory and the other in F16, requires a data type conversion from U8 to F16. The conversion brings both operands to F16, mapping the mixed-input matrix multiplication to hardware-supported mixed-precision Tensor Cores. Given the large number of weights, there are a large number of such conversions, and our techniques show how to reduce their latency and improve performance.


Layout conformance

The mma operation also requires the layout of the two input matrices, within the registers of a warp, to be conformant with the hardware specification. The layout for the input matrix B of U8 data type in mixed-input matrix multiplication (F16 * U8) needs to conform with the converted F16 data type. This is called layout conformance and needs to be achieved in software.

The figure below shows an mma operation consuming matrix A and matrix B from registers to produce matrix C in registers, distributed across one warp. The thread T0 is highlighted and zoomed in to show that the weight matrix B goes through data type conversion and needs layout conformance to be able to map to the hardware-supported Tensor Core operation.

The mapping of mixed-input (F32 = F16 * U8) operation in software to natively supported warp-level Tensor Cores in hardware (F32 = F16 * F16). (Original figure source Developing CUDA kernels to push Tensor Cores to the Absolute Limit on NVIDIA A100.)

Software strategies addressing challenges

A typical data type conversion involves a sequence of operations on 32-bit registers, shown below. Each rectangular block represents a register, and the adjoining text describes the operations. The entire sequence shows the conversion from 4xU8 to 2x(2xF16). The sequence involves roughly 10 operations.

NumericArrayConvertor from 4xU8 to 2x(2xF16) in 32-bit registers.
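For illustration, a straightforward version of this conversion can be written with standard CUDA half-precision intrinsics: extract each byte, convert it individually, and repack. This is a sketch of the general approach, not the CUTLASS NumericArrayConvertor itself:

    #include <cstdint>
    #include <cuda_fp16.h>

    // Naive conversion of four packed U8 values (one 32-bit register) into
    // 2x(2xF16): each byte is unpacked, converted on its own, and repacked,
    // costing roughly the ~10 operations described above.
    __device__ void convert_4xu8_to_2xf16x2(uint32_t src, __half2 out[2]) {
      unsigned b0 = (src      ) & 0xFFu;
      unsigned b1 = (src >>  8) & 0xFFu;
      unsigned b2 = (src >> 16) & 0xFFu;
      unsigned b3 = (src >> 24) & 0xFFu;
      out[0] = __halves2half2(__uint2half_rn(b0), __uint2half_rn(b1));
      out[1] = __halves2half2(__uint2half_rn(b2), __uint2half_rn(b3));
    }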

There are many ways of achieving layout conformance. Two of the existing solutions are:

  1. Narrower bitwidth shared memory loads: In this approach, threads issue narrow bitwidth memory loads moving the U8 data from shared memory to registers. This results in two 32-bit registers, with each register containing 2xF16 values (shown above for the matrix B’s thread T0). The narrower shared memory load achieves layout conformance directly into registers without needing any shuffles; however, it does not utilize the full shared memory bandwidth.
  2. Pre-processing in global memory: An alternative strategy involves rearranging the data within the global memory (one level above the shared memory in memory hierarchy), allowing wider shared memory loads. This approach maximizes the shared memory bandwidth utilization and ensures that the data is loaded in a conformant layout directly in the registers. Although the rearrangement process can be executed offline prior to the LLM deployment, ensuring no impact on the application performance, it introduces an additional, non-trivial hardware-specific pre-processing step that requires an extra program to rearrange the data. NVIDIA/FasterTransformer adopts this method to effectively address layout conformance challenges.

Optimized software strategies

To further optimize and reduce the overhead of data type conversion and layout conformance, we have implemented FastNumericArrayConvertor and FragmentShuffler, respectively.

FastNumericArrayConvertor operates on 4xU8 in 32-bit registers without unpacking individual 1xU8 values. Furthermore, it uses less expensive arithmetic operations, which reduces the number of instructions and increases the speed of the conversion.

The conversion sequence for U8-to-F16 is shown below. The operations use packed 32-bit registers, avoiding explicit unpacking and packing. FastNumericArrayConvertor uses byte permutation to rearrange the bytes of 4xU8 into two registers. Additionally, FastNumericArrayConvertor does not use expensive integer-to-floating-point conversion instructions and employs vectorized operations to obtain the packed results in two 32-bit registers containing 2x(2xF16) values. The FastNumericArrayConvertor for U8-to-F16 uses approximately six operations, a 1.6× reduction relative to the approach shown above.

FastNumericArrayConvertor utilizes permute bytes and packed arithmetic, reducing the number of instructions in the data type conversion.
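One well-known flavor of this trick (a sketch of the general idea, not the exact CUTLASS implementation) exploits the F16 bit layout: 0x6400 | b is the bit pattern of the half value 1024 + b, so two byte permutes build bias-shifted halves and two packed subtractions remove the bias:

    #include <cstdint>
    #include <cuda_fp16.h>

    // Fast U8-to-F16: __byte_perm places each source byte into the mantissa
    // of a half biased by 1024.0 (bit pattern 0x6400), and __hsub2 removes
    // the bias for two halves at once -- a handful of operations, with no
    // per-byte integer-to-float conversion instructions.
    __device__ void convert_4xu8_to_2xf16x2_fast(uint32_t src, __half2 out[2]) {
      uint32_t lo = __byte_perm(src, 0x64006400u, 0x7170);  // 0x6400|b0, 0x6400|b1
      uint32_t hi = __byte_perm(src, 0x64006400u, 0x7372);  // 0x6400|b2, 0x6400|b3
      const __half2 bias = __float2half2_rn(1024.0f);
      out[0] = __hsub2(*reinterpret_cast<__half2*>(&lo), bias);
      out[1] = __hsub2(*reinterpret_cast<__half2*>(&hi), bias);
    }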

FragmentShuffler handles the layout conformance by shuffling data in a way that allows the use of wider bitwidth load operation, increasing shared memory bandwidth utilization and reducing the total number of operations.

The NVIDIA Ampere architecture provides a load matrix instruction (ldmatrix). ldmatrix is a warp-level operation, where the 32 threads of a warp move data from shared memory to registers in the shape and layout that mma matrices A and B consume. The use of ldmatrix reduces the number of load instructions and increases memory bandwidth utilization. Since the ldmatrix instruction moves U8 data to registers, the layout after the load conforms with a U8*U8 mma operation, not with an F16*F16 mma operation. We implemented FragmentShuffler to rearrange the data within registers using shuffle (shfl.sync) operations to achieve layout conformance.
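FragmentShuffler’s exact exchange pattern is dictated by the mma fragment layouts, but the primitive it builds on is the warp shuffle, sketched below for illustration only:

    // shfl.sync exposed in CUDA C++: every lane in the warp reads `frag`
    // from lane `src_lane`, rearranging register-resident data without a
    // round trip through shared memory.
    __device__ unsigned shuffle_fragment(unsigned frag, int src_lane) {
      return __shfl_sync(0xFFFFFFFFu, frag, src_lane);
    }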

The most significant contribution of this work is to achieve layout conformance through register shuffles, avoiding offline pre-processing in global memory or narrower bitwidth shared memory loads. Furthermore, we provide implementations for FastNumericArrayConvertor covering data type conversion from U8-to-F16, S8-to-F16, U8-to-BF16, and S8-to-BF16.


Performance results

We measured the performance of eight mixed-input variants of our method (shown below in blue and red; varying the data types of matrix A and B) and two mixed-precision data types (shown in green) on an NVIDIA A100 SXM chip. The performance results are shown in FLOPS (higher is better). Notably, the first eight matrix multiplications require additional operations relative to the last two, because the mixed-precision variants directly target hardware-accelerated Tensor Core operations and do not need data type conversion or layout conformance. Even so, our approach demonstrates mixed-input matrix multiplication performance only slightly below or on par with mixed precision.

Mixed-input matrix multiplication performance on an NVIDIA A100 40GB SXM4 chip for a compute-bound matrix problem shape m=3456, n=4096, k=2048.

Acknowledgements

We would like to mention several folks who have contributed through technical brainstorming and improving the blog post, including Quentin Colombet, Jacques Pienaar, Allie Culp, Calin Cascaval, Ashish Gondimalla, Matt Walsh, Marek Kolodziej, and Aman Bhatia. We would like to thank our NVIDIA partners Rawn Henry, Pradeep Ramani, Vijay Thakkar, Haicheng Wu, Andrew Kerr, Matthew Nicely, and Vartika Singh.

Source: Google AI Blog


Long Term Support Channel Update for ChromeOS

A new LTC-120 version, 120.0.6099.235 (Platform Version: 15662.76.0), is being rolled out for most ChromeOS devices.

If you have devices in the LTC channel, they will be updated to this version. The LTS channel remains on LTS-114 until April 2nd, 2024. 

Release notes for LTC-120 can be found here.
Want to know more about Long-term Support? Click here.


Giuliana Pritchard 
Google Chrome OS

Dev Channel Update for ChromeOS/ChromeOS Flex

The Dev channel is being updated to 122.0.6261.6 (Platform version: 15753.4.0) for most ChromeOS devices. This build contains a number of bug fixes and security updates.

If you find new issues, please let us know in one of the following ways:

  1. File a bug
  2. Visit our ChromeOS communities
    1. General: Chromebook Help Community
    2. Beta Specific: ChromeOS Beta Help Community
  3. Report an issue or send feedback on Chrome

Interested in switching channels? Find out how.

Cole Brown,
Google ChromeOS

Chrome Dev for Android Update

Hi everyone! We've just released Chrome Dev 123 (123.0.6262.0) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome

Improving space creation in Google Chat

What’s changing

We are making two improvements to the space creation flow in Google Chat: preventing duplicate space names and providing a method to configure spaces for specific use cases. 

To prevent duplicate space names in Google Chat, you will no longer be able to create a space within your company’s domain if a space with that name already exists. Similar to the experience in Gmail and Google Groups, an error message will appear if the name is already taken. The same will happen if a user tries to rename a space to a name that already exists.

This feature will not immediately apply to spaces created via the Chat API. Starting June 3, 2024, spaces created via the Chat API will also be subject to this error.
prevent duplicate space names
Starting this week, you will also notice an updated user interface when creating a new space in Chat. In addition to easily determining access settings, you can now designate the space for Collaboration (useful for discussing a topic or team project) or for Announcements (helpful for facilitating one-way controlled communication such as company-wide updates). 
new space creation flow

After creating the space, an updated welcome page will appear with contextual options that give users next steps depending on the space type selected. For a Collaboration space, users get options to add members, share files, assign tasks, and create a welcome message that appears when a new member joins the space. You will also have the ability to toggle “History on” and will see contextual suggestions for eligible apps (based on popular apps in your domain) to explore in the Marketplace and add to the space. For an Announcement space, users get options to add groups, manage space permissions, and add guidelines on how to use the space.
space welcome page

Getting started 

  • Admins: There is no admin control for this feature. 
  • End users: 
    • You will receive an error message when creating or renaming a space in Google Chat with a name that already exists. Existing spaces with the same name will not be impacted.
    • Visit the Help Center to learn more about the new space creation flow.
  • Developers: Update apps that create spaces via the Chat API to ensure they handle the new error code for duplicate space names by June 3, 2024.

Rollout pace 

Duplicate space name

New space creation flow

Web: 

Mobile: 
  • This feature will be available on mobile by the end of Q1 2024 

Availability 

  • Available to all Google Workspace customers and Google Workspace Individual subscribers 

Resources