Unless otherwise indicated, the features below are available to all Google Workspace customers, and are fully launched or in the process of rolling out. Rollouts should take no more than 15 business days to complete if launching to both Rapid and Scheduled Release at the same time. If not, each stage of rollout should take no more than 15 business days to complete.
New ways to use the Google Sheets app on iOS devices
Adding ‘Admin managed apps’ category to Google Workspace Marketplace
We’re excited to announce a new featured app category in the Marketplace: Admin managed. These Enterprise apps can be installed only by a Google Workspace administrator for their organization. | Available now to all Google Workspace customers. | Learn more about Featured app categories.
Bulk select in Gmail on Android and iOS devices
We’re introducing a feature that enables you to bulk select a batch of messages in the email thread list with one tap using the Gmail app on Android and iOS devices. After tapping the select-all icon, a batch of messages will be selected, enabling you to easily perform email actions such as deleting multiple messages or marking them as “read”. | This feature is available now on Android devices and is rolling out now to Rapid Release and Scheduled Release domains on iOS devices. | Available to all Google Workspace customers and users with personal Google Accounts.
The announcements below were published on the Workspace Updates blog earlier this week. Please refer to the original blog posts for complete details.
Create shareable video presentations in Google Slides
We’re introducing slides recordings, a new Google Slides feature that lets you easily record yourself presenting, and then share the presentation with others to view when it works for them. | Available to Google Workspace Business Standard, Business Plus, Enterprise Starter, Enterprise Essentials, Enterprise Essentials Plus, Enterprise Standard, Enterprise Plus and Education Plus only. | Learn more about slides recordings.
The next evolution of automated data entry in Google Sheets
Expanding message bubbles in Google Chat to iOS devices
In September, we introduced message bubbles in Google Chat on web and Android, enabling users to more easily differentiate incoming versus outgoing messages in the Chat message stream. This week, we’re excited to announce the expansion of message bubbles to iOS devices. | Learn more about message bubbles.
Updates to the Google Drive scanner on Android & iOS devices
We’re introducing additional enhancements to the Drive scanner on Android devices, which now powers the Google Pixel camera and includes improvements to the scanner experience when capturing content. We’re also expanding the Google Drive scanner and title suggestion feature to iOS devices. | Learn more about Drive scanner.
Introducing a new homepage view in Google Drive
We’ve added a new streamlined homepage for Drive called Home that makes it easier and faster for you to find files that matter most. | Learn more about Drive home.
Google Vault now supports Google Calendar
Google Vault now supports Calendar, which means customers can take new actions around Calendar data. | Available to Google Workspace Business Plus, Enterprise Essentials, Enterprise Essentials Plus, Enterprise Standard, Enterprise Plus, Education Standard, Education Plus customers or customers with the Vault add-on license only. | Learn more about Vault supporting Calendar.
More insights to help admins troubleshoot Google Meet hardware issues
In 2022, we introduced several improvements for managing Google Meet hardware devices. These improvements included surfacing additional information about device issues, such as a description of the issue, when the issue was detected, and more. Now, we’re taking these improvements one step further by providing admins with even more data points. | Learn more about Google Meet hardware issues.
Monitor insider risk of Google Workspace data with Chronicle
Admins can now more seamlessly integrate their Google Workspace data with Chronicle (Google’s cloud-native Security Operations platform), to quickly detect, investigate and take action on risky activity and threats. | Available to Google Workspace Enterprise Standard and Enterprise Plus customers only. | Learn more about Chronicle.
Google Classroom now supports roster import from SIS partners
Educators can now easily import students from their student information system (SIS) to Google Classroom using OneRoster. This integration saves educators time and helps make class setup much quicker. | Available to Education Plus and the Teaching and Learning Upgrade only. | Learn more about roster import.
The Dev channel has been updated to 121.0.6156.3 for Windows, Mac and Linux.
A partial list of changes is available in the Git log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.
Posted by Eliya Nachmani, Research Scientist, and Michelle Tadmor Ramanovich, Software Engineer, Google Research
Speech-to-speech translation (S2ST) is a type of machine translation that converts spoken language from one language to another. This technology has the potential to break down language barriers and facilitate communication between people from different cultures and backgrounds.
Previously, we introduced Translatotron 1 and Translatotron 2, the first models able to directly translate speech between two languages. However, they were trained in supervised settings with parallel speech data. The scarcity of parallel speech data is a major challenge in this field, so much so that most public datasets are semi- or fully synthesized from text. This adds further hurdles to learning the translation and reconstruction of speech attributes that are not represented in the text and are thus not reflected in the synthesized training data.
Here we present Translatotron 3, a novel unsupervised speech-to-speech translation architecture. In Translatotron 3, we show that it is possible to learn a speech-to-speech translation task from monolingual data alone. This method opens the door not only to translation between more language pairs, but also to translation of non-textual speech attributes such as pauses, speaking rates, and speaker identity. Our method does not use any direct supervision from the target language, and we therefore believe it is the right direction for preserving paralinguistic characteristics (e.g., tone and emotion) of the source speech across translation. To enable speech-to-speech translation, we use back-translation, a technique from unsupervised machine translation (UMT) in which a synthetic translation of the source language is used to learn translation without bilingual datasets. Experimental results on speech-to-speech translation tasks between Spanish and English show that Translatotron 3 outperforms a baseline cascade system.
Translatotron 3 addresses the problem of unsupervised S2ST, which can eliminate the requirement for bilingual speech datasets. To do this, Translatotron 3’s design incorporates three key aspects:
Pre-training the entire model as a masked autoencoder with SpecAugment, a simple data augmentation method for speech recognition that operates on the logarithmic mel spectrogram of the input audio (instead of the raw audio itself) and is shown to effectively improve the generalization capabilities of the encoder.
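As an illustrative sketch (not the implementation used in the paper), SpecAugment-style masking of a log-mel spectrogram zeroes out random frequency and time bands; the mask counts and widths below are arbitrary placeholder values:

```python
import numpy as np

def spec_augment(log_mel, num_freq_masks=2, freq_width=8,
                 num_time_masks=2, time_width=10, rng=None):
    """Minimal SpecAugment sketch: zero out random frequency and time
    bands of a log-mel spectrogram of shape [time, mel_bins].
    Mask counts/widths are illustrative, not the paper's settings."""
    rng = rng if rng is not None else np.random.default_rng()
    aug = log_mel.copy()
    t, f = aug.shape
    for _ in range(num_freq_masks):
        w = int(rng.integers(0, freq_width + 1))   # random band width
        f0 = int(rng.integers(0, max(1, f - w)))   # random band start
        aug[:, f0:f0 + w] = 0.0                    # mask mel channels
    for _ in range(num_time_masks):
        w = int(rng.integers(0, time_width + 1))
        t0 = int(rng.integers(0, max(1, t - w)))
        aug[t0:t0 + w, :] = 0.0                    # mask time steps
    return aug
```

Masking whole bands (rather than adding noise) forces the encoder to rely on surrounding context, which is what improves generalization.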
Unsupervised embedding mapping based on multilingual unsupervised embeddings (MUSE), which is trained on unpaired languages but allows the model to learn an embedding space that is shared between the source and target languages.
A reconstruction loss based on back-translation, to train an encoder-decoder direct S2ST model in a fully unsupervised manner.
The model is trained using a combination of the unsupervised MUSE embedding loss, reconstruction loss, and S2S back-translation loss. During inference, the shared encoder is utilized to encode the input into a multilingual embedding space, which is subsequently decoded by the target language decoder.
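A minimal sketch of this combined objective, assuming simple L2 forms and placeholder loss weights (the exact formulations and weights are assumptions for illustration, not values from the paper):

```python
import numpy as np

def translatotron3_loss(sem_out, muse_emb, spec, spec_recon, spec_bt,
                        w_muse=1.0, w_rec=1.0, w_bt=1.0):
    """Hedged sketch of the combined training objective.
    sem_out:    semantic half of the encoder output
    muse_emb:   pre-trained MUSE embeddings of the input transcript
    spec:       input spectrogram
    spec_recon: auto-encoded reconstruction of the input
    spec_bt:    back-translated reconstruction of the input
    The L2 losses and the weights w_* are illustrative assumptions."""
    l_muse = np.mean((sem_out - muse_emb) ** 2)   # MUSE embedding loss
    l_rec = np.mean((spec - spec_recon) ** 2)     # reconstruction loss
    l_bt = np.mean((spec - spec_bt) ** 2)         # back-translation loss
    return float(w_muse * l_muse + w_rec * l_rec + w_bt * l_bt)
```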
Translatotron 3 employs a shared encoder to encode both the source and target languages. The decoder is composed of a linguistic decoder, an acoustic synthesizer (responsible for acoustic generation of the translation speech), and a singular attention module, like Translatotron 2. However, for Translatotron 3 there are two decoders, one for the source language and another for the target language. During training, we use monolingual speech-text datasets (i.e., these data are made up of speech-text pairs; they are not translations).
The encoder has the same architecture as the speech encoder in Translatotron 2. The output of the encoder is split into two parts: the first part incorporates semantic information whereas the second part incorporates acoustic information. By using the MUSE loss, the first half of the output is trained to be the MUSE embeddings of the text of the input speech spectrogram. The latter half is updated without the MUSE loss. It is important to note that the same encoder is shared between source and target languages. Furthermore, the MUSE embedding is multilingual in nature. As a result, the encoder is able to learn a multilingual embedding space across source and target languages. This allows a more efficient and effective encoding of the input, as the encoder is able to encode speech from both languages into a common embedding space, rather than maintaining a separate embedding space for each language.
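The channel-wise split described above can be sketched as follows (a hedged illustration; the exact split point and tensor layout in the model may differ):

```python
import numpy as np

def split_encoder_output(enc_out):
    """Split encoder output of shape [time, d] into two halves along
    the channel axis: the first half carries semantic content (trained
    toward pre-trained MUSE embeddings via the MUSE loss), the second
    half carries acoustic content (updated without the MUSE loss)."""
    d = enc_out.shape[-1] // 2
    semantic, acoustic = enc_out[..., :d], enc_out[..., d:]
    return semantic, acoustic
```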
Like Translatotron 2, the decoder is composed of three distinct components, namely the linguistic decoder, the acoustic synthesizer, and the attention module. To effectively handle the different properties of the source and target languages, however, Translatotron 3 has two separate decoders, for the source and target languages.
Two-part training
The training methodology consists of two parts: (1) auto-encoding with reconstruction and (2) a back-translation term. In the first part, the network is trained to auto-encode the input to a multilingual embedding space using the MUSE loss and the reconstruction loss. This phase aims to ensure that the network generates meaningful multilingual representations. In the second part, the network is further trained to translate the input spectrogram by utilizing the back-translation loss. To mitigate catastrophic forgetting and to enforce the latent space to be multilingual, the MUSE loss and the reconstruction loss are also applied in this second part of training. To ensure that the encoder learns meaningful properties of the input, rather than simply reconstructing the input, we apply SpecAugment to the encoder input in both phases; it has been shown to effectively improve the generalization capabilities of the encoder by augmenting the input data.
During the back-translation training phase (illustrated in the section below), the network is trained to translate the input spectrogram to the target language and then back to the source language. The goal of back-translation is to enforce the latent space to be multilingual. To achieve this, the following losses are applied:
MUSE loss: The MUSE loss measures the similarity between the multilingual embedding of the input spectrogram and the multilingual embedding of the back-translated spectrogram.
Reconstruction loss: The reconstruction loss measures the similarity between the input spectrogram and the back-translated spectrogram.
In addition to these losses, SpecAugment is applied to the encoder input at both phases. Before the back-translation training phase, the network is trained to auto-encode the input to a multilingual embedding space using the MUSE loss and reconstruction loss.
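The back-translation cycle and its two losses can be sketched as follows, with `encode`, `decode_tgt`, and `decode_src` as placeholder callables standing in for the shared encoder and the two language-specific decoders (names and L2 loss forms are illustrative assumptions):

```python
import numpy as np

def back_translation_losses(spec_src, encode, decode_tgt, decode_src):
    """Hedged sketch of one back-translation cycle:
    source speech -> target speech -> back to source speech."""
    z_src = encode(spec_src)        # multilingual embedding of the input
    spec_tgt = decode_tgt(z_src)    # synthetic target-language speech
    z_bt = encode(spec_tgt)         # re-encode the synthetic translation
    spec_bt = decode_src(z_bt)      # back-translate to the source language
    # MUSE loss: embeddings of input and back-translated speech agree
    l_muse = float(np.mean((z_src - z_bt) ** 2))
    # Reconstruction loss: input and back-translated spectrograms agree
    l_rec = float(np.mean((spec_src - spec_bt) ** 2))
    return l_muse, l_rec
```

With perfect (identity) encoder and decoders, both losses vanish; training pushes the real model toward that fixed point, which is what forces the latent space to be shared across languages.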
To ensure that the encoder generates multilingual representations that are meaningful for both decoders, we employ a MUSE loss during training. The MUSE loss forces the encoder to generate such a representation by using pre-trained MUSE embeddings. During the training process, given an input text transcript, we extract the corresponding MUSE embeddings from the embeddings of the input language. The error between MUSE embeddings and the output vectors of the encoder is then minimized. Note that the encoder is indifferent to the language of the input during inference due to the multilingual nature of the embeddings.
The training and inference in Translatotron 3. Training includes the reconstruction loss via the auto-encoding path and employs the reconstruction loss via back-translation.
Following are examples of direct speech-to-speech translation from Translatotron 3:
Spanish-to-English (on Conversational dataset)
TTS-synthesized reference (English)
Translatotron 3 (English)
Spanish-to-English (on CommonVoice11 Synthesized dataset)
TTS-synthesized reference (English)
Translatotron 3 (English)
Spanish-to-English (on CommonVoice11 dataset)
TTS reference (English)
Translatotron 3 (English)
To empirically evaluate the performance of the proposed approach, we conducted experiments on English and Spanish using various datasets, including the Common Voice 11 dataset, as well as two synthesized datasets derived from the Conversational and Common Voice 11 datasets.
Translation quality was measured by BLEU (higher is better) computed on ASR (automatic speech recognition) transcriptions of the translated speech, compared against the corresponding reference translation text. Speech quality was measured by MOS (higher is better), and speaker similarity by average cosine similarity (higher is better).
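For instance, the speaker-similarity metric can be computed as the mean cosine similarity between speaker embeddings of the input and translated utterances (a sketch; the speaker-embedding model that produces these vectors is not specified here):

```python
import numpy as np

def avg_cosine_similarity(src_emb, out_emb):
    """Mean cosine similarity between paired speaker embeddings,
    each of shape [num_utterances, embedding_dim]."""
    a = src_emb / np.linalg.norm(src_emb, axis=-1, keepdims=True)
    b = out_emb / np.linalg.norm(out_emb, axis=-1, keepdims=True)
    return float(np.mean(np.sum(a * b, axis=-1)))
```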
Because Translatotron 3 is an unsupervised method, as a baseline we used a cascaded S2ST system composed of ASR, unsupervised machine translation (UMT), and TTS (text-to-speech). Specifically, we employ a UMT system that uses nearest-neighbor lookup in the embedding space to create the translation.
Translatotron 3 outperforms the baseline by large margins in every aspect we measured: translation quality, speaker similarity, and speech quality. It particularly excelled on the conversational corpus. Moreover, Translatotron 3 achieves speech naturalness similar to that of the ground truth audio samples (measured by MOS, higher is better).
Translation quality (measured by BLEU, where higher is better) evaluated on three Spanish-English corpora.
Speech similarity (measured by average cosine similarity between input speaker and output speaker, where higher is better) evaluated on three Spanish-English corpora.
Mean-opinion-score (measured by average MOS metric, where higher is better) evaluated on three Spanish-English corpora.
As future work, we would like to extend the work to more languages and investigate whether zero-shot S2ST can be applied with the back-translation technique. We would also like to examine the use of back-translation with different types of speech data, such as noisy speech and low-resource languages.
The direct contributors to this work include Eliya Nachmani, Alon Levkovitch, Yifan Ding, Chulayutsh Asawaroengchai, Heiga Zhen, and Michelle Tadmor Ramanovich. We also thank Yu Zhang, Yuma Koizumi, Soroosh Mariooryad, RJ Skerry-Ryan, Neil Zeghidour, Christian Frank, Marco Tagliasacchi, Nadav Bar, Benny Schlesinger and Yonghui Wu.
Hi, everyone! We've just released Chrome 120 (120.0.6099.43) for Android to a small percentage of users. It'll become available on Google Play over the next few days. You can find more details about early Stable releases here.
This release includes stability and performance improvements. You can see a full list of the changes in the Git log. If you find a new issue, please let us know by filing a bug.
In the vibrant city of Miami, Google Fiber proudly marked a decade of connecting lives and fostering community. The Miami GFiber team recently celebrated this milestone by sponsoring a movie night at Brickell Key Properties, the very first property to get Google Fiber Webpass service in Miami in 2013.
Serving over 300 apartment and condominium communities across Miami, GFiber has offered fast, reliable connectivity for over a decade, and we’re continuing to connect new communities in the area every day.
We constantly work to make sure our customers get more than what they need from our service. "My community relies on GFiber for both excellent customer service and reliable internet, ensuring top-notch quality and speed at every stage," said Daniela Alvarez, a GFiber customer and Brickell Key resident.
Our GFiber Miami team loved marking the occasion with our customers and celebrating 10 truly great years. We can’t wait to see what’s next for GFiber in Miami. Thank you to our customers and communities who have shared this journey with us! To find out if GFiber is available in your building, check availability here.