Google Workspace Updates Weekly Recap – March 17, 2023

New updates 

There are no new updates to share this week. Please see below for a recap of published announcements. 


Previous announcements

The announcements below were published on the Workspace Updates blog earlier this week. Please refer to the original blog posts for complete details.


Introducing new space manager capabilities in Google Chat
Space managers now have additional capabilities to ensure effective conversations take place in spaces: space configuration, member management, and conversation moderation. | Learn more.

External label for Google Meet participants
“External” labels will be available in Google Meet. Users will see a label in the top-left corner of their meeting screen indicating that participants who are external to the meeting host’s domain have joined the meeting. In the people panel, external participants will be denoted with the same icon. | Learn more.

Provide custom Google Meet background images for your users
Admins can now provide a set of images for the background replace feature in Google Meet. This will enable users to easily select an image that properly represents their company's specific brand and style. | Learn more

Improving your security with shorter Session Length defaults
To further improve security for our customers, we are changing the default session length to 16 hours for existing Google Cloud customers. Note that this update refers to managing user connections to Google Cloud services (e.g. Google Cloud console), not connections to Google services (e.g. Gmail on the web). | Learn more



Completed rollouts

The features below completed their rollouts to Rapid Release domains, Scheduled Release domains, or both. Please refer to the original blog post for additional details.



Vid2Seq: a pretrained visual language model for describing multi-event videos

Videos have become an increasingly important part of our daily lives, spanning fields such as entertainment, education, and communication. Understanding the content of videos, however, is a challenging task as videos often contain multiple events occurring at different time scales. For example, a video of a musher hitching up dogs to a dog sled before they all race away involves a long event (the dogs pulling the sled) and a short event (the dogs being hitched to the sled). One way to spur research in video understanding is via the task of dense video captioning, which consists of temporally localizing and describing all events in a minutes-long video. This differs from single image captioning and standard video captioning, which consists of describing short videos with a single sentence.

Dense video captioning systems have wide applications, such as making videos accessible to people with visual or auditory impairments, automatically generating chapters for videos, or improving the search of video moments in large databases. Current dense video captioning approaches, however, have several limitations — for example, they often contain highly specialized task-specific components, which make it challenging to integrate them into powerful foundation models. Furthermore, they are often trained exclusively on manually annotated datasets, which are very difficult to obtain and hence are not a scalable solution.

In this post, we introduce “Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning”, to appear at CVPR 2023. The Vid2Seq architecture augments a language model with special time tokens, allowing it to seamlessly predict event boundaries and textual descriptions in the same output sequence. In order to pre-train this unified model, we leverage unlabeled narrated videos by reformulating sentence boundaries of transcribed speech as pseudo-event boundaries, and using the transcribed speech sentences as pseudo-event captions. The resulting Vid2Seq model pre-trained on millions of narrated videos improves the state of the art on a variety of dense video captioning benchmarks including YouCook2, ViTT and ActivityNet Captions. Vid2Seq also generalizes well to the few-shot dense video captioning setting, the video paragraph captioning task, and the standard video captioning task. Finally, we have also released the code for Vid2Seq here.

Vid2Seq is a visual language model that predicts dense event captions together with their temporal grounding in a video by generating a single sequence of tokens.

A visual language model for dense video captioning

Multimodal transformer architectures have improved the state of the art on a wide range of video tasks, such as action recognition. However, it is not straightforward to adapt such an architecture to the complex task of jointly localizing and captioning events in minutes-long videos.

To achieve this, we augment a visual language model with special time tokens (like text tokens) that represent discretized timestamps in the video, similar to Pix2Seq in the spatial domain. Given visual inputs, the resulting Vid2Seq model can both take as input and generate sequences of text and time tokens. First, this enables the Vid2Seq model to understand the temporal information of the transcribed speech input, which is cast as a single sequence of tokens. Second, this allows Vid2Seq to jointly predict dense event captions and temporally ground them in the video while generating a single sequence of tokens.
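As a rough illustration of the time-token idea, here is a minimal Python sketch of how continuous timestamps could be discretized into special tokens and interleaved with captions in a single output sequence. The bin count, token names, and serialization format are assumptions for illustration only, not the exact scheme used by Vid2Seq.

```python
# Illustrative sketch (not the actual Vid2Seq tokenization): continuous
# timestamps are quantized into a fixed number of special time tokens,
# and each event is serialized as "<start> <end> caption".

NUM_TIME_BINS = 100  # hypothetical number of discrete time tokens

def time_to_token(timestamp_sec: float, video_duration_sec: float) -> str:
    """Map a continuous timestamp to one of NUM_TIME_BINS special tokens."""
    frac = min(max(timestamp_sec / video_duration_sec, 0.0), 1.0)
    bin_id = min(int(frac * NUM_TIME_BINS), NUM_TIME_BINS - 1)
    return f"<time_{bin_id}>"

def event_to_sequence(start_sec, end_sec, caption, duration_sec):
    """Serialize one event as start token, end token, then its caption."""
    return (f"{time_to_token(start_sec, duration_sec)} "
            f"{time_to_token(end_sec, duration_sec)} {caption}")

# A 120-second video with two events becomes one token sequence.
events = [(0.0, 35.0, "the dogs are hitched to the sled"),
          (35.0, 120.0, "the dogs pull the sled across the snow")]
print(" ".join(event_to_sequence(s, e, c, 120.0) for s, e, c in events))
```

Because the time tokens share a vocabulary with the text tokens, the same decoder can emit event boundaries and captions in one pass, without a specialized localization head.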

The Vid2Seq architecture includes a visual encoder and a text encoder, which encode the video frames and the transcribed speech input, respectively. The resulting encodings are then forwarded to a text decoder, which autoregressively predicts the output sequence of dense event captions together with their temporal localization in the video. The architecture is initialized with a powerful visual backbone and a strong language model.
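The data flow described above can be sketched schematically as follows. The module internals are stubbed out with placeholders, and the names and embedding size are illustrative assumptions rather than the actual Vid2Seq implementation, which initializes these components from a pretrained visual backbone and a strong language model.

```python
# Schematic sketch of the encode-then-decode flow (placeholders only,
# not the real Vid2Seq modules or shapes).
import numpy as np

def visual_encoder(frames):
    """Stand-in for a pretrained visual backbone: one embedding per frame."""
    return np.random.randn(frames.shape[0], 512)

def text_encoder(speech_tokens):
    """Stand-in for an encoder over transcribed speech plus time tokens."""
    return np.random.randn(len(speech_tokens), 512)

def text_decoder(visual_enc, speech_enc, max_len=16):
    """Stand-in for an autoregressive decoder. A real decoder would
    cross-attend to both encodings at every step; here we only show that
    the output is a single sequence of text and time tokens."""
    output = ["<bos>"]
    for _ in range(max_len):
        next_token = "<placeholder>"  # real model: sample from vocabulary
        output.append(next_token)
    return output

frames = np.zeros((32, 224, 224, 3))  # sampled video frames
speech = ["<time_0>", "<time_29>", "today", "we", "hitch", "the", "dogs"]
dense_captions = text_decoder(visual_encoder(frames), text_encoder(speech))
```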

Vid2Seq model overview: We formulate dense event captioning as a sequence-to-sequence problem, using special time tokens to allow the model to seamlessly understand and generate sequences of tokens containing both textual semantic information and temporal localization information grounding each text sentence in the video.

Large-scale pre-training on untrimmed narrated videos

Due to the dense nature of the task, the manual collection of annotations for dense video captioning is particularly expensive. Hence we pre-train the Vid2Seq model using unlabeled narrated videos, which are easily available at scale. In particular, we use the YT-Temporal-1B dataset, which includes 18 million narrated videos covering a wide range of domains.

We use transcribed speech sentences and their corresponding timestamps as supervision, which are cast as a single sequence of tokens. We pre-train Vid2Seq with a generative objective that teaches the decoder to predict the transcribed speech sequence given visual inputs only, and a denoising objective that encourages multimodal learning by requiring the model to predict masked tokens given a noisy transcribed speech sequence and visual inputs. In particular, noise is added to the speech sequence by randomly masking out spans of tokens.

Vid2Seq is pre-trained on unlabeled narrated videos with a generative objective (top) and a denoising objective (bottom).
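To make the denoising objective more concrete, here is a small sketch of the kind of span corruption it implies: random spans of the transcribed-speech sequence are replaced with sentinel mask tokens that the model must reconstruct. The masking ratio, span length, and sentinel format below are assumptions for illustration, not the paper's exact hyperparameters.

```python
# Illustrative span masking for the denoising objective (hyperparameters
# and sentinel format are assumptions, not the paper's exact settings).
import random

def mask_spans(tokens, mask_ratio=0.25, mean_span_len=3, seed=0):
    """Replace random spans with sentinel tokens until roughly mask_ratio
    of the sequence is masked. A real implementation would also avoid
    masking overlapping spans."""
    rng = random.Random(seed)
    tokens = list(tokens)
    num_to_mask = int(len(tokens) * mask_ratio)
    masked, sentinel = 0, 0
    while masked < num_to_mask:
        span = max(1, int(rng.gauss(mean_span_len, 1)))
        start = rng.randrange(0, max(1, len(tokens) - span))
        tokens[start:start + span] = [f"<mask_{sentinel}>"]
        masked += span
        sentinel += 1
    return tokens

speech = "<time_3> <time_9> the musher hitches the dogs to the sled".split()
print(mask_spans(speech))
```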

Results on downstream dense video captioning benchmarks

The resulting pre-trained Vid2Seq model can be fine-tuned on downstream tasks with a simple maximum likelihood objective using teacher forcing (i.e., predicting the next token given previous ground-truth tokens). After fine-tuning, Vid2Seq notably improves the state of the art on three standard downstream dense video captioning benchmarks (ActivityNet Captions, YouCook2 and ViTT) and two video clip captioning benchmarks (MSR-VTT, MSVD). In our paper we provide additional ablation studies, qualitative results, as well as results in the few-shot settings and in the video paragraph captioning task.
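For readers unfamiliar with teacher forcing, the toy sketch below shows the maximum likelihood objective at the level of a single sequence: the decoder is conditioned on the ground-truth prefix at every step and penalized by the negative log-probability of the next ground-truth token. The model_next_token_probs function is a hypothetical stand-in for the real decoder.

```python
# Toy sketch of teacher forcing with a maximum likelihood objective.
# model_next_token_probs is a hypothetical stand-in for the decoder.
import math

def model_next_token_probs(ground_truth_prefix):
    """Hypothetical decoder step: returns token -> probability."""
    return {"<time_5>": 0.6, "the": 0.3, "sled": 0.1}

def teacher_forcing_nll(target_tokens):
    """Negative log-likelihood of the target sequence under teacher forcing:
    each step conditions on the ground-truth prefix, not on model samples."""
    nll = 0.0
    for i, gold in enumerate(target_tokens):
        probs = model_next_token_probs(target_tokens[:i])
        nll -= math.log(probs.get(gold, 1e-9))
    return nll

print(teacher_forcing_nll(["<time_5>", "the", "sled"]))
```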

Comparison to state-of-the-art methods for dense video captioning (left) and for video clip captioning (right), on the CIDEr metric (higher is better).

Conclusion

We introduce Vid2Seq, a novel visual language model for dense video captioning that simply predicts all event boundaries and captions as a single sequence of tokens. Vid2Seq can be effectively pretrained on unlabeled narrated videos at scale, and achieves state-of-the-art results on various downstream dense video captioning benchmarks. Learn more from the paper and grab the code here.


Acknowledgements

This research was conducted by Antoine Yang, Arsha Nagrani, Paul Hongsuck Seo, Antoine Miech, Jordi Pont-Tuset, Ivan Laptev, Josef Sivic and Cordelia Schmid.

Source: Google AI Blog


Chrome Dev for Android Update

Hi everyone! We've just released Chrome Dev 113 (113.0.5651.0) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Erhu Akpobaro
Google Chrome

Dev Channel Update for Desktop

The dev channel has been updated to 113.0.5653.0 for Windows, Linux and Mac.


A partial list of changes is available in the log. Interested in switching release channels? Find out how. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.

Prudhvikumar Bommana
Google Chrome

Beta Channel Update for ChromeOS and Flex

The Beta channel is being updated to ChromeOS version: 15359.24.0 and Browser version: 112.0.5615.29 for most devices. This build contains a number of bug fixes and security updates.

If you find new issues, please let us know in one of the following ways:

  1. File a bug
  2. Visit our ChromeOS communities
    1. General: Chromebook Help Community
    2. Beta Specific: ChromeOS Beta Help Community
  3. Report an issue or send feedback on Chrome

Interested in switching channels? Find out how.


Google ChromeOS

Build your first AppSheet app: how I built a food tracker

Posted by Filipe Gracio, PhD - Customer Engineer

I keep forgetting what I have in the freezer. At first I used Google Sheets to keep track of it, but I wanted something that was easy to consult and update on my smartphone. So I turned to AppSheet! Here’s a tutorial you can follow to build a similar tracking solution.

Creating the database

First I created a database that imported my data from the Sheet:

A cropped screen shot illustrating creating a database in AppSheet by importing data from sheets

After I selected “Import from Sheets” and chose the sheet I had been cumbersomely maintaining, I got a preview of the new database:

A cropped screen shot illustrating creating a database in AppSheet by importing data from sheets

Creating the App

Then I can go back and create an App:

A cropped screen shot illustrating creating an app

After I name it, I can select the database I just created.

A cropped screen shot illustrating step 1 of selecting the database

Then:

A cropped screen shot illustrating step 2 of selecting the database

The App now starts getting created, and then I can start customizing it!

Customizing the App

I decided I want to actually add more information to the App. For example, I want to categorize my items, so I need another column. I can edit the data for this and I'll add a column “Category”.

A cropped screen shot illustrating editing the data

After adding the extra column, this is the result:

A cropped screen shot showing the data with the new column added

That’s going to come in handy later for presentation and organization!

Now let's configure how the items are presented in the actual app. That’s done in the UX section of the App builder. I select “Table”, group by “Category”, and then sort alphabetically by “Item”:

A cropped screen shot showing the Primary views in the UX section of the App builder

After tweaking a few more options in the UX “Brand” and “Format Rules” sections, this is how my app looks:

A screen shot of the app on a mobile device displaying with content from the original dataset

Using the App: adding and updating items

Now, I can see what I have in the freezer at all times. If I cook something and have a leftover, I can just add it by clicking the + button. After that, I just need to add in the info:

A screen shot illustrating the functionality of the app on a mobile device

And of course, if I use something I can just tap on it to edit the amount (or delete it).

Try it yourself!

This small App is something I use every week now! It is much easier than my old method, plus I learned how to use AppSheet. And this was just a simple use case, one that only scratches the surface of AppSheet’s features. If you work for an organization that has information to share and organize, this technology could be useful for you.

Try it out for yourself: you can use the complete set of AppSheet features at no cost while building one or many app prototypes. You can also invite up to 10 test users at no cost to use your apps and share feedback.

Thank you to my colleague Florian Opitz, Customer Engineer - Google Workspace + Security, for his useful edits and suggestions.

Responsible AI at Google Research: The Impact Lab


Globalized technology has the potential to create large-scale societal impact, and having a grounded research approach rooted in existing international human and civil rights standards is a critical component to assuring responsible and ethical AI development and deployment. The Impact Lab team, part of Google’s Responsible AI Team, employs a range of interdisciplinary methodologies to ensure critical and rich analysis of the potential implications of technology development. The team’s mission is to examine socioeconomic and human rights impacts of AI, publish foundational research, and incubate novel mitigations enabling machine learning (ML) practitioners to advance global equity. We study and develop scalable, rigorous, and evidence-based solutions using data analysis, human rights, and participatory frameworks.

What makes the Impact Lab’s goals unique is its multidisciplinary approach and diversity of experience, spanning both applied and academic research. Our aim is to expand the epistemic lens of Responsible AI to center the voices of historically marginalized communities and to replace ungrounded analysis of impacts with a research-based approach to understanding how differing perspectives and experiences should shape the development of technology.


What we do

In response to the accelerating complexity of ML and the increased coupling between large-scale ML and people, our team critically examines traditional assumptions of how technology impacts society to deepen our understanding of this interplay. We collaborate with academic scholars in the areas of social science and philosophy of technology and publish foundational research focusing on how ML can be helpful and useful. We also offer research support to some of our organization’s most challenging efforts, including the 1,000 Languages Initiative and ongoing work in the testing and evaluation of language and generative models. Our work gives weight to Google's AI Principles.

To that end, we:

  • Conduct foundational and exploratory research towards the goal of creating scalable socio-technical solutions
  • Create datasets and research-based frameworks to evaluate ML systems
  • Define, identify, and assess negative societal impacts of AI
  • Create responsible solutions to data collection used to build large models
  • Develop novel methodologies and approaches that support responsible deployment of ML models and systems to ensure safety, fairness, robustness, and user accountability
  • Translate external community and expert feedback into empirical insights to better understand user needs and impacts
  • Seek equitable collaboration and strive for mutually beneficial partnerships

We strive not only to reimagine existing frameworks for assessing the adverse impact of AI to answer ambitious research questions, but also to promote the importance of this work.


Current research efforts


Understanding social problems

Our motivation for providing rigorous analytical tools and approaches is to ensure that socio-technical impact and fairness are well understood in relation to cultural and historical nuances. This is quite important, as it helps develop the incentive and ability to better understand communities who experience the greatest burden and demonstrates the value of rigorous and focused analysis. Our goals are to proactively partner with external thought leaders in this problem space, reframe our existing mental models when assessing potential harms and impacts, and avoid relying on unfounded assumptions and stereotypes in ML technologies. We collaborate with researchers at Stanford, University of California Berkeley, University of Edinburgh, Mozilla Foundation, University of Michigan, Naval Postgraduate School, Data & Society, EPFL, Australian National University, and McGill University.

We examine systemic social issues and generate useful artifacts for responsible AI development.

Centering underrepresented voices

We also developed the Equitable AI Research Roundtable (EARR), a novel community-based research coalition created to establish ongoing partnerships with external nonprofit and research organization leaders who are equity experts in the fields of education, law, social justice, AI ethics, and economic development. These partnerships offer the opportunity to engage with multi-disciplinary experts on complex research questions related to how we center and understand equity using lessons from other domains. Our partners include PolicyLink; The Education Trust - West; Notley; Partnership on AI; Othering and Belonging Institute at UC Berkeley; The Michelson Institute for Intellectual Property, HBCU IP Futures Collaborative at Emory University; Center for Information Technology Research in the Interest of Society (CITRIS) at the Banatao Institute; and the Charles A. Dana Center at the University of Texas, Austin. The goals of the EARR program are to: (1) center knowledge about the experiences of historically marginalized or underrepresented groups, (2) qualitatively understand and identify potential approaches for studying social harms and their analogies within the context of technology, and (3) expand the lens of expertise and relevant knowledge as it relates to our work on responsible and safe approaches to AI development.

Through semi-structured workshops and discussions, EARR has provided critical perspectives and feedback on how to conceptualize equity and vulnerability as they relate to AI technology. We have partnered with EARR contributors on a range of topics, from generative AI and algorithmic decision making to transparency and explainability, with outputs ranging from adversarial queries to frameworks and case studies. The process of translating research insights across disciplines into technical solutions is not always easy, but this research partnership has been rewarding. We present our initial evaluation of this engagement in this paper.

EARR: Components of the ML development life cycle in which multidisciplinary knowledge is key for mitigating human biases.

Grounding in civil and human rights values

In partnership with our Civil and Human Rights Program, our research and analysis process is grounded in internationally recognized human rights frameworks and standards, including the Universal Declaration of Human Rights and the UN Guiding Principles on Business and Human Rights. Utilizing civil and human rights frameworks as a starting point allows for a context-specific approach to research that takes into account how a technology will be deployed and its community impacts. Most importantly, a rights-based approach to research enables us to prioritize conceptual and applied methods that emphasize the importance of understanding the most vulnerable users and the most salient harms to better inform day-to-day decision making, product design and long-term strategies.


Ongoing work


Social context to aid in dataset development and evaluation

We seek to employ an approach to dataset curation, model development and evaluation that is rooted in equity and that avoids expeditious but potentially risky approaches, such as utilizing incomplete data or not considering the historical and social cultural factors related to a dataset. Responsible data collection and analysis requires an additional level of careful consideration of the context in which the data are created. For example, one may see differences in outcomes across demographic variables that will be used to build models and should question the structural and system-level factors at play as some variables could ultimately be a reflection of historical, social and political factors. By using proxy data, such as race or ethnicity, gender, or zip code, we are systematically merging together the lived experiences of an entire group of diverse people and using it to train models that can recreate and maintain harmful and inaccurate character profiles of entire populations. Critical data analysis also requires a careful understanding that correlations or relationships between variables do not imply causation; the association we witness is often caused by additional multiple variables.


Relationship between social context and model outcomes

Building on this expanded and nuanced social understanding of data and dataset construction, we also approach the problem of anticipating or ameliorating the impact of ML models once they have been deployed for use in the real world. There are myriad ways in which the use of ML in various contexts — from education to health care — has exacerbated existing inequity because the developers and decision-making users of these systems lacked the relevant social understanding, historical context, and did not involve relevant stakeholders. This is a research challenge for the field of ML in general and one that is central to our team.


Globally responsible AI centering community experts

Our team also recognizes the saliency of understanding the socio-technical context globally. In line with Google’s mission to “organize the world’s information and make it universally accessible and useful”, our team is engaging in research partnerships around the world. For example, we are collaborating with the Natural Language Processing team and the Human Centered team at the Makerere Artificial Intelligence Lab in Uganda to research cultural and language nuances as they relate to language model development.


Conclusion

We continue to address the impacts of ML models deployed in the real world by conducting further socio-technical research and engaging external experts who are also part of the communities that are historically and globally disenfranchised. The Impact Lab is excited to offer an approach that contributes to the development of solutions for applied problems through the utilization of social-science, evaluation, and human rights epistemologies.


Acknowledgements

We would like to thank each member of the Impact Lab team — Jamila Smith-Loud, Andrew Smart, Jalon Hall, Darlene Neal, Amber Ebinama, and Qazi Mamunur Rashid — for all the hard work they do to ensure that ML is more responsible to its users and society across communities and around the world.

Source: Google AI Blog


How students are making an impact on mental health through technology

Posted by Laura Cincera, Program Manager Google Developer Student Clubs Europe

Mental health remains one of the most neglected areas of healthcare worldwide, with nearly 1 billion people currently living with a mental health condition that requires support. But what if there was a way to make mental health care more accessible and tailored to individual needs?

The Google Developer Student Clubs Solution Challenge aims to inspire and empower university students to tackle our most pressing challenges - like mental health. The Solution Challenge is an annual opportunity to turn visionary ideas into reality and make a real-world impact using the United Nations' 17 Sustainable Development Goals as a blueprint for action. Students from all over the world work together and apply their skills to create innovative solutions using Google technology, creativity and the power of community.

One of last year’s top Solution Challenge proposals, Xtrinsic, was a collaboration between two communities of student leaders - GDSC Freiburg in Germany and GDSC Kyiv in Ukraine. The team developed an innovative mental health research and therapy application that adapts to users' personal habits and needs, providing effective support at scale.

The team behind Xtrinsic includes Alexander Monneret, Chikordili Fabian Okeke, Emma Rein, and Vandysh Kateryna, who come from different backgrounds but share a common mission to improve mental health research and therapy.

Using a wearable device and TensorFlow, Xtrinsic helps users manage their symptoms by providing customized behavioral suggestions based on their physiological signs. It acts as an intervention tool for mental health issues such as nightmares, panic attacks, and anxiety and adapts the user's environment to their specific needs - which is essential for effective interventions. For example, if the user experiences a panic attack, the app detects the physiological signs using a smartwatch and a machine learning model, and triggers appropriate action, such as playing relaxing sounds, changing the room light to blue, or starting a guided breathing exercise. The solution was built using several Google technologies, including Android, Assistant/Actions on Google, Firebase, Flutter, Google Cloud, TensorFlow, WearOS, DialogFlow, and Google Health Services.
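As a conceptual sketch only (not Xtrinsic's actual code), the snippet below shows the general shape of such a wearable-driven intervention loop: physiological readings from a smartwatch are classified and, if distress is detected, a calming action is triggered. The thresholds, labels, and helper functions are hypothetical placeholders for the team's trained model and device integrations.

```python
# Conceptual sketch of a wearable-driven intervention loop. The thresholds,
# labels, classify_state, and trigger_action are hypothetical placeholders,
# not the Xtrinsic implementation.

def classify_state(heart_rate_bpm: float, heart_rate_variability_ms: float) -> str:
    """Placeholder for a trained classifier over smartwatch signals."""
    if heart_rate_bpm > 110 and heart_rate_variability_ms < 20:
        return "possible_panic_attack"
    return "calm"

def trigger_action(state: str) -> None:
    """Placeholder for device/assistant integrations (sounds, lights, breathing)."""
    if state == "possible_panic_attack":
        print("Playing relaxing sounds and starting a guided breathing exercise")

reading = {"heart_rate_bpm": 118.0, "heart_rate_variability_ms": 15.0}
trigger_action(classify_state(**reading))
```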

The team behind Xtrinsic is diverse. Alexander, Chikordili, Emma and Vandysh come from different backgrounds but share a passion for AI and how it can be leveraged to improve the lives of many. They all recognize the importance of raising awareness of mental health and creating a supportive culture that is free from stigma. Their personal experiences in conflict areas, such as Syria and Ukraine, inspired them to develop the application.

The Xtrinsic project for mental health research and therapy, built for the Google Developer Student Clubs Solution Challenge by GDSC Kyiv (Ukraine) and GDSC Freiburg (Germany).

Xtrinsic was recognized as one of the Top 3 winning teams in the 2022 Google Solution Challenge for its innovative approach to mental health research and therapy. The team has since supported several other social impact initiatives - helping grow the network of entrepreneurs and community leaders in Europe and beyond.


If you feel inspired to make a positive change through technology, submit your project to Solution Challenge 2023 here. And if you’re passionate about technology and are ready to use your skills to help your local community, then consider becoming a Google Developer Student Clubs Lead!

We encourage all interested university students to apply here and submit their applications as soon as possible. The applications in Europe, India, North America and MENA are currently open.

Learn more about Google Developer Student Clubs here.