Turn access to Bard on or off for your users

What’s changing 

Earlier this year, we announced Bard, an early experiment by Google that lets you collaborate with generative AI. As a creative and helpful collaborator, Bard can supercharge your imagination, boost your productivity, and help you bring your ideas to life. 


In the coming days, Google Workspace admins will be able to turn access to Bard on for their users, in the Admin console under Apps > Additional Google services > Early Access Apps.


Note: At first, Bard will be the only service managed by the Early Access Apps control. Over time, we may add other services under this control.

Turning Early Access Apps on or off in the Admin Console




Who’s impacted 

Admins and end users 


Why it’s important 

When it was introduced, Bard was available to users with personal Google Accounts. Beginning today, Workspace admins will have the option to open up access to Bard for their end users through the newly introduced Early Access Apps control. Note that Bard is separate from the AI-powered innovations we’re working on for Google Workspace — stay tuned to the Workspace Updates blog for more information about Workspace-specific features.


Additional details 

Regional Availability 
The admin control to enable or disable access to Bard will be available for all Google Workspace customers, even if Bard isn’t available in your country yet. This means that even if Early Access Apps are turned on for your organization, users located in countries where Bard is not available will not be able to access the service.


Getting started


  • Admins
    • This feature will be OFF by default and can be enabled at domain, OU, or group level. Visit the Help Center to learn more about turning Early Access Apps on or off for users.
    • Access to Google Workspace core services by Early Access Apps is off by default
    • Note: Services and features managed by the Early Access Apps control are Additional Services and are not covered under your Google Workspace agreement, Data Processing Agreement, or HIPAA Business Associate Addendum. When you turn on the Early Access Apps control, your organization’s end users will gain access to new and experimental features and services — visit our Help Center for more details.

  • End users: If enabled by your admin, visit the Help Center to learn more about using Bard.


Rollout pace


Availability

  • Available to all Google Workspace customers, as well as legacy G Suite Basic and Business customers
  • Not available for any Google Workspace for Education accounts designated as under 18. See this article in our Help Center for more information.

Resources


Extended Stable Channel Update for Desktop

The Extended Stable channel has been updated to 112.0.5615.183 for Windows and Mac, which will roll out over the coming days/weeks.

A full list of changes in this build is available in the log. Interested in switching release channels? Find out how here. If you find a new issue, please let us know by filing a bug. The community help forum is also a great place to reach out for help or learn about common issues.


Srinivas Sista
Google Chrome

Making authentication faster than ever: passkeys vs. passwords




In recognition of World Password Day 2023, Google announced its next step toward a passwordless future: passkeys. 



Passkeys are a new, passwordless authentication method that offers a convenient authentication experience for sites and apps, using just a fingerprint, face scan, or other screen lock. They are designed to enhance online security for users. Because they are based on the public key cryptographic protocols that underpin security keys, they are resistant to phishing and other online attacks, making them more secure than SMS, app-based one-time passwords, and other forms of multi-factor authentication (MFA). And since passkeys are standardized, a single implementation enables a passwordless experience across browsers and operating systems. 



Passkeys can be used in two different ways: on the same device or from a different device. For example, if you need to sign in to a website on an Android device and you have a passkey stored on that same device, then using it only involves unlocking the phone. On the other hand, if you need to sign in to that website on the Chrome browser on your computer, you simply scan a QR code to connect the phone and computer to use the passkey.



The technology behind the former (“same-device passkey”) is not new: it was originally developed within the FIDO Alliance and first implemented by Google in August 2019 in select flows. Google and other FIDO members have been working together on enhancing the underlying technology of passkeys over the last few years to improve their usability and convenience. The technology allows users to log in to their account using any form of device-based user verification, such as biometrics or a PIN code. A credential is registered only once on a user’s personal device, and the device then proves possession of the registered credential to the remote server by asking the user to use their device’s screen lock. 
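
To make the challenge-response idea concrete, here is a minimal, hypothetical Kotlin sketch built on the standard java.security APIs. It illustrates only the principle described above: the real FIDO2/WebAuthn protocol behind passkeys adds attestation, origin binding, and other protections, and gates the signing step behind the device's screen lock.

import java.security.KeyPairGenerator
import java.security.Signature

fun main() {
    // Registration: the device generates a key pair and shares only the public key with the server
    val keyPair = KeyPairGenerator.getInstance("EC").apply { initialize(256) }.generateKeyPair()

    // Sign-in: the server sends a random challenge and the device signs it with the private key,
    // which never leaves the device
    val challenge = "random-server-challenge".toByteArray()
    val signedChallenge = Signature.getInstance("SHA256withECDSA").run {
        initSign(keyPair.private)
        update(challenge)
        sign()
    }

    // Verification: the server checks the signature against the stored public key
    val verified = Signature.getInstance("SHA256withECDSA").run {
        initVerify(keyPair.public)
        update(challenge)
        verify(signedChallenge)
    }
    println("Challenge verified: $verified") // true, and no password or biometric data was ever transmitted
}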



The user’s biometric, or other screen lock data, is never sent to Google’s servers; it stays securely stored on the device, and only cryptographic proof that the user has correctly provided it is sent to Google. Passkeys are also created and stored on your devices and are not sent to websites or apps. If you create a passkey on one device, the Google Password Manager can make it available on your other devices that are signed into the same system account.





Learn more about how passkeys work under the hood in our Google Security Blog.






Emerging Google data shows promise for a passwordless future with passkeys


Passkeys were originally designed to provide simpler and more secure authentication experiences for users, and so far, the technology has proven to be simpler and faster than passwords. Google data (March-April 2023) shows that the percentage of users successfully authenticating through same-device passkeys is 4x higher than the success rate typically achieved with passwords: the average authentication success rate with passwords is 13.8%, while the local passkey success rate is 63.8% (see Figure 1 below). 



Passkeys are not just easier to use, but also significantly faster than passwords. On average, a user can successfully sign in with a passkey within 14.9 seconds, while it typically takes twice as long to sign in with a password (30.4 seconds, as seen in Figure 2 below). Preliminary, qualitative data collected from user research also indicates that users already perceive this convenience as the key value of passkeys.





Figure 1: authentication success rate with passkey vs password. Data from March-April 2023 (n≈100M)





Figure 2: time spent authenticating with passkey vs password (data from March-April 2023). Dashed, vertical lines indicate average duration for each authentication method (n≈100M) 





We are excited to share this data following our launch of passkeys for Google Accounts. Passkeys are faster, more secure, and more convenient than passwords and MFA, making them a desirable alternative to passwords and a promising development in the journey to a more secure future. To learn more about passkeys and how to turn a basic form-based username and password sign-in system into one that supports passkeys, check out the documentation on developers.google.com/identity/passkeys.  

Introducing rules_oci


Today, we are announcing the 1.0 General Availability version of rules_oci, an open-sourced Bazel plugin (“ruleset”) that makes it simpler and more secure to build container images with Bazel. This effort was a collaboration with Aspect and the Rules Authors Special Interest Group. In this post, we’ll explain how rules_oci differs from its predecessor, rules_docker, and describe the benefits it offers for both container image security and the container community.


Bazel and Distroless for supply chain security


Google’s popular build and test tool, known as Bazel, is gaining fast adoption within enterprises thanks to its ability to scale to the largest codebases and handle builds in almost any language. Because Bazel manages and caches dependencies by their integrity hash, it is uniquely suited to make assurances about the supply chain based on the Trust-on-First-Use principle. One way Google uses Bazel is to build widely used Distroless base images for Docker. 



Distroless is a series of minimal base images which improve supply-chain security. They restrict what's in your runtime container to precisely what's necessary for your app, which is a best practice employed by Google and other tech companies that have used containers in production for many years. Using minimal base images reduces the burden of managing risks associated with security vulnerabilities, licensing, and governance issues in the supply chain for building applications.



rules_oci vs rules_docker


Historically, building container images was supported by rules_docker, which is now in maintenance mode. The new ruleset, called rules_oci, is better suited for Distroless as well as most Bazel container builds for several reasons:


  • The Open Container Initiative standard has changed the playing field, and there are now multiple container runtimes and image formats. rules_oci is not tied to running a docker daemon already installed on the machine.

  • rules_docker was created before many excellent container manipulation tools existed, such as Crane, Skopeo, and Zot. rules_oci is able to simply rely on trusted third-party toolchains and avoid building or maintaining any Bazel-specific tools.

  • rules_oci doesn’t include any language-specific rules, which makes it much more maintainable than rules_docker. Also, it avoids the pitfalls of stale dependencies on other language rulesets.


Other benefits of rules_oci


There are other great features of rules_oci to highlight as well. For example, it uses Bazel’s downloader to fetch layers from a remote registry, improving caching and allowing transparent use of a private registry. Multi-architecture images make it more convenient to target platforms like ARM-based servers, and Windows Containers are supported as well. Code signing allows users to verify that a container image they use was created by the developer who signed it, and was not modified by any third party along the way (e.g., a person-in-the-middle attack). In combination with the work on the Bazel team’s roadmap, you’ll also get a Software Bill of Materials (SBOM) showing what went into the container you use.




Since adopting rules_oci and Bazel 6, the Distroless team has seen a number of improvements to our build processes, image outputs, and security metadata:


  • Native support for signing allows us to eliminate a race condition that could have left some images unsigned. We now sign immutable digest references to images during the build instead of tags after the build.

  • Native support for OCI indexes (multi-platform images) allowed us to remove our dependency on docker during builds. This also means more natural and debuggable failures when something goes wrong with multi-platform builds.

  • Improvements to fetching and caching mean our CI builds are faster and more reliable when using remote repositories.

  • Distroless images are now accompanied by SBOMs embedded in a signed attestation, which you can view with cosign and some jq magic:






cosign download attestation gcr.io/distroless/base:latest-amd64 | jq -rcs '.[0].payload' | base64 -d | jq -r '.predicate' | jq





In the end, rules_oci allowed us to modernize the Distroless build while also adding necessary supply chain security metadata to allow organizations to make better decisions about the images they consume.

Get started with rules_oci


Today, we’re happy to announce that rules_oci has reached version 1.0. This stability guarantee follows the semver standard and promises that future releases won’t include breaking public API changes. Aspect provides resources for using rules_oci, such as a migration guide from rules_docker. It also provides support, training, and consulting services for effectively adopting rules_oci to build containers in all languages.



If you use rules_docker today, or are considering using Bazel to build your containers, this is a great time to give rules_oci a try. You can help by filing actionable issues, contributing code, or donating to the Rules Authors SIG OpenCollective. Since the project is developed and maintained entirely as community-driven open source, your support is essential to keeping the project healthy and responsive to your needs.







Special thanks to Sahin Yort and Alex Eagle from Aspect. 


MaMMUT: A simple vision-encoder text-decoder architecture for multimodal tasks

Vision-language foundational models are built on the premise of a single pre-training followed by subsequent adaptation to multiple downstream tasks. Two main and disjoint training scenarios are popular: a CLIP-style contrastive learning and next-token prediction. Contrastive learning trains the model to predict if image-text pairs correctly match, effectively building visual and text representations for the corresponding image and text inputs, whereas next-token prediction predicts the most likely next text token in a sequence, thus learning to generate text, according to the required task. Contrastive learning enables image-text and text-image retrieval tasks, such as finding the image that best matches a certain description, and next-token learning enables text-generative tasks, such as Image Captioning and Visual Question Answering (VQA). While both approaches have demonstrated powerful results, when a model is pre-trained contrastively, it typically does not fare well on text-generative tasks and vice-versa. Furthermore, adaptation to other tasks is often done with complex or inefficient methods. For example, in order to extend a vision-language model to videos, some models need to do inference for each video frame separately. This limits the size of the videos that can be processed to only a few frames and does not fully take advantage of motion information available across frames.

Motivated by this, we present “A Simple Architecture for Joint Learning for MultiModal Tasks”, called MaMMUT, which is able to train jointly for these competing objectives and which provides a foundation for many vision-language tasks either directly or via simple adaptation. MaMMUT is a compact, 2B-parameter multimodal model that trains across contrastive, text generative, and localization-aware objectives. It consists of a single image encoder and a text decoder, which allows for a direct reuse of both components. Furthermore, a straightforward adaptation to video-text tasks requires only using the image encoder once and can handle many more frames than prior work. In line with recent language models (e.g., PaLM, GLaM, GPT3), our architecture uses a decoder-only text model and can be thought of as a simple extension of language models. While modest in size, our model outperforms the state of the art or achieves competitive performance on image-text and text-image retrieval, video question answering (VideoQA), video captioning, open-vocabulary detection, and VQA.

The MaMMUT model enables a wide range of tasks such as image-text/text-image retrieval (top left and top right), VQA (middle left), open-vocabulary detection (middle right), and VideoQA (bottom).

Decoder-only model architecture

One surprising finding is that a single language-decoder is sufficient for all these tasks, which obviates the need for both complex constructs and training procedures presented before. For example, our model (presented to the left in the figure below) consists of a single visual encoder and single text-decoder, connected via cross attention, and trains simultaneously on both contrastive and text-generative types of losses. Comparatively, prior work is either not able to handle image-text retrieval tasks, or applies only some losses to only some parts of the model. To enable multimodal tasks and fully take advantage of the decoder-only model, we need to jointly train both contrastive losses and text-generative captioning-like losses.

MaMMUT architecture (left) is a simple construct consisting of a single vision encoder and a single text decoder. Compared to other popular vision-language models — e.g., PaLI (middle) and ALBEF, CoCa (right) — it trains jointly and efficiently for multiple vision-language tasks, with both contrastive and text-generative losses, fully sharing the weights between the tasks.

Decoder two-pass learning

Decoder-only models for language learning show clear advantages in performance with smaller model size (almost half the parameters). The main challenge for applying them to multimodal settings is to unify the contrastive learning (which uses unconditional sequence-level representation) with captioning (which optimizes the likelihood of a token conditioned on the previous tokens). We propose a two-pass approach to jointly learn these two conflicting types of text representations within the decoder. During the first pass, we utilize cross attention and causal masking to learn the caption generation task — the text features can attend to the image features and predict the tokens in sequence. On the second pass, we disable the cross-attention and causal masking to learn the contrastive task. The text features will not see the image features but can attend bidirectionally to all text tokens at once to produce the final text-based representation. Completing this two-pass approach within the same decoder allows for accommodating both types of tasks that were previously hard to reconcile. While simple, we show that this model architecture is able to provide a foundation for multiple multimodal tasks.
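
Schematically, and using illustrative notation rather than the exact formulation from the paper, the two passes optimize a joint objective over image features $x$ and text tokens $y = (y_1, \ldots, y_T)$:

$$
\mathcal{L}(\theta) = \underbrace{-\sum_{t=1}^{T} \log p_\theta\!\left(y_t \mid y_{<t}, x\right)}_{\text{pass 1: captioning (causal mask, cross-attention on)}} \; + \; \lambda \, \underbrace{\mathcal{L}_{\mathrm{con}}\!\left(f_\theta(y), g_\theta(x)\right)}_{\text{pass 2: contrastive (bidirectional, cross-attention off)}}
$$

Here $f_\theta(y)$ is the pooled text representation from the second pass, $g_\theta(x)$ is the image representation from the vision encoder, $\mathcal{L}_{\mathrm{con}}$ is a CLIP-style contrastive loss computed over a batch of image-text pairs, and $\lambda$ balances the two terms.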

MaMMUT decoder-only two-pass learning enables both contrastive and generative learning paths by the same model.

Another advantage of our architecture is that, since it is trained for these disjoint tasks, it can be seamlessly applied to multiple applications such as image-text and text-image retrieval, VQA, and captioning.

Moreover, MaMMUT easily adapts to video-language tasks. Previous approaches used a vision encoder to process each frame individually, which required applying it multiple times. This is slow and restricts the number of frames the model can handle, typically to only 6–8. With MaMMUT, we use sparse video tubes for lightweight adaptation directly via the spatio-temporal information from the video. Furthermore, adapting the model to Open-Vocabulary Detection is done by simply training to detect bounding-boxes via an object-detection head.

Adaptation of the MaMMUT architecture to video tasks (left) is simple and fully reuses the model. This is done by generating a video “tubes” feature representation, similar to image patches, that are projected to lower dimensional tokens and run through the vision encoder. Unlike prior approaches (right) that need to run multiple individual images through the vision encoder, we use it only once.

Results

Our model achieves excellent zero-shot results on image-text and text-image retrieval without any adaptation, outperforming all previous state-of-the-art models. The results on VQA are competitive with state-of-the-art results, which are achieved by much larger models. The PaLI model (17B parameters) and the Flamingo model (80B) have the best performance on the VQA2.0 dataset, but MaMMUT (2B) has the same accuracy as the 15B PaLI.

MaMMUT outperforms the state of the art (SOTA) on Zero-Shot Image-Text (I2T) and Text-Image (T2I) retrieval on both MS-COCO (top) and Flickr (bottom) benchmarks.
Performance on the VQA2.0 dataset is competitive but does not outperform large models such as Flamingo-80B and PaLI-17B. Performance is evaluated in the more challenging open-ended text generation setting.

MaMMUT also outperforms the state-of-the-art on VideoQA, as shown below on the MSRVTT-QA and MSVD-QA datasets. Note that we outperform much bigger models such as Flamingo, which is specifically designed for image+video pre-training and is pre-trained with both image-text and video-text data.

MaMMUT outperforms the SOTA models on VideoQA tasks (MSRVTT-QA dataset, top, MSVD-QA dataset, bottom), outperforming much larger models, e.g., the 5B GIT2 or Flamingo, which uses 80B parameters and is pre-trained for both image-language and vision-language tasks.

Our results outperform the state of the art on open-vocabulary detection fine-tuning, as shown below.

MaMMUT open-vocabulary detection results on the LVIS dataset compared to state-of-the-art methods. We report the average precision for rare classes (APr), as previously adopted in the literature.

Key ingredients

We show that joint training of both contrastive and text-generative objectives is not an easy task, and in our ablations we find that these tasks are served better by different design choices. We see that fewer cross-attention connections are better for retrieval tasks, but more are preferred by VQA tasks. Yet, while this shows that our model’s design choices might be suboptimal for individual tasks, our model is more effective than more complex, or larger, models.

Ablation studies showing that fewer cross-attention connections (1-2) are better for retrieval tasks (top), whereas more connections favor text-generative tasks such as VQA (bottom).

Conclusion

We presented MaMMUT, a simple and compact vision-encoder language-decoder model that jointly trains a number of conflicting objectives to reconcile contrastive-like and text-generative tasks. Our model also serves as a foundation for many more vision-language tasks, achieving state-of-the-art or competitive performance on image-text and text-image retrieval, videoQA, video captioning, open-vocabulary detection and VQA. We hope it can be further used for more multimodal applications.


Acknowledgements

The work described is co-authored by: Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, and Anelia Angelova. We would like to thank Mojtaba Seyedhosseini, Vijay Vasudevan, Priya Goyal, Jiahui Yu, Zirui Wang, Yonghui Wu, Runze Li, Jie Mei, Radu Soricut, Qingqing Huang, Andy Ly, Nan Du, Yuxin Wu, Tom Duerig, Paul Natsev, Zoubin Ghahramani for their help and support.

Source: Google AI Blog


Chrome Dev for Android Update

Hi everyone! We've just released Chrome Dev 115 (115.0.5748.0) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome

Media transcoding and editing, transform and roll out!

Posted by Andrew Lewis - Software Engineer, Android Media Solutions

The creation of user-generated content is on the rise, and users are looking for more ways to personalize and add uniqueness to their creations. These creations are then shared to a vast network of devices, each with its own capabilities. The Jetpack Media3 1.0 release includes new functionality in the Transformer module for converting media files between formats, or transcoding, and applying editing operations. For example, you can trim a clip from a longer piece of media and apply effects to the video track to share over social media, or transcode media into a more efficient codec for upload to a server.

The overall goal of Transformer is to provide an easy-to-use, reliable, and performant API for transcoding and editing media, including support for customizing functionality, following the same API design principles as ExoPlayer. The library is supported on devices running Android 5.0 Lollipop (API 21) onwards and includes device-specific optimizations, giving developers a strong foundation to build on. This post gives an introduction to the new functionality and describes some of the many features we're planning for upcoming releases!


Getting Started

Most operations with Transformer will follow the same general pattern:

  1. Configure a TransformationRequest with settings like your desired output format
  2. Create a Transformer and pass it your TransformationRequest
  3. Apply additional effects and edits
  4. Attach a listener to react to completion events
  5. Start the transformation

Of course, depending on your desired transformations, you may not need every step. Here's an example of transcoding an input video to the H.265/HEVC video format and removing the audio track.

// Create a TransformationRequest and set the output format to H.265
val transformationRequest = TransformationRequest.Builder()
    .setVideoMimeType(MimeTypes.VIDEO_H265)
    .build()

// Create a Transformer
val transformer = Transformer.Builder(context)
    .setTransformationRequest(transformationRequest) // Pass in TransformationRequest
    .setRemoveAudio(true) // Remove audio track
    .addListener(transformerListener) // transformerListener is an implementation of Transformer.Listener
    .build()

// Start the transformation
val inputMediaItem = MediaItem.fromUri("path_to_input_file")
transformer.startTransformation(inputMediaItem, outputPath)

During the transformation, you can get progress updates with Transformer.getProgress. When the transformation completes, the listener is notified in its onTransformationCompleted or onTransformationError callback, and you can process the output media as needed.
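
For example, you could surface progress in your UI by polling getProgress periodically. The following is a minimal sketch assuming the Media3 1.0 Transformer API (ProgressHolder and Transformer.PROGRESS_STATE_AVAILABLE); updateProgressBar is a hypothetical helper in your own UI code, and you would stop polling once the listener reports completion or an error.

val progressHolder = ProgressHolder()
val mainHandler = Handler(Looper.getMainLooper())
mainHandler.post(object : Runnable {
    override fun run() {
        // Progress is only reported once the transformation has started and its duration is known
        if (transformer.getProgress(progressHolder) == Transformer.PROGRESS_STATE_AVAILABLE) {
            updateProgressBar(progressHolder.progress) // progress is a percentage from 0 to 100
        }
        mainHandler.postDelayed(this, /* delayMillis= */ 500) // poll roughly twice per second
    }
})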

Check out our documentation to learn about further capabilities of the Transformer APIs. You can also find details about using Transformer to accurately convert 10-bit HDR content to 8-bit SDR in the "Dealing with color washout" blog post, to ensure your video's colors remain as vibrant as possible in case your app or the device doesn't support HDR content.


Edits, effects, and extensions

Media3 includes a set of core video effects for simple edits, such as scaling, cropping, and color filters, which you can use with Transformer. For example, you can create a Presentation effect to scale the input to 480p resolution while maintaining the original aspect ratio, and apply it with setVideoEffects:

Transformer.Builder(context)
    .setVideoEffects(listOf(Presentation.createForHeight(480)))
    .build()

You can also chain multiple effects to create more complex results. This example converts the input video to grayscale and rotates it by 30 degrees:

Transformer.Builder(context)
    .setVideoEffects(listOf(
        RgbFilter.createGrayscaleFilter(),
        ScaleToFitTransformation.Builder()
            .setRotationDegrees(30f)
            .build()))
    .build()

It's also possible to extend Transformer’s functionality by implementing custom effects that build on existing ones. Here is an example of implementing MatrixTransformation, where we start zoomed in by 2 times, then zoom out gradually as the frame presentation time increases:

val zoomOutEffect = MatrixTransformation { presentationTimeUs ->
    val transformationMatrix = Matrix()
    val scale = 2 - min(1f, presentationTimeUs / 1_000_000f) // Video will zoom from 2x to 1x in the first second
    transformationMatrix.postScale(/* sx= */ scale, /* sy= */ scale)
    transformationMatrix // The calculated transformations will be applied each frame in turn
}

Transformer.Builder(context)
    .setVideoEffects(listOf(zoomOutEffect))
    .build()

Here's a screen recording that shows this effect being applied in the Transformer demo app:


For even more advanced use cases, you can wrap your own OpenGL code or other processing libraries in a custom GL texture processor and plug those into Transformer as custom effects. See the demo app for some examples of custom effects. The README also has instructions for trying a demo of MediaPipe integration with Transformer.


Coming soon

Transformer is actively under development but ready to use, so please give it a try and share your feedback! The Media3 development branch includes a sneak peek into several new features building on the 1.0 release described here, including support for tone-mapping HDR videos to SDR using OpenGL, previewing video effects using ExoPlayer.setVideoEffects, and custom audio processing. We are also working on support for editing multiple videos in more flexible compositions, with export from Transformer and playback through ExoPlayer, making Media3 an end-to-end solution for transforming media.

We hope you'll find Transformer an easy-to-use and powerful tool for implementing fantastic media editing experiences on Android! You can send us feature requests and bug reports in the Media3 GitHub issue tracker, and follow this blog to get updates on new features. Stay tuned for our upcoming talk “High quality Android media experiences” at Google I/O.

Google Summer of Code 2023 accepted contributors announced!

We are pleased to announce the Google Summer of Code (GSoC) Contributors for 2023. Over the last few weeks, our 171 mentoring organizations have read through applications, had discussions with applicants, and made the difficult decision of selecting the GSoC Contributors they will be mentoring this summer.

Some notable results from this year’s application period:
  • 43,765 applicants from 160 countries
  • 7,723 proposals submitted
  • 967 GSoC contributors accepted from 65 countries
  • Over 2,400 mentors and organization administrators

Over the next few weeks, our GSoC 2023 Contributors will be actively engaging with their new open source community and getting acclimated to their organization. Mentors will guide the GSoC Contributors through the documentation and processes used by the community, as well as help with planning their milestones and projects for the summer. This Community Bonding period helps familiarize the GSoC Contributors with the languages and tools they will need to successfully complete their projects. Coding begins May 29th and most folks will wrap up September 5th; however, for the second year in a row, GSoC Contributors can request a longer coding period and wrap up their projects by mid-November instead.

We’d like to express our appreciation to the thousands of applicants who took the time to reach out to our mentoring organizations and submit proposals. Through the experience of asking questions, researching, and writing your proposals we hope you all learned more about open source and maybe even found a community you want to contribute to outside of Google Summer of Code! We always say that communication is key, and staying connected with the community or reaching out to other organizations is a great way to set the stage for future opportunities. Open source communities are always looking for new and eager collaborators to bring fresh ideas to the table. We hope you connect with an open source community or even start your own open source project!

There are a handful of program changes to this 19th year of GSoC and we are excited to see how our GSoC Contributors and mentoring organizations take advantage of these adjustments. A big thank you to all of our mentors and organization administrators who make this program so special.

GSoC Contributors—have fun this summer and keep learning! Your mentors and community members have dozens, and in some cases hundreds, of years of combined experience. Let them share their knowledge with you and help you become awesome open source contributors!

By Perry Burnham, Associate Program Manager for the Google Open Source Programs Office