PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations

Evolution strategy (ES) is a family of optimization techniques inspired by the ideas of natural selection: a population of candidate solutions are usually evolved over generations to better adapt to an optimization objective. ES has been applied to a variety of challenging decision making problems, such as legged locomotion, quadcopter control, and even power system control.

Compared to gradient-based reinforcement learning (RL) methods like proximal policy optimization (PPO) and soft actor-critic (SAC), ES has several advantages. First, ES directly explores in the space of controller parameters, while gradient-based methods often explore within a limited action space, which indirectly influences the controller parameters. More direct exploration has been shown to boost learning performance and enable large scale data collection with parallel computation. Second, a major challenge in RL is long-horizon credit assignment, e.g., when a robot accomplishes a task in the end, determining which actions it performed in the past were the most critical and should be assigned a greater reward. Since ES directly considers the total reward, it relieves researchers from needing to explicitly handle credit assignment. In addition, because ES does not rely on gradient information, it can naturally handle highly non-smooth objectives or controller architectures where gradient computation is non-trivial, such as meta–reinforcement learning. However, a major weakness of ES-based algorithms is their difficulty in scaling to problems that require high-dimensional sensory inputs to encode the environment dynamics, such as training robots with complex vision inputs.

In this work, we propose “PI-ARS: Accelerating Evolution-Learned Visual-Locomotion with Predictive Information Representations”, a learning algorithm that combines representation learning and ES to effectively solve high dimensional problems in a scalable way. The core idea is to leverage predictive information, a representation learning objective, to obtain a compact representation of the high-dimensional environment dynamics, and then apply Augmented Random Search (ARS), a popular ES algorithm, to transform the learned compact representation into robot actions. We tested PI-ARS on the challenging problem of visual-locomotion for legged robots. PI-ARS enables fast training of performant vision-based locomotion controllers that can traverse a variety of difficult environments. Furthermore, the controllers trained in simulated environments successfully transfer to a real quadruped robot.

PI-ARS trains reliable visual-locomotion policies that are transferable to the real world.

Predictive Information
A good representation for policy learning should be both compressive, so that ES can focus on solving a much lower dimensional problem than learning from raw observations would entail, and task-critical, so the learned controller has all the necessary information needed to learn the optimal behavior. For robotic control problems with high-dimensional input space, it is critical for the policy to understand the environment, including the dynamic information of both the robot itself and its surrounding objects.

As such, we propose an observation encoder that preserves information from the raw input observations that allows the policy to predict the future states of the environment, thus the name predictive information (PI). More specifically, we optimize the encoder such that the encoded version of what the robot has seen and planned in the past can accurately predict what the robot might see and be rewarded in the future. One mathematical tool to describe such a property is that of mutual information, which measures the amount of information we obtain about one random variable X by observing another random variable Y. In our case, X and Y would be what the robot saw and planned in the past, and what the robot sees and is rewarded in the future. Directly optimizing the mutual information objective is a challenging problem because we usually only have access to samples of the random variables, but not their underlying distributions. In this work we follow a previous approach that uses InfoNCE, a contrastive variational bound on mutual information to optimize the objective.

Left: We use representation learning to encode PI of the environment. Right: We train the representation by replaying trajectories from the replay buffer and maximize the predictability between the observation and motion plan in the past and the observation and reward in the future of the trajectory.

Predictive Information with Augmented Random Search
Next, we combine PI with Augmented Random Search (ARS), an algorithm that has shown excellent optimization performance for challenging decision-making tasks. At each iteration of ARS, it samples a population of perturbed controller parameters, evaluates their performance in the testing environment, and then computes a gradient that moves the controller towards the ones that performed better.

We use the learned compact representation from PI to connect PI and ARS, which we call PI-ARS. More specifically, ARS optimizes a controller that takes as input the learned compact representation PI and predicts appropriate robot commands to achieve the task. By optimizing a controller with smaller input space, it allows ARS to find the optimal solution more efficiently. Meanwhile, we use the data collected during ARS optimization to further improve the learned representation, which is then fed into the ARS controller in the next iteration.

An overview of the PI-ARS data flow. Our algorithm interleaves between two steps: 1) optimizing the PI objective that updates the policy, which is the weights for the neural network that extracts the learned representation; and 2) sampling new trajectories and updating the controller parameters using ARS.

Visual-Locomotion for Legged Robots
We evaluate PI-ARS on the problem of visual-locomotion for legged robots. We chose this problem for two reasons: visual-locomotion is a key bottleneck for legged robots to be applied in real-world applications, and the high-dimensional vision-input to the policy and the complex dynamics in legged robots make it an ideal test-case to demonstrate the effectiveness of the PI-ARS algorithm. A demonstration of our task setup in simulation can be seen below. Policies are first trained in simulated environments, and then transferred to hardware.

An illustration of the visual-locomotion task setup. The robot is equipped with two cameras to observe the environment (illustrated by the transparent pyramids). The observations and robot state are sent to the policy to generate a high-level motion plan, such as feet landing location and desired moving speed. The high-level motion plan is then achieved by a low-level Motion Predictive Control (MPC) controller.

Experiment Results
We first evaluate the PI-ARS algorithm on four challenging simulated tasks:

  • Uneven stepping stones: The robot needs to walk over uneven terrain while avoiding gaps.
  • Quincuncial piles: The robot needs to avoid gaps both in front and sideways.
  • Moving platforms: The robot needs to walk over stepping stones that are randomly moving horizontally or vertically. This task illustrates the flexibility of learning a vision-based policy in comparison to explicitly reconstructing the environment.
  • Indoor navigation: The robot needs to navigate to a random location while avoiding obstacles in an indoor environment.

As shown below, PI-ARS is able to significantly outperform ARS in all four tasks in terms of the total task reward it can obtain (by 30-50%).

Left: Visualization of PI-ARS policy performance in simulation. Right: Total task reward (i.e., episode return) for PI-ARS (green line) and ARS (red line). The PI-ARS algorithm significantly outperforms ARS on four challenging visual-locomotion tasks.

We further deploy the trained policies to a real Laikago robot on two tasks: random stepping stone and indoor navigation. We demonstrate that our trained policies can successfully handle real-world tasks. Notably, the success rate of the random stepping stone task improved from 40% in the prior work to 100%.

PI-ARS trained policy enables a real Laikago robot to navigate around obstacles.

Conclusion
In this work, we present a new learning algorithm, PI-ARS, that combines gradient-based representation learning with gradient-free evolutionary strategy algorithms to leverage the advantages of both. PI-ARS enjoys the effectiveness, simplicity, and parallelizability of gradient-free algorithms, while relieving a key bottleneck of ES algorithms on handling high-dimensional problems by optimizing a low-dimensional representation. We apply PI-ARS to a set of challenging visual-locomotion tasks, among which PI-ARS significantly outperforms the state of the art. Furthermore, we validate the policy learned by PI-ARS on a real quadruped robot. It enables the robot to walk over randomly-placed stepping stones and navigate in an indoor space with obstacles. Our method opens the possibility of incorporating modern large neural network models and large-scale data into the field of evolutionary strategy for robotics control.

Acknowledgements
We would like to thank our paper co-authors: Ofir Nachum, Tingnan Zhang, Sergio Guadarrama, and Jie Tan. We would also like to thank Ian Fischer and John Canny for valuable feedback.

Source: Google AI Blog


Improving the Google Chat and Gmail search experience on web and mobile

What’s changing

In order to help you find more accurate and customized search suggestions and results, we’re introducing three features that improve the Google Chat and Gmail search experience on web and mobile:
  • Search suggestions: Search-query suggestions based on your past search history in Chat will now appear as you type in the Chat search bar. This will help you quickly recall important messages, files, and more on mobile. 
  • Gmail labels: You can now search messages under a specific Gmail label in the app to return results only within that label. You can also use search chips in the Gmail search bar to refine label searches. 
 
  • Related results: For Gmail search-queries that return no results, related results will be shown to improve the overall search experience. 

Getting started 


Rollout pace 

Search suggestions: 
  • This feature is now available on Android devices
  • Rollout to iOS devices will complete by the end of October 
Gmail labels: 
  • This feature is now available on Android and iOS devices 
Related results: 
  • This feature is now available on web 

Availability 

  • Available to all Google Workspace customers, as well as legacy G Suite Basic and Business customers 
  • Available to users with personal Google Accounts 

Resources 


Roadmap 


Easily find Google Workspace Marketplace apps with enhanced search filters

What’s changing

We’re introducing enhanced search filters in the Google Workspace Marketplace to help you quickly find relevant apps. These new filters allow you to search by category, price, rating whether it’s a private app for the organization, if it works with other apps, and more. 

Getting started 

Rollout pace 

Availability 

  • Available to Google Workspace Business Starter, Business Standard, Business Plus, Enterprise Standard, Enterprise Plus, Education Fundamentals, Education Plus, and Nonprofits, as well as legacy G Suite Basic and Business customers
  • Not available to Google Workspace Essentials, Enterprise Essentials, and Frontline customers 
  • Available to users with personal Google Accounts 

Resources 

Present Google Slides directly in Google Meet

 This announcement was made at Google Cloud Next ‘22. Visit the Cloud Blog to learn more about the latest Google Workspace innovations for the ever-changing world of work. 


What’s changing

Many Google Meet users share content on their screen during meetings, and we know it’s important for presenters to actively interact with their audience. 

You will now be able to control your Slides and engage with your audience all in one screen by presenting Slides from Meet. This updated experience can help you present with greater confidence and ultimately make digital interactions feel more like when you’re physically together.

Who’s impacted 

End users 


Why you’d use it 

This feature fosters active collaboration by enabling you to see your Slides content, controls, and audience all in one place. 


Getting started 

  • Admins: There is no admin control for this feature. 
  • End users: 
    • Select ‘Present a Tab’ in Meet > choose a Google Slide presentation > manage your presentation with controls in the bottom corner of the presentation. 
    • Visit the Help Center to learn more about controlling Slides presentations in Google Meet

Rollout pace 


Availability 

  • Available to Google Workspace Business Standard, Business Plus, Enterprise Essentials, Enterprise Standard, Education Standard, Enterprise Plus, Education Plus, the Teaching and Learning Upgrade, and Nonprofits customers 
  • Not available to Google Workspace Essentials, Business Starter, Education Fundamentals, Frontline, as well as legacy G Suite Basic and Business customers 
  • Not available to users with personal Google Accounts 

Resources

Roadmap 

Announcing GUAC, a great pairing with SLSA (and SBOM)!

Supply chain security is at the fore of the industry’s collective consciousness. We’ve recently seen a significant rise in software supply chain attacks, a Log4j vulnerability of catastrophic severity and breadth, and even an Executive Order on Cybersecurity.

It is against this background that Google is seeking contributors to a new open source project called GUAC (pronounced like the dip). GUAC, or Graph for Understanding Artifact Composition, is in the early stages yet is poised to change how the industry understands software supply chains. GUAC addresses a need created by the burgeoning efforts across the ecosystem to generate software build, security, and dependency metadata. True to Google’s mission to organize and make the world’s information universally accessible and useful, GUAC is meant to democratize the availability of this security information by making it freely accessible and useful for every organization, not just those with enterprise-scale security and IT funding.

Thanks to community collaboration in groups such as OpenSSF, SLSA, SPDX, CycloneDX, and others, organizations increasingly have ready access to:

These data are useful on their own, but it’s difficult to combine and synthesize the information for a more comprehensive view. The documents are scattered across different databases and producers, are attached to different ecosystem entities, and cannot be easily aggregated to answer higher-level questions about an organization’s software assets.

To help address this issue we’ve teamed up with Kusari, Purdue University, and Citi to create GUAC, a free tool to bring together many different sources of software security metadata. We’re excited to share the project’s proof of concept, which lets you query a small dataset of software metadata including SLSA provenance, SBOMs, and OpenSSF Scorecards.

What is GUAC

Graph for Understanding Artifact Composition (GUAC) aggregates software security metadata into a high fidelity graph database—normalizing entity identities and mapping standard relationships between them. Querying this graph can drive higher-level organizational outcomes such as audit, policy, risk management, and even developer assistance.

Conceptually, GUAC occupies the “aggregation and synthesis” layer of the software supply chain transparency logical model:

GUAC has four major areas of functionality:

  1. Collection
    GUAC can be configured to connect to a variety of sources of software security metadata. Some sources may be open and public (e.g., OSV); some may be first-party (e.g., an organization’s internal repositories); some may be proprietary third-party (e.g., from data vendors).
  2. Ingestion
    From its upstream data sources GUAC imports data on artifacts, projects, resources, vulnerabilities, repositories, and even developers.
  3. Collation
    Having ingested raw metadata from disparate upstream sources, GUAC assembles it into a coherent graph by normalizing entity identifiers, traversing the dependency tree, and reifying implicit entity relationships, e.g., project → developer; vulnerability → software version; artifact → source repo, and so on.
  4. Query
    Against an assembled graph one may query for metadata attached to, or related to, entities within the graph. Querying for a given artifact may return its SBOM, provenance, build chain, project scorecard, vulnerabilities, and recent lifecycle events — and those for its transitive dependencies.

    A CISO or compliance officer in an organization wants to be able to reason about the risk of their organization. An open source organization like the Open Source Security Foundation wants to identify critical libraries to maintain and secure. Developers need richer and more trustworthy intelligence about the dependencies in their projects.

    The good news is, increasingly one finds the upstream supply chain already enriched with attestations and metadata to power higher-level reasoning and insights. The bad news is that it is difficult or impossible today for software consumers, operators, and administrators to gather this data into a unified view across their software assets.

    To understand something complex like the blast radius of a vulnerability, one needs to trace the relationship between a component and everything else in the portfolio—a task that could span thousands of metadata documents across hundreds of sources. In the open source ecosystem, the number of documents could reach into the millions.

    GUAC aggregates and synthesizes software security metadata at scale and makes it meaningful and actionable. With GUAC in hand, we will be able to answer questions at three important stages of software supply chain security:

    • Proactive, e.g.,
      • What are the most used critical components in my software supply chain ecosystem?
      • Where are the weak points in my overall security posture?
      • How do I prevent supply chain compromises before they happen?
      • Where am I exposed to risky dependencies?
    • Operational, e.g.,
      • Is there evidence that the application I’m about to deploy meets organization policy?
      • Do all binaries in production trace back to a securely managed repository?
    • Reactive, e.g.,
      • Which parts of my organization’s inventory is affected by new vulnerability X?
      • A suspicious project lifecycle event has occurred. Where is risk introduced to my organization?
      • An open source project is being deprecated. How am I affected?

Get Involved

GUAC is an Open Source project on Github, and we are excited to get more folks involved and contributing (read the contributor guide to get started)! The project is still in its early stages, with a proof of concept that can ingest SLSA, SBOM, and Scorecard documents and support simple queries and exploration of software metadata. The next efforts will focus on scaling the current capabilities and adding new document types for ingestion. We welcome help and contributions of code or documentation.

Since the project will be consuming documents from many different sources and formats, we have put together a group of “Technical Advisory Members'' to help advise the project. These members include representation from companies and groups such as SPDX, CycloneDX Anchore, Aquasec, IBM, Intel, and many more. If you’re interested in participating as a contributor or advisor representing end users’ needs—or the sources of metadata GUAC consumes—you can register your interest in the relevant GitHub issue.

The GUAC team will be showcasing the project at Kubecon NA 2022 next week. Come by our session if you’ll be there and have a chat with us—we’d be happy to talk in person or virtually!

10 new reasons to love Messages by Google

We use messaging apps to feel connected, without the headache of needing to know what phone or network we’re on. That is why our focus with Messages by Google is to help you build connections. It’s also built around RCS, a modern messaging protocol that supports richer text features, higher resolution images and videos, and enables end-to-end encryption. With RCS, we can give everyone a secure and modern messaging experience. We continue to advocate for RCS across the industry so key players #GetTheMessage and make the experience better for everyone.

As RCS adoption accelerates, we’re doing what’s possible to improve messaging between Android and iOS, like adding support for reactions. This builds on a suite of features that you already love, like an organized inbox that separates personal and business messages, the ability to share sharper videos and scheduled messages. And we’re doing even more.

Here are 10 ways Messages is evolving with safer, smarter and more modern features.

1. Ever been in a chat where the conversation with friends is flowing and you’re catching up with tons of messages? Soon you’ll be able to respond to an individual message in a conversation when RCS is enabled, making it easier to respond to a specific message without breaking the flow.

2. Earlier this year, we started displaying emoji reactions from iPhone users on your Android phone. Now we’re taking a step further by letting you react to SMS texts from iPhone users with emoji as well. While RCS is the ultimate solution, we're doing what we can to help Android users have a way to consistently react to messages.

3. We’re making voice messages more accessible. Using machine learning, Voice Message Transcription auto-transcribes the message so you can access it with ease. Say you’re in a crowded space and get an audio message from a loved one: transcripts will let you “view” the audio like you would a traditional text message. In addition to Pixel 7 and Pixel 7 Pro, this feature is also available on Pixel 6, Pixel 6A, Pixel 6 Pro, Samsung Galaxy S22 and the Galaxy Fold 4.

4. Reminders are now included directly in Messages to help you remember important moments without navigating across several apps on your phone. Remind yourself to call Mom on her birthday, or schedule that appointment during regular business hours. And if you save someone’s birthday or anniversary in your phone’s contacts app, you’ll get a gentle reminder about them when you open the Messages app.

5. You can now watch YouTube videos within Messages without ever leaving the app. So when someone sends you a YouTube link, you can quickly watch and respond without the hassle of switching back and forth.

6. If you are like me and always scrolling through messages endlessly to find the address that your friend sent you a while back, we got you covered. Messages will now intelligently suggest you “star” messages that contain texts like addresses, door codes and phone numbers to help you easily keep track and quickly find important conversations.

7. Sometimes texting is too slow and impersonal, so you need to get yourself on a video call. Messages will recognize texts like “Can you talk now?” and suggest a Meet call by showing an icon right next to the message. It will also suggest adding calendar events for messages like “Let’s meet at 6pm on Tuesday”, to help you stay on top of important events.

8. In some countries, we’re experimenting with a feature that lets you chat with businesses you found on Search and Maps directly through Messages, so all conversations appear in one place that’s searchable, private and secure. You can plan your next trip, score tickets to the big game and find deals from your favorite retailers — all without leaving the Messages app.

9. Messages work across your favorite devices, from your phone to Chromebook to your smartwatch. Try sending a message from your new Pixel Watch by asking Google Assistant.

10. Your messaging apps should work wherever you are—even in the air! That's why we partnered with United Airlines to offer messaging on United flights, when you have RCS turned on. It will be available on United WiFi for most carriers starting this fall, with broader support coming soon.

A fresh new look

We’re updating the Messages icon over the coming weeks to better reflect today's modern messaging experience and share the same look as many of Google's other products. It takes more than one side to have a conversation, and that’s reflected in the design, with overlapping messaging bubbles coming together as one.

Our Phone and Contacts apps will also be updated with the same look and feel to signal their shared purpose: helping you communicate.

Each is designed to adapt to Material You themes, so they can always match your personal style. And of course, we obsessed over every pixel to ensure these new icons are instantly recognizable as communication tools and accessible to everyone.

There’s more to come as we continue to build new tools and features into the app — all with the safety and security of Google. Download the Messages app on Google Play today to give it a spin, and try out the new features that will begin rolling out in the coming weeks.

Source: Android


MUSIQ: Assessing Image Aesthetic and Technical Quality with Multi-scale Transformers

Understanding the aesthetic and technical quality of images is important for providing a better user visual experience. Image quality assessment (IQA) uses models to build a bridge between an image and a user's subjective perception of its quality. In the deep learning era, many IQA approaches, such as NIMA, have achieved success by leveraging the power of convolutional neural networks (CNNs). However, CNN-based IQA models are often constrained by the fixed-size input requirement in batch training, i.e., the input images need to be resized or cropped to a fixed size shape. This preprocessing is problematic for IQA because images can have very different aspect ratios and resolutions. Resizing and cropping can impact image composition or introduce distortions, thus changing the quality of the image.

In CNN-based models, images need to be resized or cropped to a fixed shape for batch training. However, such preprocessing can alter the image aspect ratio and composition, thus impacting image quality. Original image used under CC BY 2.0 license.

In “MUSIQ: Multi-scale Image Quality Transformer”, published at ICCV 2021, we propose a patch-based multi-scale image quality transformer (MUSIQ) to bypass the CNN constraints on fixed input size and predict the image quality effectively on native-resolution images. The MUSIQ model supports the processing of full-size image inputs with varying aspect ratios and resolutions and allows multi-scale feature extraction to capture image quality at different granularities. To support positional encoding in the multi-scale representation, we propose a novel hash-based 2D spatial embedding combined with an embedding that captures the image scaling. We apply MUSIQ on four large-scale IQA datasets, demonstrating consistent state-of-the-art results across three technical quality datasets (PaQ-2-PiQ, KonIQ-10k, and SPAQ) and comparable performance to that of state-of-the-art models on the aesthetic quality dataset AVA.

The patch-based MUSIQ model can process the full-size image and extract multi-scale features, which better aligns with a person’s typical visual response.

In the following figure, we show a sample of images, their MUSIQ score, and their mean opinion score (MOS) from multiple human raters in the brackets. The range of the score is from 0 to 100, with 100 being the highest perceived quality. As we can see from the figure, MUSIQ predicts high scores for images with high aesthetic quality and high technical quality, and it predicts low scores for images that are not aesthetically pleasing (low aesthetic quality) or that contain visible distortions (low technical quality).

High quality
76.10 [74.36] 69.29 [70.92]
   
Low aesthetics quality
55.37 [53.18] 32.50 [35.47]
   
Low technical quality
14.93 [14.38] 15.24 [11.86]
Predicted MUSIQ score (and ground truth) on images from the KonIQ-10k dataset. Top: MUSIQ predicts high scores for high quality images. Middle: MUSIQ predicts low scores for images with low aesthetic quality, such as images with poor composition or lighting. Bottom: MUSIQ predicts low scores for images with low technical quality, such as images with visible distortion artifacts (e.g., blurry, noisy).

The Multi-scale Image Quality Transformer
MUSIQ tackles the challenge of learning IQA on full-size images. Unlike CNN-models that are often constrained to fixed resolution, MUSIQ can handle inputs with arbitrary aspect ratios and resolutions.

To accomplish this, we first make a multi-scale representation of the input image, containing the native resolution image and its resized variants. To preserve the image composition, we maintain its aspect ratio during resizing. After obtaining the pyramid of images, we then partition the images at different scales into fixed-size patches that are fed into the model.

Illustration of the multi-scale image representation in MUSIQ.

Since patches are from images of varying resolutions, we need to effectively encode the multi-aspect-ratio multi-scale input into a sequence of tokens, capturing both the pixel, spatial, and scale information. To achieve this, we design three encoding components in MUSIQ, including: 1) a patch encoding module to encode patches extracted from the multi-scale representation; 2) a novel hash-based spatial embedding module to encode the 2D spatial position for each patch; and 3) a learnable scale embedding to encode different scales. In this way, we can effectively encode the multi-scale input as a sequence of tokens, serving as the input to the Transformer encoder.

To predict the final image quality score, we use the standard approach of prepending an additional learnable “classification token” (CLS). The CLS token state at the output of the Transformer encoder serves as the final image representation. We then add a fully connected layer on top to predict the IQS. The figure below provides an overview of the MUSIQ model.

Overview of MUSIQ. The multi-scale multi-resolution input will be encoded by three components: the scale embedding (SCE), the hash-based 2D spatial embedding (HSE), and the multi-scale patch embedding (MPE).

Since MUSIQ only changes the input encoding, it is compatible with any Transformer variants. To demonstrate the effectiveness of the proposed method, in our experiments we use the classic Transformer with a relatively lightweight setting so that the model size is comparable to ResNet-50.

Benchmark and Evaluation
To evaluate MUSIQ, we run experiments on multiple large-scale IQA datasets. On each dataset, we report the Spearman’s rank correlation coefficient (SRCC) and Pearson linear correlation coefficient (PLCC) between our model prediction and the human evaluators’ mean opinion score. SRCC and PLCC are correlation metrics ranging from -1 to 1. Higher PLCC and SRCC means better alignment between model prediction and human evaluation. The graph below shows that MUSIQ outperforms other methods on PaQ-2-PiQ, KonIQ-10k, and SPAQ.

Performance comparison of MUSIQ and previous state-of-the-art (SOTA) methods on four large-scale IQA datasets. On each dataset we compare the Spearman’s rank correlation coefficient (SRCC) and Pearson linear correlation coefficient (PLCC) of model prediction and ground truth.

Notably, the PaQ-2-PiQ test set is entirely composed of large pictures having at least one dimension exceeding 640 pixels. This is very challenging for traditional deep learning approaches, which require resizing. MUSIQ can outperform previous methods by a large margin on the full-size test set, which verifies its robustness and effectiveness.

It is also worth mentioning that previous CNN-based methods often required sampling as many as 20 crops for each image during testing. This kind of multi-crop ensemble is a way to mitigate the fixed shape constraint in the CNN models. But since each crop is only a sub-view of the whole image, the ensemble is still an approximate approach. Moreover, CNN-based methods both add additional inference cost for every crop and, because they sample different crops, they can introduce randomness in the result. In contrast, because MUSIQ takes the full-size image as input, it can directly learn the best aggregation of information across the full image and it only needs to run the inference once.

To further verify that the MUSIQ model captures different information at different scales, we visualize the attention weights on each image at different scales.

Attention visualization from the output tokens to the multi-scale representation, including the original resolution image and two proportionally resized images. Brighter areas indicate higher attention, which means that those areas are more important for the model output. Images for illustration are taken from the AVA dataset.

We observe that MUSIQ tends to focus on more detailed areas in the full, high-resolution images and on more global areas on the resized ones. For example, for the flower photo above, the model’s attention on the original image is focusing on the pedal details, and the attention shifts to the buds at lower resolutions. This shows that the model learns to capture image quality at different granularities.

Conclusion
We propose a multi-scale image quality transformer (MUSIQ), which can handle full-size image input with varying resolutions and aspect ratios. By transforming the input image to a multi-scale representation with both global and local views, the model can capture the image quality at different granularities. Although MUSIQ is designed for IQA, it can be applied to other scenarios where task labels are sensitive to image resolution and aspect ratio. The MUSIQ model and checkpoints are available at our GitHub repository.

Acknowledgements
This work is made possible through a collaboration spanning several teams across Google. We’d like to acknowledge contributions from Qifei Wang, Yilin Wang and Peyman Milanfar.

Source: Google AI Blog


Chrome Beta for Android Update

Hi everyone! We've just released Chrome Beta 107 (107.0.5304.54) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome

Chrome Beta for Android Update

Hi everyone! We've just released Chrome Beta 107 (107.0.5304.54) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome

Chrome Beta for Android Update

Hi everyone! We've just released Chrome Beta 107 (107.0.5304.54) for Android. It's now available on Google Play.

You can see a partial list of the changes in the Git log. For details on new features, check out the Chromium blog, and for details on web platform updates, check here.

If you find a new issue, please let us know by filing a bug.

Krishna Govind
Google Chrome