
Open sourcing the attention center model

When you look at an image, what parts of an image do you pay attention to first? Would a machine be able to learn this? We provide a machine learning model that can be used to do just that. Why is it useful? The latest generation image format (JPEG XL) supports serving the parts that you pay attention to first, which results in an improved user experience: images will appear to load faster. But the model not only works for encoding JPEG XL images, but can be used whenever we need to know where a human would look first.

An open source attention center model

What regions in an image will attract the majority of human visual attention first? We trained a model, called the attention center model, to predict such a region for a given image; it is now open source. In addition to the model, we provide a script to use it in combination with the JPEG XL encoder: google/attention-center.

Some example predictions of our attention center model are shown in the following figure, where the green dot is the predicted attention center point for the image. Note that in the “two parrots” image both parrots’ heads are visually important, so the attention center point will be in the middle.

Four example images in quadrants as follows: a red door with a brass doorknob in the top left quadrant; a headshot of a brown-skinned girl wearing a colorful sweater, ribbons in her hair, and face paint, smiling at the camera in the top right quadrant; a teal-shuttered cathedral-style window against a sand-colored stucco wall with pink and red hibiscus in the foreground in the bottom left quadrant; and a blue-and-yellow macaw and a red-and-green macaw next to each other in the bottom right quadrant.
Images are from Kodak image data set: http://r0k.us/graphics/kodak/

The model is 2MB and in the TensorFlow Lite format. It takes an RGB image as input and outputs a 2D point, which is the predicted center of human attention on the image. That predicted center is the place where we should start with operations (decoding and displaying, in the JPEG XL case). This allows the most visually salient/important regions to be processed as early as possible. Check out the code and continue to build upon it!
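
As a minimal sketch of how such a model could be invoked (the file name, input handling, and normalization below are assumptions; see the repository for the actual preprocessing):

import numpy as np
import tensorflow as tf
from PIL import Image

# Load the TensorFlow Lite model (the file name here is a placeholder).
interpreter = tf.lite.Interpreter(model_path="center.tflite")
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# Resize the RGB image to the spatial size the model expects.
_, height, width, _ = inp["shape"]
image = Image.open("photo.jpg").convert("RGB").resize((width, height))
x = np.asarray(image, dtype=np.float32)[None, ...] / 255.0  # normalization is an assumption

interpreter.set_tensor(inp["index"], x)
interpreter.invoke()
center_x, center_y = interpreter.get_tensor(out["index"])[0]  # predicted 2D attention center
print("Predicted attention center:", center_x, center_y)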

Attention center ground-truth data

To train a model to predict the attention center, we first need ground-truth data for the attention center. Given an image, attention points can either be collected by eye trackers [1] or approximated by mouse clicks on a blurry version of the image [2]. We first apply temporal filtering to those attention points and keep only the initial ones, then apply spatial filtering to remove noise (e.g., random gazes). We then compute the center of the remaining attention points as the attention center ground truth. The figure below illustrates this process.
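
As a hedged sketch of this pipeline (the one-second window and the outlier threshold below are illustrative assumptions, not the values used for the actual training data):

import numpy as np

def attention_center(points, times, initial_window_s=1.0, z_thresh=2.0):
    points = np.asarray(points, dtype=np.float32)  # (N, 2) gaze/click coordinates
    times = np.asarray(times, dtype=np.float32)    # (N,) seconds since image onset

    # Temporal filtering: keep only the initial attention points.
    keep = times <= times.min() + initial_window_s
    pts = points[keep]

    # Spatial filtering: drop points far from the median (e.g., random gazes).
    median = np.median(pts, axis=0)
    dist = np.linalg.norm(pts - median, axis=1)
    pts = pts[dist <= dist.mean() + z_thresh * dist.std()]

    # The ground truth is the center of the remaining points.
    return pts.mean(axis=0)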

Five images in a row of a person standing on a rock by the ocean: the original image, the gaze/attention points, the result of temporal filtering, the result of spatial filtering, and the computed attention center.

Attention center model architecture

The attention center model is a deep neural net that takes an image as input and uses a pre-trained classification network, e.g., ResNet or MobileNet, as the backbone. The outputs of several intermediate layers of the backbone network are used as input for the attention center prediction module. These intermediate layers contain different information: shallow layers often carry low-level information like intensity, color, and texture, while deeper layers usually carry higher-level, more semantic information like shapes and objects. All of this is useful for attention prediction. The attention center prediction module applies convolution, deconvolution and/or resizing operators, together with aggregation and a sigmoid function, to generate a weighting map for the attention center. An operator (the Einstein summation operator in our case) is then applied to compute the (gravity) center from the weighting map. The L2 norm between the predicted attention center and the ground-truth attention center is used as the training loss.
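
The sketch below shows one way such an architecture could be wired up in Keras; the backbone choice, tapped layer names, channel counts, and input size are illustrative assumptions, not the released model's exact design:

import tensorflow as tf

def attention_center_model(input_shape=(224, 224, 3)):
    backbone = tf.keras.applications.MobileNetV2(
        input_shape=input_shape, include_top=False, weights="imagenet")
    # Tap a few intermediate feature maps, from shallow to deep (layer names are assumptions).
    taps = ["block_3_expand_relu", "block_6_expand_relu", "block_13_expand_relu"]
    feats = [backbone.get_layer(name).output for name in taps]

    # Project each feature map to a common size and aggregate.
    target_h, target_w = feats[-1].shape[1], feats[-1].shape[2]
    merged = tf.keras.layers.Add()([
        tf.keras.layers.Resizing(target_h, target_w)(
            tf.keras.layers.Conv2D(32, 1, activation="relu")(f))
        for f in feats])

    # Sigmoid weighting map, then its gravity center via an Einstein summation.
    wmap = tf.keras.layers.Conv2D(1, 3, padding="same", activation="sigmoid")(merged)[..., 0]
    wmap = wmap / (tf.reduce_sum(wmap, axis=[1, 2], keepdims=True) + 1e-8)
    ys = tf.linspace(0.0, 1.0, target_h)
    xs = tf.linspace(0.0, 1.0, target_w)
    center = tf.stack([tf.einsum("bhw,w->b", wmap, xs),
                       tf.einsum("bhw,h->b", wmap, ys)], axis=-1)  # normalized (x, y)
    return tf.keras.Model(backbone.input, center)

model = attention_center_model()
model.compile(optimizer="adam", loss="mse")  # squared L2 distance to the ground-truth center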

Attention center model architecture

Progressive JPEG XL images with attention center model

JPEG XL is a new image format that allows the user to encode images in a way that ensures the more interesting parts arrive first. This has the advantage that, when viewing images transferred over the web, we can already display the attention-grabbing parts of the image, i.e. the parts where the user looks first; ideally, by the time the user looks elsewhere, the rest of the image has already arrived and been decoded. The post “Using Saliency in progressive JPEG XL images” below illustrates how this works in principle. In short, in JPEG XL the image is divided into square groups (typically of size 256 x 256), and the JPEG XL encoder chooses a starting group in the image and then grows concentric squares around that group. It was this need to figure out where the attention center of an image is that led us to open source the attention center model, together with a script to use it in combination with the JPEG XL encoder. Progressive decoding of JPEG XL images has recently been added to Chrome, starting from version 107. At the moment, JPEG XL is behind an experimental flag, which can be enabled by going to chrome://flags and searching for “jxl”.
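
A rough sketch of wiring the model's output into the encoder is shown below; the cjxl flag names are assumptions here, so check the script in google/attention-center and "cjxl --help" for the real interface:

import subprocess

def encode_with_center(input_png, output_jxl, center_x, center_y):
    # The flags --group_order, --center_x and --center_y are assumptions;
    # the released script drives the encoder for you.
    subprocess.run([
        "cjxl",
        "--group_order=1",                 # center-first group ordering
        f"--center_x={int(center_x)}",
        f"--center_y={int(center_y)}",
        input_png,
        output_jxl,
    ], check=True)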

To try out how partially loaded progressive JPEG XL images look, you can go to https://google.github.io/attention-center/.

By Moritz Firsching, Junfeng He, and Zoltan Szabadka – Google Research

References

[1] Valliappan, Nachiappan, Na Dai, Ethan Steinberg, Junfeng He, Kantwon Rogers, Venky Ramachandran, Pingmei Xu et al. "Accelerating eye movement research via accurate and affordable smartphone eye tracking." Nature communications 11, no. 1 (2020): 1-12.

[2] Jiang, Ming, Shengsheng Huang, Juanyong Duan, and Qi Zhao. "Salicon: Saliency in context." In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1072-1080. 2015.

Using Saliency in progressive JPEG XL images

At Google, we are working towards improving the web experience for users. Getting images delivered fast is a crucial part of the web experience, and progressive images can help by delivering the salient parts, detected by machine learning, first. When you look at an image, you don’t immediately look at the entire image, but tend to gaze at the most interesting, or “salient”, parts of the image first. When delivering images over the web, it is now possible to organize the data in such a way that the most salient parts arrive first. Ideally you don’t even notice that some less salient parts have not yet arrived, because by the time you look at those parts they have already arrived and been rendered.

We will explain how this works with the new open source image format JPEG XL, but we’ll start by taking a step back and describing how images are currently delivered and rendered on the web.

How partial images are displayed on the web

It’s important that websites with images load quickly, because waiting for images to load causes frustration. Two techniques in particular are used to make images appear fast: one is showing an approximation of the image before all bytes of the image are transmitted, often known as “progressive image loading”; the other is making the byte size of the image smaller by using strong image compression.

What is progressive image loading?

Some image formats are implemented in a way that does not allow any kind of progressive image loading; all the bytes of the image have to be received before rendering can begin. The next simplest type of image loading is sometimes called “sequential image loading.” For these images, the data is organized so that pixels arrive in a particular order, typically in rows from top to bottom.

Formats with this kind of image loading include PNG, WebP, and JPEG. The JPEG format also allows more sophisticated forms of progressive images: the data can be organized so that it arrives in multiple scans, with each scan showing more detail than the previous one.

For example, even if only approximately 15% of the data for an image is loaded, the result often already looks decent. The following images compare the different approaches:

  • 100% of bytes loaded, original image
  • 15% of bytes loaded, no progressive image loading
  • 15% of bytes loaded, sequential image loading
  • 15% of bytes loaded, progressive JPEG

In the first scan, the progressive JPEG has only a small amount of information available for the image, e.g. only the average color of each 8x8 block. This is known as the DC-only scan, because the average color of each 8x8 block is called the DC component of the discrete cosine transform, which is the basis of JPEG image compression. Check out this Computerphile video on the JPEG DCT for a basic introduction. Instead of displaying an image that consists of 8x8 blocks, the JPEG rendering in Chrome and Firefox chooses to render the preview with some smoothing, to provide a less distracting experience.
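
As a small illustration of the DC idea (not the actual JPEG code path), the average color of each 8x8 block can be computed like this:

import numpy as np
from PIL import Image

img = np.asarray(Image.open("photo.jpg").convert("RGB"), dtype=np.float32)
h, w, _ = img.shape
h8, w8 = h - h % 8, w - w % 8  # crop to a multiple of 8 for simplicity
blocks = img[:h8, :w8].reshape(h8 // 8, 8, w8 // 8, 8, 3)
dc = blocks.mean(axis=(1, 3))  # one average color per 8x8 block
# Upscale the block averages with smoothing, similar in spirit to the browser preview.
Image.fromarray(dc.astype(np.uint8)).resize((w8, h8)).save("dc_preview.png")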

Progressive JPEG XLs

While the quality (and therefore byte-sizes) of the individual scans in a progressive JPEG image can be controlled, the order within a scan is still top to bottom, like in a sequential JPEG. JPEG XL goes beyond that by making it possible to send the data necessary to display all details of the most salient parts first, followed by the less salient parts. For example, in a portrait, we can decide to first send the bytes for the face, and then, for the out-of-focus background.

In general, progressive JPEG XL works in the following way:
  • There is always an 8x8 downsampled image available (similar to a DC-only scan in a progressive JPEG). The decoder can display that with a nice upsampling, which gives the impression of a smoothed version of the image.
  • The image is divided into square groups (typically of size 256 x 256) and it is possible to provide an order of these groups during encoding. In particular, we can order the groups by saliency and choose an order that anticipates where the viewer might look first, while not being disturbing.
While the format allows for a very flexible order of the groups, our current encoder chooses a starting group and then grows concentric squares around that group, because we expect this to be less distracting to the user. To make successive updates even less noticeable, we smooth the boundary between groups for which all the data has arrived and those that still contain an incomplete approximation. One requirement of this technique is a good way of identifying, at encoding time, where the salient parts of an image are. This information is typically represented by a saliency map, which can be visualized as a heatmap image in which the more salient parts are redder.
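
The following sketch mimics that ordering (an illustration of the idea, not the encoder's actual code): given the grid of groups and the group containing the center, it orders the groups by growing square rings around that center.

def center_first_group_order(groups_x, groups_y, center_gx, center_gy):
    # Chebyshev distance gives concentric square "rings" around the center group.
    def ring(gx, gy):
        return max(abs(gx - center_gx), abs(gy - center_gy))
    order = [(gx, gy) for gy in range(groups_y) for gx in range(groups_x)]
    return sorted(order, key=lambda g: ring(*g))

# Example: a 4x3 grid of 256x256 groups with the center group at (2, 1).
print(center_first_group_order(4, 3, 2, 1))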

Original image (left) next to its saliency map (right).

Smooth DC-image (left) next to the image with group order (right).

Stay tuned for videos showing progressive JPEG XL in action.

How to find good saliency maps for images

Saliency prediction models (overview) aim at predicting which regions in an image will attract human attention. To predict saliency effectively, our model leverages deep neural nets to consider both high-level semantic signals like faces, objects, and shapes, as well as low- or medium-level signals like color, intensity, and texture. The model is trained on a large-scale public gaze/saliency data set, so that the predicted saliency best mimics human gaze/fixation behaviour on each image. The model takes an image as input and outputs a saliency map, which can serve as a visual importance map and hence help determine the decoding order for each region in the image. Example images and their predicted saliency are as follows:
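
As a sketch of how a saliency map could drive the group order (following the description above, not the actual encoder logic), groups can be ranked by their mean saliency:

import numpy as np

def saliency_group_order(saliency_map, group=256):
    h, w = saliency_map.shape
    gy, gx = -(-h // group), -(-w // group)  # number of groups per axis (ceiling division)
    scores = []
    for j in range(gy):
        for i in range(gx):
            tile = saliency_map[j * group:(j + 1) * group, i * group:(i + 1) * group]
            scores.append(((i, j), float(tile.mean())))
    # Most salient groups first.
    return [g for g, _ in sorted(scores, key=lambda s: -s[1])]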

Example images and their predicted saliency

At the time of writing (July 2021), Chrome and Firefox did not yet support decoding JPEG XL images progressively in the way we describe, but the spec does allow encoding arbitrary group orders.

Different users have different experiences when it comes to looking at images loading on the web. We hope that this way of progressively delivering images will improve the user experience, especially on lower-bandwidth connections.

By Moritz Firsching and Junfeng He – Google Research


More options to help websites preview their content on Google Search

Google uses content previews, including text snippets and other media, to help people decide whether a result is relevant to their query. The type of preview shown depends on many factors, including the type of content a person is looking for and the kind of device they're viewing it on.

For instance, if you look for recipe results on Google, you may see thumbnail images and user ratings, which may be more helpful than text snippets when it comes to deciding what you want to eat. Alternatively, perhaps you're looking for a concert nearby, and are able to check out the events directly in the search results. These previews are made possible by publishers marking up their pages with structured data.

Google automatically generates previews in a way intended to help a user understand why the results shown are relevant to their search and why the user would want to visit the linked pages. However, we recognize that site owners may wish to independently adjust the extent of their preview content in search results. To make it easier for individual websites to define how much or which text should be available for snippeting and the extent to which other media should be included in their previews, we're now introducing several new settings for webmasters. 

Letting Google know about your snippet and content preview preferences

Previously, it was only possible to allow a textual snippet or to not allow one. We're now introducing a set of methods that allow more fine-grained configuration of the preview content shown for your pages. This is done through two types of new settings: a set of robots meta tags and an HTML attribute. 

Using robots meta tags

The robots meta tag is added to an HTML page's <head>, or specified via the x-robots-tag HTTP header. The robots meta tags addressing the preview content for a page are:

  • "nosnippet"
    This is an existing option to specify that you don't want any textual snippet shown for this page. 
  • "max-snippet:[number]"
    New! Specify a maximum text-length, in characters, of a snippet for your page.
  • "max-video-preview:[number]"
    New! Specify a maximum duration in seconds of an animated video preview.
  • "max-image-preview:[setting]"
    New! Specify a maximum size of image preview to be shown for images on this page, using either "none", "standard", or "large".

They can be combined, for example:

<meta name="robots" content="max-snippet:50, max-image-preview:large">

Preview settings from these meta tags will become effective in mid-to-late October 2019 and may take about a week for the global rollout to complete.

Using the new data-nosnippet HTML attribute

A new way to help limit which part of a page is eligible to be shown as a snippet is the "data-nosnippet" HTML attribute on span, div, and section elements. With this, you can prevent that part of an HTML page from being shown within the textual snippet on the page.

For example:

<p><span data-nosnippet>Harry Houdini</span> is undoubtedly the most famous magician ever to live.</p>

The data-nosnippet HTML attribute will start affecting presentation on Google products later this year. Learn more in our developer documentation for the robots meta tag, x-robots-tag, and data-nosnippet.

A note about rich results and featured snippets

Content in structured data is eligible for display as rich results in search. These kinds of results do not conform to limits declared in the above meta robots settings, but rather, can be addressed with much greater specificity by limiting or modifying the content provided in the structured data itself. For example, if a recipe is included in structured data, the contents of that structured data may be presented in a recipe carousel in the search results. Similarly, if an event is marked up with structured data, it may be presented as such in the search results. To limit that presentation, a publisher can limit the amount and type of content in the structured data. 

Some special features on Search depend on the availability of preview content, so limiting your previews may prevent your content from appearing in these areas. Featured snippets, for example, require a certain minimum number of characters to be displayed. This minimum can vary by language, which is why there is no exact max-snippet length we can provide to guarantee appearing in this feature. Those who do not wish to have content appear as featured snippets can experiment with lower max-snippet lengths. Those who want a guaranteed way to opt out of featured snippets should use nosnippet.

The AMP Format

The AMP format comes with certain benefits, including eligibility for more prominent presentation of thumbnail images in search results and in the Google Discover feed. These characteristics have been shown to drive more traffic to publishers’ articles. However, publishers who do not want Google to use larger thumbnail images when their AMP pages are presented in search and Discover can use the above meta robots settings to specify max-image-preview of “standard” or “none.”

These new options are available to content owners worldwide and will operate the same for results we display globally. We hope they make it easier for you to optimize the value you get from Search and achieve your business goals. For more information, check out our developer documentation on meta tags. Should you have any questions, feel free to reach out to us, or drop by our webmaster help forums.


Helping publishers and users get more out of visual searches on Google Images with AMP

Google Images has made a series of changes to help people explore, learn and do more through visual search. An important element of visual search is the ability for users to scan many ideas before coming to a decision, whether it’s purchasing a product, learning more about a stylish room, or finding instructions for a DIY project. Often this involves loading many web pages, which can slow down a search considerably and prevent users from completing a task. 

As previewed at Google I/O, we’re launching a new AMP-powered feature in Google Images on the mobile web, Swipe to Visit, which makes it faster and easier for users to browse and visit web pages. After a Google Images user selects an image to view on a mobile device, they will get a preview of the website header, which can be easily swiped up to load the web page instantly. 

Swipe to Visit uses AMP's prerender capability to show a preview of the page displayed at the bottom of the screen. When a user swipes up on the preview, the web page is displayed instantly and the publisher receives a pageview. The speed and ease of this experience makes it more likely for users to visit a publisher's site, while still allowing users to continue their browsing session.

Publishers who support AMP don’t need to take any additional action for their sites to appear in Swipe to Visit on Google Images. Publishers who don’t support AMP can learn more about getting started with AMP here. In the coming weeks, publishers will also be able to view their traffic data from AMP in Google Images in Search Console’s performance report for Google Images, in a new search area named “AMP on Image result”.

We look forward to continuing to support the Google Images ecosystem with features that help users and publishers alike.


Mobile-First indexing, structured data, images, and your site

It's been two years since we started working on "mobile-first indexing" - crawling the web with smartphone Googlebot, similar to how most users access it. We've seen websites across the world embrace the mobile web, making fantastic websites that work on all kinds of devices. There's still a lot to do, but today, we're happy to announce that we now use mobile-first indexing for over half of the pages shown in search results globally.

Checking for mobile-first indexing

In general, we move sites to mobile-first indexing when our tests assure us that they're ready. When we move sites over, we notify the site owner through a message in Search Console. It's possible to confirm this by checking the server logs, where a majority of the requests should be from Googlebot Smartphone. Even easier, the URL inspection tool allows a site owner to check how a URL from the site (it's usually enough to check the homepage) was last crawled and indexed.

If your site uses responsive design techniques, you should be all set! For sites that aren't using responsive web design, we've seen two kinds of issues come up more frequently in our evaluations:

Missing structured data on mobile pages

Structured data is very helpful to better understand the content on your pages, and allows us to highlight your pages in fancy ways in the search results. If you use structured data on the desktop versions of your pages, you should have the same structured data on the mobile versions of the pages. This is important because with mobile-first indexing, we'll only use the mobile version of your page for indexing, and will otherwise miss the structured data.

Testing your pages in this regard can be tricky. We suggest testing for structured data in general, and then comparing that to the mobile version of the page. For the mobile version, check the source code when you simulate a mobile device, or use the HTML generated with the mobile-friendly testing tool. Note that a page does not need to be mobile-friendly in order to be considered for mobile-first indexing.

Missing alt-text for images on mobile pages

Alt text (the value of the alt attribute on images) is a great way to describe images to users with screen readers (which are used on mobile too!) and to search engine crawlers. Without alt text for images, it's a lot harder for Google Images to understand the context of the images that you use on your pages.

Check "img" tags in the source code of the mobile version for representative pages of your website. As above, the source of the mobile version can be seen by either using the browser to simulate a mobile device, or by using the Mobile-Friendly test to check the Googlebot rendered version. Search the source code for "img" tags, and double-check that your page is providing appropriate alt-attributes for any that you want to have findable in Google Images.

For example, that might look like this:

With alt-text (good!):
<img src="cute-puppies.png" alt="A photo of cute puppies on a blanket">

Without alt-text:
<img src="sad-puppies.png">

It's fantastic to see so many great websites that work well on mobile! We're looking forward to being able to index more and more of the web using mobile-first indexing, helping more users to search the web in the same way that they access it: with a smartphone. We’ll continue to monitor and evaluate this change carefully. If you have any questions, please drop by our Webmaster forums or our public events.


An update to referral source URLs for Google Images

Every day, hundreds of millions of people use Google Images to visually discover and explore content on the web. Whether it be finding ideas for your next baking project, or visual instructions on how to fix a flat tire, exploring image results can sometimes be much more helpful than exploring text.

Updating the referral source

For webmasters, it hasn't always been easy to understand the role Google Images plays in driving site traffic. To address this, we will roll out a new referrer URL specific to Google Images over the next few months. The referrer URL is part of the HTTP header, and indicates the last page the user was on and clicked to visit the destination webpage.
If you create software to track or analyze website traffic, we want you to be prepared for this change. Make sure that you are ingesting the new referrer URL and attributing the traffic to Google Images. The new referrer URL is: https://images.google.com.

If you use Google Analytics to track site data, the new referral URL will be ingested automatically and traffic will be attributed to Google Images appropriately. To be clear, this change will not affect Search Console: webmasters will continue to receive an aggregate list of top search queries that drive traffic to their site.

How this affects country-specific queries

The new referer URL has the same country code top level domain (ccTLD) as the URL used for searching on Google Images. In practice, this means that most visitors worldwide come from images.google.com. That's because last year, we made a change so that google.com became the default choice for searchers worldwide. However, some users may still choose to go directly to a country specific service, such as google.co.uk for the UK. For this use case, the referer uses that country TLD (for example, images.google.co.uk).
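
In your own analytics code, attribution could then look roughly like this (the hostname check follows the examples above; the parsing details are an illustrative assumption):

from urllib.parse import urlparse

def is_google_images_referrer(referrer_url):
    # Matches images.google.com as well as ccTLD variants such as images.google.co.uk.
    host = (urlparse(referrer_url).hostname or "").lower()
    return host.startswith("images.google.")

print(is_google_images_referrer("https://images.google.com"))    # True
print(is_google_images_referrer("https://images.google.co.uk"))  # True
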
We hope this change will foster a healthy visual content ecosystem. If you're interested in learning how to optimize your pages for Google Images, please refer to the Google Image Publishing Guidelines. If you have questions, feedback or suggestions, please let us know through the Webmaster Tools Help Forum.

Similar items: Rich products feature on Google Image Search

Image Search recently launched “Similar items” on mobile web and the Android Search app. The “Similar items” feature is designed to help users find products they love in photos that inspire them on Google Image Search. Using machine vision technology, the Similar items feature identifies products in lifestyle images and displays matching products to the user. Similar items supports handbags, sunglasses, and shoes and will cover other apparel and home & garden categories in the next few months.

The Similar items feature enables users to browse and shop inspirational fashion photography and find product info about items they’re interested in. Try it out by opening results from queries like [designer handbags].

Finding price and availability information was one of the top Image Search feature requests from our users. The Similar items carousel gets millions of impressions and clicks daily from all over the world.

To make your products eligible for Similar items, make sure to add and maintain schema.org product metadata on your pages. The schema.org/Product markup helps Google find product offerings on the web and give users an at-a-glance summary of product info.

To ensure that your products are eligible to appear in Similar items:

  • Ensure that the product offerings on your pages have schema.org product markup, including an image reference. Products with name, image, price & currency, and availability meta-data on their host page are eligible for Similar items
  • Test your pages with Google’s Structured Data Testing Tool to verify that the product markup is formatted correctly
  • See your images on image search by issuing the query “site:yourdomain.com.” For results with valid product markup, you may see product information appear once you tap on the images from your site. It can take up to a week for Googlebot to recrawl your website.

Right now, Similar items is available on mobile browsers and the Android Google Search App globally, and we plan to expand to more platforms in 2017.

If you have questions, find us in the dedicated Structured data section of our forum, on Twitter, or on Google+. To prevent your images from showing in Similar items, webmasters can opt-out of Google Image Search.

We’re excited to help users find your products on the web by showcasing buyable items. Thanks for partnering with us to make the web more shoppable!

Introducing the Open Images Dataset

Originally posted on the Google Research Blog

In the last few years, advances in machine learning have enabled Computer Vision to progress rapidly, allowing for everything from systems that can automatically caption images to apps that can create natural language replies in response to shared photos. Much of this progress can be attributed to publicly available image datasets, such as ImageNet and COCO for supervised learning, and YFCC100M for unsupervised learning.

Today, we introduce Open Images, a dataset consisting of ~9 million URLs to images that have been annotated with labels spanning over 6000 categories. We tried to make the dataset as practical as possible: the labels cover more real-life entities than the 1000 ImageNet classes, there are enough images to train a deep neural network from scratch and the images are listed as having a Creative Commons Attribution license*.

The image-level annotations have been populated automatically with a vision model similar to Google Cloud Vision API. For the validation set, we had human raters verify these automated labels to find and remove false positives. On average, each image has about 8 labels assigned. Here are some examples:

Annotated images from the Open Images dataset. Left: Ghost Arches by Kevin Krejci. Right: Some Silverware by J B. Both images are used under the CC BY 2.0 license.

We have trained an Inception v3 model based on Open Images annotations alone, and the model is good enough to be used for fine-tuning applications as well as for other things, like DeepDream or artistic style transfer, which require a well-developed hierarchy of filters. We hope to improve the quality of the annotations in Open Images in the coming months, and with that the quality of the models that can be trained on them.

The dataset is a product of a collaboration between Google, CMU and Cornell universities, and there are a number of research papers built on top of the Open Images dataset in the works. It is our hope that datasets like Open Images and the recently released YouTube-8M will be useful tools for the machine learning community.

By Ivan Krasin and Tom Duerig, Software Engineers

* While we tried to identify images that are licensed under a Creative Commons Attribution license, we make no representations or warranties regarding the license status of each image and you should verify the license for each image yourself.