
Introducing the Indexing API and structured data for livestreams

Over the past few years, it's become easier than ever to stream live videos online, from celebrity updates to special events. But it's not always easy for people to determine which videos are live and know when to tune in.
Today, we're introducing new tools to help more people discover your livestreams in Search and Assistant. With livestream structured data and the Indexing API, you can let Google know when your video is live, so it will be eligible to appear with a red "live" badge:

Add livestream structured data to your page

If your website streams live videos, use the livestream developer documentation to flag your video as a live broadcast and mark the start and end times. In addition, VideoObject structured data is required to tell Google that there's a video on your page.
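
To give a concrete sense of the shape of that markup, here is a minimal sketch that uses Python's json module to print an illustrative JSON-LD block. The property values are hypothetical, and the livestream developer documentation is the authoritative reference for the required and recommended fields.

import json

# Illustrative property set only; consult the livestream developer
# documentation for the exact required and recommended fields.
livestream_markup = {
    "@context": "https://schema.org",
    "@type": "VideoObject",
    "name": "Example livestream",                     # hypothetical values
    "description": "An example live broadcast.",
    "thumbnailUrl": "https://example.com/thumbnail.jpg",
    "uploadDate": "2019-07-10T08:00:00+00:00",
    "publication": {
        "@type": "BroadcastEvent",
        "isLiveBroadcast": True,
        "startDate": "2019-07-10T09:00:00+00:00",     # when the stream goes live
        "endDate": "2019-07-10T10:00:00+00:00"        # when it is expected to end
    }
}

print('<script type="application/ld+json">')
print(json.dumps(livestream_markup, indent=2))
print('</script>')

In practice you would simply embed the resulting script block in your page's HTML.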

Update Google quickly with the Indexing API

The Indexing API now supports pages with livestream structured data. We encourage you to call the Indexing API to request that your site be crawled in time for the livestream. We recommend calling the Indexing API when your livestream begins and ends, and whenever the structured data changes.
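
As a rough sketch of what such a call can look like, here is a minimal example using a service account key and the google-auth library; the key file path and page URL below are hypothetical.

from google.oauth2 import service_account
from google.auth.transport.requests import AuthorizedSession

SCOPES = ["https://www.googleapis.com/auth/indexing"]
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Hypothetical key file for a service account that has been added as an
# owner of the site in Search Console.
credentials = service_account.Credentials.from_service_account_file(
    "service-account.json", scopes=SCOPES)
session = AuthorizedSession(credentials)

def notify_google(page_url):
    # URL_UPDATED tells Google the page changed; call this when the
    # livestream begins, when it ends, and if the structured data changes.
    response = session.post(ENDPOINT, json={"url": page_url, "type": "URL_UPDATED"})
    response.raise_for_status()
    return response.json()

notify_google("https://example.com/livestream-page")
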
For more information, visit our developer documentation. If you have any questions, ask us in the Webmaster Help Forum. We look forward to seeing your live videos on Google!

Changes to the URL Performance Report for YouTube video placements

What's changing?
The URL_PERFORMANCE_REPORT in the AdWords API will exclude information for YouTube video placements starting October 30, 2018, in keeping with our data retention policies. As a result, placements where the Url field has a domain of www.youtube.com will no longer appear in the report. New and improved placement reports will be available in an upcoming release of the new Google Ads API.

What you should do
Review your application and workflows and make the necessary changes to ensure that the exclusion of video placements in this report will not cause problems. Watch this blog for updates regarding new placement reports in the Google Ads API.
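
For example, if your workflow parses a downloaded copy of this report, a quick audit like the sketch below can show how many rows you currently rely on that will disappear. The CSV file name and exact column layout are assumptions; adjust them to match your own report definition.

import csv
from urllib.parse import urlparse

def domain(url):
    # Placement URLs may or may not include a scheme.
    if "//" not in url:
        url = "//" + url
    return urlparse(url).netloc

# Hypothetical local copy of a URL_PERFORMANCE_REPORT download.
with open("url_performance_report.csv", newline="") as f:
    rows = list(csv.DictReader(f))

youtube_rows = [r for r in rows if domain(r.get("Url", "")) == "www.youtube.com"]
print(f"{len(youtube_rows)} of {len(rows)} rows are YouTube video placements "
      "and will no longer appear after October 30, 2018.")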

If you have any questions or need help, please contact us via the forum.

Verifying your Google Assistant media action integrations on Android

Posted by Nevin Mital, Partner Developer Relations

The Media Controller Test (MCT) app is a powerful tool that allows you to test the intricacies of media playback on Android, and it's just gotten even more useful. Media experiences, including voice interactions via the Google Assistant on Android phones, cars, TVs, and headphones, are powered by the Android MediaSession APIs, and this tool helps you verify your integrations. We've now added a new verification testing framework that can help automate your QA testing.

The MCT is meant to be used in conjunction with an app that implements media APIs, such as the Universal Android Music Player. The MCT surfaces information about the media app's MediaController, such as the PlaybackState and Metadata, and can be used to test inter-app media controls.

The Media Action Lifecycle can be complex to follow; even in a simple Play From Search request, there are many intermediate steps (simplified timeline depicted below) where something could go wrong. The MCT can be used to help highlight any inconsistencies in how your music app handles MediaController TransportControl requests.

Timeline of the interaction between the User, the Google Assistant, and the third party Android App for a Play From Search request.

Previously, using the MCT required a lot of manual interaction and monitoring. The new verification testing framework offers one-click tests that you can run to ensure that your media app responds correctly to a playback request.

Running a verification test

To access the new verification tests in the MCT, click the Test button next to your desired media app.

MCT Screenshot of launch screen; contains a list of installed media apps, with an option to go to either the Control or Test view for each.

The next screen shows you detailed information about the MediaController, for example the PlaybackState, Metadata, and Queue. There are two buttons on the toolbar in the top right: the button on the left toggles between parsable and formatted logs, and the button on the right refreshes this view to display the most current information.

MCT Screenshot of the left screen in the Testing view for UAMP; contains information about the Media Controller's Playback State, Metadata, Repeat Mode, Shuffle Mode, and Queue.

By swiping to the left, you arrive at the verification tests view, where you can see a scrollable list of defined tests, a text field to enter a query for tests that require one, and a section to display the results of the test.

MCT Screenshot of the right screen in the Testing view for UAMP; contains a list of tests, a query text field, and a results display section.

As an example, to run the Play From Search Test, you can enter a search query into the text field and then hit the Run Test button. As the screenshot below shows, the test succeeded!

MCT Screenshot of the right screen in the Testing view for UAMP; the Play From Search test was run with the query 'Memories' and ended successfully.

Below are examples of the Pause Test (left) and Seek To test (right).

MCT Screenshot of the right screen in the Testing view for UAMP; a Pause test was run successfully. MCT Screenshot of the right screen in the Testing view for UAMP; a Seek To test was run successfully.

Android TV

The MCT now also works on Android TV! For your media app to work with the Android TV version of the MCT, your media app must have a MediaBrowserService implementation. Please see here for more details on how to do this.

On launching the MCT on Android TV, you will see a list of installed media apps. Note that an app will only appear in this list if it implements the MediaBrowserService.

Android TV MCT Screenshot of the launch screen; contains a list of installed media apps that implement the MediaBrowserService.

Selecting an app will take you to the testing screen, which will display a list of verification tests on the right.

Android TV MCT Screenshot of the testing screen; contains a list of tests on the right side.

Running a test will populate the left side of the screen with selected MediaController information. For more details, please check the MCT logs in Logcat.

Android TV MCT Screenshot of the testing screen; the Pause test was run successfully and the left side of the screen now displays selected MediaController information.

Tests that require a query are marked with a keyboard icon. Clicking on one of these tests will open an input field for the query. Upon hitting Enter, the test will run.

Android TV MCT Screenshot of the testing screen; clicking on the Seek To test opened an input field for the query.

To make text input easier, you can also use the ADB command:

adb shell input text [query]

Note that '%s' will add a space between words. For example, the command adb shell input text hello%sworld will add the text "hello world" to the input field.
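
If you are scripting your test runs, a small wrapper can take care of that escaping for you. Here is a minimal sketch; it assumes adb is on your PATH and that the query field already has focus.

import subprocess

def adb_input_text(query):
    # "adb shell input text" does not accept literal spaces, so replace
    # them with %s as described above.
    subprocess.run(["adb", "shell", "input", "text", query.replace(" ", "%s")],
                   check=True)

adb_input_text("hello world")   # types "hello world" into the focused field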

What's next

The MCT currently includes simple single-media-action tests for the following requests:

  • Play
  • Play From Search
  • Play From Media ID
  • Play From URI
  • Pause
  • Stop
  • Skip To Next
  • Skip To Previous
  • Skip To Queue Item
  • Seek To

For a technical deep dive on how the tests are structured and how to add more tests, visit the MCT GitHub Wiki. We'd love for you to submit pull requests with more tests that you think are useful to have and for any bug fixes. Please make sure to review the contributions process for more information.

Check out the latest updates on GitHub!

Automatic Photography with Google Clips



To me, photography is the simultaneous recognition, in a fraction of a second, of the significance of an event as well as of a precise organization of forms which give that event its proper expression.
Henri Cartier-Bresson

The last few years have witnessed a Cambrian-like explosion in AI, with deep learning methods enabling computer vision algorithms to recognize many of the elements of a good photograph: people, smiles, pets, sunsets, famous landmarks and more. But, despite these recent advancements, automatic photography remains a very challenging problem. Can a camera capture a great moment automatically?

Recently, we released Google Clips, a new, hands-free camera that automatically captures interesting moments in your life. We designed Google Clips around three important principles:
  • We wanted all computations to be performed on-device. In addition to extending battery life and reducing latency, on-device processing means that none of your clips leave the device unless you decide to save or share them, which is a key privacy control.
  • We wanted the device to capture short videos, rather than single photographs. Moments with motion can be more poignant and true-to-memory, and it is often easier to shoot a video around a compelling moment than it is to capture a perfect, single instant in time.
  • We wanted to focus on capturing candid moments of people and pets, rather than the more abstract and subjective problem of capturing artistic images. That is, we did not attempt to teach Clips to think about composition, color balance, light, etc.; instead, Clips focuses on selecting ranges of time containing people and animals doing interesting activities.
Learning to Recognize Great Moments
How could we train an algorithm to recognize interesting moments? As with most machine learning problems, we started with a dataset. We created a dataset of thousands of videos in diverse scenarios where we imagined Clips being used. We also made sure our dataset represented a wide range of ethnicities, genders, and ages. We then hired expert photographers and video editors to pore over this footage to select the best short video segments. These early curations gave us examples for our algorithms to emulate. However, it is challenging to train an algorithm solely from the subjective selection of the curators — one needs a smooth gradient of labels to teach an algorithm to recognize the quality of content, ranging from "perfect" to "terrible."

To address this problem, we took a second data-collection approach, with the goal of creating a continuous quality score across the length of a video. We split each video into short segments (similar to the content Clips captures), randomly selected pairs of segments, and asked human raters to select the one they prefer.
We took this pairwise comparison approach, instead of having raters score videos directly, because it is much easier to choose the better of a pair than it is to specify a number. We found that raters were very consistent in pairwise comparisons, and less so when scoring directly. Given enough pairwise comparisons for any given video, we were able to compute a continuous quality score over the entire length. In this process, we collected over 50,000,000 pairwise comparisons on clips sampled from over 1,000 videos. That’s a lot of human effort!
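
The post doesn't spell out the exact scoring method, but one standard way to turn pairwise preferences into a per-segment score is a Bradley-Terry-style fit. Below is a toy numpy sketch of that idea, using synthetic comparisons and simple gradient ascent; it is an illustration of the technique, not the Clips pipeline itself.

import numpy as np

# Toy data: (winner, loser) segment indices from rater comparisons within
# one video; segment 2 is preferred most often.
comparisons = [(2, 0), (2, 1), (2, 3), (1, 0), (3, 1), (2, 0)]
num_segments = 4

scores = np.zeros(num_segments)
lr = 0.1
for _ in range(500):
    grad = np.zeros(num_segments)
    for winner, loser in comparisons:
        # Bradley-Terry: P(winner preferred) = sigmoid(score_w - score_l).
        p = 1.0 / (1.0 + np.exp(scores[loser] - scores[winner]))
        grad[winner] += 1.0 - p
        grad[loser] -= 1.0 - p
    scores += lr * grad
    scores -= scores.mean()       # scores are only defined up to a constant

print(np.round(scores, 2))        # higher = more consistently preferred
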
Training a Clips Quality Model
Given this quality score training data, our next step was to train a neural network model to estimate the quality of any photograph captured by the device. We started with the basic assumption that knowing what’s in the photograph (e.g., people, dogs, trees, etc.) will help determine “interestingness”. If this assumption is correct, we could learn a function that uses the recognized content of the photograph to predict the quality score derived above from human comparisons.

To identify content labels in our training data, we leveraged the same Google machine learning technology that powers Google image search and Google Photos, which can recognize over 27,000 different labels describing objects, concepts, and actions. We certainly didn’t need all these labels, nor could we compute them all on device, so our expert photographers selected the few hundred labels they felt were most relevant to predicting the “interestingness” of a photograph. We also added the labels most highly correlated with the rater-derived quality scores.

Once we had this subset of labels, we then needed to design a compact, efficient model that could predict them for any given image, on-device, within strict power and thermal limits. This presented a challenge, as the deep learning techniques behind computer vision typically require strong desktop GPUs, and algorithms adapted to run on mobile devices lag far behind state-of-the-art techniques on desktop or cloud. To train this on-device model, we first took a large set of photographs and again used Google’s powerful, server-based recognition models to predict label confidence for each of the “interesting” labels described above. We then trained a MobileNet Image Content Model (ICM) to mimic the predictions of the server-based model. This compact model is capable of recognizing the most interesting elements of photographs, while ignoring non-relevant content.
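
As a toy illustration of that distillation step, the sketch below trains a much smaller student model to reproduce a frozen teacher's per-label confidences on synthetic features. The real ICM is a MobileNet trained on photographs, so the features, model sizes, and optimizer here are stand-ins, not the actual Clips setup.

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins: "images" as feature vectors and a frozen "server model"
# whose sigmoid outputs play the role of the label confidences we distill.
num_images, num_features, num_labels = 1000, 32, 8
images = rng.normal(size=(num_images, num_features))
teacher_w = rng.normal(size=(num_features, num_labels))
teacher_probs = 1.0 / (1.0 + np.exp(-images @ teacher_w))   # soft targets

# Student: a single linear layer, i.e. a drastically smaller model.
student_w = np.zeros((num_features, num_labels))
lr = 0.5
for _ in range(300):
    probs = 1.0 / (1.0 + np.exp(-images @ student_w))
    # Gradient of the per-label binary cross-entropy against soft targets.
    grad = images.T @ (probs - teacher_probs) / num_images
    student_w -= lr * grad

probs = 1.0 / (1.0 + np.exp(-images @ student_w))
print(f"mean absolute gap to teacher confidences: "
      f"{np.abs(probs - teacher_probs).mean():.3f}")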

The final step was to predict a single quality score for an input photograph from its content predicted by the ICM, using the 50M pairwise comparisons as training data. This score is computed with a piecewise linear regression model that combines the output of the ICM into a frame quality score. This frame quality score is averaged across the video segment to form a moment score. Given a pairwise comparison, our model should compute a moment score that is higher for the video segment preferred by humans. The model is trained so that its predictions match the human pairwise comparisons as well as possible.
Diagram of the training process for generating frame quality scores. Piecewise linear regression maps from an ICM embedding to a score which, when averaged across a video segment, yields a moment score. The moment score of the preferred segment should be higher.
This process allowed us to train a model that combines the power of Google image recognition technology with the wisdom of human raters, represented by 50 million opinions on what makes interesting content!
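
A toy numpy sketch of that training setup follows: per-frame ICM-style confidences are mapped through a hinge (piecewise linear) basis to a frame score, frame scores are averaged into a moment score, and the weights are fitted so that the preferred segment of each pair scores higher. The synthetic data, knot placement, and optimizer are illustrative choices, not Clips internals.

import numpy as np

rng = np.random.default_rng(0)
num_labels = 6
knots = np.array([0.0, 0.25, 0.5, 0.75])

def features(confidences):
    # Hinge basis per label confidence; a weighted sum of these terms is a
    # piecewise linear function of each (stand-in) ICM output.
    hinge = np.maximum(0.0, confidences[:, :, None] - knots)
    return hinge.reshape(len(confidences), -1)

def moment_score(segment, w):
    # Frame quality scores averaged over the segment's frames.
    return (features(segment) @ w).mean()

def toy_segment(bias):
    # 15 frames of per-label confidences; "preferred" segments get a boost.
    return np.clip(rng.uniform(0, 1, size=(15, num_labels)) + bias, 0, 1)

pairs = [(toy_segment(0.2), toy_segment(0.0)) for _ in range(200)]  # (preferred, other)

w = np.zeros(num_labels * len(knots))
lr = 0.05
for _ in range(200):
    grad = np.zeros_like(w)
    for preferred, other in pairs:
        diff = moment_score(preferred, w) - moment_score(other, w)
        p = 1.0 / (1.0 + np.exp(-diff))            # P(model agrees with raters)
        grad += (1.0 - p) * (features(preferred).mean(axis=0)
                             - features(other).mean(axis=0))
    w += lr * grad / len(pairs)                    # ascend the log-likelihood

agree = sum(moment_score(a, w) > moment_score(b, w) for a, b in pairs)
print(f"pairs ranked consistently with raters: {agree}/{len(pairs)}")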

While this data-driven score does a great job of identifying interesting (and non-interesting) moments, we also added some bonuses to our overall quality score for phenomena that we know we want Clips to capture, including faces (especially recurring and thus “familiar” ones), smiles, and pets. In our most recent release, we added bonuses for certain activities that customers particularly want to capture, such as hugs, kisses, jumping, and dancing. Recognizing these activities required extensions to the ICM model.

Shot Control
Given this powerful model for predicting the “interestingness” of a scene, the Clips camera can decide which moments to capture in real-time. Its shot control algorithms follow three main principles:
  1. Respect Power & Thermals: We want the Clips battery to last roughly three hours, and we don’t want the device to overheat — the device can’t run at full throttle all the time. Clips spends much of its time in a low-power mode that captures one frame per second. If the quality of that frame exceeds a threshold set by how much Clips has recently shot, it moves into a high-power mode, capturing at 15 fps. Clips then saves a clip at the first quality peak encountered.
  2. Avoid Redundancy: We don’t want Clips to capture all of its moments at once, and ignore the rest of a session. Our algorithms therefore cluster moments into visually similar groups, and limit the number of clips in each cluster.
  3. The Benefit of Hindsight: It’s much easier to determine which clips are the best when you can examine the totality of clips captured. Clips therefore captures more moments than it intends to show to the user. When clips are ready to be transferred to the phone, the Clips device takes a second look at what it has shot, and only transfers the best and least redundant content.
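
Here is a toy sketch of the capture loop described in the first principle above. The thresholds, the adaptive bump per recent clip, and the peak detection rule are all illustrative stand-ins, not the actual Clips shot-control logic.

import collections

def shot_controller(frame_scores, base_threshold=0.5):
    # Idle in low-power mode (conceptually 1 fps); jump to high-power capture
    # (15 fps) when a frame clears an adaptive threshold; save a clip at the
    # first quality peak, then drop back to low power.
    recent_clips = collections.deque(maxlen=10)
    mode, best = "low_power", None
    for i, score in enumerate(frame_scores):
        # The more we have shot recently, the higher the bar (limits redundancy).
        threshold = base_threshold + 0.05 * len(recent_clips)
        if mode == "low_power" and score > threshold:
            mode, best = "high_power", (i, score)
        elif mode == "high_power":
            if score < best[1]:              # first peak: the previous frame won
                recent_clips.append(best)
                yield best                   # (frame index, peak quality score)
                mode, best = "low_power", None
            else:
                best = (i, score)

peaks = list(shot_controller([0.1, 0.2, 0.7, 0.9, 0.6, 0.3, 0.8, 0.95, 0.4]))
print(peaks)   # [(3, 0.9), (7, 0.95)]
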
Machine Learning Fairness
In addition to making sure our video dataset represented a diverse population, we also constructed several other tests to assess the fairness of our algorithms. We created controlled datasets by sampling subjects from different genders and skin tones in a balanced manner, while keeping variables like content type, duration, and environmental conditions constant. We then used this dataset to test that our algorithms had similar performance when applied to different groups. To help detect any regressions in fairness that might occur as we improved our moment quality models, we added fairness tests to our automated system. Any change to our software was run across this battery of tests, and was required to pass. It is important to note that this methodology can’t guarantee fairness, as we can’t test for every possible scenario and outcome. However, we believe that these steps are an important part of our long-term work to achieve fairness in ML algorithms.
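
As a rough illustration of what one such automated check could look like, here is a small sketch; the metric, the grouping, and the threshold are hypothetical illustrations, not the team's actual test suite.

def pairwise_accuracy(scores, preferences):
    # Fraction of rater preferences (winner, loser) that the model's scores
    # reproduce; one possible per-group quality metric.
    agree = sum(scores[w] > scores[l] for w, l in preferences)
    return agree / len(preferences)

def fairness_check(groups, max_gap=0.05):
    # Evaluate the same metric on each balanced subgroup and fail the build
    # if any two groups differ by more than max_gap.
    accuracy = {name: pairwise_accuracy(scores, prefs)
                for name, (scores, prefs) in groups.items()}
    gap = max(accuracy.values()) - min(accuracy.values())
    assert gap <= max_gap, f"fairness regression detected: {accuracy}"
    return accuracy

# Toy data: per-segment model scores and rater preferences for two groups.
groups = {
    "group_a": ({0: 0.9, 1: 0.2, 2: 0.6}, [(0, 1), (2, 1)]),
    "group_b": ({0: 0.8, 1: 0.3, 2: 0.7}, [(0, 1), (2, 1)]),
}
print(fairness_check(groups))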

Conclusion
Most machine learning algorithms are designed to estimate objective qualities – a photo contains a cat, or it doesn’t. In our case, we aim to capture a more elusive and subjective quality – whether a personal photograph is interesting, or not. We therefore combine the objective, semantic content of photographs with subjective human preferences to build the AI behind Google Clips. Also, Clips is designed to work alongside a person, rather than autonomously; to get good results, a person still needs to be conscious of framing, and make sure the camera is pointed at interesting content. We’re happy with how well Google Clips performs, and are excited to continue to improve our algorithms to capture that “perfect” moment!

Acknowledgements
The algorithms described here were conceived and implemented by a large group of Google engineers, research scientists, and others. Figures were made by Lior Shapira. Thanks to Lior and Juston Payne for video content.

Source: Google AI Blog


Introducing the Webmaster Video Series, now in Hindi

Google offers a broad range of resources, in multiple languages, to help you better understand your website and improve its performance. The recently released Search Engine Optimization (SEO) Starter Guide, the Help Center, the Webmaster forums (which are available in 16 languages), and the various Webmaster blogs are just a few of them.
A few months ago, we launched the SEO Snippets video series, where the Google team answered some of the webmaster and SEO questions that we regularly see on the Webmaster Central Help Forum. We are now launching a similar series in Hindi, called SEO Snippets in Hindi.

From deciding what language to create content in (Hindi vs. Hinglish) to duplicate content, we’re answering the most frequently asked questions on the Hindi Webmaster forum and the India Webmaster community on Google+, in Hindi.
Check out the links shared in the videos to get more helpful webmaster information, drop by our help forum and subscribe to our YouTube channel for more tips and insights!

Update to Engagement Reporting for Bumper Ads

Historically, AdWords API reporting has not included engagements for bumper ads. Bumper ads are video ads that are 6 seconds or shorter, appear at the beginning of a YouTube video, and can't be skipped.

Bumper ads support “drawer open” engagements, where a user can mouse over the ad to expand a widget with more information. These engagements were previously not included in the Engagements and EngagementRate fields in reports. Starting in mid-February 2018, this behavior will change so that all historical and future bumper ad reporting includes these engagements. This brings bumper ads in line with other types of video ads, which already report these engagements.

This means that your historical reporting data, going back as far as January 2016, will be updated to include this statistic, bringing it in line with future data.

If you have any questions about this migration, please contact us via the forum.

Introducing the new Webmaster Video Series

Google has a broad range of resources to help you better understand your website and improve its performance. This Webmaster Central Blog, the Help Center, the Webmaster forum, and the recently released Search Engine Optimization (SEO) Starter Guide are just a few.

We also have a YouTube channel for answers to your questions in video format. To provide short, to-the-point answers to specific questions, we've just launched a new series, which we call SEO Snippets.

In this series of short videos, the Google team will be answering some of the webmaster and SEO questions that we regularly see on the Webmaster Central Help Forum. From 404 errors, how and when crawling works, and a site's URL structure to duplicate content, we'll have something here for you.

Check out the links shared in the videos to get more helpful webmaster information, drop by our help forum and subscribe to our YouTube channel for more tips and insights!


Gmail Add-ons framework now available to all developers

Originally posted by Wesley Chun, G Suite Developer Advocate on the G Suite Blog

Email remains at the heart of how companies operate. That's why earlier this year, we previewed Gmail Add-ons—a way to help businesses speed up workflows. Since then, we've seen partners build awesome applications, and beginning today, we're extending the Gmail add-on preview to include all developers. Now anyone can start building a Gmail add-on.

Gmail Add-ons let you integrate your app into Gmail and extend Gmail to handle quick actions.

They are built using native UI context cards that can include simple text dialogs, images, links, buttons and forms. The add-on appears when relevant, and the user is just a click away from your app's rich and integrated functionality.

Gmail Add-ons are easy to create. You only have to write code once for your add-on to work on both web and mobile, and you can choose from a rich palette of widgets to craft a custom UI. Create an add-on that contextually surfaces cards based on the content of a message. Check out this video to see how we created an add-on to collate email receipts and expedite expense reporting.

As the video shows, there are three components to the app's core functionality. The first component is getContextualAddOn()—this is the entry point for all Gmail Add-ons, where data is compiled to build the card and render it within the Gmail UI. Since the add-on is processing expense reports from email receipts in your inbox, createExpensesCard() parses the relevant data from the message and presents it in a form so your users can confirm or update values before submitting. Finally, submitForm() takes the data and writes a new row in an "expenses" spreadsheet in Google Sheets, which you can edit, tweak, and submit to your boss for approval.

Check out the documentation to get started with Gmail Add-ons, or if you want to see what it's like to build an add-on, go to the codelab to build ExpenseIt step-by-step. While you can't publish your add-on just yet, you can fill out this form to get notified when publishing is opened. We can't wait to see what Gmail Add-ons you build!

Announcing AVA: A Finely Labeled Video Dataset for Human Action Understanding



Teaching machines to understand human actions in videos is a fundamental research problem in Computer Vision, essential to applications such as personal video search and discovery, sports analysis, and gesture interfaces. Despite exciting breakthroughs made over the past years in classifying and finding objects in images, recognizing human actions remains a big challenge. This is because actions are, by nature, less well-defined than objects in videos, making it difficult to construct a finely labeled action video dataset. And while many benchmarking datasets, e.g., UCF101, ActivityNet and DeepMind’s Kinetics, adopt the labeling scheme of image classification and assign one label to each video or video clip in the dataset, no dataset exists for complex scenes containing multiple people who could be performing different actions.

In order to facilitate further research into human action recognition, we have released AVA, coined from “atomic visual actions”, a new dataset that provides multiple action labels for each person in extended video sequences. AVA consists of URLs for publicly available videos from YouTube, annotated with a set of 80 atomic actions (e.g., “walk”, “kick (an object)”, “shake hands”) that are spatio-temporally localized, resulting in 57.6k video segments, 96k labeled humans performing actions, and a total of 210k action labels. You can browse the website to explore the dataset and download annotations, and read our arXiv paper that describes the design and development of the dataset.

Compared with other action datasets, AVA possesses the following key characteristics:
  • Person-centric annotation. Each action label is associated with a person rather than a video or clip. Hence, we are able to assign different labels to multiple people performing different actions in the same scene, which is quite common.
  • Atomic visual actions. We limit our action labels to fine temporal scales (3 seconds), where actions are physical in nature and have clear visual signatures.
  • Realistic video material. We use movies as the source of AVA, drawing from a variety of genres and countries of origin. As a result, a wide range of human behaviors appear in the data.
Examples of 3-second video segments (from Video Source) with their bounding box annotations in the middle frame of each segment. (For clarity, only one bounding box is shown for each example.)

To create AVA, we first collected a diverse set of long-form content from YouTube, focusing on the “film” and “television” categories, featuring professional actors of many different nationalities. We analyzed a 15-minute clip from each video, and uniformly partitioned it into 300 non-overlapping 3-second segments. This sampling strategy preserved sequences of actions in a coherent temporal context.

Next, we manually labeled all bounding boxes of persons in the middle frame of each 3-second segment. For each person in the bounding box, annotators selected a variable number of labels from a pre-defined atomic action vocabulary (with 80 classes) that describe the person’s actions within the segment. These actions were divided into three groups: pose/movement actions, person-object interactions, and person-person interactions. Because we exhaustively labeled all people performing all actions, the frequencies of AVA’s labels followed a long-tail distribution, as summarized below.
Distribution of AVA’s atomic action labels. Labels displayed in the x-axis are only a partial set of our vocabulary.

The unique design of AVA allows us to derive some interesting statistics that are not available in other existing datasets. For example, given the large number of persons with at least two labels, we can measure the co-occurrence patterns of action labels. The figure below shows the top co-occurring action pairs in AVA with their co-occurrence scores. We confirm expected patterns, such as people frequently playing instruments while singing, lifting a person while playing with kids, and hugging while kissing.
Top co-occurring action pairs in AVA.
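
Counting such co-occurrences is straightforward once each person's labels are grouped per segment. Here is a toy sketch with made-up label sets, using raw pair counts as the score; AVA's published co-occurrence scores may be normalized differently, so treat this only as an illustration.

from collections import Counter
from itertools import combinations

# Toy stand-in for AVA annotations: each entry is the set of atomic action
# labels assigned to one person in one 3-second segment.
person_labels = [
    {"sing", "play musical instrument"},
    {"sing", "play musical instrument", "sit"},
    {"lift a person", "play with kids"},
    {"kiss", "hug"},
    {"kiss", "hug", "stand"},
]

pair_counts = Counter()
for labels in person_labels:
    for a, b in combinations(sorted(labels), 2):
        pair_counts[(a, b)] += 1

for pair, count in pair_counts.most_common(3):
    print(pair, count)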

To evaluate the effectiveness of human action recognition systems on the AVA dataset, we implemented an existing baseline deep learning model that obtains highly competitive performance on the much smaller JHMDB dataset. Due to challenging variations in zoom, background clutter, cinematography, and appearance, this model achieves relatively modest performance when identifying actions on AVA (18.4% mAP). This suggests that AVA will be a useful testbed for developing and evaluating new action recognition architectures and algorithms for years to come.

We hope that the release of AVA will help improve the development of human action recognition systems, and provide opportunities to model complex activities based on labels with fine spatio-temporal granularity at the level of individual people’s actions. We will continue to expand and improve AVA, and are eager to hear feedback from the community to help us guide future directions. Please join the AVA users mailing list to receive dataset updates as well as to send us emails for feedback.

Acknowledgements
The core team behind AVA includes Chunhui Gu, Chen Sun, David Ross, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. We thank many Google colleagues and annotators for their dedicated support on this project.