Tag Archives: Video

Automatic Photography with Google Clips



To me, photography is the simultaneous recognition, in a fraction of a second, of the significance of an event as well as of a precise organization of forms which give that event its proper expression.
Henri Cartier-Bresson

The last few years have witnessed a Cambrian-like explosion in AI, with deep learning methods enabling computer vision algorithms to recognize many of the elements of a good photograph: people, smiles, pets, sunsets, famous landmarks and more. But, despite these recent advancements, automatic photography remains a very challenging problem. Can a camera capture a great moment automatically?

Recently, we released Google Clips, a new, hands-free camera that automatically captures interesting moments in your life. We designed Google Clips around three important principles:
  • We wanted all computations to be performed on-device. In addition to extending battery life and reducing latency, on-device processing means that none of your clips leave the device unless you decide to save or share them, which is a key privacy control.
  • We wanted the device to capture short videos, rather than single photographs. Moments with motion can be more poignant and true-to-memory, and it is often easier to shoot a video around a compelling moment than it is to capture a perfect, single instant in time.
  • We wanted to focus on capturing candid moments of people and pets, rather than the more abstract and subjective problem of capturing artistic images. That is, we did not attempt to teach Clips to think about composition, color balance, light, etc.; instead, Clips focuses on selecting ranges of time containing people and animals doing interesting activities.
Learning to Recognize Great Moments
How could we train an algorithm to recognize interesting moments? As with most machine learning problems, we started with a dataset. We created a dataset of thousands of videos in diverse scenarios where we imagined Clips being used. We also made sure our dataset represented a wide range of ethnicities, genders, and ages. We then hired expert photographers and video editors to pore over this footage to select the best short video segments. These early curations gave us examples for our algorithms to emulate. However, it is challenging to train an algorithm solely from the subjective selection of the curators — one needs a smooth gradient of labels to teach an algorithm to recognize the quality of content, ranging from "perfect" to "terrible."

To address this problem, we took a second data-collection approach, with the goal of creating a continuous quality score across the length of a video. We split each video into short segments (similar to the content Clips captures), randomly selected pairs of segments, and asked human raters to select the one they prefer.
We took this pairwise comparison approach, instead of having raters score videos directly, because it is much easier to choose the better of a pair than it is to specify a number. We found that raters were very consistent in pairwise comparisons, and less so when scoring directly. Given enough pairwise comparisons for any given video, we were able to compute a continuous quality score over the entire length. In this process, we collected over 50,000,000 pairwise comparisons on clips sampled from over 1,000 videos. That’s a lot of human effort!
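
To make this concrete, here is a minimal sketch of one standard way to turn pairwise preferences into a per-segment score: fitting a Bradley-Terry-style latent quality value for each segment by gradient ascent. The post does not say which model was actually used, so the function name, learning rate, and toy data below are illustrative.

import numpy as np

def bradley_terry_scores(num_segments, comparisons, epochs=200, lr=0.1):
    """Fit one latent quality score per segment from pairwise preferences.

    comparisons: list of (winner_idx, loser_idx) pairs, one per rater judgment.
    Returns an array of scores; higher means more often preferred.
    """
    scores = np.zeros(num_segments)
    for _ in range(epochs):
        grad = np.zeros(num_segments)
        for winner, loser in comparisons:
            # Probability the model currently assigns to the observed preference.
            p = 1.0 / (1.0 + np.exp(scores[loser] - scores[winner]))
            grad[winner] += 1.0 - p
            grad[loser] -= 1.0 - p
        scores += lr * grad / max(len(comparisons), 1)
    return scores

# Toy usage: segment 2 wins most comparisons, so it should end up ranked highest.
print(bradley_terry_scores(3, [(2, 0), (2, 1), (1, 0), (2, 0)]))
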
Training a Clips Quality Model
Given this quality score training data, our next step was to train a neural network model to estimate the quality of any photograph captured by the device. We started with the basic assumption that knowing what’s in the photograph (e.g., people, dogs, trees, etc.) will help determine “interestingness”. If this assumption is correct, we could learn a function that uses the recognized content of the photograph to predict its quality score derived above from human comparisons.

To identify content labels in our training data, we leveraged the same Google machine learning technology that powers Google image search and Google Photos, which can recognize over 27,000 different labels describing objects, concepts, and actions. We certainly didn’t need all these labels, nor could we compute them all on device, so our expert photographers selected the few hundred labels they felt were most relevant to predicting the “interestingness” of a photograph. We also added the labels most highly correlated with the rater-derived quality scores.

Once we had this subset of labels, we then needed to design a compact, efficient model that could predict them for any given image, on-device, within strict power and thermal limits. This presented a challenge, as the deep learning techniques behind computer vision typically require strong desktop GPUs, and algorithms adapted to run on mobile devices lag far behind state-of-the-art techniques on desktop or cloud. To train this on-device model, we first took a large set of photographs and again used Google’s powerful, server-based recognition models to predict label confidence for each of the “interesting” labels described above. We then trained a MobileNet Image Content Model (ICM) to mimic the predictions of the server-based model. This compact model is capable of recognizing the most interesting elements of photographs, while ignoring non-relevant content.
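
As an illustration of this distillation step, the sketch below trains a compact MobileNet-based student to reproduce a teacher model's per-label confidences with a per-label cross-entropy loss. The label count, input resolution, optimizer, and placeholder data are assumptions for the sketch, not the actual Clips configuration.

import numpy as np
import tensorflow as tf

NUM_LABELS = 300          # assumed size of the curated "interesting" label subset
IMAGE_SHAPE = (224, 224, 3)

# Student: a MobileNet backbone with a multi-label sigmoid head.
backbone = tf.keras.applications.MobileNet(
    include_top=False, weights=None, input_shape=IMAGE_SHAPE, pooling="avg")
student = tf.keras.Sequential([
    backbone,
    tf.keras.layers.Dense(NUM_LABELS, activation="sigmoid"),
])

# Distillation: the training targets are the server model's soft confidences,
# so a per-label binary cross-entropy pulls the student toward the teacher.
student.compile(optimizer="adam", loss="binary_crossentropy")

# Random placeholders standing in for (image, teacher_confidences) pairs.
images = np.random.rand(8, *IMAGE_SHAPE).astype("float32")
teacher_probs = np.random.rand(8, NUM_LABELS).astype("float32")
student.fit(images, teacher_probs, epochs=1, batch_size=4)
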

The final step was to predict a single quality score for an input photograph from its content predicted by the ICM, using the 50M pairwise comparisons as training data. This score is computed with a piecewise linear regression model that combines the output of the ICM into a frame quality score. This frame quality score is averaged across the video segment to form a moment score. Given a pairwise comparison, our model should compute a moment score that is higher for the video segment preferred by humans. The model is trained so that its predictions match the human pairwise comparisons as well as possible.
Diagram of the training process for generating frame quality scores. Piecewise linear regression maps from an ICM embedding to a score which, when averaged across a video segment, yields a moment score. The moment score of the preferred segment should be higher.
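
A heavily simplified sketch of this setup: a plain linear scorer over ICM label confidences stands in for the piecewise-linear regression, frame scores are averaged into a moment score, and a pairwise logistic loss nudges the preferred segment's moment score above the other's. The shapes, learning rate, and the linear simplification are all assumptions.

import numpy as np

def train_moment_scorer(pairs, epochs=100, lr=0.05):
    """Learn a frame-quality scorer from pairwise segment preferences.

    pairs: list of (preferred, other), where each element is a
    [num_frames x num_labels] array of ICM confidences for one segment.
    """
    num_labels = pairs[0][0].shape[1]
    w = np.zeros(num_labels)

    def moment_score(segment):
        # Frame score = weighted sum of ICM confidences; moment = frame average.
        return segment.dot(w).mean()

    for _ in range(epochs):
        for preferred, other in pairs:
            margin = moment_score(preferred) - moment_score(other)
            # Gradient of -log(sigmoid(margin)) pushes the preferred score up.
            g = 1.0 / (1.0 + np.exp(margin))
            w += lr * g * (preferred.mean(axis=0) - other.mean(axis=0))
    return w

# Toy usage: one comparison between two 5-frame segments over 3 labels.
rng = np.random.default_rng(0)
print(train_moment_scorer([(rng.random((5, 3)) + 0.2, rng.random((5, 3)))]))
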
This process allowed us to train a model that combines the power of Google image recognition technology with the wisdom of human raters, represented by 50 million opinions on what makes interesting content!

While this data-driven score does a great job of identifying interesting (and non-interesting) moments, we also added some bonuses to our overall quality score for phenomena that we know we want Clips to capture, including faces (especially recurring and thus “familiar” ones), smiles, and pets. In our most recent release, we added bonuses for certain activities that customers particularly want to capture, such as hugs, kisses, jumping, and dancing. Recognizing these activities required extensions to the ICM model.

Shot Control
Given this powerful model for predicting the “interestingness” of a scene, the Clips camera can decide which moments to capture in real-time. Its shot control algorithms follow three main principles:
  1. Respect Power & Thermals: We want the Clips battery to last roughly three hours, and we don’t want the device to overheat — the device can’t run at full throttle all the time. Clips spends much of its time in a low-power mode that captures one frame per second. If the quality of that frame exceeds a threshold set by how much Clips has recently shot, it moves into a high-power mode, capturing at 15 fps. Clips then saves a clip at the first quality peak encountered (a simplified sketch of this capture loop appears after this list).
  2. Avoid Redundancy: We don’t want Clips to capture all of its moments at once, and ignore the rest of a session. Our algorithms therefore cluster moments into visually similar groups, and limit the number of clips in each cluster.
  3. The Benefit of Hindsight: It’s much easier to determine which clips are the best when you can examine the totality of clips captured. Clips therefore captures more moments than it intends to show to the user. When clips are ready to be transferred to the phone, the Clips device takes a second look at what it has shot, and only transfers the best and least redundant content.
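
Here is a toy sketch of the capture loop from item 1 above, assuming a callable that returns a quality score in [0, 1] for the next frame. The 1 fps and 15 fps rates come from the description above; the threshold values, peak detection, and bookkeeping are simplified stand-ins for the real shot-control logic.

import random  # only used to fake frame quality scores in the toy usage below

LOW_POWER_FPS, HIGH_POWER_FPS = 1, 15

def shot_control(next_frame_quality, session_seconds, base_threshold=0.6):
    """Toy low-power / high-power capture loop; all thresholds are illustrative."""
    saved_clips = []
    threshold = base_threshold
    t = 0.0
    while t < session_seconds:
        score = next_frame_quality()          # low-power mode: one frame per second
        t += 1.0 / LOW_POWER_FPS
        if score < threshold:
            continue
        # Promising frame: switch to high-power mode (15 fps) and record until
        # the score stops increasing (the first quality peak), then save the clip.
        clip, prev = [score], score
        while t < session_seconds:
            score = next_frame_quality()
            t += 1.0 / HIGH_POWER_FPS
            if score < prev:
                break
            clip.append(score)
            prev = score
        saved_clips.append(clip)
        # The more Clips has shot recently, the higher the bar for shooting more.
        threshold = min(0.95, threshold + 0.05)
    return saved_clips

# Toy usage with random frame scores over a 60-second session.
print(len(shot_control(lambda: random.random(), session_seconds=60)))
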
Machine Learning Fairness
In addition to making sure our video dataset represented a diverse population, we also constructed several other tests to assess the fairness of our algorithms. We created controlled datasets by sampling subjects from different genders and skin tones in a balanced manner, while keeping variables like content type, duration, and environmental conditions constant. We then used this dataset to test that our algorithms had similar performance when applied to different groups. To help detect any regressions in fairness that might occur as we improved our moment quality models, we added fairness tests to our automated system. Any change to our software was run across this battery of tests, and was required to pass. It is important to note that this methodology can’t guarantee fairness, as we can’t test for every possible scenario and outcome. However, we believe that these steps are an important part of our long-term work to achieve fairness in ML algorithms.
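
As a rough illustration of what one such automated fairness check might look like, the sketch below compares mean moment scores across groups on a balanced, controlled dataset and fails if the gap exceeds a tolerance. The data layout, grouping, and threshold are assumptions, not the actual tests used for Clips.

import statistics

def check_score_parity(scored_clips, max_gap=0.05):
    """Fail if mean moment scores differ too much across groups.

    scored_clips: iterable of (group_label, moment_score) pairs collected on a
    balanced dataset where content type, duration, and conditions are held fixed.
    """
    by_group = {}
    for group, score in scored_clips:
        by_group.setdefault(group, []).append(score)
    means = {group: statistics.mean(scores) for group, scores in by_group.items()}
    gap = max(means.values()) - min(means.values())
    assert gap <= max_gap, f"Score gap {gap:.3f} exceeds {max_gap}: {means}"
    return means
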

Conclusion
Most machine learning algorithms are designed to estimate objective qualities – a photo contains a cat, or it doesn’t. In our case, we aim to capture a more elusive and subjective quality – whether a personal photograph is interesting, or not. We therefore combine the objective, semantic content of photographs with subjective human preferences to build the AI behind Google Clips. Also, Clips is designed to work alongside a person, rather than autonomously; to get good results, a person still needs to be conscious of framing, and make sure the camera is pointed at interesting content. We’re happy with how well Google Clips performs, and are excited to continue to improve our algorithms to capture that “perfect” moment!

Acknowledgements
The algorithms described here were conceived and implemented by a large group of Google engineers, research scientists, and others. Figures were made by Lior Shapira. Thanks to Lior and Juston Payne for video content.

Source: Google AI Blog


Introducing the Webmaster Video Series, now in Hindi

Google offers a broad range of resources, in multiple languages, to help you better understand your website and improve its performance. The recently released Search Engine Optimization (SEO) Starter Guide, the Help Center, the Webmaster forums (which are available in 16 languages), and the various Webmaster blogs are just a few of them.
A few months ago, we launched the SEO Snippets video series, where the Google team answered some of the webmaster and SEO questions that we regularly see on the Webmaster Central Help Forum. We are now launching a similar series in Hindi, called SEO Snippets in Hindi.

From deciding what language to create content in (Hindi vs. Hinglish) to duplicate content, we’re answering the most frequently asked questions on the Hindi Webmaster forum and the India Webmaster community on Google+, in Hindi.
Check out the links shared in the videos to get more helpful webmaster information, drop by our help forum and subscribe to our YouTube channel for more tips and insights!

Update to Engagement Reporting for Bumper Ads

Historically, AdWords API reporting has not included engagements for bumper ads. Bumper ads are video ads that are 6 seconds or shorter, appear at the beginning of a YouTube video, and can't be skipped.

Bumper ads support “drawer open” engagements, where a user can mouse over the ad to expand a widget with more information. These engagements were previously not included in the Engagements and EngagementRate fields in reports. Starting in mid-February 2018, we will change this behavior so that all historical and future bumper ad reporting includes these engagements. This brings bumper ads in line with other types of video ads, which already report these engagements.

This means that your historical reporting data, going back as far as January 2016, will be updated to include these engagements, bringing it in line with future data.
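
If you pull these statistics with the googleads Python client library, a report query along the following lines would include the affected fields. Only the Engagements and EngagementRate field names come from the change described above; the report type, API version, date range, and output handling are illustrative.

import sys
from googleads import adwords

# Assumes credentials in a googleads.yaml file.
client = adwords.AdWordsClient.LoadFromStorage()
report_downloader = client.GetReportDownloader(version='v201802')

query = ('SELECT CampaignId, CampaignName, Engagements, EngagementRate '
         'FROM CAMPAIGN_PERFORMANCE_REPORT DURING 20160101,20180228')

# Bumper ad engagements will be reflected in these columns once the change rolls out.
report_downloader.DownloadReportWithAwql(
    query, 'CSV', sys.stdout, skip_report_header=True, skip_report_summary=True)
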

If you have any questions about this migration, please contact us via the forum.

Introducing the new Webmaster Video Series

Google has a broad range of resources to help you better understand your website and improve its performance. This Webmaster Central Blog, the Help Center, the Webmaster forum, and the recently released Search Engine Optimization (SEO) Starter Guide are just a few.

We also have a YouTube channel for answers to your questions in video format. To provide short and to-the-point answers to specific questions, we've just launched a new series, which we call SEO Snippets.

In this series of short videos, the Google team will be answering some of the webmaster and SEO questions that we regularly see on the Webmaster Central Help Forum. From 404 errors and how and when crawling works to a site's URL structure and duplicate content, we'll have something here for you.

Check out the links shared in the videos to get more helpful webmaster information, drop by our help forum and subscribe to our YouTube channel for more tips and insights!


Gmail Add-ons framework now available to all developers

Originally posted by Wesley Chun, G Suite Developer Advocate on the G Suite Blog

Email remains at the heart of how companies operate. That's why earlier this year, we previewed Gmail Add-ons—a way to help businesses speed up workflows. Since then, we've seen partners build awesome applications, and beginning today, we're extending the Gmail add-on preview to include all developers. Now anyone can start building a Gmail add-on.

Gmail Add-ons let you integrate your app into Gmail and extend Gmail to handle quick actions.

They are built using native UI context cards that can include simple text dialogs, images, links, buttons and forms. The add-on appears when relevant, and the user is just a click away from your app's rich and integrated functionality.

Gmail Add-ons are easy to create. You only have to write code once for your add-on to work on both web and mobile, and you can choose from a rich palette of widgets to craft a custom UI. Create an add-on that contextually surfaces cards based on the content of a message. Check out this video to see how we created an add-on to collate email receipts and expedite expense reporting.

As the video shows, there are three components to the app's core functionality. The first is getContextualAddOn(): the entry point for all Gmail Add-ons, where data is compiled to build the card and render it within the Gmail UI. Since the add-on processes expense reports from email receipts in your inbox, createExpensesCard() parses the relevant data from the message and presents it in a form so your users can confirm or update values before submitting. Finally, submitForm() takes the data and writes a new row in an "expenses" spreadsheet in Google Sheets, which you can edit, tweak, and submit to your boss for approval.

Check out the documentation to get started with Gmail Add-ons, or if you want to see what it's like to build an add-on, go to the codelab to build ExpenseIt step-by-step. While you can't publish your add-on just yet, you can fill out this form to get notified when publishing is opened. We can't wait to see what Gmail Add-ons you build!


Announcing AVA: A Finely Labeled Video Dataset for Human Action Understanding



Teaching machines to understand human actions in videos is a fundamental research problem in computer vision, essential to applications such as personal video search and discovery, sports analysis, and gesture interfaces. Despite exciting breakthroughs over the past years in classifying and finding objects in images, recognizing human actions remains a big challenge. This is because actions are, by nature, less well-defined than objects in videos, which makes it difficult to construct a finely labeled action video dataset. And while many benchmarking datasets, e.g., UCF101, ActivityNet and DeepMind’s Kinetics, adopt the labeling scheme of image classification and assign one label to each video or video clip in the dataset, no dataset exists for complex scenes containing multiple people who could be performing different actions.

In order to facilitate further research into human action recognition, we have released AVA, coined from “atomic visual actions”, a new dataset that provides multiple action labels for each person in extended video sequences. AVA consists of URLs for publicly available videos from YouTube, annotated with a set of 80 atomic actions (e.g., “walk”, “kick (an object)”, “shake hands”) that are spatio-temporally localized, resulting in 57.6k video segments, 96k labeled humans performing actions, and a total of 210k action labels. You can browse the website to explore the dataset and download annotations, and read our arXiv paper that describes the design and development of the dataset.

Compared with other action datasets, AVA possesses the following key characteristics:
  • Person-centric annotation. Each action label is associated with a person rather than a video or clip. Hence, we are able to assign different labels to multiple people performing different actions in the same scene, which is quite common.
  • Atomic visual actions. We limit our action labels to fine temporal scales (3 seconds), where actions are physical in nature and have clear visual signatures.
  • Realistic video material. We use movies as the source of AVA, drawing from a variety of genres and countries of origin. As a result, a wide range of human behaviors appear in the data.
Examples of 3-second video segments (from Video Source) with their bounding box annotations in the middle frame of each segment. (For clarity, only one bounding box is shown for each example.)

To create AVA, we first collected a diverse set of long-form content from YouTube, focusing on the “film” and “television” categories, featuring professional actors of many different nationalities. We analyzed a 15-minute clip from each video and uniformly partitioned it into 300 non-overlapping 3-second segments. This sampling strategy preserved sequences of actions in a coherent temporal context.

Next, we manually labeled all bounding boxes of persons in the middle frame of each 3-second segment. For each person in the bounding box, annotators selected a variable number of labels from a pre-defined atomic action vocabulary (with 80 classes) that describe the person’s actions within the segment. These actions were divided into three groups: pose/movement actions, person-object interactions, and person-person interactions. Because we exhaustively labeled all people performing all actions, the frequencies of AVA’s labels followed a long-tail distribution, as summarized below.
Distribution of AVA’s atomic action labels. Labels displayed in the x-axis are only a partial set of our vocabulary.

The unique design of AVA allows us to derive some interesting statistics that are not available in other existing datasets. For example, given the large number of persons with at least two labels, we can measure the co-occurrence patterns of action labels. The figure below shows the top co-occurring action pairs in AVA with their co-occurrence scores. We confirm expected patterns, such as people frequently playing instruments while singing, lifting a person while playing with kids, and hugging while kissing.
Top co-occurring action pairs in AVA.
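
Because each person can carry several labels, statistics like these can be computed directly from the per-person label sets. The sketch below counts raw label-pair co-occurrences; the example data is made up, and the score reported for AVA may be normalized differently.

from collections import Counter
from itertools import combinations

def cooccurrence_counts(person_label_sets):
    """Count how often two atomic actions are assigned to the same person in a segment."""
    counts = Counter()
    for labels in person_label_sets:
        for a, b in combinations(sorted(labels), 2):
            counts[(a, b)] += 1
    return counts

# Toy usage with a few annotated (person, segment) label sets.
annotations = [{"sing", "play instrument"}, {"sing", "play instrument", "stand"},
               {"hug", "kiss"}, {"stand", "talk"}]
print(cooccurrence_counts(annotations).most_common(3))
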

To evaluate the effectiveness of human action recognition systems on the AVA dataset, we implemented an existing baseline deep learning model that obtains highly competitive performance on the much smaller JHMDB dataset. Due to challenging variations in zoom, background clutter, cinematography, and appearance, this model achieves relatively modest performance on AVA (18.4% mAP). This suggests that AVA will be a useful testbed for developing and evaluating new action recognition architectures and algorithms for years to come.

We hope that the release of AVA will help advance the development of human action recognition systems, and provide opportunities to model complex activities based on labels with fine spatio-temporal granularity at the level of each individual person’s actions. We will continue to expand and improve AVA, and are eager to hear feedback from the community to help guide future directions. Please join the AVA users mailing list to receive dataset updates and to send us feedback.

Acknowledgements
The core team behind AVA includes Chunhui Gu, Chen Sun, David Ross, Caroline Pantofaru, Yeqing Li, Sudheendra Vijayanarasimhan, George Toderici, Susanna Ricco, Rahul Sukthankar, Cordelia Schmid, and Jitendra Malik. We thank many Google colleagues and annotators for their dedicated support on this project.

Modifying events with the Google Calendar API

Originally posted by Wesley Chun (@wescpy), Developer Advocate, G Suite, on the G Suite Developers Blog.

You might be using the Google Calendar API, or alternatively email markup, to insert events into your users' calendars. Thankfully, these tools allow your apps to do this seamlessly and automatically, which saves your users a lot of time. But what happens if plans change? You need your apps to also be able to modify an event.

While email markup does support this update, it's limited in what it can do, so in today's video, we'll show you how to modify events with the Calendar API. We'll also show you how to create repeating events. Check it out:

Imagine a potential customer being interested in your product, so you set up one or two meetings with them. As their interest grows, they request regularly-scheduled syncs as your product makes their short list—your CRM should be able to make these adjustments in your calendar without much work on your part. Similarly, a "dinner with friends" event can go from a "rain check" to a bi-monthly dining experience with friends you've grown closer to. Both of these events can be updated with a JSON request payload like what you see below to adjust the date and make it repeating:

var TIMEZONE = "America/Los_Angeles";
var EVENT = {
  // Move the event to the evening of July 1 and repeat it every other month
  // through the end of 2017.
  "start": {"dateTime": "2017-07-01T19:00:00", "timeZone": TIMEZONE},
  "end": {"dateTime": "2017-07-01T22:00:00", "timeZone": TIMEZONE},
  "recurrence": ["RRULE:FREQ=MONTHLY;INTERVAL=2;UNTIL=20171231"]
};

This event can then be updated with a single call to the Calendar API's events().patch() method, which in Python would look like the following given the request data above, GCAL as the API service endpoint, and a valid EVENT_ID to update:



GCAL.events().patch(calendarId='primary', eventId=EVENT_ID,
                    sendNotifications=True, body=EVENT).execute()
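
For completeness, here is a self-contained Python version of the example above using the Google API Python client. The OAuth token file, scope, and EVENT_ID value are placeholders you would supply from your own setup.

from google.oauth2.credentials import Credentials
from googleapiclient.discovery import build

TIMEZONE = 'America/Los_Angeles'
EVENT = {
    'start': {'dateTime': '2017-07-01T19:00:00', 'timeZone': TIMEZONE},
    'end': {'dateTime': '2017-07-01T22:00:00', 'timeZone': TIMEZONE},
    'recurrence': ['RRULE:FREQ=MONTHLY;INTERVAL=2;UNTIL=20171231'],
}

# Build the Calendar API service endpoint from previously stored credentials.
creds = Credentials.from_authorized_user_file(
    'token.json', scopes=['https://www.googleapis.com/auth/calendar'])
GCAL = build('calendar', 'v3', credentials=creds)

EVENT_ID = 'your-event-id'  # ID of the existing event to update
GCAL.events().patch(calendarId='primary', eventId=EVENT_ID,
                    sendNotifications=True, body=EVENT).execute()
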

If you want to dive deeper into the code sample, check out this blog post. Also, if you missed it, check out this video that shows how you can insert events into Google Calendar as well as the official API documentation. Finally, if you have a Google Apps Script app, you can access Google Calendar programmatically with its Calendar service.

We hope you can use this information to enhance your apps to give your users an even better and timely experience.


Building trust and increasing transparency with MRC-accredited measurement

Measurement has been top of mind in our recent conversations with advertisers, and for good reason. As we’ve said many times, “If you can’t measure it, how do you know it worked?” Committing to measurement is critical, but it is just the first step. We believe that the industry needs metrics that are trusted, transparent and easily verified. Today, we’re pleased to share several updates on the work we’re doing with third-party verification and audit partners to ensure that the metrics available from Google are objective and accurate.
Transparency and trust are the core principles of our measurement strategy. We strongly believe in the need for third-party accreditation through the Media Rating Council (MRC). We gained our first accreditations back in 2006, and for over ten years we’ve partnered with the MRC, advocating for standards across the industry and contributing to ongoing discussions that set guidelines for measuring the effectiveness of ads. We currently maintain over 30 MRC accreditations across display and video, desktop and mobile web, mobile apps, and clicks, plays, impressions and viewability.

MRC accreditation for 3rd party viewability reporting on YouTube

Since 2015, we've completed integrations with Moat, Integral Ad Science and DoubleVerify to enable third-party viewability reporting on YouTube. These integrations offer advertisers additional choice for measuring viewability on YouTube, alongside Active View.
Today, we’re announcing that each of these integrations will undergo a stringent, independent audit for MRC accreditation. The audit will validate that data collection, aggregation and reporting for served video impressions, viewable impressions, related viewability statistics and General Invalid Traffic (GIVT) across desktop and mobile for each integration adheres to MRC and IAB standards. In short, advertisers will have even greater confidence in the metrics returned by these third party partners about their campaigns on YouTube.
“Google’s announcement that they are undertaking an independent audit of their 3rd party viewability reporting integrations is a positive step forward for marketers. At the ANA, our goal is to create transparency for the advertising supply chain. This action from Google today demonstrates their commitment to partnering with us to deliver this goal."
—Bob Liodice, President and CEO, Association for National Advertisers

New MRC accreditations for DoubleClick and AdWords

Our commitment to MRC accreditation goes beyond our media to include our platform solutions as well. We maintain several accreditations for DoubleClick already, and today we’re announcing that we are now fully accredited for video impressions and viewability statistics for desktop web, mobile web and mobile app in DoubleClick Campaign Manager.
We are also seeking MRC accreditation for video impressions and viewability statistics and GIVT detection for display and video in both AdWords and DoubleClick Bid Manager. These MRC audits will span across all video available through these buying platforms — including YouTube and partner inventory.
"Google's commitment to MRC's initiatives has been unwavering over time, and their participation in industry standards projects has been helpful. We look forward to working on these new audits and expanding the industry's trust as it relates to YouTube's third party integrations and DoubleClick Bid Manager."
—George Ivie, CEO and Executive Director, Media Rating Council
“Google’s announcement to bring more media transparency is important progress that will help move the industry forward. At P&G, we are encouraged by Google’s actions, which should make a positive impact on creating a clean and productive media supply chain.”
—Marc Pritchard, Chief Brand Officer, Procter & Gamble

With so much activity underway, we know that it can be difficult to stay current. For an up-to-date list of all MRC accreditations, click here.
Transparency and trust are fundamental to measurement, and they’re fundamental to our strategy for giving marketers and publishers the metrics and insights they need to make better decisions. A solid foundation has been created, but there is much more work to do. In 2017, we’ll continue to seek ways to raise the bar on transparent and trustworthy measurement, and we welcome your partnership along the way.