
Top Shot on Pixel 3



Life is full of meaningful moments — from a child’s first step to an impromptu jump for joy — that one wishes could be preserved with a picture. However, because these moments are often unpredictable, missing that perfect shot is a frustrating problem that smartphone camera users face daily. Using our experience from developing Google Clips, we wondered if we could develop new techniques for the Pixel 3 camera that would allow everyone to capture the perfect shot every time.

Top Shot is a new feature recently launched with Pixel 3 that helps you capture precious moments precisely and automatically, right at the press of the shutter button. Top Shot saves and analyzes the image frames before and after the shutter press on the device in real time using computer vision techniques, and recommends several alternative high-quality HDR+ photos.
Examples of Top Shot on Pixel 3. On the left, a better smiling shot is recommended. On the right, a better jump shot is recommended. The recommended images are high-quality HDR+ shots.
Capturing Multiple Moments
When a user opens the Pixel 3 Camera app, Top Shot is enabled by default, helping to capture the perfect moment by analyzing images taken both before and after the shutter press. Each image is analyzed for some qualitative features (e.g., whether the subject is smiling or not) in real time and entirely on-device to preserve privacy and minimize latency. Each image is also associated with additional signals, such as optical flow of the image, exposure time, and gyro sensor data, which together form the input features used to score the frame quality.

When you press the shutter button, Top Shot captures up to 90 images from 1.5 seconds before and after the shutter press, selecting up to two alternative shots to save in high resolution: the original shutter frame and the high-res alternatives are saved for you to review (other lower-res frames can also be reviewed as desired). The shutter frame is processed and saved first; the best alternative shots are saved afterwards. Google’s Visual Core on Pixel 3 processes these top alternative shots as HDR+ images with a very small amount of extra latency, and they are embedded into the Motion Photo file.
Top-level diagram of Top Shot capture.
Given Top Shot runs in the camera as a background process, it must have very low power consumption. As such, Top Shot uses a hardware-accelerated MobileNet-based single shot detector (SSD). The execution of such optimized models is also throttled by power and thermal limits.
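
As a rough illustration of this capture-and-select loop, the sketch below (minimal Python with hypothetical names, not the production pipeline) keeps a fixed-length buffer of recent frames with their on-device quality scores and, at shutter time, returns the shutter frame plus the two highest-scoring alternatives:

    from collections import deque

    BUFFER_SIZE = 90          # up to 90 frames around the shutter press
    NUM_ALTERNATIVES = 2      # best alternative shots saved in high resolution

    class TopShotBuffer:
        """Illustrative ring buffer of recent frames and their quality scores."""

        def __init__(self):
            self.frames = deque(maxlen=BUFFER_SIZE)

        def add_frame(self, frame, quality_score):
            # quality_score would come from the on-device attribute model plus
            # the motion, exposure, and gyro signals described above.
            self.frames.append((quality_score, frame))

        def select_on_shutter(self, shutter_frame):
            # Keep the shutter frame, then the top-scoring alternatives.
            ranked = sorted(self.frames, key=lambda item: item[0], reverse=True)
            best = [frame for _, frame in ranked[:NUM_ALTERNATIVES]]
            return [shutter_frame] + best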

Recognizing Top Moments
When we set out to understand how to enable people to capture the best moments with their camera, we focused on three key attributes: 1) functional qualities like lighting, 2) objective attributes (are the subject's eyes open? Are they smiling?), and 3) subjective qualities like emotional expressions. We designed a computer vision model to recognize these attributes while operating in a low-latency, on-device mode.

During our development process, we started with a vanilla MobileNet model and set out to optimize for Top Shot, arriving at a customized architecture that operated within our accuracy, latency and power tradeoff constraints. Our neural network design detects low-level visual attributes in early layers, like whether the subject is blurry, and then dedicates additional compute and parameters toward more complex objective attributes like whether the subject's eyes are open, and subjective attributes like whether there is an emotional expression of amusement or surprise. We trained our model using knowledge distillation over a large number of diverse face images using quantization during both training and inference.
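
For intuition, the soft-target part of knowledge distillation can be sketched in a few lines of NumPy. This is a generic distillation loss under standard assumptions, not the actual Top Shot training code, and quantization-aware training is not shown:

    import numpy as np

    def softmax(logits, temperature=1.0):
        z = logits / temperature
        z = z - z.max(axis=-1, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    def distillation_loss(student_logits, teacher_logits, labels,
                          temperature=4.0, alpha=0.5):
        """Weighted sum of a soft (teacher) and hard (label) cross-entropy."""
        soft_targets = softmax(teacher_logits, temperature)
        soft_preds = softmax(student_logits, temperature)
        soft_loss = -np.mean(np.sum(soft_targets * np.log(soft_preds + 1e-9), axis=-1))

        hard_preds = softmax(student_logits)
        hard_loss = -np.mean(np.log(hard_preds[np.arange(len(labels)), labels] + 1e-9))

        # Scaling by temperature^2 keeps the two gradient magnitudes comparable.
        return alpha * (temperature ** 2) * soft_loss + (1 - alpha) * hard_loss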

We then adopted a layered Generalized Additive Model (GAM) to provide quality scores for faces and combine them into a weighted-average “frame faces” score. This model made it easy for us to interpret and identify the exact causes of success or failure, enabling rapid iteration to improve the quality and performance of our attributes model. The number of free parameters was on the order of dozens, so we could optimize these using Google's black box optimizer, Vizier, in tandem with any other parameters that affected selection quality.
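
A highly simplified view of that aggregation might look like the sketch below. The attribute names, weights, and face-area weighting are illustrative assumptions; in the real system the handful of free parameters was tuned with Vizier as described above:

    # Hypothetical per-attribute weights; the real free parameters were tuned
    # with Google's black-box optimizer, Vizier.
    ATTRIBUTE_WEIGHTS = {"eyes_open": 0.4, "smiling": 0.4, "not_blurry": 0.2}

    def face_score(attributes):
        """attributes: dict of attribute name -> probability in [0, 1]."""
        return sum(weight * attributes.get(name, 0.0)
                   for name, weight in ATTRIBUTE_WEIGHTS.items())

    def frame_faces_score(faces):
        """faces: list of (face_area, attributes); larger faces weigh more (assumption)."""
        if not faces:
            return 0.0
        total_area = sum(area for area, _ in faces)
        return sum(area * face_score(attrs) for area, attrs in faces) / total_area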

Frame Scoring Model
While Top Shot prioritizes face analysis, there are good moments in which faces are not the primary subject. To handle those use cases, we include the following additional scores in the overall frame quality score:
  • Subject motion saliency score — the low-resolution optical flow between the current frame and the previous frame is estimated in the ISP (image signal processor) to determine whether there is salient object motion in the scene.
  • Global motion blur score — estimated from the camera motion and the exposure time. The camera motion is calculated from sensor data from the gyroscope and OIS (optical image stabilization).
  • “3A” scores — the status of auto exposure, auto focus, and auto white balance, are also considered.
All the individual scores are used to train a model predicting an overall quality score, which matches the frame preference of human raters, to maximize end-to-end product quality.
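
Concretely, the per-frame signals could be combined along the lines of this sketch. The blur proxy and the fixed linear weights are illustrative assumptions, whereas the actual predictor was trained to match human raters:

    def motion_blur_score(angular_velocity_rad_s, exposure_time_s, focal_length_px):
        """Rough proxy: how many pixels the image shifts during the exposure."""
        blur_px = angular_velocity_rad_s * exposure_time_s * focal_length_px
        return max(0.0, 1.0 - blur_px / 5.0)   # 1.0 = sharp, 0.0 = very blurry (assumed scale)

    def overall_frame_score(frame_faces, motion_saliency, motion_blur, three_a):
        # Illustrative linear blend; the real model was trained on human
        # preference data to maximize end-to-end quality.
        weights = {"faces": 0.5, "saliency": 0.2, "blur": 0.2, "3a": 0.1}
        return (weights["faces"] * frame_faces
                + weights["saliency"] * motion_saliency
                + weights["blur"] * motion_blur
                + weights["3a"] * three_a)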

End-to-End Quality and Fairness
Most of the above components are evaluated for accuracy independently. However, Top Shot presents uniquely challenging requirements since it runs in real time in the Pixel Camera. Additionally, we needed to ensure that all these signals are combined in a system with favorable results. That means we need to gauge our predictions against what our users perceive as the “top shot.”

To test this, we collected data from hundreds of volunteers, along with their opinions of which frames (out of up to 90!) looked best. This donated dataset covers many typical use cases, e.g. portraits, selfies, actions, landscapes, etc.

Many of the 3-second clips provided by Top Shot had more than one good shot, so it was important for us to engineer our quality metrics to handle this. We used modified versions of traditional precision and recall, classic ranking metrics (such as Mean Reciprocal Rank), and a few others designed specifically for the Top Shot task as our objectives. In addition to these metrics, we investigated the causes of image quality issues we saw during development, leading to improvements in avoiding blur, handling multiple faces better, and more. In doing so, we were able to steer the model towards a set of selections people were likely to rate highly.
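
For example, Mean Reciprocal Rank adapted to clips with several acceptable frames can be computed roughly as follows (a simple sketch; the metrics actually used were further customized for Top Shot):

    def mean_reciprocal_rank(ranked_frames_per_clip, good_frames_per_clip):
        """ranked_frames_per_clip: list of frame-id lists, best-scored first.
        good_frames_per_clip: list of sets of frame ids raters marked as good."""
        total = 0.0
        for ranking, good in zip(ranked_frames_per_clip, good_frames_per_clip):
            reciprocal_rank = 0.0
            for rank, frame_id in enumerate(ranking, start=1):
                if frame_id in good:          # first hit among the good frames
                    reciprocal_rank = 1.0 / rank
                    break
            total += reciprocal_rank
        return total / len(ranked_frames_per_clip)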

Importantly, we tested the Top Shot system for fairness to make sure that our product can offer a consistent experience to a very wide range of users. We evaluated the accuracy of each signal used in Top Shot on several different subgroups of people (based on gender, age, ethnicity, etc), testing for accuracy of each signal across those subgroups.

Conclusion
Top Shot is just one example of how Google leverages optimized hardware and cutting-edge machine learning to provide useful tools and services. We hope you’ll find this feature useful, and we’re committed to further improving the capabilities of mobile phone photography!

Acknowledgements
This post reflects the work of a large group of Google engineers, research scientists, and others including: Ari Gilder, Aseem Agarwala, Brendan Jou, David Karam, Eric Penner, Farooq Ahmad, Henri Astre, Hillary Strickland, Marius Renn, Matt Bridges, Maxwell Collins, Navid Shiee, Ryan Gordon, Sarah Clinckemaillie, Shu Zhang, Vivek Kesarwani, Xuhui Jia, Yukun Zhu, Yuzo Watanabe and Chris Breithaupt.

Source: Google AI Blog


Build new experiences with the Google Photos Library API

Posted by Jan-Felix Schmakeit, Google Photos Developer Lead

As we shared in May, people create and consume photos and videos in many different ways, and we think it should be easier to do more with the photos people take, across more of the apps and devices we all use. That's why we created the Google Photos Library API: to give you the ability to build photo and video experiences in your products that are smarter, faster, and more helpful.

After a successful developer preview over the past few months, the Google Photos Library API is now generally available. If you want to build and test your own experience, you can visit our developer documentation to get started. You can also express your interest in joining the Google Photos partner program if you are planning a larger integration.

Here's a quick overview of the Google Photos Library API and what you can do:

Whether you're a mobile, web, or backend developer, you can use this REST API to utilize the best of Google Photos and help people connect, upload, and share from inside your app. We are also launching client libraries in multiple languages that will help you get started quicker.

Users have to authorize requests through the API, so they are always in the driver's seat. Here are a few things you can help your users do:

  • Easily find photos, based on
    • what's in the photo
    • when it was taken
    • attributes like media format
  • Upload directly to their photo library or an album
  • Organize albums and add titles and locations
  • Use shared albums to easily transfer and collaborate

Putting machine learning to work in your app is simple too. You can use smart filters, like content categories, to narrow down or exclude certain types of photos and videos and make it easier for your users to find the ones they're looking for.
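
For instance, searching a user's library for photos in a given content category is a single authorized POST to the mediaItems:search endpoint. The sketch below uses the Python requests library and assumes an OAuth access token has already been obtained:

    import requests

    def search_landscapes(access_token):
        """List the user's photos that Google Photos categorizes as landscapes."""
        response = requests.post(
            "https://photoslibrary.googleapis.com/v1/mediaItems:search",
            headers={"Authorization": f"Bearer {access_token}"},
            json={
                "pageSize": 25,
                "filters": {
                    "contentFilter": {"includedContentCategories": ["LANDSCAPES"]},
                    "mediaTypeFilter": {"mediaTypes": ["PHOTO"]},
                },
            },
        )
        response.raise_for_status()
        return response.json().get("mediaItems", [])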

Thanks to everyone who provided feedback throughout our developer preview; your contributions helped make the API better. You can read our release notes to follow along with any new releases of our API. And, if you've been using the Picasa Web Albums API, here's a migration guide that will help you move to the Google Photos Library API.

Introducing the Google Photos partner program

Posted by Jan-Felix Schmakeit, Google Photos Developer Lead

People create and consume photos and videos in many different ways, and we think it should be easier to do more with the photos you've taken, across all the apps and devices you use.

That's why we're introducing a new Google Photos partner program that gives you the tools and APIs to build photo and video experiences in your products that are smarter, faster and more helpful.

Building with the Google Photos Library API

With the Google Photos Library API, your users can seamlessly access their photos whenever they need them.

Whether you're a mobile, web, or backend developer, you can use this REST API to utilize the best of Google Photos and help people connect, upload, and share from inside your app.

Your user is always in the driver's seat. Here are a few things you can help them to do:

  • Easily find photos, based on
    • what's in the photo
    • when it was taken
    • attributes like description and media format
  • Upload directly to their photo library
  • Organize albums and add titles and locations
  • Use shared albums to easily transfer and collaborate

With the Library API, you don't have to worry about maintaining your own storage and infrastructure, as photos and videos remain safely backed up in Google Photos.
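
Uploading on a user's behalf is a two-step flow: first upload the raw bytes to obtain an upload token, then create the media item with that token. Here is a minimal Python sketch using the requests library (error handling and album placement omitted; an OAuth access token is assumed):

    import requests

    API = "https://photoslibrary.googleapis.com/v1"

    def upload_photo(access_token, path, description=""):
        headers = {"Authorization": f"Bearer {access_token}"}

        # Step 1: upload the bytes and receive an upload token.
        with open(path, "rb") as f:
            upload_token = requests.post(
                f"{API}/uploads",
                headers={**headers,
                         "Content-Type": "application/octet-stream",
                         "X-Goog-Upload-Protocol": "raw"},
                data=f.read(),
            ).text

        # Step 2: create the media item in the user's library.
        response = requests.post(
            f"{API}/mediaItems:batchCreate",
            headers=headers,
            json={"newMediaItems": [{
                "description": description,
                "simpleMediaItem": {"uploadToken": upload_token},
            }]},
        )
        response.raise_for_status()
        return response.json()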

Putting machine intelligence to work in your app is simple too. You can use smart filters, like content categories, to narrow down or exclude certain types of photos and videos and make it easier for your users to find the ones they're looking for.

We've also aimed to take the hassle out of building a smooth user experience. Features like thumbnailing and cross-platform deep-links mean you can offload common tasks and focus on what makes your product unique.

Getting started

Today, we're launching a developer preview of the Google Photos Library API. You can start building and testing it in your own projects right now.

Get started by visiting our developer documentation where you can also express your interest in joining the Google Photos partner program. Some of our early partners, including HP, Legacy Republic, NixPlay, Xero and TimeHop are already building better experiences using the API.

If you are following Google I/O, you can also join us for our session to learn more.

We're excited for the road ahead and look forward to working with you to develop new apps that work with Google Photos.

PhotoScan: Taking Glare-Free Pictures of Pictures



Yesterday, we released an update to PhotoScan, an app for iOS and Android that allows you to digitize photo prints with just a smartphone. One of the key features of PhotoScan is the ability to remove glare from prints, which are often glossy and reflective, as are the plastic album pages or glass-covered picture frames that host them. To create this feature, we developed a unique blend of computer vision and image processing techniques that can carefully align and combine several slightly different pictures of a print to separate the glare from the image underneath.
Left: A regular digital picture of a physical print. Right: Glare-free digital output from PhotoScan
When taking a single picture of a photo, determining which regions of the picture are the actual photo and which regions are glare is challenging to do automatically. Moreover, the glare may often saturate regions in the picture, rendering it impossible to see or recover the parts of the photo underneath it. But if we take several pictures of the photo while moving the camera, the position of the glare tends to change, covering different regions of the photo. In most cases we found that every pixel of the photo is likely to be glare-free in at least one of the pictures. While no single view may be glare-free, we can combine multiple pictures of the printed photo taken at different angles to remove the glare. The challenge is that the images need to be aligned very accurately in order to combine them properly, and this processing needs to run very quickly on the phone to provide a near-instant experience.
Left: The captured, input images (5 in total). Right: If we stabilize the images on the photo, we can see just the glare moving, covering different parts of the photo. Notice no single image is glare-free.
Our technique is inspired by our earlier work published at SIGGRAPH 2015, which we dubbed “obstruction-free photography”. It uses similar principles to remove various types of obstructions from the field of view. However, the algorithm we originally proposed was based on a generative model where the motion and appearance of both the main scene and the obstruction layer are estimated. While that model is quite powerful and can remove a variety of obstructions, it is too computationally expensive to be run on smartphones. We therefore developed a simpler model that treats glare as an outlier, and only attempts to register the underlying, glare-free photo. While this model is simpler, the task is still quite challenging as the registration needs to be highly accurate and robust.

How it Works
We start from a series of pictures of the print taken by the user while moving the camera. The first picture - the “reference frame” - defines the desired output viewpoint. The user is then instructed to take four additional frames. In each additional frame, we detect sparse feature points (we compute ORB features on Harris corners) and use them to establish homographies mapping each frame to the reference frame.
Detected feature matches between the reference frame and each other frame (left), and the warped frames according to the estimated homographies (right).
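
In OpenCV terms, this registration step looks roughly like the sketch below, a simplified stand-in for the custom on-device implementation:

    import cv2
    import numpy as np

    def register_to_reference(reference, frame):
        """Warp `frame` onto `reference` using ORB matches and a RANSAC homography."""
        orb = cv2.ORB_create(nfeatures=1000, scoreType=cv2.ORB_HARRIS_SCORE)
        kp_ref, des_ref = orb.detectAndCompute(reference, None)
        kp_frm, des_frm = orb.detectAndCompute(frame, None)

        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        matches = sorted(matcher.match(des_frm, des_ref), key=lambda m: m.distance)

        src = np.float32([kp_frm[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
        dst = np.float32([kp_ref[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
        H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 3.0)

        h, w = reference.shape[:2]
        return cv2.warpPerspective(frame, H, (w, h))
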
While the technique may sound straightforward, there is a catch: homographies can only align flat images. But printed photos are often not entirely flat (as is the case with the example shown above). Therefore, we use optical flow, a fundamental computer vision representation for motion that establishes a pixel-wise mapping between two images, to correct the non-planarities. We start from the homography-aligned frames and compute “flow fields” to warp the images and further refine the registration. In the example below, notice how the corners of the photo on the left slightly “move” after registering the frames using only homographies. The right-hand side shows how the photo is better aligned after refining the registration using optical flow.
Comparison between the warped frames using homographies (left) and after the additional warp refinement using optical flow (right).
The difference in the registration is subtle, but has a big impact on the end result. Notice how small misalignments manifest themselves as duplicated image structures in the result, and how these artifacts are alleviated with the additional flow refinement.
Comparison between the glare removal result with (right) and without (left) optical flow refinement. In the result using homographies only (left), notice artifacts around the eye, nose and teeth of the person, and duplicated stems and flower petals on the fabric.
Here too, the challenge was to make optical flow, a naturally slow algorithm, work very quickly on the phone. Instead of computing optical flow at each pixel as done traditionally (the number of flow vectors computed is equal to the number of input pixels), we represent a flow field by a smaller number of control points, and express the motion at each pixel in the image as a function of the motion at the control points. Specifically, we divide each image into tiled, non-overlapping cells to form a coarse grid, and represent the flow of a pixel in a cell as the bilinear combination of the flow at the four corners of the cell that contains it.

The grid setup for grid optical flow. A point p is represented as the bilinear interpolation of the four corner points of the cell that encapsulates it.
Left: Illustration of the computed flow field on one of the frames. Right: The flow color coding: orientation and magnitude represented by hue and saturation, respectively.
This results in a much smaller problem to solve, since the number of flow vectors to compute now equals the number of grid points, which is typically much smaller than the number of pixels. This process is similar in nature to the spline-based image registration described in Szeliski and Coughlan (1997). With this algorithm, we were able to reduce the optical flow computation time by a factor of ~40 on a Pixel phone!
Flipping between the homography-registered frame and the flow-refined warped frame (using the above flow field), superimposed on the (clean) reference frame, shows how the computed flow field “snaps” image parts to their corresponding parts in the reference frame, improving the registration.
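
The upsampling from grid flow to a per-pixel warp field can be sketched as follows (a NumPy illustration that assumes the flow at the grid corners has already been estimated):

    import numpy as np

    def dense_flow_from_grid(grid_flow, cell_size, height, width):
        """Bilinearly interpolate flow at grid corners to every pixel.

        grid_flow: array of shape (rows, cols, 2) holding flow vectors at the
        corners of a coarse grid whose cells are `cell_size` pixels wide."""
        ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
        gy, gx = ys / cell_size, xs / cell_size              # position in grid units
        y0 = np.clip(np.floor(gy).astype(int), 0, grid_flow.shape[0] - 1)
        x0 = np.clip(np.floor(gx).astype(int), 0, grid_flow.shape[1] - 1)
        y1 = np.minimum(y0 + 1, grid_flow.shape[0] - 1)
        x1 = np.minimum(x0 + 1, grid_flow.shape[1] - 1)
        fy, fx = (gy - y0)[..., None], (gx - x0)[..., None]

        # Bilinear combination of the flow at the four corners of each cell.
        return ((1 - fy) * (1 - fx) * grid_flow[y0, x0]
                + (1 - fy) * fx * grid_flow[y0, x1]
                + fy * (1 - fx) * grid_flow[y1, x0]
                + fy * fx * grid_flow[y1, x1])
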
Finally, in order to compose the glare-free output, for any given location in the registered frames, we examine the pixel values and use a soft minimum algorithm to obtain the darkest observed value. More specifically, we compute the expectation of the minimum brightness over the registered frames, assigning less weight to pixels close to the (warped) image boundaries. We use this method rather than taking the minimum directly across the frames because corresponding pixels in each frame may have slightly different brightness, so a per-pixel minimum can produce visible seams due to sudden intensity changes at the boundaries between overlaid images.
Regular minimum (left) versus soft minimum (right) over the registered frames.
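
One way to implement such a soft minimum is sketched below; the exponential weighting and the temperature are illustrative assumptions, the frames are assumed to be registered already, and the per-frame weight maps are assumed to fall off near the warped boundaries:

    import numpy as np

    def soft_minimum(frames, boundary_weights, temperature=10.0):
        """frames: (N, H, W) registered brightness images in [0, 255].
        boundary_weights: (N, H, W) weights that fall off near warped borders."""
        frames = np.asarray(frames, dtype=np.float32)
        # Shift by the per-pixel minimum for numerical stability; darker pixels
        # then get exponentially larger weight, so the result is an expectation
        # that is close to (but smoother than) the hard per-pixel minimum.
        shifted = frames - frames.min(axis=0, keepdims=True)
        w = boundary_weights * np.exp(-shifted / temperature)
        return (w * frames).sum(axis=0) / (w.sum(axis=0) + 1e-6)
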
The algorithm can support a variety of scanning conditions — matte and gloss prints, photos inside or outside albums, magazine covers.

Example scans showing the input image, the registered frames, and the glare-free result.
To get the final result, the Photos team has developed a method that automatically detects and crops the photo area, and rectifies it to a frontal view. Because of perspective distortion, the scanned rectangular photo usually appears as a quadrangle in the image. The method analyzes image signals, like color and edges, to figure out the exact boundary of the original photo in the scanned image, then applies a geometric transformation to rectify the quadrangle back to its original rectangular shape, yielding a high-quality, glare-free digital version of the photo.
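
Given the four detected corners of the print, the rectification itself is a standard perspective transform, as in this OpenCV sketch (the corner detection, which is the hard part, is not shown):

    import cv2
    import numpy as np

    def rectify_quad(image, corners, out_w, out_h):
        """corners: four (x, y) points of the detected photo, ordered
        top-left, top-right, bottom-right, bottom-left."""
        src = np.float32(corners)
        dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
        M = cv2.getPerspectiveTransform(src, dst)
        return cv2.warpPerspective(image, M, (out_w, out_h))
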
So overall, quite a lot going on under the hood, and all done almost instantaneously on your phone! To give PhotoScan a try, download the app on Android or iOS.

Now your photos look better than ever, even those dusty old prints

Photos from the past, meet scanner from the future

Google Photos is a home for all your photos and videos, but what about those old prints that are some of your most treasured memories? Such as photos of grandma when she was young, your childhood pet, and that hairstyle you wish you could forget.

We all have those old albums and boxes of photos, but we don’t take the time to digitize them because it’s just too hard to get it right. We don’t want to mail away our original copy, buying a scanner is costly and time consuming, and if you try to take a photo of a photo, you end up with crooked edges and glare.

We knew there had to be a better way, so we’re introducing PhotoScan, a brand new, standalone app from Google Photos that easily scans just about any photo, free, from anywhere. Get it today for Android and iOS.


PhotoScan gets you great looking digital copies in seconds - it detects edges, straightens the image, rotates it to the correct orientation, and removes glare. Scanned photos can be saved in one tap to Google Photos to be organized, searchable, shared, and safely backed up at high quality -- for free.  

See how the PhotoScan technology works behind the scenes by watching this video from our friends Nat & Lo.

Pro edits, no pro needed

After all that time in the attic, your photos might need a few polishes. Or you might even want to edit that selfie from this morning. Getting the right look can take a lot of time and with so many editing tools it’s tough to know where to begin.

Today we’re rolling out three easy ways to get great looking photos in Google Photos: a new and improved auto enhance, unique new looks, and advanced editing tools. Open a photo and then tap the pencil icon to start editing. First, for auto enhance, just select Auto, and see instant enhancements a pro editor might make - like balancing exposure and saturation to bring out the details.

Second, our 12 new looks take style to the next level. These unique looks make edits based on the individual photo and its brightness, darkness, warmth, or saturation, before applying the style. All looks use machine intelligence to complement the content of your photo, and choosing one is just a matter of taste.
Third, our advanced editing controls for Light and Color allow you to fine-tune your photos, including highlights, shadows, and warmth. Deep Blue is particularly good for images of sea and sky where the color blue is the focal point.

The Google Photos app with the new photo editor will begin rolling out today across Android, iOS, and the web. Just in time for your next set of holiday memories.

Posted by Jingyu Cui, Software Engineer, Google Photos