Tag Archives: Computational Imaging

Take All Your Pictures to the Cleaners, with Google Photos Noise and Blur Reduction

Despite recent leaps in imaging technology, especially on mobile devices, image noise and limited sharpness remain two of the most important levers for improving the visual quality of a photograph. These are particularly relevant when taking pictures in poor light conditions, where cameras may compensate by increasing the ISO or slowing the shutter speed, thereby exacerbating the presence of noise and, at times, increasing image blur. Noise can be associated with the particle nature of light (shot noise) or be introduced by electronic components during the readout process (read noise). The captured noisy signal is then processed by the camera image processor (ISP) and later may be further enhanced, amplified, or distorted by a photographic editing process. Image blur can be caused by a wide variety of phenomena, from inadvertent camera shake during capture, an incorrect setting of the camera’s focus (automatic or not), or due to the finite lens aperture, sensor resolution or the camera’s image processing.

It is far easier to minimize the effects of noise and blur within a camera pipeline, where details of the sensor, optical hardware and software blocks are understood. However, when presented with an image produced from an arbitrary (possibly unknown) camera, improving noise and sharpness becomes much more challenging due to the lack of detailed knowledge and access to the internal parameters of the camera. In most situations, these two problems are intrinsically related: noise reduction tends to eliminate fine structures along with unwanted details, while blur reduction seeks to boost structures and fine details. This interconnectedness increases the difficulty of developing image enhancement techniques that are computationally efficient to run on mobile devices.

Today, we present a new approach for camera-agnostic estimation and elimination of noise and blur that can improve the quality of most images. We developed a pull-push denoising algorithm that is paired with a deblurring method, called polyblur. Both of these components are designed to maximize computational efficiency, so users can successfully enhance the quality of a multi-megapixel image in milliseconds on a mobile device. These noise and blur reduction strategies are critical components of the recent Google Photos editor updates, which includes “Denoise” and “Sharpen” tools that enable users to enhance images that may have been captured under less than ideal conditions, or with older devices that may have had more noisy sensors or less sharp optics.

A demonstration of the “Denoise” and “Sharpen” tools now available in the Google Photos editor.

How Noisy is An Image?
In order to accurately process a photographic image and successfully reduce the unwanted effects of noise and blur, it is vitally important to first characterize the types and levels of noise and blur found in the image. So, a camera-agnostic approach for noise reduction begins by formulating a method to gauge the strength of noise at the pixel level from any given image, regardless of the device that created it. The noise level is modeled as a function of the brightness of the underlying pixel. That is, for each possible brightness level, the model estimates a corresponding noise level in a manner agnostic to either the actual source of the noise or the processing pipeline.

To estimate this brightness-based noise level, we sample a number of small patches across the image and measure the noise level within each patch, after roughly removing any underlying structure in the image. This process is repeated at multiple scales, making it robust to artifacts that may arise from compression, image resizing, or other non-linear camera processing operations.

The two segments on the left illustrate signal-dependent noise present in the input image (center). The noise is more prominent in the bottom, darker crop and is unrelated to the underlying structure, but rather to the light level. Such image segments are sampled and processed to generate the spatially-varying noise map (right) where red indicates more noise is present.

Reducing Noise Selectively with a Pull-Push Method
We take advantage of self-similarity of patches across the image to denoise with high fidelity. The general principle behind such so-called “non-local” denoising is that noisy pixels can be denoised by averaging pixels with similar local structure. However, these approaches typically incur high computational costs because they require a brute force search for pixels with similar local structure, making them impractical for on-device use. In our “pull-push” approach1, the algorithmic complexity is decoupled from the size of filter footprints thanks to effective information propagation across spatial scales.

The first step in pull-push is to build an image pyramid (i.e., multiscale representation) in which each successive level is generated recursively by a “pull” filter (analogous to downsampling). This filter uses a per-pixel weighting scheme to selectively combine existing noisy pixels together based on their patch similarities and estimated noise, thus reducing the noise at each successive, “coarser” level. Pixels at coarser levels (i.e., with lower resolution) pull and aggregate only compatible pixels from higher resolution, “finer” levels. In addition to this, each merged pixel in the coarser layers also includes an estimated reliability measure computed from the similarity weights used to generate it. Thus, merged pixels provide a simple per-pixel, per-level characterization of the image and its local statistics. By efficiently propagating this information through each level (i.e., each spatial scale), we are able to track a model of the neighborhood statistics for increasingly larger regions in a multiscale manner.

After the pull stage is evaluated to the coarsest level, the “push” stage fuses the results, starting from the coarsest level and generating finer levels iteratively. At a given scale, the push stage generates “filtered” pixels following a process similar to that of the pull stage, but going from coarse to finer levels. The pixels at each level are fused with those of coarser levels by doing a weighted average of same-level pixels along with coarser-level filtered pixels using the respective reliability weights. This enables us to reduce pixel noise while preserving local structure, because only average reliable information is included. This selective filtering and reliability (i.e. information) multiscale propagation is what makes push-pull different from existing frameworks.

This series of images shows how filtering progresses through the pull-push process. Coarser level pixels pull and aggregate only compatible pixels from finer levels, as opposed to the traditional multiscale approaches using a fixed (non-data dependent) kernel. Notice how the noise is reduced throughout the stages.

The pull-push approach has a low computational cost, because the algorithm to selectively filter similar pixels over a very large neighborhood has a complexity that is only linear with the number of image pixels. In practice, the quality of this denoising approach is comparable to traditional non-local methods with much larger kernel footprints, but operates at a fraction of the computational cost.

Image enhanced using the pull-push denoising method.

How Blurry Is an Image?
An image with poor sharpness can be thought of as being a more pristine latent image that was operated on by a blur kernel. So, if one can identify the blur kernel, it can be used to reduce the effect. This is referred to as “deblurring”, i.e., the removal or reduction of an undesired blur effect induced by a particular kernel on a particular image. In contrast, “sharpening” refers to applying a sharpening filter, built from scratch and without reference to any particular image or blur kernel. Typical sharpening filters are also, in general, local operations that do not take account of any other information from other parts of the image, whereas deblurring algorithms estimate the blur from the whole image. Unlike arbitrary sharpening, which can result in worse image quality when applied to an image that is already sharp, deblurring a sharp image with a blur kernel accurately estimated from the image itself will have very little effect.

We specifically target relatively mild blur, as this scenario is more technically tractable, more computationally efficient, and produces consistent results. We model the blur kernel as an anisotropic (elliptical) Gaussian kernel, specified by three parameters that control the strength, direction and aspect ratio of the blur.

Gaussian blur model and example blur kernels. Each row of the plot on the right represents possible combinations of σ0, ρ and θ. We show three different σ0 values with three different ρ values for each.

Computing and removing blur without noticeable delay for the user requires an algorithm that is much more computationally efficient than existing approaches, which typically cannot be executed on a mobile device. We rely on an intriguing empirical observation: the maximal value of the image gradient across all directions at any point in a sharp image follows a particular distribution. Finding the maximum gradient value is efficient, and can yield a reliable estimate of the strength of the blur in the given direction. With this information in hand, we can directly recover the parameters that characterize the blur.

Polyblur: Removing Blur by Re-blurring
To recover the sharp image given the estimated blur, we would (in theory) need to solve a numerically unstable inverse problem (i.e., deblurring). The inversion problem grows exponentially more unstable with the strength of the blur. As such, we target the case of mild blur removal. That is, we assume that the image at hand is not so blurry as to be beyond practical repair. This enables a more practical approach — by carefully combining different re-applications of an operator we can approximate its inverse.

Mild blur, as shown in these examples, can be effectively removed by combining multiple applications of the estimated blur.

This means, rather counterintuitively, that we can deblur an image by re-blurring it several times with the estimated blur kernel. Each application of the (estimated) blur corresponds to a first order polynomial, and the repeated applications (adding or subtracting) correspond to higher order terms in a polynomial. A key aspect of this approach, which we call polyblur, is that it is very fast, because it only requires a few applications of the blur itself. This allows it to operate on megapixel images in a fraction of a second on a typical mobile device. The degree of the polynomial and its coefficients are set to invert the blur without boosting noise and other unwanted artifacts.

The deblurred image is generated by adding and subtracting multiple re-applications of the estimated blur (polyblur).

Integration with Google Photos
The innovations described here have been integrated and made available to users in the Google Photos image editor in two new adjustment sliders called “Denoise” and “Sharpen”. These features allow users to improve the quality of everyday images, from any capture device. The features often complement each other, allowing both denoising to reduce unwanted artifacts, and sharpening to bring clarity to the image subjects. Try using this pair of tools in tandem in your images for best results. To learn more about the details of the work described here, check out our papers on polyblur and pull-push denoising. To see some examples of the effect of our denoising and sharpening up close, have a look at the images in this album.

The authors gratefully acknowledge the contributions of Ignacio Garcia-Dorado, Ryan Campbell, Damien Kelly, Peyman Milanfar, and John Isidoro. We are also thankful for support and feedback from Navin Sarma, Zachary Senzer, Brandon Ruffin, and Michael Milne.

1 The original pull-push algorithm was developed as an efficient scattered data interpolation method to estimate and fill in the missing pixels in an image where only a subset of the pixels are specified. Here, we extend its methodology and present a data-dependent multiscale algorithm for denoising images efficiently. 

Source: Google AI Blog

Introducing NIMA: Neural Image Assessment

Quantification of image quality and aesthetics has been a long-standing problem in image processing and computer vision. While technical quality assessment deals with measuring pixel-level degradations such as noise, blur, compression artifacts, etc., aesthetic assessment captures semantic level characteristics associated with emotions and beauty in images. Recently, deep convolutional neural networks (CNNs) trained with human-labelled data have been used to address the subjective nature of image quality for specific classes of images, such as landscapes. However, these approaches can be limited in their scope, as they typically categorize images to two classes of low and high quality. Our proposed method predicts the distribution of ratings. This leads to a more accurate quality prediction with higher correlation to the ground truth ratings, and is applicable to general images.

In “NIMA: Neural Image Assessment” we introduce a deep CNN that is trained to predict which images a typical user would rate as looking good (technically) or attractive (aesthetically). NIMA relies on the success of state-of-the-art deep object recognition networks, building on their ability to understand general categories of objects despite many variations. Our proposed network can be used to not only score images reliably and with high correlation to human perception, but also it is useful for a variety of labor intensive and subjective tasks such as intelligent photo editing, optimizing visual quality for increased user engagement, or minimizing perceived visual errors in an imaging pipeline.

In general, image quality assessment can be categorized into full-reference and no-reference approaches. If a reference “ideal” image is available, image quality metrics such as PSNR, SSIM, etc. have been developed. When a reference image is not available, “blind” (or no-reference) approaches rely on statistical models to predict image quality. The main goal of both approaches is to predict a quality score that correlates well with human perception. In a deep CNN approach to image quality assessment, weights are initialized by training on object classification related datasets (e.g. ImageNet), and then fine-tuned on annotated data for perceptual quality assessment tasks.

Typical aesthetic prediction methods categorize images as low/high quality. This is despite the fact that each image in the training data is associated to a histogram of human ratings, rather than a single binary score. A histogram of ratings is an indicator of overall quality of an image, as well as agreements among raters. In our approach, instead of classifying images a low/high score or regressing to the mean score, the NIMA model produces a distribution of ratings for any given image — on a scale of 1 to 10, NIMA assigns likelihoods to each of the possible scores. This is more directly in line with how training data is typically captured, and it turns out to be a better predictor of human preferences when measured against other approaches (more details are available in our paper).

Various functions of the NIMA vector score (such as the mean) can then be used to rank photos aesthetically. Some test photos from the large-scale database for Aesthetic Visual Analysis (AVA) dataset, as ranked by NIMA, are shown below. Each AVA photo is scored by an average of 200 people in response to photography contests. After training, the aesthetic ranking of these photos by NIMA closely matches the mean scores given by human raters. We find that NIMA performs equally well on other datasets, with predicted quality scores close to human ratings.
Ranking some examples labelled with the “landscape” tag from AVA dataset using NIMA. Predicted NIMA (and ground truth) scores are shown below each image.
NIMA scores can also be used to compare the quality of images of the same subject which may have been distorted in various ways. Images shown in the following example are part of the TID2013 test set, which contain various types and levels of distortions.
Ranking some examples from TID2013 dataset using NIMA. Predicted NIMA scores are shown below each image.
Perceptual Image Enhancement
As we’ve shown in another recent paper, quality and aesthetic scores can also be used to perceptually tune image enhancement operators. In other words, maximizing NIMA score as part of a loss function can increase the likelihood of enhancing perceptual quality of an image. The following example shows that NIMA can be used as a training loss to tune a tone enhancement algorithm. We observed that the baseline aesthetic ratings can be improved by contrast adjustments directed by the NIMA score. Consequently, our model is able to guide a deep CNN filter to find aesthetically near-optimal settings of its parameters, such as brightness, highlights and shadows.

NIMA can be used as a training loss to enhance images. In this example, local tone and contrast of images is enhanced by training a deep CNN with NIMA as its loss. Test images are obtained from the MIT-Adobe FiveK dataset.
Looking Ahead
Our work on NIMA suggests that quality assessment models based on machine learning may be capable of a wide range of useful functions. For instance, we may enable users to easily find the best pictures among many; or to even enable improved picture-taking with real-time feedback to the user. On the post-processing side, these models may be used to guide enhancement operators to produce perceptually superior results. In a direct sense, the NIMA network (and others like it) can act as reasonable, though imperfect, proxies for human taste in photos and possibly videos. We’re excited to share these results, though we know that the quest to do better in understanding what quality and aesthetics mean is an ongoing challenge — one that will involve continuing retraining and testing of our models.

Motion Stills — Now on Android

Last year, we launched Motion Stills, an iOS app that stabilizes your Live Photos and lets you view and share them as looping GIFs and videos. Since then, Motion Stills has been well received, being listed as one of the top apps of 2016 by The Verge and Mashable. However, from its initial release, the community has been asking us to also make Motion Stills available for Android. We listened to your feedback and today, we're excited to announce that we’re bringing this technology, and more, to devices running Android 5.1 and later!
Motion Stills on Android: Instant stabilization on your device.
With Motion Stills on Android we built a new recording experience where everything you capture is instantly transformed into delightful short clips that are easy to watch and share. You can capture a short Motion Still with a single tap like a photo, or condense a longer recording into a new feature we call Fast Forward. In addition to stabilizing your recordings, Motion Stills on Android comes with an improved trimming algorithm that guards against pocket shots and accidental camera shakes. All of this is done during capture on your Android device, no internet connection required!

New streaming pipeline
For this release, we redesigned our existing iOS video processing pipeline to use a streaming approach that processes each frame of a video as it is being recorded. By computing intermediate motion metadata, we are able to immediately stabilize the recording while still performing loop optimization over the full sequence. All this leads to instant results after recording — no waiting required to share your new GIF.
Capture using our streaming pipeline gives you instant results.
In order to display your Motion Stills stream immediately, our algorithm computes and stores the necessary stabilizing transformation as a low resolution texture map. We leverage this texture to apply the stabilization transform using the GPU in real-time during playback, instead of writing a new, stabilized video that would tax your mobile hardware and battery.

Fast Forward
Fast Forward allows you to speed up and condense a longer recording into a short, easy to share clip. The same pipeline described above allows Fast Forward to process up to a full minute of video, right on your phone. You can even change the speed of playback (from 1x to 8x) after recording. To make this possible, we encode videos with a denser I-frame spacing to enable efficient seeking and playback. We also employ additional optimizations in the Fast Forward mode. For instance, we apply adaptive temporal downsampling in the linear solver and long-range stabilization for smooth results over the whole sequence.
Fast Forward condenses your recordings into easy to share clips.
Try out Motion Stills
Motion Stills is an app for us to experiment and iterate quickly with short-form video technology, gathering valuable feedback along the way. The tools our users find most fun and useful may be integrated later on into existing products like Google Photos. Download Motion Stills for Android from the Google Play store—available for mobile phones running Android 5.1 and later—and share your favorite clips on social media with hashtag #motionstills.

Motion Stills would not have been possible without the help of many Googlers. We want to especially acknowledge the work of Matthias Grundmann in advancing our stabilization technology, as well as our UX and interaction designers Jacob Zukerman, Ashley Ma and Mark Bowers.

Enhance! RAISR Sharp Images with Machine Learning

Everyday the web is used to share and store millions of pictures, enabling one to explore the world, research new topics of interest, or even share a vacation with friends and family. However, many of these images are either limited by the resolution of the device used to take the picture, or purposely degraded in order to accommodate the constraints of cell phones, tablets, or the networks to which they are connected. With the ubiquity of high-resolution displays for home and mobile devices, the demand for high-quality versions of low-resolution images, quickly viewable and shareable from a wide variety of devices, has never been greater.

With “RAISR: Rapid and Accurate Image Super-Resolution”, we introduce a technique that incorporates machine learning in order to produce high-quality versions of low-resolution images. RAISR produces results that are comparable to or better than the currently available super-resolution methods, and does so roughly 10 to 100 times faster, allowing it to be run on a typical mobile device in real-time. Furthermore, our technique is able to avoid recreating the aliasing artifacts that may exist in the lower resolution image.

Upsampling, the process of producing an image of larger size with significantly more pixels and higher image quality from a low quality image, has been around for quite a while. Well-known approaches to upsampling are linear methods which fill in new pixel values using simple, and fixed, combinations of the nearby existing pixel values. These methods are fast because they are fixed linear filters (a constant convolution kernel applied uniformly across the image). But what makes these upsampling methods fast, also makes them ineffective in bringing out vivid details in the higher resolution results. As you can see in the example below, the upsampled image looks blurry – one would hesitate to call it enhanced.
Left: Low-res original, Right: simple (bicubic) upsampled version (2x)

With RAISR, we instead use machine learning and train on pairs of images, one low quality, one high, to find filters that, when applied to selectively to each pixel of the low-res image, will recreate details that are of comparable quality to the original. RAISR can be trained in two ways. The first is the "direct" method, where filters are learned directly from low and high-resolution image pairs. The other method involves first applying a computationally cheap upsampler to the low resolution image (as in the figure above) and then learning the filters from the upsampled and high resolution image pairs. While the direct method is computationally faster, the 2nd method allows for non-integer scale factors and better leveraging of hardware-based upsampling.

For either method, RAISR filters are trained according to edge features found in small patches of images, - brightness/color gradients, flat/textured regions, etc. - characterized by direction (the angle of an edge), strength (sharp edges have a greater strength) and coherence (a measure of how directional the edge is). Below is a set of RAISR filters, learned from a database of 10,000 high and low resolution image pairs (where the low-res images were first upsampled). The training process takes about an hour.
Collection of learned 11x11 filters for 3x super-resolution. Filters can be learned for a range of super-resolution factors, including fractional ones. Note that as the angle of the edge changes, we see the angle of the filter rotate as well. Similarly, as the strength increases, the sharpness of the filters increases, and the anisotropy of the filter increases with rising coherence.

From left to right, we see that the learned filters correspond selectively to the direction of the underlying edge that is being reconstructed. For example, the filter in the middle of the bottom row is most appropriate for a strong horizontal edge (gradient angle of 90 degrees) with a high degree of coherence (a straight, rather than a curved, edge). If this same horizontal edge is low-contrast, then a different filter is selected such one in the top row.

In practice, at run-time RAISR selects and applies the most relevant filter from the list of learned filters to each pixel neighborhood in the low-resolution image. When these filters are applied to the lower quality image, they recreate details that are of comparable quality to the original high resolution, and offer a significant improvement to linear, bicubic, or Lanczos interpolation methods.
Top: RAISR algorithm at run-time, applied to a cheap upscaler’s output. Bottom: Low-res original (left), bicubic upsampler 2x (middle), RAISR output (right)

Some examples of RAISR in action can be seen below:
Top: Original, Bottom: RAISR super-resolved 2x
Left: Original, Right: RAISR super-resolved 3x

One of the more complex aspects of super-resolution is getting rid of aliasing artifacts such as Moire patterns and jaggies that arise when high frequency content is rendered in lower resolution (as is the case when images are purposefully degraded). Depending on the shape of the underlying features, these artifacts can be varied and hard to undo.
Example of aliasing artifacts seen on the lower right (Image source)

Linear methods simply can not recover the underlying structure, but RAISR can. Below is an example where the aliased spatial frequencies are apparent under the numbers 3 and 5 in the low-resolution original on the left, while the RAISR image on the right recovered the original structure. Another important advantage of the filter learning approach used by RAISR is that we can specialize it to remove noise, or compression artifacts unique to individual compression algorithms (such as JPEG) as part of the training process. By providing it with examples of such artifacts, RAISR can learn to undo other effects besides resolution enhancement, having them “baked” inside the resulting filters.
Left: Low res original, with strong aliasing. Right: RAISR output, removing aliasing.

Super-resolution technology, using one or many frames, has come a long way. Today, the use of machine learning, in tandem with decades of advances in imaging technology, has enabled progress in image processing that yields many potential benefits. For example, in addition to improving digital “pinch to zoom” on your phone, one could capture, save, or transmit images at lower resolution and super-resolve on demand without any visible degradation in quality, all while utilizing less of mobile data and storage plans.

To learn more about the details of our research and a comparison to other current architectures, check out our paper, which will appear soon in the IEEE Transactions on Computational Imaging.