Tag Archives: compression

Improving Sparse Training with RigL

Modern deep neural network architectures are often highly redundant [1, 2, 3], making it possible to remove a significant fraction of connections without harming performance. The sparse neural networks that result have been shown to be more parameter and compute efficient compared to dense networks, and, in many cases, can significantly decrease wall clock inference times.

By far the most popular method for training sparse neural networks is pruning (dense-to-sparse training), which usually requires first training a dense model and then “sparsifying” it by cutting out the connections with negligible weights. However, this process has two limitations.

  1. The size of the largest trainable sparse model is limited by that of the largest trainable dense model. Even if sparse models are more parameter efficient, one cannot use pruning to train models that are larger and more accurate than the largest possible dense models.
  2. Pruning is inefficient, meaning that large amounts of computation must be performed for parameters that are zero-valued or that will be zero during inference. Additionally, it remains unknown if the performance of the current best pruning algorithms is an upper bound on the quality of sparse models.
Training sparse networks from scratch, on the other hand, is efficient, but it often achieves inferior performance compared to pruning.

In “Rigging the Lottery: Making All Tickets Winners”, presented at ICML 2020, we introduce RigL, an algorithm for training sparse neural networks that uses a fixed parameter count and computational cost throughout training, without sacrificing accuracy relative to existing dense-to-sparse training methods. The algorithm identifies which neurons should be active during training, which helps the optimization process to utilize the most relevant connections and results in better sparse solutions. An example of this is shown below, where, during the training of a multilayer perceptron (MLP) network on MNIST, our sparse network trained with RigL learns to focus on the center of the images, discarding the uninformative pixels from the edges. A TensorFlow implementation of our method along with three other baselines (SET, SNFS, SNIP) can be found at github.com/google-research/rigl.

Left: Average MNIST image. Right: Evolution of the connectivity of the input throughout the training of a 98% sparse, 2-layer MLP on MNIST. Training starts from a random sparse mask, where each input pixel has roughly six outgoing connections. Connections that originate from the edges do not exhibit meaningful gradients and are therefore replaced by more informative connections that originate from the center pixels.

RigL Overview
The RigL method starts with a network initialized with a random sparse topology. At regularly spaced intervals, we remove a fraction of the connections with the smallest weight magnitudes; this strategy has been shown to have very little effect on the loss. RigL then activates new connections using instantaneous gradient information, i.e., without using past gradient information: connections with large gradients are grown, since these connections are expected to decrease the loss most quickly. After updating the connectivity, training continues with the updated network until the next scheduled update.

RigL begins with a random sparse initialization of the network. It then trains the network and trims out the connections with the smallest weight magnitudes. Based on the gradients calculated for the new configuration, it grows new connections and trains again, repeating the cycle.
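To make the update step concrete, here is a minimal NumPy sketch of a single RigL-style connectivity update for one layer. It is an illustration only: the helper, its drop fraction, and the single-layer setting are my assumptions, and the official TensorFlow implementation in the linked repository additionally handles per-layer sparsity distributions, update schedules, and distributed gradient computation.

```python
import numpy as np

def rigl_update(weights, grads, mask, drop_fraction=0.3):
    """One RigL-style connectivity update for a single layer (toy sketch).

    `weights`, `grads`, and `mask` share the same shape; `mask` is 1 for
    active connections and 0 for inactive ones.
    """
    n_update = int(drop_fraction * mask.sum())
    old_mask = mask.copy()

    # Drop: deactivate the active connections with the smallest magnitudes.
    drop_scores = np.where(old_mask == 1, np.abs(weights), np.inf)
    drop_idx = np.argsort(drop_scores, axis=None)[:n_update]
    mask.flat[drop_idx] = 0

    # Grow: activate previously inactive connections with the largest
    # (instantaneous) gradient magnitudes; new connections start at zero.
    grow_scores = np.where(old_mask == 0, np.abs(grads), -np.inf)
    grow_idx = np.argsort(grow_scores, axis=None)[-n_update:]
    mask.flat[grow_idx] = 1
    weights.flat[grow_idx] = 0.0

    return weights * mask, mask
```

Between two such updates, ordinary training continues on the masked weights; in the paper, the fraction of connections updated at each step is annealed over the course of training.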

Evaluating Performance
By changing the connectivity of the neurons dynamically during training, RigL helps the optimization find better solutions. To demonstrate this, we restart training from a bad solution that exhibits poor accuracy and show that RigL's mask updates help the optimization achieve a better loss compared to static training, in which the connectivity of the sparse network remains the same.

Training loss of RigL and Static methods starting from the same static sparse solution, shown together with their final test accuracies.

The figure below summarizes the performance of various methods on training an 80% sparse ResNet-50 architecture. We compare RigL with two recent sparse training methods, SET and SNFS, and three baseline training methods: Static, Small-Dense, and Pruning. Two of these methods (SNFS and Pruning) require dense resources, as they need to either train a large network or store its gradients. Overall, we observe that the performance of all methods improves with additional training time; thus, for each method we run extended training with up to 5x the training steps of the original 100 epochs.

As noted in a number of studies [4, 5, 6, 7], training a network with fixed sparsity from scratch (Static) leads to inferior performance compared to solutions found by pruning. Training a small, dense network (Small-Dense) with the same number of parameters gets better results than Static, but fails to match the performance of dynamic sparse models. Similarly, SET improves the performance over Small-Dense, but saturates at around 75% accuracy, revealing the limits of growing new connections randomly. Methods that use gradient information to grow new connections (RigL and SNFS) obtain higher accuracy in general, but RigL achieves the highest accuracy, while also consistently requiring fewer FLOPs (and a smaller memory footprint) than the other methods.

Performance of sparse training methods on training an 80% sparse ResNet-50 architecture with uniform sparsity distribution. Points on each curve correspond to individual training runs with increasing training length. The number of FLOPs required to train a standard dense ResNet-50, along with its performance, is indicated with a dashed red line. RigL matches the standard ResNet-50 performance, even though it is 5x smaller in size.

Observing the trend between extended training and performance, we compare the results using longer training runs. Within the interval considered (i.e., 1x-100x), RigL's performance continually improves with additional training. RigL achieves state-of-the-art performance of 68.07% Top-1 accuracy when training a 99% sparse ResNet-50 architecture. Similarly, extended training of a 90% sparse MobileNet-v1 architecture with RigL achieves 70.55% Top-1 accuracy. Obtaining the same results with fewer training iterations is an exciting future research direction.

Effect of training time on RigL accuracy when training 99% sparse ResNet-50 (left) and 90% sparse MobileNet-v1 (right) architectures.

Other experiments, including image classification on the CIFAR-10 dataset and character-based language modeling using RNNs on the WikiText-103 dataset, can be found in the full paper.

Future Work
RigL is useful in three different scenarios:

  1. Improving the accuracy of sparse models intended for deployment.
  2. Improving the accuracy of large sparse models that can only be trained for a limited number of iterations.
  3. Combining with sparse primitives to enable training of extremely large sparse models which otherwise would not be possible.
The third scenario is unexplored due to the lack of hardware and software support for sparsity. Nonetheless, work continues [8, 9, 10] to improve the performance of sparse networks on current hardware, and new types of hardware accelerators are expected to have better support for parameter sparsity [11, 12]. We hope RigL provides the tools to take advantage of, and motivation for, such advances.

Acknowledgements
We would like to thank Eleni Triantafillou, Hugo Larochelle, Bart van Merrienboer, Fabian Pedregosa, Joan Puigcerver, Danny Tarlow, Nicolas Le Roux, and Karen Simonyan for giving feedback on the preprint of the paper; Namhoon Lee for helping us verify and debug our SNIP implementation; Chris Jones for helping us discover and solve the distributed training bug; and Tom Small for creating the visualization of the algorithm.

Source: Google AI Blog


Celebrating 10 years of WebM and WebRTC

Originally posted on the Chromium Blog

Ten years ago, Google planted the seeds for two foundational web media technologies, hoping they would provide the roots for a more vibrant internet. Two acquisitions, On2 Technologies and Global IP Solutions, led to a pair of open source projects: the WebM Project, a family of cutting-edge video compression technologies (codecs) offered by Google royalty-free, and the WebRTC Project, which builds APIs for real-time voice and video communication on the web.

These initiatives were major technical endeavors, essential infrastructure for enabling the promise of HTML5 with support for video conferencing and streaming. But this was also a philosophical evolution for media, as Product Manager Mike Jazayeri noted in his blog post hailing the launch of the WebM Project:
“A key factor in the web’s success is that its core technologies such as HTML, HTTP, TCP/IP, etc. are open and freely implementable.”
As emerging first-class participants in the web experience, media and communication components also had to be free and open.

A decade later, these principles have ensured compression and communication technologies capable of keeping pace with a web ecosystem characterized by exponential growth of media consumption, devices, and demand. Starting from VP8 in 2010, the WebM Project has delivered up to 50% video bitrate savings with VP9 in 2013 and an additional 30% with AV1 in 2018—with adoption by YouTube, Facebook, Netflix, Twitch, and more. Equally importantly, the WebM team co-founded the Alliance for Open Media, which has brought together the IP of over 40 major tech companies in support of open and free codecs. With Chrome, Edge, Firefox and Safari supporting WebRTC, more than 85% of all installed browsers globally have become a client for real-time communications on the Internet. WebRTC has become a stable standard and it is now the default solution for video calling on the Web. These technologies have succeeded together, as today over 90% of encoded WebRTC video in Chrome uses VP8 or VP9.

The need for these technologies has been highlighted by COVID-19, as people across the globe have found new ways to work, educate, and connect with loved ones via video chat. The compression of open codecs has been essential to keeping services running on limited bandwidth, with over a billion hours of VP9 and AV1 content viewed every day. WebRTC has allowed for an ecosystem of interoperable communications apps to flourish: since the beginning of March 2020, we have seen in Chrome a 13X increase in received video streams via WebRTC.

These successes would not have been possible without all the supporters that make an open source community. Thank you to all the code contributors, testers, bug filers, and corporate partners who helped make this ecosystem a reality. A decade in, Google remains as committed as ever to open media on the web. We look forward to continuing that work with all of you in the next decade and beyond.

By Matt Frost, Product Director Chrome Media and Niklas Blum, Senior Product Manager WebRTC

Optimizing Multiple Loss Functions with Loss-Conditional Training



In many machine learning applications the performance of a model cannot be summarized by a single number, but instead relies on several qualities, some of which may even be mutually exclusive. For example, a learned image compression model should minimize the compressed image size while maximizing its quality. It is often not possible to simultaneously optimize all the values of interest, either because they are fundamentally in conflict, like the image quality and the compression ratio in the example above, or simply due to the limited model capacity. Hence, in practice one has to decide how to balance the values of interest.
The trade-off between the image quality and the file size in image compression. Ideally both the image distortion and the file size would be minimized, but these two objectives are fundamentally in conflict.
The standard approach to training a model that must balance different properties is to minimize a loss function that is the weighted sum of the terms measuring those properties. For instance, in the case of image compression, the loss function would include two terms, corresponding to the image reconstruction quality and the compression rate. Depending on the coefficients on these terms, training with this loss function results in a model producing image reconstructions that are either more compact or of higher quality.

If one needs to cover different trade-offs between model qualities (e.g. image quality vs compression rate), the standard practice is to train several separate models with different coefficients in the loss function of each. This requires keeping around multiple models both during training and inference, which is very inefficient. However, all of these separate models solve very related problems, suggesting that some information could be shared between them.

In two concurrent papers accepted at ICLR 2020, we propose a simple and broadly applicable approach that avoids the inefficiency of training multiple models for different loss trade-offs and instead uses a single model that covers all of them. In “You Only Train Once: Loss-Conditional Training of Deep Networks”, we give a general formulation of the method and apply it to several tasks, including variational autoencoders and image compression, while in “Adjustable Real-time Style Transfer”, we dive deeper into the application of the method to style transfer.

Loss-Conditional Training
The idea behind our approach is to train a single model that covers all choices of coefficients of the loss terms, instead of training a model for each set of coefficients. We achieve this by (i) training the model on a distribution of losses instead of a single loss function, and (ii) conditioning the model outputs on the vector of coefficients of the loss terms. This way, at inference time the conditioning vector can be varied, allowing us to traverse the space of models corresponding to loss functions with different coefficients.

This training procedure is illustrated in the diagram below for the style transfer task. For each training example, first the loss coefficients are randomly sampled. Then they are used both to condition the main network via the conditioning network and to compute the loss. The whole system is trained jointly end-to-end, i.e., the model parameters are trained concurrently with random sampling of loss functions.
Overview of the method, using stylization as an example. The main stylization network is conditioned on randomly sampled coefficients of the loss function and is trained on a distribution of loss functions, thus learning to model the entire family of loss functions.
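For readers who prefer code, below is a minimal PyTorch sketch of the training loop under toy assumptions that are mine rather than the papers': a tiny autoencoder stands in for the main network, an L1 penalty on the latent code acts as the second loss term, the coefficient is sampled log-uniformly, and conditioning is done by simply concatenating the sampled coefficient to the decoder input instead of using a separate conditioning network.

```python
import torch
import torch.nn as nn

class ConditionedAutoencoder(nn.Module):
    """Toy model whose decoder also sees the sampled loss coefficient."""
    def __init__(self, dim=32, code=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(dim, code), nn.ReLU())
        self.dec = nn.Linear(code + 1, dim)  # +1 input for the coefficient

    def forward(self, x, lam_col):
        z = self.enc(x)
        return self.dec(torch.cat([z, lam_col], dim=1)), z

model = ConditionedAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(1000):
    x = torch.randn(64, 32)                             # stand-in training data
    # (i) Sample the loss coefficient for this batch (log-uniform here).
    lam = float(torch.empty(1).uniform_(-3.0, 3.0).exp())
    lam_col = torch.full((x.size(0), 1), lam)
    recon, z = model(x, lam_col)
    # (ii) Use the same coefficient in the loss: distortion + lam * rate proxy.
    loss = ((recon - x) ** 2).mean() + lam * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At inference time, the same trained model can be run with any value of the coefficient, sweeping out the whole trade-off curve without retraining.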
The conceptual simplicity of this approach makes it applicable to many problem domains, with only minimal changes to existing code bases. Here we focus on two such applications, image compression and style transfer.

Application: Variable-Rate Image Compression
As a first example application of our approach, we show the results for learned image compression. When compressing an image, a user should be able to choose the desired trade-off between the image quality and the compression rate. Classic image compression algorithms are designed to allow for this choice. Yet, many leading learned compression methods require training a separate model for each such trade-off, which is computationally expensive both at training and at inference time. For problems such as this, where one needs a set of models optimized for different losses, our method offers a simple way to avoid inefficiency and cover all trade-offs with a single model.

We apply the loss-conditional training technique to the learned image compression model of Ballé et al. The loss function here consists of two terms, a reconstruction term responsible for the image quality and a compactness term responsible for the compression rate. As illustrated below, our technique allows training a single model covering a wide range of quality-compression tradeoffs.
Compression at different quality levels with a single model. All animations are generated with a single model by varying the conditioning value.
Application: Adjustable Style Transfer
The second application we demonstrate is artistic style transfer, in which one synthesizes an image by merging the content from one image and the style from another. Recent methods allow training deep networks that stylize images in real time and in multiple styles. However, for each given style these methods do not allow the user to have control over the details of the synthesized output, for instance, how much to stylize the image and on which style features to place greater emphasis. If the stylized output is not appealing to the user, they have to train multiple models with different hyper-parameters until they get a favorite stylization.

Our proposed method instead allows training a single model covering a wide range of stylization variants. In this task, we condition the model on a loss function, which has coefficients corresponding to five loss terms, including the content loss and four terms for the stylization loss. Intuitively, the content loss regulates how much the stylized image should be similar to the original content, while the four stylization losses define which style features get carried over to the final stylized image. Below we show the outputs of our single model when varying all these coefficients:
Adjustable style transfer. All stylizations are generated with a single network by varying the conditioning values.
Clearly, the model captures a lot of variation within each style, such as the degree of stylization, the type of elements being added to the image, their exact configuration and locations, and more. More examples can be found on our webpage along with an interactive demo.

Conclusion
We have proposed loss-conditional training, a simple and general method that allows training a single deep network for tasks that would formerly require a large set of separately trained networks. While we have shown its application to image compression and style transfer, many more applications are possible — whenever the loss function has coefficients to be tuned, our method allows training a single model covering a wide range of these coefficients.

Acknowledgements
This blog post covers the work by multiple researchers on the Google Brain team: Mohammad Babaeizadeh, Johannes Ballé, Josip Djolonga, Alexey Dosovitskiy, and Golnaz Ghiasi. This blog post would not be possible without crucial contributions from all of them. Images from the MS-COCO dataset and from unsplash.com are used for illustrations.

Source: Google AI Blog


Google and Binomial partner to open source high quality Basis Universal

Today, Google and Binomial are excited to announce the high quality update to the original Basis Universal release.

Basis Universal allows you to have state of the art web performance with your images, keeping images compressed even on the GPU. Older systems like JPEG and PNG may look small in storage size, but once they hit the GPU they are processed as uncompressed data! The original Basis Universal codec created images that were 6-8 times smaller than JPEG on the GPU while maintaining a similar storage size.

Today we release a high quality Basis Universal codec that utilizes the highest quality formats modern GPUs support, finally bringing the web up to modern GPU texture standards—with cross platform support. The textures are larger in storage size and GPU compressed size, but are still 3-4 times smaller than sending a JPEG or PNG file to be processed on the GPU, and can transcode to a lower quality format for older GPUs.
Original Image by Erol Ahmed from Unsplash.com
Visual comparison of Basis Universal High Quality
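To see why GPU footprint, rather than file size on disk, is the bottleneck, here is a rough back-of-the-envelope calculation. The 2048x2048 texture size is an arbitrary example, and the 4 and 8 bits-per-pixel figures are typical rates for block-compressed GPU formats in general, not measurements of Basis Universal itself.

```python
# Back-of-the-envelope GPU memory for a 2048x2048 texture (no mipmaps).
width = height = 2048
pixels = width * height

decoded_rgba8 = pixels * 4       # JPEG/PNG decode to ~4 bytes per pixel on the GPU
etc1_or_bc1   = pixels * 4 // 8  # typical 4 bits-per-pixel block-compressed format
etc2_or_bc7   = pixels * 8 // 8  # typical 8 bits-per-pixel block-compressed format

for name, size in [("decoded RGBA8", decoded_rgba8),
                   ("ETC1 / BC1 (4 bpp)", etc1_or_bc1),
                   ("ETC2 / BC7 (8 bpp)", etc2_or_bc7)]:
    print(f"{name:>20}: {size / 2**20:5.1f} MiB")
```

A JPEG of the same image might be only a few hundred kilobytes on disk, but once decoded it occupies the full uncompressed footprint in GPU memory, which is exactly what a GPU-native compressed texture avoids.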

Best of all, we are actively working on standardizing Basis Universal with the Khronos Group.

Since our original release in Summer 2019 we’ve seen widespread adoption of Basis Universal in engines like three.js, Babylon.js, Godot, and more, changing what is possible for people to create on the web. Now that a high quality option is available, we expect to see even more adoption and groundbreaking applications created with it.

Please feel free to join our community on GitHub and check out the full demo there as well. You can also follow standardization efforts via Khronos Group events and forums.

By Stephanie Hurlburt, Binomial and Jamieson Brettle, Chrome Media

Announcing the Third Workshop and Challenge on Learned Image Compression



With the large amount of media content being downloaded and streamed across the internet, minimizing bandwidth while maintaining quality remains a constant challenge. In 2015, researchers demonstrated that neural network-based image compression could yield significant improvements to image resolution while retaining good quality and high compression speed. Continued advances in compression and bandwidth optimization techniques were stimulated in part by two successful workshops that we hosted at CVPR in 2018 and 2019.

Today, we are excited to announce the Third Workshop and Challenge On Learned Image Compression (CLIC) at CVPR 2020. This workshop challenges researchers to use machine learning, neural networks and other computer vision approaches to increase the quality and lower the bandwidth needed for multimedia transmission. This year’s workshop will also include two challenges: a low-rate image compression challenge and a P-Frame video compression challenge.

Similar to previous years, the goal of the low-rate image compression challenge is to compress an image dataset to 0.15 bits per pixel while maintaining the highest possible quality. Finalists will be selected by measuring their performance against the PSNR and MS-SSIM evaluation metrics. The final ranking will then be determined by a human evaluated rating task.
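As a rough guide to how the rate target and one of the quality metrics are computed, here is a generic sketch. This is not the challenge's official evaluation code (which is available via compression.cc), and MS-SSIM, which needs a dedicated implementation, is omitted.

```python
import numpy as np

def bits_per_pixel(compressed_size_bytes, width, height):
    """Rate metric for the low-rate track: total bits over total pixels."""
    return 8 * compressed_size_bytes / (width * height)

def psnr(original, reconstruction, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between two images as uint8 arrays."""
    err = original.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(err ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

# Example: a 768x512 image compressed to 7,373 bytes lands right at the target.
print(round(bits_per_pixel(7373, 768, 512), 3))  # ~0.15 bpp
```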

This year we are also introducing a P-Frame compression track, the first video compression task in this series. In this challenge, participants must first generate a transformation between two adjacent video frames. In the decompression part of the task, participants then use the first frame and their compressed representation to reconstruct the second frame. This challenge will be ranked based solely on the MS-SSIM performance score.

If you are doing research in the field of learned image compression or video compression, we encourage you to participate in CLIC, whether in the two competitions or the paper-only track for publications to be presented at the workshop at CVPR 2020. The validation server is currently available for submissions. The deadline for the final submission of the test set is March 23rd, 2020. For more details on the competition and an up-to-date schedule, please refer to compression.cc. Additional announcements and answers to questions can be found on our Google Groups page.

Acknowledgements
This workshop is being jointly hosted by researchers at Google, Twitter and ETH Zurich. We’d like to thank: George Toderici (Google), Nick Johnston (Google), Johannes Ballé (Google), Eirikur Agustsson (Google), Lucas Theis (Google), Wenzhe Shi (Twitter), Radu Timofte (ETH Zurich) and Fabian Mentzer (ETH Zurich) for their contributions.

Source: Google AI Blog


Google and Binomial Partner to Open-Source Basis Universal Texture Format

Today, Google and Binomial are excited to announce that we have partnered to open source the Basis Universal texture codec to improve the performance of transmitting images on the web and within desktop and mobile applications, while maintaining GPU efficiency. This release fills an important gap in the graphics compression ecosystem and complements earlier work in Draco geometry compression.

The Basis Universal texture format is 6-8 times smaller than JPEG on the GPU, yet has a similar storage size to JPEG – making it a great alternative to current GPU compression methods that are inefficient and don’t operate cross platform – and provides a more performant alternative to JPEG/PNG. It creates compressed textures that work well in a variety of use cases - games, virtual & augmented reality, maps, photos, small videos, and more!

Without a universal texture format, developers are left with two options:

  • Use GPU formats and take the storage size hit.
  • Use other formats that have reduced storage size but can’t compete with GPU performance.

Maintaining so many different GPU formats is a burden on the whole ecosystem, from GPU manufacturers to software developers to the end user who can’t get a great cross platform experience. We’re streamlining this with one solution that has built-in flexibility (like optional higher quality modes) but is much easier on everyone to improve and maintain.

How does it all work? Compress your image using the encoder, choosing the quality settings that make sense for your project (you can also submit multiple images for small videos or optimization purposes, just know they’ll share the same color palette). Insert the transcoder code before rendering, which will turn the intermediary format into the GPU format your computer can read. The image stays compressed throughout this process, even on your GPU!  Instead of needing to decode and read the whole image, the GPU will read only the parts it needs. Enjoy the performance benefits!
Basis Universal can efficiently target the most common GPU formats
Google and Binomial will be working together to continue to support, maintain and add features, so check back frequently for the latest. This initial release of Basis Universal transcodes into the following GPU formats: PVRTC1 opaque, ETC1, ETC2 basic alpha, BC1-5, and BC7 opaque. Over the coming months more functionality will be added including BC7 transparent, ASTC opaque and alpha, PVRTC1 transparent, and higher quality BC7/ASTC.
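The dispatch logic behind this "encode once, transcode per device" approach is conceptually simple. The sketch below only illustrates the idea in Python: the real transcoder is the C++/WebAssembly library in the repository, and the preference order shown is an assumption of mine, not the project's.

```python
# Illustration only: the actual Basis Universal transcoder is a C++/WebAssembly
# library; this just shows the "one asset, per-device target format" idea.

PREFERENCE = ["BC7", "ETC2", "BC1-5", "ETC1", "PVRTC1"]

def pick_transcode_target(gpu_supported_formats):
    """Pick the best GPU-native format the local device can read."""
    for fmt in PREFERENCE:
        if fmt in gpu_supported_formats:
            return fmt
    return "RGBA32"  # worst case: fall back to uncompressed pixels

# The same .basis asset is shipped everywhere; each client transcodes locally.
print(pick_transcode_target({"ETC1", "ETC2"}))   # a mobile GPU -> ETC2
print(pick_transcode_target({"BC1-5", "BC7"}))   # a desktop GPU -> BC7
```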
Basis Universal reduces transmission size for textures while maintaining similar image quality.
See full benchmarking results
Basis Universal improves GPU memory usage over .jpeg and .png
With this partnership, we hope to see adoption of the transcoder in all major browsers to make performant cross-platform compressed textures accessible to everyone via the WebGL API, and the forthcoming WebGPU API. In addition to opening up the possibility of seamless integration into pipelines, everyone now has access to the state of the art compressor, which will also be open sourced.

We look forward to seeing what people do with Basis Universal now that it's open sourced. Check out the code and demo on GitHub, let us know what you think, and how you plan to use it! Currently, Basis Universal transcoders are available in C++ and WebAssembly.

By Stephanie Hurlburt, Binomial and Jamieson Brettle, Chrome Media

Announcing the Second Workshop and Challenge on Learned Image Compression



Last year, we announced the Workshop and Challenge on Learned Image Compression (CLIC), an event that aimed to advance the field of image compression with and without neural networks. Held during the 2018 Computer Vision and Pattern Recognition conference (CVPR 2018), CLIC was quite a success, with 23 accepted workshop papers, 95 authors and 41 entries into the competition. This spawned many new algorithms for image compression, domain-specific applications to medical image compression, and augmentations to existing methods, with the winner, Tucodec (abbreviated TUCod4c in the image below), achieving a 13% better mean opinion score (MOS) than Better Portable Graphics (BPG) compression.
An example image from the 2018 test set, comparing the original image to BPG, JPEG and the results from nine competing teams. All the methods are better than JPEG in color reproduction and many of them are comparable to BPG in their ability to create legible text on the sign.
This year, we are again happy to co-sponsor the second Workshop and Challenge on Learned Image Compression at CVPR 2019 in Long Beach, California. The half-day workshop will feature talks from invited guests Anne Aaron (Netflix), Aaron Van Den Oord (DeepMind) and Jyrki Alakuijala (Google), along with presentations from five top performing teams in the 2019 competition, which is currently open for submissions.

This year's competition features two tracks for participants to compete in. The first track remains the same as last year, in what we're calling the "low-rate compression" track. The goal for low-rate compression is to compress an image dataset to 0.15 bits per pixel while maintaining the highest quality, as measured by PSNR, MS-SSIM and a human-evaluated rating task.

The second track incorporates feedback from last year's workshop, in which participants expressed interest in the inverse challenge of determining the amount an image could be compressed and still look good. In this "transparent compression" challenge, we set a relatively high quality threshold for the test dataset (in both PSNR and MS-SSIM) with the goal of compressing the dataset to the smallest file sizes.

If you're doing research in the field of learned image compression, we encourage you to participate in CLIC during CVPR 2019. For more details on the competition and dates, please refer to compression.cc.

Acknowledgements
This workshop is being jointly hosted by researchers at Google, Twitter and ETH Zürich. We'd like to thank: George Toderici (Google), Michele Covell (Google), Johannes Ballé (Google), Nick Johnston (Google), Eirikur Agustsson (Google), Wenzhe Shi (Twitter), Lucas Theis (Twitter), Radu Timofte (ETH Zürich), and Fabian Mentzer (ETH Zürich) for their contributions.

Source: Google AI Blog


Brotli Compression in Google Display Ads

Posted by Michael Burns, Software Engineer, Publisher Tagging & Ads Latency Team

Our goal is to help publishers monetize their content and build sustainable businesses through advertising products that allow sites to load as fast as possible to minimize impact to user experience.

Almost two years ago, our compression team announced a new compression algorithm called Brotli. Today, we are happy to announce that the Brotli compression algorithm is now being used to compress Google Display Ads whenever possible. In our experiments, we see data savings of 15% in aggregate over standard gzip compression, and in some instances, a savings of over 40%! This reduces the amount of data sent to end users by tens of thousands of gigabytes every day! This also results in faster page loads and less battery consumption.
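Comparisons like this are easy to reproduce on your own payloads; the sketch below uses the brotli Python package and the standard-library gzip module. The file name is a placeholder, and this is not the serving-side setup Google Display Ads uses, just a way to measure the delta on a sample document.

```python
import gzip

import brotli  # pip install brotli

# Compare Brotli and gzip on the same payload, e.g. an ad HTML snippet.
with open("ad_creative.html", "rb") as f:   # placeholder file name
    payload = f.read()

gz = gzip.compress(payload, compresslevel=9)
br = brotli.compress(payload, quality=11)

print(f"original : {len(payload):8d} bytes")
print(f"gzip -9  : {len(gz):8d} bytes")
print(f"brotli 11: {len(br):8d} bytes "
      f"({100 * (len(gz) - len(br)) / len(gz):.1f}% smaller than gzip)")
```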

We hope results like this will encourage wider adoption and will advance web standards such as Brotli compression.

AMP Compression Update

Posted by Zachary Nado, Software Engineer

Recently we announced the addition of Brotli compression to the Google AMP Cache. All AMP documents served from the Google AMP Cache can now be served with Brotli, which will save a considerable amount of bandwidth for our users and further our goal of improving the mobile experience.

Brotli is a newer, more efficient compression algorithm created by Jyrki Alakuijala and Zoltán Szabadka with the Google Research Europe Compression Team. Launched in 2015, it has already been used to enable considerable savings in other areas of Google. While it is a generic compression algorithm, it has particularly impressive performance when applied to web documents; we have seen an average decrease in document size of around 10% when using Brotli instead of gzip, which has amounted to hundreds of gigabytes of bandwidth saved per day across the Google AMP Cache.

With smaller document sizes, pages load faster while also saving bandwidth, which can amount to noticeable savings for users on limited data plans. The Google AMP Cache is just the beginning though, as engineering teams are working on Brotli support in many other products, which can enable bandwidth savings throughout Google.

Announcing Guetzli: A New Open Source JPEG Encoder

Crossposted on the Google Research Blog

At Google, we care about giving users the best possible online experience, both through our own services and products and by contributing new tools and industry standards for use by the online community. That’s why we’re excited to announce Guetzli, a new open source algorithm that creates high quality JPEG images with file sizes 35% smaller than currently available methods, enabling webmasters to create webpages that can load faster and use even less data.

Guetzli [guɛtsli] — cookie in Swiss German — is a JPEG encoder for digital images and web graphics that can enable faster online experiences by producing smaller JPEG files while still maintaining compatibility with existing browsers, image processing applications and the JPEG standard. From the practical viewpoint this is very similar to our Zopfli algorithm, which produces smaller PNG and gzip files without needing to introduce a new format, and different from the techniques used in RNN-based image compression, RAISR, and WebP, which all need client and ecosystem changes for compression gains at internet scale.

The visual quality of JPEG images is directly correlated with their multi-stage compression process: color space transform, discrete cosine transform, and quantization. Guetzli specifically targets the quantization stage, in which the more visual quality loss is introduced, the smaller the resulting file. Guetzli strikes a balance between minimal loss and file size by employing a search algorithm that tries to overcome the difference between the psychovisual modeling of JPEG's format and Guetzli’s psychovisual model, which approximates color perception and visual masking in a more thorough and detailed way than what is achievable by simpler color transforms and the discrete cosine transform. However, while Guetzli creates smaller image file sizes, the tradeoff is that these search algorithms take significantly longer to create compressed images than currently available methods.
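To illustrate the quality-vs-size search idea only, here is a toy Pillow sketch. This is emphatically not Guetzli's algorithm: Guetzli searches over quantization tables and DCT coefficients and scores candidates with a detailed psychovisual model, whereas this sketch only turns a standard encoder's quality knob and uses plain RMSE as the distortion measure.

```python
import io

import numpy as np
from PIL import Image

def naive_quality_search(img_path, max_rmse=4.0):
    """Toy stand-in for a quality/size search: find the smallest standard JPEG
    whose pixel-space RMSE stays under a threshold."""
    src = Image.open(img_path).convert("RGB")
    original = np.asarray(src, dtype=np.float64)
    best = None
    for quality in range(95, 40, -5):            # coarse search, high to low
        buf = io.BytesIO()
        src.save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        decoded = np.asarray(Image.open(buf).convert("RGB"), dtype=np.float64)
        rmse = np.sqrt(np.mean((original - decoded) ** 2))
        if rmse > max_rmse:                      # quality dropped too far; stop
            break
        best = (quality, buf.getvalue())         # smallest passing file so far
    return best
```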

Figure 1. 16x16 pixel synthetic example of a phone line hanging against a blue sky — traditionally a case where JPEG compression algorithms suffer from artifacts. Uncompressed original is on the left. Guetzli (on the right) shows fewer ringing artefacts than libjpeg (middle) and has a smaller file size.
And while Guetzli produces smaller image file sizes without sacrificing quality, we additionally found that, in experiments where compressed image file sizes are kept constant, human raters consistently preferred the images Guetzli produced over libjpeg images, even when the libjpeg files were the same size or slightly larger. We think this makes the slower compression a worthy tradeoff.

Figure 2. 20x24 pixel zoomed areas from a picture of a cat’s eye. Uncompressed original on the left. Guetzli (on the right) shows fewer ringing artefacts than libjpeg (middle) without requiring a larger file size.
It is our hope that webmasters and graphic designers will find Guetzli useful and apply it to their photographic content, making users’ experience smoother on image-heavy websites, in addition to reducing load times and bandwidth costs for mobile users. Lastly, we hope that the new explicitly psychovisual approach in Guetzli will inspire further image and video compression research.

By Robert Obryk and Jyrki Alakuijala, Software Engineers, Google Research Europe