Tag Archives: Style Transfer

Optimizing Multiple Loss Functions with Loss-Conditional Training

In many machine learning applications the performance of a model cannot be summarized by a single number, but instead relies on several qualities, some of which may even be mutually exclusive. For example, a learned image compression model should minimize the compressed image size while maximizing its quality. It is often not possible to simultaneously optimize all the values of interest, either because they are fundamentally in conflict, like the image quality and the compression ratio in the example above, or simply due to the limited model capacity. Hence, in practice one has to decide how to balance the values of interest.
The trade-off between the image quality and the file size in image compression. Ideally both the image distortion and the file size would be minimized, but these two objectives are fundamentally in conflict.
The standard approach to training a model that must balance different properties is to minimize a loss function that is the weighted sum of the terms measuring those properties. For instance, in the case of image compression, the loss function would include two terms, corresponding to the image reconstruction quality and the compression rate. Depending on the coefficients on these terms, training with this loss function results in a model producing image reconstructions that are either more compact or of higher quality.

If one needs to cover different trade-offs between model qualities (e.g. image quality vs compression rate), the standard practice is to train several separate models with different coefficients in the loss function of each. This requires keeping around multiple models both during training and inference, which is very inefficient. However, all of these separate models solve very related problems, suggesting that some information could be shared between them.

In two concurrent papers accepted at ICLR 2020, we propose a simple and broadly applicable approach that avoids the inefficiency of training multiple models for different loss trade-offs and instead uses a single model that covers all of them. In “You Only Train Once: Loss-Conditional Training of Deep Networks”, we give a general formulation of the method and apply it to several tasks, including variational autoencoders and image compression, while in “Adjustable Real-time Style Transfer”, we dive deeper into the application of the method to style transfer.

Loss-Conditional Training
The idea behind our approach is to train a single model that covers all choices of coefficients of the loss terms, instead of training a model for each set of coefficients. We achieve this by (i) training the model on a distribution of losses instead of a single loss function, and (ii) conditioning the model outputs on the vector of coefficients of the loss terms. This way, at inference time the conditioning vector can be varied, allowing us to traverse the space of models corresponding to loss functions with different coefficients.

This training procedure is illustrated in the diagram below for the style transfer task. For each training example, first the loss coefficients are randomly sampled. Then they are used both to condition the main network via the conditioning network and to compute the loss. The whole system is trained jointly end-to-end, i.e., the model parameters are trained concurrently with random sampling of loss functions.
Overview of the method, using stylization as an example. The main stylization network is conditioned on randomly sampled coefficients of the loss function and is trained on a distribution of loss functions, thus learning to model the entire family of loss functions.
The conceptual simplicity of this approach makes it applicable to many problem domains, with only minimal changes to existing code bases. Here we focus on two such applications, image compression and style transfer.

Application: Variable-Rate Image Compression
As a first example application of our approach, we show the results for learned image compression. When compressing an image, a user should be able to choose the desired trade-off between the image quality and the compression rate. Classic image compression algorithms are designed to allow for this choice. Yet, many leading learned compression methods require training a separate model for each such trade-off, which is computationally expensive both at training and at inference time. For problems such as this, where one needs a set of models optimized for different losses, our method offers a simple way to avoid inefficiency and cover all trade-offs with a single model.

We apply the loss-conditional training technique to the learned image compression model of Balle et al. The loss function here consists of two terms, a reconstruction term responsible for the image quality and a compactness term responsible for the compression rate. As illustrated below, our technique allows training a single model covering a wide range of quality-compression tradeoffs.
Compression at different quality levels with a single model. All animations are generated with a single model by varying the conditioning value.
Application: Adjustable Style Transfer
The second application we demonstrate is artistic style transfer, in which one synthesizes an image by merging the content from one image and the style from another. Recent methods allow training deep networks that stylize images in real time and in multiple styles. However, for each given style these methods do not allow the user to have control over the details of the synthesized output, for instance, how much to stylize the image and on which style features to place greater emphasis. If the stylized output is not appealing to the user, they have to train multiple models with different hyper-parameters until they get a favorite stylization.

Our proposed method instead allows training a single model covering a wide range of stylization variants. In this task, we condition the model on a loss function, which has coefficients corresponding to five loss terms, including the content loss and four terms for the stylization loss. Intuitively, the content loss regulates how much the stylized image should be similar to the original content, while the four stylization losses define which style features get carried over to the final stylized image. Below we show the outputs of our single model when varying all these coefficients:
Adjustable style transfer. All stylizations are generated with a single network by varying the conditioning values.
Clearly, the model captures a lot of variation within each style, such as the degree of stylization, the type of elements being added to the image, their exact configuration and locations, and more. More examples can be found on our webpage along with an interactive demo.

We have proposed loss-conditional training, a simple and general method that allows training a single deep network for tasks that would formerly require a large set of separately trained networks. While we have shown its application to image compression and style transfer, many more applications are possible — whenever the loss function has coefficients to be tuned, our method allows training a single model covering a wide range of these coefficients.

This blog post covers the work by multiple researchers on the Google Brain team: Mohammad Babaeizadeh, Johannes Balle, Josip Djolonga, Alexey Dosovitskiy, and Golnaz Ghiasi. This blog post would not be possible without crucial contributions from all of them. Images from the MS-COCO dataset and from unsplash.com are used for illustrations.

Source: Google AI Blog

Supercharging Style Transfer

Pastiche. A French word, it designates a work of art that imitates the style of another one (not to be confused with its more humorous Greek cousin, parody). Although it has been used for a long time in visual art, music and literature, pastiche has been getting mass attention lately with online forums dedicated to images that have been modified to be in the style of famous paintings. Using a technique known as style transfer, these images are generated by phone or web apps that allow a user to render their favorite picture in the style of a well known work of art.

Although users have already produced gorgeous pastiches using the current technology, we feel that it could be made even more engaging. Right now, each painting is its own island, so to speak: the user provides a content image, selects an artistic style and gets a pastiche back. But what if one could combine many different styles, exploring unique mixtures of well known artists to create an entirely unique pastiche?

Learning a representation for artistic style

In our recent paper titled “A Learned Representation for Artistic Style”, we introduce a simple method to allow a single deep convolutional style transfer network to learn multiple styles at the same time. The network, having learned multiple styles, is able to do style interpolation, where the pastiche varies smoothly from one style to another. Our method enables style interpolation in real-time as well, allowing this to be applied not only to static images, but also videos.
Credit: awesome dog role played by Google Brain team office dog Picabo.
In the video above, multiple styles are combined in real-time and the resulting style is applied using a single style transfer network. The user is provided with a set of 13 different painting styles and adjusts their relative strengths in the final style via sliders. In this demonstration, the user is an active participant in producing the pastiche.

A Quick History of Style Transfer

While transferring the style of one image to another has existed for nearly 15 years [1] [2], leveraging neural networks to accomplish it is both very recent and very fascinating. In “A Neural Algorithm of Artistic Style” [3], researchers Gatys, Ecker & Bethge introduced a method that uses deep convolutional neural network (CNN) classifiers. The pastiche image is found via optimization: the algorithm looks for an image which elicits the same kind of activations in the CNN’s lower layers - which capture the overall rough aesthetic of the style input (broad brushstrokes, cubist patterns, etc.) - yet produces activations in the higher layers - which capture the things that make the subject recognizable - that are close to those produced by the content image. From some starting point (e.g. random noise, or the content image itself), the pastiche image is progressively refined until these requirements are met.
Content image: The Tübingen Neckarfront by Andreas Praefcke, Style painting: “Head of a Clown”, by Georges Rouault.
The pastiches produced via this algorithm look spectacular:
Figure adapted from L. Gatys et al. "A Neural Algorithm of Artistic Style" (2015). 
This work is considered a breakthrough in the field of deep learning research because it provided the first proof of concept for neural network-based style transfer. Unfortunately this method for stylizing an individual image is computationally demanding. For instance, in the first demos available on the web, one would upload a photo to a server, and then still have plenty of time to go grab a cup of coffee before a result was available.

This process was sped up significantly by subsequent research [4, 5] that recognized that this optimization problem may be recast as an image transformation problem, where one wishes to apply a single, fixed painting style to an arbitrary content image (e.g. a photograph). The problem can then be solved by teaching a feed-forward, deep convolutional neural network to alter a corpus of content images to match the style of a painting. The goal of the trained network is two-fold: maintain the content of the original image while matching the visual style of the painting.

The end result of this was that what once took a few minutes for a single static image, could now be run real time (e.g. applying style transfer to a live video). However, the increase in speed that allowed real-time style transfer came with a cost - a given style transfer network is tied to the style of a single painting, losing some flexibility of the original algorithm, which was not tied to any one style. This means that to build a style transfer system capable of modeling 100 paintings, one has to train and store 100 separate style transfer networks.

Our Contribution: Learning and Combining Multiple Styles

We started from the observation that many artists from the impressionist period employ similar brush stroke techniques and color palettes. Furthermore, painting by say, Monet, are even more visually similar.
Poppy Field (left) and Impression, Sunrise (right) by Claude Monet. Images from Wikipedia
We leveraged this observation in our training of a machine learning system. That is, we trained a single system that is able to capture and generalize across many Monet paintings or even a diverse array of artists across genres. The pastiches produced are qualitatively comparable to those produced in previous work, while originating from the same style transfer network.
Pastiches produced by our single network, trained on 32 varied styles. These pastiches are qualitatively equivalent to those created by single-style networks: Image Credit: (from top to bottom) content photographs by Andreas Praefcke, Rich Niewiroski Jr. and J.-H. Janßen, (from left to right) style paintings by William Glackens, Paul Signac, Georges Rouault, Edvard Munch and Vincent van Gogh.
The technique we developed is simple to implement and is not memory intensive. Furthermore, our network, trained on several artistic styles, permits arbitrary combining multiple painting styles in real-time, as shown in the video above. Here are four styles being combined in different proportions on a photograph of Tübingen:
Unlike previous approaches to fast style transfer, we feel that this method of modeling multiple styles at the same time opens the door to exciting new ways for users to interact with style transfer algorithms, not only allowing the freedom to create new styles based on the mixture of several others, but to do it in real-time. Stay tuned for a future post on the Magenta blog, in which we will describe the algorithm in more detail and release the TensorFlow source code to run this model and demo yourself. We also recommend that you check out Nat & Lo’s fantastic video explanation on the subject of style transfer.


[1] Efros, Alexei A., and William T. Freeman. Image quilting for texture synthesis and transfer (2001).

[2] Hertzmann, Aaron, Charles E. Jacobs, Nuria Oliver, Brian Curless, and David H. Salesin. Image analogies (2001).

[3] Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. A Neural Algorithm of Artistic Style (2015).

[4] Ulyanov, Dmitry, Vadim Lebedev, Andrea Vedaldi, and Victor Lempitsky. Texture Networks: Feed-forward Synthesis of Textures and Stylized Images (2016).

[5] Johnson, Justin, Alexandre Alahi, and Li Fei-Fei. Perceptual Losses for Real-Time Style Transfer and Super-Resolution (2016).

* This work was done during an internship with the Google Brain Team. Vincent is currently a Ph.D. candidate at MILA, Université de Montréal.