Neural Style Transfer Using TensorFlow

Vasu Gupta
4 min read · Jul 12, 2021


Overview

Neural Style Transfer is a technique for generating new artistic images from existing content and style images. One of the ways to implement neural style transfer is by using an unsupervised deep learning algorithm.

This blog post presents the main ideas and implementation details of Neural Style Transfer in Python using the TensorFlow deep learning library.

Complete code can be accessed here — LINK

Main Concepts

Dataset

Only two images are needed: a Content Image and a Style Image. The algorithm generates a new image, the Generated Image, whose content is similar to the content image and whose style is similar to the style image. A labelled or large dataset is not required.

Content Image — Image taken at California’s Great America Amusement Park located in Santa Clara (Image by author)
Style Image (Public Domain, Original Source Unknown)

Image Resolution

Image resolution refers to the number of pixels in the width and height directions. Interestingly, the content and style images can have different resolutions. However, the generated image has the same resolution as the content image.

Also, there is no requirement for the content and style images to have a specific shape or aspect ratio. The images only need to satisfy a minimum resolution requirement based on the selected deep learning model.
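
For illustration, a minimal image-loading helper in TensorFlow could look like the sketch below. The max_dim value of 512 is an assumption chosen to keep memory use reasonable, not a requirement of the technique.

```python
import tensorflow as tf

def load_image(path, max_dim=512):
    # Read and decode the image, scaling pixel values to [0, 1].
    img = tf.io.read_file(path)
    img = tf.image.decode_image(img, channels=3, expand_animations=False)
    img = tf.image.convert_image_dtype(img, tf.float32)
    # Optionally cap the longest side; content and style images
    # do not need to share a shape or aspect ratio.
    long_side = tf.cast(tf.reduce_max(tf.shape(img)[:-1]), tf.float32)
    scale = max_dim / long_side
    new_shape = tf.cast(tf.cast(tf.shape(img)[:-1], tf.float32) * scale, tf.int32)
    img = tf.image.resize(img, new_shape)
    return img[tf.newaxis, :]  # add a batch dimension
```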

Deep Learning Model

Neural Style Transfer uses a pre-trained deep learning model. Further training or fine-tuning of the pre-trained model is not required.

Any of the popular pre-trained deep learning models, such as VGG19 or ResNet50, can be used. These networks have been pre-trained on large datasets like ImageNet, and so they have learned to extract complex features from images.
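
As a sketch, a feature-extraction model can be built from VGG19 in tf.keras.applications. The specific content and style layer names below are common choices (e.g. in the TensorFlow tutorial) and are assumptions here; any layers that give good results can be used.

```python
import tensorflow as tf

# Load VGG19 pre-trained on ImageNet, without the classification head.
vgg = tf.keras.applications.VGG19(include_top=False, weights="imagenet")
vgg.trainable = False  # the pre-trained weights stay fixed

# Assumed layer choices for illustration.
content_layers = ["block5_conv2"]
style_layers = ["block1_conv1", "block2_conv1", "block3_conv1",
                "block4_conv1", "block5_conv1"]

# A model that maps an input image to the selected intermediate activations.
outputs = [vgg.get_layer(name).output for name in style_layers + content_layers]
feature_extractor = tf.keras.Model(inputs=vgg.input, outputs=outputs)
```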

Optimization Process

In Neural Style Transfer, the generated image is interestingly also the input to the model.

The pixel values of the generated image are updated during the training process, while the weights, biases and other parameters of the pre-trained neural network model are kept fixed.

The input/generated image can be initialized as a noisy copy of the content image at the start of training. As training progresses, the content of the input image starts to resemble the content of the content image, and its style starts to resemble the style of the style image.
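
A minimal initialization sketch, assuming content_image is a float tensor with pixel values in [0, 1]:

```python
import tensorflow as tf

# Start from the content image plus a small amount of Gaussian noise.
noise = tf.random.normal(tf.shape(content_image), stddev=0.1)
generated_image = tf.Variable(tf.clip_by_value(content_image + noise, 0.0, 1.0))
```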

Any of the popular deep learning optimizers, such as Adam or SGD, can be used to update the pixel values of the input image.
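
A sketch of a single optimization step is shown below. Here total_loss is a hypothetical function combining the content and style losses described in the next section, and the learning rate is only an assumption.

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=0.02)

@tf.function
def train_step(image):
    with tf.GradientTape() as tape:
        loss = total_loss(image)  # hypothetical: content loss + weighted style loss
    # Gradients are taken with respect to the image, not the network weights.
    grad = tape.gradient(loss, image)
    optimizer.apply_gradients([(grad, image)])
    # Keep pixel values in a valid range after each update.
    image.assign(tf.clip_by_value(image, 0.0, 1.0))

# Usage: for _ in range(num_steps): train_step(generated_image)
```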

Initialization — generated image initialized as a noisy copy of the content image at start of training (Image generated by code)

Image Content and Style Representation

The objective is to generate an image with content similar to the content image and style similar to the style image. The most interesting and insightful part is how to define the content and style of an image.

As an input image passes through the different layers in the neural network, it is transformed in terms of its pixel values, the number of pixels in the height and width directions, and the number of channels. These transformed outputs can be considered different representations of the input image.

The earlier layers in the network learn to detect and represent basic features such as edges and corners, while the deeper layers learn to encode more complex features and shapes.

The output from one of these inner layers can be used to represent the content of the image; any inner layer that provides good results can be selected. Similarly, one or more inner layers can be used to represent the style of the image.

We can consider the outputs from the selected inner layer to directly correspond to the content of the image. For representing the style of the image, however, we compute the correlations across the different channels of the inner-layer outputs using the Gram matrix approach.
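
A minimal sketch of the Gram matrix and the resulting content and style losses, assuming the feature tensors have shape (batch, height, width, channels) and that the target activations have already been extracted from the content and style images:

```python
import tensorflow as tf

def gram_matrix(features):
    # Correlate every pair of channels, averaged over all spatial locations.
    result = tf.linalg.einsum("bijc,bijd->bcd", features, features)
    num_locations = tf.cast(tf.shape(features)[1] * tf.shape(features)[2], tf.float32)
    return result / num_locations

def content_loss(generated_features, target_content):
    # Mean squared difference of raw activations from the chosen content layer.
    return tf.reduce_mean(tf.square(generated_features - target_content))

def style_loss(generated_features, target_style):
    # Mean squared difference of Gram matrices from a chosen style layer.
    return tf.reduce_mean(tf.square(gram_matrix(generated_features) - gram_matrix(target_style)))
```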

Implementation Details

Complete code for implementation of Neural Style Transfer using an unsupervised deep learning algorithm in Python and TensorFlow can be accessed on Google Colab here — LINK

Generated Image: Style from the style image transferred to the content image (Image generated by code)

Thanks for reading!

References

https://www.tensorflow.org/tutorials/generative/style_transfer

