srcnn-tf2

Implementation and experimentation of the SRCNN model in TensorFlow 2.0


Purpose

This repository contains an implementation of the SRCNN super resolution algorithm [1] implemented in TensorFlow 2.0 (v2.3.1). This library is used to construct, train, and evaluate the SRCNN model, as well as to explore variations on architecture and implementation.

Model

Model architecture

The base SRCNN model consists of upscaling via bicubic interpolation, followed by three main convolutional layers [1]:

  1. A patch extraction and representation layer that maps low-resolution patches to feature vectors, consisting of n1 filters of size f1 x f1 applied across the image's c channels (where c is the number of channels in the image).
  2. A non-linear mapping layer consisting of n2 filters of size 1 x 1, which maps each low-resolution feature vector onto a high-resolution representation. This can be more than one layer to increase non-linearity.
  3. A reconstruction layer consisting of c filters of size f3 x f3, which aggregates the feature maps to produce the final image.

Typical values for the parameters are:

  • n1: 64
  • n2: 32
  • f1: 9
  • f3: 5
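As a rough illustration, a minimal TF2/Keras version of this three-layer stack might look like the sketch below. The function name build_srcnn and its defaults are illustrative, not this repository's actual API; "same" padding is a simplification (the paper uses valid convolutions), and bicubic upscaling of the input is assumed to happen before these layers.

```python
import tensorflow as tf

def build_srcnn(channels=3, n1=64, n2=32, f1=9, f3=5):
    """Minimal SRCNN sketch; assumes the low-resolution input has
    already been upscaled to the target size via bicubic interpolation."""
    return tf.keras.Sequential([
        # 1. Patch extraction and representation: n1 filters of size f1 x f1.
        tf.keras.layers.Conv2D(n1, f1, padding="same", activation="relu",
                               input_shape=(None, None, channels)),
        # 2. Non-linear mapping: n2 filters of size 1 x 1.
        tf.keras.layers.Conv2D(n2, 1, padding="same", activation="relu"),
        # 3. Reconstruction: c filters of size f3 x f3, linear output.
        tf.keras.layers.Conv2D(channels, f3, padding="same"),
    ])
```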

Loss function

The authors of [1] use the mean squared error (MSE) as the loss function and note that this favors a high PSNR in performance testing. While a high PSNR corresponds to closely matching pixel values, it does not necessarily produce images with high perceptual quality (i.e. images that are satisfying to human vision).
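The two are directly related: PSNR = 10 * log10(MAX^2 / MSE), where MAX is the maximum pixel value, so minimizing MSE maximizes PSNR. A minimal sketch of this pairing in TF2 follows; the metric name psnr_metric is illustrative.

```python
import tensorflow as tf

def psnr_metric(y_true, y_pred):
    # PSNR in decibels; assumes pixel values are scaled to [0, 1].
    return tf.image.psnr(y_true, y_pred, max_val=1.0)

# model.compile(optimizer="adam",
#               loss=tf.keras.losses.MeanSquaredError(),
#               metrics=[psnr_metric])
```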

Gains in perceptual quality can be made by using a content loss, a texture loss, or a weighted combination of the two. These will be added in the future.

Data

This work begins with the same training and benchmarking data used in the original paper.

Datasets

Preprocessing

Training is done only on the T91 dataset, but these images require preprocessing.

Image patch generation

We will train on a series of 32 x 32 image patches pulled from the T91 dataset by stepping a patch window across each image with a stride of 14 pixels, yielding roughly 90,000 patches. These patches are downscaled and then blurred using a Gaussian kernel; upscaling happens inside the model class.
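A minimal sketch of this windowing step is below, using tf.image.extract_patches and bicubic downscaling via tf.image.resize. The function names are illustrative, and the Gaussian blur step is omitted for brevity.

```python
import tensorflow as tf

def extract_patches(image, patch_size=32, stride=14):
    """Slide a patch_size window over a single HxWxC image with the
    given stride, returning an N x patch_size x patch_size x C batch."""
    patches = tf.image.extract_patches(
        images=image[tf.newaxis, ...],
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, stride, stride, 1],
        rates=[1, 1, 1, 1],
        padding="VALID",
    )
    return tf.reshape(patches, [-1, patch_size, patch_size, image.shape[-1]])

def degrade(patches, scale=3):
    # Bicubic downscaling of each patch; the Gaussian blur step
    # described above is omitted here.
    lr = patches.shape[1] // scale
    return tf.image.resize(patches, [lr, lr], method="bicubic")
```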

Data augmentation

Rotation - Each 32 x 32 pixel image is rotated in 90-degree increments to give 4 images per input image.

Channel swap - The red, green, and blue channels of each image are permuted, creating 6 images per input image.
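A minimal sketch of both augmentations (the helper names are illustrative):

```python
import itertools
import tensorflow as tf

def rotations(patch):
    # The four rotations: 0, 90, 180, and 270 degrees.
    return [tf.image.rot90(patch, k=k) for k in range(4)]

def channel_swaps(patch):
    # All six orderings of the R, G, B channels.
    return [tf.gather(patch, perm, axis=-1)
            for perm in itertools.permutations([0, 1, 2])]
```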

Performance

We will use the following to gauge the performance of upscaling:

Model exploration

After creating and benchmarking the basic SRCNN model described in [1], we will explore the effect of changing several factors in an attempt to maximize the effectiveness of the model. The following will be altered:

  • Number of filters in layers 1 and 3
  • Size of filters in layers 1 and 3
  • Number of non-linear layers
  • Loss functions: content/texture/hybrid loss
  • Batch normalization and dropout
  • Ensembling: multiple trained SRCNN models with fused prediction output
  • Self-ensembling: predict on separate 90-degree rotations of the same input image and combine the predictions (sketched below)
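As an illustration of the self-ensembling idea, a minimal sketch assuming a trained Keras model and a single HxWxC input (the function name is illustrative):

```python
import tensorflow as tf

def self_ensemble_predict(model, image):
    """Average the model's predictions over the four rotations of a
    single HxWxC input, undoing each rotation before averaging."""
    preds = []
    for k in range(4):
        pred = model(tf.image.rot90(image, k=k)[tf.newaxis, ...])[0]
        preds.append(tf.image.rot90(pred, k=(4 - k) % 4))
    return tf.reduce_mean(tf.stack(preds), axis=0)
```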

Attributions

References

  1. C. Dong, C. C. Loy, K. He, and X. Tang, "Learning a deep convolutional network for image super-resolution," in ECCV, 2014.

GitHub projects

This repository is built on the work of the following projects:
