This AI modifies facial attributes of an image, for instance adding blond hair, a smile, or even removing sunglasses.
To achieve this, an AutoEncoder network has been used with multiple architectures, such as Deep Convolutional and Progressive Growing. Furthermore, a Deep Feature Consistent loss has been added to improve the sharpness of the results.
All results are obtained with 30 minutes of training on Google Colab.
Here we follow this equation (strength is a constant with value 1.5):
z_new = z + feature_vector * strength
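A minimal sketch of this latent tweak, assuming latent vectors are plain NumPy arrays (the names and values here are illustrative, not taken from the repo):

```python
import numpy as np

STRENGTH = 1.5  # the constant from the equation above

def tweak(z, feature_vector, strength=STRENGTH):
    """Shift a latent vector along a feature direction."""
    return z + feature_vector * strength

# Toy example with a 4-dimensional latent vector
z = np.array([0.0, 1.0, -1.0, 0.5])
feature_vector = np.array([1.0, 0.0, 0.0, 2.0])
z_new = tweak(z, feature_vector)
# z_new == [1.5, 1.0, -1.0, 3.5]
```

Decoding `z_new` then yields the image with the attribute strengthened.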
Feature vectors are obtained by sampling 1000 images featuring the attribute and 1000 images without it. Then, we compute the average difference:
feature_vector = mean(z_positive - z_negative)
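The averaging step can be sketched as follows, assuming the encoded latents are stacked into arrays of shape `(n_samples, latent_dim)` (a toy example, not the repo's actual statistics code):

```python
import numpy as np

def feature_vector(z_positive, z_negative):
    """Average difference between latents with and without the attribute.

    z_positive, z_negative: arrays of shape (n_samples, latent_dim);
    the README samples 1000 images on each side.
    """
    return np.mean(z_positive - z_negative, axis=0)

# Toy example: 2 samples, 3-dimensional latent space
z_pos = np.array([[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]])
z_neg = np.array([[0.0, 0.0, 1.0], [2.0, 0.0, 1.0]])
fv = feature_vector(z_pos, z_neg)
# fv == [1.0, 0.0, 1.0]
```

With equal sample counts, the mean of the differences equals the difference of the means, so the pairing of positive and negative samples does not matter.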
These GIFs represent a linear interpolation of latent vectors between the representation of an image (z_start) and its feature changed representation (z_end).
Similar to the previous section, this one shows a linear interpolation between the representations of two images.
Start | End | DCAE | PGAE |
---|---|---|---|
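The interpolation used for both sections can be sketched like this; each frame of a GIF would be the decoded image of one interpolated latent (decoder call omitted, names illustrative):

```python
import numpy as np

def interpolate(z_start, z_end, n_steps=8):
    """Linear interpolation between two latent vectors, endpoints included."""
    ts = np.linspace(0.0, 1.0, n_steps)
    return [(1.0 - t) * z_start + t * z_end for t in ts]

frames = interpolate(np.array([0.0, 0.0]), np.array([1.0, 2.0]), n_steps=3)
# frames: [0.0, 0.0], [0.5, 1.0], [1.0, 2.0]
```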
Two different autoencoder architectures have been used:
The Deep Convolutional AutoEncoder (DCAE) is the simplest architecture, composed of convolution, pooling, upsampling, and batch normalization layers. The image resolution is 32x32 px and the latent vector is composed of 100 values.
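A minimal PyTorch sketch of such an architecture; the channel counts and layer order are illustrative assumptions, not the exact ones from this repo:

```python
import torch
from torch import nn

class DCAE(nn.Module):
    """Sketch of a Deep Convolutional AutoEncoder for 32x32 RGB images."""

    def __init__(self, latent_dim=100):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),                    # 32x32 -> 16x16
            nn.Conv2d(32, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.MaxPool2d(2),                    # 16x16 -> 8x8
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, latent_dim),  # latent vector of 100 values
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * 8 * 8),
            nn.Unflatten(1, (64, 8, 8)),
            nn.Upsample(scale_factor=2),        # 8x8 -> 16x16
            nn.Conv2d(64, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Upsample(scale_factor=2),        # 16x16 -> 32x32
            nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

model = DCAE()
out, z = model(torch.randn(2, 3, 32, 32))
```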
The Progressive Growing AutoEncoder (PGAE) architecture is similar to the one described in this paper. Of course, this model is an autoencoder, not a GAN. All layers are based on the DCAE, but the training method is different: layers that increase the image resolution from 8x8 to 64x64 px are added progressively. In addition, no fully connected layer is used to produce the latent vector.
Furthermore, to improve image sharpness, a Deep Feature Consistent loss has been added, as described in this paper. The target and the output are passed through a pretrained VGG19 network, and the MSE losses between the first three VGG layers are summed to produce the total loss. For the PGAE model, this loss is used only for 'high resolution' layers with a size of at least 32x32 px (otherwise, plain MSE is used). This has been implemented thanks to this repo (MIT license).
It turns out that the DCAE model produces better results, even though its output resolution is half that of the PGAE images.
- data : Dataset analysis
- display : Functions to plot and show data
- main : Contains code to train, eval and tweak the model. Also used to display representations like seen in this file
- net : Network models
- params : Hyperparameters and config
- stats : Statistics to retrieve feature vectors
- train : Training functions and statistics
- user : User config (not on git, more details below)
- vgg_loss : Used for deep consistent loss, implemented thanks to this file
Some user-specific properties are gathered within the module src/user.py. This module is not on git, so you must create it yourself. Here are all the properties of this file:
- dataset_path, string : The path of the root of the dataset
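A minimal src/user.py could look like this (the path value is only a placeholder):

```python
# src/user.py -- user-specific configuration, kept out of version control.
# Placeholder value: point this at the root of your CelebA download.
dataset_path = "/path/to/celeba"
```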
The dataset used to train the network and to compute statistics is the CelebA dataset. Since it is complicated to download via PyTorch, it was downloaded from Kaggle.
celeba
├── img_align_celeba
│ └── img_align_celeba
│ ├── 000001.jpg
│ └── ...
├── list_attr_celeba.csv
├── list_bbox_celeba.csv
├── list_eval_partition.csv
└── list_landmarks_align_celeba.csv
This repo is under the MIT License. Contributions are welcome, feel free to add another model or to tune hyperparameters.