This repository contains a Keras implementation of a Variational Autoencoder for face generation.
Note for Serbian speakers: detailed theoretical explanations and the mathematical derivations needed to implement the Variational Autoencoder can be found at the following link
The model was trained on the CelebA dataset, which contains 202,599 images of human faces labeled with attributes (such as smiling, young, male, eyeglasses, ...).
The following results are for a model trained on images of size 128x128.
The images are blurry, but that is expected for a VAE, because its reconstruction loss averages the squared error across pixels.
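As a minimal numpy sketch (not the repository's exact implementation), the loss combines a pixel-averaged squared error with the closed-form KL divergence between the encoder's diagonal Gaussian and the standard normal prior; `mu` and `log_var` here stand for the encoder outputs:

```python
import numpy as np

def vae_loss(x, x_rec, mu, log_var):
    """Pixel-averaged squared error plus KL divergence to N(0, I).

    Averaging the squared error over pixels is what causes the blur:
    the optimum under this loss is an average of all plausible faces,
    which washes out high-frequency detail.
    """
    reconstruction = np.mean((x - x_rec) ** 2)
    # Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) )
    kl = -0.5 * np.mean(1.0 + log_var - mu ** 2 - np.exp(log_var))
    return reconstruction + kl
```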
New faces are generated by sampling the vector of the latent space from the prior distribution and passing it through the decoder.
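The sampling step can be sketched as follows; `decode` is an assumed callable mapping a batch of latent vectors to images (for example, a Keras decoder's `predict` method), and the latent dimension of 200 matches the one mentioned below:

```python
import numpy as np

def sample_faces(decode, n_samples, latent_dim=200, seed=None):
    """Draw latent vectors from the N(0, I) prior and decode them."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n_samples, latent_dim))
    return decode(z)
```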
The posterior distribution should be close to a standard normal distribution, since the KL-divergence term in the loss function pushes the posterior toward the prior (which is a standard normal distribution). We cannot visualize a 200-dimensional distribution, so the t-SNE algorithm is used to reduce it to 2 dimensions.
The 2D distribution evaluated on 20,000 images is plotted on the right, and its histogram is presented on the left:
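A sketch of the reduction step, assuming scikit-learn's `TSNE` and a matrix `z` of posterior means produced by the encoder; a roughly isotropic blob in the 2-D embedding is consistent with (though does not prove) the posterior being close to standard normal:

```python
import numpy as np
from sklearn.manifold import TSNE

def embed_latents_2d(z, perplexity=30.0, seed=0):
    """Reduce (n, latent_dim) latent vectors to 2-D with t-SNE.

    t-SNE preserves local neighborhood structure, so it gives a
    qualitative picture of the high-dimensional distribution.
    """
    tsne = TSNE(n_components=2, perplexity=perplexity, random_state=seed)
    return tsne.fit_transform(z)
```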
Since the covariance matrix of the posterior distribution is assumed to be diagonal, the distribution of each dimension should be close to standard normal.
Distributions of the first 30 dimensions of the posterior (evaluated on the same 20,000 images) are shown in blue, with the standard normal distribution shown in red:
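This per-dimension check can also be made numeric. A sketch, assuming latent samples `z` of shape `(n, latent_dim)`: fit a Gaussian to each dimension and compute its closed-form KL divergence to N(0, 1), where values near 0 indicate the dimension is close to standard normal:

```python
import numpy as np

def per_dim_kl_to_standard_normal(z):
    """KL( N(m_i, s_i^2) || N(0, 1) ) for each latent dimension i,
    with m_i and s_i estimated from the samples.  The expression
    0.5 * (s^2 + m^2 - 1) - log(s) is always >= 0 and equals 0
    exactly when m = 0 and s = 1.
    """
    m = z.mean(axis=0)
    s = z.std(axis=0)
    return 0.5 * (s ** 2 + m ** 2 - 1.0) - np.log(s)
```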
Using the labeled attributes, we can extract specific feature vectors from the latent space. For example, to extract a smiling vector, we compute the average latent vector over all images with a smiling face, subtract the average latent vector over all images without a smile, and normalize the result. This gives a unit vector pointing in the not_smiling-->smiling direction. Adding this vector (scaled by some intensity) to the latent vector of a non-smiling face and passing the result through the decoder produces an image of the same face with a smile. The same principle can be applied to other feature vectors.
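The procedure above can be sketched as follows; the function names are illustrative, and `z_with` / `z_without` stand for the latent vectors of images that do and do not carry the attribute (e.g. CelebA's smiling label):

```python
import numpy as np

def attribute_direction(z_with, z_without):
    """Unit vector pointing from 'without attribute' to 'with attribute'."""
    v = z_with.mean(axis=0) - z_without.mean(axis=0)
    return v / np.linalg.norm(v)

def add_attribute(z, direction, intensity=1.0):
    """Shift a latent vector along the attribute direction; decoding
    the result should yield the same face with the attribute added."""
    return z + intensity * direction
```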
If we take the latent vectors of two images and 'walk' from one vector to the other, passing the intermediate vectors through the decoder yields a gradual transition from one face to the other:
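A linear version of this walk can be sketched as below; each row of the result would be passed through the decoder. (Spherical interpolation is sometimes preferred with Gaussian priors, since linear midpoints have smaller norm than typical prior samples, but linear interpolation is the simplest choice.)

```python
import numpy as np

def interpolate_latents(z_start, z_end, n_steps=10):
    """Evenly spaced points on the straight line from z_start to z_end."""
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    return (1.0 - t) * z_start + t * z_end
```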