This is an implementation of a Variational Autoencoder (VAE) for human faces. You can use it to alter the facial expressions in images. It automatically detects faces in an image, extracts them, modifies them, and places them back into the image. Here is an example of a face to which we added a smile:
Currently the VAE can only handle 128x128 input images, so the altered face will never have a higher resolution than that. The model therefore works best when the faces in the input images are a little smaller than 128x128; we recommend about 100x100.
This project requires Python 3.6 or higher. The neural network is based on PyTorch. For face detection and landmark detection, we use dlib. Please install all dependencies from the `requirements.txt` file with

```
python3 -m pip install -r requirements.txt
```
Now you need to put the weights for the trained VAE into a new folder named `data`. They can be found here. You also need to put the trained landmark detector there, so download this file and place it in the same folder.
Now you can run the project on your own images. To specify what you want to do, edit the `config.py` file. Example:
```python
config = {
    'attribute': 'Smiling',
    'parameter_range': (0, 5.1, 1),  # (start, end, step)
    'image_filename': './myimage.jpg',
    'sample_size': 5
}
```
- specify the image to modify under `image_filename`
- specify the attribute to change with `attribute`. Currently available are `Smiling`, `Mustache`, `Young` and `Male`
- specify a range of parameters to use under `parameter_range`. Each parameter from this range is used to create one image, and its value specifies how strongly the attribute is applied: a parameter value `p` leads to the addition of `p` times the attribute vector to the base latent vector. The value of `parameter_range` should be a 3-tuple `(start, end, step)`, which produces a range of parameters starting at `start` (inclusive) and stopping at `end` (exclusive) using steps of size `step`.
  - Note: Parameter values can also be negative. E.g. if you want to make someone look older, you need to add a negative multiple of the expression vector `Young`; otherwise you would make the person look younger.
- `sample_size` specifies how many different versions of the attribute vector should be used. There are multiple versions of each vector; e.g. different versions of the `Smiling` vector lead to different types of smiling, and how well a single version works also depends on the image. If `sample_size` is set to `n`, then `n` randomly selected versions will be used. The sketch after this list illustrates these semantics.
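To make these semantics concrete, here is a small self-contained sketch of how the config above expands into parameter values and randomly chosen vector versions (the version list is a stand-in for illustration, not the repo's actual data):

```python
import random

config = {
    'attribute': 'Smiling',
    'parameter_range': (0, 5.1, 1),  # (start, end, step)
    'image_filename': './myimage.jpg',
    'sample_size': 5,
}

# Expand (start, end, step) into parameters: start inclusive, end exclusive.
# For (0, 5.1, 1) this yields 0, 1, 2, 3, 4, 5 -- one output image each.
start, end, step = config['parameter_range']
parameters = []
p = start
while p < end:
    parameters.append(p)
    p += step

# Stand-ins for the precomputed attribute-vector versions: if sample_size
# is n, then n versions are selected at random.
versions = [f'Smiling_v{i}' for i in range(10)]
chosen = random.sample(versions, config['sample_size'])

print(parameters)  # [0, 1, 2, 3, 4, 5]
print(chosen)      # e.g. ['Smiling_v3', 'Smiling_v0', ...]
```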
To modify your picture, run

```
python3 enhance.py
```
This will plot the resulting images using matplotlib. If you want to store them in a directory of your choice instead, you can run the script like this:

```
python3 enhance.py --out_dir <your_output_directory>
```
Optionally, you can also plot the detected face and its landmarks by passing the `--plot_landmarks` option.
The input image is scanned for faces with a dlib face detector, and each detected face is analyzed with a shape predictor (also from dlib) that aligns 68 facial landmarks onto the face. This leads to a picture like this:
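For reference, this detection step with dlib typically looks like the following sketch (the predictor filename is dlib's standard 68-landmark model; the repo's actual loading code may differ):

```python
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor('data/shape_predictor_68_face_landmarks.dat')

img = dlib.load_rgb_image('myimage.jpg')
faces = detector(img, 1)  # upsample once to also find smaller faces

for rect in faces:
    shape = predictor(img, rect)  # aligns 68 landmarks onto the face
    landmarks = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```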
Afterwards the image is aligned: we rotate it so that the eyes form a horizontal line, and we scale it to a uniform size of 128x128 pixels around the face:
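A minimal sketch of such an alignment with OpenCV (the crop margin and the choice of rotation centre are assumptions for illustration, not the repo's exact logic):

```python
import cv2
import numpy as np

def align_face(img, landmarks, size=128):
    """Rotate so the eyes lie on a horizontal line, then crop and scale."""
    pts = np.array(landmarks, dtype=np.float64)
    left_eye = pts[36:42].mean(axis=0)   # eye outlines in the 68-landmark scheme
    right_eye = pts[42:48].mean(axis=0)

    # Angle between the eye line and the horizontal.
    dx, dy = right_eye - left_eye
    angle = np.degrees(np.arctan2(dy, dx))

    cx, cy = pts.mean(axis=0)            # rotate around the face centre
    M = cv2.getRotationMatrix2D((float(cx), float(cy)), angle, 1.0)
    rotated = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))

    # Crop a square around the face (the 2.5 margin factor is a guess)
    # and scale it to the 128x128 VAE input size.
    side = int(np.hypot(dx, dy) * 2.5)
    x, y = int(cx - side / 2), int(cy - side / 2)
    face = rotated[max(y, 0):y + side, max(x, 0):x + side]
    return cv2.resize(face, (size, size))
```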
This aligned face can then be fed into the Variational Autoencoder: it is encoded as a 500-element vector that represents the face. This vector can be decoded again to obtain an approximate reconstruction of the face.
To alter the facial expression, we have a set of precomputed expression vectors that correspond to certain facial attributes such as smiling, age or gender. The facial expression can be altered by adding the suitable expression vector to, or subtracting it from, the face vector encoded by the VAE. If you e.g. want to make someone smile, you add a multiple of the smiling vector; if you want to remove a smile, you subtract a multiple of it.
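In code, this latent-space arithmetic is a single addition. The sketch below uses random stand-ins for the encoded face vector and the attribute vector; only the shapes follow the text:

```python
import torch

# Stand-ins for the real data (500-dimensional latents, as described above).
z = torch.randn(500)               # face vector encoded by the VAE
smiling_vector = torch.randn(500)  # one precomputed 'Smiling' attribute vector

p = 2.0                                  # parameter: attribute strength
z_more_smiling = z + p * smiling_vector  # add a multiple -> add a smile
z_less_smiling = z - p * smiling_vector  # subtract a multiple -> remove it
# z_more_smiling would then be decoded by the VAE into the modified face.
```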
Now, we have a modified latent vector that can be decoded by the VAE again to obtain an actual image of the modified face:
This modified image is now scaled back to the original size and moved to its original location in the original image. We use the previously computed locations of the 68 facial landmarks to create a mask around the actual face:
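One way such a landmark mask and the subsequent blend could be implemented with OpenCV (the seamless-clone blend is an assumption; the repo may use a different morphing scheme):

```python
import cv2
import numpy as np

def blend_face(original, modified, landmarks):
    """Blend the modified face back; `modified` is the modified image already
    scaled and positioned like `original` (same shape, 8-bit BGR)."""
    # Fill the convex hull of the 68 landmarks: white inside the face.
    mask = np.zeros(original.shape[:2], dtype=np.uint8)
    hull = cv2.convexHull(np.array(landmarks, dtype=np.int32))
    cv2.fillConvexPoly(mask, hull, 255)

    # Keep only the masked pixels and blend them into the original image.
    x, y, w, h = cv2.boundingRect(hull)
    center = (x + w // 2, y + h // 2)
    return cv2.seamlessClone(modified, original, mask, center,
                             cv2.NORMAL_CLONE)
```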
We then keep only the pixels that are part of our face and morph them into the original image. This yields the final result:

Here are some example pictures from the CelebA dataset, the internet and me that have been modified with this VAE.
It also works fine with non-portrait images, such as this one of Elon Musk: