Baris Gecer 1,2, Jiankang Deng 1,2, & Stefanos Zafeiriou 1,2
1 Imperial College London
2 Huawei CBG
The last few years have witnessed the great success of non-linear generative models in synthesizing high-quality photorealistic face images. Many recent 3D facial texture reconstruction and pose manipulation from a single image approaches still rely on large and clean face datasets to train image-to-image Generative Adversarial Networks (GANs). Yet the collection of such a large scale high-resolution 3D texture dataset is still very costly and difficult to maintain age/ethnicity balance. Moreover, regression-based approaches suffer from generalization to the in-the-wild conditions and are unable to fine-tune to a target-image. In this work, we propose an unsupervised approach for one-shot 3D facial texture completion that does not require large-scale texture datasets, but rather harnesses the knowledge stored in 2D face generators. The proposed approach rotates an input image in 3D and fill-in the unseen regions by reconstructing the rotated image in a 2D face generator, based on the visible parts. Finally, we stitch the most visible textures at different angles in the UV image-plane. Further, we frontalize the target image by projecting the completed texture into the generator. The qualitative and quantitative experiments demonstrate that the completed UV textures and frontalized images are of high quality, resembles the original identity, can be used to train a texture GAN model for 3DMM fitting and improve pose-invariant face recognition.
Overview of the method. The proposed approach iteratively optimizes the texture UV-maps for different re-rendered images with their masks. At the end of each optimization, generated images are used to acquire partial UV images by dense landmarks. Finally, the completed UV images are fed to the next iteration for progressive texture building.This implementation is only tested under Ubuntu environment with Nvidia GPUs and CUDA 10.0 and CuDNN-7.0 installed.
git clone https://github.com/barisgecer/OSTeC --recursive
cd OSTeC
conda env create -f environment.yml -n ostec
source activate ostec
- 2.a. Install Nvdiffrast library:
cd external/deep3dfacerecon/nvdiffrast # ./OSTeC/external/deep3dfacerecon/nvdiffrast
pip install .
- 2.b. Install Arcface Pytorch:
cd .. # ./OSTeC/external/deep3dfacerecon/
git clone https://github.com/deepinsight/insightface.git
cp -r ./insightface/recognition/arcface_torch/ ./models/
- 2.c. Prepare prerequisite models: Deep3DFaceRecon_pytorch method uses Basel Face Model 2009 (BFM09) to represent 3d faces. Get access to BFM09 using this link. After getting the access, download "01_MorphableModel.mat". In addition, we use an Expression Basis provided by Guo et al.. Download the Expression Basis (Exp_Pca.bin) using this link (google drive). Organize all files into the following structure:
OSTeC
│
└─── external
│
└─── deep3dfacerecon
│
└─── BFM
│
└─── 01_MorphableModel.mat
│
└─── Exp_Pca.bin
|
└─── ...
- 2.d. Deep3DFaceRecon_pytorch provides a model trained on a combination of CelebA, LFW, 300WLP, IJB-A, LS3D-W, and FFHQ datasets. Download the pre-trained model using this link (google drive) and organize the directory into the following structure:
OSTeC
│
└─── external
│
└─── deep3dfacerecon
│
└─── checkpoints
│
└─── face_recon
│
└─── epoch_latest.pth
-
Download the models here: https://drive.google.com/file/d/1TBoNt55vleRkMZaT9XKt6oNQmo8hkN-Q/view?usp=sharing
-
And place it under 'models' directory like the following:
OSTeC
│
└─── models
│
└─── resnet_18_20191231.h5
│
└─── vgg16_zhang_perceptual.pkl
│
└─── alignment
│ .
│ .
│
└─── fr_models
.
.
-
Download the Graphonomy model here: https://drive.google.com/file/d/1eUe18HoH05p0yFUd_sN6GXdTj82aW0m9/view?usp=sharing (If the link doesn't work for some reason check the original Graphonomy github page and download 'CIHP trained model')
-
And place it under 'models' directory like the following:
OSTeC
│
└─── models
│
└─── Graphonomy
│
└─── inference.pth
- Run
python run_ostec.py --source_dir [source_dir] --save_dir [save_dir] [-f] -i [iterations (default 200)] -m [soft|hard|auto]
- Modes (-m or --mode):
- soft: keep the original texture for visible parts (recommended when the input image is high resolution, near-frontal, and non-occluded.)
- hard: generate all
- auto: soft for frontal, hard for profile images
- The source code shared here is protected under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License which does NOT allow commercial use. To view a copy of this license, see LICENSE
- Copyright (c) 2020, Baris Gecer. All rights reserved.
- This work is made available under the CC BY-NC-SA 4.0.
- Our projection relies on NVIDIA's StyleGANv2
- Thanks @jiankangdeng for providing Face Recognition and Landmark Detection models
- We use MTCNN for face detection
- We use Graphonomy for face segmentation (i.e. to exclude hairs, occlusion)
- 3D face reconstruction has been originally solved by GANFit. However, since it is commercialized and will not be public, I had to re-implement the ports for Deep3DFaceRecon_pytorch.
- We initialize StyleGAN parameters by Style-Encoder (by @rolux, @pbaylies).
- Thanks Zhang et al. for VGG16 model
If you find this work is useful for your research, please cite our paper:
@InProceedings{Gecer_2021_CVPR,
author = {Gecer, Baris and Deng, Jiankang and Zafeiriou, Stefanos},
title = {OSTeC: One-Shot Texture Completion},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2021},
pages = {7628-7638}
}