Junshu Tang, Bo Zhang, Binxin Yang, Ting Zhang, Dong Chen, Lizhuang Ma, and Fang Wen.
In contrast to the traditional avatar creation pipeline, which is a costly process, contemporary generative approaches directly learn the data distribution from photographs. While plenty of works extend unconditional generative models and achieve some level of controllability, it is still challenging to ensure multi-view consistency, especially under large poses. In this work, we propose a network that generates 3D-aware portraits while being controllable according to semantic parameters regarding pose, identity, expression, and illumination. Our network uses neural scene representation to model 3D-aware portraits, whose generation is guided by a parametric face model that supports explicit control. While the latent disentanglement can be further enhanced by contrasting images with partially different attributes, there still exists noticeable inconsistency in non-face areas, e.g., hair and background, when animating expressions. We solve this by proposing a volume blending strategy in which we form a composite output by blending dynamic and static areas, with the two parts segmented from the jointly learned semantic field. Our method outperforms prior art in extensive experiments, producing realistic portraits with vivid expressions under natural lighting when viewed from free viewpoints. It also demonstrates generalization ability to real images as well as out-of-domain data, showing great promise in real applications.
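For intuition, here is a minimal sketch of the blending idea in image space, assuming a soft face mask derived from the learned semantic field; the function `blend_volumes` and the tensor shapes are illustrative, not the paper's actual volume-level implementation.

```python
import torch

def blend_volumes(dynamic_rgb: torch.Tensor,
                  static_rgb: torch.Tensor,
                  face_prob: torch.Tensor) -> torch.Tensor:
    """Composite a dynamic (animated face) rendering with a static one.

    dynamic_rgb, static_rgb: (B, 3, H, W) renderings of the two parts.
    face_prob: (B, 1, H, W) soft mask from the semantic field, ~1 where the
               pixel belongs to the animated face region.
    """
    # Keep hair and background from the static branch so they do not flicker
    # when expressions change; take the face region from the dynamic branch.
    return face_prob * dynamic_rgb + (1.0 - face_prob) * static_rgb

# Toy usage with random tensors standing in for real renderings.
dyn = torch.rand(1, 3, 512, 512)
sta = torch.rand(1, 3, 512, 512)
mask = torch.rand(1, 1, 512, 512)
print(blend_volumes(dyn, sta, mask).shape)   # torch.Size([1, 3, 512, 512])
```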
Install dependencies:

```bash
pip install -r requirements.txt
pip install -U git+https://github.com/fadel/pytorch_ema
```
Training requirements
- Nvdiffrast. We use Nvdiffrast, a PyTorch library that provides high-performance primitive operations for rasterization-based differentiable rendering. Install it from source:

  ```bash
  git clone https://github.com/NVlabs/nvdiffrast.git
  cd nvdiffrast/
  python setup.py install
  ```
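  To check the install, a minimal single-triangle rasterization sketch (all values are illustrative; use `dr.RasterizeGLContext()` instead if your build has no CUDA rasterizer):

  ```python
  import torch
  import nvdiffrast.torch as dr

  # Clip-space vertex positions (1 batch, 3 vertices, xyzw), one int32 triangle,
  # and per-vertex RGB attributes.
  pos = torch.tensor([[[-0.8, -0.8, 0.0, 1.0],
                       [ 0.8, -0.8, 0.0, 1.0],
                       [ 0.0,  0.8, 0.0, 1.0]]], device="cuda")
  tri = torch.tensor([[0, 1, 2]], dtype=torch.int32, device="cuda")
  col = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0],
                       [0.0, 0.0, 1.0]]], device="cuda")

  glctx = dr.RasterizeCudaContext()
  rast, _ = dr.rasterize(glctx, pos, tri, resolution=[256, 256])
  img, _ = dr.interpolate(col, rast, tri)   # barycentric interpolation of colors
  print(img.shape)                          # torch.Size([1, 256, 256, 3])
  ```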
- Basel Face Model 2009 (BFM09). Get access to BFM09 using this link. After getting access, download `01_MorphableModel.mat`. In addition, we use an Expression Basis provided by Guo et al. Download the Expression Basis (`Exp_Pca.bin`) using this link. Put both files in `checkpoints/face_ckpt/BFM/`.
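  As a quick sanity check that the morphable model file is readable (nothing here is specific to this repo; the keys printed depend on the BFM09 release):

  ```python
  import scipy.io as sio

  # Load the 2009 Basel Face Model and list its top-level arrays
  # (mean/PC/eigenvalue arrays for shape and texture, triangle list, ...).
  bfm = sio.loadmat("checkpoints/face_ckpt/BFM/01_MorphableModel.mat")
  print(sorted(k for k in bfm.keys() if not k.startswith("__")))
  ```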
- Face Reconstruction Model. We use a face reconstruction network to extract identity, expression, lighting, and pose coefficients. Download the pretrained model `epoch_20.pth` and put it in `checkpoints/face_ckpt/face_ckp/recon_model`.
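  For reference, a hedged sketch of how such a coefficient vector is commonly split in Deep3DFaceRecon-style models; the 257-dim layout below (80 identity, 64 expression, 80 texture, 3 pose angles, 27 SH lighting, 3 translation) is an assumption, not taken from this repo:

  ```python
  import torch

  def split_coeffs(coeffs: torch.Tensor) -> dict:
      """Split a (B, 257) coefficient tensor into named groups (assumed layout)."""
      return {
          "id":    coeffs[:, :80],       # identity (shape) coefficients
          "exp":   coeffs[:, 80:144],    # expression coefficients
          "tex":   coeffs[:, 144:224],   # albedo/texture coefficients
          "angle": coeffs[:, 224:227],   # pose: pitch / yaw / roll
          "gamma": coeffs[:, 227:254],   # spherical-harmonics lighting
          "trans": coeffs[:, 254:257],   # translation
      }

  coeffs = torch.randn(4, 257)           # stand-in for the network's output
  print({k: v.shape for k, v in split_coeffs(coeffs).items()})
  ```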
- Face Recognition Model. We use ArcFace to extract deep face features. Download the pretrained model `ms1mv3_arcface_r50_fp16/backbone.pth` and put it in `checkpoints/face_ckpt/face_ckp/recog_model`.
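  A minimal sketch of extracting an identity embedding with this checkpoint; the `iresnet50` import path (from insightface's `arcface_torch` training code) is an assumption, adjust it to your checkout:

  ```python
  import torch
  import torch.nn.functional as F

  # Assumed import: backbones/iresnet.py from insightface's arcface_torch code.
  from backbones.iresnet import iresnet50

  net = iresnet50()
  state = torch.load("checkpoints/face_ckpt/face_ckp/recog_model/backbone.pth",
                     map_location="cpu")
  net.load_state_dict(state)
  net.eval()

  # ArcFace expects an aligned 112x112 face crop, normalized to [-1, 1].
  face = torch.rand(1, 3, 112, 112) * 2 - 1
  with torch.no_grad():
      feat = F.normalize(net(face))      # L2-normalized 512-d identity feature
  print(feat.shape)                      # torch.Size([1, 512])
  ```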
- Face Landmark Detection. Download `shape_predictor_68_face_landmarks.dat` from Dlib and put it in `checkpoints/face_ckpt/face_ckp/`.
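  A minimal detection sketch using this predictor (the image path is a placeholder; requires `dlib` and `opencv-python`):

  ```python
  import cv2
  import dlib

  detector = dlib.get_frontal_face_detector()
  predictor = dlib.shape_predictor(
      "checkpoints/face_ckpt/face_ckp/shape_predictor_68_face_landmarks.dat")

  img = cv2.imread("portrait.jpg")                  # placeholder test image
  gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
  for rect in detector(gray, 1):                    # upsample once while detecting
      shape = predictor(gray, rect)
      pts = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
      print(len(pts), "landmarks for face at", rect)
  ```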
- Face Parsing Network. Download `79999_iter.pth` from face-parsing.PyTorch and put it in `checkpoints/face_ckpt/face_ckp/`.
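  A minimal sketch of running the parsing network to get a per-pixel label map; the `BiSeNet` import and the 19-class setup follow the face-parsing.PyTorch repo and should be treated as assumptions about that checkout:

  ```python
  import torch
  from PIL import Image
  from torchvision import transforms

  from model import BiSeNet            # assumed path inside face-parsing.PyTorch

  net = BiSeNet(n_classes=19)
  net.load_state_dict(torch.load("checkpoints/face_ckpt/face_ckp/79999_iter.pth",
                                 map_location="cpu"))
  net.eval()

  to_tensor = transforms.Compose([
      transforms.Resize((512, 512)),
      transforms.ToTensor(),
      transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
  ])
  img = to_tensor(Image.open("portrait.jpg").convert("RGB")).unsqueeze(0)

  with torch.no_grad():
      out = net(img)[0]                # the network returns a tuple of outputs
      parsing = out.argmax(dim=1)      # (1, 512, 512) labels: skin, hair, background, ...
  print(parsing.shape, parsing.unique())
  ```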
Download the pretrained models from here and save them in `checkpoints/model`. For the pretrained VAE decoder, please download our pretrained models from here and save them in `checkpoints/vae_ckp/`. We provide a test sequence here; please download `obama/mat/*.mat` and put the files in `data/`. Then run:

```bash
python test.py --curriculum FFHQ_512 --load_dir checkpoints/model/ --output_dir results --blend_mode both --seeds 41
```
- FFHQ. Download `images1024x1024`, resize the images to 512x512 resolution, and put them in `data/ffhq/img`.
- Preprocess. Run the command below, replacing `aligned_image_path` and `mat_path` with your desired output paths:

  ```bash
  python preprocess.py --curriculum FFHQ_512 --image_dir data/ffhq/img --img_output_dir aligned_image_path --mat_output_dir mat_path
  ```
- RAVDESS. We select 10 videos per actor and sample 400 images from each video, resulting in 96,000 images in total. We extract expression coefficients for each image. You can download these data from here, or extract frames yourself as in the sketch below.
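  A minimal sketch of uniformly sampling frames from one video with OpenCV (paths are placeholders; the 400-frame count follows the description above):

  ```python
  import cv2
  import numpy as np

  def sample_frames(video_path: str, out_dir: str, num_frames: int = 400) -> None:
      """Uniformly sample num_frames frames from a video and save them as PNGs."""
      cap = cv2.VideoCapture(video_path)
      total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
      for i, idx in enumerate(np.linspace(0, total - 1, num_frames).astype(int)):
          cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
          ok, frame = cap.read()
          if ok:
              cv2.imwrite(f"{out_dir}/{i:05d}.png", frame)
      cap.release()

  sample_frames("ravdess/Actor_01/video.mp4", "data/ravdess/Actor_01")
  ```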
Then train the factor VAE (run it once per factor: `id`, `exp`, or `gamma`):

```bash
python train_vae.py --curriculum VAE_ALL --output_dir results/vae --render_dir results/render --weight 0.0025 --factor id  # id/exp/gamma
```
- You can also download our pretrained VAE decoders from here and save them in `checkpoints/vae_ckp/`.
Train the controllable generator. The first command runs the initial training stage; the second resumes from a saved checkpoint (`--load_dir`, `--set_step`) and continues with the second stage (`--second`):

```bash
python train_control.py --curriculum FFHQ_512 --output_dir train_ffhq_512 --warmup1 5000 --warmup2 20000
python train_control.py --curriculum FFHQ_512 --output_dir train_ffhq_512 --load_dir load_dir --set_step 20001 --warmup1 5000 --warmup2 20000 --second
```
If you use this code for your research, please cite our paper.
```bibtex
@article{tang20233dfaceshop,
  title={3DFaceShop: Explicitly Controllable 3D-Aware Portrait Generation},
  author={Tang, Junshu and Zhang, Bo and Yang, Binxin and Zhang, Ting and Chen, Dong and Ma, Lizhuang and Wen, Fang},
  journal={IEEE Transactions on Visualization \& Computer Graphics},
  number={01},
  pages={1--18},
  year={2023},
  publisher={IEEE Computer Society}
}
```
This code borrows heavily from pi-GAN, StyleGAN2 and Deep3DFaceRecon.