$\text{Style}^2\text{Talker}$ : High-Resolution Talking Head Generation with Emotion Style and Art Style
This repository provides the official PyTorch implementation of the *partial* core components of the following paper:
Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style
Shuai Tan, et al.
In AAAI, 2024.
Our approach takes an identity image and an audio clip as inputs and generates a talking head with both an emotion style and an art style, which are controlled respectively by an emotion source text and an art source picture.
We train and test with Python 3.7 and PyTorch. To install the dependencies, run:
```bash
conda create -n style2talker python=3.7
conda activate style2talker
```
- Python packages:
```bash
pip install -r requirements.txt
```
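After installing, you can sanity-check the environment with a short snippet (this is a minimal check, not part of the repository):

```python
# sanity_check.py -- verify the Python/PyTorch install and GPU visibility
import sys
import torch

print("Python:", sys.version.split()[0])       # expect 3.7.x
print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```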
- Run the demo:
```bash
python inference.py --img_path path/to/image --wav_path path/to/audio --source_3DMM path/to/source_3DMM --style_e_source "a textual description for emotion style" --art_style_id num/for/art_style --save_path path/to/save
```
The result will be stored in `--save_path`.
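For reference, here is a hypothetical batch driver that invokes the demo script over several audio clips. The argument names match `inference.py` above; all file paths, the output naming, and the style id value are placeholders:

```python
# batch_infer.py -- run inference.py for several audio clips (illustrative only)
import subprocess
from pathlib import Path

IMG = "examples/identity.png"          # placeholder identity image
SOURCE_3DMM = "examples/source.mat"    # placeholder 3DMM parameters
EMOTION = "a textual description for emotion style"
ART_STYLE_ID = "0"                     # assumed to index a Style-A checkpoint

for wav in Path("examples/audio").glob("*.wav"):
    out = f"results/{wav.stem}.mp4"    # placeholder output naming
    subprocess.run([
        "python", "inference.py",
        "--img_path", IMG,
        "--wav_path", str(wav),
        "--source_3DMM", SOURCE_3DMM,
        "--style_e_source", EMOTION,
        "--art_style_id", ART_STYLE_ID,
        "--save_path", out,
    ], check=True)
```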
- Crop videos in the training datasets:
```bash
python data_preprocess/crop_video.py
```
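As a rough illustration of the general idea, a face-centered crop can be sketched with OpenCV's stock Haar cascade. The detector choice, the margin, and the 256×256 target size are assumptions here, not necessarily what `crop_video.py` does:

```python
# crop_sketch.py -- face-centered 256x256 crop of video frames (illustrative)
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

cap = cv2.VideoCapture("input.mp4")
frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        continue
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])  # keep the largest face
    pad = int(0.25 * w)                                 # loose margin around the box
    crop = frame[max(0, y - pad):y + h + pad, max(0, x - pad):x + w + pad]
    frames.append(cv2.resize(crop, (256, 256)))
cap.release()
```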
- Extract 3DMM parameters from the cropped videos using Deep3DFaceReconstruction:
```bash
python data_preprocess/extract_3DMM.py
```
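Deep3DFaceReconstruction regresses a 257-dimensional coefficient vector per frame. The split below follows the layout conventionally used by Deep3DFaceRecon_pytorch; verify the exact offsets against that codebase:

```python
# split_coeff.py -- conventional layout of Deep3DFaceRecon coefficients (assumed)
import numpy as np

def split_coeff(coeff: np.ndarray) -> dict:
    """Split a (..., 257) coefficient array into its named parts."""
    assert coeff.shape[-1] == 257
    return {
        "id":    coeff[..., 0:80],     # identity (shape) coefficients
        "exp":   coeff[..., 80:144],   # expression coefficients
        "tex":   coeff[..., 144:224],  # texture (albedo) coefficients
        "angle": coeff[..., 224:227],  # pose: pitch, yaw, roll
        "gamma": coeff[..., 227:254],  # spherical-harmonics lighting
        "trans": coeff[..., 254:257],  # translation
    }
```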
- Extract landmarks from the cropped videos:
```bash
python data_preprocess/extract_lmdk.py
```
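As a reference point, 68-point facial landmarks are commonly extracted with the `face_alignment` package; whether `extract_lmdk.py` uses this library is an assumption:

```python
# lmk_sketch.py -- 68-point landmark extraction per frame (illustrative)
import cv2
import face_alignment

# LandmarksType._2D in older face_alignment releases, TWO_D in newer ones
fa = face_alignment.FaceAlignment(face_alignment.LandmarksType.TWO_D,
                                  device="cuda")

cap = cv2.VideoCapture("cropped.mp4")
landmarks = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    preds = fa.get_landmarks(rgb)            # list of (68, 2) arrays, or None
    landmarks.append(preds[0] if preds else None)
cap.release()
```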
- Extract mel features from the audio:
```bash
python data_preprocess/get_mel.py
```
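A typical mel-spectrogram setup for talking-head audio (16 kHz input, 80 mel bins) looks like the following with librosa; the exact sample rate, FFT size, and hop length used by `get_mel.py` are assumptions:

```python
# mel_sketch.py -- 80-bin log-mel spectrogram from a wav file (illustrative)
import librosa
import numpy as np

y, sr = librosa.load("audio.wav", sr=16000)          # resample to 16 kHz
mel = librosa.feature.melspectrogram(
    y=y, sr=sr, n_fft=800, hop_length=200, n_mels=80)
log_mel = np.log(np.clip(mel, a_min=1e-5, a_max=None))  # log-compress
print(log_mel.shape)                                 # (80, num_frames)
```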
- Save the video frames and 3DMM parameters into an lmdb file:
```bash
python data_preprocess/prepare_lmdb.py
```
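Packing into lmdb usually amounts to one key-value pair per frame and per coefficient blob. A minimal sketch follows; the key scheme and pickle serialization are assumptions, not necessarily what `prepare_lmdb.py` does:

```python
# lmdb_sketch.py -- pack frames and 3DMM params into an lmdb file (illustrative)
import pickle
import lmdb

def write_lmdb(path, frames, coeffs):
    """frames: list of HxWx3 arrays, coeffs: list of 3DMM vectors (assumed)."""
    env = lmdb.open(path, map_size=1 << 40)   # generous map size (up to 1 TB)
    with env.begin(write=True) as txn:
        for i, (frame, coeff) in enumerate(zip(frames, coeffs)):
            txn.put(f"frame-{i:06d}".encode(), pickle.dumps(frame))
            txn.put(f"coeff-{i:06d}".encode(), pickle.dumps(coeff))
        txn.put(b"length", str(len(frames)).encode())
    env.close()
```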
- Following VToonify, each art style corresponds to its own checkpoint; use the following script to train the model for the art style you want (a sketch of the distributed setup the launcher expects follows the command):
```bash
# Train Style-A
python -m torch.distributed.launch --nproc_per_node=4 --master_port 12344 train_style_a.py
```
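The launcher spawns four processes (one per GPU) and passes `--local_rank` to each, so the training script must initialize a process group accordingly. The skeleton below is a generic DDP setup under these assumptions, not the repository's actual training code:

```python
# ddp_sketch.py -- setup expected by torch.distributed.launch (illustrative)
import argparse
import torch
import torch.distributed as dist

parser = argparse.ArgumentParser()
parser.add_argument("--local_rank", type=int, default=0)  # injected by the launcher
args = parser.parse_args()

torch.cuda.set_device(args.local_rank)
dist.init_process_group(backend="nccl")     # MASTER_ADDR/PORT come from the launcher

model = torch.nn.Linear(10, 10).cuda()      # placeholder for the Style-A model
model = torch.nn.parallel.DistributedDataParallel(
    model, device_ids=[args.local_rank])
```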
- We use the following dataset for Style-E training:
  - MEAD. download link.
- We use the following datasets for Style-A training:
  - MEAD. download link.
  - HDTF. download link.
- Art reference picture datasets:
  - Cartoon. download link.
  - Illustration, Arcane, Comic, Pixar. download link.
Some code is borrowed from the following projects:
- AGRoL
- PIRenderer
- Deep3DFaceRecon_pytorch
- SadTalker
- VToonify
- DualStyleGAN
- StyleHEAT
- FOMM video preprocessing
Thanks for their contributions!
If you find this codebase useful for your research, please cite our paper using the following BibTeX entry:
```bibtex
@inproceedings{tan2024style2talker,
  title={Style2Talker: High-Resolution Talking Head Generation with Emotion Style and Art Style},
  author={Tan, Shuai and Ji, Bin and Pan, Ye},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={5},
  pages={5079--5087},
  year={2024}
}
```