Official PyTorch implementation of "Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation".
conda create -n hvdm python=3.8 -y
source activate hvdm
pip install torch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2
pip install natsort tqdm gdown omegaconf einops lpips pyspng tensorboard imageio av moviepy PyWavelets
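To quickly verify the environment, the sketch below loads PyTorch and runs a 3D wavelet decomposition (the representation named in the paper's title) on a dummy video volume with PyWavelets. This is only a sanity check under stated assumptions: the Haar wavelet and the (time, height, width) axes are illustrative choices, not necessarily the configuration used in the paper.

```python
import numpy as np
import pywt
import torch

print(torch.__version__, 'CUDA available:', torch.cuda.is_available())

# Dummy video volume: (frames, height, width). A single-level 3D DWT along
# time/height/width yields one low-frequency subband ('aaa') and seven
# high-frequency subbands, each at half resolution per axis.
video = np.random.randn(16, 64, 64).astype(np.float32)
coeffs = pywt.dwtn(video, wavelet='haar', axes=(0, 1, 2))
for key, band in coeffs.items():
    print(key, band.shape)  # each subband is (8, 32, 32)
```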
We conduct experiments on three datasets: SkyTimelapse, UCF-101, and TaiChi. Please refer to the directory structure below and place the data in the /data folder. You can change where the data is stored by modifying the data_location variable in tools/dataloader.py.
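For reference, this is a plain module-level path; a minimal sketch of the relevant line, assuming the default points at ./data (the value shown is an assumption, not the repository's actual code):

```python
# tools/dataloader.py (excerpt sketch)
data_location = './data'  # hypothetical default: root folder holding SKY/, TaiChi/, UCF-101/
```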
The datasets and checkpoints should be placed in the following structure:
HVDM
├── configs
├── data
│   ├── SKY
│   │   ├── 001.png
│   │   └── ...
│   ├── TaiChi
│   │   ├── 001.png
│   │   └── ...
│   └── UCF-101
│       ├── folder
│       │   ├── 001.avi
│       │   └── ...
│       └── ...
├── results
│   ├── ddpm_final_[DATASET]_42
│   │   ├── model_[EPOCH].pth
│   │   └── ...
│   └── first_stage_ae_final_[DATASET]_42
│       ├── model_[EPOCH].pth
│       └── ...
├── tools
└── main.py
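Before training, it can be worth confirming the layout matches what the loaders expect. A minimal sketch, assuming the ./data root shown above:

```python
from pathlib import Path

# Hypothetical helper: checks that each dataset folder from the tree above exists.
root = Path('./data')
for name in ('SKY', 'TaiChi', 'UCF-101'):
    path = root / name
    print(f"{path}: {'found' if path.is_dir() else 'MISSING'}")
```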
For settings related to the experiment name, please refer to PVDM, the repository our code is based on. Here, [EXP_NAME] is an experiment name you want to specify, [DATASET] is either SKY, UCF101, or TaiChi, and [DIRECTORY] denotes the directory of the autoencoder to be used.
To train the autoencoder (first stage), run:

python main.py \
--exp first_stage \
--id [EXP_NAME] \
--pretrain_config configs/autoencoder/base.yaml \
--data [DATASET] \
--batch_size [BATCH_SIZE]
This script will automatically save logs and checkpoints in the ./results folder.
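For example, a SkyTimelapse run might look like the following (the experiment id and batch size here are illustrative, not prescribed values):

python main.py \
--exp first_stage \
--id main \
--pretrain_config configs/autoencoder/base.yaml \
--data SKY \
--batch_size 8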
To train the diffusion model, run:

python main.py \
--exp ddpm \
--id [EXP_NAME] \
--pretrain_config configs/autoencoder/base.yaml \
--data [DATASET] \
--first_model [DIRECTORY] \
--diffusion_config configs/latent-diffusion/base.yaml \
--batch_size [BATCH_SIZE]
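For example, assuming the autoencoder checkpoint landed in a results folder following the first_stage_ae_[EXP_NAME]_[DATASET]_42 pattern from the tree above (the id, path, and batch size are illustrative):

python main.py \
--exp ddpm \
--id main \
--pretrain_config configs/autoencoder/base.yaml \
--data SKY \
--first_model ./results/first_stage_ae_main_SKY_42/model_[EPOCH].pth \
--diffusion_config configs/latent-diffusion/base.yaml \
--batch_size 8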
We are currently working on incorporating code for Image2Video and Video Dynamics Control. The model checkpoints will also be released soon.
To sample short videos, run:

python sample.py \
--exp ddpm \
--first_model './results/model_[EPOCH].pth' \
--second_model './results/ddpm_main_[DATASET]_42/ema_model_[EPOCH].pth' \
--mode short
To sample long videos, run:

python sample.py \
--exp ddpm \
--first_model './results/model_[EPOCH].pth' \
--second_model './results/ddpm_main_[DATASET]_42/ema_model_[EPOCH].pth' \
--mode long
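Generated clips can be inspected with moviepy from the dependency list. A minimal sketch; the sample path below is a placeholder for wherever sample.py writes its videos, which is an assumption here:

```python
from moviepy.editor import VideoFileClip

# Hypothetical output path -- point this at a clip produced by sample.py.
clip = VideoFileClip('./results/sample_0.mp4')
print(f'{clip.duration:.2f}s @ {clip.fps} fps, frame size {clip.size}')
clip.close()
```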
@article{kim2024hybrid,
  title={Hybrid Video Diffusion Models with 2D Triplane and 3D Wavelet Representation},
  author={Kim, Kihong and Lee, Haneol and Park, Jihye and Kim, Seyeon and Lee, Kwanghee and Kim, Seungryong and Yoo, Jaejun},
  journal={arXiv preprint arXiv:2402.13729},
  year={2024}
}
HVDM draws significant inspiration from the following repositories: pvdm, wavediff, latent-diffusion, and stylegan2-ada-pytorch. We thank all the authors for making their work openly accessible.