GenTron: Diffusion Transformers for Image and Video Generation

Unofficial PyTorch Implementation

GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen, Mengmeng Xu, Jiawei Ren, Yuren Cong, Sen He, Yanping Xie, Animesh Sinha, Ping Luo, Tao Xiang, Juan-Manuel Perez-Rua
The University of Hong Kong, Meta

This repository contains:

Setup

conda create -n gentron python=3.10
conda activate gentron
pip install -r requirements.txt

Sampling

python sample.py --image_size 512 --seed 1
python sample.py --model GenTron-T2I-XL/2 --image_size 256 --ckpt /path/to/model.pt
python sample_t2v.py --model GenTron-T2V-XL/2 --image_size 256 --ckpt /path/to/model.pt

| GenTron Model | Train Steps | Image Resolution |
| --- | --- | --- |
| B/2 | 150000 | 256x256 |

Training T2I Model

Preparation

torchrun --nnodes=1 --nproc_per_node=1 extract_features.py --data_path /path/to/ImageNet/train --features_path /path/to/ImageNet/features

Training

Train the GenTron-T2I model directly on raw images (single process first, then multi-GPU with N processes):

accelerate launch --mixed_precision fp16 train.py --model GenTron-T2I-XL/2 --data_path /path/to/ImageNet/train
accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train.py --model GenTron-T2I-XL/2 --data_path /path/to/ImageNet/train
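
The single-process and multi-GPU commands above differ only in the `--multi_gpu --num_processes N` flags. A minimal sketch of composing either form from a GPU count follows; `build_train_command` is a hypothetical helper that is not part of this repository, and it assumes one process per GPU.

```python
import shlex


def build_train_command(num_gpus, data_path,
                        model="GenTron-T2I-XL/2", script="train.py"):
    """Compose the `accelerate launch` command for a given GPU count.

    Hypothetical helper: only flags that appear in this README are used.
    """
    cmd = ["accelerate", "launch"]
    if num_gpus > 1:
        # Assumption: N in `--num_processes N` is the number of GPUs to use.
        cmd += ["--multi_gpu", "--num_processes", str(num_gpus)]
    cmd += ["--mixed_precision", "fp16", script,
            "--model", model, "--data_path", data_path]
    return cmd


# Print a copy-pasteable command line for a 4-GPU run.
print(shlex.join(build_train_command(4, "/path/to/ImageNet/train")))
```

The returned list can be passed to `subprocess.run` as-is, which avoids shell quoting issues when paths contain spaces.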

Train the GenTron-T2I model on the pre-extracted features:

accelerate launch --mixed_precision fp16 train_v2.py --model GenTron-T2I-XL/2 --features_path /path/to/ImageNet/features
accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train_v2.py --model GenTron-T2I-XL/2 --features_path /path/to/ImageNet/features

Training T2V Model

Preparation

WebVid-10M Dataset.

The training script assumes the WebVid data is structured as follows:
Webvid/
    videos/
        000001_000050/      ($page_dir)
            1.mp4           (videoid.mp4)
            ...
            5000.mp4
        ...
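
To make the layout above concrete, here is a small sketch that maps a `page_dir` and video id to a file path and lists the videos found on disk. Both function names are hypothetical (they are not taken from this repository), and the sketch assumes `data_dir` is the `Webvid/` root that contains `videos/`.

```python
from pathlib import Path


def webvid_video_path(data_dir, page_dir, videoid):
    """Return <data_dir>/videos/<page_dir>/<videoid>.mp4 (layout shown above)."""
    return Path(data_dir) / "videos" / page_dir / f"{videoid}.mp4"


def list_webvid_videos(data_dir):
    """Yield (page_dir, videoid) for every .mp4 under <data_dir>/videos."""
    for mp4 in sorted(Path(data_dir, "videos").glob("*/*.mp4")):
        # The parent folder name is the page_dir; the file stem is the video id.
        yield mp4.parent.name, mp4.stem
```

Running `list(list_webvid_videos("/path/to/webvid"))` before training is a quick way to confirm that the download landed in the expected folders.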

MSR-VTT Dataset.

The official data and video links can be found in link.

For convenience, you can also download the splits and captions with:

wget https://github.com/ArrowLuo/CLIP4Clip/releases/download/v0.0/msrvtt_data.zip

The raw videos are also available from the Frozen in Time project:

wget https://www.robots.ox.ac.uk/~maxbain/frozen-in-time/data/MSRVTT.zip
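
Once the archives are unpacked, the captions need to be paired with their videos. The sketch below groups captions by video id. It rests on an assumption that should be checked against your copy of `MSRVTT_data.json`: that the file has a top-level `"sentences"` list whose items carry `"video_id"` and `"caption"` keys. The helper name `group_captions` and the sample data are made up for illustration.

```python
import json
from collections import defaultdict


def group_captions(msrvtt_json_text):
    """Map each video_id to its list of captions.

    ASSUMPTION: the JSON has a top-level "sentences" list with
    "video_id" and "caption" keys; this is not confirmed by the README.
    """
    data = json.loads(msrvtt_json_text)
    captions = defaultdict(list)
    for sentence in data["sentences"]:
        captions[sentence["video_id"]].append(sentence["caption"])
    return dict(captions)


# Tiny made-up sample in the assumed schema (not real MSR-VTT data).
sample = json.dumps({"sentences": [
    {"video_id": "video0", "caption": "a car drives down a road"},
    {"video_id": "video0", "caption": "a vehicle is moving"},
    {"video_id": "video1", "caption": "a person is cooking"},
]})
print(group_captions(sample))
```

For the real file, read it with `Path("/path/to/msrvtt_data/MSRVTT_data.json").read_text()` and pass the text to `group_captions`.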

Training

Train the GenTron-T2V model directly on WebVid-10M (first command) or MSR-VTT (second command):

accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train_t2v.py --model GenTron-T2V-XL/2 --meta_path /path/to/webvid/results_10M_train.csv --data_dir /path/to/webvid
accelerate launch --multi_gpu --num_processes N --mixed_precision fp16 train_t2v.py --model GenTron-T2V-XL/2 --meta_path /path/to/msrvtt_data/MSRVTT_data.json --data_dir /path/to/MSRVTT

Acknowledgments

Citation

@article{chen2023gentron,
  title={Gentron: Delving deep into diffusion transformers for image and video generation},
  author={Chen, Shoufa and Xu, Mengmeng and Ren, Jiawei and Cong, Yuren and He, Sen and Xie, Yanping and Sinha, Animesh and Luo, Ping and Xiang, Tao and Perez-Rua, Juan-Manuel},
  journal={arXiv preprint arXiv:2312.04557},
  year={2023}
}
