Commit: code release

hw-liang committed Apr 1, 2024
1 parent 6a421c1 commit 8a61e4e
Showing 129 changed files with 25,836 additions and 3 deletions.
4 changes: 1 addition & 3 deletions README.md
@@ -7,6 +7,7 @@ Konstantinos N. Plataniotis, and Zhangyang Wang

## News

- 2024.4.1: Released code!
- 2024.3.25: Released on arxiv!

## Overview
@@ -23,13 +24,10 @@ We release our pre-generated static assets in `data/` directory. During training

## Custom Prompts

## Code is coming soon!

## Training

## Testing

## Citation

If you find this repository/work helpful in your research, please consider citing the paper and starring the repo ⭐.

470 changes: 470 additions & 0 deletions VideoCrafter/License

Large diffs are not rendered by default.

199 changes: 199 additions & 0 deletions VideoCrafter/README.md
@@ -0,0 +1,199 @@

## ___***VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models***___

<a href='https://ailab-cvc.github.io/videocrafter2/'><img src='https://img.shields.io/badge/Project-Page-green'></a>
<a href='https://arxiv.org/abs/2401.09047'><img src='https://img.shields.io/badge/Technique-Report-red'></a>
[![Discord](https://dcbadge.vercel.app/api/server/rrayYqZ4tf?style=flat)](https://discord.gg/rrayYqZ4tf)
<a href='https://huggingface.co/spaces/VideoCrafter/VideoCrafter2'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a>
[![GitHub](https://img.shields.io/github/stars/VideoCrafter/VideoCrafter?style=social)](https://github.com/VideoCrafter/VideoCrafter)

### 🔥🔥 Our dedicated high-resolution I2V model has been released: :point_right:[DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter)!

[![](https://img.youtube.com/vi/0NfmIsNAg-g/0.jpg)](https://www.youtube.com/watch?v=0NfmIsNAg-g)

### 🔥 VideoCrafter2 brings large improvements over VideoCrafter1 with limited data: better motion, better concept combination!

Please join us and create your own film on [Discord/Floor33](https://discord.gg/rrayYqZ4tf).

##### 🎥 An exquisite film, produced by VideoCrafter2, directed by a human
[![Film produced by VideoCrafter2](https://img.youtube.com/vi/TUsFkW0tK-s/0.jpg)](https://www.youtube.com/watch?v=TUsFkW0tK-s)

## 🔆 Introduction

🤗🤗🤗 VideoCrafter is an open-source video generation and editing toolbox for crafting video content.
It currently includes the Text2Video and Image2Video models:

### 1. Generic Text-to-video Generation
Click the GIF to access the high-resolution video.

<table class="center">
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/d20ee09d-fc32-44a8-9e9a-f12f44b30411"><img src=assets/t2v/tom.gif width="320"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/f1d9f434-28e8-44f6-a9b8-cffd67e4574d"><img src=assets/t2v/child.gif width="320"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/bbcfef0e-d8fb-4850-adc0-d8f937c2fa36"><img src=assets/t2v/woman.gif width="320"></td>
<tr>
<td style="text-align:center;" width="320">"Tom Cruise's face reflects focus, his eyes filled with purpose and drive."</td>
<td style="text-align:center;" width="320">"A child excitedly swings on a rusty swing set, laughter filling the air."</td>
<td style="text-align:center;" width="320">"A young woman with glasses is jogging in the park wearing a pink headband."</td>
<tr>
</table>

<table class="center">
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/7edafc5a-750e-45f3-a46e-b593751a4b12"><img src=assets/t2v/couple.gif width="320"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/37fe41c8-31fb-4e77-bcf9-fa159baa6d86"><img src=assets/t2v/rabbit.gif width="320"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/09791a46-a243-41b8-a6bb-892cdd3a83a2"><img src=assets/t2v/duck.gif width="320"></td>
<tr>
<td style="text-align:center;" width="320">"With the style of van gogh, A young couple dances under the moonlight by the lake."</td>
<td style="text-align:center;" width="320">"A rabbit, low-poly game art style"</td>
<td style="text-align:center;" width="320">"Impressionist style, a yellow rubber duck floating on the wave on the sunset"</td>
<tr>
</table>

### 2. Generic Image-to-video Generation

<table class="center">
<td><img src=assets/i2v/input/blackswan.png width="170"></td>
<td><img src=assets/i2v/input/horse.png width="170"></td>
<td><img src=assets/i2v/input/chair.png width="170"></td>
<td><img src=assets/i2v/input/sunset.png width="170"></td>
<tr>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/1a57edd9-3fd2-4ce9-8313-89aca95b6ec7"><img src=assets/i2v/blackswan.gif width="170"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/d671419d-ae49-4889-807e-b841aef60e8a"><img src=assets/i2v/horse.gif width="170"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/39d730d9-7b47-4132-bdae-4d18f3e651ee"><img src=assets/i2v/chair.gif width="170"></td>
<td><a href="https://github.com/AILab-CVC/VideoCrafter/assets/18735168/dc8dd0d5-a80d-4f31-94db-f9ea0b13172b"><img src=assets/i2v/sunset.gif width="170"></td>
<tr>
<td style="text-align:center;" width="170">"a black swan swims on the pond"</td>
<td style="text-align:center;" width="170">"a girl is riding a horse fast on grassland"</td>
<td style="text-align:center;" width="170">"a boy sits on a chair facing the sea"</td>
<td style="text-align:center;" width="170">"two galleons moving in the wind at sunset"</td>

</table>

:boom: **We highly recommend trying our dedicated I2V model [DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter): higher resolution, better dynamics, more coherence!**

---

## 📝 Changelog
- __[2024.02.05]__: 🔥🔥 Released a new 640x1024 I2V model (VideoCrafter1/DynamiCrafter).

- __[2024.01.26]__: Released the 320x512 checkpoint of VideoCrafter2.

- __[2024.01.18]__: Released [VideoCrafter2](https://ailab-cvc.github.io/videocrafter2/) and its [tech report](https://arxiv.org/abs/2401.09047)!

- __[2023.10.30]__: Released the [VideoCrafter1](https://arxiv.org/abs/2310.19512) technical report!

- __[2023.10.13]__: Released VideoCrafter1 for high-quality video generation!

- __[2023.08.14]__: Released a new version of VideoCrafter on [Discord/Floor33](https://discord.gg/uHaQuThT). Please join us to create your own film!

- __[2023.04.18]__: Released a VideoControl model with most watermarks removed!

- __[2023.04.05]__: Released pretrained Text-to-Video models, VideoLoRA models, and inference code.
<br>


## ⏳ Models

|T2V-Models|Resolution|Checkpoints|
|:---------|:---------|:--------|
|VideoCrafter2|320x512|[Hugging Face](https://huggingface.co/VideoCrafter/VideoCrafter2/blob/main/model.ckpt)
|VideoCrafter1|576x1024|[Hugging Face](https://huggingface.co/VideoCrafter/Text2Video-1024/blob/main/model.ckpt)
|VideoCrafter1|320x512|[Hugging Face](https://huggingface.co/VideoCrafter/Text2Video-512/blob/main/model.ckpt)

|I2V-Models|Resolution|Checkpoints|
|:---------|:---------|:--------|
|VideoCrafter1|640x1024|[Hugging Face](https://huggingface.co/Doubiiu/DynamiCrafter_1024/blob/main/model.ckpt)
|VideoCrafter1|320x512|[Hugging Face](https://huggingface.co/VideoCrafter/Image2Video-512/blob/main/model.ckpt)
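
To put a checkpoint where the inference scripts below expect it, you can fetch it from the command line. The snippet below is a hedged example: it assumes Hugging Face's standard `resolve/` URL scheme for the files linked above and the `checkpoints/` layout described in the Inference section.

```bash
# Hedged example: fetch the VideoCrafter2 T2V checkpoint into the layout
# used by the inference scripts (resolve/ URL scheme assumed from the
# Hugging Face links above).
mkdir -p checkpoints/base_512_v2
wget -O checkpoints/base_512_v2/model.ckpt \
  https://huggingface.co/VideoCrafter/VideoCrafter2/resolve/main/model.ckpt
```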



## ⚙️ Setup

### 1. Install Environment via Anaconda (Recommended)
```bash
conda create -n videocrafter python=3.8.5
conda activate videocrafter
pip install -r requirements.txt
```
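
As a quick, optional sanity check (an illustrative addition, not part of the official setup), you can confirm that PyTorch is installed and sees your GPU before moving on:

```bash
# Prints the PyTorch version and whether CUDA is available.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```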


## 💫 Inference
### 1. Text-to-Video

1) Download the pretrained T2V model via [Hugging Face](https://huggingface.co/VideoCrafter/VideoCrafter2/blob/main/model.ckpt) and put `model.ckpt` at `checkpoints/base_512_v2/model.ckpt`.
2) Run the following command in a terminal (a hypothetical sketch of the wrapper's contents follows below).
```bash
sh scripts/run_text2video.sh
```
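
For reference, wrappers like `run_text2video.sh` typically set the checkpoint, config, and prompt file, then call a Python evaluation entry point. The sketch below is hypothetical: the script path and every flag (`scripts/evaluation/inference.py`, `--ckpt_path`, `--prompt_file`, and so on) are assumptions, so consult the actual script for the authoritative arguments.

```bash
# Hypothetical sketch of what the wrapper may contain; all paths and
# flags are assumptions -- see scripts/run_text2video.sh for the real ones.
ckpt='checkpoints/base_512_v2/model.ckpt'
config='configs/inference_t2v_512_v2.0.yaml'   # assumed config name
python3 scripts/evaluation/inference.py \
  --mode base \
  --ckpt_path "$ckpt" \
  --config "$config" \
  --prompt_file prompts/test_prompts.txt \
  --savedir results/base_512_v2 \
  --height 320 --width 512 \
  --ddim_steps 50
```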

### 2. Image-to-Video

1) Download the pretrained I2V model via [Hugging Face](https://huggingface.co/VideoCrafter/Image2Video-512-v1.0/blob/main/model.ckpt) and put `model.ckpt` at `checkpoints/i2v_512_v1/model.ckpt`.
2) Run the following command in a terminal.
```bash
sh scripts/run_image2video.sh
```
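
The image-to-video wrapper is analogous. A hypothetical sketch follows; the paths and flags are again assumptions, with only the config name taken from this commit:

```bash
# Hypothetical i2v variant; all paths and flags are assumptions.
python3 scripts/evaluation/inference.py \
  --mode i2v \
  --ckpt_path checkpoints/i2v_512_v1/model.ckpt \
  --config configs/inference_i2v_512_v1.0.yaml \
  --prompt_file prompts/i2v_prompts/test_prompts.txt \
  --savedir results/i2v_512_v1 \
  --height 320 --width 512
```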

### 3. Local Gradio demo

1) Download the pretrained T2V and I2V models and place them in the corresponding directories as described above.
2) Run the following command in a terminal.
```bash
python gradio_app.py
```
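
Gradio serves the demo locally, by default at `http://localhost:7860`; open that address in a browser once the app reports it is running.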

---
## 📋 Technical Report
😉 VideoCrafter2 Tech report: [VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models](https://arxiv.org/abs/2401.09047)

😉 VideoCrafter1 Tech report: [VideoCrafter1: Open Diffusion Models for High-Quality Video Generation](https://arxiv.org/abs/2310.19512)
<br>

## 😉 Citation
If you find this repository or these models helpful in your research, please consider citing the following papers.
```bibtex
@misc{chen2024videocrafter2,
  title={VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models},
  author={Haoxin Chen and Yong Zhang and Xiaodong Cun and Menghan Xia and Xintao Wang and Chao Weng and Ying Shan},
  year={2024},
  eprint={2401.09047},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@misc{chen2023videocrafter1,
  title={VideoCrafter1: Open Diffusion Models for High-Quality Video Generation},
  author={Haoxin Chen and Menghan Xia and Yingqing He and Yong Zhang and Xiaodong Cun and Shaoshu Yang and Jinbo Xing and Yaofang Liu and Qifeng Chen and Xintao Wang and Chao Weng and Ying Shan},
  year={2023},
  eprint={2310.19512},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@misc{xing2023dynamicrafter,
  title={DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors},
  author={Jinbo Xing and Menghan Xia and Yong Zhang and Haoxin Chen and Xintao Wang and Tien-Tsin Wong and Ying Shan},
  year={2023},
  eprint={2310.12190},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
@misc{he2022lvdm,
  title={Latent Video Diffusion Models for High-Fidelity Long Video Generation},
  author={Yingqing He and Tianyu Yang and Yong Zhang and Ying Shan and Qifeng Chen},
  year={2022},
  eprint={2211.13221},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```


## 🤗 Acknowledgements
Our codebase builds on [Stable Diffusion](https://github.com/Stability-AI/stablediffusion).
Thanks to the authors for sharing their awesome codebase!


## 📢 Disclaimer
We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or non-commercial purposes.
25 changes: 25 additions & 0 deletions VideoCrafter/cog.yaml
@@ -0,0 +1,25 @@
# Configuration for Cog ⚙️
# Reference: https://github.com/replicate/cog/blob/main/docs/yaml.md

build:
  gpu: true
  system_packages:
    - "libgl1-mesa-glx"
    - "libglib2.0-0"
  python_version: "3.11"
  python_packages:
    - "torch==2.0.1"
    - "opencv-python==4.8.1.78"
    - "torchvision==0.15.2"
    - "pytorch_lightning==2.1.0"
    - "einops==0.7.0"
    - "imageio==2.31.6"
    - "omegaconf==2.3.0"
    - "transformers==4.35.0"
    - "moviepy==1.0.3"
    - "av==10.0.0"
    - "decord==0.6.0"
    - "kornia==0.7.0"
    - "open-clip-torch==2.12.0"
    - "xformers==0.0.21"
predict: "predict.py:Predictor"
83 changes: 83 additions & 0 deletions VideoCrafter/configs/inference_i2v_512_v1.0.yaml
@@ -0,0 +1,83 @@
model:
  target: lvdm.models.ddpm3d.LatentVisualDiffusion
  params:
    linear_start: 0.00085
    linear_end: 0.012
    num_timesteps_cond: 1
    timesteps: 1000
    first_stage_key: video
    cond_stage_key: caption
    cond_stage_trainable: false
    conditioning_key: crossattn
    image_size:
    - 40
    - 64
    channels: 4
    scale_by_std: false
    scale_factor: 0.18215
    use_ema: false
    uncond_type: empty_seq
    use_scale: true
    scale_b: 0.7
    finegrained: true
    unet_config:
      target: lvdm.modules.networks.openaimodel3d.UNetModel
      params:
        in_channels: 4
        out_channels: 4
        model_channels: 320
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        - 4
        num_head_channels: 64
        transformer_depth: 1
        context_dim: 1024
        use_linear: true
        use_checkpoint: true
        temporal_conv: true
        temporal_attention: true
        temporal_selfatt_only: true
        use_relative_position: false
        use_causal_attention: false
        use_image_attention: true
        temporal_length: 16
        addition_attention: true
        fps_cond: true
    first_stage_config:
      target: lvdm.models.autoencoder.AutoencoderKL
      params:
        embed_dim: 4
        monitor: val/rec_loss
        ddconfig:
          double_z: true
          z_channels: 4
          resolution: 512
          in_channels: 3
          out_ch: 3
          ch: 128
          ch_mult:
          - 1
          - 2
          - 4
          - 4
          num_res_blocks: 2
          attn_resolutions: []
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: lvdm.modules.encoders.condition.FrozenOpenCLIPEmbedder
      params:
        freeze: true
        layer: penultimate
    cond_img_config:
      target: lvdm.modules.encoders.condition.FrozenOpenCLIPImageEmbedderV2
      params:
        freeze: true
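
This config follows the Stable-Diffusion convention in which each component names its class via `target` and its constructor arguments via `params`. Below is a hedged sketch of how such a file is typically turned into a model; the helper shown is illustrative (the `instantiate_from_config` pattern is standard in SD-derived codebases, but this repo likely ships its own version).

```python
# Hedged sketch: build the model from the YAML via the standard
# target/params convention used in Stable-Diffusion-style codebases.
import importlib

from omegaconf import OmegaConf


def instantiate_from_config(config):
    """Import config['target'] and construct it with config['params']."""
    module_name, cls_name = config["target"].rsplit(".", 1)
    cls = getattr(importlib.import_module(module_name), cls_name)
    return cls(**config.get("params", {}))


cfg = OmegaConf.load("configs/inference_i2v_512_v1.0.yaml")
model = instantiate_from_config(cfg.model)  # -> LatentVisualDiffusion
```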
