# CamI2V: Camera-Controlled Image-to-Video Diffusion Model

<div align="center">
  <a href="https://arxiv.org/abs/2410.15957">
    <img src="https://img.shields.io/static/v1?label=arXiv&message=2410.15957&color=b21d1a" style="display: inline-block; vertical-align: middle;">
  </a>
  <a href="https://zgctroy.github.io/CamI2V">
    <img src="https://img.shields.io/static/v1?label=Project&message=Page&color=green" style="display: inline-block; vertical-align: middle;">
  </a>
  <a href="https://huggingface.co/MuteApo/CamI2V/tree/main">
    <img src="https://img.shields.io/static/v1?label=HuggingFace&message=Checkpoints&color=blue" style="display: inline-block; vertical-align: middle;">
  </a>
</div>

## 🌟 News and Todo List

- 🔥 25/03/17: Upload the test metadata used in our paper to make evaluation easier.
- 🔥 25/02/15: Release the demo of [RealCam-I2V](https://zgctroy.github.io/RealCam-I2V/) for real-world applications; code will be available in its [repo](https://github.com/ZGCTroy/RealCam-I2V).
- 🔥 25/01/12: Release the checkpoint of [CamI2V (512x320, 100k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_100k.pt). We plan to release a more advanced model with longer training soon.
- 🔥 25/01/02: Release the checkpoint of [CamI2V (512x320, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_50k.pt), which is suitable for research purposes and comparison.
- 🔥 24/12/24: Integrate [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL) into the gradio demo; you can now caption your own input image with this powerful VLM.
- 🔥 24/12/23: Release the checkpoint of [CamI2V (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cami2v.pt).
- 🔥 24/12/16: Release reproduced, non-official checkpoints of [MotionCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_motionctrl.pt) and [CameraCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cameractrl.pt) on [DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter).
- 🔥 24/12/09: Release training configs and scripts.
- 🔥 24/12/06: Release [dataset pre-processing code](datasets) for RealEstate10K.
- 🔥 24/12/02: Release [evaluation code](evaluation) for RotErr, TransErr, CamMC and FVD.
- 🌱 24/11/16: Release model code of CamI2V for training and inference, including implementations of MotionCtrl and CameraCtrl.

## 🎥 Gallery

<table>
  <tr>
    <td align="center">
      rightward rotation and zoom-in<br>(CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)
    </td>
    <td align="center">
      leftward rotation and zoom-in<br>(CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)
    </td>
  </tr>
  <tr>
    <td align="center">
      <img src="https://github.com/user-attachments/assets/74a764f4-0631-4fbe-94b9-af51057f99a5" width="75%">
    </td>
    <td align="center">
      <img src="https://github.com/user-attachments/assets/99309759-8355-4ee1-95c4-897f01c46720" width="75%">
    </td>
  </tr>
  <tr>
    <td align="center">
      zoom-in and upward movement<br>(CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)
    </td>
    <td align="center">
      downward movement and zoom-out<br>(CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)
    </td>
  </tr>
  <tr>
    <td align="center">
      <img src="https://github.com/user-attachments/assets/aef4cc2e-fd7e-46db-82bc-a7e59aab5963" width="75%">
    </td>
    <td align="center">
      <img src="https://github.com/user-attachments/assets/f204992a-d729-492c-a663-85f9b80680f5" width="75%">
    </td>
  </tr>
</table>

## 📈 Performance

Measured at 256x256 resolution with 50k training steps, 25 DDIM steps, text-image CFG 7.5, and camera CFG 1.0 (i.e., no camera CFG).

| Method        |  RotErr↓   | TransErr↓  |   CamMC↓   | FVD↓<br>(VideoGPT) | FVD↓<br>(StyleGAN) |
| :------------ | :--------: | :--------: | :--------: | :----------------: | :----------------: |
| DynamiCrafter |   3.3415   |   9.8024   |   11.625   |       106.02       |       92.196       |
| MotionCtrl    |   0.8636   |   2.5068   |   2.9536   |       70.820       |       60.363       |
| CameraCtrl    |   0.7064   |   1.9379   |   2.3070   |       66.713       |       57.644       |
| CamI2V        | **0.4120** | **1.3409** | **1.5291** |     **62.439**     |     **53.361**     |

### Inference Speed and GPU Memory

| Method        | # Parameters | GPU Memory | Generation Time<br>(RTX 3090) |
| :------------ | :----------: | :--------: | :---------------------------: |
| DynamiCrafter |    1.4 B     | 11.14 GiB  |            8.14 s             |
| MotionCtrl    |   + 63.4 M   | 11.18 GiB  |            8.27 s             |
| CameraCtrl    |   + 211 M    | 11.56 GiB  |            8.38 s             |
| CamI2V        |   + 261 M    | 11.67 GiB  |            10.3 s             |

## ⚙️ Environment

### Quick Start

```shell
conda create -n cami2v python=3.10
conda activate cami2v

conda install -y pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y xformers -c xformers
pip install -r requirements.txt
```
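A quick sanity check that the environment resolved correctly; this is an illustrative sketch (not part of the official setup), assuming the `cami2v` environment is active:

```shell
# Print the installed torch/torchvision versions and whether CUDA is visible.
python -c "import torch, torchvision; print(torch.__version__, torchvision.__version__, torch.cuda.is_available())"
```

If CUDA reports `False`, double-check the `pytorch-cuda` version against your driver before proceeding.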

## 💫 Inference

### Download Model Checkpoints

| Model      | Resolution |                                                                    Training Steps                                                                    |
| :--------- | :--------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: |
| CamI2V     |  512x320   | [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_50k.pt), [100k](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_100k.pt) |
| CamI2V     |  256x256   |                                         [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cami2v.pt)                                         |
| CameraCtrl |  256x256   |                                       [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cameractrl.pt)                                       |
| MotionCtrl |  256x256   |                                       [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/256_motionctrl.pt)                                       |

Currently we release 256x256 checkpoints (50k training steps) of DynamiCrafter-based CamI2V, CameraCtrl and MotionCtrl, which are suitable for research purposes and comparison.

We also release 512x320 checkpoints of our CamI2V with longer training, enabling higher-resolution and more advanced camera-controlled video generation.

Download the above checkpoints and put them under the `ckpts` folder.
Please edit `ckpt_path` in `configs/models.json` if you use a different model path.
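If you prefer the command line, the checkpoints can be fetched with `huggingface-cli` (from the `huggingface_hub` package); a minimal sketch, with the file list as an example you should adjust to the resolution/steps you need:

```shell
mkdir -p ckpts

# Fetch checkpoint files from the MuteApo/CamI2V repository on Hugging Face.
huggingface-cli download MuteApo/CamI2V 512_cami2v_100k.pt 256_cami2v.pt --local-dir ckpts
```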

### Download Qwen2-VL Captioner (Optional)

Not required, but recommended.
It is used in the gradio demo to caption your custom input image for video generation.
We prefer the [AWQ](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-AWQ) quantized version of Qwen2-VL for its speed and lower GPU memory usage.

Download the pre-trained model and put it under the `pretrained_models` folder:

```shell
─┬─ pretrained_models/
 └─── Qwen2-VL-7B-Instruct-AWQ/
```
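The captioner can be fetched the same way as the checkpoints; a hedged sketch, assuming `huggingface-cli` (from the `huggingface_hub` package) is installed:

```shell
# Download the AWQ-quantized Qwen2-VL model into the expected folder layout.
huggingface-cli download Qwen/Qwen2-VL-7B-Instruct-AWQ \
    --local-dir pretrained_models/Qwen2-VL-7B-Instruct-AWQ
```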

### Run Gradio Demo

```shell
python cami2v_gradio_app.py --use_qwenvl_captioner
```

Gradio may struggle to establish a network connection; if so, re-try with `--use_host_ip`.

## 🤗 Related Repo

[RealCam-I2V](https://github.com/ZGCTroy/RealCam-I2V)

[CameraCtrl](https://github.com/hehao13/CameraCtrl)

[MotionCtrl](https://github.com/TencentARC/MotionCtrl)

[DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter)

## 🗒️ Citation

```bibtex
@article{zheng2024cami2v,
  title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
  author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
  journal={arXiv preprint arXiv:2410.15957},
  year={2024}
}

@article{li2025realcam,
  title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control},
  author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
  journal={arXiv preprint arXiv:2502.10059},
  year={2025}
}
```