MuteApo committed 53587e3 (verified) · 1 parent: a136f01

Update README.md

# CamI2V: Camera-Controlled Image-to-Video Diffusion Model

<div align="center">
<a href="https://arxiv.org/abs/2410.15957">
<img src="https://img.shields.io/static/v1?label=arXiv&message=2410.15957&color=b21d1a" style="display: inline-block; vertical-align: middle;">
</a>
<a href="https://zgctroy.github.io/CamI2V">
<img src="https://img.shields.io/static/v1?label=Project&message=Page&color=green" style="display: inline-block; vertical-align: middle;">
</a>
<a href="https://huggingface.co/MuteApo/CamI2V/tree/main">
<img src="https://img.shields.io/static/v1?label=HuggingFace&message=Checkpoints&color=blue" style="display: inline-block; vertical-align: middle;">
</a>
</div>

## 🌟 News and Todo List

- 🔥 25/03/17: Upload the test metadata used in our paper for easier evaluation.
- 🔥 25/02/15: Release a demo of [RealCam-I2V](https://zgctroy.github.io/RealCam-I2V/) for real-world applications; code will be available in its [repo](https://github.com/ZGCTroy/RealCam-I2V).
- 🔥 25/01/12: Release the checkpoint of [CamI2V (512x320, 100k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_100k.pt). We plan to release a more advanced model with longer training soon.
- 🔥 25/01/02: Release the checkpoint of [CamI2V (512x320, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_50k.pt), which is suitable for research purposes and comparison.
- 🔥 24/12/24: Integrate [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL) into the gradio demo; you can now caption your own input image with this powerful VLM.
- 🔥 24/12/23: Release the checkpoint of [CamI2V (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cami2v.pt).
- 🔥 24/12/16: Release reproduced, non-official checkpoints of [MotionCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_motionctrl.pt) and [CameraCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cameractrl.pt) based on [DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter).
- 🔥 24/12/09: Release training configs and scripts.
- 🔥 24/12/06: Release [dataset pre-processing code](datasets) for RealEstate10K.
- 🔥 24/12/02: Release [evaluation code](evaluation) for RotErr, TransErr, CamMC and FVD.
- 🌱 24/11/16: Release model code of CamI2V for training and inference, including implementations of MotionCtrl and CameraCtrl.

## 🎥 Gallery

<table>
<tr>
<td align="center">
rightward rotation and zoom in<br>(CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)
</td>
<td align="center">
leftward rotation and zoom in<br>(CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)
</td>
</tr>
<tr>
<td align="center">
<img src="https://github.com/user-attachments/assets/74a764f4-0631-4fbe-94b9-af51057f99a5" width="75%">
</td>
<td align="center">
<img src="https://github.com/user-attachments/assets/99309759-8355-4ee1-95c4-897f01c46720" width="75%">
</td>
</tr>
<tr>
<td align="center">
zoom in and upward movement<br>(CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)
</td>
<td align="center">
downward movement and zoom out<br>(CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)
</td>
</tr>
<tr>
<td align="center">
<img src="https://github.com/user-attachments/assets/aef4cc2e-fd7e-46db-82bc-a7e59aab5963" width="75%">
</td>
<td align="center">
<img src="https://github.com/user-attachments/assets/f204992a-d729-492c-a663-85f9b80680f5" width="75%">
</td>
</tr>
</table>

## 📈 Performance

Measured at 256x256 resolution with 50k training steps, 25 DDIM steps, text-image CFG 7.5, and camera CFG 1.0 (i.e., no camera CFG).

| Method | RotErr↓ | TransErr↓ | CamMC↓ | FVD↓<br>(VideoGPT) | FVD↓<br>(StyleGAN) |
| :------------ | :--------: | :--------: | :--------: | :----------------: | :----------------: |
| DynamiCrafter | 3.3415 | 9.8024 | 11.625 | 106.02 | 92.196 |
| MotionCtrl | 0.8636 | 2.5068 | 2.9536 | 70.820 | 60.363 |
| CameraCtrl | 0.7064 | 1.9379 | 2.3070 | 66.713 | 57.644 |
| CamI2V | **0.4120** | **1.3409** | **1.5291** | **62.439** | **53.361** |

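For reference, the two guidance scales above can be combined in a standard two-branch classifier-free guidance step. The function below is an illustrative sketch, not the repository's actual code; the `eps_*` arguments stand for the noise predictions under each condition:

```python
def cfg_combine(eps_uncond, eps_text, eps_text_cam,
                text_scale=7.5, cam_scale=1.0):
    # Two-scale classifier-free guidance (illustrative formulation):
    # amplify the text-image condition relative to the unconditional branch,
    # then the camera condition relative to the text-image branch.
    # cam_scale=1.0 leaves the camera branch unamplified ("no camera CFG").
    return (eps_uncond
            + text_scale * (eps_text - eps_uncond)
            + cam_scale * (eps_text_cam - eps_text))
```

With `cam_scale=1.0`, the camera-conditioned prediction passes through unamplified, matching the evaluation setting above.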
### Inference Speed and GPU Memory

| Method | # Parameters | GPU Memory | Generation Time<br>(RTX 3090) |
| :------------ | :----------: | :--------: | :---------------------------: |
| DynamiCrafter | 1.4 B | 11.14 GiB | 8.14 s |
| MotionCtrl | + 63.4 M | 11.18 GiB | 8.27 s |
| CameraCtrl | + 211 M | 11.56 GiB | 8.38 s |
| CamI2V | + 261 M | 11.67 GiB | 10.3 s |

## ⚙️ Environment

### Quick Start

```shell
conda create -n cami2v python=3.10
conda activate cami2v

conda install -y pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y xformers -c xformers
pip install -r requirements.txt
```

## 💫 Inference

### Download Model Checkpoints

| Model | Resolution | Training Steps |
| :--------- | :--------: | :--------------------------------------------------------------------------------------------------------------------------------------------------: |
| CamI2V | 512x320 | [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_50k.pt), [100k](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_100k.pt) |
| CamI2V | 256x256 | [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cami2v.pt) |
| CameraCtrl | 256x256 | [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cameractrl.pt) |
| MotionCtrl | 256x256 | [50k](https://huggingface.co/MuteApo/CamI2V/blob/main/256_motionctrl.pt) |

Currently we release 256x256 checkpoints of DynamiCrafter-based CamI2V, CameraCtrl and MotionCtrl at 50k training steps, which are suitable for research purposes and comparison.

We also release 512x320 checkpoints of our CamI2V with longer training, enabling higher-resolution and more advanced camera-controlled video generation.

Download the above checkpoints and put them under the `ckpts` folder.
Please edit `ckpt_path` in `configs/models.json` if you store models in a different path.

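A quick way to catch path mistakes before launching is to compare every `ckpt_path` in `configs/models.json` against the filesystem. The snippet below is a minimal sketch; it assumes a `{model_name: {"ckpt_path": ...}}` layout, which may differ from the actual config schema:

```python
import json
from pathlib import Path

def missing_checkpoints(config_path: str) -> list:
    """List ckpt_path values from a models.json-style config that are absent on disk."""
    config = json.loads(Path(config_path).read_text())
    # Assumed layout: {"CamI2V_512": {"ckpt_path": "ckpts/512_cami2v_100k.pt"}, ...}
    return [entry["ckpt_path"]
            for entry in config.values()
            if not Path(entry["ckpt_path"]).exists()]
```

Run it once after downloading; an empty list means every configured checkpoint was found.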
### Download Qwen2-VL Captioner (Optional)

Not required, but recommended.
It is used in the gradio demo to caption your custom image for video generation.
We prefer the [AWQ](https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct-AWQ) quantized version of Qwen2-VL for its faster inference and lower GPU memory usage.

Download the pre-trained model and put it under the `pretrained_models` folder:

```shell
─┬─ pretrained_models/
 └─── Qwen2-VL-7B-Instruct-AWQ/
```

### Run Gradio Demo

```shell
python cami2v_gradio_app.py --use_qwenvl_captioner
```

If Gradio struggles to establish a network connection, retry with `--use_host_ip`.

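`--use_host_ip` presumably binds the demo to the machine's LAN address instead of localhost, so other devices on the network can reach it. A minimal sketch of how such an address can be resolved (illustrative only, not the repository's implementation):

```python
import socket

def get_host_ip() -> str:
    """Best-effort LAN IP lookup: a UDP connect toward a public address
    sends no packets but lets the OS report which local address it would use."""
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        s.connect(("8.8.8.8", 80))
        return s.getsockname()[0]
    except OSError:
        return "127.0.0.1"  # fall back to loopback when no route exists
    finally:
        s.close()
```

The resulting address would then be passed to gradio's `launch(server_name=...)` in place of the default localhost binding.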
## 🤗 Related Repos

[RealCam-I2V: https://github.com/ZGCTroy/RealCam-I2V](https://github.com/ZGCTroy/RealCam-I2V)

[CameraCtrl: https://github.com/hehao13/CameraCtrl](https://github.com/hehao13/CameraCtrl)

[MotionCtrl: https://github.com/TencentARC/MotionCtrl](https://github.com/TencentARC/MotionCtrl)

[DynamiCrafter: https://github.com/Doubiiu/DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter)

## 🗒️ Citation

```bibtex
@article{zheng2024cami2v,
    title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
    author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
    journal={arXiv preprint arXiv:2410.15957},
    year={2024}
}

@article{li2025realcam,
    title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control},
    author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
    journal={arXiv preprint arXiv:2502.10059},
    year={2025}
}
```