Update README.md
README.md
CHANGED
@@ -3,7 +3,6 @@ license: mit
 ---
 # CamI2V: Camera-Controlled Image-to-Video Diffusion Model
 
-
 <div align="center">
 <a href="https://arxiv.org/abs/2410.15957">
 <img src="https://img.shields.io/static/v1?label=arXiv&message=2410.15957&color=b21d1a" style="display: inline-block; vertical-align: middle;">
@@ -16,22 +15,6 @@ license: mit
 </a>
 </div>
 
-
-## News and Todo List
-
-
-- 25/03/17: Upload test metadata used in our paper to make evaluation easier.
-- 25/02/15: Release demo of [RealCam-I2V](https://zgctroy.github.io/RealCam-I2V/) for real-world applications; code will be available at [repo](https://github.com/ZGCTroy/RealCam-I2V).
-- 25/01/12: Release checkpoint of [CamI2V (512x320, 100k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_100k.pt). We plan to release a more advanced model with longer training soon.
-- 25/01/02: Release checkpoint of [CamI2V (512x320, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_50k.pt), which is suitable for research purposes and comparison.
-- 24/12/24: Integrate [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL) in the Gradio demo; you can now caption your own input image with this powerful VLM.
-- 24/12/23: Release checkpoint of [CamI2V (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cami2v.pt).
-- 24/12/16: Release reproduced non-official checkpoints of [MotionCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_motionctrl.pt) and [CameraCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cameractrl.pt) on [DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter).
-- 24/12/09: Release training configs and scripts.
-- 24/12/06: Release [dataset pre-process code](datasets) for RealEstate10K.
-- 24/12/02: Release [evaluation code](evaluation) for RotErr, TransErr, CamMC and FVD.
-- 24/11/16: Release model code of CamI2V for training and inference, including implementation for MotionCtrl and CameraCtrl.
-
 ## Gallery
 
 <table>
@@ -69,6 +52,20 @@ license: mit
 </tr>
 </table>
 
+## News and Todo List
+
+- 25/03/17: Upload test metadata used in our paper to make evaluation easier.
+- 25/02/15: Release demo of [RealCam-I2V](https://zgctroy.github.io/RealCam-I2V/) for real-world applications; code will be available at [repo](https://github.com/ZGCTroy/RealCam-I2V).
+- 25/01/12: Release checkpoint of [CamI2V (512x320, 100k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_100k.pt). We plan to release a more advanced model with longer training soon.
+- 25/01/02: Release checkpoint of [CamI2V (512x320, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/512_cami2v_50k.pt), which is suitable for research purposes and comparison.
+- 24/12/24: Integrate [Qwen2-VL](https://github.com/QwenLM/Qwen2-VL) in the Gradio demo; you can now caption your own input image with this powerful VLM.
+- 24/12/23: Release checkpoint of [CamI2V (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cami2v.pt).
+- 24/12/16: Release reproduced non-official checkpoints of [MotionCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_motionctrl.pt) and [CameraCtrl (256x256, 50k)](https://huggingface.co/MuteApo/CamI2V/blob/main/256_cameractrl.pt) on [DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter).
+- 24/12/09: Release training configs and scripts.
+- 24/12/06: Release [dataset pre-process code](datasets) for RealEstate10K.
+- 24/12/02: Release [evaluation code](evaluation) for RotErr, TransErr, CamMC and FVD.
+- 24/11/16: Release model code of CamI2V for training and inference, including implementation for MotionCtrl and CameraCtrl.
+
 ## Performance
 
 Measured under 256x256 resolution, 50k training steps, 25 DDIM steps, text-image CFG 7.5, camera CFG 1.0 (no camera CFG).
@@ -141,7 +138,6 @@ python cami2v_gradio_app.py --use_qwenvl_captioner
 
 Gradio may struggle to establish network connection, please re-try with `--use_host_ip`.
 
-
 ## Related Repo
 
 [RealCam-I2V: https://github.com/ZGCTroy/RealCam-I2V](https://github.com/ZGCTroy/RealCam-I2V)
@@ -152,7 +148,6 @@ Gradio may struggle to establish network connection, please re-try with `--use_h
 
 [DynamiCrafter: https://github.com/Doubiiu/DynamiCrafter](https://github.com/Doubiiu/DynamiCrafter)
 
-
 ## Citation
 
 ```