CamI2V: Camera-Controlled Image-to-Video Diffusion Model

🎥 Gallery

  • rightward rotation and zoom in (CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)
  • leftward rotation and zoom in (CFG=4, FS=6, step=50, ratio=0.6, scale=0.1)
  • zoom in and upward movement (CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)
  • downward movement and zoom out (CFG=4, FS=6, step=50, ratio=0.8, scale=0.2)

🌟 News and Todo List

  • 🔥 25/03/17: Upload the test metadata used in our paper to make evaluation easier.
  • 🔥 25/02/15: Release a demo of RealCam-I2V for real-world applications; code will be available in the RealCam-I2V repo.
  • 🔥 25/01/12: Release checkpoint of CamI2V (512x320, 100k). We plan to release a more advanced model with longer training soon.
  • 🔥 25/01/02: Release checkpoint of CamI2V (512x320, 50k), suitable for research purposes and comparison.
  • 🔥 24/12/24: Integrate Qwen2-VL into the Gradio demo; you can now caption your own input image with this powerful VLM.
  • 🔥 24/12/23: Release checkpoint of CamI2V (256x256, 50k).
  • 🔥 24/12/16: Release reproduced, unofficial checkpoints of MotionCtrl (256x256, 50k) and CameraCtrl (256x256, 50k) on DynamiCrafter.
  • 🔥 24/12/09: Release training configs and scripts.
  • 🔥 24/12/06: Release dataset preprocessing code for RealEstate10K.
  • 🔥 24/12/02: Release evaluation code for RotErr, TransErr, CamMC and FVD.
  • 🌱 24/11/16: Release model code of CamI2V for training and inference, including implementations of MotionCtrl and CameraCtrl.

📈 Performance

Measured at 256x256 resolution after 50k training steps, with 25 DDIM sampling steps, text-image CFG 7.5 and camera CFG 1.0 (i.e., no camera CFG).
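Why a camera CFG of 1.0 means "no camera CFG": under the standard classifier-free guidance formula, written here for the camera condition alone (this README does not describe how the repository combines text-image and camera guidance, so treat this form as an assumption), the guided noise prediction is

$$
\hat{\epsilon} = \epsilon_\theta(z_t, c) + w_{\text{cam}}\left[\epsilon_\theta(z_t, c, c_{\text{cam}}) - \epsilon_\theta(z_t, c)\right]
$$

where $z_t$ is the noisy latent at step $t$, $c$ the text and image conditions, and $c_{\text{cam}}$ the camera condition. With $w_{\text{cam}} = 1$ the two $\epsilon_\theta(z_t, c)$ terms cancel, leaving $\hat{\epsilon} = \epsilon_\theta(z_t, c, c_{\text{cam}})$: the camera-conditioned prediction is used as-is, with no extrapolation away from the camera-free prediction.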

| Method | RotErr↓ | TransErr↓ | CamMC↓ | FVD↓ (VideoGPT) | FVD↓ (StyleGAN) |
|---|---|---|---|---|---|
| DynamiCrafter | 3.3415 | 9.8024 | 11.625 | 106.02 | 92.196 |
| MotionCtrl | 0.8636 | 2.5068 | 2.9536 | 70.820 | 60.363 |
| CameraCtrl | 0.7064 | 1.9379 | 2.3070 | 66.713 | 57.644 |
| CamI2V | 0.4120 | 1.3409 | 1.5291 | 62.439 | 53.361 |
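RotErr, TransErr and CamMC compare camera poses recovered from generated videos against ground-truth poses. The NumPy sketch below assumes the definitions commonly used since CameraCtrl: per-frame geodesic rotation angle, L2 translation distance, and distance between 3x4 [R|t] matrices, each summed over frames. It also assumes both pose sequences are already in a shared, scale-normalized frame. It is not the repository's evaluation code. The reported numbers also depend on how poses are estimated from generated frames and normalized, and on the angle unit, none of which this sketch covers.

```python
import numpy as np


def rot_err(R_gen: np.ndarray, R_gt: np.ndarray) -> float:
    """Sum over frames of the geodesic angle (radians) between rotations of shape (F, 3, 3)."""
    # Per-frame relative rotation R_gen @ R_gt^T; its trace equals 1 + 2*cos(angle).
    rel = np.einsum("fij,fkj->fik", R_gen, R_gt)
    cos = (np.trace(rel, axis1=1, axis2=2) - 1.0) / 2.0
    # Clip guards against round-off pushing the cosine slightly outside [-1, 1].
    return float(np.arccos(np.clip(cos, -1.0, 1.0)).sum())


def trans_err(t_gen: np.ndarray, t_gt: np.ndarray) -> float:
    """Sum over frames of the L2 distance between translations of shape (F, 3)."""
    return float(np.linalg.norm(t_gen - t_gt, axis=-1).sum())


def cam_mc(R_gen, t_gen, R_gt, t_gt) -> float:
    """Sum over frames of the Frobenius distance between 3x4 [R|t] pose matrices."""
    P_gen = np.concatenate([R_gen, t_gen[..., None]], axis=-1)  # (F, 3, 4)
    P_gt = np.concatenate([R_gt, t_gt[..., None]], axis=-1)
    return float(np.linalg.norm(P_gen - P_gt, axis=(-2, -1)).sum())


def rot_z(theta: float) -> np.ndarray:
    """Rotation by theta radians about the z axis (used only for the toy example)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Toy example: 4 frames, each generated pose off by 0.1 rad and by 0.5 along every axis.
F = 4
R_gt, t_gt = np.stack([np.eye(3)] * F), np.zeros((F, 3))
R_gen, t_gen = np.stack([rot_z(0.1)] * F), np.full((F, 3), 0.5)
print(rot_err(R_gen, R_gt), trans_err(t_gen, t_gt), cam_mc(R_gen, t_gen, R_gt, t_gt))
```

Summing rather than averaging over frames makes the metrics grow with clip length, so values are only comparable between runs with the same number of frames.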

Inference Speed and GPU Memory

| Method | # Parameters | GPU Memory | Generation Time (RTX 3090) |
|---|---|---|---|
| DynamiCrafter | 1.4 B | 11.14 GiB | 8.14 s |
| MotionCtrl | + 63.4 M | 11.18 GiB | 8.27 s |
| CameraCtrl | + 211 M | 11.56 GiB | 8.38 s |
| CamI2V | + 261 M | 11.67 GiB | 10.3 s |
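The table is easiest to read as overhead relative to the DynamiCrafter backbone. This short Python sketch does that arithmetic with the figures above. DynamiCrafter's "1.4 B" is a rounded value, so the parameter percentages are approximate.

```python
# Baseline figures for DynamiCrafter, copied from the table.
BASE_PARAMS_M = 1400.0  # 1.4 B parameters, in millions (rounded in the table)
BASE_MEM_GIB = 11.14
BASE_TIME_S = 8.14

# method: (extra parameters in M, GPU memory in GiB, generation time in s)
ADDED = {
    "MotionCtrl": (63.4, 11.18, 8.27),
    "CameraCtrl": (211.0, 11.56, 8.38),
    "CamI2V": (261.0, 11.67, 10.3),
}


def overhead(extra_params_m, mem_gib, time_s):
    """Return (params, memory, time) overheads in percent relative to the baseline."""
    return (
        100.0 * extra_params_m / BASE_PARAMS_M,
        100.0 * (mem_gib - BASE_MEM_GIB) / BASE_MEM_GIB,
        100.0 * (time_s - BASE_TIME_S) / BASE_TIME_S,
    )


for name, row in ADDED.items():
    p, m, t = overhead(*row)
    print(f"{name:<11} params +{p:4.1f}%  memory +{m:4.1f}%  time +{t:4.1f}%")
```

By this arithmetic, CamI2V adds roughly 18.6% parameters and about 4.8% GPU memory over DynamiCrafter, while generation time grows by about 26.5%, a larger relative cost than MotionCtrl or CameraCtrl.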

⚙️ Environment

Quick Start

conda create -n cami2v python=3.10
conda activate cami2v

conda install -y pytorch==2.4.1 torchvision==0.19.1 pytorch-cuda=12.1 -c pytorch -c nvidia
conda install -y xformers -c xformers
pip install -r requirements.txt
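After installation, you can sanity-check that the active environment matches the pins in the commands above. The following is a minimal, stdlib-only Python sketch. It reads installed distribution metadata through importlib.metadata, so it never imports torch and does not check whether CUDA is usable. It assumes the conda packages expose standard Python package metadata under the names torch, torchvision and xformers. check_versions is a hypothetical helper, not part of the repository.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Versions pinned in the Quick Start commands.
EXPECTED = {"torch": "2.4.1", "torchvision": "0.19.1"}


def check_versions(expected):
    """Return {package: (expected, found)} for every mismatch; found is None if absent."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            # Drop a local build tag such as "+cu121" before comparing.
            found = version(pkg).split("+")[0]
        except PackageNotFoundError:
            found = None
        if found != want:
            mismatches[pkg] = (want, found)
    return mismatches


if __name__ == "__main__":
    if sys.version_info[:2] != (3, 10):
        print(f"warning: expected Python 3.10, running {sys.version.split()[0]}")
    problems = check_versions(EXPECTED)
    try:
        version("xformers")  # any version is accepted; only presence is checked
    except PackageNotFoundError:
        problems["xformers"] = ("any", None)
    print(problems or "environment matches the Quick Start pins")
```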

💫 Inference

Download Model Checkpoints

| Model | Resolution | Training Steps |
|---|---|---|
| CamI2V | 512x320 | 50k, 100k |
| CamI2V | 256x256 | 50k |
| CameraCtrl | 256x256 | 50k |
| MotionCtrl | 256x256 | 50k |

We release 256x256 checkpoints (50k training steps) of the DynamiCrafter-based CamI2V, CameraCtrl and MotionCtrl, which are suitable for research purposes and comparison.

We also release 512x320 checkpoints of CamI2V trained for longer, enabling higher-resolution and more advanced camera-controlled video generation.

Download the checkpoints above and place them in the ckpts folder. If you store them elsewhere, edit ckpt_path in configs/models.json accordingly.
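If you keep checkpoints outside ckpts, you can edit the paths by hand or with a small script. The Python sketch below assumes, without checking against the repository, that configs/models.json stores each checkpoint location as a string under a "ckpt_path" key, possibly nested. relocate_ckpt_paths and rewrite_config are hypothetical helpers, not part of the codebase.

```python
import json
from pathlib import Path


def relocate_ckpt_paths(node, old_root, new_root):
    """Return a copy of node with every "ckpt_path" under old_root moved to new_root.

    Hypothetical helper: assumes checkpoint locations are strings stored under
    "ckpt_path" keys at any nesting depth; all other values are left unchanged.
    """
    old_root, new_root = old_root.rstrip("/"), new_root.rstrip("/")
    if isinstance(node, dict):
        out = {}
        for key, value in node.items():
            # Match the root directory exactly, so "ckpts_old/..." is not rewritten.
            if key == "ckpt_path" and isinstance(value, str) and (
                value == old_root or value.startswith(old_root + "/")
            ):
                out[key] = new_root + value[len(old_root):]
            else:
                out[key] = relocate_ckpt_paths(value, old_root, new_root)
        return out
    if isinstance(node, list):
        return [relocate_ckpt_paths(item, old_root, new_root) for item in node]
    return node


def rewrite_config(path, old_root, new_root):
    """Rewrite a JSON config file in place (formatting is normalized to indent=4)."""
    cfg_path = Path(path)
    cfg = json.loads(cfg_path.read_text())
    cfg_path.write_text(json.dumps(relocate_ckpt_paths(cfg, old_root, new_root), indent=4) + "\n")
```

For example, rewrite_config("configs/models.json", "ckpts", "/path/to/your/ckpts") would point every matching entry at the new directory. Because the file is overwritten, keep a backup copy first.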

Download Qwen2-VL Captioner (Optional)

Optional but recommended. The captioner generates a text caption for your custom input image in the Gradio demo, which is then used for video generation. We prefer the AWQ-quantized version of Qwen2-VL for its faster inference and lower GPU memory usage.

Download the pre-trained model and place it in the pretrained_models folder:

─┬─ pretrained_models/
 └─── Qwen2-VL-7B-Instruct-AWQ/
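Before launching the demo with the captioner, you can check that the folder is in place. The Python sketch below assumes it is run from the repository root and that the downloaded model folder contains a config.json, as Hugging Face Transformers model folders normally do. captioner_ready is a hypothetical helper.

```python
from pathlib import Path

# Expected location from the folder tree above (relative to the repository root).
QWEN_DIR = Path("pretrained_models") / "Qwen2-VL-7B-Instruct-AWQ"


def captioner_ready(model_dir=QWEN_DIR):
    """Return True if the captioner folder exists and contains a config.json."""
    model_dir = Path(model_dir)
    return model_dir.is_dir() and (model_dir / "config.json").is_file()


if __name__ == "__main__":
    if captioner_ready():
        print("Qwen2-VL captioner found; --use_qwenvl_captioner can be used")
    else:
        print(f"{QWEN_DIR} is missing or incomplete; launch the demo without --use_qwenvl_captioner")
```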

Run Gradio Demo

python cami2v_gradio_app.py --use_qwenvl_captioner

If Gradio fails to establish a network connection, retry with --use_host_ip.

🤗 Related Repos

RealCam-I2V: https://github.com/ZGCTroy/RealCam-I2V

CameraCtrl: https://github.com/hehao13/CameraCtrl

MotionCtrl: https://github.com/TencentARC/MotionCtrl

DynamiCrafter: https://github.com/Doubiiu/DynamiCrafter

🗒️ Citation

@article{zheng2024cami2v,
  title={CamI2V: Camera-Controlled Image-to-Video Diffusion Model},
  author={Zheng, Guangcong and Li, Teng and Jiang, Rui and Lu, Yehao and Wu, Tao and Li, Xi},
  journal={arXiv preprint arXiv:2410.15957},
  year={2024}
}

@article{li2025realcam,
  title={RealCam-I2V: Real-World Image-to-Video Generation with Interactive Complex Camera Control},
  author={Li, Teng and Zheng, Guangcong and Jiang, Rui and Zhan, Shuigen and Wu, Tao and Lu, Yehao and Lin, Yining and Li, Xi},
  journal={arXiv preprint arXiv:2502.10059},
  year={2025}
}