# Pusa VidGen

[Code Repository](https://github.com/Yaofang-Liu/Pusa-VidGen) | [Model Hub](https://huggingface.co/RaphaelLiu/Pusa-V0.5) | [Training Toolkit](https://github.com/Yaofang-Liu/Mochi-Full-Finetuner) | [Dataset](https://huggingface.co/datasets/RaphaelLiu/PusaV0.5_Training) | [Paper](https://arxiv.org/abs/2410.03160) | [Follow on X](https://x.com/stephenajason) | [Xiaohongshu](https://www.xiaohongshu.com/explore/67f898dc000000001c008339?source=webshare&xhsshare=pc_web&xsec_token=ABAhG8mltqyMxL9kI0eRxwj7EwiW7MFYH2oPl4n8ww0OM=&xsec_source=pc_share)

## Overview

Pusa introduces a paradigm shift in video diffusion modeling through frame-level noise control, departing from the conventional single-timestep approach. This shift was first presented in our [FVDM](https://arxiv.org/abs/2410.03160) paper. Leveraging this architecture, Pusa seamlessly supports diverse video generation tasks (Text/Image/Video-to-Video) while maintaining strong motion fidelity and prompt adherence through our refined base model adaptations. Pusa-V0.5 is an early preview built on [Mochi1-Preview](https://huggingface.co/genmo/mochi-1-preview). We are open-sourcing this work to foster community collaboration, enhance methodologies, and expand capabilities.

## ✨ Key Features

- **Comprehensive Multi-task Support**:
  - Text-to-Video generation
  - Image-to-Video transformation
  - Frame interpolation
  - Video transitions
  - Seamless looping
  - Extended video generation
  - And more...
- **Unprecedented Efficiency**:
  - Trained with only 0.1k H800 GPU hours
  - Total training cost: $0.1k
  - Hardware: 16 H800 GPUs
  - Configuration: batch size 32, 500 training iterations, 1e-5 learning rate
  - *Note: Efficiency can be further improved with single-node training and advanced parallelism techniques. Collaborations welcome!*
- **Complete Open-Source Release**:
  - Full codebase
  - Detailed architecture specifications
  - Comprehensive training methodology

## 🔍 Unique Architecture

- **Novel Diffusion Paradigm**: Implements frame-level noise control with vectorized timesteps, originally introduced in the [FVDM paper](https://arxiv.org/abs/2410.03160), enabling unprecedented flexibility and scalability.
- **Non-destructive Modification**: Our adaptations to the base model preserve its original Text-to-Video generation capabilities, so only slight fine-tuning is needed afterward.
- **Universal Applicability**: The methodology can be readily applied to other leading video diffusion models, including Hunyuan Video, Wan2.1, and others. *Collaborations enthusiastically welcomed!*

## Installation and Usage

### Download Weights

**Option 1**: Use the Hugging Face CLI:

```bash
pip install huggingface_hub
huggingface-cli download RaphaelLiu/Pusa-V0.5 --local-dir <path_to_local_dir>
```

**Option 2**: Download directly from [Hugging Face](https://huggingface.co/RaphaelLiu/Pusa-V0.5) to your local machine.
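Alternatively, the weights can be fetched from Python. Below is a minimal sketch using `huggingface_hub.snapshot_download`; the `./Pusa-V0.5` target directory is just a placeholder path, not one prescribed by this repository:

```python
from huggingface_hub import snapshot_download

# Download the Pusa-V0.5 checkpoint repository into a local directory.
# "./Pusa-V0.5" is a placeholder; point it wherever you keep model weights.
snapshot_download(
    repo_id="RaphaelLiu/Pusa-V0.5",
    local_dir="./Pusa-V0.5",
)
```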
- [Mochi](https://huggingface.co/genmo/mochi-1-preview): Our foundation model, recognized as a leading open-source video generation system on the Artificial Analysis Leaderboard. ## Citation If you find our work useful in your research, please consider citing: ``` @misc{Liu2025pusa,   title={Pusa: Thousands Timesteps Video Diffusion Model},   author={Yaofang Liu and Rui Liu},   year={2025},   url={https://github.com/Yaofang-Liu/Pusa-VidGen}, } ``` ``` @article{liu2024redefining,   title={Redefining Temporal Modeling in Video Diffusion: The Vectorized Timestep Approach},   author={Liu, Yaofang and Ren, Yumeng and Cun, Xiaodong and Artola, Aitor and Liu, Yang and Zeng, Tieyong and Chan, Raymond H and Morel, Jean-michel},   journal={arXiv preprint arXiv:2410.03160},   year={2024} } ```