ContentV: Efficient Training of Video Generation Models with Limited Compute

This project presents ContentV, a novel framework that accelerates DiT-based video generation through three key innovations:

  • A minimalist model design that enables effective reuse of pre-trained image generation models for video synthesis
  • A comprehensive exploration of a multi-stage, efficient training strategy based on Flow Matching
  • A low-cost Reinforcement Learning with Human Feedback (RLHF) approach that further enhances generation quality without the need for additional human annotations
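The Flow Matching objective behind the training strategy above can be sketched in a few lines. This is a minimal, generic illustration with NumPy (function and variable names are ours, not ContentV's actual training code): sample a point on the straight-line path between noise and data, and regress the model's predicted velocity onto the constant displacement.

```python
import numpy as np

def flow_matching_loss(model, x0, x1, t):
    # Linear interpolation path between noise x0 and data x1 at time t in [0, 1].
    x_t = (1 - t) * x0 + t * x1
    # For the linear path, the target velocity is the constant displacement x1 - x0.
    v_target = x1 - x0
    # The model predicts a velocity from the noisy sample and the timestep.
    v_pred = model(x_t, t)
    # Mean squared error between predicted and target velocity.
    return np.mean((v_pred - v_target) ** 2)
```

In practice `model` would be the DiT backbone and `x0`, `x1` latent video tensors; the same objective applies unchanged across the multi-stage schedule.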

Quickstart

Recommended PyTorch Version

  • GPU: torch >= 2.3.1 (CUDA >= 12.2)
  • NPU: torch and torch-npu >= 2.1.0 (CANN >= 8.0.RC2). Please refer to Ascend Extension for PyTorch for the installation of torch-npu.

Installation

git clone https://github.com/bytedance/ContentV.git
pip3 install -r ContentV/requirements.txt

T2V Generation

cd ContentV
## For GPU
python3 demo.py
## For NPU
USE_ASCEND_NPU=1 python3 demo.py
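The `USE_ASCEND_NPU=1` switch suggests the script selects its backend from the environment. A minimal sketch of that pattern (illustrative only; the actual logic in `demo.py` may differ):

```python
import os

def pick_device() -> str:
    # USE_ASCEND_NPU=1 routes execution to Ascend NPUs (via torch-npu);
    # otherwise fall back to CUDA GPUs. Device strings are illustrative.
    if os.environ.get("USE_ASCEND_NPU") == "1":
        return "npu"
    return "cuda"
```

Setting the variable on the command line, as shown above, is enough to flip the branch for a single run without editing the script.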

Todo List

  • Inference code and checkpoints
  • Training code of RLHF

License

This code repository and part of the model weights are licensed under the Apache 2.0 License.

Acknowledgement
