---
language:
- en
license: apache-2.0
pipeline_tag: text-to-video
tags:
- video-generation
- thudm
- image-to-video
inference: false
library_name: diffusers
---

# CogVideoX1.5-5B

📄 Read in Chinese | 🤗 Hugging Face Space | 🌐 GitHub | 📜 arXiv

πŸ“ Visit QingYing and API Platform to experience larger-scale commercial video generation models.

## Model Introduction

CogVideoX is an open-source video generation model similar to [QingYing](https://chatglm.cn/video?lang=en?fr=osm_cogvideo). The table below lists the video generation models we currently offer, along with their key specifications.
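
| Model Name | CogVideoX1.5-5B (Latest) | CogVideoX1.5-5B-I2V (Latest) | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V |
|---|---|---|---|---|---|
| Release Date | November 8, 2024 | November 8, 2024 | August 6, 2024 | August 27, 2024 | September 19, 2024 |
| Video Resolution | 1360 * 768 | Min(W, H) = 768<br>768 ≤ Max(W, H) ≤ 1360<br>Max(W, H) % 16 = 0 | 720 * 480 | 720 * 480 | 720 * 480 |
| Number of Frames | Should be 16N + 1, where N ≤ 10 (default 81) | Should be 16N + 1, where N ≤ 10 (default 81) | Should be 8N + 1, where N ≤ 6 (default 49) | Should be 8N + 1, where N ≤ 6 (default 49) | Should be 8N + 1, where N ≤ 6 (default 49) |
| Inference Precision | BF16 (Recommended), FP16, FP32, FP8\*, INT8<br>Not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8\*, INT8<br>Not supported: INT4 | FP16\* (Recommended), BF16, FP32, FP8\*, INT8<br>Not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8\*, INT8<br>Not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8\*, INT8<br>Not supported: INT4 |
| Single-GPU Memory Usage | SAT BF16: 76GB<br>diffusers BF16: from 10GB\*<br>diffusers INT8 (torchao): from 7GB\* | SAT BF16: 76GB<br>diffusers BF16: from 10GB\*<br>diffusers INT8 (torchao): from 7GB\* | SAT FP16: 18GB<br>diffusers FP16: 4GB minimum\*<br>diffusers INT8 (torchao): 3.6GB minimum\* | SAT BF16: 26GB<br>diffusers BF16: 5GB minimum\*<br>diffusers INT8 (torchao): 4.4GB minimum\* | SAT BF16: 26GB<br>diffusers BF16: 5GB minimum\*<br>diffusers INT8 (torchao): 4.4GB minimum\* |
| Multi-GPU Memory Usage | BF16: 24GB\* using diffusers | BF16: 24GB\* using diffusers | FP16: 10GB\* using diffusers | BF16: 15GB\* using diffusers | BF16: 15GB\* using diffusers |
| Inference Speed<br>(Step = 50, FP/BF16) | Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video) | Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video) | Single A100: ~90 seconds<br>Single H100: ~45 seconds | Single A100: ~180 seconds<br>Single H100: ~90 seconds | Single A100: ~180 seconds<br>Single H100: ~90 seconds |
| Prompt Language | English\* | English\* | English\* | English\* | English\* |
| Prompt Token Limit | 224 tokens | 224 tokens | 226 tokens | 226 tokens | 226 tokens |
| Video Length | 5 or 10 seconds | 5 or 10 seconds | 6 seconds | 6 seconds | 6 seconds |
| Frame Rate | 16 frames/second | 16 frames/second | 8 frames/second | 8 frames/second | 8 frames/second |
| Position Encoding | `3d_rope_pos_embed` | `3d_rope_pos_embed` | `3d_sincos_pos_embed` | `3d_rope_pos_embed` | `3d_rope_pos_embed` + `learnable_pos_embed` |
| Download Link (Diffusers) | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel |
| Download Link (SAT) | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | 🤗 HuggingFace<br>🤖 ModelScope<br>🟣 WiseModel | SAT | SAT | SAT |
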
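
The frame-count, frame-rate, and image-to-video resolution rules in the table are easy to get wrong when scripting generation runs. The sketch below encodes them as plain Python checks. It is illustrative only and not part of CogVideoX or diffusers: `FrameRule`, `FRAME_RULES`, `valid_frame_count`, `duration_seconds`, and `valid_i2v_resolution` are hypothetical names. It assumes N ≥ 1, because the table gives only the upper bound on N. It also assumes clip length is (frames - 1) / fps, which reproduces the table's 5-, 10-, and 6-second figures (81, 161, and 49 frames).

```python
from dataclasses import dataclass

# Hypothetical helpers (not part of CogVideoX or diffusers) that encode the
# frame-count, frame-rate, and I2V resolution rules from the model table.


@dataclass(frozen=True)
class FrameRule:
    step: int   # frame count must equal step * N + 1
    max_n: int  # upper bound on N stated in the table
    fps: int    # output frame rate stated in the table


FRAME_RULES = {
    "CogVideoX1.5-5B": FrameRule(step=16, max_n=10, fps=16),
    "CogVideoX1.5-5B-I2V": FrameRule(step=16, max_n=10, fps=16),
    "CogVideoX-2B": FrameRule(step=8, max_n=6, fps=8),
    "CogVideoX-5B": FrameRule(step=8, max_n=6, fps=8),
    "CogVideoX-5B-I2V": FrameRule(step=8, max_n=6, fps=8),
}


def valid_frame_count(model: str, num_frames: int) -> bool:
    """Return True if num_frames == step * N + 1 for an integer 1 <= N <= max_n.

    The lower bound N >= 1 is an assumption; the table only gives the upper bound.
    """
    rule = FRAME_RULES[model]
    n, remainder = divmod(num_frames - 1, rule.step)
    return remainder == 0 and 1 <= n <= rule.max_n


def duration_seconds(model: str, num_frames: int) -> float:
    """Clip length, assuming (num_frames - 1) frame intervals at the model's fps."""
    return (num_frames - 1) / FRAME_RULES[model].fps


def valid_i2v_resolution(width: int, height: int) -> bool:
    """Check the CogVideoX1.5-5B-I2V rule: Min(W, H) = 768,
    768 <= Max(W, H) <= 1360, and Max(W, H) % 16 = 0."""
    short_side, long_side = min(width, height), max(width, height)
    return short_side == 768 and 768 <= long_side <= 1360 and long_side % 16 == 0


if __name__ == "__main__":
    print(valid_frame_count("CogVideoX1.5-5B", 81))   # True (N = 5, the default)
    print(valid_frame_count("CogVideoX1.5-5B", 177))  # False (N = 11 exceeds 10)
    print(duration_seconds("CogVideoX1.5-5B", 161))   # 10.0
    print(duration_seconds("CogVideoX-5B", 49))       # 6.0
    print(valid_i2v_resolution(1280, 720))            # False (short side is not 768)
```

Rules are keyed by model name rather than applied globally because some counts satisfy both families: 49 frames is 8 × 6 + 1 and also 16 × 3 + 1, so it only means "6 seconds" for the 8 fps models.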