language:
  - en
license: apache-2.0
pipeline_tag: text-to-video
tags:
  - video-generation
  - thudm
  - image-to-video
inference: false
library_name: diffusers

CogVideoX1.5-5B

📄 Read in Chinese | 🤗 Huggingface Space | 🌐 Github | 📜 arxiv

πŸ“ Visit QingYing and API Platform to experience larger-scale commercial video generation models.

Model Introduction

CogVideoX is an open-source video generation model similar to QingYing. The table below lists the video generation models we currently offer, along with their basic specifications.

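| Model Name | CogVideoX1.5-5B (Latest) | CogVideoX1.5-5B-I2V (Latest) | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V |
|---|---|---|---|---|---|
| Release Date | November 8, 2024 | November 8, 2024 | August 6, 2024 | August 27, 2024 | September 19, 2024 |
| Video Resolution | 1360 * 768 | Min(W, H) = 768; 768 ≤ Max(W, H) ≤ 1360; Max(W, H) % 16 = 0 | 720 * 480 | 720 * 480 | 720 * 480 |
| Number of Frames | Should be 16N + 1 where N <= 10 (default 81) | Should be 16N + 1 where N <= 10 (default 81) | Should be 8N + 1 where N <= 6 (default 49) | Should be 8N + 1 where N <= 6 (default 49) | Should be 8N + 1 where N <= 6 (default 49) |
| Inference Precision | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | FP16* (Recommended), BF16, FP32, FP8*, INT8; not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 |
| Single GPU Memory Usage | SAT BF16: 76GB; diffusers BF16: from 10GB*; diffusers INT8 (torchao): from 7GB* | SAT BF16: 76GB; diffusers BF16: from 10GB*; diffusers INT8 (torchao): from 7GB* | SAT FP16: 18GB; diffusers FP16: 4GB minimum*; diffusers INT8 (torchao): 3.6GB minimum* | SAT BF16: 26GB; diffusers BF16: 5GB minimum*; diffusers INT8 (torchao): 4.4GB minimum* | SAT BF16: 26GB; diffusers BF16: 5GB minimum*; diffusers INT8 (torchao): 4.4GB minimum* |
| Multi-GPU Memory Usage | BF16: 24GB* using diffusers | BF16: 24GB* using diffusers | FP16: 10GB* using diffusers | BF16: 15GB* using diffusers | BF16: 15GB* using diffusers |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) | Single A100: ~90 seconds; Single H100: ~45 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds |
| Prompt Language | English* | English* | English* | English* | English* |
| Prompt Token Limit | 224 Tokens | 224 Tokens | 226 Tokens | 226 Tokens | 226 Tokens |
| Video Length | 5 seconds or 10 seconds | 5 seconds or 10 seconds | 6 seconds | 6 seconds | 6 seconds |
| Frame Rate | 16 frames / second | 16 frames / second | 8 frames / second | 8 frames / second | 8 frames / second |
| Position Encoding | 3d_rope_pos_embed | 3d_rope_pos_embed | 3d_sincos_pos_embed | 3d_rope_pos_embed | 3d_rope_pos_embed + learnable_pos_embed |
| Download Link (Diffusers) | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel |
| Download Link (SAT) | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel | 🤗 HuggingFace, 🤖 ModelScope, 🟣 WiseModel | SAT | SAT | SAT |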
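
The frame-count and resolution rules in the table above are easy to get wrong when scripting generation jobs. The following is a minimal sketch of how they could be checked before launching inference. It is not part of the CogVideoX codebase: `VideoSpec`, the two constants, and the helper functions are hypothetical names, and the lower bound `N >= 1` is an assumption, since the table only states the upper bound on N.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoSpec:
    """Limits copied from the table above; the class itself is illustrative."""
    frame_step: int      # frame count must equal frame_step * N + 1
    max_n: int           # largest N listed in the table
    default_frames: int  # default frame count listed in the table
    fps: int             # frame rate listed in the table


# Hypothetical constant names; the values are the ones listed in the table.
COGVIDEOX_1_5 = VideoSpec(frame_step=16, max_n=10, default_frames=81, fps=16)
COGVIDEOX_1_0 = VideoSpec(frame_step=8, max_n=6, default_frames=49, fps=8)


def is_valid_frame_count(num_frames: int, spec: VideoSpec) -> bool:
    """True if num_frames == frame_step * N + 1 for some 1 <= N <= max_n.

    The lower bound N >= 1 is an assumption; the table only gives N <= max_n.
    """
    n, remainder = divmod(num_frames - 1, spec.frame_step)
    return remainder == 0 and 1 <= n <= spec.max_n


def is_valid_i2v_resolution(width: int, height: int) -> bool:
    """Check the CogVideoX1.5-5B-I2V rules: Min(W, H) = 768,
    768 <= Max(W, H) <= 1360, and Max(W, H) % 16 = 0."""
    short_side, long_side = min(width, height), max(width, height)
    return short_side == 768 and 768 <= long_side <= 1360 and long_side % 16 == 0


def duration_seconds(num_frames: int, spec: VideoSpec) -> float:
    """Clip length implied by the frame rate, not counting the extra '+1' frame."""
    return (num_frames - 1) / spec.fps
```

With these values, the default of 81 frames at 16 frames / second corresponds to the 5-second length in the table, and the older models' default of 49 frames at 8 frames / second corresponds to 6 seconds.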

(rest of the content remains the same as the original)