---
language:
- en
license: apache-2.0
pipeline_tag: text-to-video
tags:
- video-generation
- thudm
- image-to-video
inference: false
library_name: diffusers
---

# CogVideoX1.5-5B

📄 Read in Chinese | 🤗 Hugging Face Space | 🌐 GitHub | 📜 arXiv

📍 Visit QingYing and the API Platform to try larger-scale commercial video generation models.

## Model Introduction

CogVideoX is an open-source video generation model similar to [QingYing](https://chatglm.cn/video?lang=en?fr=osm_cogvideo). The table below lists the video generation models we currently offer, along with their basic specifications.

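| Model Name | CogVideoX1.5-5B (Latest) | CogVideoX1.5-5B-I2V (Latest) | CogVideoX-2B | CogVideoX-5B | CogVideoX-5B-I2V |
|---|---|---|---|---|---|
| Release Date | November 8, 2024 | November 8, 2024 | August 6, 2024 | August 27, 2024 | September 19, 2024 |
| Video Resolution | 1360 × 768 | Min(W, H) = 768; 768 ≤ Max(W, H) ≤ 1360; Max(W, H) % 16 = 0 | 720 × 480 | 720 × 480 | 720 × 480 |
| Number of Frames | Should be 16N + 1, where N ≤ 10 (default 81) | Should be 16N + 1, where N ≤ 10 (default 81) | Should be 8N + 1, where N ≤ 6 (default 49) | Should be 8N + 1, where N ≤ 6 (default 49) | Should be 8N + 1, where N ≤ 6 (default 49) |
| Inference Precision | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | *FP16 (Recommended)*, BF16, FP32, FP8, INT8; not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 | BF16 (Recommended), FP16, FP32, FP8*, INT8; not supported: INT4 |
| Single GPU Memory Usage | SAT BF16: 76 GB; diffusers BF16: from 10 GB*; diffusers INT8 (torchao): from 7 GB* | SAT BF16: 76 GB; diffusers BF16: from 10 GB*; diffusers INT8 (torchao): from 7 GB* | SAT FP16: 18 GB; diffusers FP16: 4 GB minimum*; diffusers INT8 (torchao): 3.6 GB minimum* | SAT BF16: 26 GB; diffusers BF16: 5 GB minimum*; diffusers INT8 (torchao): 4.4 GB minimum* | SAT BF16: 26 GB; diffusers BF16: 5 GB minimum*; diffusers INT8 (torchao): 4.4 GB minimum* |
| Multi-GPU Memory Usage | BF16: 24 GB using diffusers* | BF16: 24 GB using diffusers* | FP16: 10 GB using diffusers* | BF16: 15 GB using diffusers* | BF16: 15 GB using diffusers* |
| Inference Speed (Step = 50, FP/BF16) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) | Single A100: ~1000 seconds (5-second video); Single H100: ~550 seconds (5-second video) | Single A100: ~90 seconds; Single H100: ~45 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds | Single A100: ~180 seconds; Single H100: ~90 seconds |
| Prompt Language | English* | English* | English* | English* | English* |
| Prompt Token Limit | 224 tokens | 224 tokens | 226 tokens | 226 tokens | 226 tokens |
| Video Length | 5 or 10 seconds | 5 or 10 seconds | 6 seconds | 6 seconds | 6 seconds |
| Frame Rate | 16 frames/second | 16 frames/second | 8 frames/second | 8 frames/second | 8 frames/second |
| Position Encoding | `3d_rope_pos_embed` | `3d_rope_pos_embed` | `3d_sincos_pos_embed` | `3d_rope_pos_embed` | `3d_rope_pos_embed` + `learnable_pos_embed` |
| Download Link (Diffusers) | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel |
| Download Link (SAT) | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel | 🤗 HuggingFace / 🤖 ModelScope / 🟣 WiseModel | SAT | SAT | SAT |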
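
The frame-count and frame-rate rows fit together arithmetically: a clip of `stride * N + 1` frames lasts roughly `(num_frames - 1) / fps` seconds, which reproduces the table's 5 and 10 seconds (81 and 161 frames at 16 fps) and 6 seconds (49 frames at 8 fps). The sketch below encodes those rules in plain Python so a frame count can be checked before starting a long generation run. It is a minimal illustration, not part of diffusers or the CogVideoX code: the names `FrameSpec`, `SPECS`, `is_valid_frame_count`, `frames_for_seconds`, and `duration_seconds` are hypothetical, and the lower bound N ≥ 1 is an assumption (the table only states the upper bound).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSpec:
    """Frame rules for one model family, copied from the table (hypothetical type)."""
    stride: int          # frame counts must be stride * N + 1
    max_n: int           # upper bound on N from the table
    fps: int             # native output frame rate
    default_frames: int  # default frame count from the table


# Family keys are illustrative labels, not official identifiers.
SPECS = {
    "cogvideox1.5": FrameSpec(stride=16, max_n=10, fps=16, default_frames=81),
    "cogvideox": FrameSpec(stride=8, max_n=6, fps=8, default_frames=49),
}


def is_valid_frame_count(family: str, num_frames: int) -> bool:
    """True if num_frames == stride * N + 1 with 1 <= N <= max_n (N >= 1 is assumed)."""
    spec = SPECS[family]
    n, remainder = divmod(num_frames - 1, spec.stride)
    return remainder == 0 and 1 <= n <= spec.max_n


def frames_for_seconds(family: str, seconds: int) -> int:
    """Smallest valid frame count that covers `seconds` of video at the native fps."""
    spec = SPECS[family]
    n = -(-seconds * spec.fps // spec.stride)  # ceiling division
    if not 1 <= n <= spec.max_n:
        raise ValueError(f"{seconds} s is outside the supported range for {family}")
    return spec.stride * n + 1


def duration_seconds(family: str, num_frames: int) -> float:
    """Approximate clip length, assuming duration = (num_frames - 1) / fps."""
    return (num_frames - 1) / SPECS[family].fps
```

For example, `frames_for_seconds("cogvideox1.5", 10)` gives 161, the largest count the table allows for the 1.5 models, while 177 frames (N = 11) fails `is_valid_frame_count`.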

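Unlike the text-to-video models, which use a fixed 1360 × 768 or 720 × 480 output, CogVideoX1.5-5B-I2V is described in the table by a constraint: Min(W, H) = 768, 768 ≤ Max(W, H) ≤ 1360, and Max(W, H) % 16 = 0. The sketch below checks that constraint and shows one possible way to map an arbitrary input image size to a compliant one. The function names `is_valid_i2v_size` and `fit_i2v_size` are hypothetical, and the scaling policy (short side to 768, long side rounded down to a multiple of 16 and clamped to [768, 1360]) is an assumption for illustration, not the official preprocessing.

```python
from __future__ import annotations


def is_valid_i2v_size(width: int, height: int) -> bool:
    """True if (width, height) meets the CogVideoX1.5-5B-I2V constraints in the table."""
    short_side, long_side = sorted((width, height))
    return short_side == 768 and 768 <= long_side <= 1360 and long_side % 16 == 0


def fit_i2v_size(width: int, height: int) -> tuple[int, int]:
    """Map an input image size to a valid (width, height), keeping its orientation.

    Assumed policy: scale the short side to 768, floor the scaled long side to a
    multiple of 16, then clamp it to [768, 1360].
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    short_side, long_side = sorted((width, height))
    target_long = long_side * 768 // short_side      # preserve aspect ratio (floored)
    target_long -= target_long % 16                  # round down to a multiple of 16
    target_long = max(768, min(1360, target_long))   # clamp; may change aspect ratio
    # Put the long side back on the axis it came from.
    return (target_long, 768) if width >= height else (768, target_long)
```

A 1920 × 1080 image, for instance, maps to 1360 × 768. When the clamp applies (very wide or very tall images), the aspect ratio is no longer preserved, so the image would also have to be cropped or padded to the returned size; that choice is left open here.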