File size: 7,731 Bytes
e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 c1dc654 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 9ace5f5 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 e787df2 37c5ed4 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 |
---
language:
- en
license: apache-2.0
pipeline_tag: text-to-video
tags:
- video-generation
- thudm
- image-to-video
inference: false
library_name: diffusers
---
# CogVideoX1.5-5B
<p style="text-align: center;">
<div align="center">
<img src=https://github.com/THUDM/CogVideo/raw/main/resources/logo.svg width="50%"/>
</div>
<p align="center">
<a href="https://huggingface.co/THUDM/CogVideoX1.5-5B/blob/main/README_zh.md">π δΈζι
θ―»</a> |
<a href="https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space">π€ Huggingface Space</a> |
<a href="https://github.com/THUDM/CogVideo">π Github </a> |
<a href="https://arxiv.org/pdf/2408.06072">π arxiv </a>
</p>
<p align="center">
π Visit <a href="https://chatglm.cn/video?lang=en?fr=osm_cogvideo">QingYing</a> and <a href="https://open.bigmodel.cn/?utm_campaign=open&_channel_track_key=OWTVNma9">API Platform</a> to experience larger-scale commercial video generation models.
</p>
## Model Introduction
CogVideoX is an open-source video generation model similar to [QingYing](https://chatglm.cn/video?lang=en?fr=osm_cogvideo). The table below displays the list of video generation models we currently offer, along with their foundational information.
<table style="border-collapse: collapse; width: 100%;">
<tr>
<th style="text-align: center;">Model Name</th>
<th style="text-align: center;">CogVideoX1.5-5B (Latest)</th>
<th style="text-align: center;">CogVideoX1.5-5B-I2V (Latest)</th>
<th style="text-align: center;">CogVideoX-2B</th>
<th style="text-align: center;">CogVideoX-5B</th>
<th style="text-align: center;">CogVideoX-5B-I2V</th>
</tr>
<tr>
<td style="text-align: center;">Release Date</td>
<th style="text-align: center;">November 8, 2024</th>
<th style="text-align: center;">November 8, 2024</th>
<th style="text-align: center;">August 6, 2024</th>
<th style="text-align: center;">August 27, 2024</th>
<th style="text-align: center;">September 19, 2024</th>
</tr>
<tr>
<td style="text-align: center;">Video Resolution</td>
<td colspan="1" style="text-align: center;">1360 * 768</td>
<td colspan="1" style="text-align: center;"> Min(W, H) = 768 <br> 768 β€ Max(W, H) β€ 1360 <br> Max(W, H) % 16 = 0 </td>
<td colspan="3" style="text-align: center;">720 * 480</td>
</tr>
<tr>
<td style="text-align: center;">Number of Frames</td>
<td colspan="2" style="text-align: center;">Should be <b>16N + 1</b> where N <= 10 (default 81)</td>
<td colspan="3" style="text-align: center;">Should be <b>8N + 1</b> where N <= 6 (default 49)</td>
</tr>
<tr>
<td style="text-align: center;">Inference Precision</td>
<td colspan="2" style="text-align: center;"><b>BF16 (Recommended)</b>, FP16, FP32, FP8*, INT8, Not supported: INT4</td>
<td style="text-align: center;"><b>FP16*(Recommended)</b>, BF16, FP32, FP8*, INT8, Not supported: INT4</td>
<td colspan="2" style="text-align: center;"><b>BF16 (Recommended)</b>, FP16, FP32, FP8*, INT8, Not supported: INT4</td>
</tr>
<tr>
<td style="text-align: center;">Single GPU Memory Usage<br></td>
<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 76GB <br><b>diffusers BF16: from 10GB*</b><br><b>diffusers INT8(torchao): from 7GB*</b></td>
<td style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> FP16: 18GB <br><b>diffusers FP16: 4GB minimum* </b><br><b>diffusers INT8 (torchao): 3.6GB minimum*</b></td>
<td colspan="2" style="text-align: center;"><a href="https://github.com/THUDM/SwissArmyTransformer">SAT</a> BF16: 26GB <br><b>diffusers BF16 : 5GB minimum* </b><br><b>diffusers INT8 (torchao): 4.4GB minimum* </b></td>
</tr>
<tr>
<td style="text-align: center;">Multi-GPU Memory Usage</td>
<td colspan="2" style="text-align: center;"><b>BF16: 24GB* using diffusers</b><br></td>
<td style="text-align: center;"><b>FP16: 10GB* using diffusers</b><br></td>
<td colspan="2" style="text-align: center;"><b>BF16: 15GB* using diffusers</b><br></td>
</tr>
<tr>
<td style="text-align: center;">Inference Speed<br>(Step = 50, FP/BF16)</td>
<td colspan="2" style="text-align: center;">Single A100: ~1000 seconds (5-second video)<br>Single H100: ~550 seconds (5-second video)</td>
<td style="text-align: center;">Single A100: ~90 seconds<br>Single H100: ~45 seconds</td>
<td colspan="2" style="text-align: center;">Single A100: ~180 seconds<br>Single H100: ~90 seconds</td>
</tr>
<tr>
<td style="text-align: center;">Prompt Language</td>
<td colspan="5" style="text-align: center;">English*</td>
</tr>
<tr>
<td style="text-align: center;">Prompt Token Limit</td>
<td colspan="2" style="text-align: center;">224 Tokens</td>
<td colspan="3" style="text-align: center;">226 Tokens</td>
</tr>
<tr>
<td style="text-align: center;">Video Length</td>
<td colspan="2" style="text-align: center;">5 seconds or 10 seconds</td>
<td colspan="3" style="text-align: center;">6 seconds</td>
</tr>
<tr>
<td style="text-align: center;">Frame Rate</td>
<td colspan="2" style="text-align: center;">16 frames / second </td>
<td colspan="3" style="text-align: center;">8 frames / second </td>
</tr>
<tr>
<td style="text-align: center;">Position Encoding</td>
<td colspan="2" style="text-align: center;">3d_rope_pos_embed</td>
<td style="text-align: center;">3d_sincos_pos_embed</td>
<td style="text-align: center;">3d_rope_pos_embed</td>
<td style="text-align: center;">3d_rope_pos_embed + learnable_pos_embed</td>
</tr>
<tr>
<td style="text-align: center;">Download Link (Diffusers)</td>
<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5B">π€ HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5B">π€ ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5B">π£ WiseModel</a></td>
<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5B-I2V">π€ HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5B-I2V">π€ ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5B-I2V">π£ WiseModel</a></td>
<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-2b">π€ HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-2b">π€ ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-2b">π£ WiseModel</a></td>
<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b">π€ HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b">π€ ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b">π£ WiseModel</a></td>
<td style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX-5b-I2V">π€ HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX-5b-I2V">π€ ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX-5b-I2V">π£ WiseModel</a></td>
</tr>
<tr>
<td style="text-align: center;">Download Link (SAT)</td>
<td colspan="2" style="text-align: center;"><a href="https://huggingface.co/THUDM/CogVideoX1.5-5b-SAT">π€ HuggingFace</a><br><a href="https://modelscope.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">π€ ModelScope</a><br><a href="https://wisemodel.cn/models/ZhipuAI/CogVideoX1.5-5b-SAT">π£ WiseModel</a></td>
<td colspan="3" style="text-align: center;"><a href="./sat/README_zh.md">SAT</a></td>
</tr>
</table>
**(rest of the content remains the same as the original)** |