# DeepSeek-V3-0324-GPTQ-4b-128g-experts

## Model Overview
This model was obtained by quantizing the weights of deepseek-ai/DeepSeek-V3-0324 to the INT4 data type. This optimization reduces the number of bits per parameter from 8 (FP8) to 4 (INT4), cutting disk size and GPU memory requirements by approximately 50%.
Only the non-shared (routed) experts within the transformer blocks are compressed. Weights are quantized with a symmetric per-group scheme (group size 128) using the GPTQ algorithm.
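For illustration, the snippet below is a minimal sketch of what a symmetric per-group grid with group size 128 looks like, using plain round-to-nearest. GPTQ itself additionally uses second-order information to adjust the remaining weights after each rounding step, which is not shown here; the helper names are hypothetical.

```python
import torch

GROUP_SIZE = 128
BITS = 4

def quantize_symmetric_per_group(weight: torch.Tensor,
                                 group_size: int = GROUP_SIZE,
                                 bits: int = BITS):
    """Quantize a 2-D weight so each contiguous group of `group_size`
    values along the input dimension shares one scale (symmetric grid)."""
    out_features, in_features = weight.shape
    assert in_features % group_size == 0
    qmax = 2 ** (bits - 1) - 1  # 7 for INT4; the integer grid is [-8, 7]
    w = weight.reshape(out_features, in_features // group_size, group_size)
    # Per-group scale: the largest magnitude in the group maps to qmax.
    scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    # Round-to-nearest onto the grid (GPTQ would compensate this error).
    q = torch.clamp(torch.round(w / scales), -qmax - 1, qmax)
    return q.reshape(out_features, in_features).to(torch.int8), scales

def dequantize(q: torch.Tensor, scales: torch.Tensor,
               group_size: int = GROUP_SIZE) -> torch.Tensor:
    """Map the INT4 codes back to floating point with the stored scales."""
    out_features, in_features = q.shape
    w = q.reshape(out_features, in_features // group_size, group_size).float()
    return (w * scales).reshape(out_features, in_features)
```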
The model checkpoint is saved in the compressed_tensors format.
| Model | Experts quantized | Attention blocks quantized | Size (GB) |
|---|---|---|---|
| deepseek-ai/DeepSeek-V3-0324 | ❌ | ❌ | 671 |
| ISTA-DASLab/DeepSeek-V3-0324-GPTQ-4b-128g-experts | ✅ | ❌ | 346 |
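## Usage

The checkpoint can be served with inference engines that support the compressed_tensors format, such as vLLM. The snippet below is a minimal sketch, assuming a recent vLLM build and enough aggregate GPU memory for the ~346 GB checkpoint; `tensor_parallel_size` is a placeholder that must match your GPU count.

```python
from vllm import LLM, SamplingParams

# Minimal sketch: model ID from the table above; tensor_parallel_size is an
# assumption and should be set to the number of GPUs on your node.
llm = LLM(
    model="ISTA-DASLab/DeepSeek-V3-0324-GPTQ-4b-128g-experts",
    tensor_parallel_size=8,
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain GPTQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```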
## Contributors
Denis Kuznedelev (Yandex), Eldar Kurtić (Red Hat AI & ISTA), Jiale Chen (ISTA), Michael Goin (Red Hat AI), Elias Frantar (ISTA), Dan Alistarh (Red Hat AI & ISTA).