DeepSeek-V3-0324-GPTQ-4b-128g-experts

Model Overview

This model was obtained by quantizing the weights of deepseek-ai/DeepSeek-V3-0324 to the INT4 data type. This optimization reduces the number of bits per parameter from 8 to 4, cutting disk size and GPU memory requirements by approximately 50%.

Only the non-shared (routed) experts within the transformer blocks are compressed; attention blocks and the shared experts are kept at their original precision. Weights are quantized using the GPTQ algorithm with a symmetric per-group scheme and a group size of 128.
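For illustration, the sketch below applies a symmetric per-group scheme to a single weight matrix using plain round-to-nearest. GPTQ itself additionally compensates rounding error with second-order (Hessian) information, so treat this as a simplified sketch of the storage scheme rather than the actual quantizer; all names in it are hypothetical.

```python
import torch

def quantize_symmetric_per_group(weight: torch.Tensor, group_size: int = 128, bits: int = 4):
    """Round-to-nearest sketch of symmetric per-group INT4 quantization."""
    qmax = 2 ** (bits - 1) - 1                     # 7 for INT4
    rows, cols = weight.shape                      # cols must be divisible by group_size
    w = weight.reshape(rows, cols // group_size, group_size)
    # One scale per group of `group_size` input channels; no zero-point (symmetric).
    scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scales), -qmax - 1, qmax).to(torch.int8)
    # At inference time the kernel reconstructs an approximation of the weight:
    dequant = (q.float() * scales).reshape(rows, cols)
    return q.reshape(rows, cols), scales.squeeze(-1), dequant
```

Roughly speaking, the packed INT4 values together with one floating-point scale per 128-weight group are what the checkpoint stores for each quantized expert matrix.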

The model checkpoint is saved in the compressed_tensors format.
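Checkpoints in this format can be loaded by inference engines with compressed-tensors support, such as vLLM. The sketch below is illustrative and untested against this exact checkpoint; in particular, the tensor_parallel_size value is an assumption, and the 346 GB checkpoint requires a correspondingly large multi-GPU setup.

```python
from vllm import LLM, SamplingParams

# Illustrative configuration: set tensor_parallel_size to match your
# hardware; the 346 GB checkpoint does not fit on a single GPU.
llm = LLM(
    model="ISTA-DASLab/DeepSeek-V3-0324-GPTQ-4b-128g-experts",
    tensor_parallel_size=8,
    trust_remote_code=True,
)

params = SamplingParams(temperature=0.6, max_tokens=128)
outputs = llm.generate(["Explain GPTQ quantization in one paragraph."], params)
print(outputs[0].outputs[0].text)
```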

| Model | Experts Quantized | Attention Blocks Quantized | Size (GB) |
|---|---|---|---|
| deepseek-ai/DeepSeek-V3-0324 | No | No | 671 |
| ISTA-DASLab/DeepSeek-V3-0324-GPTQ-4b-128g-experts | Yes | No | 346 |

Contributors

Denis Kuznedelev (Yandex), Eldar Kurtić (Red Hat AI & ISTA), Jiale Chen (ISTA), Michael Goin (Red Hat AI), Elias Frantar (ISTA), Dan Alistarh (Red Hat AI & ISTA).
