Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g

Model Overview

This model was obtained by quantizing the weights of Mistral-Small-3.1-24B-Instruct-2503 to the INT4 data type. This optimization reduces the number of bits per parameter from 16 to 4, cutting disk size and GPU memory requirements by approximately 75%.

Only the weights of the linear operators inside the language_model transformer blocks are quantized; the vision model and the multimodal projector are kept in their original precision. Weights are quantized with the GPTQ algorithm, using a symmetric per-group scheme with a group size of 128.
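
GPTQ itself adjusts the not-yet-quantized weights with approximate second-order information as each column is rounded, so it is not captured in a few lines; the minimal sketch below shows only the round-to-nearest baseline of the symmetric per-group storage scheme described above (4-bit, group size 128). The function name is illustrative, not part of any library.

    import torch

    def quantize_symmetric_per_group(weight: torch.Tensor, group_size: int = 128, bits: int = 4):
        # Split each row of the weight matrix into contiguous groups of `group_size`.
        out_features, in_features = weight.shape
        assert in_features % group_size == 0
        groups = weight.reshape(out_features, in_features // group_size, group_size)
        # Symmetric scheme: the zero-point is fixed at 0, so only one scale is stored per group.
        q_max = 2 ** (bits - 1) - 1  # 7 for INT4
        scales = (groups.abs().amax(dim=-1, keepdim=True) / q_max).clamp(min=1e-8)
        q = torch.clamp(torch.round(groups / scales), min=-q_max - 1, max=q_max)
        # Dequantized view: the values the INT4 checkpoint actually represents.
        dequant = (q * scales).reshape(out_features, in_features)
        return q.to(torch.int8), scales.squeeze(-1), dequant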

The model checkpoint is saved in the compressed_tensors format.
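
The quantization parameters live in the quantization_config section of the checkpoint's config.json. One way to inspect them is sketched below (the exact field names depend on the compressed_tensors version used to export the model):

    import json
    from huggingface_hub import hf_hub_download

    # Download only config.json and print the embedded compressed-tensors config.
    cfg_path = hf_hub_download(
        "ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g", "config.json"
    )
    with open(cfg_path) as f:
        cfg = json.load(f)
    # Expect 4-bit symmetric group quantization (group_size 128) targeting the
    # Linear layers, per the description above.
    print(json.dumps(cfg.get("quantization_config", {}), indent=2))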

Usage

  • To use the model in transformers, update the package to the stable Mistral-3 release:

    pip install git+https://github.com/huggingface/transformers@v4.49.0-Mistral-3

  • To use the model in vLLM, update the package to version vllm>=0.8.0. Minimal loading sketches for both backends follow this list.
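
A minimal text-only loading sketch for transformers, assuming Mistral-3 support is exposed via Mistral3ForConditionalGeneration as in recent releases (the prompt is illustrative):

    import torch
    from transformers import AutoTokenizer, Mistral3ForConditionalGeneration

    model_id = "ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g"

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = Mistral3ForConditionalGeneration.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # non-quantized parts (vision tower, projector) load in BF16
        device_map="auto",
    )

    messages = [{"role": "user", "content": "Give a one-sentence summary of GPTQ."}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    output_ids = model.generate(input_ids, max_new_tokens=128)
    print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))

And an equivalent sketch for vLLM, which loads the compressed_tensors checkpoint directly (defaults only; the sampling parameters are illustrative):

    from vllm import LLM, SamplingParams

    llm = LLM(model="ISTA-DASLab/Mistral-Small-3.1-24B-Instruct-2503-GPTQ-4b-128g")
    outputs = llm.chat(
        [{"role": "user", "content": "What does 4-bit weight quantization trade off?"}],
        SamplingParams(max_tokens=128),
    )
    print(outputs[0].outputs[0].text)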
