Devstral-Small-2507-AWQ

Method

Quantised using casper-hansen/AutoAWQ with the following config:

quant_config = { "zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }
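For reference, a minimal sketch of what such a quantisation run typically looks like with AutoAWQ is shown below. The base model id (mistralai/Devstral-Small-2507) and output directory are assumptions, and the calibration data falls back to AutoAWQ's default dataset; the exact setup used for this checkpoint is not documented here.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Assumed base model and output path (not confirmed by this card)
model_path = "mistralai/Devstral-Small-2507"
quant_path = "Devstral-Small-2507-AWQ"

quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the full-precision model and a tokenizer for calibration
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Run AWQ quantisation with the config above (uses AutoAWQ's default calibration set)
model.quantize(tokenizer, quant_config=quant_config)

# Save the quantised weights
model.save_quantized(quant_path)
```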

Inference

The quantised model's config and weights are stored in Hugging Face (hf) and safetensors formats, but the tokeniser remains in Mistral format. Set the inference arguments accordingly, e.g.:

vLLM

vllm serve cpatonn/Devstral-Small-2507-AWQ --tokenizer_mode mistral --config_format hf --load_format safetensors --tool-call-parser mistral --enable-auto-tool-choice
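
Once the server is running, it exposes an OpenAI-compatible API. A minimal client sketch follows, assuming the default port 8000 and the openai Python package; the prompt and sampling settings are illustrative only.

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible endpoint; the API key is a placeholder
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="cpatonn/Devstral-Small-2507-AWQ",
    messages=[
        {"role": "user", "content": "Write a Python function that reverses a string."},
    ],
    temperature=0.15,
)
print(response.choices[0].message.content)
```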