Qwen3-32B-AWQ-GEMM-lc

Original Model: https://huggingface.co/Qwen/Qwen3-32B

Created with the latest AutoAWQ. Calibration was done on long-context data with 128 samples, using the code below.

Quantization quality

Pre- and post-quantization performance was tested with lm_eval (https://github.com/EleutherAI/lm-evaluation-harness) using this command:

lm_eval --model local-completions --tasks gsm8k \
    --model_args model=Qwen/<model>,base_url=http://127.0.0.1:11435/v1/completions,max_length=32768 \
    --num_fewshot 5

yields the following results:

Model       Filter            Metric          Value
original    flexible-extract  exact_match ↑   0.6232 ± 0.0133
original    strict-match      exact_match ↑   0.7415 ± 0.0121
Qwen's AWQ  flexible-extract  exact_match ↑   failed
Qwen's AWQ  strict-match      exact_match ↑   failed
w4a16(sc)   flexible-extract  exact_match ↑   0.6490 ± 0.0131
w4a16(sc)   strict-match      exact_match ↑   0.6672 ± 0.0130
w4a16(lc)   flexible-extract  exact_match ↑   0.7142 ± 0.0124
w4a16(lc)   strict-match      exact_match ↑   0.7839 ± 0.0113

(sc/lc: short- vs. long-context calibration; this model is the lc variant.)
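The base_url in the lm_eval command assumes an OpenAI-compatible completions server is already listening on port 11435. As a hedged example, one way to provide that endpoint is vLLM (the exact flags may differ across versions):

```shell
# Sketch: serve the model behind an OpenAI-compatible API on the port
# used by the lm_eval command above. Substitute the model path being tested.
vllm serve Qwen/Qwen3-32B --port 11435 --max-model-len 32768
```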

Quantization details

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from datasets import load_dataset

def load_cosmopedia():
    # Long-context calibration data: keep only samples with >= 2048 tokens.
    data = load_dataset("HuggingFaceTB/cosmopedia-100k", split="train")
    data = data.filter(lambda x: x["text_token_length"] >= 2048)
    return list(data["text"])

model_path = "Qwen/Qwen3-32B"
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

quant_path = "./Qwen/Qwen3-32B-AWQ-4bit"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=load_cosmopedia(),
    n_parallel_calib_samples=1,   # process one calibration sample at a time to limit memory use
    max_calib_samples=128,
    max_calib_seq_len=40960,      # long-context calibration
)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
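For intuition, the quant_config above ("zero_point": True, "q_group_size": 128, "w_bit": 4) corresponds to asymmetric 4-bit quantization applied per group of 128 weights. The following NumPy sketch illustrates that scheme on a single group; it is an illustration only, not AutoAWQ's actual implementation (which also applies activation-aware scaling before quantizing):

```python
import numpy as np

def quantize_group(w, n_bit=4):
    # Asymmetric (zero-point) quantization of one weight group:
    # map [w.min(), w.max()] onto the integer range [0, 2**n_bit - 1].
    qmax = 2 ** n_bit - 1
    scale = (w.max() - w.min()) / qmax
    zero = np.round(-w.min() / scale)
    q = np.clip(np.round(w / scale) + zero, 0, qmax)
    return q.astype(np.uint8), scale, zero

def dequantize_group(q, scale, zero):
    # Recover approximate float weights from the stored integers.
    return (q.astype(np.float32) - zero) * scale

rng = np.random.default_rng(0)
w = rng.normal(size=128).astype(np.float32)   # one group of 128 weights
q, scale, zero = quantize_group(w)
w_hat = dequantize_group(q, scale, zero)
err = float(np.abs(w - w_hat).max())          # worst-case reconstruction error
```

The zero point lets the integer grid cover an asymmetric weight range, which is why "zero_point": True generally reconstructs weights more accurately than symmetric quantization at the same bit width.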