# Qwen3-32B-AWQ-GEMM-lc
Original Model: https://huggingface.co/Qwen/Qwen3-32B
Created with the latest AutoAWQ. Calibration was done on 128 long-context samples (sequences up to 40,960 tokens) using the code below.
## Quantization quality
Testing pre-/post-quantization with [lm_eval](https://github.com/EleutherAI/lm-evaluation-harness) using this command:

```shell
lm_eval --model local-completions --tasks gsm8k \
  --model_args model=Qwen/<model>,base_url=http://127.0.0.1:11435/v1/completions,max_length=32768 \
  --num_fewshot 5
```

yields the following results:
| Model | Filter | Metric | Value | Stderr |
|---|---|---|---|---|
| original | flexible-extract | exact_match ↑ | 0.6232 | ± 0.0133 |
| original | strict-match | exact_match ↑ | 0.7415 | ± 0.0121 |
| Qwen's AWQ | flexible-extract | exact_match ↑ | failed | |
| Qwen's AWQ | strict-match | exact_match ↑ | failed | |
| w4a16 (sc) | flexible-extract | exact_match ↑ | 0.6490 | ± 0.0131 |
| w4a16 (sc) | strict-match | exact_match ↑ | 0.6672 | ± 0.0130 |
| w4a16 (lc) | flexible-extract | exact_match ↑ | 0.7142 | ± 0.0124 |
| w4a16 (lc) | strict-match | exact_match ↑ | 0.7839 | ± 0.0113 |
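As a rough sanity check (not part of the original evaluation), the strict-match gap between the original model and the long-context quant can be compared against the reported standard errors with a two-sample z-statistic:

```python
import math

# Strict-match exact_match values and standard errors from the table above
orig, orig_se = 0.7415, 0.0121
lc, lc_se = 0.7839, 0.0113

# Difference divided by the combined standard error of the two estimates
z = (lc - orig) / math.sqrt(orig_se**2 + lc_se**2)
print(round(z, 2))  # → 2.56
```

A z of about 2.56 exceeds the usual 1.96 threshold, so the lc quant's strict-match gain over the baseline is unlikely to be pure noise, though both runs score the same GSM8K questions, so treat this as a loose check rather than a proper significance test.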
## Quantization details
```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
from datasets import load_dataset


def load_cosmopedia():
    # Keep only long documents (>= 2048 tokens) for long-context calibration
    data = load_dataset("HuggingFaceTB/cosmopedia-100k", split="train")
    data = data.filter(lambda x: x["text_token_length"] >= 2048)
    return [text for text in data["text"]]


model_path = "Qwen/Qwen3-32B"
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

quant_path = "./Qwen/Qwen3-32B-AWQ-4bit"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model.quantize(
    tokenizer,
    quant_config=quant_config,
    calib_data=load_cosmopedia(),
    n_parallel_calib_samples=1,  # one calibration sample at a time to limit memory use
    max_calib_samples=128,
    max_calib_seq_len=40960,
)

model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
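For reference, a back-of-envelope estimate of the effective bits per weight implied by `w_bit=4` and `q_group_size=128`. The per-group overhead accounting (one fp16 scale and one packed 4-bit zero point per group) is an assumption about the GEMM packing, not something stated on this card:

```python
w_bit = 4          # quantized weight width
group_size = 128   # weights sharing one scale/zero-point pair
scale_bits = 16    # fp16 scale per group
zero_bits = 4      # zero point per group (packed 8-per-int32 in the GEMM kernel)

effective_bits = w_bit + (scale_bits + zero_bits) / group_size
print(round(effective_bits, 3))  # → 4.156
```

Roughly 4.16 bits per weight versus 16 for the bf16 original, i.e. close to a 4x reduction in weight storage, ignoring layers that AWQ leaves unquantized (embeddings, lm_head, norms).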