Quantization Dataset & How-to

#1
by mwebr - opened

Hello, thank you for your work. I find your quantized model much more robust than the one from Qwen directly in some specific tasks (e.g. RAG). Could you share some info on how you quantized the model, i.e. which dataset you used and how you processed it (parameters, maybe a script if possible)? Best :-)

Owner

Are you asking about Benasd/Qwen2.5-VL-72B-Instruct-AWQ (without the trailing -fix) instead of Benasd/Qwen2.5-VL-72B-Instruct-AWQ-fix?

Owner

Benasd/Qwen2.5-VL-72B-Instruct-AWQ-fix uses their official AWQ weights, so the model itself is exactly the same. It was only a temporary fix for their official preprocessor_config.json, and it is no longer needed because they have since fixed the official model.

Thank you, got it. And what about Benasd/Qwen2.5-VL-72B-Instruct-AWQ? With their official AWQ weights I observed some character issues when generating e.g. German text, and the Benasd/Qwen2.5-VL-72B-Instruct-AWQ model worked better at instruction following too. That is why I was curious :-)

Owner

Sorry for the late reply.
I used the simplest quantization script for Benasd/Qwen2.5-VL-72B-Instruct-AWQ, which relies on AutoAWQ's default calibration dataset, mit-han-lab/pile-val-backup (a text-only dataset).
Below is the script I used:

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "Qwen/Qwen2.5-VL-72B-Instruct"
quant_path = "Qwen2.5-VL-72B-Instruct-AWQ"
quant_config = { "zero_point": True, "q_group_size": 64, "w_bit": 4, "version": "GEMM" }

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize (no calib_data is passed, so AutoAWQ falls back to its default
# calibration set, "pileval", i.e. mit-han-lab/pile-val-backup text samples)
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

print(f'Model is quantized and saved at "{quant_path}"')
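
If you want to sanity-check the result, the saved folder can be loaded like any other AWQ checkpoint. This is just a minimal sketch: it assumes autoawq is installed along with a transformers version that supports Qwen2.5-VL, and note that the script above only saves the tokenizer, so the processor still has to come from the original repo.

from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

quant_path = "Qwen2.5-VL-72B-Instruct-AWQ"

# Load the quantized weights; transformers picks up the AWQ quantization
# config written by save_quantized
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    quant_path, torch_dtype="auto", device_map="auto"
)

# The image processor was not saved by the script above, so take it from the base model
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-72B-Instruct")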

I think the official AWQ model uses the script recommended by this contributor, which is supposed to be better than using mit-han-lab/pile-val-backup, since it uses a vision-and-text dataset as the calibration set. However, in both my experiment and yours, it actually performs worse, and I don't have a theory as to why.
Here is the script that uses the image + text dataset (make sure to set Qwen2_5_VLProcessor.from_pretrained(model_path, padding_side='left')): https://github.com/casper-hansen/AutoAWQ/blob/main/docs/examples.md#custom-quantizer-qwen2-vl-example
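
For completeness, here is a rough sketch of how the calibration inputs for that vision-and-text approach are prepared. It is only an illustration: the image path and the prompt are placeholders, and the custom quantizer class that actually consumes such batches is the one from the linked AutoAWQ example; the snippet just shows the processor setup (including padding_side='left') and how one multimodal calibration sample is batched.

from qwen_vl_utils import process_vision_info
from transformers import Qwen2_5_VLProcessor

model_path = "Qwen/Qwen2.5-VL-72B-Instruct"

# The processor has to be created with left padding, as mentioned above
processor = Qwen2_5_VLProcessor.from_pretrained(model_path, padding_side='left')

# One hypothetical calibration sample (image + short instruction);
# a real run builds a list of these from a vision-and-text dataset
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "/path/to/calibration_image.jpg"},  # placeholder path
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

# Render the chat template and pack text + image tensors into one batch,
# which the custom quantizer from the linked example then feeds to the model
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
batch = processor(
    text=[text],
    images=image_inputs,
    videos=video_inputs,
    padding=True,
    return_tensors="pt",
)
print(batch.keys())  # input_ids, attention_mask, pixel_values, image_grid_thw, ...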
