This model was created with the following code:

```python
import os

from datasets import load_dataset
from gptqmodel import GPTQModel, QuantizeConfig
from huggingface_hub import constants

model_id = "Qwen/Qwen3-32B"
# Save the quantized model in the HF cache directory
cache_dir = constants.HF_HUB_CACHE
quant_path = os.path.join(cache_dir, "models--quantized--" + model_id.replace("/", "--"))
os.makedirs(quant_path, exist_ok=True)

# Load calibration data (1024 samples from C4)
calibration_dataset = load_dataset(
    "allenai/c4",
    data_files="en/c4-train.00001-of-01024.json.gz",
    split="train",
).select(range(1024))["text"]

# Configure and run quantization
quant_config = QuantizeConfig(bits=4, group_size=128)
model = GPTQModel.load(model_id, quant_config)
model.quantize(calibration_dataset, batch_size=2)
model.save(quant_path)
```
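
To run the quantized checkpoint, it can be loaded back through the same library. The snippet below is a minimal sketch following GPTQModel's loading API; the prompt string is purely illustrative, and quant_path is the directory defined in the script above.

```python
from gptqmodel import GPTQModel

# Load the 4-bit checkpoint produced by the quantization script above.
model = GPTQModel.load(quant_path)

# Quick sanity check: generate a short completion with the quantized weights.
result = model.generate("The capital of France is")[0]
print(model.tokenizer.decode(result))
```

GPTQModel.load accepts either a local path or a Hub repo id, so the same call works with this repository's id once the weights are downloaded.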
Safetensors checkpoint: 5.74B params (tensor types: I32, BF16, FP16).
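
The I32 entries are GPTQ's packed weight tensors (eight 4-bit values per int32), which is why the reported parameter count sits well below the base model's 32B. As a hedged sketch, assuming the checkpoint layout written by the script above, the dtype mix can be inspected with the safetensors library:

```python
import os

from safetensors import safe_open

# Count tensors per dtype in the first shard of the quantized checkpoint.
# quant_path comes from the quantization script above; the shard-picking
# logic here is illustrative only.
shard = next(f for f in sorted(os.listdir(quant_path)) if f.endswith(".safetensors"))
counts = {}
with safe_open(os.path.join(quant_path, shard), framework="pt") as f:
    for name in f.keys():
        dtype = f.get_slice(name).get_dtype()  # dtype string, e.g. "I32" or "BF16"
        counts[dtype] = counts.get(dtype, 0) + 1
print(counts)
```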

Model tree for coco101010/Qwen3-32B-GPTQ-4bit-default-calibration:
Base model Qwen/Qwen3-32B, of which this model is one of 90 quantized variants.