Text Generation
Safetensors
qwen3
text-generation-inference
text2text-generation
conversational
8-bit precision
compressed-tensors

GPTQ INT8 W8A8 quantized Qwen/Qwen3-32B

GPTQ INT8 W8A8 quantized Qwen/Qwen3-32B calibrated with a sequence len of 4096 and 128 samples of TokenBender/code_instructions_122k_alpaca_style, glaiveai/glaive-code-assistant-v2, google/code_x_glue_ct_code_to_text for a total sample size of 1024.

Follow the Qwen/Qwen3-32B docs for running with vllm.

Downloads last month
67
Safetensors
Model size
32.8B params
Tensor type
BF16
·
I8
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for zankich/Qwen3-32B-INT8

Base model

Qwen/Qwen3-32B
Quantized
(98)
this model

Datasets used to train zankich/Qwen3-32B-INT8