These are EXL3 quantizations of Qwen2.5-Coder-32B-Instruct, which is still a state-of-the-art non-reasoning coding model as of this writing. Even after the Qwen3 and Gemma3 releases, it remains my go-to model for FIM (fill-in-the-middle) autocompletion. The quants were produced with exllamav3 version 0.0.2.
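
Qwen2.5-Coder's FIM template uses the `<|fim_prefix|>`, `<|fim_suffix|>`, and `<|fim_middle|>` special tokens; the model then generates the code that belongs between the prefix and the suffix. A minimal Python sketch of building such a prompt (editor-integration details are up to your client):

```python
# Build a fill-in-the-middle prompt in Qwen2.5-Coder's documented FIM format.
# The model generates the code that goes between prefix and suffix.
prefix = "def fibonacci(n):\n    a, b = 0, 1\n    for _ in range(n):\n        "
suffix = "\n    return a\n"
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# Send `prompt` to a raw completion endpoint (not the chat template) and
# stop generation at the model's end-of-text token.
```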

EXL3 Quantized Models

4.0bpw

6.0bpw

8.0bpw
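
If this repo follows the common convention of keeping each bpw variant on its own branch (check the repo's "Files and versions" tab; the branch names below are assumptions), a single quant can be fetched with `huggingface_hub`:

```python
from huggingface_hub import snapshot_download

# Assumed branch naming ("4.0bpw" / "6.0bpw" / "8.0bpw"); verify the actual
# revision names on the repository before running.
snapshot_download(
    repo_id="LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3",
    revision="8.0bpw",
    local_dir="Qwen2.5-Coder-32B-Instruct_exl3_8.0bpw",
)
```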

For coding, I found that the >=6.0bpw quants (preferably 8.0bpw) with KV cache quantization at Q6 or above are much better than 4.0bpw. If you use these models only for short autocompletion, 4.0bpw is usable.
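
I have not pinned down the exact exllamav3 0.0.2 API for this, so the class names and cache arguments below are assumptions modeled on exllamav3's example scripts; treat it as a sketch and check the examples shipped with your installed version:

```python
# Sketch only: names and keyword arguments are assumptions, not a verified
# exllamav3 0.0.2 API; consult the library's bundled examples.
from exllamav3 import Config, Model, Cache, Tokenizer, Generator

config = Config.from_directory("Qwen2.5-Coder-32B-Instruct_exl3_8.0bpw")
model = Model.from_config(config)
# Quantized KV cache at 6 bits per key/value, matching the Q6 advice above.
cache = Cache(model, max_num_tokens=32768, k_bits=6, v_bits=6)
model.load()
tokenizer = Tokenizer.from_config(config)
generator = Generator(model=model, cache=cache, tokenizer=tokenizer)
print(generator.generate(prompt="def quicksort(arr):", max_new_tokens=128))
```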

Credits

Thanks to the excellent work of the exllamav3 dev team.


Model tree for LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3

Base model: Qwen/Qwen2.5-32B, finetuned into Qwen/Qwen2.5-Coder-32B-Instruct, from which this model is quantized.