These are EXL3 quantizations of Qwen2.5-Coder-32B-Instruct, which is still the SOTA non-reasoning coder model as of today. It remains my go-to FIM (fill-in-the-middle) autocompletion model even after the Qwen3 and Gemma3 releases. The quantizations were produced with exllamav3 version 0.0.2.
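For context, FIM autocompletion works by giving the model the code before and after the cursor and asking it to generate the middle. A minimal sketch of building such a prompt with Qwen2.5-Coder's documented FIM special tokens (the snippet contents here are just illustrative):

```python
# Qwen2.5-Coder FIM format: the model generates the code that belongs
# between <|fim_prefix|>...<|fim_suffix|>... after <|fim_middle|>.
prefix = "def add(a, b):\n    return "   # code before the cursor
suffix = "\n\nprint(add(1, 2))\n"        # code after the cursor
prompt = f"<|fim_prefix|>{prefix}<|fim_suffix|>{suffix}<|fim_middle|>"
# Send `prompt` to the model as a raw completion (no chat template).
```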
EXL3 Quantized Models
For coding, I found that a >=6.0bpw model (preferably 8.0bpw) with KV cache quantization (>=Q6) performs much better than 4.0bpw. If you are using these models only for short autocompletion, 4.0bpw is usable.
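If you serve EXL3 models through tabbyAPI (a common frontend for the exllama backends), the KV cache quantization above maps to its `cache_mode` setting. A minimal sketch of the relevant `config.yml` fragment, assuming the quantized weights sit in a local `models/` directory (key names follow tabbyAPI's sample config; check your installed version):

```yaml
model:
  model_dir: models                               # parent directory of model folders
  model_name: Qwen2.5-Coder-32B-Instruct_exl3     # folder with the EXL3 weights
  cache_mode: Q6                                  # KV cache quantization (FP16, Q8, Q6, Q4)
```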
Credits
Thanks to the excellent work of the exllamav3 dev team.
Model tree for LLMJapan/Qwen2.5-Coder-32B-Instruct_exl3
- Base model: Qwen/Qwen2.5-32B
- Finetuned: Qwen/Qwen2.5-Coder-32B
- Finetuned: Qwen/Qwen2.5-Coder-32B-Instruct