IQ4_XS for 128GB Macs

#1
by tarruda

It would be great to have IQ4_XS as one of the officially supported quants.

I have a Mac Studio M1 with 128GB RAM and cannot run Q4_K_M, since it requires 135GB of VRAM even at only 8192 context. On Apple Silicon the GPU's wired-memory ceiling can be raised (via the `iogpu.wired_limit_mb` sysctl on recent macOS) to use up to about 125GB of RAM as VRAM, but that is still not enough for Q4_K_M.
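To make that tweak concrete, here is a tiny sketch that computes the limit and prints the command to apply it (assuming macOS 14+, where the `iogpu.wired_limit_mb` sysctl sets the ceiling; the 3GB of OS headroom is my own assumption, not a measured value):

```python
# Sketch: on a 128GB Apple Silicon Mac, pick a GPU wired-memory limit
# that leaves a little RAM for the OS, then print the sysctl command.
# Assumes macOS 14+, where iogpu.wired_limit_mb controls the ceiling.
total_ram_gb = 128
headroom_gb = 3                                  # assumption: reserve ~3GB for the OS
limit_mb = (total_ram_gb - headroom_gb) * 1024   # 125GB -> 128000 MB
print(f"sudo sysctl iogpu.wired_limit_mb={limit_mb}")
```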

Here's a screenshot of the GGUF VRAM calculator for Q4_K_M:

[Screenshot: GGUF VRAM calculator output for Q4_K_M]

However, with IQ4_XS (which should be very close to Q4_K_M in quality), the VRAM requirements drop significantly:

[Screenshot: GGUF VRAM calculator output for IQ4_XS]

As you can see, I can run it even with a 40k-token context!
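A quick back-of-envelope check agrees with the calculator. Using approximate average bits-per-weight for each quant type (~4.85 for Q4_K_M, ~4.25 for IQ4_XS; actual GGUF sizes vary slightly with the tensor mix):

```python
# Rough weight-memory estimate from parameter count and bits-per-weight.
GiB = 1024**3

def weights_gib(n_params: float, bpw: float) -> float:
    """Approximate GGUF weight size in GiB for a given bits-per-weight."""
    return n_params * bpw / 8 / GiB

n_params = 235e9  # Qwen3-235B-A22B total parameters
for name, bpw in [("Q4_K_M", 4.85), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{weights_gib(n_params, bpw):.0f} GiB of weights")
# Q4_K_M: ~133 GiB -> already over a 125GB VRAM ceiling before any KV cache
# IQ4_XS: ~116 GiB -> leaves several GB for KV cache and compute buffers
```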

I've written a tutorial: https://www.reddit.com/r/LocalLLaMA/comments/1kefods/serving_qwen3235ba22b_with_4bit_quantization_and/
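The tutorial serves the model through llama.cpp's built-in server, but for anyone who prefers Python, here is a minimal sketch using llama-cpp-python instead (the model path is a placeholder for the first shard of the split GGUF, and the package needs to be built with Metal support):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: point at the first shard of the split IQ4_XS GGUF.
llm = Llama(
    model_path="Qwen3-235B-A22B-IQ4_XS-00001-of-00003.gguf",
    n_ctx=40960,       # the ~40k context that fits alongside the weights
    n_gpu_layers=-1,   # offload every layer to the Metal GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a 128GB Mac Studio"}]
)
print(out["choices"][0]["message"]["content"])
```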

Would you like to try this out? Even IQ1_M produces acceptable results for my use case: https://huggingface.co/lovedheart/Qwen3-253B-A22B-IQ1_M
