IQ4_XS for 128GB Macs

#1
by tarruda

It would be great to have IQ4_XS as one of the officially supported quants.

I have a Mac Studio M1 with 128GB RAM and cannot run Q4_K_M, since it requires 135GB of VRAM even at only 8192 context. On Apple Silicon the GPU's wired-memory ceiling can be raised (via the `iogpu.wired_limit_mb` sysctl on recent macOS) to use up to about 125GB of RAM as VRAM, but that is still not enough for Q4_K_M.
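To make that tweak concrete, here is a tiny sketch that computes the limit and prints the command to apply it (assuming macOS 14+, where the `iogpu.wired_limit_mb` sysctl sets the ceiling; the 3GB of OS headroom is my own assumption, not a measured value):

```python
# Sketch: on a 128GB Apple Silicon Mac, pick a GPU wired-memory limit
# that leaves a little RAM for the OS, then print the sysctl command.
# Assumes macOS 14+, where iogpu.wired_limit_mb controls the ceiling.
total_ram_gb = 128
headroom_gb = 3                                  # assumption: reserve ~3GB for the OS
limit_mb = (total_ram_gb - headroom_gb) * 1024   # 125GB -> 128000 MB
print(f"sudo sysctl iogpu.wired_limit_mb={limit_mb}")
```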

Here's a screenshot of the GGUF VRAM calculator for Q4_K_M:

[Screenshot: GGUF VRAM calculator output for Q4_K_M]

However, with IQ4_XS (which should be very close to Q4_K_M in quality), the VRAM requirements drop significantly:

[Screenshot: GGUF VRAM calculator output for IQ4_XS]

As you can see, I can run it even with a 40k-token context!
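A quick back-of-envelope check agrees with the calculator. Using approximate average bits-per-weight for each quant type (~4.85 for Q4_K_M, ~4.25 for IQ4_XS; actual GGUF sizes vary slightly with the tensor mix):

```python
# Rough weight-memory estimate from parameter count and bits-per-weight.
GiB = 1024**3

def weights_gib(n_params: float, bpw: float) -> float:
    """Approximate GGUF weight size in GiB for a given bits-per-weight."""
    return n_params * bpw / 8 / GiB

n_params = 235e9  # Qwen3-235B-A22B total parameters
for name, bpw in [("Q4_K_M", 4.85), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{weights_gib(n_params, bpw):.0f} GiB of weights")
# Q4_K_M: ~133 GiB -> already over a 125GB VRAM ceiling before any KV cache
# IQ4_XS: ~116 GiB -> leaves several GB for KV cache and compute buffers
```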

I've written a tutorial: https://www.reddit.com/r/LocalLLaMA/comments/1kefods/serving_qwen3235ba22b_with_4bit_quantization_and/
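The tutorial serves the model through llama.cpp's built-in server, but for anyone who prefers Python, here is a minimal sketch using llama-cpp-python instead (the model path is a placeholder for the first shard of the split GGUF, and the package needs to be built with Metal support):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: point at the first shard of the split IQ4_XS GGUF.
llm = Llama(
    model_path="Qwen3-235B-A22B-IQ4_XS-00001-of-00003.gguf",
    n_ctx=40960,       # the ~40k context that fits alongside the weights
    n_gpu_layers=-1,   # offload every layer to the Metal GPU
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello from a 128GB Mac Studio"}]
)
print(out["choices"][0]["message"]["content"])
```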

Would you like to try this out? Even IQ1_M produces acceptable results for my use case: https://huggingface.co/lovedheart/Qwen3-253B-A22B-IQ1_M
