IQ4_XS for 128GB Macs
#1 · by tarruda · opened
Would be great to have IQ4_XS part of the officially supported quants.
I have a Mac Studio M1 with 128GB RAM and can't run Q4_K_M, since it needs about 135GB of VRAM even at only 8192 context. It is possible to tweak macOS to allow up to ~125GB of that RAM to be used as VRAM.
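For reference, the tweak I mean is raising the GPU wired-memory limit via sysctl. This is the commonly used knob on Apple Silicon, not anything specific to this model, and the exact value is up to you:

```shell
# Raise the GPU wired-memory limit to ~125 GB (value is in MB).
# This resets on reboot. iogpu.wired_limit_mb is the key on recent
# macOS versions; older releases used debug.iogpu.wired_limit.
sudo sysctl iogpu.wired_limit_mb=125000
```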
Here's a screenshot from the GGUF VRAM calculator:
However, using IQ4_XS (which should have very close performance to Q4_K_M), the VRAM requirements drop significantly:
As you can see, I can run it even with 40k token context!
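As a rough sanity check on those numbers, here's a back-of-envelope estimate of the weight memory alone. The bits-per-weight averages below are approximate figures commonly quoted for llama.cpp quant types (not exact for this model), and the KV cache is ignored:

```python
def weight_gib(n_params: float, bpw: float) -> float:
    """Approximate weight memory in GiB at bpw bits per weight."""
    return n_params * bpw / 8 / 1024**3

N = 235e9  # Qwen3-235B-A22B total parameter count

# Approximate average bits per weight for each quant type.
for name, bpw in [("Q4_K_M", 4.85), ("IQ4_XS", 4.25)]:
    print(f"{name}: ~{weight_gib(N, bpw):.0f} GiB of weights (KV cache extra)")
```

That difference of roughly 16 GiB in weights alone is what frees up room for a much larger context on a 128GB machine.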
I've written a tutorial: https://www.reddit.com/r/LocalLLaMA/comments/1kefods/serving_qwen3235ba22b_with_4bit_quantization_and/
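To give an idea of the setup, an invocation along these lines works with llama.cpp's server (the model filename is a hypothetical local path, not the exact command from the tutorial):

```shell
# Serve an IQ4_XS GGUF with llama.cpp:
#   -m    path to the GGUF file
#   -c    context window (~40k tokens here)
#   -ngl  number of layers to offload to the GPU (99 = all)
./llama-server -m ./Qwen3-235B-A22B-IQ4_XS.gguf -c 40960 -ngl 99
```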
Would you like to try this out? Even IQ1_M produces acceptable results for my use case: https://huggingface.co/lovedheart/Qwen3-253B-A22B-IQ1_M