Any chance of a 128k version so we can use it as a draft model for the larger 128k models?
Thanks!
I was literally just looking for 128k quants from Unsloth and was sat here scratching my head like, where are they?
Looking for it as well, but also wondering if a 128k version is really necessary...
I'm using Qwen3-32B-128K-Q8_0.gguf with context size of 131072.
--model-draft Qwen3-0.6B-Q8_0.gguf
--draft-max 8
--draft-min 0
--ctx-size-draft 32768
--draft-p-min 0.5
--gpu-layers-draft 65
--override-kv tokenizer.ggml.bos_token_id=int:151643
--device-draft CUDA0
I haven't tuned these params yet, so they are far from optimal, and using Q8 instead of Q4 is certainly not a good idea here.
Along with YaRN:
--rope-scaling yarn
--rope-scale 4
--yarn-orig-ctx 32768
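For anyone trying to reproduce this, here is a minimal sketch that puts the flags quoted above into a single llama.cpp argument list. It also makes the context arithmetic explicit: a YaRN `--rope-scale` of 4 over the 32768-token original context gives the 131072-token context used for the 32B model. Some of it is assumed rather than taken from the post. The binary name `llama-server` is a guess, since the post doesn't say which llama.cpp tool was used, and `build_args` is just a hypothetical helper.

```python
import shlex

# Assumption: the post does not name the llama.cpp binary; llama-server is a guess.
LLAMA_BIN = "llama-server"

TARGET_MODEL = "Qwen3-32B-128K-Q8_0.gguf"
DRAFT_MODEL = "Qwen3-0.6B-Q8_0.gguf"

NATIVE_CTX = 32768  # value passed to --yarn-orig-ctx (and --ctx-size-draft)
ROPE_SCALE = 4      # value passed to --rope-scale with --rope-scaling yarn


def build_args() -> list[str]:
    """Hypothetical helper: assemble the flags quoted in the thread."""
    target_ctx = NATIVE_CTX * ROPE_SCALE  # 32768 * 4 = 131072
    return [
        LLAMA_BIN,
        "--model", TARGET_MODEL,
        "--ctx-size", str(target_ctx),
        # YaRN settings as quoted in the thread
        "--rope-scaling", "yarn",
        "--rope-scale", str(ROPE_SCALE),
        "--yarn-orig-ctx", str(NATIVE_CTX),
        # Draft (speculative) model settings as quoted in the thread
        "--model-draft", DRAFT_MODEL,
        "--ctx-size-draft", str(NATIVE_CTX),
        "--draft-max", "8",
        "--draft-min", "0",
        "--draft-p-min", "0.5",
        "--gpu-layers-draft", "65",
        "--device-draft", "CUDA0",
        # Override reported later in the thread as needed to avoid the
        # "draft vocab special tokens must match target vocab" error
        "--override-kv", "tokenizer.ggml.bos_token_id=int:151643",
    ]


if __name__ == "__main__":
    # Print a copy-pasteable command line
    print(shlex.join(build_args()))
```

Keeping the scale factor and original context as named constants means the `--ctx-size` value can't silently drift out of sync with the YaRN settings when you change one of them.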
Hey guys, as much as we'd love to release 128K quants, the small Qwen3 models don't support 128K context so only the large ones work :)
I see, I see! Thank you for clarifying, makes sense now :) Appreciate you taking the time to respond to our inquiry!
Thanks Shimmy! Appreciate you taking the time to respond and for all your hard work.
@smcleod - Without the --override-kv tokenizer.ggml.bos_token_id override, I get the following error: "draft vocab special tokens must match target vocab to use speculation". Don't you?
Note: I've tested several configs, including switching to the 4B model as the draft. My use case is long contexts with YaRN, and I've noticed that using a draft model inevitably lowers the quality of the Qwen3-32B output (more hallucinations), which I cannot afford. So I've dropped the idea of using a draft model, unless there is something I was doing wrong...
My draft acceptance rate was between 0.4 and 0.56 across my different attempts, draft models, and params. I also didn't notice a significant speed-up at large context sizes.
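A back-of-the-envelope sketch of why those acceptance rates give only modest gains. It uses the idealized speculative-decoding estimate from Leviathan et al. (2023), which assumes each drafted token is accepted independently with probability alpha and that the draft always proposes exactly gamma tokens. Both assumptions are simplifications here. The rate llama.cpp reports is only a rough stand-in for alpha, and `--draft-p-min` can stop a draft before it reaches `--draft-max`. The `cost_ratio` parameter, meaning draft pass time divided by target pass time, is hypothetical and would need to be measured. With alpha between 0.4 and 0.56 and gamma = 8, the estimate is only about 1.7 to 2.3 tokens per target pass, before subtracting any drafting cost.

```python
def expected_tokens_per_pass(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass.

    Idealized model: each of `gamma` drafted tokens is accepted
    independently with probability `alpha`; the target always adds one token.
    """
    if alpha >= 1.0:
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized wall-clock speedup over plain decoding.

    `cost_ratio` (hypothetical) = time of one draft pass / time of one target pass.
    Each step costs gamma draft passes plus one target pass.
    """
    return expected_tokens_per_pass(alpha, gamma) / (gamma * cost_ratio + 1.0)


if __name__ == "__main__":
    # Acceptance rates reported above, with --draft-max 8 taken as gamma
    for alpha in (0.40, 0.56):
        tokens = expected_tokens_per_pass(alpha, gamma=8)
        # cost_ratio=0.05 is an arbitrary illustrative value, not a measurement
        speedup = estimated_speedup(alpha, gamma=8, cost_ratio=0.05)
        print(f"alpha={alpha:.2f}: {tokens:.2f} tokens/pass, ~{speedup:.2f}x")
```

With these assumptions, every extra drafted token adds draft cost but contributes less and less to the expected token count once alpha is around 0.5. That is one plausible reason the speed-up looks small.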
Ah, I wondered what the fix was for that. Nevertheless, it sounds like the impact of YaRN on quality might be too much of a trade-off to be worth it.