Re-uploaded GGUFs with the <think> token removed for better outputs

#4
by danielhanchen - opened

Hey guys, we saw some people having issues using the model in tools other than llama.cpp. We re-uploaded the GGUFs after verifying that removing the <think> token is fine, since the model's probability of producing that token is nearly 100% anyway.
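
If you want to sanity-check that claim yourself, here is a minimal sketch (not the exact script we used; assumes transformers + torch are installed and you have enough memory for the model) that measures the probability the model assigns to <think> as its very first generated token:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "unsloth/Qwen3-30B-A3B-Thinking-2507"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

# Strip any hard-coded <think> from the template so the model has to emit it itself.
# The Jinja source spells newlines as literal backslash-n, hence the escaping here.
tokenizer.chat_template = tokenizer.chat_template.replace(
    "<|im_start|>assistant\\n<think>\\n", "<|im_start|>assistant\\n"
)

messages = [{"role": "user", "content": "What is 2 + 2?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    logits = model(inputs).logits[0, -1]  # next-token logits at the end of the prompt

think_id = tokenizer.convert_tokens_to_ids("<think>")
prob = torch.softmax(logits.float(), dim=-1)[think_id].item()
print(f"P(<think> as first generated token) = {prob:.4f}")  # expected to be ~1.0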

This should make LM Studio, Ollama, and other inference engines besides llama.cpp work! Please re-download the weights or, as @redeemer mentioned, simply delete the <think> token in the chat template, i.e. change the below:

{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}

to:

{%- if add_generation_prompt %}
    {{- '<|im_start|>assistant\n' }}
{%- endif %}

See https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF?chat_template=default or https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja
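
If you'd rather not re-download a 30B GGUF just to change the template, a small sketch like this (assuming requests is installed; the URL is the raw template linked above) pulls the template, drops the hard-coded <think>, and saves it to a file your engine can load, e.g. via llama.cpp's --chat-template-file flag:

import requests

URL = "https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja"
template = requests.get(URL, timeout=30).text

# The Jinja source spells newlines as literal backslash-n, so match them escaped.
patched = template.replace(
    "{{- '<|im_start|>assistant\\n<think>\\n' }}",
    "{{- '<|im_start|>assistant\\n' }}",
)

with open("chat_template.jinja", "w", encoding="utf-8") as f:
    f.write(patched)
print("Wrote patched chat_template.jinja")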

danielhanchen pinned discussion

Thanks for this, I was thinking there must be a way to avoid the need to hard-code a leading <think> token!
