Re-uploaded GGUFs with the <think> token removed for better outputs
Hey guys, we saw some people having issues using the model in tools other than llama.cpp. We re-uploaded the GGUFs and verified that removing the <think> token is fine, since the model's probability of producing it as its first output token is nearly 100% anyway (a sketch for checking this yourself follows the links below). This should make LM Studio, Ollama, and other inference engines besides llama.cpp work! Please redownload the weights or, as @redeemer mentioned, simply delete the <think> token in the chat template, i.e. change the below:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n<think>\n' }}
{%- endif %}
to:
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
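If you want to confirm the edit took effect, here is a minimal sketch (assuming the unsloth/Qwen3-30B-A3B-Thinking-2507 repo id from the links below) that renders the generation prompt with transformers and checks that it now stops at the assistant header:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("unsloth/Qwen3-30B-A3B-Thinking-2507")

# Render the generation prompt as a string rather than token ids.
rendered = tok.apply_chat_template(
    [{"role": "user", "content": "hi"}],
    add_generation_prompt=True,
    tokenize=False,
)

# After the edit, the prompt should end at the assistant header,
# with no hard-coded <think> appended.
print(rendered.endswith("<|im_start|>assistant\n"))  # expect True
```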
See https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507-GGUF?chat_template=default or https://huggingface.co/unsloth/Qwen3-30B-A3B-Thinking-2507/raw/main/chat_template.jinja
Thanks for this, I was thinking there must be a way to avoid the need to hard-code a leading think token!