New Chat Template + Tool Calling Fixes as of 05 Aug, 2025
Although we previously addressed tool calling issues, the fix only worked in certain setups, such as llama.cpp. With other configurations, tool functionality remained inconsistent.
This new update has undergone extensive testing by us and others, and should significantly improve tool calling reliability and resolve most of the strange behaviors.
IMPORTANT:
You must update llama.cpp as they have also fixed some issues!
This issue affected all uploads of the model, regardless of the uploader. We did not introduce this problem or break the model in our quantizations - in fact, we've now fixed it. For correct chat template behavior and working tool calling, you must use our quants. Other quants (not uploaded by us) do not properly support tool calling.
I am using the ollama/ollama:rocm docker image, how can I apply this fix or how can it support tool calling??
The API returns "does not support tools".
Can you share the template here or somewhere?
I don't want to redownload the gguf but a fix would be nice. I could then load the fixed template.
Thanks!
Can confirm that it behaves way better than before (using UD-Q4_K_XL).
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within XML tags:\n" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n\n\nFor each function call, return a json object with function name and arguments within XML tags:\n\n{"name": , "arguments": }\n<|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for message in messages[::-1] %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- if message.content is string %}
{%- set content = message.content %}
{%- else %}
{%- set content = '' %}
{%- endif %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is string %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in content %}
{%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
{%- set content = content.split('</think>')[-1].lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and reasoning_content) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- content }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
Here is the template I have been using. I made it from a variety of sources, manually debugging, and with the help of AI models.
It seemed to work, but perhaps there are issues with it that are not obvious
One thing that I notice is that when using my template, cli tools like claude code and qwen code don't print to the terminal like:
<function=Read
⏺ Read(qwen3_chat_template.jinja)
⎿ Read 135 lines (ctrl+r to expand)
They do with the template located in the unsloth huggingface non-GGUF model repo (which I assume is the same as the one in the new updated GGUF files).
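If you don't want to re-download the GGUF, you should be able to point llama.cpp's server at a saved copy of a template with --chat-template-file (the same flag used in the full llama-server command shared later in this thread); e.g., assuming the template above is saved as qwen3_chat_template.jinja:
llama-server -m Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf --jinja --chat-template-file qwen3_chat_template.jinja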
I am using the ollama/ollama:rocm docker image, how can I apply this fix or how can it support tool calling??
The API returns "does not support tools".
Or ollama in general. I am experiencing silent tool call failures; the AI just simply stops with no tool call.
This seems to somewhat work with qwen-code (with some oddity), but it fails with codex.
qwen-code output:
Let me search for relevant code patterns.
<tool_call>
<function=search_file_content
I don't think those <> parts are supposed to be visible? The result then causes qwen-code to make a 1M+ tokens request which obviously fails. I don't know if this is because qwen-code is stupid or if it's a tool parsing bug.
With codex:
command running...
$ find . -name '*.py' -o -name '*.js' -o -name '*.rs' -o -name '*.cpp' -o -name '*.h' -o -name '*.hpp'
codex
I'll help you find where the URL is defined in the implementation. Let me explore the codebase to locate this information.
<tool_call>
<function=shell
codex
<parameter=command>
["find", "/workspace", "-type", "f", "-name", ".py", "-o", "-name", ".js", "-o", "-name", ".rs", "-o", "-name", ".cpp", "-o", "-name", ".h", "-o", "-name", ".hpp"]
</tool_call>
And nothing is actually called (hmm, or maybe it's not realizing this failed because "." became "/workspace").
It works with Claude Code (but it might have the same weird outputs as qwen-code).
I think the issue is that most tools expect the formatting to be json. Meanwhile this uses XML and has some extra tags that some tools do not expect. Some tools can handle it fine enough and just have some weird formatting. Others seem to break entirely.
If I am correct, a possible solution would be to have a proxy where you send the request to the model api like normal(llama.cpp server in my case) but then modify the return values to be in a format that the end user tooling expects.
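For anyone who wants to experiment with that proxy idea, here is a very rough Python sketch (assumes Flask and requests, no streaming support; the ports are placeholders and the regexes target the <tool_call>/<function=...>/<parameter=...> format from the templates discussed in this thread). It is just a starting point, not a tested solution:
# Rough sketch of a translating proxy: forwards requests to a llama.cpp server
# and rewrites the model's XML-ish tool calls into OpenAI-style tool_calls.
import json
import re

import requests
from flask import Flask, jsonify, request

LLAMA_URL = "http://localhost:8080/v1/chat/completions"  # placeholder: your llama.cpp server

app = Flask(__name__)

# Matches <tool_call><function=NAME><parameter=KEY>VALUE</parameter>...</function></tool_call>
TOOL_RE = re.compile(r"<tool_call>\s*<function=([^>\n]+)>(.*?)</function>\s*</tool_call>", re.S)
PARAM_RE = re.compile(r"<parameter=([^>\n]+)>\s*(.*?)\s*</parameter>", re.S)

@app.post("/v1/chat/completions")
def proxy():
    # Forward the request untouched, then reshape any XML tool calls in the reply.
    upstream = requests.post(LLAMA_URL, json=request.get_json(), timeout=600).json()
    for choice in upstream.get("choices", []):
        message = choice.get("message") or {}
        content = message.get("content") or ""
        calls = []
        for i, m in enumerate(TOOL_RE.finditer(content)):
            name, body = m.group(1).strip(), m.group(2)
            args = {k: v for k, v in PARAM_RE.findall(body)}
            calls.append({
                "id": f"call_{i}",
                "type": "function",
                "function": {"name": name, "arguments": json.dumps(args)},
            })
        if calls:
            message["tool_calls"] = calls
            message["content"] = TOOL_RE.sub("", content).strip()
            choice["finish_reason"] = "tool_calls"
    return jsonify(upstream)

if __name__ == "__main__":
    app.run(port=8081)  # point the client tool at this port instead of the llama.cpp server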
I gave this a shot. Updated Llama.cpp and downloaded the fresh Q6 UD quant. I re-ran all my tests. It's still performing just as bad as before, unfortunately. Actually worse performance in RooCode and still doesn't do much of anything in Qwen Code.
I still don't understand the reasoning for why Qwen decided to make this one singular model the one that handles tools differently (because all the other models seem to work perfectly fine if I understand correctly). I mean if they were trying to push for a technological improvement, you'd at least expect it would work in their own product... Qwen Code, right? I don't get the logic here at all to make a change and not at least have it working in their own purpose-built coding solution. Just so perplexing of a decision.
To be extra sure... confirming that this is the correct template now?
{# Copyright 2025-present Unsloth. Apache 2.0 License. Unsloth Chat template fixes #}
{% macro render_item_list(item_list, tag_name='required') %}
{%- if item_list is defined and item_list is iterable and item_list | length > 0 %}
{%- if tag_name %}{{- '\n<' ~ tag_name ~ '>' -}}{% endif %}
{{- '[' }}
{%- for item in item_list -%}
{%- if loop.index > 1 %}{{- ", "}}{% endif -%}
{%- if item is string -%}
{{ "`" ~ item ~ "`" }}
{%- else -%}
{{ item }}
{%- endif -%}
{%- endfor -%}
{{- ']' }}
{%- if tag_name %}{{- '</' ~ tag_name ~ '>' -}}{% endif %}
{%- endif %}
{% endmacro %}
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = [] %}
{%- endif %}
{%- if system_message is defined %}
{{- "<|im_start|>system\n" + system_message }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
{%- endif %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
{{- "\n\nYou have access to the following functions:\n\n" }}
{{- "<tools>" }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
{{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
{{- '\n<parameters>' }}
{%- for param_name, param_fields in tool.parameters.properties|items %}
...
...
...
(Truncated to save people from reading the whole thing)
If this is what it's supposed to be now, then, again, no noticeable improvement for me as of right now.
Regardless, much appreciated efforts from the Unsloth team. Sorry you guys had to go through all this craziness.
Hey, many thanks for all the hard work, just figured I'd drop my roo code setup so others can compare if they're having issues, since I was in the same boat as everyone before with tool calls failing a lot.
Got the latest beta LM Studio + the latest beta CUDA llama.cpp (1.45)
I was using LM Studio via OpenWebUI, but for some reason it wouldn't pick up the new GGUF when I put it in the old LM Studio folder, so step 1 was to re-import the model into a totally different folder (/models/imported-models/Qwen3-Coder-30B-A3B-Instruct/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf).
Re-entered all the recommended settings from the model card.
Configured Roo Code to use an OpenAI-compatible endpoint, pointed it at my OpenWebUI /api endpoint, picked Qwen3 Coder.
Loaded up a fresh workspace in VS Code, and gave it a brief description of a weather API in rust (nothing crazy).
It did the whole thing, fixed a couple errors, wrote a readme, wrote integration tests, wrote an example script to call it, and had no tool-calling failures. Can't speak to anyone who has custom MCP servers or similar wired up.
I'll try it with the larger repo I was working with tomorrow and see if it's equally as stable, but it does look significantly better. I've been meaning to try qwen code, but I won't get a chance until next week.
@Sunderous I was quite skeptical of LM Studio giving me a different result because of how long I've been going at this, but I wanted to give you the benefit of the doubt. I went ahead and downloaded the latest Unsloth Q4 UD for Qwen3-Coder on it and it nailed both my tests first try!
It worked great in RooCode, no errors at all with tool calling.
(Edit: Initially wrote that it was also working great in Qwen Code, and realized I accidentally loaded the wrong model. After confirming I was loading the right model, I got the errors mentioned below.)
The only issues I'm getting:
1. Occasionally after a prompt has finished the model becomes unloaded and throws an error, and I just have to click the "retry" button in RooCode to get it going.
Update: Fixed it by switching from "LM Studio" to "OpenAI Compatible" in RooCode.
2. Not working in Qwen Code for some reason. Maybe, despite the fact I asked LM Studio to download the Unsloth quant, it is still using a different template?
But beyond that it is at least working in RooCode.
So... I now have an important question: Why is LM Studio working and Llama.cpp not working with the same exact model in RooCode?
I didn't think the "engine" you used to run a model had any impact on the output quality (speed, of course). So this is quite a shock to me personally.
What is LM Studio doing that's not happening with Llama.cpp? Is LM Studio somehow translating the tool interactions to make them compatible maybe? Is it a different template? Really curious now.
Edit: I figured out how to see what the template is. It's definitely showing the same chat template in LM Studio, so it isn't that.
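For anyone else who wants to check which template is actually embedded in a local GGUF, here is a rough sketch using the gguf Python package (pip install gguf); the GGUFReader field-access details are from memory, so double-check them against the package docs:
# Rough sketch: print the chat template stored in a local GGUF's metadata.
from gguf import GGUFReader

reader = GGUFReader("Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf")
field = reader.fields.get("tokenizer.chat_template")
if field is None:
    print("No chat template stored in this GGUF.")
else:
    # For string metadata, the payload bytes live in one of field.parts,
    # indexed by field.data (layout may differ slightly between gguf versions).
    raw = field.parts[field.data[0]]
    print(bytes(raw).decode("utf-8"))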
Just adding my experience here; downloaded latest Q8_K_XL quant (10:39 PM PST Aug 5, also built latest main-branch llama.cpp around this time) and ran with:
llama-server -m models/Qwen3-Coder-30B-A3B-Instruct-UD-Q8_K_XL.gguf \
--jinja \
--host 0.0.0.0 \
--port 8181 \
-ngl 99 \
-c 32768 \
-b 10240 \
-ub 2048 \
--n-cpu-moe 10 \
-fa \
-t 24
Here's what I'm getting in Qwen Code (it doesn't work):
I have tried Qwen3-Coder-30B-A3B-Instruct-UD-Q3_K_XL and Qwen3-Coder-30B-A3B-Instruct-Q3_K_XL with RooCode, Crush, Qwen Code, and OpenCode; none of them works.
Updated the ollama docker container to latest.
Both of the above models show the same hash.
Opencode shows like
{"name": "read", "arguments": {"filePath": "/workspace/CRUSH.md"}}
Crush shows like
RooCode shows like
[ERROR] You did not use a tool in your previous response! Please retry with a tool use.
Reminder: Instructions for Tool Use
Tool uses are formatted using XML-style tags. The tool name itself becomes the XML tag name. Each parameter is enclosed within its own set of tags. Here's the structure:
<actual_tool_name>
<parameter1_name>value1</parameter1_name>
<parameter2_name>value2</parameter2_name>
...
</actual_tool_name>
For example, to use the attempt_completion tool:
<attempt_completion>
<result>
I have completed the task...
</result>
</attempt_completion>
Always use the actual tool name as the XML tag name for proper parsing and execution.
Next Steps
If you have completed the user's task, use the attempt_completion tool.
If you require additional information from the user, use the ask_followup_question tool.
Otherwise, if you have not completed the task and do not need additional information, then proceed with the next step of the task.
(This is an automated message, so do not respond to it conversationally.)
VSCode Open Tabs
Current Time
Current time in ISO 8601 UTC format: 2025-08-06T05:41:01.874Z
User time zone: Asia/Calcutta, UTC+5:30
Current Cost
$0.00
Current Mode
architect
🏗️ Architect
hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:UD-Q3_K_XL
You have not created a todo list yet. Create one with update_todo_list
if your task is complicated or involves multiple steps.
Qwen shows Like
Started to loop
Got the latest beta LM Studio + the latest beta CUDA llama.cpp (1.45)
I was using LM Studio via OpenWebUI, but for some reason it wouldn't pick up the new GGUF when I put it in the old LM Studio folder, so step 1 was to re-import the model into a totally different folder (/models/imported-models/Qwen3-Coder-30B-A3B-Instruct/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf).
The following worked for me (I was previously having issues with multiple agents: Qwen Code, Roo Code, Kilo Code, etc.):
- Switching LM Studio from latest stable to latest beta
- I used the model above (already downloaded in my LM Studio): qwen3-coder-30b-a3b-instruct@q4_k_xl
- I made sure to change the model settings: in the prompt tab, under prompt template, I put the content from this link.
- This DID have issues with Qwen Code, which were solved after I asked the LLM what would be wrong with the Jinja template. It told me to remove | safe at two locations, which I did. Since then, everything seems to work correctly.
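In case it helps anyone reproduce that tweak, here is a tiny hypothetical helper that strips | safe from a locally saved copy of the template (the filenames are placeholders, not anything official):
# Hypothetical helper: remove the "| safe" filter from a saved Jinja template,
# as described above. Filenames are placeholders for wherever you keep the file.
src = "qwen3-coder.jinja"
dst = "qwen3-coder-nosafe.jinja"

with open(src, encoding="utf-8") as f:
    template = f.read()

# The filter appears as "| tojson | safe" in the official template;
# dropping just "| safe" leaves the tojson serialization intact.
removed = template.count(" | safe")
template = template.replace(" | safe", "")

with open(dst, "w", encoding="utf-8") as f:
    f.write(template)

print(f"Removed {removed} '| safe' occurrence(s), wrote {dst}")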
Does OpenWebUI have this feature to change the chat template?
@belgaied2 Can you share the working jinja chat template for Qwen Code?
Where is | safe in the current chat template? I don't see it?
After updating the chat_template, it is still not working properly in Roo Code.
This is from Roo code:
Roo is having trouble...
Roo appears to be stuck in a loop, attempting the same action (read_file) repeatedly. This might indicate a problem with its current strategy. Consider rephrasing the task, providing more specific instructions, or guiding it towards a different approach.
Qwen updated their chat_template in tokenizer_config.json
https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/discussions/14#689450d5592b0f16e20c183a
maybe this works better
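If you want to try that updated template with llama.cpp's --chat-template-file without waiting for new GGUFs, a small sketch for pulling it out of a locally downloaded tokenizer_config.json might look like this (untested; note some repos store chat_template as a list rather than a single string):
# Sketch: extract the chat_template field from a downloaded tokenizer_config.json
# and save it as a standalone .jinja file for llama-server --chat-template-file.
import json

with open("tokenizer_config.json", encoding="utf-8") as f:
    config = json.load(f)

template = config["chat_template"]  # for this repo it should be a single string

with open("qwen3-coder.jinja", "w", encoding="utf-8") as f:
    f.write(template)

print(f"Saved chat template ({len(template)} characters) to qwen3-coder.jinja")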
@shimmyshimmer it says updated a few hours ago? Have the GGUFs been updated?
No they haven't, check the dates on the files themselves. The last commit just removed the imatrix file.
I'm still encountering issues with tool calling. I'm using Cline and Roo Code with llama.cpp server using OpenAI compatible API.
Updated to latest llama.cpp, Cline/Roo, latest ggufs. I also tried the templates above and they didn't work for me. I haven't tried LMStudio yet though.
The chat template from the current GGUF (Q4_K_XL) does not work well with RooCode; it fails to edit files.
The chat template from Qwen's tokenizer_config.json does not work either; RooCode cannot read files with it.
This one from old Qwen3 discussions does work for a few calls with RooCode and llama.cpp:
qwen3-chat-template.jinja
{%- if tools %}
{{- '<|im_start|>system\n' }}
{%- if messages[0].role == 'system' %}
{{- messages[0].content + '\n\n' }}
{%- endif %}
{{- "# Tools\n\nYou may call one or more functions to assist with the user query.\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
{%- for tool in tools %}
{{- "\n" }}
{{- tool | tojson }}
{%- endfor %}
{{- "\n</tools>\n\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\n<tool_call>\n{\"name\": <function-name>, \"arguments\": <args-json-object>}\n</tool_call><|im_end|>\n" }}
{%- else %}
{%- if messages[0].role == 'system' %}
{{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
{%- for forward_message in messages %}
{%- set index = (messages|length - 1) - loop.index0 %}
{%- set message = messages[index] %}
{%- set current_content = message.content if message.content is defined and message.content is not none else '' %}
{%- set tool_start = '<tool_response>' %}
{%- set tool_start_length = tool_start|length %}
{%- set start_of_message = current_content[:tool_start_length] %}
{%- set tool_end = '</tool_response>' %}
{%- set tool_end_length = tool_end|length %}
{%- set start_pos = (current_content|length) - tool_end_length %}
{%- if start_pos < 0 %}
{%- set start_pos = 0 %}
{%- endif %}
{%- set end_of_message = current_content[start_pos:] %}
{%- if ns.multi_step_tool and message.role == "user" and not(start_of_message == tool_start and end_of_message == tool_end) %}
{%- set ns.multi_step_tool = false %}
{%- set ns.last_query_index = index %}
{%- endif %}
{%- endfor %}
{%- for message in messages %}
{%- set m_content = message.content if message.content is defined and message.content is not none else '' %}
{%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
{{- '<|im_start|>' + message.role + '\n' + m_content + '<|im_end|>' + '\n' }}
{%- elif message.role == "assistant" %}
{%- set reasoning_content = '' %}
{%- if message.reasoning_content is defined and message.reasoning_content is not none %}
{%- set reasoning_content = message.reasoning_content %}
{%- else %}
{%- if '</think>' in m_content %}
{%- set reasoning_content = (m_content.split('</think>')|first).rstrip('\n') %}
{%- set reasoning_content = (reasoning_content.split('<think>')|last).lstrip('\n') %}
{%- set m_content = (m_content.split('</think>')|last).lstrip('\n') %}
{%- endif %}
{%- endif %}
{%- if loop.index0 > ns.last_query_index %}
{%- if loop.last or (not loop.last and (not reasoning_content.strip() == "")) %}
{{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + m_content.lstrip('\n') }}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + m_content }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + m_content }}
{%- endif %}
{%- if message.tool_calls %}
{%- for tool_call in message.tool_calls %}
{%- if (loop.first and m_content) or (not loop.first) %}
{{- '\n' }}
{%- endif %}
{%- if tool_call.function %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '<tool_call>\n{"name": "' }}
{{- tool_call.name }}
{{- '", "arguments": ' }}
{%- if tool_call.arguments is string %}
{{- tool_call.arguments }}
{%- else %}
{{- tool_call.arguments | tojson }}
{%- endif %}
{{- '}\n</tool_call>' }}
{%- endfor %}
{%- endif %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "tool" %}
{%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
{{- '<|im_start|>user' }}
{%- endif %}
{{- '\n<tool_response>\n' }}
{{- message.content if message.content is defined and message.content is not none else '' }}
{{- '\n</tool_response>' }}
{%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- if enable_thinking is defined and enable_thinking is false %}
{{- '<think>\n\n</think>\n\n' }}
{%- endif %}
{%- endif %}
Wait for llama.cpp fix. And try this
https://github.com/ggml-org/llama.cpp/issues/15012#issuecomment-3165984279
I don't know about you guys, but the latest official template, as @itsthenewmeta pointed to, is working like a charm for me using llama.cpp server + RooCode!
No more unintended behavior after 30k+ tokens!
It only (really) rarely encounters a tool call formatting error, and even when it does, it corrects itself right away.
I've been using it for almost 3 hours, so it seems this entirely fixed it. So I thought it was worth sharing!
Here is my jinja file (but it is just reindented from the one in https://huggingface.co/Qwen/Qwen3-Coder-30B-A3B-Instruct/blob/main/tokenizer_config.json):
{% macro render_extra_keys(json_dict, handled_keys) %}
{%- if json_dict is mapping %}
{%- for json_key in json_dict if json_key not in handled_keys %}
{%- if json_dict[json_key] is mapping %}
{{- '\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | tojson | safe) ~ '</' ~ json_key ~ '>' }}
{%- else %}
{{-'\n<' ~ json_key ~ '>' ~ (json_dict[json_key] | string) ~ '</' ~ json_key ~ '>' }}
{%- endif %}
{%- endfor %}
{%- endif %}
{% endmacro %}
{%- if messages[0]["role"] == "system" %}
{%- set system_message = messages[0]["content"] %}
{%- set loop_messages = messages[1:] %}
{%- else %}
{%- set loop_messages = messages %}
{%- endif %}
{%- if not tools is defined %}
{%- set tools = [] %}
{%- endif %}
{%- if system_message is defined %}
{{- "<|im_start|>system\n" + system_message }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- "<|im_start|>system\nYou are Qwen, a helpful AI assistant that can interact with a computer to solve tasks." }}
{%- endif %}
{%- endif %}
{%- if tools is iterable and tools | length > 0 %}
{{- "\n\nYou have access to the following functions:\n\n" }}
{{- "<tools>" }}
{%- for tool in tools %}
{%- if tool.function is defined %}
{%- set tool = tool.function %}
{%- endif %}
{{- "\n<function>\n<name>" ~ tool.name ~ "</name>" }}
{%- if tool.description is defined %}
{{- '\n<description>' ~ (tool.description | trim) ~ '</description>' }}
{%- endif %}
{{- '\n<parameters>' }}
{%- if tool.parameters is defined and tool.parameters is mapping and tool.parameters.properties is defined and tool.parameters.properties is mapping %}
{%- for param_name, param_fields in tool.parameters.properties|items %}
{{- '\n<parameter>' }}
{{- '\n<name>' ~ param_name ~ '</name>' }}
{%- if param_fields.type is defined %}
{{- '\n<type>' ~ (param_fields.type | string) ~ '</type>' }}
{%- endif %}
{%- if param_fields.description is defined %}
{{- '\n<description>' ~ (param_fields.description | trim) ~ '</description>' }}
{%- endif %}
{%- set handled_keys = ['name', 'type', 'description'] %}
{{- render_extra_keys(param_fields, handled_keys) }}
{{- '\n</parameter>' }}
{%- endfor %}
{%- endif %}
{% set handled_keys = ['type', 'properties'] %}
{{- render_extra_keys(tool.parameters, handled_keys) }}
{{- '\n</parameters>' }}
{%- set handled_keys = ['type', 'name', 'description', 'parameters'] %}
{{- render_extra_keys(tool, handled_keys) }}
{{- '\n</function>' }}
{%- endfor %}
{{- "\n</tools>" }}
{{- '\n\nIf you choose to call a function ONLY reply in the following format with NO suffix:\n\n<tool_call>\n<function=example_function_name>\n<parameter=example_parameter_1>\nvalue_1\n</parameter>\n<parameter=example_parameter_2>\nThis is the value for the second parameter\nthat can span\nmultiple lines\n</parameter>\n</function>\n</tool_call>\n\n<IMPORTANT>\nReminder:\n- Function calls MUST follow the specified format: an inner <function=...></function> block must be nested within <tool_call></tool_call> XML tags\n- Required parameters MUST be specified\n- You may provide optional reasoning for your function call in natural language BEFORE the function call, but NOT after\n- If there is no function call available, answer the question like normal with your current knowledge and do not tell the user about function calls\n</IMPORTANT>' }}
{%- endif %}
{%- if system_message is defined %}
{{- '<|im_end|>\n' }}
{%- else %}
{%- if tools is iterable and tools | length > 0 %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- endif %}
{%- for message in loop_messages %}
{%- if message.role == "assistant" and message.tool_calls is defined and message.tool_calls is iterable and message.tool_calls | length > 0 %}
{{- '<|im_start|>' + message.role }}
{%- if message.content is defined and message.content is string and message.content | trim | length > 0 %}
{{- '\n' + message.content | trim + '\n' }}
{%- endif %}
{%- for tool_call in message.tool_calls %}
{%- if tool_call.function is defined %}
{%- set tool_call = tool_call.function %}
{%- endif %}
{{- '\n<tool_call>\n<function=' + tool_call.name + '>\n' }}
{%- if tool_call.arguments is defined %}
{%- for args_name, args_value in tool_call.arguments|items %}
{{- '<parameter=' + args_name + '>\n' }}
{%- set args_value = args_value | tojson | safe if args_value is mapping else args_value | string %}
{{- args_value }}
{{- '\n</parameter>\n' }}
{%- endfor %}
{%- endif %}
{{- '</function>\n</tool_call>' }}
{%- endfor %}
{{- '<|im_end|>\n' }}
{%- elif message.role == "user" or message.role == "system" or message.role == "assistant" %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>' + '\n' }}
{%- elif message.role == "tool" %}
{%- if loop.previtem and loop.previtem.role != "tool" %}
{{- '<|im_start|>user\n' }}
{%- endif %}
{{- '<tool_response>\n' }}
{{- message.content }}
{{- '\n</tool_response>\n' }}
{%- if not loop.last and loop.nextitem.role != "tool" %}
{{- '<|im_end|>\n' }}
{%- elif loop.last %}
{{- '<|im_end|>\n' }}
{%- endif %}
{%- else %}
{{- '<|im_start|>' + message.role + '\n' + message.content + '<|im_end|>\n' }}
{%- endif %}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
And here are my exact llama-server args:
./llama-server \
-m /home/user/llama.cpp/GGUFS/Qwen3-Coder-30B-A3B-Instruct-UD-Q4_K_XL.gguf \
-c 64000 \
-ngl 65 \
--temp 0.7 \
--top-p 0.8 \
--top-k 20 \
--repeat-penalty 1.05 \
--jinja \
--host 0.0.0.0 \
--port 8678 \
-a Qwen3-Coder-30B-A3B_Q4_K_XL_64k \
--chat-template-file /home/user/llama.cpp/templates/qwen3-coder.jinja \
-fa
I gave the template and config you suggested, owao, a shot and had immediate failures. Asking "Analyze this project and give me a summary in summary.md" had a fatal looping error on reading files and RooCode shut it down.
Still having the best tool interaction with LM Studio.
weird, latest master llama.cpp?
Yup, just double checked. But yeah if it's working for you that's great! I'm just sticking to LM Studio until llama.cpp gets it all sorted. Seems like there's a number of things going on to improve the situation as mentioned above.
The speed that it put those errors out actually was incredible. Semi-automatic error gun.
@JamesMowery
Would you be comfortable sharing your project repo? I'd like to see if I can replicate. I could of course understand if you'd rather not. Just tell me ;)
Also, latest RooCode stable version?
@owao No need. You can replicate it in like 30 seconds. Even when starting a brand new project it fails.
Here's a quick video recording of what I have setup now: https://share.mowery.io/u/2DYO.mp4
(I already removed your suggested template as I got worse results with it, but I tested with it yesterday and the results were slightly different (it actually read the files, and after it read them it went into an infinite loop and gave the same final error), but ultimately the same failures no matter what.)
Update: Here's a screen recording of the same project run with LM Studio. Only difference is I only have the Q4 UD model on LM Studio vs Q8 UD on llama.cpp (I had the Q3 - Q5 models on llama.cpp but deleted those after seeing how horrible the model was performing, so I only kept the Q6 & Q8). But even the lower quant in LM Studio does way better. Threw a single error but recovered and finished: https://share.mowery.io/u/aqUL.mp4
That's confusing... I see your parameters seem exactly the same between both. That said, I don't know what the defaults are for the other sampling params in LM Studio. I'm going to test your prompt on a fresh random repo.
So here it is: I tested it on https://github.com/protectai/vulnhuntr, which is a modest-size repo, but still far larger than your example. I tried in architect mode even though I don't think it's the best mode for such a task, so I also tried in ask mode right after. Ask mode unfortunately failed for 2 attempts (but I really don't get this often!), but still it recovered.
I don't know what's going on. I could understand if you've had enough of testing, but if you ever wanna try a common example repo we agree on, I'd be happy to. Let me know
I've tried so many different jinja templates and nothing works for me either. Using latest llama server builds for Windows. I wonder if it's these quants.
@biship Proper tool calling is not yet implemented in llama.cpp, see https://github.com/ggml-org/llama.cpp/pull/15162
So weird we have such a different experience with the same params. @qingy2024 thx, I didn't notice and wasn't following those ones, so it seems you are "all not alone" lol, I am now.
I apologize for the false hope...
Past 30k tokens, misbehavior in agentic use (roo-code) is still a thing! For example it starts to write reasoning traces as comments in generated code, making context size explode from there...
My understanding (correct me if I'm wrong) is that Roo, Cline and Kilo don't use the native function calling, and don't include the tools in the chat message structure. So the Jinja template changes won't affect these. Roo uses its own custom format with in-context instructions and inlines the tool descriptions into the user message content, completely bypassing both the parsing and rendering of the tools and tool calls. The reason it is failing is because this model REALLY wants to use the XML-ish syntax it was trained on, and there aren't enough in-context-learning examples to elicit Roo/Cline/Kilo's custom tool calling syntax. Stronger models like Claude don't mind being told to use a different tool calling syntax and having the parsing in user-space. I think there's no amount of Jinja template tweaking which will fix this for Roo.
EDIT: To verify this, I downloaded the Ollama version of this model, which has no tool calling support in the template, and added the following to Roo's "Custom Instructions for all Modes" section:
# === **CRITICAL REMINDER** ===
# Tool Use Formatting
Tool uses are formatted using XML-style tags. The tool name itself becomes the XML tag name. Each parameter is enclosed within its own set of tags. Here's the structure:
<actual_tool_name>
<parameter1_name>value1</parameter1_name>
<parameter2_name>value2</parameter2_name>
...
</actual_tool_name>
For example, to use the new_task tool:
<new_task>
<mode>code</mode>
<message>Implement a new feature for the application.</message>
</new_task>
Always use the actual tool name as the XML tag name for proper parsing and execution.
Which is just a copy-paste of the tool calling instructions from the system prompt (with some urgency formatting). I haven't tested it extensively, but it immediately started calling tools properly, whereas previously it was failing. That Ollama template doesn't even include tool rendering, and still the model can do it "in user space". I'm guessing that despite these instructions appearing in the system prompt, there's a "lost-in-the-middle" problem which is solved like this because Roo will append the custom instructions section to the bottom of the prompt (if it's still not working for you, try repeating that snippet 3x or so).
I found this worked on and off, but randomly apply_diff and some other Roo Code tools wouldn't run. But I think that might be fixed with a newly released Roo Code version (https://docs.roocode.com/update-notes/v3.25.16), specifically this commit: https://github.com/RooCodeInc/Roo-Code/pull/7108. I've only played with it for about 30 minutes tonight but I didn't encounter any errors. I'll feel more confident that it's fixed after I spend more time playing with it tomorrow. Hope y'all have similar results!
I ended up adding a 230+ line custom rule https://gist.github.com/richardhundt/820d782ccd49a7c4d8c6d52ab74d377d to reinforce the tool calling syntax. I found that Q5_K_XL and Q6_K_XL worked noticeably better than the Q4_K_XL quant. Either way, this is more of an instruction-following issue with Roo than a lack of support for this model's quirky native function calling syntax, since that isn't actually being used.