Kimi-K2 Open-Source Model Tool Call Output Format Anomaly: Non-Standard tool_call_id Triggers Parsing Failures Compared with the Official Model

#48
by liopen - opened

When deploying the Kimi-K2 model with SGLang, the model's tool-call output becomes unstable after multiple rounds of tool invocation. For example:
{
  "id": "82874411f6fe4051ba2aa5a5fcf22075",
  "object": "chat.completion",
  "created": 1754875124,
  "model": "Kimi-K2-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Now let me create the API testing framework. First, I'll create the project structure:<|tool_calls_section_begin|><|tool_call_begin|>call_59adf5614cfe4f4b8a71be54<|tool_call_argument_begin|>{"file_path": "C:\\Users\\api-testing-framework\\requirements.txt", "content": "requests>=2.28.0\npytest>=7.0.0\npytest-html>=4.0.0\npytest-xdist>=3.0.0\npyyaml>=6.0\njsonschema>=4.0.0\nfaker>=15.0.0\npython-dotenv>=1.0.0\njinja2>=3.1.0"}<|tool_call_end|><|tool_calls_section_end|>",
        "reasoning_content": null,
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "matched_stop": null
    }
  ],
  "usage": {
    "prompt_tokens": 24639,
    "total_tokens": 24782,
    "completion_tokens": 143,
    "prompt_tokens_details": null
  }
}
In the output above, call_59adf5614cfe4f4b8a71be54 appears as the tool_call_id instead of following the standard functions.{func_name}:{index} format, which causes parsing failures. Compared with the official Kimi-K2-0711-preview model, the open-source model shows a significant gap in tool-calling reliability, with the official model being notably more accurate.
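For reference, a conforming tool-call span would look roughly like this (the tool name write_file and the index value are illustrative, since the actual tool schema is not shown in this thread):

<|tool_calls_section_begin|><|tool_call_begin|>functions.write_file:1<|tool_call_argument_begin|>{"file_path": "C:\\Users\\api-testing-framework\\requirements.txt", "content": "..."}<|tool_call_end|><|tool_calls_section_end|>

One workaround is to have the chat template reconstruct the id in this format when it renders earlier tool calls back into the prompt, instead of passing through whatever id the client sends back: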

{%- for tool_call in message['tool_calls'] -%}
<|tool_call_begin|>functions.{{ tool_call['function']['name'] }}:{{ loop.index }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
{%- endfor -%}

Applying the change above to the chat_template.jinja file resolves the issue.
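To check what the modified template renders, a minimal sketch along these lines can be used (the repository name, the write_file tool, and trust_remote_code are assumptions about the local setup, not details from this thread):

from transformers import AutoTokenizer

# Load the tokenizer that carries the (modified) chat template.
# The Kimi-K2 tokenizer may require trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "moonshotai/Kimi-K2-Instruct", trust_remote_code=True
)

# A conversation containing one earlier assistant tool call, with an
# OpenAI-style id as a client would send it back.
messages = [
    {"role": "user", "content": "Create the project structure."},
    {
        "role": "assistant",
        "content": "Creating requirements.txt:",
        "tool_calls": [
            {
                "id": "call_59adf5614cfe4f4b8a71be54",
                "type": "function",
                "function": {
                    "name": "write_file",  # hypothetical tool name
                    "arguments": "{\"file_path\": \"requirements.txt\", \"content\": \"requests>=2.28.0\"}",
                },
            }
        ],
    },
]

rendered = tokenizer.apply_chat_template(messages, tokenize=False)

# With the modified template, the rendered history should contain
# functions.write_file:<index> between <|tool_call_begin|> and
# <|tool_call_argument_begin|>, regardless of the id the client sent back.
print(rendered)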

Moonshot AI org

@liopen I believe the model does return the correct tool_id, but SGLang fails to capture it. That said, manually constructing the tool_id in the chat template could be a workaround. Just remember that the tool_id index must be global across the entire conversation: loop through every message and every tool call within each message. We should consider applying this fix to prevent similar issues. @bigeagle

Moonshot AI org

Yes, the index term in tool_call_id is a self-incrementing counter scoped to the entire conversation, as described in the tech report.

(screenshot attached: CleanShot 2025-08-12 at 15.56.44.png)

I'm not sure whether SGLang does any special processing on the tool call id.

Hi @bigmoyan and @bigeagle, I also encountered this issue in multi-turn tool calling with a long tool context (~25 tool descriptions) using SGLang.

The main issue is that K2 fails to return the tool name (or function name) it wants to call, as shown in the first comment from @liopen.

"content": "Now let me create the API testing framework. First, I'll create the project structure:<|tool_calls_section_begin|><|tool_call_begin|>call_59adf5614cfe4f4b8a71be54<|tool_call_argument_begin|>{"file_path": "C:\\Users\\api-testing-framework\\requirements.txt", "content": "requests>=2.28.0\npytest>=7.0.0\npytest-html>=4.0.0\npytest-xdist>=3.0.0\npyyaml>=6.0\njsonschema>=4.0.0\nfaker>=15.0.0\npython-dotenv>=1.0.0\njinja2>=3.1.0"}<|tool_call_end|><|tool_calls_section_end|>"

You can see that between <|tool_call_begin|> and <|tool_call_argument_begin|>, K2 outputs a tool call id (call_59adf5614cfe4f4b8a71be54), whereas the tool name is expected there in the format functions.{func_name}:{index}. From this response, the client cannot tell which tool K2 wants to use. Moreover, this tool call id is fabricated; it does not reference any previous tool call id.

SGLang actually uses a regex to parse the output, and it does not recognize the tool call pattern when a tool call id appears between <|tool_call_begin|> and <|tool_call_argument_begin|>. As a result, the raw <|tool_calls_section_begin|> tokens leak into the final output text.
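For illustration only (this is not SGLang's actual regex), a parser that expects the functions.{func_name}:{index} form between those markers will simply not match the id-style output:

import re

# Sketch of a pattern expecting functions.{func_name}:{index} before the
# argument marker; the index is assumed to be numeric.
pattern = re.compile(
    r"<\|tool_call_begin\|>functions\.(?P<name>[\w\.]+):(?P<index>\d+)"
    r"<\|tool_call_argument_begin\|>(?P<args>.*?)<\|tool_call_end\|>",
    re.DOTALL,
)

expected = "<|tool_call_begin|>functions.write_file:1<|tool_call_argument_begin|>{}<|tool_call_end|>"
observed = "<|tool_call_begin|>call_59adf5614cfe4f4b8a71be54<|tool_call_argument_begin|>{}<|tool_call_end|>"

print(bool(pattern.search(expected)))  # True  -> parsed as a tool call
print(bool(pattern.search(observed)))  # False -> special tokens leak into content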

I'm trying the modified chat_template.jinja file to see whether the issue can be mitigated.

Hi @bigmoyan and @bigeagle, I verified that the modified chat_template.jinja resolves the issue. I've sent a PR. Could you check whether it can be merged? Thanks.

Moonshot AI org

@AdvancedMage big thanks for the PR, but using loop.index as the counter is incorrect (it's a message-level counter, while the correct one is conversation-level).
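To spell out the difference, here is a rough Python sketch (not the actual template logic; the counter's starting value is shown as 1 purely for illustration):

def message_level_ids(messages):
    # What loop.index yields: the counter restarts inside every message.
    ids = []
    for msg in messages:
        for i, call in enumerate(msg.get("tool_calls") or [], start=1):
            ids.append(f"functions.{call['function']['name']}:{i}")
    return ids

def conversation_level_ids(messages):
    # What the format expects: one counter across the whole conversation.
    ids = []
    counter = 1  # illustrative starting value
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            ids.append(f"functions.{call['function']['name']}:{counter}")
            counter += 1
    return ids

conversation = [
    {"role": "assistant", "tool_calls": [{"function": {"name": "write_file"}}]},
    {"role": "tool", "content": "ok"},
    {"role": "assistant", "tool_calls": [{"function": {"name": "write_file"}}]},
]
print(message_level_ids(conversation))       # ['functions.write_file:1', 'functions.write_file:1']
print(conversation_level_ids(conversation))  # ['functions.write_file:1', 'functions.write_file:2']

In Jinja terms this means carrying a counter across the outer message loop, for example with a namespace variable, rather than relying on loop.index inside a single message.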

We are reaching out to the SGLang team to see if we can solve this issue.

@bigeagle do you have any news on this?
