Kimi-K2 Open-Source Model Tool Call Output Format Anomaly: Non-Standard tool_call_id Triggers Parsing Failures Compared with the Official Model

#48
by liopen - opened

When deploying the Kimi-K2 model with SGLang, the model's tool-call output becomes unstable after multiple rounds of tool invocation. For example:
{
  "id": "82874411f6fe4051ba2aa5a5fcf22075",
  "object": "chat.completion",
  "created": 1754875124,
  "model": "Kimi-K2-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Now let me create the API testing framework. First, I'll create the project structure:<|tool_calls_section_begin|><|tool_call_begin|>call_59adf5614cfe4f4b8a71be54<|tool_call_argument_begin|>{"file_path": "C:\\Users\\api-testing-framework\\requirements.txt", "content": "requests>=2.28.0\npytest>=7.0.0\npytest-html>=4.0.0\npytest-xdist>=3.0.0\npyyaml>=6.0\njsonschema>=4.0.0\nfaker>=15.0.0\npython-dotenv>=1.0.0\njinja2>=3.1.0"}<|tool_call_end|><|tool_calls_section_end|>",
        "reasoning_content": null,
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "tool_calls",
      "matched_stop": null
    }
  ],
  "usage": {
    "prompt_tokens": 24639,
    "total_tokens": 24782,
    "completion_tokens": 143,
    "prompt_tokens_details": null
  }
}
In the output above, call_59adf5614cfe4f4b8a71be54 appears as the tool_call_id instead of following the standard functions.{func_name}:{index} format, which causes parsing failures. Compared with the official Kimi-K2-0711-preview model, the open-source model shows a significant gap in tool-calling reliability, with the official model being notably more accurate.
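For reference, a conforming tool-call span would look roughly like this (the tool name write_file and the index value are illustrative, since the actual tool schema is not shown in this thread):

<|tool_calls_section_begin|><|tool_call_begin|>functions.write_file:1<|tool_call_argument_begin|>{"file_path": "C:\\Users\\api-testing-framework\\requirements.txt", "content": "..."}<|tool_call_end|><|tool_calls_section_end|>

One workaround is to have the chat template reconstruct the id in this format when it renders earlier tool calls back into the prompt, instead of passing through whatever id the client sends back: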

{%- for tool_call in message['tool_calls'] -%}
<|tool_call_begin|>functions.{{ tool_call['function']['name'] }}:{{ loop.index }}<|tool_call_argument_begin|>{% if tool_call['function']['arguments'] is string %}{{ tool_call['function']['arguments'] }}{% else %}{{ tool_call['function']['arguments'] | tojson }}{% endif %}<|tool_call_end|>
{%- endfor -%}

Applying the change above to the chat_template.jinja file resolves the issue.
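To check what the modified template renders, a minimal sketch along these lines can be used (the repository name, the write_file tool, and trust_remote_code are assumptions about the local setup, not details from this thread):

from transformers import AutoTokenizer

# Load the tokenizer that carries the (modified) chat template.
# The Kimi-K2 tokenizer may require trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(
    "moonshotai/Kimi-K2-Instruct", trust_remote_code=True
)

# A conversation containing one earlier assistant tool call, with an
# OpenAI-style id as a client would send it back.
messages = [
    {"role": "user", "content": "Create the project structure."},
    {
        "role": "assistant",
        "content": "Creating requirements.txt:",
        "tool_calls": [
            {
                "id": "call_59adf5614cfe4f4b8a71be54",
                "type": "function",
                "function": {
                    "name": "write_file",  # hypothetical tool name
                    "arguments": "{\"file_path\": \"requirements.txt\", \"content\": \"requests>=2.28.0\"}",
                },
            }
        ],
    },
]

rendered = tokenizer.apply_chat_template(messages, tokenize=False)

# With the modified template, the rendered history should contain
# functions.write_file:<index> between <|tool_call_begin|> and
# <|tool_call_argument_begin|>, regardless of the id the client sent back.
print(rendered)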

Moonshot AI org

@liopen I believe the model does return the correct tool_id, but SGLang fails to capture it. That said, manually constructing the tool_id in the chat template could be a workaround. Just remember that the tool_id index must be global across the entire conversation: loop through every message and every tool call within each message. We should consider applying this fix to prevent similar issues. @bigeagle

Moonshot AI org

Yes, the index term in tool_call_id is a self-incrementing counter scoped to the entire conversation, as described in the tech report.

(screenshot attached: CleanShot 2025-08-12 at 15.56.44.png)

I'm not sure whether SGLang does any special processing on the tool call id.

Hi @bigmoyan and @bigeagle, I also encountered this issue in multi-turn tool calling with a long tool context (~25 tool descriptions) using SGLang.

The main issue is that K2 fails to return the tool name (or function name) it wants to call, as shown in the first comment from @liopen.

"content": "Now let me create the API testing framework. First, I'll create the project structure:<|tool_calls_section_begin|><|tool_call_begin|>call_59adf5614cfe4f4b8a71be54<|tool_call_argument_begin|>{"file_path": "C:\\Users\\api-testing-framework\\requirements.txt", "content": "requests>=2.28.0\npytest>=7.0.0\npytest-html>=4.0.0\npytest-xdist>=3.0.0\npyyaml>=6.0\njsonschema>=4.0.0\nfaker>=15.0.0\npython-dotenv>=1.0.0\njinja2>=3.1.0"}<|tool_call_end|><|tool_calls_section_end|>"

You can see that between <|tool_call_begin|> and <|tool_call_argument_begin|>, K2 outputs a tool call id (call_59adf5614cfe4f4b8a71be54), whereas the tool name is expected there in the format functions.{func_name}:{index}. From this response, the client cannot tell which tool K2 wants to use. Moreover, this tool call id is fabricated; it does not reference any previous tool call id.

SGLang actually uses a regex to parse the output, and it does not recognize the tool call pattern when a tool call id appears between <|tool_call_begin|> and <|tool_call_argument_begin|>. As a result, the raw <|tool_calls_section_begin|> tokens leak into the final output text.
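For illustration only (this is not SGLang's actual regex), a parser that expects the functions.{func_name}:{index} form between those markers will simply not match the id-style output:

import re

# Sketch of a pattern expecting functions.{func_name}:{index} before the
# argument marker; the index is assumed to be numeric.
pattern = re.compile(
    r"<\|tool_call_begin\|>functions\.(?P<name>[\w\.]+):(?P<index>\d+)"
    r"<\|tool_call_argument_begin\|>(?P<args>.*?)<\|tool_call_end\|>",
    re.DOTALL,
)

expected = "<|tool_call_begin|>functions.write_file:1<|tool_call_argument_begin|>{}<|tool_call_end|>"
observed = "<|tool_call_begin|>call_59adf5614cfe4f4b8a71be54<|tool_call_argument_begin|>{}<|tool_call_end|>"

print(bool(pattern.search(expected)))  # True  -> parsed as a tool call
print(bool(pattern.search(observed)))  # False -> special tokens leak into content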

I'm trying the modified chat_template.jinja file to see whether the issue can be mitigated.

Hi @bigmoyan and @bigeagle, I verified that the modified chat_template.jinja resolves the issue. I've sent a PR. Could you check whether it can be merged? Thanks.

Moonshot AI org

@AdvancedMage big thanks for the PR, but using loop.index as the counter is incorrect (it's a message-level counter, while the correct one is conversation-level).
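To spell out the difference, here is a rough Python sketch (not the actual template logic; the counter's starting value is shown as 1 purely for illustration):

def message_level_ids(messages):
    # What loop.index yields: the counter restarts inside every message.
    ids = []
    for msg in messages:
        for i, call in enumerate(msg.get("tool_calls") or [], start=1):
            ids.append(f"functions.{call['function']['name']}:{i}")
    return ids

def conversation_level_ids(messages):
    # What the format expects: one counter across the whole conversation.
    ids = []
    counter = 1  # illustrative starting value
    for msg in messages:
        for call in msg.get("tool_calls") or []:
            ids.append(f"functions.{call['function']['name']}:{counter}")
            counter += 1
    return ids

conversation = [
    {"role": "assistant", "tool_calls": [{"function": {"name": "write_file"}}]},
    {"role": "tool", "content": "ok"},
    {"role": "assistant", "tool_calls": [{"function": {"name": "write_file"}}]},
]
print(message_level_ids(conversation))       # ['functions.write_file:1', 'functions.write_file:1']
print(conversation_level_ids(conversation))  # ['functions.write_file:1', 'functions.write_file:2']

In Jinja terms this means carrying a counter across the outer message loop, for example with a namespace variable, rather than relying on loop.index inside a single message.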

We are reaching out to the SGLang team to see if we can solve this issue.

@bigeagle do you have any news on this?
