Vocab missing tool-related strings in chat template, poor performance with tools
I notice that none of the tool-related strings referenced in the chat template at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer_config.json#L34 (`<|tool▁calls▁begin|>`, `<|tool▁sep|>`, `<|tool▁outputs▁begin|>`, `<|tool▁output▁begin|>`, etc.) actually appear in this model's tokenizer vocab at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer.json.
However, I see that they are in the tokenizer for the main R1-0528 model at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/raw/main/tokenizer.json.
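For reference, here is a minimal sketch of the check I ran. It scans a vocab mapping (loaded from a model's tokenizer.json) for the tool strings; the `toy_vocab` below is a stand-in with made-up IDs, not the real vocab, and `check_special_tokens` is a hypothetical helper name:

```python
import json

def check_special_tokens(vocab, tokens):
    """Return the subset of `tokens` missing from `vocab`."""
    return [t for t in tokens if t not in vocab]

TOOL_TOKENS = [
    "<|tool▁calls▁begin|>",
    "<|tool▁sep|>",
    "<|tool▁outputs▁begin|>",
    "<|tool▁output▁begin|>",
]

# In practice, load the real vocab from the downloaded tokenizer.json:
#   with open("tokenizer.json") as f:
#       vocab = json.load(f)["model"]["vocab"]
# Here, a toy vocab standing in for the distilled model's, which
# (per the observation above) lacks the tool strings entirely:
toy_vocab = {"<|User|>": 151644, "<|Assistant|>": 151645}

missing = check_special_tokens(toy_vocab, TOOL_TOKENS)
print(missing)  # all four tool tokens are absent from this toy vocab
```

Running the same check against the main R1-0528 tokenizer.json returns an empty list, since those strings do exist in its vocab.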
I also notice when running inference with llama.cpp that this distilled model doesn't properly acknowledge the template-formatted `...<|tool▁outputs▁end|><|tool▁outputs▁end|>` and continue its response: it tends to fall back into thinking, emit a stray `>` character, or show other odd behaviors.
This leads me to two questions:
- Was this distilled model actually trained for tool use?
- Either way, is the tools section of the chat template correct for this distilled model?