Vocab missing tool-related strings in chat template, poor performance with tools

#13
by mattjcly - opened

I notice that none of the tool-related strings used by the chat template at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer_config.json#L34 (<|tool▁calls▁begin|>, <|tool▁sep|>, <|tool▁outputs▁begin|>, <|tool▁output▁begin|>, etc.) actually appear in the vocab of this model's tokenizer at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer.json.

However, I see that they are in the tokenizer for the main R1-0528 model at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/raw/main/tokenizer.json.
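This is straightforward to verify programmatically. Below is a minimal sketch that loads a downloaded tokenizer.json and reports which tool strings are absent from the vocab; the local file path and the exact list of tool strings (the ones named above plus end-counterparts assumed to follow the same naming pattern) are my assumptions, not confirmed from the template:

```python
import json

# Tool-related strings referenced by the chat template (assumed list; the
# template names <|tool▁calls▁begin|>, <|tool▁sep|>, <|tool▁outputs▁begin|>,
# <|tool▁output▁begin|>, etc. -- the end-counterparts here are assumptions).
TOOL_STRINGS = [
    "<|tool▁calls▁begin|>",
    "<|tool▁calls▁end|>",
    "<|tool▁sep|>",
    "<|tool▁outputs▁begin|>",
    "<|tool▁outputs▁end|>",
    "<|tool▁output▁begin|>",
    "<|tool▁output▁end|>",
]

def missing_from_vocab(vocab: dict, tokens) -> list:
    """Return the tokens that have no entry in the vocab mapping."""
    return [t for t in tokens if t not in vocab]

def check_tokenizer_file(path: str) -> list:
    """Load a Hugging Face tokenizer.json and report missing tool strings.

    Checks both the base model vocab and any added_tokens entries,
    since special tokens often live in the latter.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    vocab = dict(data.get("model", {}).get("vocab", {}))
    for added in data.get("added_tokens", []):
        vocab[added["content"]] = added["id"]
    return missing_from_vocab(vocab, TOOL_STRINGS)
```

Running check_tokenizer_file against a local copy of each model's tokenizer.json should show the difference described above: the distilled model's file reporting the tool strings as missing, the main R1-0528 file reporting none.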

I also notice, when running inference with llama.cpp, that this distilled model doesn't seem to handle the template-formatted ...<|tool▁outputs▁end|><|tool▁outputs▁end|> properly when continuing its response: it tends to go back into thinking, emit a stray > character, or show other odd behaviors.

This leads me to two questions:

  1. Was this distilled model actually trained for tool use?
  2. Either way, is the tools section of the chat template correct for this distilled model?
