Vocab missing tool-related strings in chat template, poor performance with tools
I notice that none of the tool-related strings in the chat template at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer_config.json#L34 (`<|tool▁calls▁begin|>`, `<|tool▁sep|>`, `<|tool▁outputs▁begin|>`, `<|tool▁output▁begin|>`, etc.) are actually in the vocab of this model's tokenizer at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer.json.
However, I see that they are in the tokenizer for the main R1-0528 model at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/raw/main/tokenizer.json.
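For anyone who wants to verify this themselves, here's a minimal sketch (the helper name is mine, and it assumes a locally downloaded `tokenizer.json`) that reports which template strings are absent from a vocab:

```python
import json

# Tool-related strings referenced by the chat template (from the list above).
TOOL_TOKENS = [
    "<|tool▁calls▁begin|>",
    "<|tool▁sep|>",
    "<|tool▁outputs▁begin|>",
    "<|tool▁output▁begin|>",
]

def missing_from_vocab(vocab: dict, tokens=TOOL_TOKENS) -> list:
    """Return the template strings that are absent from a tokenizer vocab dict."""
    return [t for t in tokens if t not in vocab]

# Usage against a downloaded tokenizer.json (path is hypothetical):
# with open("tokenizer.json") as f:
#     data = json.load(f)
# vocab = data["model"]["vocab"]
# # special tokens usually live in added_tokens, so merge those in too:
# vocab |= {t["content"]: t["id"] for t in data.get("added_tokens", [])}
# print(missing_from_vocab(vocab))
```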
I also notice, when inferencing with llama.cpp, that this distilled model doesn't seem to properly acknowledge the template-formatted `...<|tool▁outputs▁end|><|tool▁outputs▁end|>` and continue its response: it seems to try to go back to thinking, or outputs a stray `>` character, or shows other odd behaviors.
This leads me to the questions:
- Is this distilled model actually trained for tool use?
- Either way, is the tools section of the chat template correct for this distilled model?
@mattjcly If it helps, I just added native tool calling - see https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF/discussions/7
@danielhanchen Shouldn't the tokens be in the tokenizer, though? It seems strange that they're omitted. In the previous DeepSeek distills (e.g. Llama 70B) they're there.
Actually, never mind: I checked the old tokenizers and they're not there either. For some reason my implementation of the model is having a hard time reliably sampling those multibyte underscore-like characters, and I can't figure out why. The tool call output ends up looking like this:
<|tool▁calls▁begin|>
<|tool▁callbegin|>
function
<|toolsep|
weather_search
```json
{"location": "San Francisco"}
<|toolcallend|>
<|toolcallsend|><|end▁of▁sentence|>
```
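For reference on why byte-level handling can mangle these markers: the `▁` in them is not an ASCII underscore but U+2581 (LOWER ONE EIGHTH BLOCK), which encodes to three bytes in UTF-8, so anything that truncates or mis-samples mid-character drops it entirely. A quick check:

```python
marker = "<|tool▁sep|>"

# "▁" is U+2581, not "_" (U+005F); it takes three bytes in UTF-8.
assert "▁" != "_"
print("▁".encode("utf-8"))  # the three-byte UTF-8 sequence
print(len(marker), len(marker.encode("utf-8")))  # 12 characters, 14 bytes
```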
To anyone who comes across this while searching for an answer to the same problem: just make sure you compute your RoPE frequencies in f32 :)
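To illustrate why the precision matters (an illustrative NumPy sketch, not llama.cpp's actual code; the function name is mine): RoPE's inverse frequencies are `base^(-2i/d)`, and rounding them to half precision shifts the rotation angle `pos * freq` by whole radians at long positions, which scrambles the rotary embedding:

```python
import numpy as np

def rope_inv_freqs(head_dim, base=10000.0, dtype=np.float32):
    # Standard RoPE inverse frequencies: base^(-2i/d) for i = 0, 2, 4, ...
    i = np.arange(0, head_dim, 2).astype(dtype)
    return dtype(base) ** (-(i / dtype(head_dim)))

f32 = rope_inv_freqs(128, dtype=np.float32)
f16 = rope_inv_freqs(128, dtype=np.float16).astype(np.float32)

# Rotation angle at a long position: the f16-rounded frequencies drift
# by multiple radians, even though the frequencies themselves look close.
pos = 40000
angle_err = np.abs(pos * f32 - pos * f16)
print(angle_err.max())
```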