Vocab missing tool-related strings in chat template, poor performance with tools
I notice that none of the tool-related strings in the chat template at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer_config.json#L34 (`<|tool▁calls▁begin|>`, `<|tool▁sep|>`, `<|tool▁outputs▁begin|>`, `<|tool▁output▁begin|>`, etc.) are actually in the vocab of this model's tokenizer at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B/blob/main/tokenizer.json.
However, I see that they are in the tokenizer for the main R1-0528 model at https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/raw/main/tokenizer.json.
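For anyone who wants to verify this themselves, here's a minimal sketch (the helper name is mine, and it assumes a locally downloaded `tokenizer.json`) that reports which template strings are absent from a vocab:

```python
import json

# Tool-related strings referenced by the chat template (from the list above).
TOOL_TOKENS = [
    "<|tool▁calls▁begin|>",
    "<|tool▁sep|>",
    "<|tool▁outputs▁begin|>",
    "<|tool▁output▁begin|>",
]

def missing_from_vocab(vocab: dict, tokens=TOOL_TOKENS) -> list:
    """Return the template strings that are absent from a tokenizer vocab dict."""
    return [t for t in tokens if t not in vocab]

# Usage against a downloaded tokenizer.json (path is hypothetical):
# with open("tokenizer.json") as f:
#     data = json.load(f)
# vocab = data["model"]["vocab"]
# # special tokens usually live in added_tokens, so merge those in too:
# vocab |= {t["content"]: t["id"] for t in data.get("added_tokens", [])}
# print(missing_from_vocab(vocab))
```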
I also notice, when inferencing with llama.cpp, that this distilled model doesn't seem to properly acknowledge the template-formatted `...<|tool▁outputs▁end|><|tool▁outputs▁end|>` and continue its response: it seems to try to go back to thinking, or outputs a stray `>` character, or shows other odd behaviors.
This leads me to the questions:
- Is this distilled model actually trained for tool use?
- Either way, is the tools section of the chat template correct for this distilled model?
@mattjcly If it helps, I just added native tool calling - see https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF/discussions/7
@danielhanchen Shouldn't the tokens be in the tokenizer, though? It seems strange that they're omitted. In the previous DeepSeek distills (e.g. Llama 70B) they're there.
Actually, never mind: I checked the old tokenizers and they're not there either. For some reason my implementation of the model is having a hard time reliably sampling those multibyte underscore-like characters, and I can't figure out why. The tool call output ends up looking like this:
<|tool▁calls▁begin|>
<|tool▁callbegin|>
function
<|toolsep|
weather_search
```json
{"location": "San Francisco"}
<|toolcallend|>
<|toolcallsend|><|end▁of▁sentence|>
```
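For reference on why byte-level handling can mangle these markers: the `▁` in them is not an ASCII underscore but U+2581 (LOWER ONE EIGHTH BLOCK), which encodes to three bytes in UTF-8, so anything that truncates or mis-samples mid-character drops it entirely. A quick check:

```python
marker = "<|tool▁sep|>"

# "▁" is U+2581, not "_" (U+005F); it takes three bytes in UTF-8.
assert "▁" != "_"
print("▁".encode("utf-8"))  # the three-byte UTF-8 sequence
print(len(marker), len(marker.encode("utf-8")))  # 12 characters, 14 bytes
```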
To anyone who comes across this while searching for an answer to the same problem: just make sure you compute your RoPE frequencies in f32 :)
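To illustrate why the precision matters (an illustrative NumPy sketch, not llama.cpp's actual code; the function name is mine): RoPE's inverse frequencies are `base^(-2i/d)`, and rounding them to half precision shifts the rotation angle `pos * freq` by whole radians at long positions, which scrambles the rotary embedding:

```python
import numpy as np

def rope_inv_freqs(head_dim, base=10000.0, dtype=np.float32):
    # Standard RoPE inverse frequencies: base^(-2i/d) for i = 0, 2, 4, ...
    i = np.arange(0, head_dim, 2).astype(dtype)
    return dtype(base) ** (-(i / dtype(head_dim)))

f32 = rope_inv_freqs(128, dtype=np.float32)
f16 = rope_inv_freqs(128, dtype=np.float16).astype(np.float32)

# Rotation angle at a long position: the f16-rounded frequencies drift
# by multiple radians, even though the frequencies themselves look close.
pos = 40000
angle_err = np.abs(pos * f32 - pos * f16)
print(angle_err.max())
```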