Update - Tool Calling + Chat Template bug fixes

#20
by danielhanchen - opened
Unsloth AI org

Just updated DeepSeek-R1-0528 GGUFs and BF16 safetensors (the big 671B model)

  1. Native tool calling is now supported. Uses https://github.com/sgl-project/sglang/pull/6765 and https://github.com/vllm-project/vllm/pull/18874, which show DeepSeek-R1 scoring 93.25% on the BFCL (Berkeley Function-Calling Leaderboard): https://gorilla.cs.berkeley.edu/leaderboard.html.
    Use it via --jinja in llama.cpp. Native transformers and vLLM should work as well.
    Had to fix multiple issues in the SGLang and vLLM PRs (dangling newlines, etc.).
  2. Chat template bug fixes: add_generation_prompt now works - previously <|Assistant|> was auto-appended; now it's toggleable. This fixes many issues and should streamline chat sessions.
  3. UTF-8 encoding of tokenizer_config.json is now fixed - it now works on Windows.
  4. Fixed Ollama using more memory - I removed num_ctx and num_predict, so it now falls back to Ollama's defaults. Those settings allocated a larger KV cache, spiking VRAM usage. Please set your context length manually.
  5. [10th June 2025] Update - LM Studio now also works
  6. Ollama now works using the TQ1_0 quant:
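For anyone wanting to try item 1: a rough sketch of the OpenAI-style tool-calling request you'd send to a llama-server started with --jinja. The get_weather tool and its schema are made-up placeholders for illustration, not anything shipped with the model:

```python
import json

# Sketch of a tool-calling request body for llama.cpp's OpenAI-compatible
# server (llama-server --jinja ...). Tool name/schema below are hypothetical.
payload = {
    "model": "DeepSeek-R1-0528",
    "messages": [
        {"role": "user", "content": "What's the weather in Berlin?"}
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",  # hypothetical example tool
                "description": "Look up current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(payload, indent=2))
```

You'd POST this to the server's /v1/chat/completions endpoint; the model then decides whether to emit a tool call.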

ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0
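To illustrate what the add_generation_prompt fix in item 2 toggles, here's a stripped-down Python sketch. The real Jinja template is far more involved; only the <|User|>/<|Assistant|> role tokens here come from the actual model:

```python
# Minimal sketch of the add_generation_prompt behaviour, not the real template.
def render(messages, add_generation_prompt=False):
    out = ""
    for m in messages:
        tag = "<|User|>" if m["role"] == "user" else "<|Assistant|>"
        out += f"{tag}{m['content']}"
    if add_generation_prompt:
        # Previously this was appended unconditionally; now it's opt-in.
        out += "<|Assistant|>"
    return out

msgs = [{"role": "user", "content": "Hi"}]
print(render(msgs, add_generation_prompt=True))   # <|User|>Hi<|Assistant|>
print(render(msgs, add_generation_prompt=False))  # <|User|>Hi
```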

Please re-download all weights to get the latest updates!
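Regarding item 3, a small Python sketch of the Windows pitfall: without an explicit encoding, open() falls back to the locale codec (often cp1252 on Windows), which can mangle or reject the non-ASCII tokens in tokenizer_config.json. Passing encoding="utf-8" round-trips cleanly:

```python
import json
import os
import tempfile

# DeepSeek's special tokens contain non-ASCII characters (U+2581 "▁").
cfg = {"bos_token": "<|begin▁of▁sentence|>"}

path = os.path.join(tempfile.mkdtemp(), "tokenizer_config.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, ensure_ascii=False)

# Explicit utf-8 on read avoids the platform-dependent locale default.
with open(path, encoding="utf-8") as f:
    assert json.load(f)["bos_token"] == cfg["bos_token"]
```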

danielhanchen pinned discussion

What is 3. about? I think I can ignore all the other ones and not re-download.

Why was UD-Q2_XL deleted? Is UD-IQ2_M better?

Unsloth AI org

What is 3. about? I think I can ignore all the other ones and not re-download.

It's not that important

Why was UD-Q2_XL deleted? Is UD-IQ2_M better?

Oh crap you're right, it was never supposed to be deleted lol thanks for the warning

Unsloth AI org

I also noticed Q8_0 was gone!! I'll redo Q8_0 and Q2_K_XL

Unsloth AI org

@ciprianv Q2_K_XL and Q8_0 are back - unsure why they got removed, sorry!

Thank you!

Why is DeepSeek-R1-0528-UD-IQ2_M-00001-of-00005.gguf much newer than the rest of its parts? Are all the files updated (as mentioned above), or just the first one?
