Update - Tool Calling + Chat Template bug fixes
#20 · pinned · opened by danielhanchen
Just updated DeepSeek-R1-0528 GGUFs and BF16 safetensors (the big 671B model):

- Native tool calling is now supported. It uses https://github.com/sgl-project/sglang/pull/6765 and https://github.com/vllm-project/vllm/pull/18874, where DeepSeek-R1 gets 93.25% on the BFCL (Berkeley Function-Calling Leaderboard): https://gorilla.cs.berkeley.edu/leaderboard.html. Use it via `--jinja` in llama.cpp. Native transformers and vLLM should work as well.
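For context, with `--jinja` enabled you can send a standard OpenAI-style tool-calling request to llama.cpp's `llama-server` (`/v1/chat/completions`). A minimal sketch of such a request body, assuming a local server; the `get_weather` tool is a made-up example, not part of the model or llama.cpp:

```python
import json

def build_tool_request(prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload with one example tool.

    The tool schema (get_weather) is hypothetical - swap in your own.
    POST this to llama-server's /v1/chat/completions endpoint.
    """
    return {
        "model": "DeepSeek-R1-0528",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Get the current weather for a city",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
    }

payload = build_tool_request("What's the weather in Tokyo?")
print(json.dumps(payload, indent=2))
```

With the native template active, the model's tool calls come back in the response's `tool_calls` field rather than as raw text.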
I had to fix multiple issues in SGLang's and vLLM's PRs (dangling newlines etc.).
- Chat template bug fixes: `add_generation_prompt` now works. Previously `<|Assistant|>` was always auto-appended; now it's toggle-able. This fixes many issues and should streamline chat sessions.
- UTF-8 encoding of `tokenizer_config.json` is now fixed, so it now works on Windows.
- Ollama using extra memory is now fixed: I removed `num_ctx` and `num_predict`, so it falls back to Ollama's defaults. Those settings allocated extra KV cache VRAM, spiking VRAM usage. Please set your context length manually.
- [10th June 2025] Update: LM Studio now also works.
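The `add_generation_prompt` toggle above works like this - a minimal sketch using a toy template (NOT the real DeepSeek-R1 chat template, just the same mechanism that `tokenizer.apply_chat_template` exposes):

```python
def apply_toy_chat_template(messages, add_generation_prompt=False):
    """Toy stand-in for tokenizer.apply_chat_template.

    The <|Role|> formatting here is a simplified illustration, not the
    actual DeepSeek-R1 template. The bug was that the trailing
    <|Assistant|> was always appended; now it only appears when
    add_generation_prompt=True (i.e. when you want the model to reply).
    """
    out = "".join(f"<|{m['role']}|>{m['content']}" for m in messages)
    if add_generation_prompt:
        out += "<|Assistant|>"  # cue the model to start its turn
    return out

msgs = [{"role": "User", "content": "Hi"}]
print(apply_toy_chat_template(msgs, add_generation_prompt=True))
# -> <|User|>Hi<|Assistant|>
print(apply_toy_chat_template(msgs))
# -> <|User|>Hi
```

Leaving the prompt off matters when you append the model's own completed turn back into the history, otherwise you get a dangling `<|Assistant|>` mid-conversation.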
- Ollama works by using the TQ1_0 quant:
  `ollama run hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0`
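Since `num_ctx` and `num_predict` were removed from the Modelfile, you can set them per-request instead. A sketch of a request body for Ollama's REST API (`POST http://localhost:11434/api/chat`); `num_ctx` and `num_predict` are real Ollama options, but the values below are just examples - pick what fits your VRAM:

```python
import json

def build_ollama_request(prompt: str, num_ctx: int = 8192) -> dict:
    """Build an Ollama /api/chat payload with an explicit context length.

    Larger num_ctx allocates more KV cache VRAM, which is exactly the
    spike the Modelfile change avoids by deferring to you.
    """
    return {
        "model": "hf.co/unsloth/DeepSeek-R1-0528-GGUF:TQ1_0",
        "messages": [{"role": "user", "content": prompt}],
        "options": {
            "num_ctx": num_ctx,    # context window size (example value)
            "num_predict": 2048,   # cap on generated tokens (example value)
        },
    }

print(json.dumps(build_ollama_request("Hello"), indent=2))
```

Alternatively, in an interactive `ollama run` session the same option can be set with `/set parameter num_ctx 8192`.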
Please re-download all weights to get the latest updates!
What is 3. about? I think I can ignore all the other ones and not re-download.
Why was UD-Q2_XL deleted? Is UD-IQ2_M better?
> What is 3. about? I think I can ignore all the other ones and not re-download.

It's not that important.
> Why was UD-Q2_XL deleted? Is UD-IQ2_M better?

Oh crap, you're right, it was never supposed to be deleted lol, thanks for the warning! I also noticed Q8_0 was gone!! I'll redo Q8_0 and Q2_K_XL.
Thank you!
Why is DeepSeek-R1-0528-UD-IQ2_M-00001-of-00005.gguf much newer than the rest of its parts? Are all the files updated (as mentioned above), or just the first one?