File size: 12,496 Bytes
eb6951f 5b44356 eb6951f 676949c d299dc3 b31dbeb d299dc3 676949c eb6951f |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 |
---
license: mit
library_name: transformers
base_model:
- deepseek-ai/DeepSeek-V3.1
tags:
- deepseek
- deepseek_v3
- unsloth
---
<div>
<p style="margin-bottom: 0; margin-top: 0;">
<strong>Learn how to run DeepSeek-V3.1 correctly - <a href="https://docs.unsloth.ai/basics/deepseek-v3.1">Read our Guide</a>.</strong>
</p>
<p style="margin-top: 0;margin-bottom: 0;">
<em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
</p>
<div style="display: flex; gap: 5px; align-items: center; ">
<a href="https://github.com/unslothai/unsloth/">
<img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
</a>
<a href="https://discord.gg/unsloth">
<img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
</a>
<a href="https://docs.unsloth.ai/basics/deepseek-v3.1-how-to-run-locally">
<img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
</a>
</div>
<h1 style="margin-top: 0rem;">🐋 DeepSeek-V3.1 Usage Guidelines</h1>
</div>
These quants include our Unsloth chat template fixes, specifically for llama.cpp supported backends.
- Set the temperature **~0.6** (recommended) and Top_P value of **0.95** (recommended)
- UD-Q2_K_XL (247GB) is recommended
- For complete detailed instructions, see our guide: [unsloth.ai/blog/deepseek-v3.1](https://docs.unsloth.ai/basics/deepseek-v3.1)
<br>
# DeepSeek-V3.1
<!-- markdownlint-disable first-line-h1 -->
<!-- markdownlint-disable html -->
<!-- markdownlint-disable no-duplicate-header -->
<div align="center">
<img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
</div>
<hr>
<div align="center" style="line-height: 1;">
<a href="https://www.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Homepage" src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/badge.svg?raw=true" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://chat.deepseek.com/" target="_blank" style="margin: 2px;">
<img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-DeepSeek%20V3-536af5?color=536af5&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://huggingface.co/deepseek-ai" target="_blank" style="margin: 2px;">
<img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-DeepSeek%20AI-ffc107?color=ffc107&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="https://discord.gg/Tc7c45Zzu5" target="_blank" style="margin: 2px;">
<img alt="Discord" src="https://img.shields.io/badge/Discord-DeepSeek%20AI-7289da?logo=discord&logoColor=white&color=7289da" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/qr.jpeg?raw=true" target="_blank" style="margin: 2px;">
<img alt="Wechat" src="https://img.shields.io/badge/WeChat-DeepSeek%20AI-brightgreen?logo=wechat&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
<a href="https://twitter.com/deepseek_ai" target="_blank" style="margin: 2px;">
<img alt="Twitter Follow" src="https://img.shields.io/badge/Twitter-deepseek_ai-white?logo=x&logoColor=white" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
<div align="center" style="line-height: 1;">
<a href="LICENSE" style="margin: 2px;">
<img alt="License" src="https://img.shields.io/badge/License-MIT-f5de53?&color=f5de53" style="display: inline-block; vertical-align: middle;"/>
</a>
</div>
## Introduction
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. Compared to the previous version, this upgrade brings improvements in multiple aspects:
- **Hybrid thinking mode**: One model supports both thinking mode and non-thinking mode by changing the chat template.
- **Smarter tool calling**: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.
- **Higher thinking efficiency**: DeepSeek-V3.1-Think achieves comparable answer quality to DeepSeek-R1-0528, while responding more quickly.
DeepSeek-V3.1 is post-trained on the top of DeepSeek-V3.1-Base, which is built upon the original V3 base checkpoint through a two-phase long context extension approach, following the methodology outlined in the original DeepSeek-V3 report. We have expanded our dataset by collecting additional long documents and substantially extending both training phases. The 32K extension phase has been increased 10-fold to 630B tokens, while the 128K extension phase has been extended by 3.3x to 209B tokens. Additionally, DeepSeek-V3.1 is trained using the UE8M0 FP8 scale data format to ensure compatibility with microscaling data formats.
## Model Downloads
<div align="center">
| **Model** | **#Total Params** | **#Activated Params** | **Context Length** | **Download** |
| :------------: | :------------: | :------------: | :------------: | :------------: |
| DeepSeek-V3.1-Base | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1-Base) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1-Base) |
| DeepSeek-V3.1 | 671B | 37B | 128K | [HuggingFace](https://huggingface.co/deepseek-ai/DeepSeek-V3.1) \| [ModelScope](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3.1) |
</div>
## Chat Template
The details of our chat template is described in `tokenizer_config.json` and `assets/chat_template.jinja`. Here is a brief description.
### Non-Thinking
#### First-Turn
Prefix:
`<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>`
With the given prefix, DeepSeek V3.1 generates responses to queries in non-thinking mode. Unlike DeepSeek V3, it introduces an additional token `</think>`.
#### Multi-Turn
Context:
`<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>`
Prefix:
`<|User|>{query}<|Assistant|></think>`
By concatenating the context and the prefix, we obtain the correct prompt for the query.
### Thinking
#### First-Turn
Prefix:
`<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|><think>`
The prefix of thinking mode is similar to DeepSeek-R1.
#### Multi-Turn
Context:
`<|begin▁of▁sentence|>{system prompt}<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>...<|User|>{query}<|Assistant|></think>{response}<|end▁of▁sentence|>`
Prefix:
`<|User|>{query}<|Assistant|><think>`
The multi-turn template is the same with non-thinking multi-turn chat template. It means the thinking token in the last turn will be dropped but the `</think>` is retained in every turn of context.
### ToolCall
Toolcall is supported in non-thinking mode. The format is:
`<|begin▁of▁sentence|>{system prompt}{tool_description}<|User|>{query}<|Assistant|></think>` where the tool_description is
```
## Tools
You have access to the following tools:
### {tool_name1}
Description: {description}
Parameters: {json.dumps(parameters)}
IMPORTANT: ALWAYS adhere to this exact format for tool use:
<|tool▁calls▁begin|><|tool▁call▁begin|>tool_call_name<|tool▁sep|>tool_call_arguments<|tool▁call▁end|>{{additional_tool_calls}}<|tool▁calls▁end|>
Where:
- `tool_call_name` must be an exact match to one of the available tools
- `tool_call_arguments` must be valid JSON that strictly follows the tool's Parameters Schema
- For multiple tool calls, chain them directly without separators or spaces
```
### Code-Agent
We support various code agent frameworks. Please refer to the above toolcall format to create your own code agents. An example is shown in `assets/code_agent_trajectory.html`.
### Search-Agent
We design a specific format for searching toolcall in thinking mode, to support search agent.
For complex questions that require accessing external or up-to-date information, DeepSeek-V3.1 can leverage a user-provided search tool through a multi-turn tool-calling process.
Please refer to the `assets/search_tool_trajectory.html` and `assets/search_python_tool_trajectory.html` for the detailed template.
## Evaluation
| Category | Benchmark (Metric) | DeepSeek V3.1-NonThinking | DeepSeek V3 0324 | DeepSeek V3.1-Thinking | DeepSeek R1 0528
|----------|----------------------------------|-----------------|---|---|---|
| General |
| | MMLU-Redux (EM) | 91.8 | 90.5 | 93.7 | 93.4
| | MMLU-Pro (EM) | 83.7 | 81.2 | 84.8 | 85.0
| | GPQA-Diamond (Pass@1) | 74.9 | 68.4 | 80.1 | 81.0
| | Humanity's Last Exam (Pass@1) | - | - | 15.9 | 17.7
|Search Agent|
| | BrowseComp | - | - | 30.0 | 8.9
| | BrowseComp_zh | - | - | 49.2 | 35.7
| | Humanity's Last Exam (Python + Search) |- | - | 29.8 | 24.8
| | SimpleQA | - | - | 93.4 | 92.3
| Code |
| | LiveCodeBench (2408-2505) (Pass@1) | 56.4 | 43.0 | 74.8 | 73.3
| | Codeforces-Div1 (Rating) | - | - | 2091 | 1930
| | Aider-Polyglot (Acc.) | 68.4 | 55.1 | 76.3 | 71.6
| Code Agent|
| | SWE Verified (Agent mode) | 66.0 | 45.4 | - | 44.6
| | SWE-bench Multilingual (Agent mode) | 54.5 | 29.3 | - | 30.5
| | Terminal-bench (Terminus 1 framework) | 31.3 | 13.3 | - | 5.7
| Math |
| | AIME 2024 (Pass@1) | 66.3 | 59.4 | 93.1 | 91.4
| | AIME 2025 (Pass@1) | 49.8 | 51.3 | 88.4 | 87.5
| | HMMT 2025 (Pass@1) | 33.5 | 29.2 | 84.2 | 79.4 |
Note:
- Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Seach agent results of R1-0528 are evaluated with a pre-defined workflow.
- SWE-bench is evaluated with our internal code agent framework.
- HLE is evaluated with the text-only subset.
### Usage Example
```python
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")
messages = [
{"role": "system", "content": "You are a helpful assistant"},
{"role": "user", "content": "Who are you?"},
{"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
{"role": "user", "content": "1+1=?"}
]
tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
# '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|><think>'
tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_generation_prompt=True)
# '<|begin▁of▁sentence|>You are a helpful assistant<|User|>Who are you?<|Assistant|></think>I am DeepSeek<|end▁of▁sentence|><|User|>1+1=?<|Assistant|></think>'
```
## How to Run Locally
The model structure of DeepSeek-V3.1 is the same as DeepSeek-V3. Please visit [DeepSeek-V3](https://github.com/deepseek-ai/DeepSeek-V3) repo for more information about running this model locally.
## License
This repository and the model weights are licensed under the [MIT License](LICENSE).
## Citation
```
@misc{deepseekai2024deepseekv3technicalreport,
title={DeepSeek-V3 Technical Report},
author={DeepSeek-AI},
year={2024},
eprint={2412.19437},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2412.19437},
}
```
## Contact
If you have any questions, please raise an issue or contact us at [[email protected]]([email protected]). |