shimmyshimmer committed on
Commit 1625b27 · verified · 1 Parent(s): 5a94f6f

Update README.md

Adding benchmark image

Files changed (1)
  1. README.md +1 -39
README.md CHANGED
@@ -14,44 +14,6 @@ tags:
  > [!NOTE]
  > For DeepSeek-R1-0528-**Qwen3-8B** GGUFs, [see here](https://huggingface.co/unsloth/DeepSeek-R1-0528-Qwen3-8B-GGUF).
 
- <div>
- <p style="margin-bottom: 0; margin-top: 0;">
- <strong>Learn how to run DeepSeek-R1-0528 correctly - <a href="https://docs.unsloth.ai/basics/deepseek-r1-0528">Read our Guide</a>.</strong>
- </p>
- <p style="margin-bottom: 0;">
- <em>See <a href="https://huggingface.co/collections/unsloth/deepseek-r1-all-versions-678e1c48f5d2fce87892ace5">our collection</a> for all versions of R1 including GGUF, 4-bit & 16-bit formats.</em>
- </p>
- <p style="margin-top: 0;margin-bottom: 0;">
- <em><a href="https://docs.unsloth.ai/basics/unsloth-dynamic-v2.0-gguf">Unsloth Dynamic 2.0</a> achieves superior accuracy & outperforms other leading quants.</em>
- </p>
- <div style="display: flex; gap: 5px; align-items: center; ">
- <a href="https://github.com/unslothai/unsloth/">
- <img src="https://github.com/unslothai/unsloth/raw/main/images/unsloth%20new%20logo.png" width="133">
- </a>
- <a href="https://discord.gg/unsloth">
- <img src="https://github.com/unslothai/unsloth/raw/main/images/Discord%20button.png" width="173">
- </a>
- <a href="https://docs.unsloth.ai/basics/deepseek-r1-0528-how-to-run-locally">
- <img src="https://raw.githubusercontent.com/unslothai/unsloth/refs/heads/main/images/documentation%20green%20button.png" width="143">
- </a>
- </div>
- <h1 style="margin-top: 0rem;">🐋 DeepSeek-R1-0528 Usage Guidelines</h1>
- </div>
-
- - Set the temperature between **0.5–0.7 (0.6 recommended)** to reduce repetition and incoherence.
- - Set Top_P value of **0.95 (recommended)**
- - R1-0528 uses the same chat template as the original R1 model:
- ```
- <|begin▁of▁sentence|><|User|>What is 1+1?<|Assistant|>It's 2.<|end▁of▁sentence|><|User|>Explain more!<|Assistant|>
- ```
- - For llama.cpp / GGUF inference, you should skip the BOS since it’ll auto add it:
- ```
- <|User|>What is 1+1?<|Assistant|>
- ```
- - For complete detailed instructions, see our guide: [unsloth.ai/blog/deepseek-r1-0528](https://docs.unsloth.ai/basics/deepseek-r1-0528-how-to-run-locally)
-
- # DeepSeek-R1-0528 Model Card
-
  <div align="center">
  <img src="https://github.com/deepseek-ai/DeepSeek-V2/blob/main/figures/logo.svg?raw=true" width="60%" alt="DeepSeek-V3" />
  </div>
@@ -97,7 +59,7 @@ tags:
  The DeepSeek R1 model has undergone a minor version upgrade, with the current version being DeepSeek-R1-0528. In the latest update, DeepSeek R1 has significantly improved its depth of reasoning and inference capabilities by leveraging increased computational resources and introducing algorithmic optimization mechanisms during post-training. The model has demonstrated outstanding performance across various benchmark evaluations, including mathematics, programming, and general logic. Its overall performance is now approaching that of leading models, such as O3 and Gemini 2.5 Pro.
 
  <p align="center">
- <img width="80%" src="figures/benchmark.png">
+ <img width="80%" src="https://huggingface.co/deepseek-ai/DeepSeek-R1-0528/resolve/main/figures/benchmark.png">
  </p>
 
  Compared to the previous version, the upgraded model shows significant improvements in handling complex reasoning tasks. For instance, in the AIME 2025 test, the model’s accuracy has increased from 70% in the previous version to 87.5% in the current version. This advancement stems from enhanced thinking depth during the reasoning process: in the AIME test set, the previous model used an average of 12K tokens per question, whereas the new version averages 23K tokens per question.
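
For reference, the usage guidelines removed in the first hunk describe a chat template and recommended sampling values. The following is a minimal sketch of how a prompt string could be assembled to match the two template examples quoted in that hunk. The function name `build_prompt`, the `(role, text)` tuple format, and the `SAMPLING` dictionary are hypothetical names chosen for illustration; only the special tokens and the numeric values come from the removed text.

```python
# Special tokens copied verbatim from the template shown in the removed guidelines.
BOS = "<|begin▁of▁sentence|>"
EOS = "<|end▁of▁sentence|>"
USER = "<|User|>"
ASSISTANT = "<|Assistant|>"

# Recommended values from the removed guidelines (the dictionary itself is illustrative).
SAMPLING = {"temperature": 0.6, "top_p": 0.95}


def build_prompt(turns, add_bos=True):
    """Join (role, text) turns into a single prompt string.

    Set add_bos=False when the runtime adds the BOS token itself,
    which the removed guidelines say is the case for llama.cpp / GGUF.
    """
    parts = [BOS] if add_bos else []
    for role, text in turns:
        if role == "user":
            parts.append(USER + text)
        elif role == "assistant":
            # Completed assistant turns are closed with the EOS token.
            parts.append(ASSISTANT + text + EOS)
        else:
            raise ValueError(f"unknown role: {role!r}")
    # If the last turn is from the user, end with the assistant tag
    # so the model continues from there.
    if turns and turns[-1][0] == "user":
        parts.append(ASSISTANT)
    return "".join(parts)
```

Calling `build_prompt([("user", "What is 1+1?")], add_bos=False)` yields the shorter llama.cpp form quoted in the removed guidelines, while the default `add_bos=True` produces the full form with the leading BOS token.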