Small fix
Files changed:
- README.md (+1, -1)
- config.json (+2, -1)
README.md CHANGED

@@ -118,7 +118,7 @@ Compared to previous versions of DeepSeek-R1, the usage recommendations for Deep
 1. System prompt is supported now.
 2. It is not required to add "\<think\>\n" at the beginning of the output to force the model into thinking pattern.
 
-The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B.
+The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B, but it is essential to ensure that all configuration files are sourced from our repository rather than the original Qwen3 project.
 
 ### System Prompt
 In the official DeepSeek web/app, we use the same system prompt with a specific date.
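A minimal loading sketch for the note added above, assuming the Hugging Face repo id `deepseek-ai/DeepSeek-R1-0528-Qwen3-8B` (an assumption based on the model name in the README): pulling every file from this repository, rather than from the original Qwen3 project, picks up the DeepSeek-R1-0528 tokenizer and the updated `config.json`.

```python
# Sketch: load model, tokenizer, and config all from the DeepSeek repository
# (repo id assumed) so the DeepSeek-R1-0528 tokenizer and updated config.json
# are used instead of the original Qwen3-8B files.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-0528-Qwen3-8B"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

# Per the README change: a system prompt is supported, and no leading
# "<think>\n" has to be forced into the output.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello"},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```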
config.json CHANGED

@@ -21,7 +21,8 @@
   "rope_scaling": {
     "rope_type": "yarn",
     "factor": 4.0,
-    "original_max_position_embeddings": 32768
+    "original_max_position_embeddings": 32768,
+    "attn_factor": 0.8782488562869419
   },
   "rope_theta": 1000000,
   "sliding_window": null,