msr2000 committed on
Commit
6e8885a
·
1 Parent(s): 7674590
Files changed (2)
  1. README.md +1 -1
  2. config.json +2 -1
README.md CHANGED
@@ -118,7 +118,7 @@ Compared to previous versions of DeepSeek-R1, the usage recommendations for Deep
 1. System prompt is supported now.
 2. It is not required to add "\<think\>\n" at the beginning of the output to force the model into thinking pattern.
 
-The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B.
+The model architecture of DeepSeek-R1-0528-Qwen3-8B is identical to that of Qwen3-8B, but it shares the same tokenizer configuration as DeepSeek-R1-0528. This model can be run in the same manner as Qwen3-8B, but it is essential to ensure that all configuration files are sourced from our repository rather than the original Qwen3 project.
 
 ### System Prompt
 In the official DeepSeek web/app, we use the same system prompt with a specific date.
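The added README sentence asks users to take configuration files from this repository. As a rough local check, the sketch below reports which `rope_scaling` keys from this commit's `config.json` are absent from a loaded config. The helper name `missing_rope_keys` and the sample config are hypothetical and only illustrate the idea; this is not an official validation step.

```python
import json

# Keys present under "rope_scaling" after this commit's config.json change.
EXPECTED_ROPE_KEYS = (
    "rope_type",
    "factor",
    "original_max_position_embeddings",
    "attn_factor",
)


def missing_rope_keys(config: dict) -> list:
    """Return the expected rope_scaling keys that the config does not define."""
    # A missing or null "rope_scaling" entry is treated as an empty mapping.
    rope = config.get("rope_scaling") or {}
    return [key for key in EXPECTED_ROPE_KEYS if key not in rope]


# Hypothetical config that predates this commit (no "attn_factor").
sample = json.loads(
    '{"rope_scaling": {"rope_type": "yarn", "factor": 4.0,'
    ' "original_max_position_embeddings": 32768}}'
)
print(missing_rope_keys(sample))
```

To check a real file, load it with `json.load(open("config.json"))` and pass the result to the same function; an empty list means all four keys are present.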
config.json CHANGED
@@ -21,7 +21,8 @@
   "rope_scaling": {
     "rope_type": "yarn",
     "factor": 4.0,
-    "original_max_position_embeddings": 32768
+    "original_max_position_embeddings": 32768,
+    "attn_factor": 0.8782488562869419
   },
   "rope_theta": 1000000,
   "sliding_window": null,
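The commit does not say how the new `attn_factor` value was chosen. One observation, offered as an assumption rather than the authors' stated reasoning: the value is numerically close to the reciprocal of the magnitude-scaling term commonly used with YaRN, `0.1 * ln(factor) + 1`, evaluated at `factor = 4.0`. The sketch below only checks that arithmetic; how a given runtime actually consumes `attn_factor` is not covered here.

```python
import math

# Values taken from the rope_scaling block in this commit.
factor = 4.0
attn_factor = 0.8782488562869419

# Assumption: the common YaRN magnitude-scaling term, 0.1 * ln(factor) + 1.
yarn_mscale = 0.1 * math.log(factor) + 1.0

# If attn_factor is meant to cancel that term, the product should be close to 1.
product = yarn_mscale * attn_factor
print(f"yarn_mscale = {yarn_mscale:.6f}, product = {product:.6f}")
```

If the product comes out close to 1, that is consistent with `attn_factor` offsetting the default YaRN scaling in runtimes that multiply the two, but the diff alone does not confirm that intent.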