  Based on our latest technological advancements, we have trained a `GLM-4-0414` series model. During pretraining, we incorporated more code-related and reasoning-related data. In the alignment phase, we optimized the model specifically for agent capabilities. As a result, the model's performance in agent tasks such as tool use, web search, and coding has been significantly improved.
## Model Usage Guidelines

### I. Sampling Parameters

| Parameter      | Recommended Value | Description                                         |
| -------------- | ----------------- | --------------------------------------------------- |
| temperature    | **0.6**           | Balances creativity and stability                   |
| top_p          | **0.95**          | Cumulative probability threshold for sampling       |
| top_k          | **20–40**         | Filters out rare tokens while maintaining diversity |
| max_new_tokens | **30000**         | Leaves enough tokens for thinking                   |
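As an illustrative sketch (not part of the official model card), the recommended values above can be collected into `generate`-style keyword arguments for the Hugging Face `transformers` API. `do_sample=True` and the `top_k` midpoint of 30 are assumptions, not values stated above:

```python
# Recommended sampling settings from the table above, expressed as
# keyword arguments for a Hugging Face `generate`-style call.
# Assumptions: do_sample=True, and top_k=30 (midpoint of the 20-40 range).
RECOMMENDED_GENERATION_KWARGS = {
    "do_sample": True,        # temperature/top_p/top_k only apply when sampling
    "temperature": 0.6,       # balances creativity and stability
    "top_p": 0.95,            # cumulative probability threshold for sampling
    "top_k": 30,              # midpoint of the recommended 20-40 range
    "max_new_tokens": 30000,  # leaves enough tokens for thinking
}

# Typical usage, assuming `model` and `inputs` are already prepared:
# outputs = model.generate(**inputs, **RECOMMENDED_GENERATION_KWARGS)
```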
### II. Enforced Thinking

- Add `<think>\n` to the **first line** of the assistant turn: this ensures the model thinks before responding
- When using `chat_template.jinja`, this prefix is injected automatically to enforce the behavior
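For prompts assembled by hand (i.e., without going through `chat_template.jinja`), a minimal helper along these lines can enforce the prefix; the function name is hypothetical:

```python
THINK_PREFIX = "<think>\n"

def enforce_thinking(assistant_prefill: str = "") -> str:
    """Prepend the thinking prefix to the assistant turn if it is missing.

    `chat_template.jinja` injects this automatically; this sketch is only
    for manually constructed prompts.
    """
    if assistant_prefill.startswith(THINK_PREFIX):
        return assistant_prefill
    return THINK_PREFIX + assistant_prefill
```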
### III. Dialogue History Trimming

- Retain only the **final user-visible reply** in the dialogue history.
  Hidden thinking content should **not** be saved to history, to reduce interference; this is already implemented in `chat_template.jinja`.
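A minimal sketch of that trimming step, assuming the hidden reasoning is wrapped in `<think>...</think>` tags (the helper name is hypothetical; `chat_template.jinja` already performs this when used):

```python
import re

# Matches one <think>...</think> block (the hidden reasoning);
# DOTALL lets the reasoning span multiple lines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def visible_reply(assistant_text: str) -> str:
    """Return only the user-visible part of an assistant reply,
    i.e. what should be saved into the dialogue history."""
    return _THINK_BLOCK.sub("", assistant_text, count=1).strip()
```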
### IV. Handling Long Contexts (YaRN)

- When input length exceeds **8,192 tokens**, consider enabling YaRN (RoPE scaling).
- In supported frameworks, add the following snippet to `config.json`:

  ```json
  "rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
  }
  ```

- **Static YaRN** applies uniformly to all text. It may slightly degrade performance on short texts, so enable it as needed.
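As a sketch, patching an already-loaded `config.json` dictionary in Python could look like the following; the helper name is illustrative, the defaults mirror the snippet above, and whether the field is honored depends on the serving framework:

```python
def enable_yarn(config: dict, factor: float = 4.0,
                original_max_position_embeddings: int = 32768) -> dict:
    """Return a copy of a model config dict with static YaRN enabled.

    The values mirror the `rope_scaling` snippet above; adjust `factor`
    to match the context extension you need.
    """
    patched = dict(config)  # shallow copy so the original dict is untouched
    patched["rope_scaling"] = {
        "type": "yarn",
        "factor": factor,
        "original_max_position_embeddings": original_max_position_embeddings,
    }
    return patched
```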
## Inference Code

Make sure to use `transformers>=4.51.3`.