Based on our latest technological advancements, we have trained the `GLM-4-0414` series of models. During pretraining, we incorporated more code-related and reasoning-related data. In the alignment phase, we optimized the model specifically for agent capabilities. As a result, the model's performance on agent tasks such as tool use, web search, and coding has improved significantly.
## Model Usage Guidelines

### I. Sampling Parameters

| Parameter      | Recommended Value | Description                                         |
| -------------- | ----------------- | --------------------------------------------------- |
| temperature    | **0.6**           | Balances creativity and stability                   |
| top_p          | **0.95**          | Cumulative probability threshold for sampling       |
| top_k          | **20–40**         | Filters out rare tokens while maintaining diversity |
| max_new_tokens | **30000**         | Leaves enough tokens for thinking                   |
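Assuming an inference stack with the usual keyword names (e.g. Hugging Face `generate`), the table above maps to a parameter dict like this sketch; adapt the key names if your framework differs:

```python
# Recommended GLM-4-0414 sampling parameters, taken from the table above.
# Keyword names follow common inference APIs (e.g. Hugging Face `generate`).
sampling_params = {
    "do_sample": True,        # enable sampling so temperature/top_p/top_k apply
    "temperature": 0.6,       # balances creativity and stability
    "top_p": 0.95,            # cumulative probability (nucleus) threshold
    "top_k": 40,              # upper end of the recommended 20-40 range
    "max_new_tokens": 30000,  # leave enough room for thinking tokens
}
```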
### II. Enforced Thinking

- Add `<think>\n` as the **first line** of the assistant turn: this ensures the model thinks before responding.
- When using `chat_template.jinja`, this prefix is injected automatically to enforce the behavior.

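As a rough illustration, enforced thinking amounts to ending the prompt with `<think>\n` so generation begins inside the thinking block. The real turn format lives in `chat_template.jinja`; the role tags below are placeholders, not the model's actual tokens:

```python
# Sketch of enforced thinking. "<|user|>" / "<|assistant|>" are placeholder
# turn markers; the actual format comes from chat_template.jinja.
def build_prompt(user_message: str) -> str:
    # Ending with "<think>\n" means the model's first generated tokens
    # are thinking content rather than the final answer.
    return f"<|user|>\n{user_message}\n<|assistant|>\n<think>\n"

prompt = build_prompt("Why is the sky blue?")
```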
### III. Dialogue History Trimming

- Retain only the **final user-visible reply** in the dialogue history.
- Hidden thinking content should **not** be saved to history, to reduce interference; this is already implemented in `chat_template.jinja`.

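A minimal sketch of the trimming step, assuming thinking is delimited by `<think>...</think>` tags (`chat_template.jinja` already handles this; the helper below only makes the rule concrete):

```python
import re

def visible_reply(raw_reply: str) -> str:
    """Drop the hidden <think>...</think> block, keeping the visible answer."""
    return re.sub(r"<think>.*?</think>", "", raw_reply, flags=re.DOTALL).strip()

# Only the visible part goes into the saved dialogue history.
history_entry = visible_reply("<think>Check: 2 + 2 = 4.</think>The answer is 4.")
# history_entry == "The answer is 4."
```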
### IV. Handling Long Contexts (YaRN)

- When the input length exceeds **8,192 tokens**, consider enabling YaRN (RoPE scaling).
- In supported frameworks, add the following snippet to `config.json`:

```json
"rope_scaling": {
    "type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 32768
}
```

- **Static YaRN** applies uniformly to all text. It may slightly degrade performance on short texts, so enable it only as needed.

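The scaled window follows from the two fields in the snippet: the effective context is roughly `factor * original_max_position_embeddings`. A small sketch of that arithmetic, plus a hypothetical helper for enabling YaRN only when inputs are long enough:

```python
# YaRN arithmetic: a 4.0 factor over a 32768-token base extends the usable
# context to roughly 131072 tokens.
rope_scaling = {"type": "yarn", "factor": 4.0,
                "original_max_position_embeddings": 32768}
effective_context = int(rope_scaling["factor"]
                        * rope_scaling["original_max_position_embeddings"])

def needs_yarn(input_tokens: int, threshold: int = 8192) -> bool:
    # Static YaRN can slightly hurt short-text quality, so enable it
    # only when the input actually exceeds the threshold.
    return input_tokens > threshold
```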
## Inference Code

Make sure to use `transformers>=4.51.3`.
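A quick way to check that requirement before loading the model is a plain dotted-version comparison (a generic helper, not a transformers API; pre-release suffixes like `rc1` would need extra handling):

```python
def meets_minimum(installed: str, required: str = "4.51.3") -> bool:
    """Compare dotted version strings numerically (sufficient for x.y.z)."""
    as_tuple = lambda v: tuple(int(part) for part in v.split(".")[:3])
    return as_tuple(installed) >= as_tuple(required)

# e.g. pass importlib.metadata.version("transformers") as `installed`
ok = meets_minimum("4.51.3")
```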