For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.4` or to create …
For local use, applications such as llama.cpp, Ollama, LMStudio, and MLX-LM also support Qwen3.
## Switching Between Thinking and Non-Thinking Mode

> [!TIP]
> The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
> Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

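Both servers expose this switch through their OpenAI-compatible chat APIs. The sketch below shows the shape of such a request body; the `chat_template_kwargs` field and the model id are assumptions based on the linked docs, not details stated in this README.

```python
# Request-body sketch for an OpenAI-compatible endpoint served by SGLang or vLLM.
# NOTE: "chat_template_kwargs" and the model id are assumptions; check the linked docs.
payload = {
    "model": "Qwen/Qwen3-8B",  # hypothetical model id
    "messages": [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    "chat_template_kwargs": {"enable_thinking": False},  # hard-disable thinking
}
```

The payload can then be sent with any HTTP client, e.g. `requests.post(f"{base_url}/v1/chat/completions", json=payload)`.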
### `enable_thinking=True`

In this mode, the model will generate think content wrapped in a `<think>...</think>` …

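Downstream code usually wants the think content and the final answer separately. A minimal sketch of that split, assuming the raw decoded string is available (the helper name is ours):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw decoded output into (thinking, final_answer)."""
    marker = "</think>"
    if marker not in text:
        # Non-thinking mode: no think block was generated.
        return "", text.strip()
    think, _, answer = text.partition(marker)
    return think.replace("<think>", "", 1).strip(), answer.strip()
```

Splitting on the closing tag rather than a regex keeps the helper robust when the model omits the block entirely.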
> [!NOTE]
> For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
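
These defaults are easiest to keep in one place and pass to `generate` as keyword arguments; a minimal sketch (the constant name is ours):

```python
# Recommended sampling for thinking mode (mirrors generation_config.json defaults).
THINKING_SAMPLING = {
    "do_sample": True,   # greedy decoding degrades quality and can loop endlessly
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
}
```

With a loaded model this would be used as `model.generate(**inputs, **THINKING_SAMPLING)`.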
### `enable_thinking=False`
We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.
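
With Hugging Face Transformers, the switch is passed to the chat template. A sketch of the kwargs involved (the helper function is ours; `enable_thinking` is the template kwarg this section describes):

```python
def template_kwargs(enable_thinking: bool) -> dict:
    """Kwargs for tokenizer.apply_chat_template on a Qwen3 tokenizer (sketch)."""
    return {
        "tokenize": False,
        "add_generation_prompt": True,
        "enable_thinking": enable_thinking,  # False = hard switch, no <think> block
    }

# With a real tokenizer (requires downloading the model's tokenizer files):
#   text = tokenizer.apply_chat_template(messages, **template_kwargs(False))
```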