For deployment, you can use `sglang>=0.4.6.post1` or `vllm>=0.8.4` or to create …
For local use, applications such as llama.cpp, Ollama, LMStudio, and MLX-LM also support Qwen3.
## Switching Between Thinking and Non-Thinking Mode

> [!TIP]
> The `enable_thinking` switch is also available in APIs created by SGLang and vLLM.
> Please refer to our documentation for [SGLang](https://qwen.readthedocs.io/en/latest/deployment/sglang.html#thinking-non-thinking-modes) and [vLLM](https://qwen.readthedocs.io/en/latest/deployment/vllm.html#thinking-non-thinking-modes) users.

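Both servers expose this switch through their OpenAI-compatible chat APIs. The sketch below shows the shape of such a request body; the `chat_template_kwargs` field and the model id are assumptions based on the linked docs, not details stated in this README.

```python
# Request-body sketch for an OpenAI-compatible endpoint served by SGLang or vLLM.
# NOTE: "chat_template_kwargs" and the model id are assumptions; check the linked docs.
payload = {
    "model": "Qwen/Qwen3-8B",  # hypothetical model id
    "messages": [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    "chat_template_kwargs": {"enable_thinking": False},  # hard-disable thinking
}
```

The payload can then be sent with any HTTP client, e.g. `requests.post(f"{base_url}/v1/chat/completions", json=payload)`.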
### `enable_thinking=True`

In this mode, the model will generate think content wrapped in a `<think>...</think>` …

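Downstream code usually wants the think content and the final answer separately. A minimal sketch of that split, assuming the raw decoded string is available (the helper name is ours):

```python
def split_thinking(text: str) -> tuple[str, str]:
    """Split raw decoded output into (thinking, final_answer)."""
    marker = "</think>"
    if marker not in text:
        # Non-thinking mode: no think block was generated.
        return "", text.strip()
    think, _, answer = text.partition(marker)
    return think.replace("<think>", "", 1).strip(), answer.strip()
```

Splitting on the closing tag rather than a regex keeps the helper robust when the model omits the block entirely.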
> [!NOTE]
> For thinking mode, use `Temperature=0.6`, `TopP=0.95`, `TopK=20`, and `MinP=0` (the default setting in `generation_config.json`). **DO NOT use greedy decoding**, as it can lead to performance degradation and endless repetitions. For more detailed guidance, please refer to the [Best Practices](#best-practices) section.
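
These defaults are easiest to keep in one place and pass to `generate` as keyword arguments; a minimal sketch (the constant name is ours):

```python
# Recommended sampling for thinking mode (mirrors generation_config.json defaults).
THINKING_SAMPLING = {
    "do_sample": True,   # greedy decoding degrades quality and can loop endlessly
    "temperature": 0.6,
    "top_p": 0.95,
    "top_k": 20,
    "min_p": 0.0,
}
```

With a loaded model this would be used as `model.generate(**inputs, **THINKING_SAMPLING)`.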
### `enable_thinking=False`
We provide a hard switch to strictly disable the model's thinking behavior, aligning its functionality with the previous Qwen2.5-Instruct models. This mode is particularly useful in scenarios where disabling thinking is essential for enhancing efficiency.
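
With Hugging Face Transformers, the switch is passed to the chat template. A sketch of the kwargs involved (the helper function is ours; `enable_thinking` is the template kwarg this section describes):

```python
def template_kwargs(enable_thinking: bool) -> dict:
    """Kwargs for tokenizer.apply_chat_template on a Qwen3 tokenizer (sketch)."""
    return {
        "tokenize": False,
        "add_generation_prompt": True,
        "enable_thinking": enable_thinking,  # False = hard switch, no <think> block
    }

# With a real tokenizer (requires downloading the model's tokenizer files):
#   text = tokenizer.apply_chat_template(messages, **template_kwargs(False))
```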