About the non-thinking mode
#14, opened by volcanos
Great work!
I've also been working on this recently, training a single model to support both thinking and non-thinking modes. But when I mixed in a certain proportion of <think>\n\n<think> data, the model's performance dropped significantly.
Is Qwen's approach to insert <think>\n\n<think> directly after the user question at inference time? Is SFT alone sufficient, or is the final RL stage necessary?
I would also like to know how to make the model follow the thinking budget. I could not find the method in the blog.
Shouldn't it be <think>\n\n</think>?
Edit: The full assistant start string should be <|im_start|>assistant\n<think>\n\n</think>\n\n as per Qwen3 GitHub Issue #1286.
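For anyone building the prompt by hand, here is a minimal sketch of how that assistant start string could be assembled. It only does string formatting and assumes the usual ChatML layout for the user turn (`<|im_start|>user\n...<|im_end|>\n`). The helper name `build_prompt` and its `thinking` flag are hypothetical, made up for illustration, and are not part of any Qwen API.

```python
# Hypothetical helper: builds a single-turn ChatML-style prompt by hand.
# The user-turn layout is an assumption; the assistant start string is the
# one quoted in this thread.
IM_START = "<|im_start|>"
IM_END = "<|im_end|>"
EMPTY_THINK = "<think>\n\n</think>\n\n"  # empty, properly closed think block


def build_prompt(user_message: str, thinking: bool = True) -> str:
    """Return the prompt text up to the point where generation starts."""
    prompt = f"{IM_START}user\n{user_message}{IM_END}\n{IM_START}assistant\n"
    if not thinking:
        # Pre-fill an empty think block so the model goes straight to the answer.
        prompt += EMPTY_THINK
    return prompt


if __name__ == "__main__":
    print(repr(build_prompt("What is 2 + 2?", thinking=False)))
```

Note that the block is closed with `</think>`, not a second `<think>`. If the training data really contains the unclosed form, the model never sees the end-of-thinking tag, which may be worth ruling out as a cause of the performance drop.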