About the non-thinking mode

#14
by volcanos - opened

Great work!

I've been working on something similar recently: training a single model to support both thinking and non-thinking modes. However, when I mixed in a certain proportion of <think>\n\n<think> data, the model's performance dropped significantly.
Is Qwen's approach to directly insert <think>\n\n<think> after the user question at inference time? Is SFT training alone enough, or is the final RL step necessary?

I'd also like to know how to make the model follow the thinking budget. I couldn't find the method in the blog post.

Shouldn't it be <think>\n\n</think>?

Edit: The full assistant start string should be <|im_start|>assistant\n<think>\n\n</think>\n\n as per Qwen3 GitHub Issue #1286
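
For anyone building the prompt by hand, here is a minimal sketch of how that start string could be appended. The assistant prefix is the one quoted above; the user-turn layout (`<|im_start|>user\n...<|im_end|>\n`) is my assumption based on the usual ChatML format, and `build_prompt` is a hypothetical helper, not something from the Qwen codebase.

```python
# Assistant prefix, as quoted above from the GitHub issue.
ASSISTANT_START = "<|im_start|>assistant\n"
# Empty think block that closes the reasoning section immediately.
EMPTY_THINK = "<think>\n\n</think>\n\n"


def build_prompt(user_message: str, thinking: bool = True) -> str:
    """Build a single-turn prompt string (hypothetical helper).

    Assumes a ChatML-style user turn; only the assistant start string
    comes from the thread above.
    """
    prompt = f"<|im_start|>user\n{user_message}<|im_end|>\n{ASSISTANT_START}"
    if not thinking:
        # Non-thinking mode: pre-fill an empty, already-closed think block.
        prompt += EMPTY_THINK
    return prompt


if __name__ == "__main__":
    # repr() makes the newlines visible for comparison with the string above.
    print(repr(build_prompt("What is 2 + 2?", thinking=False)))
```

Note the closing tag is `</think>`, not a second `<think>`. If the SFT data used the unclosed form, that mismatch might be worth ruling out before blaming the mixing ratio.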
