Is it possible to disable thinking?

#11

by SlavikF - opened 17 days ago

Discussion

SlavikF

17 days ago

I tried to use

Tried to start with /no_think

It always thinks...

ubergarm

Owner 16 days ago

Is that a qwen3moe thing only? I'm not sure it exists in R1-0528. I'm away from my desk right now, so maybe someone else can chime in. If you don't want thinking, I also have https://huggingface.co/ubergarm/DeepSeek-V3-0324-GGUF you could try.

gtkunit

15 days ago

Something like adding to the template should work, but I don't think ik_llama supports jinja templates, so you may have to inject it in your completion request.

gghfez

11 days ago

I managed using the text completions endpoint by pre-filling the response with something like:

<think>
Okay, the user wants me to respond immediately. Here's my response.
</think>

Panchovix

5 days ago

@gghfez Sorry for the noob question but this is before the front end per se? Like if using SillyTavern, you can't do that directly on the UI right?

gghfez

5 days ago

Sorry for the noob question

It's not a noob question, I spend ages messing around with chat templates :)
TLDR for ST at the bottom

this is before the front end per se?
Nope, in the front-end. With completions mode, you have full control over the prompt template.

I tend to test these out with mikupad first (you can just download the .html file and open it locally) since every reasoning model is different (eg. GLMZ, Cogito, etc)
This is my DeepseekNoThink template:

<｜begin▁of▁sentence｜><｜User｜>{{input}}<｜Assistant｜><think>
Okay, the user wants me to respond immediately, no need to think anymore.
</think>

For example

<｜begin▁of▁sentence｜><｜User｜>Is zero a negative or positive number?<｜Assistant｜><think>
Okay, the user wants me to respond immediately, no need to think anymore.
</think>

Generates the response immediately.
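For anyone scripting this instead of using mikupad, here is a minimal Python sketch of filling that template. The names `NO_THINK_TEMPLATE` and `render_no_think` are made up for illustration, and the single newline after `</think>` is a guess; adjust it if your model behaves better with different whitespace.

```python
# Sketch: fill the DeepseekNoThink template shown in the post above.
# NO_THINK_TEMPLATE and render_no_think are illustrative names, not part of any tool.
NO_THINK_TEMPLATE = (
    "<｜begin▁of▁sentence｜><｜User｜>{{input}}<｜Assistant｜><think>\n"
    "Okay, the user wants me to respond immediately, no need to think anymore.\n"
    "</think>\n"  # trailing newline is an assumption
)


def render_no_think(user_input: str) -> str:
    """Substitute the user's text for the {{input}} placeholder."""
    return NO_THINK_TEMPLATE.replace("{{input}}", user_input)


if __name__ == "__main__":
    # The result is meant for a text completions endpoint, not chat completions.
    print(render_no_think("Is zero a negative or positive number?"))
```

The returned string is the full prompt, with the think block already closed, so the model should continue directly with the answer.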

TLDR for ST

For SillyTavern, I tend to put it in the "Assistant Message Sequences" so it doesn't have "thought for 0 seconds" in the chat window.

<｜Assistant｜><think>
Okay, the user wants me to respond immediately, no need to think anymore.
</think>

(That will just look like a non-reasoning model when you use it)

But you can also put it in the "Start reply with" section in Reasoning Formatting (on the right of the UI)

I tend to do this if I actually want to use reasoning but steer the direction it takes, e.g.:

<think>
Okay, time to plan my reply, ensuring I don't use any em-dashes or asterisks

ubergarm

Owner 5 days ago

edited 5 days ago

I went back and re-read the DeepSeek model card and notes but I don't think it was trained with an official /no_think prompt to disable thinking. I believe the official approach would be to use DeepSeek-V3-0324 for no thinking and use DeepSeek-R1-0528 for thinking.

But as I understand it, you have a couple of ways to attempt to abuse the model to short-circuit thinking:

1. Completions Endpoint Chat Template

This assumes you are working low-level: tokenizing the strings yourself and feeding them directly to the model, or going through the llama-server completions endpoint (not chat/completions). The hack is to send a partial assistant response without closing it with the expected <｜end▁of▁sentence｜>, so the model continues from where the prefill stops.

system_message = ""
user_message_1 = "Write a complex python app to find all perfect numbers. Be brief and don't think too much."

# Normally you would build the following prompt and let the model generate the whole assistant turn.
prompt = f"<｜begin▁of▁sentence｜>{system_message}<｜User｜>{user_message_1}<｜Assistant｜>"

# Now we abuse the format: append a partial assistant reply whose think block is already closed,
# leave the turn unterminated, and hope the LLM continues as if thinking were done.
assistant_message_1 = "<think>Okay, I'll just write the code immediately.</think> Certainly! Here is the code: "

prompt = f"<｜begin▁of▁sentence｜>{system_message}<｜User｜>{user_message_1}<｜Assistant｜>{assistant_message_1}"
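A rough Python sketch of sending such a prefilled prompt to a server follows. It assumes the `/completion` route, the `prompt` and `n_predict` request fields, and the `content` response field as found in upstream llama.cpp's llama-server; the address, port, and helper names are placeholders, and ik_llama may differ, so check your own server before relying on it.

```python
import json
import urllib.request

# Assumptions (not from the thread): address, route and JSON field names
# follow upstream llama.cpp's llama-server. Adjust for your setup.
SERVER_URL = "http://127.0.0.1:8080/completion"


def build_prefill_prompt(user_message: str, prefill: str, system_message: str = "") -> str:
    """Raw prompt ending in an unterminated assistant turn."""
    return (
        f"<｜begin▁of▁sentence｜>{system_message}"
        f"<｜User｜>{user_message}<｜Assistant｜>{prefill}"
    )


def build_request(prompt: str, n_predict: int = 512) -> urllib.request.Request:
    """Wrap the prompt in a JSON POST request without sending it."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request; assumes the reply JSON carries the text under "content"."""
    with urllib.request.urlopen(build_request(prompt)) as resp:
        return json.loads(resp.read().decode("utf-8"))["content"]
```

Calling `complete(build_prefill_prompt(user_message_1, assistant_message_1))` against a running server would return only the continuation, so prepend the visible part of the prefill yourself if you want the full reply.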

2. Chat Thread Injection

You could also try to inject text to suppress thinking in the chat thread; however, it will be tokenized in such a way that it doesn't match the expected format. It is easier, though, as you can just type it into any GUI like open-webui or ST:

        chat_thread = [
            {"role": "system", "content": "You are a helpful AI."},
            {"role": "user", "content": "Write a complex python app to find all perfect numbers. Be brief and don't think too much."},
            {"role": "assistant", "content": "<think>Okay, I'll just write the code immediately.</think> Certainly! Here is the code: "},
        ]

Anyway, you can probably try a few combinations like this to try to hack it to reduce thinking. There may be some system prompts that reduce or influence thinking too, not really sure how it was trained.

As a comparison, it is interesting that the new https://huggingface.co/MiniMaxAI/MiniMax-M1-40k/ comes in both a 40k "thinking budget" version and an 80k version. I assume they were trained with examples of different lengths and that is how they "control" the thinking budget. I assume Qwen was trained with examples that had /no_think and didn't use thinking traces. So it's not really a digital thinking on/off knob but more a way to influence the output given specific training examples.

gghfez

5 days ago

P.S. Make sure you only put that in one place (either the "Assistant Message Sequences" or the Start reply with), not both.

Panchovix

5 days ago

@ubergarm I mostly want to test it as something like a "DeepSeek V3 0528". I know they probably have it, but it's not released haha. Many thanks for the info as well. I also tried /no_think but no dice, as that seems to be a Qwen3-only thing.

@gghfez Many thanks for all the help! Gonna try when I get home after work.

gghfez

5 days ago

@Panchovix
Np. And yeah, /no_think is a Qwen thing they specifically trained it on. Cogito has an equivalent prompt as well.

You can also get some of the newer models like command-a (and DeepSeek-V3 0425) to think a bit by enabling reasoning and then putting "<think> Okay, I need to respond as {{char}}" in the "Start reply with" field. I guess these models are aware of these thinking tags from seeing QwQ and R1 output in their training data.

@ubergarm
I'll try out that chat_thread thing later. I've been using a crude fastapi proxy for OpenWebUI (it doesn't support text completions) to wrap chat completions -> text completions.
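The core of a chat completions -> text completions wrapper like that could look roughly like the Python sketch below. This is a hypothetical helper, not the actual proxy code: it only covers flattening OpenAI-style messages into one raw prompt using the tags quoted earlier in this thread, and it assumes earlier assistant turns are closed with <｜end▁of▁sentence｜>. The FastAPI routing and the forwarding to the backend are left out.

```python
from typing import Dict, List

# Hypothetical helper, not the actual proxy code from this thread.
NO_THINK_PREFILL = (
    "<think>\n"
    "Okay, the user wants me to respond immediately, no need to think anymore.\n"
    "</think>\n"
)


def messages_to_prompt(messages: List[Dict[str, str]], prefill: str = NO_THINK_PREFILL) -> str:
    """Flatten OpenAI-style chat messages into one raw DeepSeek-style prompt."""
    parts = ["<｜begin▁of▁sentence｜>"]
    for msg in messages:
        role, content = msg["role"], msg["content"]
        if role == "system":
            parts.append(content)
        elif role == "user":
            parts.append(f"<｜User｜>{content}")
        elif role == "assistant":
            # Assumption: completed assistant turns end with the end-of-sentence token.
            parts.append(f"<｜Assistant｜>{content}<｜end▁of▁sentence｜>")
        else:
            raise ValueError(f"unsupported role: {role!r}")
    # Open a new assistant turn and leave it unterminated so the model continues it.
    parts.append(f"<｜Assistant｜>{prefill}")
    return "".join(parts)
```

Passing a different `prefill` (for example an open `<think>` with a steering sentence and no closing tag) would nudge the reasoning instead of skipping it.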

Panchovix

4 days ago

Really late, but just wanted to say that both of your methods work, @gghfez, so many thanks!
