How to disable thinking?
#1 · opened by kurnevsky
The original model card mentions an enable_thinking parameter passed to the tokenizer. Any idea what it does and how to emulate it with llama-cpp?
It seems it's encoded in the chat template, so passing a custom chat template should work, but llama-cpp also fails to parse the template with the --jinja arg, so it would need some simplification.
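For reference, this is roughly how the parameter is used on the Hugging Face side: enable_thinking is consumed by the Jinja chat template rather than by the tokenizer itself. A minimal sketch, assuming a Qwen3-style checkpoint (the repo name here is illustrative):

```python
from transformers import AutoTokenizer

# Illustrative checkpoint name; substitute the actual model repo.
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-8B")

messages = [{"role": "user", "content": "What is 2 + 2?"}]

# Extra kwargs such as enable_thinking are forwarded to the Jinja
# chat template; with enable_thinking=False the Qwen3 template emits
# an empty <think></think> block so the model skips its reasoning.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=False,
)
print(text)
```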
/no_think
Yes, I've found that adding /no_think at the end of my prompt works perfectly.
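For example, via the OpenAI-compatible endpoint of a locally running llama-server (a minimal sketch, assuming the server is listening on port 8080 with this model loaded):

```python
import requests

# Assumes llama-server was started locally, e.g.:
#   llama-server -m model.gguf --port 8080
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            # Appending /no_think is a per-turn soft switch: the model
            # answers this message without producing a thinking block.
            {"role": "user", "content": "What is 2 + 2? /no_think"}
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```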
Here is the proper way to do this: https://github.com/ggml-org/llama.cpp/issues/13178#issuecomment-2839416968
kurnevsky changed discussion status to closed