How to control thinking length?
#24
by lidh15
I do want to use your fantastic thinking mode; however, it is a little bit long.
How can we limit the thinking to a maximum token count, for example, at most 256 tokens for the thinking part?
I think you can only implement it by generating twice: first with a limited output length and the end token set to `</think>`, then append `</think>` to the truncated output and run inference again.
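For reference, here is a minimal sketch of that two-pass approach, assuming a Hugging Face transformers model whose chat template wraps reasoning in `<think>...</think>` and whose tokenizer treats `</think>` as a single token; the model name and the 256-token budget are placeholders:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-thinking-model"  # placeholder, not a real checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

messages = [{"role": "user", "content": "How many primes are there below 100?"}]
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Pass 1: cap the thinking budget at 256 tokens. Generation stops early
# if the model emits </think> on its own; otherwise it is truncated.
end_think_id = tokenizer.convert_tokens_to_ids("</think>")  # assumes </think> is one token
first = model.generate(**inputs, max_new_tokens=256, eos_token_id=end_think_id)

# Pass 2: force-close the thinking block if it was truncated, then let
# the model continue with the visible answer.
text = tokenizer.decode(first[0], skip_special_tokens=False)
if "</think>" not in text:
    text += "</think>"
full_inputs = tokenizer(text, return_tensors="pt", add_special_tokens=False).to(model.device)
second = model.generate(**full_inputs, max_new_tokens=512)
print(tokenizer.decode(second[0][full_inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

One caveat: decoding and re-tokenizing can differ slightly from continuing on the original token ids, so concatenating the `</think>` token id directly onto `first[0]` is the safer variant.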
I noticed the length of the "think" output is not the same if you run the model multiple times with the same input and the same configuration, sometimes with big variations. Is there a way to at least tell the model to think more, so that the thought process is longer?