How to control thinking length?

#24
by lidh15 - opened

I really want to use your fantastic thinking mode, but the thought process is a bit long.
How can we limit thinking to a maximum token count, for example, at most 256 tokens for the thinking part?

I think you can only implement this by generating twice: first with a limited output length and `</think>` set as a stop token, then append `</think>` to the truncated output and run inference again.
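
A minimal sketch of that two-pass approach with the `transformers` library, assuming a Qwen3-style model whose chat template wraps reasoning in `<think>...</think>`. The model name, the 256-token budget, and the `enable_thinking` template kwarg are illustrative assumptions, not something confirmed in this thread:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-8B"  # assumption: any chat model with <think> tags
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "What is 17 * 23?"}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True,
    enable_thinking=True,  # assumption: template flag that opens a <think> block
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Pass 1: cap the thinking budget at 256 new tokens.
think_budget = 256
gen = model.generate(**inputs, max_new_tokens=think_budget)
new_text = tokenizer.decode(
    gen[0][inputs.input_ids.shape[1]:], skip_special_tokens=False
)

# If the model did not close its own think block within budget,
# force-close it so the second pass produces the final answer.
if "</think>" not in new_text:
    close_ids = tokenizer(
        "\n</think>\n\n", return_tensors="pt"
    ).input_ids.to(model.device)
    gen = torch.cat([gen, close_ids], dim=-1)

# Pass 2: continue from the (possibly truncated) thoughts.
final = model.generate(
    input_ids=gen, attention_mask=torch.ones_like(gen), max_new_tokens=512
)
print(tokenizer.decode(final[0][gen.shape[1]:], skip_special_tokens=True))
```

One caveat with this trick: when the budget cuts the thoughts off mid-sentence, the force-closed reasoning is abruptly truncated, so answer quality may drop compared to letting the model finish thinking on its own.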

I noticed that the length of the "think" output is not the same if you run the model multiple times with the same input and the same configuration; sometimes there are big variations. Is there a way to at least tell the model to think more, so that the thought process is longer?
