Some of the commonly adjusted parameters include: max_new_tokens: the maximum number of tokens to generate. In other words, the size of the output sequence, not including the tokens in the prompt. As an alternative to using the output's length as a stopping criteria, you can choose to stop generation whenever the full generation exceeds some amount of time.