More advanced parameters exist for the
[generate] method, which gives you even further control over the [generate] method's behavior.
For the complete list of the available parameters, refer to the API documentation.
Speculative Decoding
Speculative decoding (also known as assisted decoding) is a modification of the decoding strategies above, that uses an
assistant model (ideally a much smaller one) with the same tokenizer, to generate a few candidate tokens.