The [generate] method accepts more advanced parameters that give you further control over its behavior. For the complete list of available parameters, refer to the API documentation.

Speculative Decoding

Speculative decoding (also known as assisted decoding) is a modification of the decoding strategies above that uses an assistant model (ideally a much smaller one) with the same tokenizer to generate a few candidate tokens.
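
To make the idea concrete, here is a minimal, library-free sketch of greedy assisted decoding. It is an illustration under stated assumptions, not the library's implementation: `main_model`, `assistant_model`, `greedy_decode`, `assisted_decode`, and `num_candidates` are hypothetical names, and each "model" is a toy function that maps a token sequence to its next token. The verification step shown here (the main model checks the drafted tokens and keeps the longest matching prefix) is how speculative decoding is generally described. In Transformers itself, the technique is typically enabled by passing an `assistant_model` argument to `generate`.

```python
from typing import Callable, List

# Hypothetical stand-in for a language model: maps a token sequence
# to its greedy next token.
NextToken = Callable[[List[int]], int]


def main_model(tokens: List[int]) -> int:
    """Toy 'large' model: next token is the sum of the last two, mod 10."""
    return (tokens[-1] + tokens[-2]) % 10


def assistant_model(tokens: List[int]) -> int:
    """Toy 'small' model: usually agrees with main_model, but guesses 0
    whenever the main model would produce 7."""
    guess = (tokens[-1] + tokens[-2]) % 10
    return 0 if guess == 7 else guess


def greedy_decode(model: NextToken, prompt: List[int], max_new_tokens: int) -> List[int]:
    """Plain greedy decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(model(tokens))
    return tokens


def assisted_decode(
    main: NextToken,
    assistant: NextToken,
    prompt: List[int],
    max_new_tokens: int,
    num_candidates: int = 4,
) -> List[int]:
    """Greedy assisted decoding: the assistant drafts, the main model verifies."""
    tokens = list(prompt)
    target_len = len(prompt) + max_new_tokens
    while len(tokens) < target_len:
        # 1. The assistant drafts a few candidate tokens autoregressively.
        draft = list(tokens)
        for _ in range(num_candidates):
            draft.append(assistant(draft))
        candidates = draft[len(tokens):]

        # 2. The main model checks the candidates in order. A real
        #    implementation scores all of them in one batched forward pass;
        #    this sketch calls the toy model once per position for clarity.
        for candidate in candidates:
            expected = main(tokens)
            tokens.append(expected)  # always keep the main model's token
            if expected != candidate or len(tokens) >= target_len:
                break  # first mismatch (or length limit) ends this round
    return tokens


prompt = [1, 2]
plain = greedy_decode(main_model, prompt, max_new_tokens=12)
assisted = assisted_decode(main_model, assistant_model, prompt, max_new_tokens=12)
print(assisted == plain)
```

Because every accepted token is the one the main model would have chosen anyway, this greedy variant yields the same sequence as decoding with the main model alone. Any speedup comes from verifying several drafted tokens per main-model pass, so it depends on how often the assistant's guesses match.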