Using google/gemma-3-4b-it as assistant model for speculative decoding does not work
I get these warnings:

An assistant model is provided, using a dynamic cache instead of a cache of type='hybrid'.
generation_config default values have been modified to match model-specific defaults: {'cache_implementation': 'hybrid'}. If this is not desired, please set these values explicitly.
and then an error:
for idx in range(len(past_key_values)):
TypeError: object of type 'HybridCache' has no len()
How to fix it?
Hi @sayambhu ,
The warning and the error you are seeing point to the same problem: your code is iterating over past_key_values as if it were the legacy tuple-of-tensors format, but with Gemma 3 the cache is a HybridCache object, which does not support len() or index-based iteration.
Instead of looping over the cache yourself, let model.generate() drive the decoding and pass the draft model via assistant_model. The generate() method handles the past_key_values object internally, whether it is a HybridCache, DynamicCache, or StaticCache, and manages it correctly across decoding steps.
Please find the attached code screenshot for your reference.
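In case the screenshot does not render for you, here is a minimal sketch of the assisted-generation call. The target checkpoint google/gemma-3-27b-it is only an assumption (your question does not name the main model), so substitute whatever target model you are actually using:

```python
# Minimal sketch of assisted (speculative) decoding with transformers.
# Assumption: google/gemma-3-27b-it is used as the target model; replace it with your own.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_name = "google/gemma-3-27b-it"    # assumed target model
assistant_name = "google/gemma-3-4b-it"  # draft/assistant model from your question

tokenizer = AutoTokenizer.from_pretrained(target_name)
model = AutoModelForCausalLM.from_pretrained(
    target_name, torch_dtype=torch.bfloat16, device_map="auto"
)
assistant = AutoModelForCausalLM.from_pretrained(
    assistant_name, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain speculative decoding in one sentence.", return_tensors="pt"
).to(model.device)

# generate() manages past_key_values internally (HybridCache, DynamicCache, StaticCache),
# so there is no need to index into the cache yourself.
output = model.generate(**inputs, assistant_model=assistant, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```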
If you require any further assistance, please feel free to reach out.
Thanks.