Using google/gemma-3-4b-it as assistant model for speculative decoding does not work

#51
by sayambhu - opened

I get a warning

An assistant model is provided, using a dynamic cache instead of a cache of type='hybrid'.
generation_config default values have been modified to match model-specific defaults: {'cache_implementation': 'hybrid'}. If this is not desired, please set these values explicitly.

and then an error

for idx in range(len(past_key_values)):
TypeError: object of type 'HybridCache' has no len()

How can I fix it?

Hi @sayambhu,

The warning and the error are related. The traceback shows code calling len() on past_key_values and indexing into it as a sequence, which fails when the cache is a HybridCache, because that type does not support len().

Instead of handling the cache yourself, simply call model.generate() and let it manage past_key_values. generate() handles the cache object whether it is a HybridCache, DynamicCache, or StaticCache: it knows the internal structure of these cache objects and manages them correctly across decoding steps.
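As a rough sketch of what that can look like (not necessarily the code in the screenshot), the helper below passes the assistant straight to generate() and never touches past_key_values. The names generate_with_assistant, load_models, and the target_id value are my own assumptions for illustration, and this assumes a transformers release whose assisted generation supports this model's cache type.

```python
def generate_with_assistant(model, assistant_model, tokenizer, prompt, max_new_tokens=64):
    """Assisted (speculative) decoding that leaves all cache handling to generate().

    Hypothetical helper: the name and signature are illustrative only.
    """
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # No past_key_values argument and no len()/indexing on the cache here:
    # generate() creates and updates the cache objects itself.
    output_ids = model.generate(
        **inputs,
        assistant_model=assistant_model,
        max_new_tokens=max_new_tokens,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


def load_models(target_id, assistant_id="google/gemma-3-4b-it"):
    """Load target and assistant; target_id is whatever larger model you use (assumption)."""
    # Imported lazily so the helper above can be reused without these packages loaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(target_id)
    model = AutoModelForCausalLM.from_pretrained(
        target_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    assistant = AutoModelForCausalLM.from_pretrained(
        assistant_id, torch_dtype=torch.bfloat16, device_map="auto"
    )
    return model, assistant, tokenizer
```

Typical use would be: model, assistant, tokenizer = load_models(target_id), then generate_with_assistant(model, assistant, tokenizer, "your prompt"). The target and assistant should share the same tokenizer, which is why only one tokenizer is loaded here.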

Please find the attached code screenshot for your reference.

[Image: Screenshot 2025-06-18 at 12.19.53 PM.png]

If you require any further assistance, please feel free to reach out.

Thanks.
