google/gemma-3-27b-it · Using google/gemma-3-4b-it as asssitant model for speculative decoding does not work

Hi @sayambhu ,

You have accurately identified the warning and the error you are encountering. This is a clear indication that your code is trying to interact with the past_key_values in a way that doesn't match its actual type when a HybridCache is being used.

Change it to simply call model.generate(). The model.generate() method intelligently handles the past_key_values object, whether it's a HybridCache, DynamicCache, or StaticCache. It knows the internal structure of these cache objects and manages them correctly across decoding steps.

Please find the attached code screenshot for your reference.

If you required any further assistance please feel free to reach out to me.

Thanks.