Support for quantized cache

by dragstoll - opened May 23, 2024

Hi,
Is it possible to use a quantized cache with this model?
I tried to enable KV cache quantization with:
cache_implementation="quantized",
cache_config={"nbits": 4, "backend": "quanto"},

But I am getting an error: "This model does not support the quantized cache. If you want your model to support quantized cache, please open an issue."
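
For context, these two arguments are normally passed to `generate()` in transformers. Below is a minimal sketch of that call, assuming `model` and `tokenizer` are an already loaded causal LM and its tokenizer; the helper names `build_quantized_cache_kwargs` and `generate_with_quantized_cache` are hypothetical and only there for illustration.

```python
# Sketch only: `model` and `tokenizer` are assumed to be an already loaded
# transformers causal LM and its tokenizer. The helper names are hypothetical.


def build_quantized_cache_kwargs(nbits=4, backend="quanto"):
    """Return the generate() keyword arguments quoted in the post."""
    return {
        "cache_implementation": "quantized",
        "cache_config": {"nbits": nbits, "backend": backend},
    }


def generate_with_quantized_cache(model, tokenizer, prompt, max_new_tokens=20):
    """Call model.generate() with the quantized KV cache settings."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        **build_quantized_cache_kwargs(),
    )
    # Decode the first (and only) generated sequence back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With this model, that `generate()` call is the step that raises the error quoted above.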
