Support for quantized cache
#5 by dragstoll - opened
Hi
Is it possible to use quantized cache with this model?
I tried to use it with KV cache quantization:
cache_implementation="quantized",
cache_config={"nbits": 4, "backend": "quanto"},
But I'm getting an error: This model does not support the quantized cache. If you want your model to support quantized cache, please open an issue.
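For reference, here is a minimal sketch of how the arguments above could be passed with a fallback to the default cache. It assumes the error is raised as a `ValueError` by the `generate()` call. `quantized_cache_kwargs` and `generate_with_fallback` are hypothetical helper names for illustration, not part of any library.

```python
from typing import Any, Callable, Dict


def quantized_cache_kwargs(nbits: int = 4, backend: str = "quanto") -> Dict[str, Any]:
    """Build the two keyword arguments quoted in the post."""
    return {
        "cache_implementation": "quantized",
        "cache_config": {"nbits": nbits, "backend": backend},
    }


def generate_with_fallback(generate_fn: Callable[..., Any], **kwargs: Any) -> Any:
    """Try the quantized cache first; retry with the default cache if it is rejected."""
    try:
        return generate_fn(**kwargs, **quantized_cache_kwargs())
    except ValueError as err:
        # Assumption: the "does not support the quantized cache" error is a ValueError.
        # Any unrelated ValueError is re-raised unchanged.
        if "quantized cache" not in str(err):
            raise
        return generate_fn(**kwargs)
```

With a loaded model this would be called as `generate_with_fallback(model.generate, **inputs, max_new_tokens=32)`, where `model` and `inputs` are whatever is already in use. The fallback only avoids the crash; it does not give the memory savings of a quantized cache.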