Problem with KV-cache
#4, opened by SolidSnacke
Is it normal that cache_4bit and cache_8bit don't work with this model? More precisely, enabling either one throws an error when the model loads. I'm using oobabooga.
I can't say whether it's supposed to be like that, but I had to do this to get it to run on koboldcpp (F16 off on kvcache) and llama.cpp (-nkvo).
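For anyone trying the llama.cpp side of this, here is a rough sketch of how the command could be assembled, assuming the flag meant above is llama.cpp's `-nkvo` (long form `--no-kv-offload`), which keeps the KV cache off the GPU. The binary name `llama-cli`, the file name `model.gguf`, and the helper `build_llama_cmd` are placeholders of mine, not something from this thread, so adjust them to your build.

```python
import shlex


def build_llama_cmd(model_path, no_kv_offload=True, binary="llama-cli"):
    """Build an argument list for a llama.cpp run.

    `binary` and `model_path` are hypothetical placeholders; older
    llama.cpp builds may ship the CLI under a different name.
    """
    cmd = [binary, "-m", model_path]
    if no_kv_offload:
        # Long form of -nkvo: do not offload the KV cache to the GPU.
        cmd.append("--no-kv-offload")
    return cmd


if __name__ == "__main__":
    # Print the command rather than running it, so nothing is launched here.
    print(shlex.join(build_llama_cmd("model.gguf")))
```

The list can be passed to `subprocess.run` as-is once the paths match your setup. This only covers llama.cpp; the koboldcpp setting is changed through its own launcher options.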