Problem with KV-cache

#4
by SolidSnacke - opened

Is it normal that cache_4bit or cache_8bit does not work with this model? More precisely, enabling either option throws an error when loading the model. I'm using oobabooga.

I can't say whether it's supposed to be like that, but I had to do the following to get it to run: in koboldcpp, turn F16 off for the KV cache, and in llama.cpp, pass -nkvo.
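For reference, a minimal llama.cpp invocation with KV-cache offloading disabled might look like this (the model filename and prompt are placeholders, not from this thread):

```shell
# -ngl 99 offloads the model layers to the GPU; -nkvo (--no-kv-offload)
# keeps the KV cache in system RAM instead, which avoids the loading
# error described above. Model path below is a placeholder.
./llama-cli -m ./model-Q5_K_M.gguf -ngl 99 -nkvo -p "Hello"
```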
