IQ3_M is not working properly
It seems that IQ3_M does not work correctly, while Q3_K_M, for example, does. I downloaded IQ3_M through LM Studio and it passed file verification, but it still does not work correctly.
Perhaps my configuration is at fault (Vulkan with two GPUs)... I do not know.
I'm using koboldcpp-1.86.2
What do you mean by "not working properly?" Do you have more details?
Sorry... When I try to generate a response using Vulkan with the model split across two cards, the model cannot correctly generate even a single sentence: it gets stuck on one letter and repeats it endlessly. With Q3_K_M this problem does not occur; the model appears to work correctly and generates reasonable answers.
Edit: OK, it works correctly on CPU and ROCm, and Vulkan even works correctly on the NVIDIA card by itself... but on the AMD card it doesn't work with Vulkan. I've even found a bug in the drivers; there are threads about it in llama.cpp, so that's probably the cause.
I guess the topic can be closed; I don't think you'll be able to fix it by modifying the model.
Ah, okay, that makes sense... unfortunate! Feel free to open a bug on llama.cpp, or comment on an existing one with your findings; it may be valuable to others. Glad you found the cause, and sorry I can't help D: