Please share experiences with various quants.

#1
by BingoBird - opened

Hi folks, what are your experiences with various quants?
Is the vision part more sensitive to quantization than a regular LLM?

It is, which is why we only provide F16 and Q8_0 for the vision stack as mmproj files. However, this model currently lacks the mmproj files required for the vision stack, because llama.cpp support for it has not been merged yet, as discussed in https://huggingface.co/mradermacher/Kimi-VL-A3B-Thinking-2506-GGUF/discussions/1. The mmproj file for vision will be provided as soon as vision support for this architecture is merged into mainline llama.cpp.
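For reference, once the mmproj file is available, pairing it with a quantized main model looks roughly like this. This is a minimal sketch with placeholder file names; `llama-mtmd-cli` is the multimodal CLI in recent llama.cpp builds, and the exact model/mmproj names will depend on what gets uploaded to this repo.

```bash
# Minimal sketch (file names are placeholders): pair an aggressively
# quantized main model with an F16 or Q8_0 mmproj file carrying the
# vision stack, which is more sensitive to quantization.
llama-mtmd-cli \
  -m Kimi-VL-A3B-Thinking-2506.Q4_K_M.gguf \
  --mmproj Kimi-VL-A3B-Thinking-2506.mmproj-f16.gguf \
  --image example.png \
  -p "Describe this image."
```

The point of the split is that the language-model weights tolerate low-bit quants like Q4_K_M reasonably well, while the vision encoder in the mmproj is kept at F16 or Q8_0.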

Actually, I just realized that it has been merged. Expect the mmproj files containing the vision stack to appear as soon as @mradermacher upgrades to the latest version of our llama.cpp fork, which usually happens within a day or so of me notifying him.
