Please share experiences with various quants.

#1
by BingoBird - opened

Hi folks, what are your experiences with various quants?
Is the vision part more sensitive to quantization than a regular LLM?

It is, which is why we only provide F16 and Q8_0 for the vision stack as mmproj files. However, this model currently lacks the mmproj files required for the vision stack, because llama.cpp support for it has not been merged yet, as discussed in https://huggingface.co/mradermacher/Kimi-VL-A3B-Thinking-2506-GGUF/discussions/1. The mmproj file for vision will be provided as soon as vision support for this architecture is merged into mainline llama.cpp.
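For reference, once the mmproj file is available, pairing it with a quantized main model looks roughly like this. This is a minimal sketch with placeholder file names; `llama-mtmd-cli` is the multimodal CLI in recent llama.cpp builds, and the exact model/mmproj names will depend on what gets uploaded to this repo.

```bash
# Minimal sketch (file names are placeholders): pair an aggressively
# quantized main model with an F16 or Q8_0 mmproj file carrying the
# vision stack, which is more sensitive to quantization.
llama-mtmd-cli \
  -m Kimi-VL-A3B-Thinking-2506.Q4_K_M.gguf \
  --mmproj Kimi-VL-A3B-Thinking-2506.mmproj-f16.gguf \
  --image example.png \
  -p "Describe this image."
```

The point of the split is that the language-model weights tolerate low-bit quants like Q4_K_M reasonably well, while the vision encoder in the mmproj is kept at F16 or Q8_0.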

Actually, I just realized that it has been merged. Expect the mmproj files containing the vision stack to appear as soon as @mradermacher upgrades to the latest version of our llama.cpp fork, which usually happens within a day or so of me notifying him.
