Loading an image or video produces unreadable words and characters in the response

#1
by Ali4ai - opened

Hi, 😊
I downloaded the files below to my PC:
Model: MiMo-VL-7B-RL-2508.Q4_K_M.gguf
Projector: MiMo-VL-7B-RL-2508.mmproj-f16.gguf

Then I ran the latest version of llama.cpp (build 6139) with the command below:
.\llama\llama-server -m .\Models\Xiaomi\MiMo-VL-7B-RL-2508.Q4_K_M.gguf --mmproj .\Models\Xiaomi\MiMo-VL-7B-RL-2508.mmproj-f16.gguf

Responses are fine for text-based queries and the thinking is good, but 🤷‍♂️
if I load an image or video, the response consists of unreadable words and characters, as in the screenshot below.

image.png

I got the exact same thing using the F16 version of the GGUF, only my output was nothing but the letter "G" repeating.

SOLVED: Go grab unsloth's mmproj-F32.gguf from https://huggingface.co/unsloth/MiMo-VL-7B-RL-GGUF/ and use it as the projector instead of the f16 one. It will work just fine.
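
For reference, here is a rough sketch of what the adjusted command might look like. It reuses the original command and only swaps the `--mmproj` argument. The location `.\Models\Xiaomi\mmproj-F32.gguf` is an assumption for illustration (the thread does not say where the file was saved), so adjust it to wherever you put the download.

```powershell
# Same model file as in the original post; only the projector changes.
# ASSUMPTION: mmproj-F32.gguf was saved next to the model in .\Models\Xiaomi\
.\llama\llama-server -m .\Models\Xiaomi\MiMo-VL-7B-RL-2508.Q4_K_M.gguf --mmproj .\Models\Xiaomi\mmproj-F32.gguf
```

If the output is still garbled after the swap, it may be worth checking that the server is really loading the new projector file and not the old f16 one.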
