Vision functionality unavailable.

#3
by Glushiator - opened

Just a question, why is vision functionality not available in the uncensored version, when the base Gemma 3 model does support it?

As far as I can see, it is available. Why do you think it is not?

I tried to load an image using the ollama CLI by pasting the full image path into the prompt, and it failed.
I tried exactly the same method with a regular Gemma 3 27B QAT and it slurped the image without a hitch and cheerfully spat out the description.
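To be concrete, the method I mean is roughly this (tag from memory, image path made up):

ollama run gemma3:27b-it-qat
>>> Describe this image: /home/me/pictures/example.png

In interactive mode ollama picks up a file path in the prompt and passes the image to the model, as long as the model actually has vision support.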
Maybe I have the wrong model...

hf.co/mradermacher/gemma-3-27b-it-uncensored-GGUF:Q8_0

Could that be such a basic mistake??

What's the difference between:

hf.co/mradermacher/gemma-3-27b-it-uncensored-GGUF:Q8_0
and
hf.co/mradermacher/Nidum-gemma-3-27B-it-Uncensored-GGUF:Q8_0

Forgive me, for I am a n00b.

I tried to process an image using

hf.co/mradermacher/Nidum-gemma-3-27B-it-Uncensored-GGUF:Q8_0

and got an error:

Error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input

When trying out gemma3:4b-it-qat, it spits out the description flawlessly.

I don't know how ollama works, other than that it causes endless trouble. Specifically, I don't know exactly what those command-line arguments do.

What you need to do, in general, is get a quant file such as the Q8_0, and an mmproj file. The mmproj file is in the static repository (linked from the model card). And then you somehow have to make ollama use both of them, with some kind of configuration. Then it should work.
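For what it's worth, with llama.cpp used directly the pairing of the two files looks roughly like this. The filenames below are just guesses at the usual naming, not verified, and the multimodal CLI binary has changed names between llama.cpp versions, so adjust for your build:

llama-mtmd-cli \
  -m Nidum-Gemma-3-27B-it-Uncensored.Q8_0.gguf \
  --mmproj Nidum-Gemma-3-27B-it-Uncensored.mmproj-f16.gguf \
  --image /path/to/example.png \
  -p "Describe this image."

The mmproj file is the vision projector; without it the model cannot ingest images at all, which is presumably what the "missing data required for image input" error is about.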

You can get a full list of files here: https://hf.tst.eu/model#Nidum-Gemma-3-27B-it-Uncensored-i1-GGUF

The difference between the gemma... and the Nidum-gemma... quants is that they come from different source repositories. They might be identical, or they might not; you'd have to visit the original models to see. Most likely, different people uncensored the model, probably with different results.

What do you use to run - this or other - models? If not ollama, then what?
I am not emotionally attached to ollama; sure, it's easy, but if I get more out of something else, I'll swap ollama out in a heartbeat.

I recommend you use llama.cpp directly, like most advanced users do. ollama is just a third-party application that internally uses llama.cpp, and it is a massive pain, especially when it comes to vision models. I see no reason why anyone would want to use a third-party application just to run llama.cpp indirectly instead of using it directly. Doing so just increases complexity, adds a ton of bugs, and delays getting the latest features; some features are missing entirely, such as RPC to combine the RAM/GPUs of multiple PCs. The llama.cpp server also offers a nice, intuitive UI and a much better user experience.
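As a rough sketch (filenames hypothetical, and --mmproj support in llama-server is fairly recent, so use an up-to-date build), serving the model with vision enabled looks something like:

llama-server \
  -m Nidum-Gemma-3-27B-it-Uncensored.Q8_0.gguf \
  --mmproj Nidum-Gemma-3-27B-it-Uncensored.mmproj-f16.gguf \
  -ngl 99 --host 127.0.0.1 --port 8080
# -ngl offloads model layers to the GPU; adjust to your VRAM

Then open http://127.0.0.1:8080 in a browser for the built-in web UI (you can attach images there), or point any OpenAI-compatible client at the same address.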
