Does the Llama-4-Scout-17B-16E-Instruct Model Support Image Inputs?

#4 · opened by Casy

I'm trying to use the Llama-4-Scout-17B-16E-Instruct model (specifically the GGUF version, e.g., the Q4_K_M or IQ3_XXS quantization) with Ollama to analyze images. The documentation suggests it is a multimodal model capable of processing images, but when I send images via the Ollama Python client I get errors like "this model is missing data required for image input". Can anyone confirm whether this model supports image inputs, and if so, what specific configuration or quantization levels are required? Are there any known issues or additional setup steps needed to make image processing work with this model in Ollama?

"llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input"
Unsloth AI org

@Casy now it does!

Does the quantised version support this too?

I'm running the latest Ollama Docker image, and today I pulled this version again: hf.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:Q2_K_XL
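
For reference, this is roughly how I refresh the tag before retrying. A sketch, assuming the Python client can reach the container on Ollama's default port (running `ollama pull` inside the container would do the same):

```python
# Re-pull the exact tag in case the local copy predates the
# image-input support mentioned above.
import ollama

tag = "hf.co/unsloth/Llama-4-Scout-17B-16E-Instruct-GGUF:Q2_K_XL"
for progress in ollama.pull(tag, stream=True):
    print(progress.status)  # e.g. "pulling manifest", "success"
```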

But I still get this error:

time=2025-05-22T18:03:50.860Z level=INFO source=server.go:809 msg="llm predict error: Failed to create new sequence: failed to process inputs: this model is missing data required for image input"
