Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf: is this a must?
Is Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf
a must for running Qwen Image Edit through ComfyUI?
Or is the main text encoder for Qwen Image enough?
You need both... the LLM part and the vision part (mmproj)
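For what it's worth, the same two-file pairing shows up outside ComfyUI in llama.cpp-based tools: the language model GGUF and the mmproj GGUF are separate files that get loaded together. A minimal llama-cpp-python sketch; the filenames are placeholders, and the Qwen25VLChatHandler class name is an assumption, so check which handlers your installed version actually ships:

```python
from llama_cpp import Llama
# Handler name assumed; verify against your llama-cpp-python version.
from llama_cpp.llama_chat_format import Qwen25VLChatHandler

# The mmproj GGUF supplies the vision tower + projector.
chat_handler = Qwen25VLChatHandler(
    clip_model_path="Qwen2.5-VL-7B-Instruct-mmproj-BF16.gguf"
)

# The main GGUF supplies the language model (placeholder filename).
llm = Llama(
    model_path="Qwen2.5-VL-7B-Instruct-Q8_0.gguf",
    chat_handler=chat_handler,
    n_ctx=4096,
)

# Without the mmproj loaded above, the image could not be encoded at all.
out = llm.create_chat_completion(messages=[
    {"role": "user", "content": [
        {"type": "image_url", "image_url": {"url": "file:///path/to/input.png"}},
        {"type": "text", "text": "Describe this image."},
    ]}
])
print(out["choices"][0]["message"]["content"])
```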
Not sure, but it is required... I just asked ChatGPT and it gave me this answer:
Short summary
Where is mmproj used?
In the semantic pathway: it takes Qwen2.5-VL's visual features and projects them into the language embedding space, conditioning the generator (MMDiT) for editing and other image-conditioned tasks.
Why is it needed?
To ensure the model understands the content of the input image and keeps its identity/structure while making edits. The VAE, meanwhile, preserves the appearance details.
For a more detailed response:
https://chatgpt.com/share/68a8269a-4ee8-8004-9415-cdeab13deb70
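To make the "projects them into the language embedding space" step above concrete, here is a toy PyTorch sketch of what an mmproj-style projector does. The dimensions and the single linear layer are illustrative assumptions, not the real Qwen2.5-VL architecture:

```python
import torch
import torch.nn as nn

# Illustrative sizes only -- not the real Qwen2.5-VL dimensions.
VISION_DIM = 1280   # assumed hidden size of the vision tower
LLM_DIM = 3584      # assumed hidden size of the 7B language model

class MMProjector(nn.Module):
    """Toy stand-in for the mmproj: maps vision-tower features
    into the language model's embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        # Real projectors are often a small MLP; one linear layer
        # is enough to show the idea.
        self.proj = nn.Linear(vision_dim, llm_dim)

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        return self.proj(vision_feats)

# Fake vision features for 256 image patches.
vision_feats = torch.randn(1, 256, VISION_DIM)
# Fake text-token embeddings for a 32-token edit instruction.
text_embeds = torch.randn(1, 32, LLM_DIM)

projector = MMProjector(VISION_DIM, LLM_DIM)
image_embeds = projector(vision_feats)          # (1, 256, LLM_DIM)

# The LLM sees image and text tokens as one sequence; its output
# hidden states then condition the MMDiT generator.
llm_input = torch.cat([image_embeds, text_embeds], dim=1)
print(llm_input.shape)  # torch.Size([1, 288, 3584])
```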
Ok, seems my initial idea was wrong: Qwen Image Edit does indeed use the mmproj file, because the input image is passed as input to the vision LLM (which is acting as the text encoder). I initially thought it worked like HiDream, CogView4, and Lumina Image 2.0, which use text-only LLMs as the text encoder, and that a vision LLM was used simply because of its better spatial understanding of the text layers, but I was wrong.
And since people only started asking about the mmproj now, I guess that means only the edit version specifically requires it, and the base t2i model does not.
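That split is easy to see in a toy sketch of the two conditioning paths (stand-in modules with made-up sizes, not real ComfyUI or model code): the t2i path never touches anything vision-related, while the edit path cannot build its input sequence without the projector the mmproj file supplies.

```python
import torch
import torch.nn as nn

# Toy stand-ins; real loaders and architectures are not shown.
llm_embed = nn.Embedding(1000, 64)   # token id -> LLM embedding
vision_tower = nn.Linear(3, 64)      # pretend vision encoder over RGB patches
mmproj = nn.Linear(64, 64)           # the piece the mmproj GGUF supplies

prompt_ids = torch.randint(0, 1000, (1, 8))   # 8 instruction tokens
image_patches = torch.randn(1, 16, 3)         # 16 fake image patches

# Base t2i path: text only. vision_tower and mmproj are never called,
# which is why the base model runs without the mmproj file.
t2i_cond = llm_embed(prompt_ids)                        # (1, 8, 64)

# Edit path: image patches must pass through the projector before
# they can sit in the same sequence as the instruction tokens.
img_tokens = mmproj(vision_tower(image_patches))        # (1, 16, 64)
edit_cond = torch.cat([img_tokens, llm_embed(prompt_ids)], dim=1)

print(t2i_cond.shape)   # torch.Size([1, 8, 64])
print(edit_cond.shape)  # torch.Size([1, 24, 64])
```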