Thinking with Generated Images

We introduce Thinking with Generated Images, in which a single large multimodal model (LMM) spontaneously generates and reasons with intermediate visual thoughts through a native long multimodal thought process.


This model supports vision generation with intermediate visual subgoals.


Please refer to our GitHub repository for more information.

Downloads last month: 24

Safetensors

Model size: 7.08B params

Tensor type: BF16
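
As a rough illustration of what the size and tensor type above imply, the sketch below estimates the memory needed to hold the weights alone. It assumes 2 bytes per BF16 parameter and ignores activations, caches, and framework overhead, so actual usage will be higher. The names `PARAM_COUNT` and `weight_bytes` are hypothetical and are not part of any released code.

```python
# Back-of-the-envelope weight-memory estimate from the model card figures.
# Assumption: every parameter is stored in BF16 (2 bytes each).

PARAM_COUNT = 7.08e9       # "7.08B params" from the model card
BYTES_PER_BF16_PARAM = 2   # bfloat16 is a 16-bit format


def weight_bytes(param_count: float, bytes_per_param: int) -> float:
    """Return the bytes needed to store the weights only."""
    return param_count * bytes_per_param


bf16_bytes = weight_bytes(PARAM_COUNT, BYTES_PER_BF16_PARAM)

# Report in decimal gigabytes and binary gibibytes.
print(f"BF16 weights: {bf16_bytes / 1e9:.2f} GB ({bf16_bytes / 2**30:.2f} GiB)")
```

This is a lower bound for planning purposes only: it says how much memory the checkpoint occupies once loaded, not how much a full generation run requires.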
