Thinking with Generated Images

We introduce Thinking with Generated Images, which enables a single large multimodal model (LMM) to spontaneously generate and reason with intermediate visual thoughts through a native long-multimodal thought process.

This model supports vision generation with intermediate visual subgoals.

Please refer to our GitHub repo for more information!

Model size: 7.08B params (Safetensors, tensor type BF16)