Thinking with Generated Images
We introduce Thinking with Generated Images, where we enable a single LMM (Large Multimodal Model) to spontaneously generate and reason with intermediate visual thoughts via a native long-multimodal thought process.
This model supports vision generation with intermediate visual subgoals.
Please refer to our github repo for more information!
- Downloads last month
- 17
Inference Providers
NEW
This model isn't deployed by any Inference Provider.
๐
Ask for provider support