Abstract
Omni is a unified multimodal model trained on diverse data types; this training enables Context Unrolling, which improves reasoning across heterogeneous modalities.
We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D geometry, and hidden representations. We find that such training enables Context Unrolling, where the model explicitly reasons across multiple modal representations before producing predictions. This process enables the model to aggregate complementary information across heterogeneous modalities, facilitating a more faithful approximation of the shared multimodal knowledge manifold and improving downstream reasoning fidelity. As a result, Omni achieves strong performance on both multimodal generation and understanding benchmarks, while demonstrating advanced multimodal reasoning capabilities, including in-context generation of text, image, video, and 3D geometry.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API:
- MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings (2026)
- Modeling Cross-vision Synergy for Unified Large Vision Model (2026)
- Enhancing Alignment for Unified Multimodal Models via Semantically-Grounded Supervision (2026)
- TorchUMM: A Unified Multimodal Model Codebase for Evaluation, Analysis, and Post-training (2026)
- Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding (2026)
- LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation (2026)
- OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning (2026)
Get this paper in your agent:
hf papers read 2604.21921
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
Models citing this paper 0
Datasets citing this paper 0
Spaces citing this paper 0
Collections including this paper 0