OmniMMI: A Comprehensive Multi-modal Interaction Benchmark in Streaming Video Contexts Paper • 2503.22952
DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness Paper • 2503.22677
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization Paper • 2503.19901
SketchVideo: Sketch-based Video Generation and Editing Paper • 2503.23284
RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388
Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376