ReCamMaster: Camera-Controlled Generative Rendering from A Single Video Paper • 2503.11647 • Published Mar 14 • 140
SoFar Collection Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation • 5 items • Updated Feb 24 • 3
SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation Paper • 2502.13143 • Published Feb 18 • 30
Learning Getting-Up Policies for Real-World Humanoid Robots Paper • 2502.12152 • Published Feb 17 • 42
Taming Teacher Forcing for Masked Autoregressive Video Generation Paper • 2501.12389 • Published Jan 21 • 10
PhysGen: Rigid-Body Physics-Grounded Image-to-Video Generation Paper • 2409.18964 • Published Sep 27, 2024 • 27
DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation Paper • 2406.16855 • Published Jun 24, 2024 • 57
DreamLLM Collection [ICLR 2024 Spotlight] DreamLLM: Synergistic Multimodal Comprehension and Creation (https://arxiv.org/abs/2309.11499) • 6 items • Updated Mar 22, 2024 • 3
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding Paper • 2401.09340 • Published Jan 17, 2024 • 22
VideoPoet: A Large Language Model for Zero-Shot Video Generation Paper • 2312.14125 • Published Dec 21, 2023 • 47
Merlin:Empowering Multimodal LLMs with Foresight Minds Paper • 2312.00589 • Published Nov 30, 2023 • 27
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning Paper • 2307.09474 • Published Jul 18, 2023 • 1
DreamLLM: Synergistic Multimodal Comprehension and Creation Paper • 2309.11499 • Published Sep 20, 2023 • 58