DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Paper • 2506.03123
Video World Models with Long-term Spatial Memory Paper • 2506.05284
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes Paper • 2506.00227
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces Paper • 2506.00123
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper • 2506.01943
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance Paper • 2505.21876
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control Paper • 2505.22642
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles Paper • 2505.21060
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment Paper • 2505.18600
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457
Cosmos-Reason1 Collection Multimodal world understanding through reasoning • 5 items
UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning Paper • 2505.14231
Emerging Properties in Unified Multimodal Pretraining Paper • 2505.14683
Hybrid 3D-4D Gaussian Splatting for Fast Dynamic Scene Representation Paper • 2505.13215
LightLab: Controlling Light Sources in Images with Diffusion Models Paper • 2505.09608