DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Paper • 2506.03123 • Published 3 days ago • 14
Video World Models with Long-term Spatial Memory Paper • 2506.05284 • Published about 23 hours ago • 30
Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes Paper • 2506.00227 • Published 7 days ago • 9
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published 5 days ago • 14
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics Paper • 2506.00070 • Published 8 days ago • 24
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces Paper • 2506.00123 • Published 7 days ago • 31
Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper • 2506.01943 • Published 4 days ago • 23
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models Paper • 2506.03135 • Published 3 days ago • 33
UniWorld: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation Paper • 2506.03147 • Published 3 days ago • 55
EPiC: Efficient Video Camera Control Learning with Precise Anchor-Video Guidance Paper • 2505.21876 • Published 10 days ago • 9
FastTD3: Simple, Fast, and Capable Reinforcement Learning for Humanoid Control Paper • 2505.22642 • Published 9 days ago • 3
Styl3R: Instant 3D Stylized Reconstruction for Arbitrary Scenes and Styles Paper • 2505.21060 • Published 10 days ago • 4
Chain-of-Zoom: Extreme Super-Resolution via Scale Autoregression and Preference Alignment Paper • 2505.18600 • Published 13 days ago • 44
Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO Paper • 2505.21457 • Published 10 days ago • 14
Cosmos-Reason1 Collection • Multimodal world understanding through reasoning • 5 items • Updated about 1 hour ago • 26
TIGER Audio Extractor Space • Extraction & Reconstruction for Efficient Speech Separation • Running on Zero • 88