Ctrl-Crash: Controllable Diffusion for Realistic Car Crashes Paper • 2506.00227 • Published 6 days ago • 9
Robustness in Both Domains: CLIP Needs a Robust Text Encoder Paper • 2506.03355 • Published 2 days ago • 6
Voyager: Long-Range and World-Consistent Video Diffusion for Explorable 3D Scene Generation Paper • 2506.04225 • Published 1 day ago • 19
Advancing Multimodal Reasoning: From Optimized Cold Start to Staged Reinforcement Learning Paper • 2506.04207 • Published 1 day ago • 41
SVGenius: Benchmarking LLMs in SVG Understanding, Editing and Generation Paper • 2506.03139 • Published 2 days ago • 13
LayerFlow: A Unified Model for Layer-aware Video Generation Paper • 2506.04228 • Published 1 day ago • 13
RefEdit: A Benchmark and Method for Improving Instruction-based Image Editing Model on Referring Expressions Paper • 2506.03448 • Published 2 days ago • 4
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models Paper • 2505.24864 • Published 6 days ago • 112
Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published 6 days ago • 162
view article Article Holo1: New family of GUI automation VLMs powering GUI agent Surfer-H By Hcompany and 1 other • 3 days ago • 59
FlowMo: Variance-Based Flow Guidance for Coherent Motion in Video Generation Paper • 2506.01144 • Published 4 days ago • 14
ReasonFLux-Coder Collection Coding LLMs excel at both writing code and generating unit tests. • 9 items • Updated 11 days ago • 6
GUI-Actor: Coordinate-Free Visual Grounding for GUI Agents Paper • 2506.03143 • Published 2 days ago • 34
SynthRL: Scaling Visual Reasoning with Verifiable Data Synthesis Paper • 2506.02096 • Published 3 days ago • 49
VS-Bench: Evaluating VLMs for Strategic Reasoning and Decision-Making in Multi-Agent Environments Paper • 2506.02387 • Published 3 days ago • 55
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs Paper • 2506.01674 • Published 4 days ago • 23