MicroVQA: A Multimodal Reasoning Benchmark for Microscopy-Based Scientific Research Paper • 2503.13399 • Published 7 days ago • 20
BlobCtrl: A Unified and Flexible Framework for Element-level Image Generation and Editing Paper • 2503.13434 • Published 7 days ago • 24
Edit Transfer: Learning Image Editing via Vision In-Context Relations Paper • 2503.13327 • Published 7 days ago • 24
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization Paper • 2503.12937 • Published 7 days ago • 26
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey Paper • 2503.12605 • Published 8 days ago • 27
Personalize Anything for Free with Diffusion Transformer Paper • 2503.12590 • Published 8 days ago • 41
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? Paper • 2503.12349 • Published 8 days ago • 38
DreamRenderer: Taming Multi-Instance Attribute Control in Large-Scale Text-to-Image Models Paper • 2503.12885 • Published 7 days ago • 41
Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills Paper • 2503.12533 • Published 8 days ago • 60
DropletVideo: A Dataset and Approach to Explore Integral Spatio-Temporal Consistent Video Generation Paper • 2503.06053 • Published 16 days ago • 84
Frac-Connections: Fractional Extension of Hyper-Connections Paper • 2503.14125 • Published 6 days ago • 19
AudioX: Diffusion Transformer for Anything-to-Audio Generation Paper • 2503.10522 • Published 11 days ago • 18
Infinite Mobility: Scalable High-Fidelity Synthesis of Articulated Objects via Procedural Generation Paper • 2503.13424 • Published 7 days ago • 25
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published 8 days ago • 23
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published 7 days ago • 28
Creation-MMBench: Assessing Context-Aware Creative Intelligence in MLLM Paper • 2503.14478 • Published 6 days ago • 41
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published 6 days ago • 98
RWKV-7 "Goose" with Expressive Dynamic State Evolution Paper • 2503.14456 • Published 6 days ago • 127