Surfer-H Meets Holo1: Cost-Efficient Web Agent Powered by Open Weights • Paper • 2506.02865 • Published 3 days ago • 27
SeedVR2: One-Step Video Restoration via Diffusion Adversarial Post-Training • Paper • 2506.05301 • Published 1 day ago • 39
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos • Paper • 2506.05349 • Published 1 day ago • 21
StreamBP: Memory-Efficient Exact Backpropagation for Long Sequence Training of LLMs • Paper • 2506.03077 • Published 3 days ago • 15
FlexPainter: Flexible and Multi-View Consistent Texture Generation • Paper • 2506.02620 • Published 4 days ago • 9
SparseMM: Head Sparsity Emerges from Visual Concept Responses in MLLMs • Paper • 2506.05344 • Published 1 day ago • 14
Micro-Act: Mitigate Knowledge Conflict in Question Answering via Actionable Self-Reasoning • Paper • 2506.05278 • Published 1 day ago • 3
Unfolding Spatial Cognition: Evaluating Multimodal Models on Visual Simulations • Paper • 2506.04633 • Published 2 days ago • 16
SkyReels-Audio: Omni Audio-Conditioned Talking Portraits in Video Diffusion Transformers • Paper • 2506.00830 • Published 6 days ago • 3
MINT-CoT: Enabling Interleaved Visual Tokens in Mathematical Chain-of-Thought Reasoning • Paper • 2506.05331 • Published 1 day ago • 12
ComfyUI-Copilot: An Intelligent Assistant for Automated Workflow Development • Paper • 2506.05010 • Published 1 day ago • 38
VideoREPA: Learning Physics for Video Generation through Relational Alignment with Foundation Models • Paper • 2505.23656 • Published 8 days ago • 23
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs • Paper • 2506.05328 • Published 1 day ago • 19
Contextual Integrity in LLMs via Reasoning and Reinforcement Learning • Paper • 2506.04245 • Published 8 days ago • 4
MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale • Paper • 2506.04405 • Published 2 days ago • 4
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models • Paper • 2506.05176 • Published 1 day ago • 24