Submitted by xhyandwyy 41 Mobile-Agent-v3: Foundamental Agents for GUI Automation · 15 authors 4.75k 3
Submitted by Kevin355 28 LiveMCP-101: Stress Testing and Diagnosing MCP-enabled Agents on Challenging Queries · 14 authors 3
Submitted by haoningwu 12 SceneGen: Single-Image 3D Scene Generation in One Feedforward Pass · 4 authors 24 2
Submitted by cai-qi 7 Visual Autoregressive Modeling for Instruction-Guided Image Editing · 8 authors 11 2
Submitted by taesiri 7 ATLAS: Decoupling Skeletal and Shape Parameters for Expressive Parametric Human Modeling · 10 authors 2
Submitted by universea 7 aiXiv: A Next-Generation Open Access Ecosystem for Scientific Discovery Generated by AI Scientists · 23 authors 2
Submitted by taesiri 4 "Does the cafe entrance look accessible? Where is the door?" Towards Geospatial AI Agents for Visual Inquiries · 10 authors 2
Submitted by thewhole 3 Snap-Snap: Taking Two Images to Reconstruct 3D Human Gaussians in Milliseconds · 9 authors 2
Submitted by taesiri 2 When and What: Diffusion-Grounded VideoLLM with Entity Aware Segmentation for Long Video Understanding · 3 authors 2
Submitted by YirongSun 2 LLaSO: A Foundational Framework for Reproducible Research in Large Language and Speech Model · 8 authors 12 2
Submitted by amazingj 2 Fin-PRM: A Domain-Specialized Process Reward Model for Financial Reasoning in Large Language Models · 7 authors 2