Beyond Accuracy: Dissecting Mathematical Reasoning for LLMs Under Reinforcement Learning • Paper • 2506.04723 • Published Jun 5, 2025
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild • Paper • 2510.14240 • Published Oct 16, 2025
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms • Paper • 2510.13913 • Published Oct 15, 2025
Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows • Paper • 2506.03332 • Published Jun 3, 2025
COSMOS: Predictable and Cost-Effective Adaptation of LLMs • Paper • 2505.01449 • Published Apr 30, 2025
Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models • Paper • 2406.14852 • Published Jun 21, 2024