MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published 9 days ago • 15
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published 7 days ago • 54
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 17 days ago • 100
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published 12 days ago • 103
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published 30 days ago • 81
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published 19 days ago • 126
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 106
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published Apr 21 • 22