MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents Paper • 2508.13186 • Published 10 days ago • 16
FutureX: An Advanced Live Benchmark for LLM Agents in Future Prediction Paper • 2508.11987 • Published 8 days ago • 56
Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL Paper • 2508.13167 • Published 17 days ago • 102
Efficient Agents: Building Effective Agents While Reducing Cost Paper • 2508.02694 • Published about 1 month ago • 82
AFM-Models Collection • The models and training dataset of the paper: Chain-of-Agents: End-to-End Agent Foundation Models via Multi-Agent Distillation and Agentic RL • 12 items • Updated 18 days ago • 13
Seed Diffusion: A Large-Scale Diffusion Language Model with High-Speed Inference Paper • 2508.02193 • Published 20 days ago • 127
Agent KB: Leveraging Cross-Domain Experience for Agentic Problem Solving Paper • 2507.06229 • Published Jul 8 • 73
PersonaFeedback: A Large-scale Human-annotated Benchmark For Personalization Paper • 2506.12915 • Published Jun 15 • 21
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation Paper • 2506.09991 • Published Jun 11 • 56
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models Paper • 2502.16614 • Published Feb 23 • 27
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20 • 106
OmniEdit: Building Image Editing Generalist Models Through Specialist Supervision Paper • 2411.07199 • Published Nov 11, 2024 • 50
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models Paper • 2411.04905 • Published Nov 7, 2024 • 127
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41