Assisting in Writing Wikipedia-like Articles From Scratch with Large Language Models Paper • 2402.14207 • Published Feb 22, 2024 • 10
VCRL: Variance-based Curriculum Reinforcement Learning for Large Language Models Paper • 2509.19803 • Published 22 days ago • 116
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published Mar 3 • 29
You Have Thirteen Hours in Which to Solve the Labyrinth: Enhancing AI Game Masters with Function Calling Paper • 2409.06949 • Published Sep 11, 2024 • 1
Generative Agents: Interactive Simulacra of Human Behavior Paper • 2304.03442 • Published Apr 7, 2023 • 13
AppWorld: A Controllable World of Apps and People for Benchmarking Interactive Coding Agents Paper • 2407.18901 • Published Jul 26, 2024 • 35
WebGames: Challenging General-Purpose Web-Browsing AI Agents Paper • 2502.18356 • Published Feb 25 • 14
SPIN-Bench: How Well Do LLMs Plan Strategically and Reason Socially? Paper • 2503.12349 • Published Mar 16 • 44
Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens Paper • 2508.01191 • Published Aug 2 • 236
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 101
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 50