Agent0: Unleashing Self-Evolving Agents from Zero Data via Tool-Integrated Reasoning
Abstract
Agent0, a self-evolving framework utilizing multi-step co-evolution and tool integration, enhances LLM reasoning capabilities without human-curated data.
Large Language Model (LLM) Agents, often trained with Reinforcement Learning (RL), are constrained by a dependency on human-curated data, limiting scalability and tethering AI to human knowledge. Existing self-evolution frameworks offer an alternative but are typically restricted by the model's inherent capabilities and single-round interactions, hindering the development of complex curricula involving tool use or dynamic reasoning. We introduce Agent0, a fully autonomous framework that evolves high-performing agents without external data through multi-step co-evolution and seamless tool integration. Agent0 establishes a symbiotic competition between two agents initialized from the same base LLM: a curriculum agent that proposes increasingly challenging frontier tasks, and an executor agent that learns to solve them. We integrate external tools to enhance the executor's problem-solving capacity; this improvement, in turn, pressures the curriculum agent to construct more complex, tool-aware tasks. Through this iterative process, Agent0 establishes a self-reinforcing cycle that continuously produces high-quality curricula. Empirically, Agent0 substantially boosts reasoning capabilities, improving the Qwen3-8B-Base model by 18% on mathematical reasoning and 24% on general reasoning benchmarks. Code is available at https://github.com/aiming-lab/Agent0.
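The co-evolution loop described above can be sketched in miniature. The sketch below is illustrative only: the class names (`CurriculumAgent`, `ExecutorAgent`), the scalar "difficulty"/"skill" stand-ins, and the stub tool are assumptions for exposition, not the paper's actual implementation, which trains both agents with RL from a shared base LLM.

```python
# Illustrative sketch of Agent0-style co-evolution (hypothetical names;
# real agents are LLMs trained with RL, not scalar counters).

class CurriculumAgent:
    """Proposes tasks whose difficulty tracks the executor's frontier."""
    def __init__(self):
        self.difficulty = 1

    def propose_task(self):
        return {"difficulty": self.difficulty}

    def update(self, solve_rate):
        # Push toward frontier tasks: harden the curriculum when the
        # executor solves too easily, ease it when tasks are out of reach.
        if solve_rate > 0.8:
            self.difficulty += 1
        elif solve_rate < 0.2 and self.difficulty > 1:
            self.difficulty -= 1


class ExecutorAgent:
    """Attempts tasks, optionally calling an external tool."""
    def __init__(self):
        self.skill = 1

    def solve(self, task, tool):
        # Tool integration (e.g., a code interpreter) extends effective skill,
        # which in turn pressures the curriculum toward tool-aware tasks.
        effective = self.skill + tool(task)
        return effective >= task["difficulty"]

    def update(self, solved):
        if solved:
            self.skill += 1


def stub_tool(task):
    # Stand-in for the gain from tool-integrated reasoning.
    return 1


def co_evolve(rounds=10, batch=8):
    """Run the self-reinforcing cycle: propose, solve, update both agents."""
    curriculum, executor = CurriculumAgent(), ExecutorAgent()
    for _ in range(rounds):
        results = []
        for _ in range(batch):
            task = curriculum.propose_task()
            solved = executor.solve(task, stub_tool)
            executor.update(solved)
            results.append(solved)
        curriculum.update(sum(results) / batch)
    return curriculum.difficulty, executor.skill
```

Running `co_evolve()` shows the intended dynamic: as the executor improves, the curriculum's difficulty rises with it, so neither agent stagnates.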
Community
Agent0 is a fully autonomous framework that evolves high-performing agents from scratch without relying on any human-curated data. It employs a symbiotic competition between a Curriculum Agent (proposing tasks) and an Executor Agent (solving tasks with tools), driving a self-reinforcing cycle of improvement.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- AgentFrontier: Expanding the Capability Frontier of LLM Agents with ZPD-Guided Data Synthesis (2025)
- Can Tool-Integrated Reinforcement Learning Generalize Across Diverse Domains? (2025)
- Multi-Agent Evolve: LLM Self-Improve through Co-evolution (2025)
- MARS-SQL: A multi-agent reinforcement learning framework for Text-to-SQL (2025)
- A2FM: An Adaptive Agent Foundation Model for Tool-Aware Hybrid Reasoning (2025)
- Socratic-Zero: Bootstrapping Reasoning via Data-Free Agent Co-evolution (2025)
- Don't Just Fine-tune the Agent, Tune the Environment (2025)
arXiv explained breakdown of this paper: https://arxivexplained.com/papers/agent0-unleashing-self-evolving-agents-from-zero-data-via-tool-integrated-reasoning