arxiv:2509.24726

Socratic-Zero : Bootstrapping Reasoning via Data-Free Agent Co-evolution

Published on Sep 29
· Submitted by Wang on Sep 30

Abstract

AI-generated summary

A framework called Socratic-Zero autonomously generates high-quality training data through the co-evolution of three agents, improving large language models' performance on reasoning tasks.

Recent breakthroughs in large language models (LLMs) on reasoning tasks rely heavily on massive, high-quality datasets, typically human-annotated and thus difficult to scale. While data synthesis or distillation offers a promising alternative, existing methods struggle with inconsistent data quality and an inability to dynamically adapt to the evolving capabilities of the model, leading to suboptimal training signals. To address these limitations, we introduce Socratic-Zero, a fully autonomous framework that generates high-quality training data from minimal seed examples through the co-evolution of three agents: the Teacher, the Solver, and the Generator. The Solver continuously refines its reasoning by learning from preference feedback on both successful and failed trajectories; the Teacher adaptively crafts increasingly challenging questions based on the Solver's weaknesses; and the Generator distills the Teacher's question-design strategy to enable scalable, high-fidelity curriculum generation. This closed-loop system produces a self-improving curriculum, requiring no pre-existing tasks or labels. Remarkably, starting from only 100 seed questions, our Socratic-Solver-8B achieves an average gain of +20.2 percentage points over prior data synthesis methods across seven mathematical reasoning benchmarks (AMC23, AIME24-25, Olympiad, MATH-500, Minerva, and GSM8K), with consistent gains on both Qwen3 and GLM4 series models. Even more surprisingly, synthetic data from Socratic-Generator-32B enables student LLMs to achieve superior performance compared to other state-of-the-art (SOTA) commercial LLMs on these benchmarks, including Qwen3-235B-A22B, DeepSeek-V3.1-671B, GPT-5, Gemini-2.5-Pro, Grok-4, and Claude-4.1-Opus.
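To make the closed loop above concrete, here is a minimal, hedged sketch of one co-evolution round in Python. It is not the authors' implementation: `coevolution_round`, `solver.solve`, `teacher.judge`, `teacher.propose_questions`, `solver.preference_update`, and `generator.distill` are all hypothetical names standing in for the roles the abstract describes.

```python
# Minimal sketch of one Socratic-Zero co-evolution round (hypothetical API,
# not the authors' code). Teacher, Solver, and Generator are assumed to be
# LLM wrappers exposing the methods used below.

def coevolution_round(teacher, solver, generator, curriculum, n_rollouts=8):
    """Run one teach-learn-practice round over the current curriculum."""
    preference_pairs = []   # (question, preferred trajectory, rejected trajectory)
    weaknesses = []         # questions the Solver still fails on

    for question in curriculum:
        # The Solver attempts each question several times.
        rollouts = [solver.solve(question) for _ in range(n_rollouts)]
        judged = [(r, teacher.judge(question, r)) for r in rollouts]
        good = [r for r, ok in judged if ok]
        bad = [r for r, ok in judged if not ok]

        # Successful and failed trajectories become preference feedback.
        if good and bad:
            preference_pairs.append((question, good[0], bad[0]))
        if not good:
            weaknesses.append(question)

    # 1) The Solver refines its reasoning from preference feedback.
    solver.preference_update(preference_pairs)

    # 2) The Teacher crafts harder questions targeting the Solver's weaknesses.
    new_questions = teacher.propose_questions(weaknesses)

    # 3) The Generator distills the Teacher's question-design strategy so the
    #    curriculum can keep growing without the expensive Teacher.
    generator.distill(new_questions)

    return curriculum + new_questions
```

Seeding `curriculum` with the 100 starter questions and repeating this round is, in spirit, the self-improving loop the abstract describes; the actual judging, preference-optimization, and distillation recipes are specified in the paper.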

Community

Paper author Paper submitter

Struggling to boost LLM reasoning? 🤯 The endless need for data is a huge bottleneck.
Current methods often train solvers OR generators in isolation, ignoring their crucial interaction.
We introduce Socratic-Zero: A new framework where agents co-evolve, bootstrapping SOTA reasoning from (almost) nothing.

📚 Paper: https://arxiv.org/pdf/2509.24726

Inspired by Socratic "midwifery," Socratic-Zero creates a self-improving "iron triangle" ecosystem to produce better solvers AND generators.
🧑‍🎓 Solver (Student): Solves problems & learns from its mistakes.
👩‍🏫 Teacher (Master): Crafts new problems targeting the Solver's specific weaknesses.
✍️ Generator (Apprentice): Learns the Teacher's expert strategy to create a scalable, high-quality curriculum (a rough sketch of this distillation step is given after this list).

The system is fully autonomous. Starting with just 100 seed questions, it creates a closed "teach-learn-practice" loop, driving a spiral of improvement.
🚀 Solver Result: Our Socratic-Solver-8B achieves a +20.2 percentage-point average gain across 7 math benchmarks, all without massive external datasets!
But it's not just about solving. Our generator learns to create world-class problems.
🤯 Generator Result: Our Socratic-Generator-32B produces data that enables a student model to outperform those trained on data from giants like GPT-5, Gemini-2.5-Pro, Claude-4.1-Opus, Grok-4, Qwen3-235B, & DeepSeek-V3.1!

Co-evolution is key. By teaching agents how to teach each other, we unlock a new path for scalable, data-efficient reasoning.
We welcome your feedback and critiques!
📚 Paper: https://arxiv.org/pdf/2509.24726
💻 Code: https://github.com/Frostlinx/Socratic-Zero

