RLAD: Training LLMs to Discover Abstractions for Solving Reasoning Problems
Abstract
Introducing reasoning abstractions in reinforcement learning improves structured exploration and generalization for complex problem-solving.
Reasoning requires going beyond pattern matching or memorization of solutions to identify and implement "algorithmic procedures" that can be used to deduce answers to hard problems. Doing so requires recognizing the most relevant primitives, intermediate results, or shared procedures, and building upon them. While RL post-training on long chains of thought ultimately aims to uncover this kind of algorithmic behavior, most reasoning traces learned by large models fail to consistently capture or reuse procedures, instead drifting into verbose and degenerate exploration. To enable more effective reasoning, we introduce reasoning abstractions: concise natural language descriptions of procedural and factual knowledge that guide the model toward learning successful reasoning. We train models to propose multiple abstractions for a given problem, followed by RL that incentivizes building a solution using the information provided by these abstractions. This results in a two-player RL training paradigm, abbreviated as RLAD, that jointly trains an abstraction generator and a solution generator. This setup enables structured exploration, decouples the learning signals of abstraction proposal and solution generation, and improves generalization to harder problems. We also show that, at large test budgets, allocating more test-time compute to generating abstractions benefits performance more than generating more solutions, illustrating the role of abstractions in guiding meaningful exploration.
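The two-player setup described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the reward design, so the sketch assumes a solution is rewarded for matching a reference answer and an abstraction is rewarded by the mean success of the solutions conditioned on it. The names `two_player_rewards`, `toy_propose`, and `toy_solve` are hypothetical, and the two "policies" are hard-coded stubs standing in for LLMs.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for the two policies; in practice each would be an LLM.
ProposeFn = Callable[[str, int], List[str]]      # (problem, k) -> k abstractions
SolveFn = Callable[[str, str, int], List[str]]   # (problem, abstraction, n) -> n answers


def two_player_rewards(
    problem: str,
    reference: str,
    propose: ProposeFn,
    solve: SolveFn,
    num_abstractions: int = 2,
    num_solutions: int = 4,
) -> List[Dict]:
    """Compute separate reward signals for abstractions and solutions.

    Assumed reward design (not taken from the paper):
      * solution reward: 1.0 if the answer equals the reference, else 0.0
      * abstraction reward: mean solution reward under that abstraction
    """
    if num_solutions < 1:
        raise ValueError("num_solutions must be at least 1")

    results = []
    for abstraction in propose(problem, num_abstractions):
        answers = solve(problem, abstraction, num_solutions)
        solution_rewards = [1.0 if a == reference else 0.0 for a in answers]
        results.append(
            {
                "abstraction": abstraction,
                "abstraction_reward": sum(solution_rewards) / len(solution_rewards),
                "solution_rewards": solution_rewards,
            }
        )
    return results


def toy_propose(problem: str, k: int) -> List[str]:
    # Stub abstraction generator: one useful hint, one unhelpful hint.
    hints = [
        "pair the terms: (first + last) * count / 2",
        "guess a round number",
    ]
    return hints[:k]


def toy_solve(problem: str, abstraction: str, n: int) -> List[str]:
    # Stub solution generator: the useful hint leads to the right answer.
    answer = "5050" if "pair" in abstraction else "5000"
    return [answer] * n


rewards = two_player_rewards(
    "Sum the integers from 1 to 100.", "5050", toy_propose, toy_solve
)
for entry in rewards:
    print(entry["abstraction"], entry["abstraction_reward"])
```

In this toy run the helpful abstraction earns a higher reward than the unhelpful one, while each solution is still scored on its own correctness. That separation is one way to read the "decoupled learning signals" mentioned in the abstract; an actual training loop would feed these rewards into policy-gradient updates for each model.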
Community
rl + abstraction (in natural language)
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Metacognitive Reuse: Turning Recurring LLM Reasoning Into Concise Behaviors (2025)
- SABER: Switchable and Balanced Training for Efficient LLM Reasoning (2025)
- Learning to Refine: Self-Refinement of Parallel Reasoning in LLMs (2025)
- Think in Blocks: Adaptive Reasoning from Direct Response to Deep Reasoning (2025)
- From Harm to Help: Turning Reasoning In-Context Demos into Assets for Reasoning LMs (2025)
- Rethinking Thinking Tokens: LLMs as Improvement Operators (2025)
- The Majority is not always right: RL training for solution aggregation (2025)