ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
Abstract
ProtoReasoning enhances large reasoning models through prototypical representations, leading to improved cross-domain generalization in logical reasoning, planning, and other tasks.
Recent advances in Large Reasoning Models (LRMs) trained with Long Chain-of-Thought (Long CoT) reasoning have demonstrated remarkable cross-domain generalization capabilities. However, the underlying mechanisms supporting such transfer remain poorly understood. We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes: fundamental reasoning patterns that capture the essence of problems across domains. These prototypes strip away surface-level details of the representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into corresponding prototype representations; (2) a comprehensive verification system providing reliable feedback through Prolog/PDDL interpreters; (3) the scalability to synthesize problems arbitrarily within prototype space while ensuring correctness. Extensive experiments show that ProtoReasoning achieves a 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), a 6.3% improvement on planning tasks, a 4.0% improvement on general reasoning (MMLU), and a 1.0% improvement on mathematics (AIME24). Significantly, our ablation studies confirm that learning in prototype space also demonstrates enhanced generalization to structurally similar problems compared to training solely on natural language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
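To make the verification idea concrete: the abstract describes checking model outputs against prototype programs via Prolog/PDDL interpreters. The sketch below is not from the paper; it is a minimal Python illustration of the underlying principle for the Prolog side, using forward chaining over propositional Horn clauses. An answer is "verified" iff it is entailed by the prototype program. The toy facts and rules are hypothetical.

```python
# Minimal forward-chaining verifier over propositional Horn clauses,
# illustrating the idea behind Prolog-based answer checking.
# (The paper uses real Prolog/PDDL interpreters; this is only a sketch.)

def forward_chain(facts, rules):
    """Return all atoms entailed by `facts` under Horn-clause `rules`.

    `facts` is a set of atom strings; `rules` is a list of
    (premises, conclusion) pairs where `premises` is a frozenset of atoms.
    """
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Fire a rule when all its premises are already derived.
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived

# Hypothetical toy prototype: family relations as ground atoms.
facts = {"parent(tom,bob)", "parent(bob,ann)"}
rules = [
    (frozenset({"parent(tom,bob)", "parent(bob,ann)"}),
     "grandparent(tom,ann)"),
]

entailed = forward_chain(facts, rules)
# A candidate answer is accepted iff the prototype program entails it.
print("grandparent(tom,ann)" in entailed)  # True
```

A real pipeline would instead emit a Prolog program and query an interpreter such as SWI-Prolog, but the accept/reject criterion is the same: the candidate answer must follow from the prototype.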
Community
Interesting work, but I expected the performance boost to be much larger, especially for AIME24. In optillm (https://github.com/codelion/optillm), when used with z3, we actually see a significant improvement on AIME24. For instance, with qwen2.5:14b-instruct-fp16 (via ollama) we saw the AIME24 score go from 10.00 to 20.00.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond (2025)
- Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles (2025)
- General-Reasoner: Advancing LLM Reasoning Across All Domains (2025)
- Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study (2025)
- CoRT: Code-integrated Reasoning within Thinking (2025)
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains (2025)
- Infi-MMR: Curriculum-based Unlocking Multimodal Reasoning via Phased Reinforcement Learning in Multimodal Small Language Models (2025)