General-Reasoner: Advancing LLM Reasoning Across All Domains
Abstract
Reinforcement learning (RL) has recently demonstrated strong potential in enhancing the reasoning capabilities of large language models (LLMs). In particular, the "Zero" reinforcement learning introduced by DeepSeek-R1-Zero enables direct RL training of base LLMs without an intermediate supervised fine-tuning stage. Despite these advances, current work on LLM reasoning focuses mainly on the mathematical and coding domains, largely due to data abundance and the ease of answer verification. This limits the applicability and generalization of such models to broader domains, where questions often have diverse answer representations and data is scarcer. In this paper, we propose General-Reasoner, a novel training paradigm designed to enhance LLM reasoning capabilities across diverse domains. Our key contributions include: (1) constructing a large-scale, high-quality dataset of questions with verifiable answers curated by web crawling, covering a wide range of disciplines; and (2) developing a generative model-based answer verifier that replaces traditional rule-based verification with chain-of-thought, context-aware judgment. We train a series of models and evaluate them on 12 benchmarks spanning domains such as physics, chemistry, finance, and electronics (e.g., MMLU-Pro, GPQA, SuperGPQA, TheoremQA, BBEH, MATH, and AMC). Our comprehensive evaluation demonstrates that General-Reasoner outperforms existing baseline methods, achieving robust and generalizable reasoning performance while maintaining superior effectiveness on mathematical reasoning tasks.
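To make the second contribution concrete, below is a minimal sketch of what a generative, chain-of-thought answer verifier could look like. The model name, prompt template, and verdict-parsing logic here are illustrative assumptions, not the authors' released verifier; the point is only that a small instruction-tuned model can judge semantic equivalence between a reference answer and a model response where rule-based string matching fails.

```python
# Minimal sketch (assumptions noted above) of a generative answer verifier.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"  # placeholder small instruct model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="auto")

VERIFIER_PROMPT = """You are a strict answer verifier.
Question: {question}
Reference answer: {reference}
Student answer: {prediction}
Reason step by step about whether the student answer is equivalent to the
reference answer, then end with a single line "Verdict: correct" or
"Verdict: incorrect"."""

def verify(question: str, reference: str, prediction: str) -> bool:
    """Return True if the generative verifier judges the prediction correct."""
    prompt = VERIFIER_PROMPT.format(
        question=question, reference=reference, prediction=prediction
    )
    messages = [{"role": "user", "content": prompt}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(input_ids, max_new_tokens=512, do_sample=False)
    text = tokenizer.decode(
        output[0][input_ids.shape[-1]:], skip_special_tokens=True
    )
    # Unlike exact string matching, the verifier can judge "1/2", "0.5",
    # and "one half" as equivalent in the context of the question.
    return "verdict: correct" in text.lower()
```

In an RL-with-verifiable-rewards setup, such a verifier would be called on each rollout to assign a binary reward; the chain-of-thought reasoning before the final verdict is what gives it context awareness over diverse answer formats.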
Community
General-Reasoner introduces a new training paradigm that leverages diverse web-crawled verifiable reasoning data and a compact generative model-based verifier to enable large language models to achieve robust, generalizable reasoning across a wide range of domains beyond mathematics.
If you remove the "14b-zoo" variant from chart 1 (it serves no purpose) and add "qwen-3 base" and "qwen-3 instruct", the chart would be clearer and less likely to mislead the reader into thinking your method improves performance dramatically, when in fact the perceived dramatic increase is due to qwen-3 being a better model than qwen-2.5. For GPQA, the chart emphasises a 12.6-point increase because you are comparing "qwen 2.5 instruct" to your "qwen-3 general" model, whereas the actual increase over qwen-3 instruct is only 1.3 points.
All the data is in the tables, so I'm not saying you're deliberately misleading anyone, but the choice of elements in the first chart is both confusing and (accidentally) misleading.
Thanks for the reminder!
We start from the Qwen3-Base models rather than Qwen3-Instruct, so a 1.3-point gain over Qwen3-Instruct is indeed good, given that we don't use any of the proprietary data used by the Qwen3 team. We release all of our data and training checkpoints.
But you are right. We should put the Qwen3-Base and Qwen3-Instruct results in chart 1 to show a fair comparison. We will update the chart soon.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Crossing the Reward Bridge: Expanding RL with Verifiable Rewards Across Diverse Domains (2025)
- Nemotron-CrossThink: Scaling Self-Learning beyond Math Reasoning (2025)
- DeepMath-103K: A Large-Scale, Challenging, Decontaminated, and Verifiable Mathematical Dataset for Advancing Reasoning (2025)
- X-Reasoner: Towards Generalizable Reasoning Across Modalities and Domains (2025)
- Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 (2025)
- Reasoning Beyond Limits: Advances and Open Problems for LLMs (2025)
- Phi-4-Mini-Reasoning: Exploring the Limits of Small Reasoning Language Models in Math (2025)