UniReason-Qwen3-14B-RL

This model is associated with the research paper: "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"

📄 Paper: 2507.00432 💻 Code: https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning

Abstract

Math reasoning has become the poster child of progress in large language models (LLMs), with new models rapidly surpassing human-level performance on benchmarks like MATH and AIME. But as math leaderboards improve week by week, it is worth asking: do these gains reflect broader problem-solving ability or just narrow overfitting?

Model Description

This model is an RL-GRPO-tuned version of Qwen3-14B focused on math reasoning. It was developed as part of research investigating whether mathematical reasoning skills transfer to general language tasks.

Key Research Questions Addressed:

Does math reasoning training improve general LLM capabilities?
How do different training methods (RL vs SFT) affect transferability?
What is the trade-off between specialized math performance and general capabilities?

Model Details

Base Model: qwen3-14b
Training Method: RL-GRPO
Primary Focus: math-reasoning
Training Data: Math-specific datasets
Architecture: Transformer-based language model
Parameters: 14B

Training Details

Training Method: RL-GRPO

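The model was trained with reinforcement learning using GRPO (Group Relative Policy Optimization) on math data. See the paper for the full training setup and hyperparameters.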
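To give a rough intuition for the "group relative" part of GRPO, the sketch below normalizes the rewards of several responses sampled for the same prompt against that group's own mean and standard deviation. This is an illustration of the general idea only, not the authors' training code: the function name `group_relative_advantages`, the `eps` constant, the use of the population standard deviation, and the reward values are all assumptions made for this example.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper for illustration; not taken from the paper's code.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    # eps avoids division by zero when every response gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four sampled answers to one math prompt
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Responses that score above their group's average receive a positive advantage and the rest a negative one, so no separate value model is needed to provide a baseline.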

Datasets Used

Mathematical reasoning datasets
See paper for complete dataset list

Performance

Math Reasoning Benchmarks

MATH: See paper
AIME: See paper

General Capabilities

General QA: See paper
Code Generation: See paper
Instruction Following: See paper

For detailed performance metrics, please refer to the paper.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ReasoningTransferability/UniReason-Qwen3-14B-RL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)

# Example: Math reasoning
math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
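# Move inputs to the model's device; enable sampling explicitly so temperature applies
inputs = tokenizer(math_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)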
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Example: General reasoning
general_prompt = "Explain the concept of supply and demand in economics."
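# max_new_tokens bounds only the generated text, not prompt plus output
inputs = tokenizer(general_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.7)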
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
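Reasoning-tuned Qwen3 models typically wrap their chain of thought in `<think>` ... `</think>` tags before the final answer. Assuming this model does the same (the model card does not confirm it), a small helper such as the hypothetical `split_thinking` below can separate the reasoning from the answer in a decoded response. The sample string is invented for the example and is not real model output.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Split a decoded response into (reasoning, answer).

    Hypothetical helper; assumes Qwen3-style think tags in the output.
    If no closing tag is found, the whole text is treated as the answer.
    """
    head, sep, tail = text.rpartition(close_tag)
    if not sep:
        return "", text.strip()
    # Drop anything before the opening tag, such as the echoed prompt.
    before, found_open, after = head.partition(open_tag)
    reasoning = after if found_open else head
    return reasoning.strip(), tail.strip()


# Invented sample text, shaped like a decoded response.
sample = (
    "<think>\nDifferentiate term by term: 3x^2, 4x, -5, 0.\n</think>\n\n"
    "The derivative is 3x^2 + 4x - 5."
)
reasoning, answer = split_thinking(sample)
print(answer)
```

In the usage code above, `response` would take the place of `sample`.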

Limitations and Biases

Specialization Trade-offs: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
Training Method Dependencies: Performance characteristics vary significantly between RL and SFT training approaches
Domain Transfer: The extent of capability transfer from math to other domains is limited
Computational Requirements: Model requires significant computational resources for inference
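As a back-of-the-envelope check on the computational requirements, the memory needed just to hold the weights is roughly the parameter count times the bytes per parameter. The sketch below uses the nominal 14B figure from this card and the `float16` dtype from the usage example. The function name `weight_memory_gb` is hypothetical, and the estimate ignores the KV cache, activations, and framework overhead, so real usage will be higher.

```python
def weight_memory_gb(num_params, bytes_per_param):
    """Approximate memory for model weights alone, in gigabytes (1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


NOMINAL_PARAMS = 14e9  # "14B" as listed on this card; exact count may differ

fp16_gb = weight_memory_gb(NOMINAL_PARAMS, 2)    # float16: 2 bytes per parameter
int4_gb = weight_memory_gb(NOMINAL_PARAMS, 0.5)  # 4-bit quantization: 0.5 bytes
print(fp16_gb, int4_gb)
```

At half precision this comes to about 28 GB for the weights, which is why `device_map="auto"` is useful for spreading the model across the available devices.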

Research Findings

Key findings from the associated paper:

RL vs SFT: RL-tuned models show better transfer to general domains compared to SFT-tuned models
Capability Trade-offs: Most math-specialized models fail to transfer gains to other domains
Forgetting: SFT-tuned models often forget general capabilities during math-focused training

Ethical Considerations

This model is intended for research purposes
Users should be aware of potential biases in mathematical and general reasoning
The model should not be used for making critical decisions without human oversight
Consider the environmental impact of large model inference

Citation

If you use this model in your research, please cite the associated paper:

@article{math_reasoning_transfer_2025,
  title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
  author={Huan, Maggie and Li, Yuetai and Zheng, Tuney and Xu, Xiaoyu and Kim, Seungone and Du, Minxin and Poovendran, Radha and Neubig, Graham and Yue, Xiang},
  journal={arXiv preprint arXiv:2507.00432},
  year={2025},
  url={https://arxiv.org/abs/2507.00432}
}

Contact

For questions about this model or the associated research, please:

Open an issue in this repository
Contact the paper authors
Reference the original paper: https://arxiv.org/abs/2507.00432

Acknowledgments

This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses the qwen3-14b architecture as its foundation.

Model uploaded on 2025-07-03
