UniReason-Qwen3-14B-no-think-SFT

This model is associated with the research paper: "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning"

📄 Paper: arXiv:2507.00432 📚 Code: https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning

Model Description

This model is a fine-tuned version of Qwen3-14B-Base, distilled from Qwen3-32B-Instruct (non-thinking mode) through rejection sampling, with a focus on math reasoning. The model was developed as part of research investigating whether mathematical reasoning skills learned through such training transfer to general language tasks.

Key Research Questions Addressed:

  • Does math reasoning training improve general LLM capabilities?
  • How do different training methods (RL vs SFT) affect transferability?
  • What is the trade-off between specialized math performance and general capabilities?

Model Details

  • Base Model: Qwen3-14B-Base
  • Training Method: Distillation from Qwen3-32B-Instruct (non-thinking mode) via rejection sampling
  • Primary Focus: Math reasoning
  • Training Data: Math-specific datasets
  • Architecture: Transformer-based language model
  • Parameters: 14.8B (BF16 safetensors)

Training Details

Training Method: Distillation from Qwen3-32B-Instruct (non-thinking mode) via rejection sampling

Custom training methodology - see paper for details.
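At a high level, rejection-sampling distillation means sampling candidate solutions from the teacher (Qwen3-32B-Instruct in non-thinking mode) and keeping only those whose final answer matches the reference before running standard SFT on the student. Below is a minimal sketch of that filtering step; the teacher_generate and extract_final_answer callables are hypothetical placeholders, and this illustrates the general technique, not the paper's actual pipeline.

from typing import Callable

def build_sft_pairs(
    problems: list[dict],                        # each: {"question": str, "answer": str}
    teacher_generate: Callable[[str], str],      # samples one completion from the teacher
    extract_final_answer: Callable[[str], str],  # pulls the final answer from a completion
    n_samples: int = 8,
) -> list[dict]:
    """Rejection sampling: keep teacher completions whose final answer is correct."""
    pairs = []
    for problem in problems:
        for _ in range(n_samples):
            completion = teacher_generate(problem["question"])
            if extract_final_answer(completion) == problem["answer"]:
                pairs.append({"prompt": problem["question"], "response": completion})
                break  # one verified solution per problem is enough for this sketch
    return pairs

The resulting prompt/response pairs would then be used for standard supervised fine-tuning of the 14B student.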

Datasets Used

  • Mathematical reasoning datasets
  • See paper for complete dataset list

Performance

Math Reasoning Benchmarks

  • MATH: See paper
  • AIME: See paper

General Capabilities

  • General QA: See paper
  • Code Generation: See paper
  • Instruction Following: See paper

For detailed performance metrics, please refer to the paper.
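For context on how benchmarks of this kind are commonly scored: MATH and AIME are typically graded by exact match on the final answer. A minimal sketch of that metric, assuming answers appear in a LaTeX \boxed{...} span (the extraction heuristic is an assumption, not the paper's evaluation code):

import re

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Exact match on the final answer, a common convention for MATH-style scoring."""
    def extract(text: str) -> str:
        # Prefer the last \boxed{...} span; fall back to the last non-empty line.
        boxed = re.findall(r"\\boxed\{([^{}]*)\}", text)
        if boxed:
            return boxed[-1].strip()
        lines = [ln for ln in text.strip().splitlines() if ln.strip()]
        return lines[-1].strip() if lines else ""
    correct = sum(extract(p) == extract(r) for p, r in zip(predictions, references))
    return correct / max(len(references), 1)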

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "ReasoningTransferability/UniReason-Qwen3-14B-no-think-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,  # weights are stored in BF16
    device_map="auto",
)

# Example: Math reasoning
math_prompt = "Solve this step by step: What is the derivative of x^3 + 2x^2 - 5x + 1?"
inputs = tokenizer(math_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,  # cap generated tokens; raise for longer solutions
    do_sample=True,       # temperature only takes effect when sampling
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)

# Example: General reasoning
general_prompt = "Explain the concept of supply and demand in economics."
inputs = tokenizer(general_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=2048,
    do_sample=True,
    temperature=0.7,
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
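If the released checkpoint ships a chat template (as Qwen3 instruct-style checkpoints do), wrapping prompts with apply_chat_template usually produces better-formatted answers than raw prompting. A minimal sketch, assuming the template is present in this repository:

# Hedged sketch: assumes the checkpoint includes a Qwen3-style chat template.
messages = [{"role": "user", "content": "What is the integral of 2x dx?"}]
prompt = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant-turn header
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))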

Limitations and Biases

  • Specialization Trade-offs: As explored in the paper, models optimized for math reasoning may show reduced performance on general tasks
  • Training Method Dependencies: Performance characteristics vary significantly between RL and SFT training approaches
  • Domain Transfer: The extent of capability transfer from math to other domains is limited
  • Computational Requirements: The model requires significant computational resources for inference (see the quantized-loading sketch after this list)
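For constrained hardware, 4-bit quantized loading via bitsandbytes is one way to reduce the memory footprint, at a possible small cost in reasoning accuracy. A sketch, assuming bitsandbytes is installed:

from transformers import AutoModelForCausalLM, BitsAndBytesConfig
import torch

# 4-bit NF4 quantization roughly quarters the weight memory footprint.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ReasoningTransferability/UniReason-Qwen3-14B-no-think-SFT",
    quantization_config=quant_config,
    device_map="auto",
)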

Research Findings

Key findings from the associated paper:

  1. RL vs SFT: RL-tuned models show better transfer to general domains compared to SFT-tuned models
  2. Capability Trade-offs: Most math-specialized models fail to transfer gains to other domains
  3. Forgetting: SFT-tuned models often forget general capabilities during math-focused training

Ethical Considerations

  • This model is intended for research purposes
  • Users should be aware of potential biases in mathematical and general reasoning
  • The model should not be used for making critical decisions without human oversight
  • Consider the environmental impact of large model inference

Citation

If you use this model in your research, please cite the associated paper:

@article{math_reasoning_transfer_2025,
  title={Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning},
  author={Huan, Maggie and Li, Yuetai and Zheng, Tuney and Xu, Xiaoyu and Kim, Seungone and Du, Minxin and Poovendran, Radha and Neubig, Graham and Yue, Xiang},
  journal={arXiv preprint arXiv:2507.00432},
  year={2025},
  url={https://arxiv.org/abs/2507.00432}
}

Contact

For questions about this model or the associated research, please open an issue on the GitHub repository: https://github.com/ReasoningTransfer/Transferability-of-LLM-Reasoning

Acknowledgments

This work builds upon the research presented in "Does Math Reasoning Improve General LLM Capabilities? Understanding Transferability of LLM Reasoning" and uses the Qwen3-14B-Base architecture as its foundation.


Model uploaded on 2025-07-05
