DistilQwen-ThoughtX: Optimized Reasoning Models with OmniThought
DistilQwen-ThoughtX is a series of high-performance reasoning models trained on the OmniThought dataset. These models are optimized for chain-of-thought (CoT) reasoning with balanced verbosity and cognitive difficulty, achieving state-of-the-art results on mathematical, coding, and logical reasoning benchmarks.
Model Variants
Model Name | Parameters | Base Model | Hugging Face Link |
---|---|---|---|
DistilQwen-ThoughtX-7B | 7B | Qwen2.5-7B-Instruct | Link |
DistilQwen-ThoughtX-32B | 32B | Qwen2.5-32B-Instruct | Link |
Key Features
- Optimal Reasoning Verbosity (RV): CoT processes are filtered to avoid overthinking (excessive steps) or under-reasoning, improving both efficiency and accuracy.
- Cognitive Difficulty (CD) Alignment: CoTs are selected to match the model's capacity, so smaller models learn simpler reasoning paths while larger models handle more complex logic (see the sketch after this list).
- Performance: Outperforms existing open-source reasoning models (e.g., DeepSeek-R1-Distill, OpenThinker) on benchmarks such as AIME2024, MATH500, and LiveCodeBench V2.
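To illustrate how RV/CD-based selection of CoT data might look in practice, here is a minimal sketch. The field names (`rv`, `cd`), score ranges, and thresholds are hypothetical and chosen for illustration only; they are not the official filtering pipeline from the paper.

```python
# Hypothetical sketch of RV/CD-based CoT filtering.
# Field names ("rv", "cd") and thresholds are illustrative, not the official pipeline.

def select_cots(samples, rv_range=(2.0, 7.0), cd_max=6.0):
    """Keep CoT samples whose verbosity lies in a target band and whose
    cognitive difficulty does not exceed the student model's capacity."""
    selected = []
    for sample in samples:
        rv_ok = rv_range[0] <= sample["rv"] <= rv_range[1]  # avoid under-/over-thinking
        cd_ok = sample["cd"] <= cd_max                      # match the student model's capacity
        if rv_ok and cd_ok:
            selected.append(sample)
    return selected

# Example: a smaller student model would use a lower cd_max than a larger one.
examples = [
    {"cot": "concise step-by-step solution ...", "rv": 4.5, "cd": 3.0},
    {"cot": "extremely long derivation ...", "rv": 9.0, "cd": 8.5},
]
print(len(select_cots(examples, cd_max=5.0)))  # -> 1
```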
Usage
Inference Example
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alibaba-pai/DistilQwen-ThoughtX-7B"  # or "alibaba-pai/DistilQwen-ThoughtX-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Solve ∫x e^x dx. Show your reasoning step-by-step."
# Place the inputs on the same device the model was loaded onto.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
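Since the models are built on Qwen2.5-Instruct bases, they likely expect chat-formatted input. The following sketch uses the tokenizer's standard chat template via the `transformers` API; the message content and generation length are illustrative.

```python
# Sketch: chat-template formatting (standard transformers API); message content is illustrative.
messages = [
    {"role": "user", "content": "Solve ∫x e^x dx. Show your reasoning step-by-step."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```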
Training Data: OmniThought
The models are trained on the OmniThought dataset, which includes:
- 2 million CoT processes with RV and CD annotations.
- Coverage of mathematics, coding, and logical reasoning tasks.
- Validated by multiple teacher models (DeepSeek-R1, QwQ-32B).
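To inspect the training data yourself, OmniThought can presumably be loaded with the Hugging Face `datasets` library. The repository id `alibaba-pai/OmniThought` and the field layout are assumptions here; check the dataset card for the exact schema.

```python
# Sketch: loading OmniThought with the `datasets` library.
# The repository id and field names are assumptions; verify against the dataset card.
from datasets import load_dataset

ds = load_dataset("alibaba-pai/OmniThought", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # inspect the actual fields (question, CoT, RV/CD annotations, etc.)
```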
Benchmarks
Model | AIME2024 | MATH500 | GPQA-D | LiveCodeBench V2 |
---|---|---|---|---|
DeepSeek-R1-Distill-7B | 57.3 | 89.6 | 47.3 | 48.4 |
DistilQwen-ThoughtX-7B | 56.7 | 90.2 | 50.0 | 56.8 |
DeepSeek-R1-Distill-32B | 74.7 | 90.0 | 62.4 | 72.3 |
DistilQwen-ThoughtX-32B | 80.0 | 92.6 | 64.0 | 73.4 |
Reference
For more detailed information about the models, please refer to our paper:
- Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations.
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang. arXiv:2505.10937
You can cite the paper using the following citation format:
@misc{cai2025reasoningomnithoughtlargecot,
title={Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations},
author={Wenrui Cai and Chengyu Wang and Junbing Yan and Jun Huang and Xiangzhong Fang},
year={2025},
eprint={2505.10937},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.10937}
}