DistilQwen-ThoughtX: Optimized Reasoning Models with OmniThought
DistilQwen-ThoughtX is a series of high-performance reasoning models trained on the OmniThought dataset. These models are optimized for chain-of-thought (CoT) reasoning with balanced verbosity and cognitive difficulty, achieving state-of-the-art results on mathematical, coding, and logical reasoning benchmarks.
Model Variants
Model Name | Parameters | Base Model | Hugging Face Link |
---|---|---|---|
DistilQwen-ThoughtX-7B | 7B | Qwen2.5-7B-Instruct | Link |
DistilQwen-ThoughtX-32B | 32B | Qwen2.5-32B-Instruct | Link |
Key Features
- Optimal Reasoning Verbosity (RV): CoT processes are filtered to avoid overthinking (excessive steps) or under-reasoning, improving both efficiency and accuracy.
- Cognitive Difficulty (CD) Alignment: CoTs are selected to match the model's capacity, so smaller models learn simpler reasoning paths while larger models handle more complex logic (see the sketch after this list).
- Performance: Outperforms existing open-source reasoning models (e.g., DeepSeek-R1-Distill, OpenThinker) on benchmarks such as AIME2024, MATH500, and LiveCodeBench V2.
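To illustrate how RV/CD-based selection of CoT data might look in practice, here is a minimal sketch. The field names (`rv`, `cd`), score ranges, and thresholds are hypothetical and chosen for illustration only; they are not the official filtering pipeline from the paper.

```python
# Hypothetical sketch of RV/CD-based CoT filtering.
# Field names ("rv", "cd") and thresholds are illustrative, not the official pipeline.

def select_cots(samples, rv_range=(2.0, 7.0), cd_max=6.0):
    """Keep CoT samples whose verbosity lies in a target band and whose
    cognitive difficulty does not exceed the student model's capacity."""
    selected = []
    for sample in samples:
        rv_ok = rv_range[0] <= sample["rv"] <= rv_range[1]  # avoid under-/over-thinking
        cd_ok = sample["cd"] <= cd_max                      # match the student model's capacity
        if rv_ok and cd_ok:
            selected.append(sample)
    return selected

# Example: a smaller student model would use a lower cd_max than a larger one.
examples = [
    {"cot": "concise step-by-step solution ...", "rv": 4.5, "cd": 3.0},
    {"cot": "extremely long derivation ...", "rv": 9.0, "cd": 8.5},
]
print(len(select_cots(examples, cd_max=5.0)))  # -> 1
```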
Usage
Inference Example
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "alibaba-pai/DistilQwen-ThoughtX-7B"  # or "alibaba-pai/DistilQwen-ThoughtX-32B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Solve ∫x e^x dx. Show your reasoning step-by-step."
# Place the inputs on the same device the model was loaded onto.
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
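Since the models are built on Qwen2.5-Instruct bases, they likely expect chat-formatted input. The following sketch uses the tokenizer's standard chat template via the `transformers` API; the message content and generation length are illustrative.

```python
# Sketch: chat-template formatting (standard transformers API); message content is illustrative.
messages = [
    {"role": "user", "content": "Solve ∫x e^x dx. Show your reasoning step-by-step."},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=1024)
# Decode only the newly generated tokens, dropping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```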
Training Data: OmniThought
The models are trained on the OmniThought dataset, which includes:
- 2 million CoT processes with RV and CD annotations.
- Coverage of mathematics, coding, and logical reasoning tasks.
- Validated by multiple teacher models (DeepSeek-R1, QwQ-32B).
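To inspect the training data yourself, OmniThought can presumably be loaded with the Hugging Face `datasets` library. The repository id `alibaba-pai/OmniThought` and the field layout are assumptions here; check the dataset card for the exact schema.

```python
# Sketch: loading OmniThought with the `datasets` library.
# The repository id and field names are assumptions; verify against the dataset card.
from datasets import load_dataset

ds = load_dataset("alibaba-pai/OmniThought", split="train", streaming=True)
sample = next(iter(ds))
print(sample.keys())  # inspect the actual fields (question, CoT, RV/CD annotations, etc.)
```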
Benchmarks
Model | AIME2024 | MATH500 | GPQA-D | LiveCodeBench V2 |
---|---|---|---|---|
DeepSeek-R1-Distill-7B | 57.3 | 89.6 | 47.3 | 48.4 |
DistilQwen-ThoughtX-7B | 56.7 | 90.2 | 50.0 | 56.8 |
DeepSeek-R1-Distill-32B | 74.7 | 90.0 | 62.4 | 72.3 |
DistilQwen-ThoughtX-32B | 80.0 | 92.6 | 64.0 | 73.4 |
Reference
For more detailed information about the models, please refer to our paper:
- Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations.
Wenrui Cai, Chengyu Wang, Junbing Yan, Jun Huang, Xiangzhong Fang. arXiv:2505.10937
You can cite the paper using the following citation format:
@misc{cai2025reasoningomnithoughtlargecot,
title={Reasoning with OmniThought: A Large CoT Dataset with Verbosity and Cognitive Difficulty Annotations},
author={Wenrui Cai and Chengyu Wang and Junbing Yan and Jun Huang and Xiangzhong Fang},
year={2025},
eprint={2505.10937},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2505.10937}
}