# II-Thought-1.5B-Preview

## Overview

II-Thought-1.5B-Preview is a reinforcement learning (RL)-enhanced language model trained on a subset of II-Thought-RL-v0, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled 50K math subset (dataset link).
## Training Methodology
- Framework: ii_thought / verl
- Algorithm: GRPO
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Reward Modeling:
  - Answer correctness reward
  - Format correctness reward
  - Final reward function combining the two signals (see the sketch below)
For a deeper look into the implementation details, refer to our repository: Intelligent-Internet/ii-thought.
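
The sketch below illustrates how such a reward could be composed from an answer-correctness check and a format check. The helper names, the regex patterns, and the 0.1 weight on the format term are illustrative assumptions, not the exact implementation in Intelligent-Internet/ii-thought.

```python
import re

def format_reward(completion: str) -> float:
    # Assumed format check: the response should contain a <think>...</think>
    # reasoning block and a final answer wrapped in \boxed{}.
    has_think = bool(re.search(r"<think>.*?</think>", completion, flags=re.DOTALL))
    has_boxed = "\\boxed{" in completion
    return 1.0 if (has_think and has_boxed) else 0.0

def answer_reward(completion: str, gold_answer: str) -> float:
    # Assumed correctness check: compare the last \boxed{...} content with the reference.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

def final_reward(completion: str, gold_answer: str) -> float:
    # Assumed combination: correctness dominates, with a small bonus for well-formed output.
    return answer_reward(completion, gold_answer) + 0.1 * format_reward(completion)
```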
## Evaluation Results

We used EvalScope to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem varies by benchmark:
- 64 responses: AMC23, AIME24, AIME25
- 4 responses: Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English
- 1 response: IFEval
Sampling configuration:
- Max context length: 32,768
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 40
- Seed: 42
Additionally, for LiveCodeBench, we use QWQ-Evaluation to reproduce results with a max context length of 32,768, averaging over 8 runs.
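
As a rough illustration of this protocol, the sketch below samples n responses per problem with the configuration above and averages per-sample correctness to obtain Pass@1. It assumes vLLM for generation; the `is_correct` checker is a placeholder for a benchmark-specific answer verifier and is not part of EvalScope's API.

```python
from vllm import LLM, SamplingParams

# Assumed setup: vLLM for batched sampling with the configuration listed above.
llm = LLM(model="Intelligent-Internet/II-Thought-1.5B-Preview", max_model_len=32768)
params = SamplingParams(
    n=64,                 # 64 responses per problem (e.g., AMC23, AIME24, AIME25)
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    seed=42,
    max_tokens=30000,     # leave room for the prompt within the 32,768-token context
)

def pass_at_1(problem: str, gold: str, is_correct) -> float:
    # Generate n candidate solutions and average their correctness (Pass@1).
    outputs = llm.generate([problem], params)[0].outputs
    return sum(is_correct(o.text, gold) for o in outputs) / len(outputs)
```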
Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
---|---|---|---|
AMC23 | 69.69 | 54.26 | 79.77 |
AIME24 | 29.43 | 10.73 | 34.17 |
AIME25 | 23.39 | 8.8 | 26.09 |
Olympiad Bench | 43.15 | 36.07 | 52.78 |
Math500 | 83.6 | 73.15 | 87.2 |
Math Gaokao 2023 English | 72.99 | 62.47 | 77.21 |
Minerva Math | 27.57 | 24.45 | 30.79 |
Vietnamese Entrance Math Exam | 40.32 | 26.69 | 46.24 |
LiveCodeBench | 16.66 | 2.6 | 19.84 |
IFEval | 44.24 | 27.22 | 44.84 |
Average | 45.10 | 32.64 | 49.90 |
## How To Use
Our model can be used in the same way as Qwen or DeepSeek-R1-Distill models.
For instance, you can easily start a service using vLLM:
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
You can also easily start a service using SGLang:
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
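
Once a server is running, you can query it through its OpenAI-compatible API. The snippet below is a minimal sketch; the port (8000 is vLLM's default) and the example prompt are assumptions.

```python
from openai import OpenAI

# Assumed endpoint: vLLM's OpenAI-compatible server on its default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intelligent-Internet/II-Thought-1.5B-Preview",
    messages=[{
        "role": "user",
        "content": "What is 17 * 24? Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
    temperature=0.6,  # recommended sampling parameters
    top_p=0.95,
)
print(response.choices[0].message.content)
```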
## Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and ask for the final answer within `\boxed{}` (e.g., "Please reason step by step, and put your final answer within \boxed{}."); see the example below.
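
The following is a minimal sketch of applying these guidelines with Hugging Face transformers; the example question and the generation length are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intelligent-Internet/II-Thought-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{
    "role": "user",
    "content": "Solve 3x + 5 = 20. Please reason step by step, "
               "and put your final answer within \\boxed{}.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,  # leave room for long chain-of-thought reasoning
    do_sample=True,
    temperature=0.6,      # recommended sampling parameters
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```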
## Citation
@misc{2025iithought,
  title={II-Thought: A Large-Scale, High-Quality Reasoning Dataset},
  author={Intelligent Internet},
  year={2025}
}