# II-Thought-1.5B-Preview

## Overview

II-Thought-1.5B-Preview is a reinforcement learning (RL)-enhanced language model trained on a subset of II-Thought-RL-v0, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled 50K math subset (dataset link).
## Training Methodology
- Framework: ii_thought / verl
- Algorithm: GRPO
- Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
- Reward Modeling:
  - Answer correctness reward
  - Format correctness reward
  - Final reward function combining the two signals (see the sketch below)
For a deeper look into the implementation details, refer to our repository: Intelligent-Internet/ii-thought.
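
The sketch below illustrates how such a reward could be composed from an answer-correctness check and a format check. The helper names, the regex patterns, and the 0.1 weight on the format term are illustrative assumptions, not the exact implementation in Intelligent-Internet/ii-thought.

```python
import re

def format_reward(completion: str) -> float:
    # Assumed format check: the response should contain a <think>...</think>
    # reasoning block and a final answer wrapped in \boxed{}.
    has_think = bool(re.search(r"<think>.*?</think>", completion, flags=re.DOTALL))
    has_boxed = "\\boxed{" in completion
    return 1.0 if (has_think and has_boxed) else 0.0

def answer_reward(completion: str, gold_answer: str) -> float:
    # Assumed correctness check: compare the last \boxed{...} content with the reference.
    matches = re.findall(r"\\boxed\{([^{}]*)\}", completion)
    if not matches:
        return 0.0
    return 1.0 if matches[-1].strip() == gold_answer.strip() else 0.0

def final_reward(completion: str, gold_answer: str) -> float:
    # Assumed combination: correctness dominates, with a small bonus for well-formed output.
    return answer_reward(completion, gold_answer) + 0.1 * format_reward(completion)
```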
## Evaluation Results

We used EvalScope to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem varies by benchmark:
- 64 responses: AMC23, AIME24, AIME25
- 4 responses: Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English
- 1 response: IFEval
Sampling configuration:
- Max context length: 32,768
- Temperature: 0.6
- Top-p: 0.95
- Top-k: 40
- Seed: 42
Additionally, for LiveCodeBench, we use QWQ-Evaluation to reproduce results with a max context length of 32,768, averaging over 8 runs.
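
As a rough illustration of this protocol, the sketch below samples n responses per problem with the configuration above and averages per-sample correctness to obtain Pass@1. It assumes vLLM for generation; the `is_correct` checker is a placeholder for a benchmark-specific answer verifier and is not part of EvalScope's API.

```python
from vllm import LLM, SamplingParams

# Assumed setup: vLLM for batched sampling with the configuration listed above.
llm = LLM(model="Intelligent-Internet/II-Thought-1.5B-Preview", max_model_len=32768)
params = SamplingParams(
    n=64,                 # 64 responses per problem (e.g., AMC23, AIME24, AIME25)
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    seed=42,
    max_tokens=30000,     # leave room for the prompt within the 32,768-token context
)

def pass_at_1(problem: str, gold: str, is_correct) -> float:
    # Generate n candidate solutions and average their correctness (Pass@1).
    outputs = llm.generate([problem], params)[0].outputs
    return sum(is_correct(o.text, gold) for o in outputs) / len(outputs)
```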
Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
---|---|---|---|
AMC23 | 69.69 | 54.26 | 79.77 |
AIME24 | 29.43 | 10.73 | 34.17 |
AIME25 | 23.39 | 8.8 | 26.09 |
Olympiad Bench | 43.15 | 36.07 | 52.78 |
Math500 | 83.6 | 73.15 | 87.2 |
Math Gaokao 2023 English | 72.99 | 62.47 | 77.21 |
Minerva Math | 27.57 | 24.45 | 30.79 |
Vietnamese Entrance Math Exam | 40.32 | 26.69 | 46.24 |
LiveCodeBench | 16.66 | 2.6 | 19.84 |
IFEval | 44.24 | 27.22 | 44.84 |
Average | 45.10 | 32.64 | 49.90 |
## How To Use
Our model can be used in the same way as Qwen or DeepSeek-R1-Distill models.
For instance, you can easily start a service using vLLM:
vllm serve Intelligent-Internet/II-Thought-1.5B-Preview
You can also easily start a service using SGLang:
python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview
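
Once a server is running, you can query it through its OpenAI-compatible API. The snippet below is a minimal sketch; the port (8000 is vLLM's default) and the example prompt are assumptions.

```python
from openai import OpenAI

# Assumed endpoint: vLLM's OpenAI-compatible server on its default port.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Intelligent-Internet/II-Thought-1.5B-Preview",
    messages=[{
        "role": "user",
        "content": "What is 17 * 24? Please reason step by step, "
                   "and put your final answer within \\boxed{}.",
    }],
    temperature=0.6,  # recommended sampling parameters
    top_p=0.95,
)
print(response.choices[0].message.content)
```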
## Usage Guidelines
- Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
- For mathematical problems, explicitly request step-by-step reasoning and ask for the final answer within `\boxed{}` (e.g., "Please reason step by step, and put your final answer within \boxed{}."); see the example below.
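
The following is a minimal sketch of applying these guidelines with Hugging Face transformers; the example question and the generation length are illustrative assumptions.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Intelligent-Internet/II-Thought-1.5B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

messages = [{
    "role": "user",
    "content": "Solve 3x + 5 = 20. Please reason step by step, "
               "and put your final answer within \\boxed{}.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=4096,  # leave room for long chain-of-thought reasoning
    do_sample=True,
    temperature=0.6,      # recommended sampling parameters
    top_p=0.95,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```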
## Citation
@misc{2025iithought,
  title={II-Thought: A Large-Scale, High-Quality Reasoning Dataset},
  author={Intelligent Internet},
  year={2025}
}