
II-Thought-1.5B-Preview

Overview

II-Thought-1.5B-Preview is a Reinforcement Learning enhanced language model trained on a subset of II-Thought-RL-v0, the first large-scale, multi-task dataset designed for RL. While II-Thought-RL-v0 spans multiple domains (mathematics, coding, medicine, science, etc.), this preview release was trained on a randomly sampled 50K math subset (dataset link).

Training Methodology

  • Framework: ii_thought / verl
  • Algorithm: GRPO
  • Base Model: deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B
  • Reward Modeling
    • Answer correctness reward
    • Format correctness reward
    • Final reward function
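
The exact reward weights and matching logic are not documented here, but the three components above can be sketched as follows. This is a minimal illustration, assuming the format reward checks for a \boxed{} answer, the answer reward does an exact match on the boxed content, and the final reward is a weighted sum; the weights and extraction rules are hypothetical.

```python
import re

def format_reward(response: str) -> float:
    # Assumption: the format is considered correct when the response
    # contains a \boxed{...} final answer.
    return 1.0 if re.search(r"\\boxed\{[^{}]+\}", response) else 0.0

def answer_reward(response: str, ground_truth: str) -> float:
    # Assumption: correctness is an exact string match on the boxed content.
    # Real math-RL pipelines typically use symbolic equivalence checking.
    m = re.search(r"\\boxed\{([^{}]+)\}", response)
    return 1.0 if m and m.group(1).strip() == ground_truth.strip() else 0.0

def final_reward(response: str, ground_truth: str,
                 w_answer: float = 0.9, w_format: float = 0.1) -> float:
    # Assumption: the final reward is a weighted sum of the two components.
    return w_answer * answer_reward(response, ground_truth) + w_format * format_reward(response)
```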

For a deeper look at the implementation details, refer to our repository: Intelligent-Internet/ii-thought.

Evaluation Results

We used EvalScope to evaluate the models and report Pass@1 accuracy across all benchmarks. The number of responses generated per problem is as follows:

  • 64 responses: AMC23, AIME24, AIME25
  • 4 responses: Math500, Olympiad-Bench, Vietnamese-Entrance-Math-Exam, Minerva-Math, Math-Gaokao-2023-English
  • 1 response: IFEval

Sampling Configs:

  • Max context length: 32,768
  • Temperature: 0.6
  • Top p: 0.95
  • Top k: 40
  • Seed: 42

Additionally, for LiveCodeBench, we leverage QWQ-Evaluation to reproduce results using a max context length of 32,768, averaging over 8 runs.
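
With multiple responses per problem, Pass@1 is estimated as the mean fraction of correct responses for each problem, averaged over problems. A minimal sketch of that computation:

```python
def pass_at_1(per_problem_correct: list[list[bool]]) -> float:
    # per_problem_correct[i] holds the correctness of each of the n
    # sampled responses for problem i (n = 64, 4, or 1 depending on the benchmark).
    # Pass@1 for a problem is the fraction of its samples that are correct;
    # the benchmark score is the average over all problems.
    per_problem = [sum(correct) / len(correct) for correct in per_problem_correct]
    return sum(per_problem) / len(per_problem)
```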

| Benchmark | DeepSeek-R1-Distill-Qwen-1.5B | Qwen2.5-Math-1.5B-Instruct | II-Thought-1.5B-Preview |
|---|---|---|---|
| AMC23 | 69.69 | 54.26 | 79.77 |
| AIME24 | 29.43 | 10.73 | 34.17 |
| AIME25 | 23.39 | 8.8 | 26.09 |
| Olympiad Bench | 43.15 | 36.07 | 52.78 |
| Math500 | 83.6 | 73.15 | 87.2 |
| Math Gaokao 2023 English | 72.99 | 62.47 | 77.21 |
| Minerva Math | 27.57 | 24.45 | 30.79 |
| Vietnamese Entrance Math Exam | 40.32 | 26.69 | 46.24 |
| LiveCodeBench | 16.66 | 2.6 | 19.84 |
| IFEval | 44.24 | 27.22 | 44.84 |
| Average | 45.10 | 32.64 | 49.90 |

How To Use

Our model can be used in the same manner as Qwen or DeepSeek-R1-Distill models.

For instance, you can easily start a service using vLLM:

vllm serve Intelligent-Internet/II-Thought-1.5B-Preview

You can also easily start a service using SGLang:

python -m sglang.launch_server --model Intelligent-Internet/II-Thought-1.5B-Preview

Usage Guidelines

  • Recommended Sampling Parameters: temperature = 0.6, top_p = 0.95
  • For mathematical problems, explicitly request step-by-step reasoning and format the final answer within \boxed{} (e.g., "Please reason step by step, and put your final answer within \boxed{}.").

Citation

@misc{2025iithought,
      title={II-Thought : A Large-Scale, High-Quality Reasoning Dataset}, 
      author={Intelligent Internet},
      year={2025}
}