GD-ML/Qwen2.5-Math-7B-GPG

GPG: A Simple and Strong Reinforcement Learning Baseline for Model Reasoning

https://arxiv.org/abs/2504.02546
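For intuition only: GPG belongs to the family of group-based policy-gradient methods, in which several responses are sampled per prompt and each response's reward is normalized within its group to form an advantage, without a critic or reference model. The sketch below illustrates that family under those assumptions; the tensor names, grouping scheme, and normalization here are illustrative choices, not the paper's exact objective (see the arXiv link above for the actual formulation).

```python
# Illustrative sketch of a group-normalized policy-gradient loss
# (not the paper's exact objective). Assumes `logprobs` holds the summed
# log-probability of each sampled response under the current policy,
# `rewards` holds the scalar reward of each response, and responses are
# grouped per prompt with `group_size` samples per prompt.
import torch

def group_policy_gradient_loss(logprobs: torch.Tensor,
                               rewards: torch.Tensor,
                               group_size: int,
                               eps: float = 1e-6) -> torch.Tensor:
    """logprobs, rewards: shape (num_prompts * group_size,)."""
    # One row per prompt, one column per sampled response.
    lp = logprobs.view(-1, group_size)
    r = rewards.view(-1, group_size)

    # Group-relative advantage: center and scale each reward within its
    # own group, so no learned value function is needed.
    adv = (r - r.mean(dim=1, keepdim=True)) / (r.std(dim=1, keepdim=True) + eps)

    # Vanilla policy gradient weighted by the group-relative advantage;
    # advantages are treated as constants (no gradient through the reward).
    return -(adv.detach() * lp).mean()
```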

Model Details

This is the RL-trained model (GPG-7B in the paper), trained with GPG on the simplelr_qwen_level3to5 dataset, using Qwen2.5-Math-7B as the base model.
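A minimal inference sketch, assuming the standard Hugging Face Transformers API and the BF16 weights noted below; the prompt format is only an example and may not match the template used during training:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "GD-ML/Qwen2.5-Math-7B-GPG"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map="auto",
)

prompt = "Solve: If 3x + 5 = 20, what is x? Please reason step by step."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Print only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```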

Attention!

Due to differences in environment and hardware, test results may fluctuate. Specifically, when evaluated on an NPU, the average accuracy over five benchmarks (AIME24, AMC23, MATH-500, Minerva, and OlympiadBench) is 57.7; on an H20 GPU, the average drops to 55.3. These fluctuations are entirely within an acceptable range.

Format: Safetensors
Model size: 7.62B params
Tensor type: BF16

Model tree for GD-ML/Qwen2.5-Math-7B-GPG

Base model: Qwen/Qwen2.5-7B
Finetuned: this model