Qwen2.5-3B-GRPO-Math-GSM8K

A 3-billion-parameter Qwen2.5 model fine-tuned with Group Relative Policy Optimization (GRPO) on the GSM8K grade-school math dataset. The goal is a compact step-by-step math reasoner that fits on a single consumer GPU.
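
Training sketch

The card does not include the training recipe. As a rough illustration of what GRPO fine-tuning on GSM8K can look like, here is a minimal sketch using TRL's GRPOTrainer; the reward function, preprocessing, and hyperparameters below are illustrative assumptions, not the configuration actually used for this checkpoint.

import re
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# GSM8K stores the gold answer after "####" at the end of each solution.
dataset = load_dataset("openai/gsm8k", "main", split="train")

def prepare(example):
    return {
        "prompt": example["question"],
        "gold": example["answer"].split("####")[-1].strip().replace(",", ""),
    }

dataset = dataset.map(prepare)

# Binary correctness reward (an assumption; the author's reward may differ):
# 1.0 if the last number in a completion matches the gold answer, else 0.0.
# GRPO then normalizes rewards within each group of sampled completions.
def correctness_reward(completions, gold, **kwargs):
    rewards = []
    for completion, answer in zip(completions, gold):
        nums = re.findall(r"-?\d+(?:\.\d+)?", completion.replace(",", ""))
        rewards.append(1.0 if nums and nums[-1] == answer else 0.0)
    return rewards

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-3B",
    reward_funcs=correctness_reward,
    args=GRPOConfig(output_dir="qwen2.5-3b-grpo-gsm8k", num_generations=8),
    train_dataset=dataset,
)
trainer.train()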

Try it

from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K"

tok = AutoTokenizer.from_pretrained(model_id, use_fast=True)
# torch_dtype="auto" keeps the checkpoint's stored precision (F32 here,
# roughly 12 GB of weights); device_map="auto" places layers on the GPU.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto"
)

problem = "If a book costs $7 and a pen costs $2, how much do 3 books and 4 pens cost in total?"

messages = [
    {"role": "system", "content": "You are a step-by-step math tutor."},
    {"role": "user", "content": problem}
]

prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out_ids = model.generate(
    **inputs,
    max_new_tokens=1024,
    do_sample=True,  # temperature only takes effect when sampling is enabled
    temperature=0.7
)
# Decode only the newly generated tokens, skipping the echoed prompt.
answer_text = tok.decode(out_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(answer_text)
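
To grab just the final number, a small parser helps. GSM8K reference solutions end with "#### <answer>", so the fine-tune may emit that marker, but the card does not document the output format; the helper below treats the marker as optional and falls back to the last number in the text.

import re

def extract_answer(text):
    # Prefer GSM8K's "#### <answer>" marker if the model emits it
    # (an assumption; the card doesn't specify the output format).
    m = re.search(r"####\s*(-?[\d,]+(?:\.\d+)?)", text)
    if m:
        return m.group(1).replace(",", "")
    # Fallback: take the last number anywhere in the completion.
    nums = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return nums[-1] if nums else None

print(extract_answer(answer_text))  # expected: 29 (3 * 7 + 4 * 2 = 21 + 8)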

Model details

Format: Safetensors
Parameters: 3.09B
Tensor type: F32

Model tree

Base model: Qwen/Qwen2.5-3B
This model: BounharAbdelaziz/Qwen2.5-3B-GRPO-Math-GSM8K (GRPO fine-tune of the base)

Training dataset: GSM8K
