dzur658/SmolGRPO-135M

Tags: Text Generation · Reasoning-Course · text-generation-inference

⚠️ TEST MODEL - NOT FOR PRODUCTION USE ⚠️

This is my first model trained with GRPO (Group Relative Policy Optimization) reinforcement learning! It was trained for only 1 epoch to produce 50-token answers. The model is based entirely on the associated Hugging Face 🤗 Reasoning Course.
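GRPO samples several completions for the same prompt, scores each one, and normalizes every reward against the rest of its group. The sketch below illustrates that idea for a 50-token target. It is a minimal sketch, not the exact reward used to train this model: the functions `length_reward` and `group_advantages` are hypothetical, token counts are passed in directly rather than computed with the model's tokenizer, and the population standard deviation is used (implementations differ on this detail).

```python
from statistics import mean, pstdev

TARGET_TOKENS = 50  # assumed target, taken from the "50-token answers" goal


def length_reward(token_counts: list[int], target: int = TARGET_TOKENS) -> list[float]:
    """Hypothetical reward: 0.0 at the target length, more negative the further away."""
    return [float(-abs(target - n)) for n in token_counts]


def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Group-relative advantage: (reward - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # eps guards against a group where every reward is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt, described only by their token counts.
counts = [50, 40, 65, 120]
rewards = length_reward(counts)
advantages = group_advantages(rewards)
print(rewards)
print([round(a, 3) for a in advantages])
```

Completions closer to 50 tokens get positive advantages and are reinforced; the 120-token completion gets a strongly negative one. In TRL, a reward function for `GRPOTrainer` (passed via its `reward_funcs` argument) receives the completions themselves rather than precomputed token counts.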

Safetensors
Model size: 135M params
Tensor type: BF16
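As a rough check, the listed size and tensor type imply the weight footprint. This sketch assumes exactly 135M parameters (the listed figure is rounded) and counts only the weights; activations, the KV cache at inference time, and optimizer state during training all add more.

```python
PARAMS = 135_000_000  # "135M params" as listed; the exact count differs slightly
BYTES_PER_BF16 = 2    # bfloat16 stores each value in 16 bits

# Weights only: parameter count times bytes per stored value.
weight_bytes = PARAMS * BYTES_PER_BF16
print(f"{weight_bytes / 1e6:.0f} MB ({weight_bytes / 2**20:.0f} MiB)")
```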

Model tree for dzur658/SmolGRPO-135M

Base model: HuggingFaceTB/SmolLM2-135M
Quantized: HuggingFaceTB/SmolLM2-135M-Instruct
Finetuned (173): this model

Dataset used to train dzur658/SmolGRPO-135M