⚠️ TEST MODEL - NOT FOR PRODUCTION USE ⚠️
This is my first model trained with GRPO (Group Relative Policy Optimization) reinforcement learning! It was trained for only 1 epoch to produce 50-token answers. The model is based entirely on the associated Hugging Face 🤗 course.
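For readers new to GRPO, here is a minimal sketch of its core idea. It is not the training code used for this model. For each prompt, GRPO samples a group of completions, scores each one with a reward function, and normalizes every reward against the mean and standard deviation of its own group to get an advantage. The length-based reward below is a hypothetical example that assumes the goal is answers close to 50 tokens. The names `length_reward`, `group_relative_advantages`, and `TARGET_LENGTH` are illustrative only and do not come from any library.

```python
import statistics
from typing import List

# Hypothetical target answer length in tokens (assumption, mirrors the "50-token answers" goal).
TARGET_LENGTH = 50


def length_reward(completion_lengths: List[int], target: int = TARGET_LENGTH) -> List[float]:
    """Hypothetical reward: 0 at exactly the target length, more negative the further away."""
    return [float(-abs(target - n)) for n in completion_lengths]


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantages: subtract the group mean and divide by the group's
    sample standard deviation. eps avoids division by zero when all rewards tie."""
    mean = statistics.fmean(rewards)
    std = statistics.stdev(rewards)  # needs at least 2 completions per group
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    # One prompt, a group of 4 sampled completions with these (hypothetical) token lengths.
    lengths = [50, 40, 70, 50]
    rewards = length_reward(lengths)
    advantages = group_relative_advantages(rewards)
    for n, r, a in zip(lengths, rewards, advantages):
        print(f"length={n:3d}  reward={r:6.1f}  advantage={a:+.3f}")
```

Completions closer to the target get positive advantages and are reinforced; those further away get negative ones. Because advantages are computed relative to the group, no separate value model is needed. If every completion in a group scores the same, all advantages are 0 and that group contributes no learning signal.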
Model tree for dzur658/SmolGRPO-135M:
- Base model: HuggingFaceTB/SmolLM2-135M
- Quantized: HuggingFaceTB/SmolLM2-135M-Instruct