Barcenas 3b GRPO
Based on alpindale/Llama-3.2-3B-Instruct and trained on the openai/gsm8k dataset.
The objective of this model is to test GRPO, the novel reinforcement learning (RL) training algorithm used in DeepSeek R1, as a way to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
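For readers unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt, each one is scored, and every score is normalized against the mean and standard deviation of its own group. The sketch below illustrates that step only. It is not the training code used for this model: the function names, the exact-match reward on the `####` answer marker that gsm8k solutions use, and the choice of population standard deviation are all assumptions made for illustration.

```python
import re
import statistics


def extract_gsm8k_answer(text):
    """Return the number that follows '####' in a gsm8k-style solution, or None."""
    match = re.search(r"####\s*(-?[\d,]*\.?\d+)", text)
    if match is None:
        return None
    # Drop thousands separators so "1,000" and "1000" compare equal.
    return match.group(1).replace(",", "")


def correctness_reward(completion, reference):
    """Hypothetical reward: 1.0 if the final answers match exactly, else 0.0."""
    predicted = extract_gsm8k_answer(completion)
    expected = extract_gsm8k_answer(reference)
    if predicted is None or expected is None:
        return 0.0
    return 1.0 if predicted == expected else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other samples for the same prompt.

    Assumption: population standard deviation plus a small epsilon, so a
    group with identical rewards yields zero advantages instead of an error.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled completions for one prompt whose reference answer is 72.
reference = "48 + 24 = 72 clips in total. #### 72"
completions = [
    "She sold 48 + 24 = 72 clips. #### 72",
    "She sold 48 * 2 = 96 clips. #### 96",
    "I am not sure.",
    "Half of 48 is 24, so 72 overall. #### 72",
]
rewards = [correctness_reward(c, reference) for c in completions]
advantages = group_relative_advantages(rewards)
```

With two correct and two incorrect completions, the correct ones receive an advantage close to +1 and the incorrect ones close to -1, so the policy update favors the completions that beat their own group rather than relying on a separately learned value model.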
Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽