Barcenas 3b GRPO

Based on alpindale/Llama-3.2-3B-Instruct and trained on the openai/gsm8k dataset.
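
GSM8K reference solutions end with a line of the form `#### <answer>`. A reward for RL on this dataset therefore typically compares the final number in the model's output with that value. This card does not say which reward functions were used. The following is a minimal sketch that assumes a binary exact-match reward. The helper names (`extract_gsm8k_answer`, `extract_last_number`, `correctness_reward`) are hypothetical and do not come from this model's training code.

```python
import re
from typing import Optional


def extract_gsm8k_answer(solution: str) -> Optional[str]:
    # GSM8K reference solutions place the final answer after "####".
    if "####" not in solution:
        return None
    return solution.split("####")[-1].strip().replace(",", "")


def extract_last_number(text: str) -> Optional[str]:
    # Assumption (hypothetical): the model's answer is the last number it writes.
    matches = re.findall(r"-?\d[\d,]*(?:\.\d+)?", text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def correctness_reward(completion: str, reference_solution: str) -> float:
    # Hypothetical binary reward: 1.0 if the numbers match, else 0.0.
    gold = extract_gsm8k_answer(reference_solution)
    pred = extract_last_number(completion)
    if gold is None or pred is None:
        return 0.0
    try:
        return 1.0 if float(pred) == float(gold) else 0.0
    except ValueError:
        return 0.0
```

For a made-up reference such as `"3 + 4 = 7\n#### 7"`, the completion `"The answer is 7."` earns 1.0 and `"The answer is 8."` earns 0.0.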

The goal of this model is to test GRPO (Group Relative Policy Optimization), the reinforcement learning (RL) algorithm used to train DeepSeek R1, as a way to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
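
GRPO samples a group of completions for each prompt and scores each one. It then uses each completion's reward relative to the rest of its group as the advantage, so no learned value function (critic) is needed. The sketch below shows only that normalization step, assuming the outcome-reward form $A_i = (r_i - \mathrm{mean}(r)) / \mathrm{std}(r)$. The function name `group_relative_advantages` and the `eps` value are hypothetical, and the choice of population standard deviation is an assumption, because implementations differ.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    # Rewards for G completions sampled from the same prompt.
    mean = statistics.fmean(rewards)
    # Assumption: population std; some implementations use the sample std.
    std = statistics.pstdev(rewards)
    # Each completion's advantage is its reward standardized within the group;
    # eps avoids division by zero when all rewards are equal.
    return [(r - mean) / (std + eps) for r in rewards]


# Example: 4 completions for one prompt, two judged correct.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

With binary correctness rewards, a group in which every completion is right, or every completion is wrong, gets all-zero advantages and contributes no policy-gradient signal. The rest of the GRPO objective, a clipped probability-ratio loss with a KL penalty against a reference model, is not shown here.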

Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽

Format: Safetensors
Model size: 3.21B params
Tensor type: FP16

Repository: Danielbrdz/Barcenas-3b-GRPO