Barcenas 3b GRPO

Based on alpindale/Llama-3.2-3B-Instruct and trained on the openai/gsm8k dataset.
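Since openai/gsm8k reference solutions end with a line of the form `#### <number>`, a rule-based correctness reward is a natural fit for this kind of training. The following is only a minimal sketch under that assumption: the function names `extract_gsm8k_answer` and `correctness_reward` are hypothetical, and this card does not state which reward function was actually used for this model.

```python
import re

# Matches integers or decimals, optionally negative, with optional thousands commas.
NUMBER = r"-?\d[\d,]*(?:\.\d+)?"


def extract_gsm8k_answer(solution: str):
    """Return the number after the '####' marker of a GSM8K solution, or None."""
    match = re.search(r"####\s*(" + NUMBER + r")", solution)
    if match is None:
        return None
    return float(match.group(1).replace(",", ""))


def correctness_reward(completion: str, reference_solution: str) -> float:
    """Hypothetical reward: 1.0 if the last number in the completion
    equals the reference answer, else 0.0."""
    reference = extract_gsm8k_answer(reference_solution)
    numbers = re.findall(NUMBER, completion)
    if reference is None or not numbers:
        return 0.0
    predicted = float(numbers[-1].replace(",", ""))
    return 1.0 if predicted == reference else 0.0


reference = "16 - 3 - 4 = 9 eggs left. 9 * 2 = 18 dollars.\n#### 18"
print(correctness_reward("She makes $18 every day.", reference))
```

Taking the last number in the completion is a deliberately simple heuristic; a stricter variant could require the model to place its answer inside a dedicated tag.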

The objective of this model is to test GRPO (Group Relative Policy Optimization), the novel training method used in DeepSeek R1. GRPO is a reinforcement learning (RL) algorithm, applied here to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
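The core idea of GRPO is to sample a group of completions for the same prompt and score each one relative to the rest of the group, instead of training a separate value model. The sketch below illustrates only that normalization step. The helper name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions for illustration, not details taken from this model's training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions for a single prompt.

    Each advantage is (reward - group mean) / (group std + eps), so
    completions that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled completions for one prompt: two correct, two incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

When every completion in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal, which is why a reward that separates good from bad completions matters.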

Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽

Format: Safetensors

Model size: 3.21B params

Tensor type: F16

Model tree for Danielbrdz/Barcenas-3b-GRPO

Base model: meta-llama/Llama-3.2-3B-Instruct

Finetuned: this model

Quantizations: 2 models
