Danielbrdz/Barcenas-3b-GRPO

Text Generation

Danielbrdz committed on Feb 8

Commit 643e761 · verified · 1 Parent(s): a1c5481

Update README.md

Files changed (1)

README.md +10 -1

README.md CHANGED

@@ -7,4 +7,13 @@ language:
 base_model:
 - meta-llama/Llama-3.2-3B-Instruct
 pipeline_tag: text-generation
----

+---
+Barcenas 3b GRPO
+Based on alpindale/Llama-3.2-3B-Instruct
+Trained on the openai/gsm8k dataset.
+The objective of this model is to test GRPO, the novel training method used in DeepSeek R1.
+It uses this reinforcement learning (RL) algorithm to improve the reasoning capabilities of Llama-3.2-3B-Instruct.
+Made with ❤️ in Guadalupe, Nuevo Leon, Mexico 🇲🇽
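
For readers unfamiliar with GRPO (Group Relative Policy Optimization), the step that gives the method its name can be sketched in a few lines of Python. This is not the training code behind this commit, which the README does not show. It is a minimal illustration that assumes each prompt gets a group of sampled completions, each scored with a scalar reward (for a GSM8K-style task, for example, 1.0 for a correct final answer and 0.0 otherwise). The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    `rewards` holds one scalar per sampled completion for a single prompt.
    `eps` is an assumed small constant that avoids division by zero when
    every completion in the group receives the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt:
# 1.0 = correct final answer, 0.0 = incorrect.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that score above their group's mean get a positive advantage and are reinforced, while those below the mean get a negative one. Because the baseline comes from the group itself, this step needs no separate value model.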