lordChipotle/Llama3GRPOReasoning

Reinforcement Learning


lordChipotle committed on Jun 4

Commit 6985228 · verified · 1 Parent(s): 0f7d24d

Update README.md

Files changed (1)

README.md +8 -1

@@ -1,7 +1,14 @@
+---
+datasets:
+- openai/gsm8k
+base_model:
+- meta-llama/Llama-3.1-8B-Instruct
+pipeline_tag: reinforcement-learning
+---
 # Llama 3.1-8B Mathematical Reasoning (GRPO)
 ## Full-Post-Training Instruction
-Please visit our [notebook](https://colab.research.google.com/drive/1kRmxAC5dL_rOqZUea11X2IdE5-mKbhnw?usp=sharing) for a full walkthrough on this project
+Please visit our [notebook](https://colab.research.google.com/drive/1kRmxAC5dL_rOqZUea11X2IdE5-mKbhnw?usp=sharing) for a full walkthrough on this project.
 ## Model Description