ykarout/RPT-DeepSeek-R1-0528-Qwen3-8B

Text Generation

Generated from Trainer

text-generation-inference


ykarout committed 17 days ago

Commit 6c94f95 · verified · 1 Parent(s): a518b50

Update README.md

Files changed (1)

README.md +3 -1


@@ -6,6 +6,7 @@ tags:
 - generated_from_trainer
 - trl
 - grpo
+- rpt
 licence: license
 license: apache-2.0
 language:
@@ -23,7 +24,7 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
+The GRPO training followed the methodology introduced in RPT (Reinforcement Pre-Training Paper ): https://huggingface.co/papers/2506.08007
 ### Framework versions
 - TRL: 0.19.0
@@ -31,6 +32,7 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing
 - Pytorch: 2.6.0+cu124
 - Datasets: 3.6.0
 - Tokenizers: 0.21.2
+- vllm: 0.8.5.post1
 ## Citations