ykarout committed on
Commit
6c94f95
·
verified ·
1 Parent(s): a518b50

Update README.md

Files changed (1)
  1. README.md +3 -1
README.md CHANGED
@@ -6,6 +6,7 @@ tags:
  - generated_from_trainer
  - trl
  - grpo
+ - rpt
  licence: license
  license: apache-2.0
  language:
@@ -23,7 +24,7 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 
 
  This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models](https://huggingface.co/papers/2402.03300).
-
+ The GRPO training followed the methodology introduced in RPT (Reinforcement Pre-Training Paper ): https://huggingface.co/papers/2506.08007
  ### Framework versions
 
  - TRL: 0.19.0
@@ -31,6 +32,7 @@ This model was trained with GRPO, a method introduced in [DeepSeekMath: Pushing
  - Pytorch: 2.6.0+cu124
  - Datasets: 3.6.0
  - Tokenizers: 0.21.2
+ - vllm: 0.8.5.post1
 
  ## Citations
 