sameersegal committed on
Commit 45a3583 · verified · 1 Parent(s): 61cb9b1

Update README.md

Files changed (1)
  1. README.md +6 -0
README.md CHANGED
@@ -12,8 +12,14 @@ base_model:
 
 Simple model that was RL FT for 20 steps / epochs after SFT to reverse text using [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl/) (RL Training) and [reverse-text](https://github.com/PrimeIntellect-ai/prime-environments/tree/main/environments/reverse_text) (RL Environment). See the improvement in results:
 
+## Comparison with SFT (base) model
+
+The reward (correctness score) distribution has improved for the RLFT model across all rollouts.
 ![](comparison.png)
 
+At the instance level, comparing the best scores across rollouts, we see a mean improvement of 3.73%, with a maximum gain of ~30% and a maximum regression of ~3%.
+![](instance-level.png)
+
 ## Example Prompt & Reward
 
 **Task:** `reverse-text`
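
As context for the numbers in the added lines: a minimal sketch of how an instance-level comparison like this could be computed, assuming per-prompt reward matrices of shape (prompts × rollouts). The array names and values are illustrative placeholders, not data from this repo:

```python
import numpy as np

# Hypothetical reward matrices: one row per prompt, one column per rollout.
# Illustrative stand-ins for the SFT and RLFT evals behind instance-level.png.
sft_rewards = np.array([[0.80, 0.85, 0.90],
                        [0.40, 0.55, 0.50],
                        [0.70, 0.75, 0.72]])
rlft_rewards = np.array([[0.95, 0.90, 0.92],
                         [0.60, 0.70, 0.65],
                         [0.70, 0.72, 0.71]])

# Best reward per instance across rollouts, then the per-instance delta.
delta = rlft_rewards.max(axis=1) - sft_rewards.max(axis=1)

print(f"mean improvement: {delta.mean():+.2%}")  # the statistic quoted as 3.73%
print(f"max improvement:  {delta.max():+.2%}")   # the statistic quoted as ~30%
print(f"max regression:   {delta.min():+.2%}")   # the statistic quoted as ~3%
```

Taking the max over rollouts before differencing compares best-case behavior per prompt, so a negative delta means the RLFT model's best rollout still underperforms the SFT model's best rollout on that instance.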