Update README.md
A simple model that was RL fine-tuned for 20 steps/epochs after SFT to reverse text, using [prime-rl](https://github.com/PrimeIntellect-ai/prime-rl/) (RL training) and [reverse-text](https://github.com/PrimeIntellect-ai/prime-environments/tree/main/environments/reverse_text) (RL environment). See the improvement in results in the comparison below.
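For orientation, here is a minimal inference sketch using the Hugging Face `transformers` library. The checkpoint ID and prompt wording are placeholders, not the exact values used during training.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo ID; substitute the actual ID of this checkpoint.
MODEL_ID = "your-org/reverse-text-rlft"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

# Hypothetical prompt wording; the training environment may use a different template.
messages = [{"role": "user", "content": "Reverse the following text: hello world"}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```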
## Comparison with SFT (base) model
The reward (correctness score) distribution has improved for the RLFT model across all rollouts.
*(Figure: reward distribution of the RLFT model vs. the SFT base model across all rollouts.)*
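The exact reward function of the `reverse_text` environment is not reproduced here; as a rough stand-in for what a correctness score can look like, the sketch below scores a completion by its character-level similarity to the exactly reversed prompt. This is an assumption for illustration, not the environment's actual implementation.

```python
from difflib import SequenceMatcher

def reversal_reward(prompt_text: str, completion: str) -> float:
    """Hypothetical correctness score in [0, 1]: similarity between the
    completion and the exact character-wise reversal of the prompt."""
    target = prompt_text[::-1]
    return SequenceMatcher(None, completion.strip(), target).ratio()

# A perfect reversal scores 1.0, a partial one scores lower.
print(reversal_reward("hello world", "dlrow olleh"))  # 1.0
print(reversal_reward("hello world", "dlrow ollh"))   # < 1.0
```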
At the instance level, comparing the best scores across rollouts, we see a mean improvement of 3.73%, with a maximum gain of ~30% and a maximum regression of ~3%.
*(Figure: per-instance change in best-of-rollouts score, RLFT vs. SFT base model.)*
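A sketch of how such a per-instance comparison can be computed, assuming the rewards for each model are available as an `instances x rollouts` array. Variable names and shapes are illustrative, not taken from the training code.

```python
import numpy as np

# Hypothetical reward matrices: one row per prompt instance, one column per rollout.
sft_rewards = np.random.rand(100, 8)   # stand-in for the SFT (base) model
rlft_rewards = np.random.rand(100, 8)  # stand-in for the RLFT model

# Best score per instance across rollouts, then the per-instance difference.
best_sft = sft_rewards.max(axis=1)
best_rlft = rlft_rewards.max(axis=1)
delta = best_rlft - best_sft

print(f"mean improvement: {delta.mean():+.2%}")
print(f"max improvement:  {delta.max():+.2%}")
print(f"max regression:   {delta.min():+.2%}")
```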
## Example Prompt & Reward
**Task:** `reverse-text`