edoarc committed on
Commit 0fc43ea · verified · 1 Parent(s): 253c57f

Update README.md

Files changed (1)
  1. README.md +10 -4
README.md CHANGED
@@ -16,15 +16,21 @@ tags:
  - code
  ---

- # Reinforcement-Learned Teacher Student 7B
+ # RLT-7B

  This repository contains a 7B parameter student model trained using the **Reinforcement-Learned Teachers (RLT)** pipeline introduced in our paper [Reinforcement Learning Teachers](https://arxiv.org/abs/2506.08388).

- ## Model Description
+ ## Model Details
+
+ - **Developed by:** [Sakana AI](https://sakana.ai/)
+ - **Model type:** Autoregressive Language Model
+ - **License:** Apache License, Version 2.0
+ - **Paper:** https://arxiv.org/abs/2506.08388
+ - **Code:** https://github.com/SakanaAI/RLT

- The 7B RLT student is distilled from a 7B Reinforcement-Learned Teacher, which has been explicitly trained to produce high-quality reasoning traces optimized for student distillation.
+ ## Model Description

- The model was trained with supervised fine-tuning using the same hyperparameters, the system prompt, and the reasoning tags from [Li et al. 2025](https://arxiv.org/pdf/2502.07374).
+ This 7B RLT student was distilled from a 7B Reinforcement-Learned Teacher, which has been explicitly trained to produce high-quality reasoning traces optimized for student distillation. The model was trained with supervised fine-tuning using the same hyperparameters, the system prompt, and the reasoning tags from [Li et al. 2025](https://arxiv.org/pdf/2502.07374).
  Evaluation was conducted using the [SkyThought](https://github.com/NovaSky-AI/SkyThought) library at commit `4bb8f3e`. Please refer to our [repository](https://github.com/SakanaAI/RLT) and [paper](https://arxiv.org/abs/2506.08388) for details and results.
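The Model Description in this diff says the student emits reasoning wrapped in the tags from Li et al. 2025, but the README does not spell those tags out. As a rough sketch only, assuming the tags are `<|begin_of_thought|>` / `<|end_of_thought|>` and `<|begin_of_solution|>` / `<|end_of_solution|>` (an assumption, not confirmed by this README; check the linked repository for the exact strings), a generated response could be split into its reasoning trace and final answer like this. The helper names `_between` and `split_response` are hypothetical and chosen for illustration.

```python
import re

# ASSUMPTION: these tag strings are not stated in the README; verify them
# against the RLT repository before relying on this sketch.
THOUGHT_TAGS = ("<|begin_of_thought|>", "<|end_of_thought|>")
SOLUTION_TAGS = ("<|begin_of_solution|>", "<|end_of_solution|>")


def _between(text, tags):
    """Return the stripped text between an open/close tag pair, or None."""
    open_tag, close_tag = tags
    # Escape the tags because "|" is a regex metacharacter.
    pattern = re.escape(open_tag) + r"(.*?)" + re.escape(close_tag)
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


def split_response(text):
    """Split a generated response into reasoning trace and final solution."""
    return {
        "thought": _between(text, THOUGHT_TAGS),
        "solution": _between(text, SOLUTION_TAGS),
    }


# Hand-written example string, not real model output.
example = (
    "<|begin_of_thought|>2 + 2 is 4.<|end_of_thought|>"
    "<|begin_of_solution|>4<|end_of_solution|>"
)
parts = split_response(example)
print(parts["solution"])  # prints: 4
```

Missing tags yield `None` rather than an exception, so truncated generations can be detected and handled by the caller.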