Update README.md

Files changed (1)

README.md CHANGED

@@ -16,15 +16,21 @@ tags:
 - code
 ---
-# Reinforcement-Learned Teacher Student 7B
 This repository contains a 7B parameter student model trained using the **Reinforcement-Learned Teachers (RLT)** pipeline introduced in our paper [Reinforcement Learning Teachers](https://arxiv.org/abs/2506.08388).
-## Model Description
-The 7B RLT student is distilled from a 7B Reinforcement-Learned Teacher, which has been explicitly trained to produce high-quality reasoning traces optimized for student distillation.
-The model was trained with supervised fine-tuning using the same hyperparameters, the system prompt, and the reasoning tags from [Li et al. 2025](https://arxiv.org/pdf/2502.07374).
 Evaluation was conducted using the [SkyThought](https://github.com/NovaSky-AI/SkyThought) library at commit `4bb8f3e`. Please refer to our [repository](https://github.com/SakanaAI/RLT) and [paper](https://arxiv.org/abs/2506.08388) for details and results.

 - code
 ---
+# RLT-7B
 This repository contains a 7B parameter student model trained using the **Reinforcement-Learned Teachers (RLT)** pipeline introduced in our paper [Reinforcement Learning Teachers](https://arxiv.org/abs/2506.08388).
+## Model Details
+- **Developed by:** [Sakana AI](https://sakana.ai/)
+- **Model type:** Autoregressive Language Model
+- **License:** Apache License, Version 2.0
+- **Paper:** https://arxiv.org/abs/2506.08388
+- **Code:** https://github.com/SakanaAI/RLT
+## Model Description
+This 7B RLT student was distilled from a 7B Reinforcement-Learned Teacher that was explicitly trained to produce high-quality reasoning traces optimized for student distillation. The model was trained with supervised fine-tuning, using the same hyperparameters, system prompt, and reasoning tags as [Li et al. 2025](https://arxiv.org/pdf/2502.07374).
 Evaluation was conducted using the [SkyThought](https://github.com/NovaSky-AI/SkyThought) library at commit `4bb8f3e`. Please refer to our [repository](https://github.com/SakanaAI/RLT) and [paper](https://arxiv.org/abs/2506.08388) for details and results.