nielsr HF Staff commited on
Commit
2cea537
Β·
verified Β·
1 Parent(s): 88e7424

Add pipeline and library tags

Browse files

This PR makes sure that people looking for RL models can find your model at https://huggingface.co/models?pipeline_tag=reinforcement-learning&sort=trending.
It also adds the `transformers` library name, so people know which library can be used with your model.

Files changed (1) hide show
  1. README.md +25 -10
README.md CHANGED
@@ -1,6 +1,10 @@
1
  ---
2
  license: mit
 
 
3
  ---
 
 
4
  <div align="center">
5
 
6
  # Open Reasoner Zero
@@ -37,7 +41,7 @@ We introduce **Open-Reasoner-Zero**, the first open source implementation of lar
37
 
38
  To enable broader participation in this pivotal moment we witnessed and accelerate research towards artificial general intelligence (AGI),
39
  we release our source code, parameter settings, training data, and model weights.
40
- Please refer to our [paper](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/ORZ_paper.pdf) for more insights across various model sizes.
41
 
42
  **Let the Reasoner-Zero tide rise!**
43
 
@@ -56,7 +60,7 @@ Please refer to our [paper](https://github.com/Open-Reasoner-Zero/Open-Reasoner-
56
  <strong>[2025/03/31]</strong>
57
  We announce a major milestone for `Open-Reasoner-Zero`:
58
 
59
- - 🌊 [Updated Paper](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/ORZ_paper.pdf) with new results.
60
  - πŸ”­ [Easy-to-use Training Scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/playground):
61
  - [ORZ-1.5B training scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/playground/orz_1p5b_ppo.py) and [ORZ-0.5B training scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/playground/orz_0p5b_ppo.py) (main results in Figure 2).
62
  - [Minimal resource training scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/playground/orz_0p5b_ppo_1gpu.py): ORZ-0.5B can be run on a single A800/H800 gpu!
@@ -75,7 +79,7 @@ We announce a major milestone for `Open-Reasoner-Zero`:
75
  We release `Open-Reasoner-Zero`.
76
 
77
  As part of this release, we open-source:
78
- - 🌊 [Paper](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/ORZ_paper.pdf) on our comprehensive analysis and insights in Reasoner-Zero training
79
  - πŸ€— HF Model [`Open-Reasoner-Zero-7B`](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-7B) and [`Open-Reasoner-Zero-32B`](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-32B)
80
  - 🎁 [`Our curated 57k training data`](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/data)
81
  - πŸ“„ [Training Scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/playground) to enjoy your own Reasoner-Zero journey!
@@ -94,7 +98,7 @@ We release all of curated high-quality training data in the [`data`](https://git
94
  * [extended 72k](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/data/orz_math_72k_collection_extended.json), mainly cleaned from OpenR1-Math-220k.
95
  * [hard 13k](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/data/orz_math_13k_collection_hard.json), mined from the first stage of ORZ-32B training.
96
 
97
- The details for how to collect data are described in our [paper](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/ORZ_paper.pdf).
98
 
99
  ### Installation & Training Scripts
100
  We release our [Dockerfile](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/docker/Dockerfile) in [docker](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/docker) folder to facilitate the reproducibility of our training.
@@ -186,6 +190,14 @@ DEBUG_MODE=True python -m playground.orz_14m_ppo_mini
186
  DEBUG_MODE=True python -m playground.orz_7b_ppo
187
  ```
188
 
 
 
 
 
 
 
 
 
189
  ## Acknowledgements πŸ’–
190
 
191
  - This work was supported by computing resources and valuable feedback provided by [StepFun](https://www.stepfun.com/) and Tsinghua University.
@@ -209,11 +221,14 @@ We have several wechat groups to help discussions and sharing, you can scan the
209
  ## Citation
210
 
211
  ```bibtex
212
- @misc{OpenReasonerZero2025,
213
- title={Open-Reasoner-Zero: An Open Source Approach to Scaling Reinforcement Learning on the Base Model},
214
- author={Jingcheng Hu and Yinmin Zhang and Qi Han and Daxin Jiang and Xiangyu Zhang, Heung-Yeung Shum},
215
- year={2025},
216
- howpublished={\url{https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero}},
 
 
 
217
  }
218
  ```
219
-
 
1
  ---
2
  license: mit
3
+ library_name: transformers
4
+ pipeline_tag: reinforcement-learning
5
  ---
6
+
7
+ ```markdown
8
  <div align="center">
9
 
10
  # Open Reasoner Zero
 
41
 
42
  To enable broader participation in this pivotal moment we witnessed and accelerate research towards artificial general intelligence (AGI),
43
  we release our source code, parameter settings, training data, and model weights.
44
+ Please refer to our [paper](https://arxiv.org/abs/2503.24290) for more insights across various model sizes.
45
 
46
  **Let the Reasoner-Zero tide rise!**
47
 
 
60
  <strong>[2025/03/31]</strong>
61
  We announce a major milestone for `Open-Reasoner-Zero`:
62
 
63
+ - 🌊 [Updated Paper](https://arxiv.org/abs/2503.24290) with new results.
64
  - πŸ”­ [Easy-to-use Training Scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/playground):
65
  - [ORZ-1.5B training scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/playground/orz_1p5b_ppo.py) and [ORZ-0.5B training scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/playground/orz_0p5b_ppo.py) (main results in Figure 2).
66
  - [Minimal resource training scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/playground/orz_0p5b_ppo_1gpu.py): ORZ-0.5B can be run on a single A800/H800 gpu!
 
79
  We release `Open-Reasoner-Zero`.
80
 
81
  As part of this release, we open-source:
82
+ - 🌊 [Paper(WIP)](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/ORZ_paper.pdf) on our comprehensive analysis and insights in Reasoner-Zero training
83
  - πŸ€— HF Model [`Open-Reasoner-Zero-7B`](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-7B) and [`Open-Reasoner-Zero-32B`](https://huggingface.co/Open-Reasoner-Zero/Open-Reasoner-Zero-32B)
84
  - 🎁 [`Our curated 57k training data`](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/data)
85
  - πŸ“„ [Training Scripts](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/playground) to enjoy your own Reasoner-Zero journey!
 
98
  * [extended 72k](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/data/orz_math_72k_collection_extended.json), mainly cleaned from OpenR1-Math-220k.
99
  * [hard 13k](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/data/orz_math_13k_collection_hard.json), mined from the first stage of ORZ-32B training.
100
 
101
+ The details for how to collect data are described in our [paper](https://arxiv.org/abs/2503.24290).
102
 
103
  ### Installation & Training Scripts
104
  We release our [Dockerfile](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/docker/Dockerfile) in [docker](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/tree/main/docker) folder to facilitate the reproducibility of our training.
 
190
  DEBUG_MODE=True python -m playground.orz_7b_ppo
191
  ```
192
 
193
+ ### How to Use the Model
194
+ #### Policy Model
195
+ Policy models can be used in the same way as any chat model in transformers and vllm, since we have put the chat template jinja in the tokenizer.
196
+
197
+ #### Critic Model
198
+ Critic models can be loaded the same way like in the [training code](https://github.com/Open-Reasoner-Zero/Open-Reasoner-Zero/blob/main/orz/ppo/actors.py#L738).
199
+
200
+
201
  ## Acknowledgements πŸ’–
202
 
203
  - This work was supported by computing resources and valuable feedback provided by [StepFun](https://www.stepfun.com/) and Tsinghua University.
 
221
  ## Citation
222
 
223
  ```bibtex
224
+ @misc{hu2025openreasonerzeroopensourceapproach,
225
+ title={Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model},
226
+ author={Jingcheng Hu and Yinmin Zhang and Qi Han and Daxin Jiang and Xiangyu Zhang and Heung-Yeung Shum},
227
+ year={2025},
228
+ eprint={2503.24290},
229
+ archivePrefix={arXiv},
230
+ primaryClass={cs.LG},
231
+ url={https://arxiv.org/abs/2503.24290},
232
  }
233
  ```
234
+ ```