danielhanchen committed on
Commit a4594b0 · verified · 1 Parent(s): 6cee5e8

Reinforcement Learning example




@dkundel-openai
:)

Files changed (1)
  1. README.md +10 -2
README.md CHANGED
@@ -163,9 +163,17 @@ The gpt-oss models are excellent for:
 
 # Fine-tuning
 
-Both gpt-oss models can be fine-tuned for a variety of specialized use cases.
+Both gpt-oss models can be fine-tuned for a variety of specialized use-cases by using [transformers](https://github.com/huggingface/transformers) and [Unsloth](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune).
 
-This smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) can be fine-tuned on a single H100 node.
+This smaller model `gpt-oss-20b` can be fine-tuned on consumer hardware, whereas the larger [`gpt-oss-120b`](https://huggingface.co/openai/gpt-oss-120b) can be fine-tuned on a single H100 GPU.
+
+You can learn more about fine-tuning gpt-oss from [Hugging Face](https://cookbook.openai.com/articles/gpt-oss/fine-tune-transfomers) or [Unsloth’s guide](https://docs.unsloth.ai/new/gpt-oss-how-to-run-and-fine-tune#fine-tuning-gpt-oss-with-unsloth).
+
+## Reinforcement Fine-tuning
+
+You can also train `gpt-oss` with reinforcement learning (RL).
+
+[OpenAI’s notebook](https://github.com/openai/gpt-oss/blob/main/examples/reinforcement-fine-tuning.ipynb) shows how you can train `gpt-oss-20b` with RL to autonomously solve the 2048 game.
 
 # Citation
 
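
For readers unfamiliar with the 2048 task mentioned in the added section, here is a minimal sketch of the kind of environment logic an RL reward could be built on. It is not taken from the linked notebook: the names `merge_row_left`, `move_left`, and `reached_target_reward` are hypothetical, new-tile spawning is omitted, and the binary reward is only one possible choice.

```python
from typing import List, Tuple

Board = List[List[int]]  # 0 marks an empty cell


def merge_row_left(row: List[int]) -> Tuple[List[int], int]:
    """Slide one row to the left; each tile merges at most once per move."""
    tiles = [v for v in row if v != 0]
    merged: List[int] = []
    gained = 0
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            merged.append(tiles[i] * 2)
            gained += tiles[i] * 2
            i += 2  # skip the tile that was absorbed
        else:
            merged.append(tiles[i])
            i += 1
    merged.extend([0] * (len(row) - len(merged)))  # pad back to row width
    return merged, gained


def move_left(board: Board) -> Tuple[Board, int]:
    """Apply a left move to every row and return the new board and score gained."""
    new_board: Board = []
    total = 0
    for row in board:
        new_row, gained = merge_row_left(row)
        new_board.append(new_row)
        total += gained
    return new_board, total


def reached_target_reward(board: Board, target: int = 2048) -> float:
    """Hypothetical binary reward: 1.0 once any tile reaches the target, else 0.0."""
    return 1.0 if max(max(row) for row in board) >= target else 0.0
```

The other three directions can be derived by reversing or transposing the board before calling `move_left`. A shaped reward (for example, based on the score gained per move) is an alternative to the binary signal sketched here.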