| **OLMo-2-32B-0325-Instruct** | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 |

## Learning curves

Below are the training curves for `allenai/OLMo-2-0325-32B-Instruct`. The model was trained on 5 8xH100 nodes.

Below are the core eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct` (note that we took step `320` as the final checkpoint, corresponding to episode `573,440`):
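
As a sanity check on the step-to-episode mapping, the numbers line up with the rollout flags in the reproduction command below, assuming episodes per step is the product of the per-GPU rollout batch, the number of learner GPUs, and the samples drawn per prompt (this composition of the flags is our reading of the training script, not something stated in this card):

```bash
# Hypothetical sanity check: 28 learner GPUs (--actor_num_gpus_per_node 8 8 8 4),
# each drawing --local_rollout_batch_size 4 prompts per step, with
# --number_samples_per_prompt 16 samples per prompt.
echo $(( 4 * (8 + 8 + 8 + 4) * 16 ))        # 1792 episodes per step
echo $(( 4 * (8 + 8 + 8 + 4) * 16 * 320 ))  # 573440 episodes at step 320
```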

Below are the other eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct`:

## Reproduction command

The command below is copied directly from the tracked training job:

```bash
# clone and check out the commit used for training
git clone https://github.com/allenai/open-instruct.git
cd open-instruct
# this should be the correct commit; the main thing is to have the vllm monkey patch for
# 32b olmo: https://github.com/allenai/open-instruct/blob/894ffa236319bc6c26c346240a7e4ee04ba0bd31/open_instruct/vllm_utils2.py#L37-L59
git checkout a51dc98525eec01de6e8a24c071f42dce407d738
uv sync
uv sync --extra compile

# note that you may need 5 8xH100 nodes for the training,
# so please set up Ray properly, e.g., https://github.com/allenai/open-instruct/blob/main/docs/tulu3.md#llama-31-tulu-3-70b-reproduction
python open_instruct/grpo_vllm_thread_ray_gtrl.py \
    --exp_name 0310_olmo2_32b_grpo_12818 \
    --beta 0.01 \
    --local_mini_batch_size 32 \
    --number_samples_per_prompt 16 \
    --output_dir output \
    --local_rollout_batch_size 4 \
    --kl_estimator kl3 \
    --learning_rate 5e-7 \
    --dataset_mixer_list allenai/RLVR-GSM-MATH-IF-Mixed-Constraints 1.0 \
    --dataset_mixer_list_splits train \
    --dataset_mixer_eval_list allenai/RLVR-GSM-MATH-IF-Mixed-Constraints 16 \
    --dataset_mixer_eval_list_splits train \
    --max_token_length 2048 \
    --max_prompt_token_length 2048 \
    --response_length 2048 \
    --model_name_or_path allenai/OLMo-2-0325-32B-DPO \
    --non_stop_penalty \
    --stop_token eos \
    --temperature 1.0 \
    --ground_truths_key ground_truth \
    --chat_template_name tulu \
    --sft_messages_key messages \
    --eval_max_length 4096 \
    --total_episodes 10000000 \
    --penalty_reward_value 0.0 \
    --deepspeed_stage 3 \
    --no_gather_whole_model \
    --per_device_train_batch_size 2 \
    --local_rollout_forward_batch_size 2 \
    --actor_num_gpus_per_node 8 8 8 4 \
    --num_epochs 1 \
    --vllm_tensor_parallel_size 1 \
    --vllm_num_engines 12 \
    --lr_scheduler_type constant \
    --apply_verifiable_reward true \
    --seed 1 \
    --num_evals 30 \
    --save_freq 20 \
    --reward_model_multiplier 0.0 \
    --no_try_launch_beaker_eval_jobs \
    --try_launch_beaker_eval_jobs_on_weka \
    --gradient_checkpointing \
    --with_tracking
```
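
For the multi-node setup, the flags above add up to a 40-GPU budget split between the DeepSpeed learners and the vLLM engines, matching the 5 8xH100 nodes mentioned earlier. Below is a minimal Ray bring-up sketch; the address is a placeholder, and the tulu3 doc linked in the comments above describes the setup actually used:

```bash
# GPU budget implied by the flags above:
#   learners: --actor_num_gpus_per_node 8 8 8 4        -> 8+8+8+4 = 28 GPUs
#   vLLM:     --vllm_num_engines 12, tensor parallel 1 -> 12 GPUs
#   total:    28 + 12 = 40 GPUs = 5 nodes x 8 H100s

# Minimal Ray cluster sketch (placeholder address, not the tracked job's exact setup):
ray start --head --port=6379                 # on the head node
ray start --address="<head_node_ip>:6379"    # on each of the other 4 nodes
```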

## License and use

OLMo 2 is licensed under the Apache 2.0 license.