vwxyzjn committed · Commit b266600 · verified · 1 Parent(s): c0b7960

Update README.md

Files changed (1)
  1. README.md +81 -0
README.md CHANGED
@@ -103,6 +103,87 @@ See the Falcon 180B model card for an example of this.
 | **OLMo-2-32B-0325-Instruct** | 68.8 | 42.8 | 70.6 | 78.0 | 87.6 | 85.6 | 49.7 | 77.3 | 85.9 | 37.5 | 73.2 |
 
 
+
+
+ ## Learning curves
+
+ Below are the training curves for `allenai/OLMo-2-0325-32B-Instruct`. The model was trained on five 8xH100 nodes.
+
+ ![](olmo-32b-instruct-learning-curve.png)
+
+ ![](olmo-32b-instruct-learning-curve-time)
+
+ Below are the core eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct` (note we took step `320` as the final checkpoint, corresponding to episode `573,440`):
+
+ ![](olmo-32b-instruct-eval-curve.png)
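+
+ As a back-of-the-envelope check on the step/episode bookkeeping (a sketch only, assuming each step gathers `local_rollout_batch_size` prompts per actor GPU and `number_samples_per_prompt` completions per prompt; the flag values come from the reproduction command below):
+
+ ```bash
+ # assumed accounting, not an official formula: 28 actor GPUs (8+8+8+4),
+ # 4 prompts per GPU per step, 16 sampled completions per prompt
+ echo $(( (8 + 8 + 8 + 4) * 4 * 16 ))        # 1792 episodes per step
+ echo $(( (8 + 8 + 8 + 4) * 4 * 16 * 320 ))  # 573440 episodes after 320 steps
+ ```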
+
+ Below are the other eval scores over steps for `allenai/OLMo-2-0325-32B-Instruct`:
+
+ ![](olmo-32b-instruct-full-eval-curve.png)
+
+
+ ## Reproduction command
+
+ The command below is copied directly from the tracked training job:
+
+ ```bash
+ # clone and check out commit
+ git clone https://github.com/allenai/open-instruct.git
+ # this should be the correct commit, the main thing is to have the vllm monkey patch for
+ # 32b olmo https://github.com/allenai/open-instruct/blob/894ffa236319bc6c26c346240a7e4ee04ba0bd31/open_instruct/vllm_utils2.py#L37-L59
+ git checkout a51dc98525eec01de6e8a24c071f42dce407d738
+ uv sync
+ uv sync --extra compile
+
+ # note that you may need 5 8xH100 nodes for the training.
+ # so please setup ray properly, e.g., https://github.com/allenai/open-instruct/blob/main/docs/tulu3.md#llama-31-tulu-3-70b-reproduction
+ python open_instruct/grpo_vllm_thread_ray_gtrl.py \
+ --exp_name 0310_olmo2_32b_grpo_12818 \
+ --beta 0.01 \
+ --local_mini_batch_size 32 \
+ --number_samples_per_prompt 16 \
+ --output_dir output \
+ --local_rollout_batch_size 4 \
+ --kl_estimator kl3 \
+ --learning_rate 5e-7 \
+ --dataset_mixer_list allenai/RLVR-GSM-MATH-IF-Mixed-Constraints 1.0 \
+ --dataset_mixer_list_splits train \
+ --dataset_mixer_eval_list allenai/RLVR-GSM-MATH-IF-Mixed-Constraints 16 \
+ --dataset_mixer_eval_list_splits train \
+ --max_token_length 2048 \
+ --max_prompt_token_length 2048 \
+ --response_length 2048 \
+ --model_name_or_path allenai/OLMo-2-0325-32B-DPO \
+ --non_stop_penalty \
+ --stop_token eos \
+ --temperature 1.0 \
+ --ground_truths_key ground_truth \
+ --chat_template_name tulu \
+ --sft_messages_key messages \
+ --eval_max_length 4096 \
+ --total_episodes 10000000 \
+ --penalty_reward_value 0.0 \
+ --deepspeed_stage 3 \
+ --no_gather_whole_model \
+ --per_device_train_batch_size 2 \
+ --local_rollout_forward_batch_size 2 \
+ --actor_num_gpus_per_node 8 8 8 4 \
+ --num_epochs 1 \
+ --vllm_tensor_parallel_size 1 \
+ --vllm_num_engines 12 \
+ --lr_scheduler_type constant \
+ --apply_verifiable_reward true \
+ --seed 1 \
+ --num_evals 30 \
+ --save_freq 20 \
+ --reward_model_multiplier 0.0 \
+ --no_try_launch_beaker_eval_jobs \
+ --try_launch_beaker_eval_jobs_on_weka \
+ --gradient_checkpointing \
+ --with_tracking
+ ```
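+
+ The linked tulu3 doc describes the multi-node Ray setup in detail; purely as an orientation sketch (the port and address below are illustrative placeholders, not taken from the tracked job), wiring the five nodes together before launching the command above could look like:
+
+ ```bash
+ # on the head node (placeholder port; pick one reachable from the other nodes)
+ ray start --head --port=6379
+
+ # on each of the remaining four nodes, join the cluster
+ ray start --address=<head_node_ip>:6379
+
+ # confirm that all 5 nodes (40 GPUs) are registered before launching training
+ ray status
+ ```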
+
+
 ## License and use
 
 OLMo 2 is licensed under the Apache 2.0 license.