apriasmoro committed on
Commit 5ed955f · verified · 1 Parent(s): 6e0cfde

Training in progress, step 42, checkpoint
.gitattributes CHANGED
@@ -34,3 +34,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
  tokenizer.json filter=lfs diff=lfs merge=lfs -text
+ last-checkpoint/tokenizer.json filter=lfs diff=lfs merge=lfs -text
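The new `.gitattributes` rule keeps `last-checkpoint/tokenizer.json` under Git LFS, so a plain clone without LFS support only contains a small pointer file rather than the ~11 MB tokenizer. A minimal sketch of fetching the materialized file via `huggingface_hub`, which resolves LFS objects transparently; the repo id below is a placeholder, since the repository name is not shown in this commit view:

```python
from huggingface_hub import hf_hub_download

# NOTE: the repo_id is hypothetical -- substitute the repository this commit belongs to.
# hf_hub_download resolves Git LFS pointers and returns a local path to the real file.
tokenizer_path = hf_hub_download(
    repo_id="apriasmoro/your-repo-name",          # placeholder repo id
    filename="last-checkpoint/tokenizer.json",    # path added in this commit
)
print(tokenizer_path)
```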
last-checkpoint/README.md ADDED
@@ -0,0 +1,202 @@
+ ---
+ base_model: Qwen/Qwen3-1.7B-Base
+ library_name: peft
+ ---
+
+ # Model Card for Model ID
+
+ <!-- Provide a quick summary of what the model is/does. -->
+
+
+
+ ## Model Details
+
+ ### Model Description
+
+ <!-- Provide a longer summary of what this model is. -->
+
+
+
+ - **Developed by:** [More Information Needed]
+ - **Funded by [optional]:** [More Information Needed]
+ - **Shared by [optional]:** [More Information Needed]
+ - **Model type:** [More Information Needed]
+ - **Language(s) (NLP):** [More Information Needed]
+ - **License:** [More Information Needed]
+ - **Finetuned from model [optional]:** [More Information Needed]
+
+ ### Model Sources [optional]
+
+ <!-- Provide the basic links for the model. -->
+
+ - **Repository:** [More Information Needed]
+ - **Paper [optional]:** [More Information Needed]
+ - **Demo [optional]:** [More Information Needed]
+
+ ## Uses
+
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+
+ ### Direct Use
+
+ <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+
+ [More Information Needed]
+
+ ### Downstream Use [optional]
+
+ <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+
+ [More Information Needed]
+
+ ### Out-of-Scope Use
+
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+
+ [More Information Needed]
+
+ ## Bias, Risks, and Limitations
+
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
+
+ [More Information Needed]
+
+ ### Recommendations
+
+ <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+
+ Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+
+ ## How to Get Started with the Model
+
+ Use the code below to get started with the model.
+
+ [More Information Needed]
+
+ ## Training Details
+
+ ### Training Data
+
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+
+ [More Information Needed]
+
+ ### Training Procedure
+
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+
+ #### Preprocessing [optional]
+
+ [More Information Needed]
+
+
+ #### Training Hyperparameters
+
+ - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+
+ #### Speeds, Sizes, Times [optional]
+
+ <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+
+ [More Information Needed]
+
+ ## Evaluation
+
+ <!-- This section describes the evaluation protocols and provides the results. -->
+
+ ### Testing Data, Factors & Metrics
+
+ #### Testing Data
+
+ <!-- This should link to a Dataset Card if possible. -->
+
+ [More Information Needed]
+
+ #### Factors
+
+ <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+
+ [More Information Needed]
+
+ #### Metrics
+
+ <!-- These are the evaluation metrics being used, ideally with a description of why. -->
+
+ [More Information Needed]
+
+ ### Results
+
+ [More Information Needed]
+
+ #### Summary
+
+
+
+ ## Model Examination [optional]
+
+ <!-- Relevant interpretability work for the model goes here -->
+
+ [More Information Needed]
+
+ ## Environmental Impact
+
+ <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+
+ Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+
+ - **Hardware Type:** [More Information Needed]
+ - **Hours used:** [More Information Needed]
+ - **Cloud Provider:** [More Information Needed]
+ - **Compute Region:** [More Information Needed]
+ - **Carbon Emitted:** [More Information Needed]
+
+ ## Technical Specifications [optional]
+
+ ### Model Architecture and Objective
+
+ [More Information Needed]
+
+ ### Compute Infrastructure
+
+ [More Information Needed]
+
+ #### Hardware
+
+ [More Information Needed]
+
+ #### Software
+
+ [More Information Needed]
+
+ ## Citation [optional]
+
+ <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+
+ **BibTeX:**
+
+ [More Information Needed]
+
+ **APA:**
+
+ [More Information Needed]
+
+ ## Glossary [optional]
+
+ <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+
+ [More Information Needed]
+
+ ## More Information [optional]
+
+ [More Information Needed]
+
+ ## Model Card Authors [optional]
+
+ [More Information Needed]
+
+ ## Model Card Contact
+
+ [More Information Needed]
+ ### Framework versions
+
+ - PEFT 0.15.2
last-checkpoint/adapter_config.json ADDED
@@ -0,0 +1,39 @@
+ {
+ "alpha_pattern": {},
+ "auto_mapping": null,
+ "base_model_name_or_path": "Qwen/Qwen3-1.7B-Base",
+ "bias": "none",
+ "corda_config": null,
+ "eva_config": null,
+ "exclude_modules": null,
+ "fan_in_fan_out": null,
+ "inference_mode": true,
+ "init_lora_weights": true,
+ "layer_replication": null,
+ "layers_pattern": null,
+ "layers_to_transform": null,
+ "loftq_config": {},
+ "lora_alpha": 16,
+ "lora_bias": false,
+ "lora_dropout": 0.05,
+ "megatron_config": null,
+ "megatron_core": "megatron.core",
+ "modules_to_save": null,
+ "peft_type": "LORA",
+ "r": 8,
+ "rank_pattern": {},
+ "revision": null,
+ "target_modules": [
+ "v_proj",
+ "k_proj",
+ "o_proj",
+ "gate_proj",
+ "up_proj",
+ "down_proj",
+ "q_proj"
+ ],
+ "task_type": "CAUSAL_LM",
+ "trainable_token_indices": null,
+ "use_dora": false,
+ "use_rslora": false
+ }
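`adapter_config.json` describes a rank-8 LoRA adapter (`r=8`, `lora_alpha=16`, dropout 0.05) over all attention and MLP projections of `Qwen/Qwen3-1.7B-Base`, so the checkpoint is loaded with PEFT on top of the base model. A minimal sketch, assuming the `last-checkpoint/` directory has been downloaded locally:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the frozen base model named in adapter_config.json ...
base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B-Base", torch_dtype="auto")
# ... then attach the LoRA adapter weights stored in this checkpoint directory.
model = PeftModel.from_pretrained(base, "last-checkpoint")
tokenizer = AutoTokenizer.from_pretrained("last-checkpoint")

inputs = tokenizer("Hello, world", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```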
last-checkpoint/adapter_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb6858159d2462c5747e8d48f6dc28617462702164449f75fdb3d6700f7b25f2
+ size 34916720
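The adapter weights are committed as a Git LFS pointer: the repository records only the spec version, the SHA-256 of the payload, and its size (about 35 MB of LoRA weights). A small sketch, assuming the weights have been downloaded locally to `last-checkpoint/adapter_model.safetensors`, checks the local copy against the pointer's oid:

```python
import hashlib
from pathlib import Path

EXPECTED = "fb6858159d2462c5747e8d48f6dc28617462702164449f75fdb3d6700f7b25f2"

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    # Stream the file in 1 MiB chunks so large LFS objects need not fit in memory.
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

local = Path("last-checkpoint/adapter_model.safetensors")
assert sha256_of(local) == EXPECTED, "downloaded adapter does not match the LFS pointer"
```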
last-checkpoint/added_tokens.json ADDED
@@ -0,0 +1,28 @@
+ {
+ "</think>": 151668,
+ "</tool_call>": 151658,
+ "</tool_response>": 151666,
+ "<think>": 151667,
+ "<tool_call>": 151657,
+ "<tool_response>": 151665,
+ "<|box_end|>": 151649,
+ "<|box_start|>": 151648,
+ "<|endoftext|>": 151643,
+ "<|file_sep|>": 151664,
+ "<|fim_middle|>": 151660,
+ "<|fim_pad|>": 151662,
+ "<|fim_prefix|>": 151659,
+ "<|fim_suffix|>": 151661,
+ "<|im_end|>": 151645,
+ "<|im_start|>": 151644,
+ "<|image_pad|>": 151655,
+ "<|object_ref_end|>": 151647,
+ "<|object_ref_start|>": 151646,
+ "<|quad_end|>": 151651,
+ "<|quad_start|>": 151650,
+ "<|repo_name|>": 151663,
+ "<|video_pad|>": 151656,
+ "<|vision_end|>": 151653,
+ "<|vision_pad|>": 151654,
+ "<|vision_start|>": 151652
+ }
last-checkpoint/merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
last-checkpoint/optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b261ea90f043c82150924487b970769232577ddfd8507df7a493be77f9355d6a
+ size 18162996
last-checkpoint/rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e1e0c02ac949d09dbeb7208e8b2463d02d3220e0d83c58bce28d56f61ba483b3
+ size 14244
last-checkpoint/scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:93e851a38910d314fd8b7919dd43fbaf9d456c3ac40c28c5aeffc9d42d0bbb4a
+ size 1064
last-checkpoint/special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
+ {
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "eos_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
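Note that `eos_token` and `pad_token` are both `<|endoftext|>`, so the checkpoint has no dedicated padding token. A small sketch of batched generation under that setup, assuming the checkpoint is available locally as `last-checkpoint/` (left padding and an explicit `pad_token_id` are the usual precautions for decoder-only models):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("last-checkpoint")
tok.padding_side = "left"  # pad on the left so generation continues from real tokens

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-1.7B-Base")

batch = tok(["Short prompt.", "A somewhat longer second prompt."],
            padding=True, return_tensors="pt")
out = model.generate(**batch, max_new_tokens=16, pad_token_id=tok.pad_token_id)
for ids in out:
    print(tok.decode(ids, skip_special_tokens=True))
```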
last-checkpoint/tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:67cc0080ffd7555f723f423c27cfef314e1ad9d335c8b79f465c5faba1ed478b
+ size 11422821
last-checkpoint/tokenizer_config.json ADDED
@@ -0,0 +1,240 @@
+ {
+ "add_bos_token": false,
+ "add_prefix_space": false,
+ "added_tokens_decoder": {
+ "151643": {
+ "content": "<|endoftext|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151644": {
+ "content": "<|im_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151645": {
+ "content": "<|im_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151646": {
+ "content": "<|object_ref_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151647": {
+ "content": "<|object_ref_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151648": {
+ "content": "<|box_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151649": {
+ "content": "<|box_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151650": {
+ "content": "<|quad_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151651": {
+ "content": "<|quad_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151652": {
+ "content": "<|vision_start|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151653": {
+ "content": "<|vision_end|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151654": {
+ "content": "<|vision_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151655": {
+ "content": "<|image_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151656": {
+ "content": "<|video_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "151657": {
+ "content": "<tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151658": {
+ "content": "</tool_call>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151659": {
+ "content": "<|fim_prefix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151660": {
+ "content": "<|fim_middle|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151661": {
+ "content": "<|fim_suffix|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151662": {
+ "content": "<|fim_pad|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151663": {
+ "content": "<|repo_name|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151664": {
+ "content": "<|file_sep|>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151665": {
+ "content": "<tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151666": {
+ "content": "</tool_response>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151667": {
+ "content": "<think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ },
+ "151668": {
+ "content": "</think>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": false
+ }
+ },
+ "additional_special_tokens": [
+ "<|im_start|>",
+ "<|im_end|>",
+ "<|object_ref_start|>",
+ "<|object_ref_end|>",
+ "<|box_start|>",
+ "<|box_end|>",
+ "<|quad_start|>",
+ "<|quad_end|>",
+ "<|vision_start|>",
+ "<|vision_end|>",
+ "<|vision_pad|>",
+ "<|image_pad|>",
+ "<|video_pad|>"
+ ],
+ "bos_token": null,
+ "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+ "clean_up_tokenization_spaces": false,
+ "eos_token": "<|endoftext|>",
+ "errors": "replace",
+ "extra_special_tokens": {},
+ "model_max_length": 131072,
+ "pad_token": "<|endoftext|>",
+ "split_special_tokens": false,
+ "tokenizer_class": "Qwen2Tokenizer",
+ "unk_token": null
+ }
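With the tokenizer files above in place, the checkpoint exposes a standard Qwen2 tokenizer: `<|endoftext|>` serves as both EOS and PAD, the context limit is 131072 tokens, and the reasoning/tool markers from `added_tokens.json` resolve to single ids. A minimal sketch, again assuming `last-checkpoint/` is available locally:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("last-checkpoint")

# Values taken from tokenizer_config.json / special_tokens_map.json above.
print(tok.eos_token, tok.pad_token)   # "<|endoftext|>" "<|endoftext|>"
print(tok.model_max_length)           # 131072

# The added tokens map to the ids listed in added_tokens.json.
print(tok.convert_tokens_to_ids(["<think>", "</think>", "<tool_call>"]))
# expected: [151667, 151668, 151657]
```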
last-checkpoint/trainer_state.json ADDED
@@ -0,0 +1,1260 @@
1
+ {
2
+ "best_global_step": null,
3
+ "best_metric": null,
4
+ "best_model_checkpoint": null,
5
+ "epoch": 2.982456140350877,
6
+ "eval_steps": 22,
7
+ "global_step": 42,
8
+ "is_hyper_param_search": false,
9
+ "is_local_process_zero": true,
10
+ "is_world_process_zero": true,
11
+ "log_history": [
12
+ {
13
+ "clip_ratio/high_max": 0.0,
14
+ "clip_ratio/high_mean": 0.0,
15
+ "clip_ratio/low_mean": 0.0,
16
+ "clip_ratio/low_min": 0.0,
17
+ "clip_ratio/region_mean": 0.0,
18
+ "completions/clipped_ratio": 0.515625,
19
+ "completions/max_length": 256.0,
20
+ "completions/max_terminated_length": 255.0,
21
+ "completions/mean_length": 163.984375,
22
+ "completions/mean_terminated_length": 66.03225708007812,
23
+ "completions/min_length": 1.0,
24
+ "completions/min_terminated_length": 1.0,
25
+ "epoch": 0.07017543859649122,
26
+ "grad_norm": 0.13955183327198029,
27
+ "kl": 0.0,
28
+ "learning_rate": 0.0,
29
+ "loss": 0.24,
30
+ "num_tokens": 36143.0,
31
+ "reward": 42.860870361328125,
32
+ "reward_std": 12.139046669006348,
33
+ "rewards/conciseness_reward_func/mean": 2.915491819381714,
34
+ "rewards/conciseness_reward_func/std": 3.442464590072632,
35
+ "rewards/reward_func_conciseness/mean": 2.915491819381714,
36
+ "rewards/reward_func_conciseness/std": 3.442464590072632,
37
+ "rewards/reward_func_sensitivity/mean": 1.0,
38
+ "rewards/reward_func_sensitivity/std": 0.0,
39
+ "step": 1
40
+ },
41
+ {
42
+ "clip_ratio/high_max": 0.0,
43
+ "clip_ratio/high_mean": 0.0,
44
+ "clip_ratio/low_mean": 0.0,
45
+ "clip_ratio/low_min": 0.0,
46
+ "clip_ratio/region_mean": 0.0,
47
+ "completions/clipped_ratio": 0.28125,
48
+ "completions/max_length": 256.0,
49
+ "completions/max_terminated_length": 200.0,
50
+ "completions/mean_length": 112.59375,
51
+ "completions/mean_terminated_length": 56.4782600402832,
52
+ "completions/min_length": 1.0,
53
+ "completions/min_terminated_length": 1.0,
54
+ "epoch": 0.14035087719298245,
55
+ "grad_norm": 0.16610237956047058,
56
+ "kl": 0.0,
57
+ "learning_rate": 2.0000000000000003e-06,
58
+ "loss": 0.3533,
59
+ "num_tokens": 68937.0,
60
+ "reward": 44.994163513183594,
61
+ "reward_std": 21.462963104248047,
62
+ "rewards/conciseness_reward_func/mean": 3.075274705886841,
63
+ "rewards/conciseness_reward_func/std": 3.1286842823028564,
64
+ "rewards/reward_func_conciseness/mean": 3.075274705886841,
65
+ "rewards/reward_func_conciseness/std": 3.1286842823028564,
66
+ "rewards/reward_func_sensitivity/mean": 0.984375,
67
+ "rewards/reward_func_sensitivity/std": 0.125,
68
+ "step": 2
69
+ },
70
+ {
71
+ "clip_ratio/high_max": 0.0,
72
+ "clip_ratio/high_mean": 0.0,
73
+ "clip_ratio/low_mean": 0.0,
74
+ "clip_ratio/low_min": 0.0,
75
+ "clip_ratio/region_mean": 0.0,
76
+ "completions/clipped_ratio": 0.203125,
77
+ "completions/max_length": 256.0,
78
+ "completions/max_terminated_length": 230.0,
79
+ "completions/mean_length": 108.375,
80
+ "completions/mean_terminated_length": 70.74510192871094,
81
+ "completions/min_length": 1.0,
82
+ "completions/min_terminated_length": 1.0,
83
+ "epoch": 0.21052631578947367,
84
+ "grad_norm": 0.1859525740146637,
85
+ "kl": 0.0014283501222962514,
86
+ "learning_rate": 4.000000000000001e-06,
87
+ "loss": 0.376,
88
+ "num_tokens": 102181.0,
89
+ "reward": 36.23883056640625,
90
+ "reward_std": 15.684919357299805,
91
+ "rewards/conciseness_reward_func/mean": 2.4338905811309814,
92
+ "rewards/conciseness_reward_func/std": 2.2532119750976562,
93
+ "rewards/reward_func_conciseness/mean": 2.4338905811309814,
94
+ "rewards/reward_func_conciseness/std": 2.2532119750976562,
95
+ "rewards/reward_func_sensitivity/mean": 0.984375,
96
+ "rewards/reward_func_sensitivity/std": 0.125,
97
+ "step": 3
98
+ },
99
+ {
100
+ "clip_ratio/high_max": 0.0,
101
+ "clip_ratio/high_mean": 0.0,
102
+ "clip_ratio/low_mean": 0.0,
103
+ "clip_ratio/low_min": 0.0,
104
+ "clip_ratio/region_mean": 0.0,
105
+ "completions/clipped_ratio": 0.4375,
106
+ "completions/max_length": 256.0,
107
+ "completions/max_terminated_length": 253.0,
108
+ "completions/mean_length": 162.140625,
109
+ "completions/mean_terminated_length": 89.1388931274414,
110
+ "completions/min_length": 3.0,
111
+ "completions/min_terminated_length": 3.0,
112
+ "epoch": 0.2807017543859649,
113
+ "grad_norm": 0.16273996233940125,
114
+ "kl": 0.0016980907894321717,
115
+ "learning_rate": 6e-06,
116
+ "loss": 0.2831,
117
+ "num_tokens": 139390.0,
118
+ "reward": 25.656160354614258,
119
+ "reward_std": 15.340396881103516,
120
+ "rewards/conciseness_reward_func/mean": 1.655137062072754,
121
+ "rewards/conciseness_reward_func/std": 1.8140530586242676,
122
+ "rewards/reward_func_conciseness/mean": 1.655137062072754,
123
+ "rewards/reward_func_conciseness/std": 1.8140530586242676,
124
+ "rewards/reward_func_sensitivity/mean": 1.0,
125
+ "rewards/reward_func_sensitivity/std": 0.0,
126
+ "step": 4
127
+ },
128
+ {
129
+ "clip_ratio/high_max": 0.0,
130
+ "clip_ratio/high_mean": 0.0,
131
+ "clip_ratio/low_mean": 0.0,
132
+ "clip_ratio/low_min": 0.0,
133
+ "clip_ratio/region_mean": 0.0,
134
+ "completions/clipped_ratio": 0.28125,
135
+ "completions/max_length": 256.0,
136
+ "completions/max_terminated_length": 253.0,
137
+ "completions/mean_length": 116.8125,
138
+ "completions/mean_terminated_length": 62.34782791137695,
139
+ "completions/min_length": 1.0,
140
+ "completions/min_terminated_length": 1.0,
141
+ "epoch": 0.3508771929824561,
142
+ "grad_norm": 0.235906720161438,
143
+ "kl": 0.001262298581423238,
144
+ "learning_rate": 8.000000000000001e-06,
145
+ "loss": 0.2293,
146
+ "num_tokens": 173774.0,
147
+ "reward": 49.87214660644531,
148
+ "reward_std": 13.10794734954834,
149
+ "rewards/conciseness_reward_func/mean": 3.4291129112243652,
150
+ "rewards/conciseness_reward_func/std": 3.4430530071258545,
151
+ "rewards/reward_func_conciseness/mean": 3.4291129112243652,
152
+ "rewards/reward_func_conciseness/std": 3.4430530071258545,
153
+ "rewards/reward_func_sensitivity/mean": 1.0,
154
+ "rewards/reward_func_sensitivity/std": 0.0,
155
+ "step": 5
156
+ },
157
+ {
158
+ "clip_ratio/high_max": 0.0,
159
+ "clip_ratio/high_mean": 0.0,
160
+ "clip_ratio/low_mean": 0.0,
161
+ "clip_ratio/low_min": 0.0,
162
+ "clip_ratio/region_mean": 0.0,
163
+ "completions/clipped_ratio": 0.296875,
164
+ "completions/max_length": 256.0,
165
+ "completions/max_terminated_length": 217.0,
166
+ "completions/mean_length": 108.984375,
167
+ "completions/mean_terminated_length": 46.91111373901367,
168
+ "completions/min_length": 1.0,
169
+ "completions/min_terminated_length": 1.0,
170
+ "epoch": 0.42105263157894735,
171
+ "grad_norm": 0.1428779512643814,
172
+ "kl": 0.0010839882525033318,
173
+ "learning_rate": 1e-05,
174
+ "loss": 0.295,
175
+ "num_tokens": 207933.0,
176
+ "reward": 48.97629165649414,
177
+ "reward_std": 14.069400787353516,
178
+ "rewards/conciseness_reward_func/mean": 3.36348557472229,
179
+ "rewards/conciseness_reward_func/std": 2.850590705871582,
180
+ "rewards/reward_func_conciseness/mean": 3.36348557472229,
181
+ "rewards/reward_func_conciseness/std": 2.850590705871582,
182
+ "rewards/reward_func_sensitivity/mean": 1.0,
183
+ "rewards/reward_func_sensitivity/std": 0.0,
184
+ "step": 6
185
+ },
186
+ {
187
+ "clip_ratio/high_max": 0.0,
188
+ "clip_ratio/high_mean": 0.0,
189
+ "clip_ratio/low_mean": 0.0,
190
+ "clip_ratio/low_min": 0.0,
191
+ "clip_ratio/region_mean": 0.0,
192
+ "completions/clipped_ratio": 0.328125,
193
+ "completions/max_length": 256.0,
194
+ "completions/max_terminated_length": 247.0,
195
+ "completions/mean_length": 134.859375,
196
+ "completions/mean_terminated_length": 75.69767761230469,
197
+ "completions/min_length": 1.0,
198
+ "completions/min_terminated_length": 1.0,
199
+ "epoch": 0.49122807017543857,
200
+ "grad_norm": 0.16079330444335938,
201
+ "kl": 0.0015518797299591824,
202
+ "learning_rate": 1.2e-05,
203
+ "loss": 0.3841,
204
+ "num_tokens": 242164.0,
205
+ "reward": 40.38452911376953,
206
+ "reward_std": 25.31698989868164,
207
+ "rewards/conciseness_reward_func/mean": 2.734084129333496,
208
+ "rewards/conciseness_reward_func/std": 3.096156358718872,
209
+ "rewards/reward_func_conciseness/mean": 2.734084129333496,
210
+ "rewards/reward_func_conciseness/std": 3.096156358718872,
211
+ "rewards/reward_func_sensitivity/mean": 1.0,
212
+ "rewards/reward_func_sensitivity/std": 0.0,
213
+ "step": 7
214
+ },
215
+ {
216
+ "clip_ratio/high_max": 0.0,
217
+ "clip_ratio/high_mean": 0.0,
218
+ "clip_ratio/low_mean": 0.0,
219
+ "clip_ratio/low_min": 0.0,
220
+ "clip_ratio/region_mean": 0.0,
221
+ "completions/clipped_ratio": 0.359375,
222
+ "completions/max_length": 256.0,
223
+ "completions/max_terminated_length": 237.0,
224
+ "completions/mean_length": 130.484375,
225
+ "completions/mean_terminated_length": 60.07316970825195,
226
+ "completions/min_length": 1.0,
227
+ "completions/min_terminated_length": 1.0,
228
+ "epoch": 0.5614035087719298,
229
+ "grad_norm": 0.17502829432487488,
230
+ "kl": 0.001823052079998888,
231
+ "learning_rate": 1.4000000000000001e-05,
232
+ "loss": 0.4548,
233
+ "num_tokens": 276087.0,
234
+ "reward": 40.271270751953125,
235
+ "reward_std": 19.694808959960938,
236
+ "rewards/conciseness_reward_func/mean": 2.7327980995178223,
237
+ "rewards/conciseness_reward_func/std": 2.980412483215332,
238
+ "rewards/reward_func_conciseness/mean": 2.7327980995178223,
239
+ "rewards/reward_func_conciseness/std": 2.980412483215332,
240
+ "rewards/reward_func_sensitivity/mean": 0.96875,
241
+ "rewards/reward_func_sensitivity/std": 0.17536810040473938,
242
+ "step": 8
243
+ },
244
+ {
245
+ "clip_ratio/high_max": 0.0,
246
+ "clip_ratio/high_mean": 0.0,
247
+ "clip_ratio/low_mean": 0.0,
248
+ "clip_ratio/low_min": 0.0,
249
+ "clip_ratio/region_mean": 0.0,
250
+ "completions/clipped_ratio": 0.234375,
251
+ "completions/max_length": 256.0,
252
+ "completions/max_terminated_length": 256.0,
253
+ "completions/mean_length": 105.640625,
254
+ "completions/mean_terminated_length": 59.61224365234375,
255
+ "completions/min_length": 2.0,
256
+ "completions/min_terminated_length": 2.0,
257
+ "epoch": 0.631578947368421,
258
+ "grad_norm": 0.22610169649124146,
259
+ "kl": 0.001264312981220428,
260
+ "learning_rate": 1.6000000000000003e-05,
261
+ "loss": 0.3772,
262
+ "num_tokens": 307784.0,
263
+ "reward": 47.05876922607422,
264
+ "reward_std": 16.484901428222656,
265
+ "rewards/conciseness_reward_func/mean": 3.226520538330078,
266
+ "rewards/conciseness_reward_func/std": 2.9404289722442627,
267
+ "rewards/reward_func_conciseness/mean": 3.226520538330078,
268
+ "rewards/reward_func_conciseness/std": 2.9404289722442627,
269
+ "rewards/reward_func_sensitivity/mean": 0.984375,
270
+ "rewards/reward_func_sensitivity/std": 0.125,
271
+ "step": 9
272
+ },
273
+ {
274
+ "clip_ratio/high_max": 0.0,
275
+ "clip_ratio/high_mean": 0.0,
276
+ "clip_ratio/low_mean": 0.0,
277
+ "clip_ratio/low_min": 0.0,
278
+ "clip_ratio/region_mean": 0.0,
279
+ "completions/clipped_ratio": 0.3125,
280
+ "completions/max_length": 256.0,
281
+ "completions/max_terminated_length": 247.0,
282
+ "completions/mean_length": 135.09375,
283
+ "completions/mean_terminated_length": 80.13636779785156,
284
+ "completions/min_length": 1.0,
285
+ "completions/min_terminated_length": 1.0,
286
+ "epoch": 0.7017543859649122,
287
+ "grad_norm": 0.21073277294635773,
288
+ "kl": 0.0018543089681770653,
289
+ "learning_rate": 1.8e-05,
290
+ "loss": 0.4249,
291
+ "num_tokens": 342698.0,
292
+ "reward": 33.26258087158203,
293
+ "reward_std": 19.154817581176758,
294
+ "rewards/conciseness_reward_func/mean": 2.2158610820770264,
295
+ "rewards/conciseness_reward_func/std": 2.6055219173431396,
296
+ "rewards/reward_func_conciseness/mean": 2.2158610820770264,
297
+ "rewards/reward_func_conciseness/std": 2.6055219173431396,
298
+ "rewards/reward_func_sensitivity/mean": 0.984375,
299
+ "rewards/reward_func_sensitivity/std": 0.125,
300
+ "step": 10
301
+ },
302
+ {
303
+ "clip_ratio/high_max": 0.0,
304
+ "clip_ratio/high_mean": 0.0,
305
+ "clip_ratio/low_mean": 0.0,
306
+ "clip_ratio/low_min": 0.0,
307
+ "clip_ratio/region_mean": 0.0,
308
+ "completions/clipped_ratio": 0.25,
309
+ "completions/max_length": 256.0,
310
+ "completions/max_terminated_length": 246.0,
311
+ "completions/mean_length": 98.578125,
312
+ "completions/mean_terminated_length": 46.10416793823242,
313
+ "completions/min_length": 1.0,
314
+ "completions/min_terminated_length": 1.0,
315
+ "epoch": 0.7719298245614035,
316
+ "grad_norm": 0.24806265532970428,
317
+ "kl": 0.0015648197004338726,
318
+ "learning_rate": 2e-05,
319
+ "loss": 0.3982,
320
+ "num_tokens": 375123.0,
321
+ "reward": 47.2392463684082,
322
+ "reward_std": 24.591474533081055,
323
+ "rewards/conciseness_reward_func/mean": 3.236236095428467,
324
+ "rewards/conciseness_reward_func/std": 3.028409004211426,
325
+ "rewards/reward_func_conciseness/mean": 3.236236095428467,
326
+ "rewards/reward_func_conciseness/std": 3.028409004211426,
327
+ "rewards/reward_func_sensitivity/mean": 1.0,
328
+ "rewards/reward_func_sensitivity/std": 0.0,
329
+ "step": 11
330
+ },
331
+ {
332
+ "clip_ratio/high_max": 0.0,
333
+ "clip_ratio/high_mean": 0.0,
334
+ "clip_ratio/low_mean": 0.0,
335
+ "clip_ratio/low_min": 0.0,
336
+ "clip_ratio/region_mean": 0.0,
337
+ "completions/clipped_ratio": 0.328125,
338
+ "completions/max_length": 256.0,
339
+ "completions/max_terminated_length": 252.0,
340
+ "completions/mean_length": 119.890625,
341
+ "completions/mean_terminated_length": 53.41860580444336,
342
+ "completions/min_length": 1.0,
343
+ "completions/min_terminated_length": 1.0,
344
+ "epoch": 0.8421052631578947,
345
+ "grad_norm": 0.16092327237129211,
346
+ "kl": 0.0013837351434631273,
347
+ "learning_rate": 2.2000000000000003e-05,
348
+ "loss": 0.4597,
349
+ "num_tokens": 407736.0,
350
+ "reward": 45.030216217041016,
351
+ "reward_std": 27.396785736083984,
352
+ "rewards/conciseness_reward_func/mean": 3.0744104385375977,
353
+ "rewards/conciseness_reward_func/std": 3.167632818222046,
354
+ "rewards/reward_func_conciseness/mean": 3.0744104385375977,
355
+ "rewards/reward_func_conciseness/std": 3.167632818222046,
356
+ "rewards/reward_func_sensitivity/mean": 1.0,
357
+ "rewards/reward_func_sensitivity/std": 0.0,
358
+ "step": 12
359
+ },
360
+ {
361
+ "clip_ratio/high_max": 0.0,
362
+ "clip_ratio/high_mean": 0.0,
363
+ "clip_ratio/low_mean": 0.0,
364
+ "clip_ratio/low_min": 0.0,
365
+ "clip_ratio/region_mean": 0.0,
366
+ "completions/clipped_ratio": 0.1875,
367
+ "completions/max_length": 256.0,
368
+ "completions/max_terminated_length": 247.0,
369
+ "completions/mean_length": 86.15625,
370
+ "completions/mean_terminated_length": 46.96154022216797,
371
+ "completions/min_length": 2.0,
372
+ "completions/min_terminated_length": 2.0,
373
+ "epoch": 0.9122807017543859,
374
+ "grad_norm": 0.3321259319782257,
375
+ "kl": 0.0020260919045540504,
376
+ "learning_rate": 2.4e-05,
377
+ "loss": 0.4805,
378
+ "num_tokens": 437690.0,
379
+ "reward": 48.95707702636719,
380
+ "reward_std": 23.871448516845703,
381
+ "rewards/conciseness_reward_func/mean": 3.3620777130126953,
382
+ "rewards/conciseness_reward_func/std": 2.91607403755188,
383
+ "rewards/reward_func_conciseness/mean": 3.3620777130126953,
384
+ "rewards/reward_func_conciseness/std": 2.91607403755188,
385
+ "rewards/reward_func_sensitivity/mean": 1.0,
386
+ "rewards/reward_func_sensitivity/std": 0.0,
387
+ "step": 13
388
+ },
389
+ {
390
+ "clip_ratio/high_max": 0.0,
391
+ "clip_ratio/high_mean": 0.0,
392
+ "clip_ratio/low_mean": 0.0,
393
+ "clip_ratio/low_min": 0.0,
394
+ "clip_ratio/region_mean": 0.0,
395
+ "completions/clipped_ratio": 0.2777777777777778,
396
+ "completions/max_length": 256.0,
397
+ "completions/max_terminated_length": 212.0,
398
+ "completions/mean_length": 119.3888931274414,
399
+ "completions/mean_terminated_length": 66.84615325927734,
400
+ "completions/min_length": 16.0,
401
+ "completions/min_terminated_length": 16.0,
402
+ "epoch": 0.9824561403508771,
403
+ "grad_norm": 0.18372154235839844,
404
+ "kl": 0.0016084623784990981,
405
+ "learning_rate": 2.6000000000000002e-05,
406
+ "loss": 0.2136,
407
+ "num_tokens": 471753.0,
408
+ "reward": 36.678627014160156,
409
+ "reward_std": 13.176652908325195,
410
+ "rewards/conciseness_reward_func/mean": 2.4626033306121826,
411
+ "rewards/conciseness_reward_func/std": 2.459620952606201,
412
+ "rewards/reward_func_conciseness/mean": 2.4626033306121826,
413
+ "rewards/reward_func_conciseness/std": 2.459620952606201,
414
+ "rewards/reward_func_sensitivity/mean": 1.0,
415
+ "rewards/reward_func_sensitivity/std": 0.0,
416
+ "step": 14
417
+ },
418
+ {
419
+ "clip_ratio/high_max": 0.0,
420
+ "clip_ratio/high_mean": 0.0,
421
+ "clip_ratio/low_mean": 0.0,
422
+ "clip_ratio/low_min": 0.0,
423
+ "clip_ratio/region_mean": 0.0,
424
+ "completions/clipped_ratio": 0.296875,
425
+ "completions/max_length": 256.0,
426
+ "completions/max_terminated_length": 212.0,
427
+ "completions/mean_length": 105.671875,
428
+ "completions/mean_terminated_length": 42.20000076293945,
429
+ "completions/min_length": 1.0,
430
+ "completions/min_terminated_length": 1.0,
431
+ "epoch": 1.0701754385964912,
432
+ "grad_norm": 0.2454100102186203,
433
+ "kl": 0.0014291432889876887,
434
+ "learning_rate": 2.8000000000000003e-05,
435
+ "loss": 0.3836,
436
+ "num_tokens": 504352.0,
437
+ "reward": 47.79035186767578,
438
+ "reward_std": 20.939437866210938,
439
+ "rewards/conciseness_reward_func/mean": 3.2766082286834717,
440
+ "rewards/conciseness_reward_func/std": 3.099247455596924,
441
+ "rewards/reward_func_conciseness/mean": 3.2766082286834717,
442
+ "rewards/reward_func_conciseness/std": 3.099247455596924,
443
+ "rewards/reward_func_sensitivity/mean": 1.0,
444
+ "rewards/reward_func_sensitivity/std": 0.0,
445
+ "step": 15
446
+ },
447
+ {
448
+ "clip_ratio/high_max": 0.0,
449
+ "clip_ratio/high_mean": 0.0,
450
+ "clip_ratio/low_mean": 0.0,
451
+ "clip_ratio/low_min": 0.0,
452
+ "clip_ratio/region_mean": 0.0,
453
+ "completions/clipped_ratio": 0.125,
454
+ "completions/max_length": 256.0,
455
+ "completions/max_terminated_length": 168.0,
456
+ "completions/mean_length": 62.75,
457
+ "completions/mean_terminated_length": 35.142860412597656,
458
+ "completions/min_length": 1.0,
459
+ "completions/min_terminated_length": 1.0,
460
+ "epoch": 1.1403508771929824,
461
+ "grad_norm": 0.25173822045326233,
462
+ "kl": 0.0016040767804952338,
463
+ "learning_rate": 3e-05,
464
+ "loss": 0.5928,
465
+ "num_tokens": 532592.0,
466
+ "reward": 56.10843276977539,
467
+ "reward_std": 28.642688751220703,
468
+ "rewards/conciseness_reward_func/mean": 3.885960578918457,
469
+ "rewards/conciseness_reward_func/std": 2.9640209674835205,
470
+ "rewards/reward_func_conciseness/mean": 3.885960578918457,
471
+ "rewards/reward_func_conciseness/std": 2.9640209674835205,
472
+ "rewards/reward_func_sensitivity/mean": 1.0,
473
+ "rewards/reward_func_sensitivity/std": 0.0,
474
+ "step": 16
475
+ },
476
+ {
477
+ "clip_ratio/high_max": 0.0,
478
+ "clip_ratio/high_mean": 0.0,
479
+ "clip_ratio/low_mean": 0.0,
480
+ "clip_ratio/low_min": 0.0,
481
+ "clip_ratio/region_mean": 0.0,
482
+ "completions/clipped_ratio": 0.296875,
483
+ "completions/max_length": 256.0,
484
+ "completions/max_terminated_length": 253.0,
485
+ "completions/mean_length": 123.34375,
486
+ "completions/mean_terminated_length": 67.33333587646484,
487
+ "completions/min_length": 1.0,
488
+ "completions/min_terminated_length": 1.0,
489
+ "epoch": 1.2105263157894737,
490
+ "grad_norm": 0.14586757123470306,
491
+ "kl": 0.001644643591134809,
492
+ "learning_rate": 3.2000000000000005e-05,
493
+ "loss": 0.3506,
494
+ "num_tokens": 567406.0,
495
+ "reward": 40.49015808105469,
496
+ "reward_std": 20.766590118408203,
497
+ "rewards/conciseness_reward_func/mean": 2.7418220043182373,
498
+ "rewards/conciseness_reward_func/std": 2.6700687408447266,
499
+ "rewards/reward_func_conciseness/mean": 2.7418220043182373,
500
+ "rewards/reward_func_conciseness/std": 2.6700687408447266,
501
+ "rewards/reward_func_sensitivity/mean": 1.0,
502
+ "rewards/reward_func_sensitivity/std": 0.0,
503
+ "step": 17
504
+ },
505
+ {
506
+ "clip_ratio/high_max": 0.0,
507
+ "clip_ratio/high_mean": 0.0,
508
+ "clip_ratio/low_mean": 0.0,
509
+ "clip_ratio/low_min": 0.0,
510
+ "clip_ratio/region_mean": 0.0,
511
+ "completions/clipped_ratio": 0.28125,
512
+ "completions/max_length": 256.0,
513
+ "completions/max_terminated_length": 208.0,
514
+ "completions/mean_length": 113.765625,
515
+ "completions/mean_terminated_length": 58.10869598388672,
516
+ "completions/min_length": 2.0,
517
+ "completions/min_terminated_length": 2.0,
518
+ "epoch": 1.280701754385965,
519
+ "grad_norm": 0.1583815962076187,
520
+ "kl": 0.0014179286517901346,
521
+ "learning_rate": 3.4000000000000007e-05,
522
+ "loss": 0.1829,
523
+ "num_tokens": 601399.0,
524
+ "reward": 38.8372802734375,
525
+ "reward_std": 11.145255088806152,
526
+ "rewards/conciseness_reward_func/mean": 2.6207382678985596,
527
+ "rewards/conciseness_reward_func/std": 2.5241081714630127,
528
+ "rewards/reward_func_conciseness/mean": 2.6207382678985596,
529
+ "rewards/reward_func_conciseness/std": 2.5241081714630127,
530
+ "rewards/reward_func_sensitivity/mean": 1.0,
531
+ "rewards/reward_func_sensitivity/std": 0.0,
532
+ "step": 18
533
+ },
534
+ {
535
+ "clip_ratio/high_max": 0.0,
536
+ "clip_ratio/high_mean": 0.0,
537
+ "clip_ratio/low_mean": 0.0,
538
+ "clip_ratio/low_min": 0.0,
539
+ "clip_ratio/region_mean": 0.0,
540
+ "completions/clipped_ratio": 0.328125,
541
+ "completions/max_length": 256.0,
542
+ "completions/max_terminated_length": 255.0,
543
+ "completions/mean_length": 133.4375,
544
+ "completions/mean_terminated_length": 73.5813980102539,
545
+ "completions/min_length": 1.0,
546
+ "completions/min_terminated_length": 1.0,
547
+ "epoch": 1.3508771929824561,
548
+ "grad_norm": 0.1763005554676056,
549
+ "kl": 0.0019502783397911116,
550
+ "learning_rate": 3.6e-05,
551
+ "loss": 0.313,
552
+ "num_tokens": 634095.0,
553
+ "reward": 41.70539855957031,
554
+ "reward_std": 19.077178955078125,
555
+ "rewards/conciseness_reward_func/mean": 2.8343520164489746,
556
+ "rewards/conciseness_reward_func/std": 3.2151780128479004,
557
+ "rewards/reward_func_conciseness/mean": 2.8343520164489746,
558
+ "rewards/reward_func_conciseness/std": 3.2151780128479004,
559
+ "rewards/reward_func_sensitivity/mean": 0.984375,
560
+ "rewards/reward_func_sensitivity/std": 0.125,
561
+ "step": 19
562
+ },
563
+ {
564
+ "clip_ratio/high_max": 0.0,
565
+ "clip_ratio/high_mean": 0.0,
566
+ "clip_ratio/low_mean": 0.0,
567
+ "clip_ratio/low_min": 0.0,
568
+ "clip_ratio/region_mean": 0.0,
569
+ "completions/clipped_ratio": 0.234375,
570
+ "completions/max_length": 256.0,
571
+ "completions/max_terminated_length": 226.0,
572
+ "completions/mean_length": 111.046875,
573
+ "completions/mean_terminated_length": 66.67346954345703,
574
+ "completions/min_length": 1.0,
575
+ "completions/min_terminated_length": 1.0,
576
+ "epoch": 1.4210526315789473,
577
+ "grad_norm": 0.26902371644973755,
578
+ "kl": 0.0015094915579538792,
579
+ "learning_rate": 3.8e-05,
580
+ "loss": 0.3254,
581
+ "num_tokens": 667534.0,
582
+ "reward": 42.7399787902832,
583
+ "reward_std": 17.279216766357422,
584
+ "rewards/conciseness_reward_func/mean": 2.9066357612609863,
585
+ "rewards/conciseness_reward_func/std": 2.9050159454345703,
586
+ "rewards/reward_func_conciseness/mean": 2.9066357612609863,
587
+ "rewards/reward_func_conciseness/std": 2.9050159454345703,
588
+ "rewards/reward_func_sensitivity/mean": 1.0,
589
+ "rewards/reward_func_sensitivity/std": 0.0,
590
+ "step": 20
591
+ },
592
+ {
593
+ "clip_ratio/high_max": 0.0,
594
+ "clip_ratio/high_mean": 0.0,
595
+ "clip_ratio/low_mean": 0.0,
596
+ "clip_ratio/low_min": 0.0,
597
+ "clip_ratio/region_mean": 0.0,
598
+ "completions/clipped_ratio": 0.375,
599
+ "completions/max_length": 256.0,
600
+ "completions/max_terminated_length": 253.0,
601
+ "completions/mean_length": 131.859375,
602
+ "completions/mean_terminated_length": 57.375,
603
+ "completions/min_length": 1.0,
604
+ "completions/min_terminated_length": 1.0,
605
+ "epoch": 1.4912280701754386,
606
+ "grad_norm": 0.23793084919452667,
607
+ "kl": 0.0021542172180488706,
608
+ "learning_rate": 4e-05,
609
+ "loss": 0.303,
610
+ "num_tokens": 701941.0,
611
+ "reward": 47.04301834106445,
612
+ "reward_std": 19.078676223754883,
613
+ "rewards/conciseness_reward_func/mean": 3.2288718223571777,
614
+ "rewards/conciseness_reward_func/std": 3.394094467163086,
615
+ "rewards/reward_func_conciseness/mean": 3.2288718223571777,
616
+ "rewards/reward_func_conciseness/std": 3.394094467163086,
617
+ "rewards/reward_func_sensitivity/mean": 0.96875,
618
+ "rewards/reward_func_sensitivity/std": 0.17536810040473938,
619
+ "step": 21
620
+ },
621
+ {
622
+ "epoch": 1.5614035087719298,
623
+ "grad_norm": 0.24694041907787323,
624
+ "learning_rate": 4.2e-05,
625
+ "loss": 0.3075,
626
+ "step": 22
627
+ },
628
+ {
629
+ "epoch": 1.5614035087719298,
630
+ "eval_clip_ratio/high_max": 0.0,
631
+ "eval_clip_ratio/high_mean": 0.0,
632
+ "eval_clip_ratio/low_mean": 0.0,
633
+ "eval_clip_ratio/low_min": 0.0,
634
+ "eval_clip_ratio/region_mean": 0.0,
635
+ "eval_completions/clipped_ratio": 0.22916666666666666,
636
+ "eval_completions/max_length": 182.66666666666666,
637
+ "eval_completions/max_terminated_length": 125.83333333333333,
638
+ "eval_completions/mean_length": 98.0625,
639
+ "eval_completions/mean_terminated_length": 64.26984278361003,
640
+ "eval_completions/min_length": 22.666666666666668,
641
+ "eval_completions/min_terminated_length": 22.666666666666668,
642
+ "eval_kl": 0.0025872511711592474,
643
+ "eval_loss": 0.3126063644886017,
644
+ "eval_num_tokens": 731929.0,
645
+ "eval_reward": 41.88053798675537,
646
+ "eval_reward_std": 19.47314504782359,
647
+ "eval_rewards/conciseness_reward_func/mean": 2.8436764081319175,
648
+ "eval_rewards/conciseness_reward_func/std": 1.952876736720403,
649
+ "eval_rewards/reward_func_conciseness/mean": 2.8436764081319175,
650
+ "eval_rewards/reward_func_conciseness/std": 1.952876736720403,
651
+ "eval_rewards/reward_func_sensitivity/mean": 1.0,
652
+ "eval_rewards/reward_func_sensitivity/std": 0.0,
653
+ "eval_runtime": 75.02,
654
+ "eval_samples_per_second": 0.16,
655
+ "eval_steps_per_second": 0.027,
656
+ "step": 22
657
+ },
658
+ {
659
+ "clip_ratio/high_max": 0.0,
660
+ "clip_ratio/high_mean": 0.0,
661
+ "clip_ratio/low_mean": 0.0,
662
+ "clip_ratio/low_min": 0.0,
663
+ "clip_ratio/region_mean": 0.0,
664
+ "completions/clipped_ratio": 0.1328125,
665
+ "completions/max_length": 256.0,
666
+ "completions/max_terminated_length": 254.5,
667
+ "completions/mean_length": 81.1796875,
668
+ "completions/mean_terminated_length": 54.69163703918457,
669
+ "completions/min_length": 1.5,
670
+ "completions/min_terminated_length": 1.5,
671
+ "epoch": 1.631578947368421,
672
+ "grad_norm": 0.21585890650749207,
673
+ "kl": 0.0031253308843588457,
674
+ "learning_rate": 4.4000000000000006e-05,
675
+ "loss": 0.383,
676
+ "num_tokens": 763456.0,
677
+ "reward": 53.27559852600098,
678
+ "reward_std": 19.545034408569336,
679
+ "rewards/conciseness_reward_func/mean": 3.680190324783325,
680
+ "rewards/conciseness_reward_func/std": 3.0578778982162476,
681
+ "rewards/reward_func_conciseness/mean": 3.680190324783325,
682
+ "rewards/reward_func_conciseness/std": 3.0578778982162476,
683
+ "rewards/reward_func_sensitivity/mean": 0.9921875,
684
+ "rewards/reward_func_sensitivity/std": 0.0625,
685
+ "step": 23
686
+ },
687
+ {
688
+ "clip_ratio/high_max": 0.0,
689
+ "clip_ratio/high_mean": 0.0,
690
+ "clip_ratio/low_mean": 0.0,
691
+ "clip_ratio/low_min": 0.0,
692
+ "clip_ratio/region_mean": 0.0,
693
+ "completions/clipped_ratio": 0.328125,
694
+ "completions/max_length": 256.0,
695
+ "completions/max_terminated_length": 210.0,
696
+ "completions/mean_length": 125.71875,
697
+ "completions/mean_terminated_length": 62.093021392822266,
698
+ "completions/min_length": 1.0,
699
+ "completions/min_terminated_length": 1.0,
700
+ "epoch": 1.7017543859649122,
701
+ "grad_norm": 0.1490524560213089,
702
+ "kl": 0.0025897307932609692,
703
+ "learning_rate": 4.600000000000001e-05,
704
+ "loss": 0.3599,
705
+ "num_tokens": 796006.0,
706
+ "reward": 38.916587829589844,
707
+ "reward_std": 23.136043548583984,
708
+ "rewards/conciseness_reward_func/mean": 2.6300535202026367,
709
+ "rewards/conciseness_reward_func/std": 2.9958267211914062,
710
+ "rewards/reward_func_conciseness/mean": 2.6300535202026367,
711
+ "rewards/reward_func_conciseness/std": 2.9958267211914062,
712
+ "rewards/reward_func_sensitivity/mean": 0.984375,
713
+ "rewards/reward_func_sensitivity/std": 0.125,
714
+ "step": 24
715
+ },
716
+ {
717
+ "clip_ratio/high_max": 0.0,
718
+ "clip_ratio/high_mean": 0.0,
719
+ "clip_ratio/low_mean": 0.0,
720
+ "clip_ratio/low_min": 0.0,
721
+ "clip_ratio/region_mean": 0.0,
722
+ "completions/clipped_ratio": 0.359375,
723
+ "completions/max_length": 256.0,
724
+ "completions/max_terminated_length": 228.0,
725
+ "completions/mean_length": 134.296875,
726
+ "completions/mean_terminated_length": 66.0243911743164,
727
+ "completions/min_length": 1.0,
728
+ "completions/min_terminated_length": 1.0,
729
+ "epoch": 1.7719298245614035,
730
+ "grad_norm": 0.25410982966423035,
731
+ "kl": 0.002877463834010996,
732
+ "learning_rate": 4.8e-05,
733
+ "loss": 0.4365,
734
+ "num_tokens": 830413.0,
735
+ "reward": 45.20397186279297,
736
+ "reward_std": 27.19314956665039,
737
+ "rewards/conciseness_reward_func/mean": 3.087139129638672,
738
+ "rewards/conciseness_reward_func/std": 3.4524173736572266,
739
+ "rewards/reward_func_conciseness/mean": 3.087139129638672,
740
+ "rewards/reward_func_conciseness/std": 3.4524173736572266,
741
+ "rewards/reward_func_sensitivity/mean": 1.0,
742
+ "rewards/reward_func_sensitivity/std": 0.0,
743
+ "step": 25
744
+ },
745
+ {
746
+ "clip_ratio/high_max": 0.0,
747
+ "clip_ratio/high_mean": 0.0,
748
+ "clip_ratio/low_mean": 0.0,
749
+ "clip_ratio/low_min": 0.0,
750
+ "clip_ratio/region_mean": 0.0,
751
+ "completions/clipped_ratio": 0.25,
752
+ "completions/max_length": 256.0,
753
+ "completions/max_terminated_length": 238.0,
754
+ "completions/mean_length": 104.46875,
755
+ "completions/mean_terminated_length": 53.958335876464844,
756
+ "completions/min_length": 1.0,
757
+ "completions/min_terminated_length": 1.0,
758
+ "epoch": 1.8421052631578947,
759
+ "grad_norm": 0.21713471412658691,
760
+ "kl": 0.003132364639895968,
761
+ "learning_rate": 5e-05,
762
+ "loss": 0.5853,
763
+ "num_tokens": 863043.0,
764
+ "reward": 45.85869598388672,
765
+ "reward_std": 20.649715423583984,
766
+ "rewards/conciseness_reward_func/mean": 3.135101795196533,
767
+ "rewards/conciseness_reward_func/std": 2.850745916366577,
768
+ "rewards/reward_func_conciseness/mean": 3.135101795196533,
769
+ "rewards/reward_func_conciseness/std": 2.850745916366577,
770
+ "rewards/reward_func_sensitivity/mean": 1.0,
771
+ "rewards/reward_func_sensitivity/std": 0.0,
772
+ "step": 26
773
+ },
774
+ {
775
+ "clip_ratio/high_max": 0.0,
776
+ "clip_ratio/high_mean": 0.0,
777
+ "clip_ratio/low_mean": 0.0,
778
+ "clip_ratio/low_min": 0.0,
779
+ "clip_ratio/region_mean": 0.0,
780
+ "completions/clipped_ratio": 0.28125,
781
+ "completions/max_length": 256.0,
782
+ "completions/max_terminated_length": 248.0,
783
+ "completions/mean_length": 124.390625,
784
+ "completions/mean_terminated_length": 72.89130401611328,
785
+ "completions/min_length": 1.0,
786
+ "completions/min_terminated_length": 1.0,
787
+ "epoch": 1.912280701754386,
788
+ "grad_norm": 0.15928152203559875,
789
+ "kl": 0.0032604307343717664,
790
+ "learning_rate": 5.2000000000000004e-05,
791
+ "loss": 0.3924,
792
+ "num_tokens": 897484.0,
793
+ "reward": 50.61138153076172,
794
+ "reward_std": 23.177719116210938,
795
+ "rewards/conciseness_reward_func/mean": 3.483266830444336,
796
+ "rewards/conciseness_reward_func/std": 3.4873857498168945,
797
+ "rewards/reward_func_conciseness/mean": 3.483266830444336,
798
+ "rewards/reward_func_conciseness/std": 3.4873857498168945,
799
+ "rewards/reward_func_sensitivity/mean": 1.0,
800
+ "rewards/reward_func_sensitivity/std": 0.0,
801
+ "step": 27
802
+ },
803
+ {
804
+ "clip_ratio/high_max": 0.0,
805
+ "clip_ratio/high_mean": 0.0,
806
+ "clip_ratio/low_mean": 0.0,
807
+ "clip_ratio/low_min": 0.0,
808
+ "clip_ratio/region_mean": 0.0,
809
+ "completions/clipped_ratio": 0.3055555555555556,
810
+ "completions/max_length": 256.0,
811
+ "completions/max_terminated_length": 252.0,
812
+ "completions/mean_length": 120.30555725097656,
813
+ "completions/mean_terminated_length": 60.599998474121094,
814
+ "completions/min_length": 1.0,
815
+ "completions/min_terminated_length": 1.0,
816
+ "epoch": 1.9824561403508771,
817
+ "grad_norm": 0.24538102746009827,
818
+ "kl": 0.0042094711534446105,
819
+ "learning_rate": 5.4000000000000005e-05,
820
+ "loss": 0.2198,
821
+ "num_tokens": 932955.0,
822
+ "reward": 50.74748992919922,
823
+ "reward_std": 21.030189514160156,
824
+ "rewards/conciseness_reward_func/mean": 3.4932374954223633,
825
+ "rewards/conciseness_reward_func/std": 2.893507480621338,
826
+ "rewards/reward_func_conciseness/mean": 3.4932374954223633,
827
+ "rewards/reward_func_conciseness/std": 2.893507480621338,
828
+ "rewards/reward_func_sensitivity/mean": 1.0,
829
+ "rewards/reward_func_sensitivity/std": 0.0,
830
+ "step": 28
831
+ },
832
+ {
833
+ "clip_ratio/high_max": 0.0,
834
+ "clip_ratio/high_mean": 0.0,
835
+ "clip_ratio/low_mean": 0.0,
836
+ "clip_ratio/low_min": 0.0,
837
+ "clip_ratio/region_mean": 0.0,
838
+ "completions/clipped_ratio": 0.296875,
839
+ "completions/max_length": 256.0,
840
+ "completions/max_terminated_length": 249.0,
841
+ "completions/mean_length": 113.8125,
842
+ "completions/mean_terminated_length": 53.77777862548828,
843
+ "completions/min_length": 1.0,
844
+ "completions/min_terminated_length": 1.0,
845
+ "epoch": 2.0701754385964914,
846
+ "grad_norm": 0.37533530592918396,
847
+ "kl": 0.013647254294482991,
848
+ "learning_rate": 5.6000000000000006e-05,
849
+ "loss": 0.5434,
850
+ "num_tokens": 966971.0,
851
+ "reward": 53.38753128051758,
852
+ "reward_std": 25.584732055664062,
853
+ "rewards/conciseness_reward_func/mean": 3.6866371631622314,
854
+ "rewards/conciseness_reward_func/std": 3.548473596572876,
855
+ "rewards/reward_func_conciseness/mean": 3.6866371631622314,
856
+ "rewards/reward_func_conciseness/std": 3.548473596572876,
857
+ "rewards/reward_func_sensitivity/mean": 1.0,
858
+ "rewards/reward_func_sensitivity/std": 0.0,
859
+ "step": 29
860
+ },
861
+ {
862
+ "clip_ratio/high_max": 0.0,
863
+ "clip_ratio/high_mean": 0.0,
864
+ "clip_ratio/low_mean": 0.0,
865
+ "clip_ratio/low_min": 0.0,
866
+ "clip_ratio/region_mean": 0.0,
867
+ "completions/clipped_ratio": 0.109375,
868
+ "completions/max_length": 256.0,
869
+ "completions/max_terminated_length": 187.0,
870
+ "completions/mean_length": 65.328125,
871
+ "completions/mean_terminated_length": 41.91228103637695,
872
+ "completions/min_length": 1.0,
873
+ "completions/min_terminated_length": 1.0,
874
+ "epoch": 2.1403508771929824,
875
+ "grad_norm": 0.2770869731903076,
876
+ "kl": 0.015037874880363233,
877
+ "learning_rate": 5.8e-05,
878
+ "loss": 0.5078,
879
+ "num_tokens": 997920.0,
880
+ "reward": 62.491519927978516,
881
+ "reward_std": 13.444650650024414,
882
+ "rewards/conciseness_reward_func/mean": 4.353562831878662,
883
+ "rewards/conciseness_reward_func/std": 3.4237794876098633,
884
+ "rewards/reward_func_conciseness/mean": 4.353562831878662,
885
+ "rewards/reward_func_conciseness/std": 3.4237794876098633,
886
+ "rewards/reward_func_sensitivity/mean": 1.0,
887
+ "rewards/reward_func_sensitivity/std": 0.0,
888
+ "step": 30
889
+ },
890
+ {
891
+ "clip_ratio/high_max": 0.0,
892
+ "clip_ratio/high_mean": 0.0,
893
+ "clip_ratio/low_mean": 0.0,
894
+ "clip_ratio/low_min": 0.0,
895
+ "clip_ratio/region_mean": 0.0,
896
+ "completions/clipped_ratio": 0.125,
897
+ "completions/max_length": 256.0,
898
+ "completions/max_terminated_length": 250.0,
899
+ "completions/mean_length": 63.609375,
900
+ "completions/mean_terminated_length": 36.125,
901
+ "completions/min_length": 1.0,
902
+ "completions/min_terminated_length": 1.0,
903
+ "epoch": 2.2105263157894735,
904
+ "grad_norm": 0.3478524088859558,
905
+ "kl": 0.027639453124720603,
906
+ "learning_rate": 6e-05,
907
+ "loss": 0.4186,
908
+ "num_tokens": 1027679.0,
909
+ "reward": 66.20655059814453,
910
+ "reward_std": 24.815969467163086,
911
+ "rewards/conciseness_reward_func/mean": 4.6257123947143555,
912
+ "rewards/conciseness_reward_func/std": 3.330706834793091,
913
+ "rewards/reward_func_conciseness/mean": 4.6257123947143555,
914
+ "rewards/reward_func_conciseness/std": 3.330706834793091,
915
+ "rewards/reward_func_sensitivity/mean": 1.0,
916
+ "rewards/reward_func_sensitivity/std": 0.0,
917
+ "step": 31
918
+ },
919
+ {
920
+ "clip_ratio/high_max": 0.0,
921
+ "clip_ratio/high_mean": 0.0,
922
+ "clip_ratio/low_mean": 0.0,
923
+ "clip_ratio/low_min": 0.0,
924
+ "clip_ratio/region_mean": 0.0,
925
+ "completions/clipped_ratio": 0.234375,
926
+ "completions/max_length": 256.0,
927
+ "completions/max_terminated_length": 254.0,
928
+ "completions/mean_length": 95.390625,
929
+ "completions/mean_terminated_length": 46.2244873046875,
930
+ "completions/min_length": 1.0,
931
+ "completions/min_terminated_length": 1.0,
932
+ "epoch": 2.280701754385965,
933
+ "grad_norm": 0.2679261565208435,
934
+ "kl": 0.009736835781950504,
935
+ "learning_rate": 6.2e-05,
936
+ "loss": 0.4007,
937
+ "num_tokens": 1059420.0,
938
+ "reward": 53.97537612915039,
939
+ "reward_std": 19.86016082763672,
940
+ "rewards/conciseness_reward_func/mean": 3.729700803756714,
941
+ "rewards/conciseness_reward_func/std": 3.0745017528533936,
942
+ "rewards/reward_func_conciseness/mean": 3.729700803756714,
943
+ "rewards/reward_func_conciseness/std": 3.0745017528533936,
944
+ "rewards/reward_func_sensitivity/mean": 1.0,
945
+ "rewards/reward_func_sensitivity/std": 0.0,
946
+ "step": 32
947
+ },
948
+ {
949
+ "clip_ratio/high_max": 0.0,
950
+ "clip_ratio/high_mean": 0.0,
951
+ "clip_ratio/low_mean": 0.0,
952
+ "clip_ratio/low_min": 0.0,
953
+ "clip_ratio/region_mean": 0.0,
954
+ "completions/clipped_ratio": 0.09375,
955
+ "completions/max_length": 256.0,
956
+ "completions/max_terminated_length": 197.0,
957
+ "completions/mean_length": 50.28125,
958
+ "completions/mean_terminated_length": 29.0,
959
+ "completions/min_length": 1.0,
960
+ "completions/min_terminated_length": 1.0,
961
+ "epoch": 2.3508771929824563,
962
+ "grad_norm": 1.2629331350326538,
963
+ "kl": 0.11070789268705994,
964
+ "learning_rate": 6.400000000000001e-05,
965
+ "loss": 0.3228,
966
+ "num_tokens": 1087966.0,
967
+ "reward": 79.61641693115234,
968
+ "reward_std": 15.074862480163574,
969
+ "rewards/conciseness_reward_func/mean": 5.608071327209473,
970
+ "rewards/conciseness_reward_func/std": 3.699957847595215,
971
+ "rewards/reward_func_conciseness/mean": 5.608071327209473,
972
+ "rewards/reward_func_conciseness/std": 3.699957847595215,
973
+ "rewards/reward_func_sensitivity/mean": 1.0,
974
+ "rewards/reward_func_sensitivity/std": 0.0,
975
+ "step": 33
976
+ },
977
+ {
978
+ "clip_ratio/high_max": 0.0,
979
+ "clip_ratio/high_mean": 0.0,
980
+ "clip_ratio/low_mean": 0.0,
981
+ "clip_ratio/low_min": 0.0,
982
+ "clip_ratio/region_mean": 0.0,
983
+ "completions/clipped_ratio": 0.125,
984
+ "completions/max_length": 256.0,
985
+ "completions/max_terminated_length": 207.0,
986
+ "completions/mean_length": 51.25,
987
+ "completions/mean_terminated_length": 22.000001907348633,
988
+ "completions/min_length": 1.0,
989
+ "completions/min_terminated_length": 1.0,
990
+ "epoch": 2.4210526315789473,
991
+ "grad_norm": 1.2056056261062622,
992
+ "kl": 0.3512770783272572,
993
+ "learning_rate": 6.6e-05,
994
+ "loss": 0.6225,
995
+ "num_tokens": 1116666.0,
996
+ "reward": 79.62600708007812,
997
+ "reward_std": 23.02800941467285,
998
+ "rewards/conciseness_reward_func/mean": 5.608773708343506,
999
+ "rewards/conciseness_reward_func/std": 3.7261159420013428,
1000
+ "rewards/reward_func_conciseness/mean": 5.608773708343506,
1001
+ "rewards/reward_func_conciseness/std": 3.7261159420013428,
1002
+ "rewards/reward_func_sensitivity/mean": 1.0,
1003
+ "rewards/reward_func_sensitivity/std": 0.0,
1004
+ "step": 34
1005
+ },
1006
+ {
1007
+ "clip_ratio/high_max": 0.0,
1008
+ "clip_ratio/high_mean": 0.0,
1009
+ "clip_ratio/low_mean": 0.0,
1010
+ "clip_ratio/low_min": 0.0,
1011
+ "clip_ratio/region_mean": 0.0,
1012
+ "completions/clipped_ratio": 0.0625,
1013
+ "completions/max_length": 256.0,
1014
+ "completions/max_terminated_length": 229.0,
1015
+ "completions/mean_length": 62.9375,
1016
+ "completions/mean_terminated_length": 50.06666946411133,
1017
+ "completions/min_length": 1.0,
1018
+ "completions/min_terminated_length": 1.0,
1019
+ "epoch": 2.4912280701754383,
1020
+ "grad_norm": 0.9496796131134033,
1021
+ "kl": 0.1972154388204217,
1022
+ "learning_rate": 6.800000000000001e-05,
1023
+ "loss": 0.5741,
1024
+ "num_tokens": 1146962.0,
1025
+ "reward": 68.77381896972656,
1026
+ "reward_std": 24.19011878967285,
1027
+ "rewards/conciseness_reward_func/mean": 4.813781261444092,
1028
+ "rewards/conciseness_reward_func/std": 3.699648857116699,
1029
+ "rewards/reward_func_conciseness/mean": 4.813781261444092,
1030
+ "rewards/reward_func_conciseness/std": 3.699648857116699,
1031
+ "rewards/reward_func_sensitivity/mean": 1.0,
1032
+ "rewards/reward_func_sensitivity/std": 0.0,
1033
+ "step": 35
1034
+ },
1035
+ {
1036
+ "clip_ratio/high_max": 0.0,
1037
+ "clip_ratio/high_mean": 0.0,
1038
+ "clip_ratio/low_mean": 0.0,
1039
+ "clip_ratio/low_min": 0.0,
1040
+ "clip_ratio/region_mean": 0.0,
1041
+ "completions/clipped_ratio": 0.140625,
1042
+ "completions/max_length": 256.0,
1043
+ "completions/max_terminated_length": 214.0,
1044
+ "completions/mean_length": 52.40625,
1045
+ "completions/mean_terminated_length": 19.09090805053711,
1046
+ "completions/min_length": 1.0,
1047
+ "completions/min_terminated_length": 1.0,
1048
+ "epoch": 2.56140350877193,
1049
+ "grad_norm": 0.6840168237686157,
1050
+ "kl": 0.8201649184338748,
1051
+ "learning_rate": 7e-05,
1052
+ "loss": 0.2966,
1053
+ "num_tokens": 1175544.0,
1054
+ "reward": 97.68138122558594,
1055
+ "reward_std": 11.356389999389648,
1056
+ "rewards/conciseness_reward_func/mean": 6.931445121765137,
1057
+ "rewards/conciseness_reward_func/std": 3.4826667308807373,
1058
+ "rewards/reward_func_conciseness/mean": 6.931445121765137,
1059
+ "rewards/reward_func_conciseness/std": 3.4826667308807373,
1060
+ "rewards/reward_func_sensitivity/mean": 1.0,
1061
+ "rewards/reward_func_sensitivity/std": 0.0,
1062
+ "step": 36
1063
+ },
1064
+ {
1065
+ "clip_ratio/high_max": 0.0,
1066
+ "clip_ratio/high_mean": 0.0,
1067
+ "clip_ratio/low_mean": 0.0,
1068
+ "clip_ratio/low_min": 0.0,
1069
+ "clip_ratio/region_mean": 0.0,
1070
+ "completions/clipped_ratio": 0.078125,
1071
+ "completions/max_length": 256.0,
1072
+ "completions/max_terminated_length": 191.0,
1073
+ "completions/mean_length": 30.625,
1074
+ "completions/mean_terminated_length": 11.525424003601074,
1075
+ "completions/min_length": 1.0,
1076
+ "completions/min_terminated_length": 1.0,
1077
+ "epoch": 2.6315789473684212,
1078
+ "grad_norm": 0.7667601108551025,
1079
+ "kl": 0.38818224624264985,
1080
+ "learning_rate": 7.2e-05,
1081
+ "loss": 0.4061,
1082
+ "num_tokens": 1203020.0,
1083
+ "reward": 98.3485336303711,
1084
+ "reward_std": 15.768867492675781,
1085
+ "rewards/conciseness_reward_func/mean": 6.980318069458008,
1086
+ "rewards/conciseness_reward_func/std": 3.055245876312256,
1087
+ "rewards/reward_func_conciseness/mean": 6.980318069458008,
1088
+ "rewards/reward_func_conciseness/std": 3.055245876312256,
1089
+ "rewards/reward_func_sensitivity/mean": 1.0,
1090
+ "rewards/reward_func_sensitivity/std": 0.0,
1091
+ "step": 37
1092
+ },
1093
+ {
1094
+ "clip_ratio/high_max": 0.0,
1095
+ "clip_ratio/high_mean": 0.0,
1096
+ "clip_ratio/low_mean": 0.0,
1097
+ "clip_ratio/low_min": 0.0,
1098
+ "clip_ratio/region_mean": 0.0,
1099
+ "completions/clipped_ratio": 0.0625,
1100
+ "completions/max_length": 256.0,
1101
+ "completions/max_terminated_length": 194.0,
1102
+ "completions/mean_length": 27.609375,
1103
+ "completions/mean_terminated_length": 12.383334159851074,
1104
+ "completions/min_length": 1.0,
1105
+ "completions/min_terminated_length": 1.0,
1106
+ "epoch": 2.7017543859649122,
1107
+ "grad_norm": 0.598527193069458,
1108
+ "kl": 0.11694249129504897,
1109
+ "learning_rate": 7.4e-05,
1110
+ "loss": 0.211,
1111
+ "num_tokens": 1230875.0,
1112
+ "reward": 94.99549102783203,
1113
+ "reward_std": 7.114702224731445,
1114
+ "rewards/conciseness_reward_func/mean": 6.734685897827148,
1115
+ "rewards/conciseness_reward_func/std": 3.04093337059021,
1116
+ "rewards/reward_func_conciseness/mean": 6.734685897827148,
1117
+ "rewards/reward_func_conciseness/std": 3.04093337059021,
1118
+ "rewards/reward_func_sensitivity/mean": 1.0,
1119
+ "rewards/reward_func_sensitivity/std": 0.0,
1120
+ "step": 38
1121
+ },
1122
+ {
1123
+ "clip_ratio/high_max": 0.0,
1124
+ "clip_ratio/high_mean": 0.0,
1125
+ "clip_ratio/low_mean": 0.0,
1126
+ "clip_ratio/low_min": 0.0,
1127
+ "clip_ratio/region_mean": 0.0,
1128
+ "completions/clipped_ratio": 0.046875,
1129
+ "completions/max_length": 256.0,
1130
+ "completions/max_terminated_length": 243.0,
1131
+ "completions/mean_length": 33.859375,
1132
+ "completions/mean_terminated_length": 22.934425354003906,
1133
+ "completions/min_length": 1.0,
1134
+ "completions/min_terminated_length": 1.0,
1135
+ "epoch": 2.7719298245614032,
1136
+ "grad_norm": 3.7495362758636475,
1137
+ "kl": 0.6200877516530454,
1138
+ "learning_rate": 7.6e-05,
1139
+ "loss": 0.3951,
1140
+ "num_tokens": 1258154.0,
1141
+ "reward": 82.8931884765625,
1142
+ "reward_std": 20.4693603515625,
1143
+ "rewards/conciseness_reward_func/mean": 5.848114967346191,
1144
+ "rewards/conciseness_reward_func/std": 3.0996296405792236,
1145
+ "rewards/reward_func_conciseness/mean": 5.848114967346191,
1146
+ "rewards/reward_func_conciseness/std": 3.0996296405792236,
1147
+ "rewards/reward_func_sensitivity/mean": 1.0,
1148
+ "rewards/reward_func_sensitivity/std": 0.0,
1149
+ "step": 39
1150
+ },
1151
+ {
1152
+ "clip_ratio/high_max": 0.0,
1153
+ "clip_ratio/high_mean": 0.0,
1154
+ "clip_ratio/low_mean": 0.0,
1155
+ "clip_ratio/low_min": 0.0,
1156
+ "clip_ratio/region_mean": 0.0,
1157
+ "completions/clipped_ratio": 0.0625,
1158
+ "completions/max_length": 256.0,
1159
+ "completions/max_terminated_length": 105.0,
1160
+ "completions/mean_length": 26.59375,
1161
+ "completions/mean_terminated_length": 11.300000190734863,
1162
+ "completions/min_length": 1.0,
1163
+ "completions/min_terminated_length": 1.0,
1164
+ "epoch": 2.8421052631578947,
1165
+ "grad_norm": 1.9763565063476562,
1166
+ "kl": 1.0024023910518736,
1167
+ "learning_rate": 7.800000000000001e-05,
1168
+ "loss": 0.3104,
1169
+ "num_tokens": 1284792.0,
1170
+ "reward": 98.37100219726562,
1171
+ "reward_std": 7.518170356750488,
1172
+ "rewards/conciseness_reward_func/mean": 6.981963157653809,
1173
+ "rewards/conciseness_reward_func/std": 3.2990918159484863,
1174
+ "rewards/reward_func_conciseness/mean": 6.981963157653809,
1175
+ "rewards/reward_func_conciseness/std": 3.2990918159484863,
1176
+ "rewards/reward_func_sensitivity/mean": 1.0,
1177
+ "rewards/reward_func_sensitivity/std": 0.0,
1178
+ "step": 40
1179
+ },
1180
+ {
1181
+ "clip_ratio/high_max": 0.0,
1182
+ "clip_ratio/high_mean": 0.0,
1183
+ "clip_ratio/low_mean": 0.0,
1184
+ "clip_ratio/low_min": 0.0,
1185
+ "clip_ratio/region_mean": 0.0,
1186
+ "completions/clipped_ratio": 0.0625,
1187
+ "completions/max_length": 256.0,
1188
+ "completions/max_terminated_length": 209.0,
1189
+ "completions/mean_length": 47.609375,
1190
+ "completions/mean_terminated_length": 33.71666717529297,
1191
+ "completions/min_length": 1.0,
1192
+ "completions/min_terminated_length": 1.0,
1193
+ "epoch": 2.912280701754386,
1194
+ "grad_norm": 0.5956349968910217,
1195
+ "kl": 0.2420638755429536,
1196
+ "learning_rate": 8e-05,
1197
+ "loss": 0.568,
1198
+ "num_tokens": 1313747.0,
1199
+ "reward": 79.15403747558594,
1200
+ "reward_std": 22.53786277770996,
1201
+ "rewards/conciseness_reward_func/mean": 5.5741987228393555,
1202
+ "rewards/conciseness_reward_func/std": 3.646087884902954,
1203
+ "rewards/reward_func_conciseness/mean": 5.5741987228393555,
1204
+ "rewards/reward_func_conciseness/std": 3.646087884902954,
1205
+ "rewards/reward_func_sensitivity/mean": 1.0,
1206
+ "rewards/reward_func_sensitivity/std": 0.0,
1207
+ "step": 41
1208
+ },
1209
+ {
1210
+ "clip_ratio/high_max": 0.0,
1211
+ "clip_ratio/high_mean": 0.0,
1212
+ "clip_ratio/low_mean": 0.0,
1213
+ "clip_ratio/low_min": 0.0,
1214
+ "clip_ratio/region_mean": 0.0,
1215
+ "completions/clipped_ratio": 0.02777777777777779,
1216
+ "completions/max_length": 256.0,
1217
+ "completions/max_terminated_length": 124.0,
1218
+ "completions/mean_length": 21.25,
1219
+ "completions/mean_terminated_length": 14.54285717010498,
1220
+ "completions/min_length": 1.0,
1221
+ "completions/min_terminated_length": 1.0,
1222
+ "epoch": 2.982456140350877,
1223
+ "grad_norm": 2.653303623199463,
1224
+ "kl": 1.1713137086480856,
1225
+ "learning_rate": 8.2e-05,
1226
+ "loss": 0.5747,
1227
+ "num_tokens": 1341557.0,
1228
+ "reward": 117.25819396972656,
1229
+ "reward_std": 18.805095672607422,
1230
+ "rewards/conciseness_reward_func/mean": 8.365571022033691,
1231
+ "rewards/conciseness_reward_func/std": 2.659184217453003,
1232
+ "rewards/reward_func_conciseness/mean": 8.365571022033691,
1233
+ "rewards/reward_func_conciseness/std": 2.659184217453003,
1234
+ "rewards/reward_func_sensitivity/mean": 1.0,
1235
+ "rewards/reward_func_sensitivity/std": 0.0,
1236
+ "step": 42
1237
+ }
1238
+ ],
1239
+ "logging_steps": 1,
1240
+ "max_steps": 253,
1241
+ "num_input_tokens_seen": 1341557,
1242
+ "num_train_epochs": 19,
1243
+ "save_steps": 42,
1244
+ "stateful_callbacks": {
1245
+ "TrainerControl": {
1246
+ "args": {
1247
+ "should_epoch_stop": false,
1248
+ "should_evaluate": false,
1249
+ "should_log": false,
1250
+ "should_save": true,
1251
+ "should_training_stop": false
1252
+ },
1253
+ "attributes": {}
1254
+ }
1255
+ },
1256
+ "total_flos": 0.0,
1257
+ "train_batch_size": 8,
1258
+ "trial_name": null,
1259
+ "trial_params": null
1260
+ }
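
For reference, the JSON block above appears to be the checkpoint's trainer state file (the Trainer's `log_history` entries plus bookkeeping fields such as `max_steps` and `save_steps`). A minimal sketch, assuming the file lands at `last-checkpoint/trainer_state.json` like the other checkpoint files in this commit, for reading the logged reward and KL values back out of it:

```python
import json

# Assumption: the JSON object shown above is last-checkpoint/trainer_state.json,
# the state file that accompanies the adapter weights in this checkpoint.
with open("last-checkpoint/trainer_state.json") as f:
    state = json.load(f)

# Each logged optimizer step contributes one dict to log_history.
for entry in state["log_history"]:
    if "reward" in entry:
        print(
            f"step {entry['step']:>3}  "
            f"reward {entry['reward']:8.3f}  "
            f"kl {entry.get('kl', float('nan')):.4f}"
        )
```

The `reward` field is the aggregate reward per logged step; the per-function means and standard deviations (the `rewards/.../mean` and `rewards/.../std` keys above) can be read from the same entries.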
last-checkpoint/training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6396714d15c15097a094becd8e78ff96da9dbeee0fd713f901b3ca04fcee1d1
3
+ size 7864
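
`last-checkpoint/training_args.bin` is uploaded as a Git LFS pointer (the three lines above are the pointer text, not the file itself). A minimal sketch for inspecting it, assuming it is the serialized `TrainingArguments` object that `transformers.Trainer` writes into every checkpoint and that `transformers` is installed so the object can be unpickled:

```python
import torch

# Assumption: training_args.bin is a pickled TrainingArguments instance.
# Fetch the real blob first with `git lfs pull`; the pointer alone will not load.
# weights_only=False requires a recent PyTorch (the argument exists since 1.13).
args = torch.load("last-checkpoint/training_args.bin", weights_only=False)
print(args.learning_rate, args.per_device_train_batch_size)
```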
last-checkpoint/vocab.json ADDED
The diff for this file is too large to render. See raw diff