Commit 3e76668 (verified) by jvanhoof · 1 Parent(s): fffe238

Upload folder using huggingface_hub

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
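
The config above enables only `pooling_mode_mean_tokens`, so the sentence embedding is the average of the token embeddings. Below is a minimal pure-Python sketch of masked mean pooling, assuming padding positions are excluded through an attention mask. The function name `mean_pool` and the toy 3-dimensional vectors are illustrative only and not part of the library; the real model works with 384 dimensions.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average token vectors, ignoring positions where the mask is 0 (padding)."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                summed[i] += value
    # Guard against an all-padding input to avoid division by zero.
    count = max(count, 1)
    return [s / count for s in summed]


# Toy example: three token vectors, the last one is padding and is skipped.
tokens = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0], [9.0, 9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)
```

Only the two unmasked vectors contribute to the average, which is why the padding row does not shift the result.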
README.md ADDED
@@ -0,0 +1,713 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:18240762
8
+ - loss:MSELoss
9
+ base_model: sentence-transformers/all-MiniLM-L6-v2
10
+ widget:
11
+ - source_sentence: Yeah, fire in the park, let's go!
12
+ sentences:
13
+ - 午前2時頃に音楽が止まり、それから熟睡。
14
+ - 彼はニンジンが好きではないので、食べなかった。
15
+ - 公園のライトアップ、ぜひ行こうね!
16
+ - source_sentence: Population is around 5.7 million people.
17
+ sentences:
18
+ - 人口は約570万人です。
19
+ - カンドンベの音楽はcuerdaと呼ばれるドラマーのグループによって演奏される。
20
+ - 'シノプシス: 2116年—日本政府はシビルシステムの無人ドローンロボットを問題のある国に輸出し始め、システムは世界中に広がっています。'
21
+ - source_sentence: With EMUI 5.0, the Huawei Mate 9 becomes more intelligent and efficient
22
+ over time by understanding consumers’ behaviour patterns and ensures the highest
23
+ priority applications are given preference subject to system resources.
24
+ sentences:
25
+ - 私も今はクルマを持っていません。
26
+ - ガジュマルの樹を見に行きたいです。
27
+ - EMUI5.0では、『HUAWEI Mate 9』が消費者の行動パターンを理解し、時間をかけて知能と効率を上げ、優先順位の最も高いアプリをシステム消費源の対象に優先される事を保証します。
28
+ - source_sentence: What are the differences between the environments and geographical
29
+ positions of the East and the West?
30
+ sentences:
31
+ - 環境と地理的位置に関して、東洋と西洋の相違点は何であろうか。
32
+ - その ​ ほか ​ に , “心霊 ​ 手術 ​ 師 ” が ​ おり , この ​ 人 ​ たち ​ は“ 心霊 ​ 手術 ” なる ​ もの ​ を ​
33
+ 行ない ​ ます。
34
+ - Numpy を import できない。
35
+ - source_sentence: Jesus Christ did surrender his life for the “sheep. ”
36
+ sentences:
37
+ - フィリポは読んでいる事柄が分かりますかと尋ねた。
38
+ - イエス ​ ・ ​ キリスト ​ は ​ ご自分 ​ の ​ 命 ​ を「羊」の ​ ため ​ に ​ 捨て ​ まし ​ た。
39
+ - 彼はこの金を中央政府には渡そうとしない。
40
+ pipeline_tag: sentence-similarity
41
+ library_name: sentence-transformers
42
+ metrics:
43
+ - pearson_cosine
44
+ - spearman_cosine
45
+ model-index:
46
+ - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
47
+ results:
48
+ - task:
49
+ type: semantic-similarity
50
+ name: Semantic Similarity
51
+ dataset:
52
+ name: stsb multi mt en
53
+ type: stsb_multi_mt-en
54
+ metrics:
55
+ - type: pearson_cosine
56
+ value: 0.7988037559289333
57
+ name: Pearson Cosine
58
+ - type: spearman_cosine
59
+ value: 0.8009711557760016
60
+ name: Spearman Cosine
61
+ - task:
62
+ type: semantic-similarity
63
+ name: Semantic Similarity
64
+ dataset:
65
+ name: JSTS
66
+ type: JSTS
67
+ metrics:
68
+ - type: pearson_cosine
69
+ value: 0.8622404113206219
70
+ name: Pearson Cosine
71
+ - type: spearman_cosine
72
+ value: 0.8142666349859583
73
+ name: Spearman Cosine
74
+ ---
75
+
76
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L6-v2
77
+
78
+ This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
79
+
80
+ ## Model Details
81
+
82
+ ### Model Description
83
+ - **Model Type:** Sentence Transformer
84
+ - **Base model:** [sentence-transformers/all-MiniLM-L6-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2) <!-- at revision c9745ed1d9f207416be6d2e6f8de32d1f16199bf -->
85
+ - **Maximum Sequence Length:** 128 tokens
86
+ - **Output Dimensionality:** 384 dimensions
87
+ - **Similarity Function:** Cosine Similarity
88
+ <!-- - **Training Dataset:** Unknown -->
89
+ <!-- - **Language:** Unknown -->
90
+ <!-- - **License:** Unknown -->
91
+
92
+ ### Model Sources
93
+
94
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
95
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
96
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
97
+
98
+ ### Full Model Architecture
99
+
100
+ ```
101
+ SentenceTransformer(
102
+ (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
103
+ (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
104
+ (2): Normalize()
105
+ )
106
+ ```
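
Because the last module is `Normalize()`, every output embedding has unit L2 norm, so cosine similarity between two embeddings reduces to a plain dot product. The sketch below illustrates this with hypothetical helper names (`l2_normalize`, `dot`, `cosine`) and toy 2-dimensional vectors; it is not the library's implementation.

```python
import math
from typing import List


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector] if norm > 0 else list(vector)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity computed from unnormalized vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [3.0, 4.0]
b = [4.0, 3.0]
a_unit, b_unit = l2_normalize(a), l2_normalize(b)

# After normalization the dot product matches the cosine similarity.
print(cosine(a, b), dot(a_unit, b_unit))
```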
107
+
108
+ ## Usage
109
+
110
+ ### Direct Usage (Sentence Transformers)
111
+
112
+ First install the Sentence Transformers library:
113
+
114
+ ```bash
115
+ pip install -U sentence-transformers
116
+ ```
117
+
118
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub ("sentence_transformers_model_id" is a placeholder for this model's repo ID)
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference on one English sentence and two Japanese sentences
+ sentences = [
+     'Jesus Christ did surrender his life for the “sheep. ”',
+     'イエス \u200b ・ \u200b キリスト \u200b は \u200b ご自分 \u200b の \u200b 命 \u200b を「羊」の \u200b ため \u200b に \u200b 捨て \u200b まし \u200b た。',
+     '彼はこの金を中央政府には渡そうとしない。',
+ ]
+ embeddings = model.encode(sentences)  # NumPy array by default
+ print(embeddings.shape)
+ # (3, 384)
+
+ # Get the pairwise similarity scores for the embeddings (returned as a torch tensor)
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
140
+ <!--
141
+ ### Direct Usage (Transformers)
142
+
143
+ <details><summary>Click to see the direct usage in Transformers</summary>
144
+
145
+ </details>
146
+ -->
147
+
148
+ <!--
149
+ ### Downstream Usage (Sentence Transformers)
150
+
151
+ You can finetune this model on your own dataset.
152
+
153
+ <details><summary>Click to expand</summary>
154
+
155
+ </details>
156
+ -->
157
+
158
+ <!--
159
+ ### Out-of-Scope Use
160
+
161
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
162
+ -->
163
+
164
+ ## Evaluation
165
+
166
+ ### Metrics
167
+
168
+ #### Semantic Similarity
169
+
170
+ * Datasets: `stsb_multi_mt-en` and `JSTS`
171
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
172
+
+ | Metric              | stsb_multi_mt-en | JSTS       |
+ |:--------------------|:-----------------|:-----------|
+ | pearson_cosine      | 0.7988           | 0.8622     |
+ | **spearman_cosine** | **0.8010**       | **0.8143** |
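
For reference, `pearson_cosine` and `spearman_cosine` are correlations between the model's cosine similarities and the gold similarity labels. The pure-Python sketch below shows both. The helper names and the toy scores are made up for illustration, and the simple rank function assumes no tied values, whereas the actual evaluator relies on SciPy, which handles ties.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; assumes all values are distinct (no tie handling)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = float(rank)
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold labels (0-5 scale) and model cosine similarities.
gold = [0.0, 1.0, 2.5, 4.0, 5.0]
predicted = [0.10, 0.20, 0.50, 0.60, 0.95]
print(pearson(gold, predicted), spearman(gold, predicted))
```

Spearman only looks at the ordering of the pairs, so it can be perfect even when the raw scores are not linearly related to the labels, which is one reason it is the bolded headline metric in the table.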
177
+
178
+ <!--
179
+ ## Bias, Risks and Limitations
180
+
181
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
182
+ -->
183
+
184
+ <!--
185
+ ### Recommendations
186
+
187
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
188
+ -->
189
+
190
+ ## Training Details
191
+
192
+ ### Training Dataset
193
+
194
+ #### Unnamed Dataset
195
+
196
+
197
+ * Size: 18,240,762 training samples
198
+ * Columns: <code>english</code>, <code>non_english</code>, and <code>label</code>
199
+ * Approximate statistics based on the first 1000 samples:
200
+ | | english | non_english | label |
201
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-------------------------------------|
202
+ | type | string | string | list |
203
+ | details | <ul><li>min: 4 tokens</li><li>mean: 15.99 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 21.59 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 384 elements</li></ul> |
204
+ * Samples:
205
+ | english | non_english | label |
206
+ |:-----------------------------------------------------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------|
207
+ | <code>Slow to Mars?</code> | <code>火星しばり?</code> | <code>[-0.1292940022648608, -0.1167307527589221, -0.008499974779641976, 0.04317784529767997, -0.06141806471633044, ...]</code> |
208
+ | <code>Sunset is nearly there.</code> | <code>サンクスはすぐそこだし。</code> | <code>[-0.1347740689698337, 0.053288680755846106, 0.014359346388162629, 0.0157641416547634, 0.0900218121125077, ...]</code> |
209
+ | <code>Why were these Christians put to death?</code> | <code>ハンガリー ​ の ​ 新聞「バシュ ​ ・ ​ ナーペ」は ​ 次 ​ の ​ よう ​ に ​ 説明 ​ し ​ て ​ い ​ ます。「</code> | <code>[0.09746742956653999, -0.006846877375759926, -0.03973075126221857, 0.024986338940603363, -0.021140928354124164, ...]</code> |
210
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
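
The `label` column holds a 384-element target vector, and `MSELoss` penalises the squared distance between the model's embedding of a sentence and that target. Following the knowledge-distillation recipe cited for this loss, the sketch below assumes the target is a teacher embedding of the English sentence and that both the English and the non-English text are pulled toward it. The function names, the toy 3-dimensional vectors, and the averaging of the two terms are illustrative assumptions; the library's exact reduction may differ.

```python
from typing import List


def mse(prediction: List[float], target: List[float]) -> float:
    """Mean squared error between two vectors of equal length."""
    return sum((p - t) ** 2 for p, t in zip(prediction, target)) / len(target)


def distillation_loss(
    student_english: List[float],
    student_non_english: List[float],
    teacher_label: List[float],
) -> float:
    # Both student embeddings are compared against the same target vector,
    # and the two errors are averaged (an assumption of this sketch).
    return (mse(student_english, teacher_label) + mse(student_non_english, teacher_label)) / 2


teacher_label = [0.5, -0.5, 0.0]          # stands in for the 384-element label
student_english = [0.5, -0.5, 0.0]        # already matches the target
student_non_english = [0.0, -0.5, 0.5]    # still far from the target
loss = distillation_loss(student_english, student_non_english, teacher_label)
print(loss)
```

Driving this loss toward zero places a sentence and its translation at the same point in the embedding space, which is what makes cross-lingual similarity comparisons possible.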
211
+
212
+ ### Evaluation Dataset
213
+
214
+ #### Unnamed Dataset
215
+
216
+
217
+ * Size: 184,251 evaluation samples
218
+ * Columns: <code>english</code>, <code>non_english</code>, and <code>label</code>
219
+ * Approximate statistics based on the first 1000 samples:
220
+ | | english | non_english | label |
221
+ |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-------------------------------------|
222
+ | type | string | string | list |
223
+ | details | <ul><li>min: 4 tokens</li><li>mean: 16.16 tokens</li><li>max: 116 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 21.65 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>size: 384 elements</li></ul> |
224
+ * Samples:
225
+ | english | non_english | label |
226
+ |:----------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------------------------------------------------|
227
+ | <code>Back from donating?</code> | <code>ドーナツ回?</code> | <code>[-0.14056862827741115, -0.09391276023432168, 0.011405737148041988, 0.012085375305688852, -0.056379213184557624, ...]</code> |
228
+ | <code>134)Textbooks were also in short supply.</code> | <code>3)荷物の引き渡しも短時間にテキパキとされていました。</code> | <code>[0.04401202896633807, 0.07403046630916377, 0.11568493170920714, 0.047522982370575784, 0.1009405093401555, ...]</code> |
229
+ | <code>The COG investigators started the trial by providing dosages of crizotinib to their patients that were lower than those used in adults with NSCLC.</code> | <code>COG試験責任医師らは、NSCLCの成人患者で使用されている投与量より少ない量のcrizotinibを小児患者に提供することで試験を開始した。</code> | <code>[0.21476626448171793, -0.04704800523318936, 0.061019190603563075, 0.027317017405848458, -0.03788587912458321, ...]</code> |
230
+ * Loss: [<code>MSELoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#mseloss)
231
+
232
+ ### Training Hyperparameters
233
+ #### Non-Default Hyperparameters
234
+
235
+ - `eval_strategy`: steps
236
+ - `per_device_train_batch_size`: 512
237
+ - `per_device_eval_batch_size`: 512
238
+ - `gradient_accumulation_steps`: 2
239
+ - `learning_rate`: 0.0003
240
+ - `num_train_epochs`: 8
241
+ - `warmup_ratio`: 0.15
242
+ - `bf16`: True
243
+ - `dataloader_num_workers`: 8
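
These settings imply an effective batch of 512 × 2 = 1,024 samples per optimizer step on each device. Assuming a single device (not stated in the card, though consistent with step 500 being logged at epoch 0.0281 in the training logs), the schedule works out roughly as sketched below. The Trainer's own rounding may differ by a step or two.

```python
import math

train_samples = 18_240_762           # from the Training Dataset section
per_device_train_batch_size = 512
gradient_accumulation_steps = 2
num_train_epochs = 8
warmup_ratio = 0.15
num_devices = 1                      # assumption: not stated in the card

# Samples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices
# Approximate optimizer steps per epoch and over the whole run.
steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * num_train_epochs
# Steps spent warming the learning rate up toward its peak of 0.0003.
warmup_steps = math.ceil(total_steps * warmup_ratio)

print(effective_batch, steps_per_epoch, total_steps, warmup_steps)
```

Under these assumptions the run has about 142.5k optimizer steps, which lines up with the last logged step (142500) just before epoch 8.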
244
+
245
+ #### All Hyperparameters
246
+ <details><summary>Click to expand</summary>
247
+
248
+ - `overwrite_output_dir`: False
249
+ - `do_predict`: False
250
+ - `eval_strategy`: steps
251
+ - `prediction_loss_only`: True
252
+ - `per_device_train_batch_size`: 512
253
+ - `per_device_eval_batch_size`: 512
254
+ - `per_gpu_train_batch_size`: None
255
+ - `per_gpu_eval_batch_size`: None
256
+ - `gradient_accumulation_steps`: 2
257
+ - `eval_accumulation_steps`: None
258
+ - `torch_empty_cache_steps`: None
259
+ - `learning_rate`: 0.0003
260
+ - `weight_decay`: 0.0
261
+ - `adam_beta1`: 0.9
262
+ - `adam_beta2`: 0.999
263
+ - `adam_epsilon`: 1e-08
264
+ - `max_grad_norm`: 1.0
265
+ - `num_train_epochs`: 8
266
+ - `max_steps`: -1
267
+ - `lr_scheduler_type`: linear
268
+ - `lr_scheduler_kwargs`: {}
269
+ - `warmup_ratio`: 0.15
270
+ - `warmup_steps`: 0
271
+ - `log_level`: passive
272
+ - `log_level_replica`: warning
273
+ - `log_on_each_node`: True
274
+ - `logging_nan_inf_filter`: True
275
+ - `save_safetensors`: True
276
+ - `save_on_each_node`: False
277
+ - `save_only_model`: False
278
+ - `restore_callback_states_from_checkpoint`: False
279
+ - `no_cuda`: False
280
+ - `use_cpu`: False
281
+ - `use_mps_device`: False
282
+ - `seed`: 42
283
+ - `data_seed`: None
284
+ - `jit_mode_eval`: False
285
+ - `use_ipex`: False
286
+ - `bf16`: True
287
+ - `fp16`: False
288
+ - `fp16_opt_level`: O1
289
+ - `half_precision_backend`: auto
290
+ - `bf16_full_eval`: False
291
+ - `fp16_full_eval`: False
292
+ - `tf32`: None
293
+ - `local_rank`: 0
294
+ - `ddp_backend`: None
295
+ - `tpu_num_cores`: None
296
+ - `tpu_metrics_debug`: False
297
+ - `debug`: []
298
+ - `dataloader_drop_last`: False
299
+ - `dataloader_num_workers`: 8
300
+ - `dataloader_prefetch_factor`: None
301
+ - `past_index`: -1
302
+ - `disable_tqdm`: False
303
+ - `remove_unused_columns`: True
304
+ - `label_names`: None
305
+ - `load_best_model_at_end`: False
306
+ - `ignore_data_skip`: False
307
+ - `fsdp`: []
308
+ - `fsdp_min_num_params`: 0
309
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
310
+ - `tp_size`: 0
311
+ - `fsdp_transformer_layer_cls_to_wrap`: None
312
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
313
+ - `deepspeed`: None
314
+ - `label_smoothing_factor`: 0.0
315
+ - `optim`: adamw_torch
316
+ - `optim_args`: None
317
+ - `adafactor`: False
318
+ - `group_by_length`: False
319
+ - `length_column_name`: length
320
+ - `ddp_find_unused_parameters`: None
321
+ - `ddp_bucket_cap_mb`: None
322
+ - `ddp_broadcast_buffers`: False
323
+ - `dataloader_pin_memory`: True
324
+ - `dataloader_persistent_workers`: False
325
+ - `skip_memory_metrics`: True
326
+ - `use_legacy_prediction_loop`: False
327
+ - `push_to_hub`: False
328
+ - `resume_from_checkpoint`: None
329
+ - `hub_model_id`: None
330
+ - `hub_strategy`: every_save
331
+ - `hub_private_repo`: None
332
+ - `hub_always_push`: False
333
+ - `gradient_checkpointing`: False
334
+ - `gradient_checkpointing_kwargs`: None
335
+ - `include_inputs_for_metrics`: False
336
+ - `include_for_metrics`: []
337
+ - `eval_do_concat_batches`: True
338
+ - `fp16_backend`: auto
339
+ - `push_to_hub_model_id`: None
340
+ - `push_to_hub_organization`: None
341
+ - `mp_parameters`:
342
+ - `auto_find_batch_size`: False
343
+ - `full_determinism`: False
344
+ - `torchdynamo`: None
345
+ - `ray_scope`: last
346
+ - `ddp_timeout`: 1800
347
+ - `torch_compile`: False
348
+ - `torch_compile_backend`: None
349
+ - `torch_compile_mode`: None
350
+ - `include_tokens_per_second`: False
351
+ - `include_num_input_tokens_seen`: False
352
+ - `neftune_noise_alpha`: None
353
+ - `optim_target_modules`: None
354
+ - `batch_eval_metrics`: False
355
+ - `eval_on_start`: False
356
+ - `use_liger_kernel`: False
357
+ - `eval_use_gather_object`: False
358
+ - `average_tokens_across_devices`: False
359
+ - `prompts`: None
360
+ - `batch_sampler`: batch_sampler
361
+ - `multi_dataset_batch_sampler`: proportional
362
+
363
+ </details>
364
+
365
+ ### Training Logs
366
+ <details><summary>Click to expand</summary>
367
+
368
+ | Epoch | Step | Training Loss | Validation Loss | stsb_multi_mt-en_spearman_cosine | JSTS_spearman_cosine |
369
+ |:------:|:------:|:-------------:|:---------------:|:--------------------------------:|:--------------------:|
370
+ | 0.0281 | 500 | 0.0057 | - | - | - |
371
+ | 0.0561 | 1000 | 0.005 | - | - | - |
372
+ | 0.0842 | 1500 | 0.0047 | - | - | - |
373
+ | 0.1123 | 2000 | 0.0045 | 0.0022 | 0.2757 | 0.2805 |
374
+ | 0.1403 | 2500 | 0.0043 | - | - | - |
375
+ | 0.1684 | 3000 | 0.0042 | - | - | - |
376
+ | 0.1965 | 3500 | 0.004 | - | - | - |
377
+ | 0.2245 | 4000 | 0.0039 | 0.0019 | 0.4951 | 0.5122 |
378
+ | 0.2526 | 4500 | 0.0037 | - | - | - |
379
+ | 0.2807 | 5000 | 0.0036 | - | - | - |
380
+ | 0.3088 | 5500 | 0.0035 | - | - | - |
381
+ | 0.3368 | 6000 | 0.0034 | 0.0016 | 0.6060 | 0.6544 |
382
+ | 0.3649 | 6500 | 0.0033 | - | - | - |
383
+ | 0.3930 | 7000 | 0.0032 | - | - | - |
384
+ | 0.4210 | 7500 | 0.0032 | - | - | - |
385
+ | 0.4491 | 8000 | 0.0031 | 0.0015 | 0.6802 | 0.7234 |
386
+ | 0.4772 | 8500 | 0.003 | - | - | - |
387
+ | 0.5052 | 9000 | 0.003 | - | - | - |
388
+ | 0.5333 | 9500 | 0.003 | - | - | - |
389
+ | 0.5614 | 10000 | 0.0029 | 0.0014 | 0.7144 | 0.7537 |
390
+ | 0.5894 | 10500 | 0.0029 | - | - | - |
391
+ | 0.6175 | 11000 | 0.0029 | - | - | - |
392
+ | 0.6456 | 11500 | 0.0028 | - | - | - |
393
+ | 0.6736 | 12000 | 0.0028 | 0.0014 | 0.7260 | 0.7691 |
394
+ | 0.7017 | 12500 | 0.0028 | - | - | - |
395
+ | 0.7298 | 13000 | 0.0028 | - | - | - |
396
+ | 0.7579 | 13500 | 0.0027 | - | - | - |
397
+ | 0.7859 | 14000 | 0.0027 | 0.0013 | 0.7396 | 0.7751 |
398
+ | 0.8140 | 14500 | 0.0027 | - | - | - |
399
+ | 0.8421 | 15000 | 0.0027 | - | - | - |
400
+ | 0.8701 | 15500 | 0.0027 | - | - | - |
401
+ | 0.8982 | 16000 | 0.0027 | 0.0013 | 0.7499 | 0.7793 |
402
+ | 0.9263 | 16500 | 0.0027 | - | - | - |
403
+ | 0.9543 | 17000 | 0.0027 | - | - | - |
404
+ | 0.9824 | 17500 | 0.0026 | - | - | - |
405
+ | 1.0104 | 18000 | 0.0026 | 0.0013 | 0.7542 | 0.7847 |
406
+ | 1.0385 | 18500 | 0.0026 | - | - | - |
407
+ | 1.0666 | 19000 | 0.0026 | - | - | - |
408
+ | 1.0946 | 19500 | 0.0026 | - | - | - |
409
+ | 1.1227 | 20000 | 0.0026 | 0.0013 | 0.7685 | 0.7883 |
410
+ | 1.1508 | 20500 | 0.0026 | - | - | - |
411
+ | 1.1789 | 21000 | 0.0026 | - | - | - |
412
+ | 1.2069 | 21500 | 0.0026 | - | - | - |
413
+ | 1.2350 | 22000 | 0.0026 | 0.0012 | 0.7695 | 0.7916 |
414
+ | 1.2631 | 22500 | 0.0026 | - | - | - |
415
+ | 1.2911 | 23000 | 0.0026 | - | - | - |
416
+ | 1.3192 | 23500 | 0.0025 | - | - | - |
417
+ | 1.3473 | 24000 | 0.0025 | 0.0012 | 0.7698 | 0.7937 |
418
+ | 1.3753 | 24500 | 0.0025 | - | - | - |
419
+ | 1.4034 | 25000 | 0.0025 | - | - | - |
420
+ | 1.4315 | 25500 | 0.0025 | - | - | - |
421
+ | 1.4595 | 26000 | 0.0025 | 0.0012 | 0.7785 | 0.7951 |
422
+ | 1.4876 | 26500 | 0.0025 | - | - | - |
423
+ | 1.5157 | 27000 | 0.0025 | - | - | - |
424
+ | 1.5437 | 27500 | 0.0025 | - | - | - |
425
+ | 1.5718 | 28000 | 0.0025 | 0.0012 | 0.7798 | 0.7995 |
426
+ | 1.5999 | 28500 | 0.0025 | - | - | - |
427
+ | 1.6280 | 29000 | 0.0025 | - | - | - |
428
+ | 1.6560 | 29500 | 0.0025 | - | - | - |
429
+ | 1.6841 | 30000 | 0.0025 | 0.0012 | 0.7821 | 0.7985 |
430
+ | 1.7122 | 30500 | 0.0025 | - | - | - |
431
+ | 1.7402 | 31000 | 0.0025 | - | - | - |
432
+ | 1.7683 | 31500 | 0.0025 | - | - | - |
433
+ | 1.7964 | 32000 | 0.0025 | 0.0012 | 0.7860 | 0.7999 |
434
+ | 1.8244 | 32500 | 0.0025 | - | - | - |
435
+ | 1.8525 | 33000 | 0.0025 | - | - | - |
436
+ | 1.8806 | 33500 | 0.0025 | - | - | - |
437
+ | 1.9086 | 34000 | 0.0025 | 0.0012 | 0.7859 | 0.8009 |
438
+ | 1.9367 | 34500 | 0.0025 | - | - | - |
439
+ | 1.9648 | 35000 | 0.0025 | - | - | - |
440
+ | 1.9928 | 35500 | 0.0025 | - | - | - |
441
+ | 2.0209 | 36000 | 0.0025 | 0.0012 | 0.7840 | 0.8000 |
442
+ | 2.0490 | 36500 | 0.0025 | - | - | - |
443
+ | 2.0770 | 37000 | 0.0025 | - | - | - |
444
+ | 2.1051 | 37500 | 0.0025 | - | - | - |
445
+ | 2.1332 | 38000 | 0.0025 | 0.0012 | 0.7882 | 0.8029 |
446
+ | 2.1612 | 38500 | 0.0025 | - | - | - |
447
+ | 2.1893 | 39000 | 0.0025 | - | - | - |
448
+ | 2.2174 | 39500 | 0.0025 | - | - | - |
449
+ | 2.2454 | 40000 | 0.0025 | 0.0012 | 0.7867 | 0.8030 |
450
+ | 2.2735 | 40500 | 0.0025 | - | - | - |
451
+ | 2.3016 | 41000 | 0.0025 | - | - | - |
452
+ | 2.3296 | 41500 | 0.0025 | - | - | - |
453
+ | 2.3577 | 42000 | 0.0025 | 0.0012 | 0.7909 | 0.8044 |
454
+ | 2.3858 | 42500 | 0.0025 | - | - | - |
455
+ | 2.4138 | 43000 | 0.0025 | - | - | - |
456
+ | 2.4419 | 43500 | 0.0024 | - | - | - |
457
+ | 2.4700 | 44000 | 0.0024 | 0.0012 | 0.7925 | 0.8047 |
458
+ | 2.4980 | 44500 | 0.0024 | - | - | - |
459
+ | 2.5261 | 45000 | 0.0024 | - | - | - |
460
+ | 2.5542 | 45500 | 0.0024 | - | - | - |
461
+ | 2.5823 | 46000 | 0.0024 | 0.0012 | 0.7945 | 0.8081 |
462
+ | 2.6103 | 46500 | 0.0024 | - | - | - |
463
+ | 2.6384 | 47000 | 0.0024 | - | - | - |
464
+ | 2.6665 | 47500 | 0.0024 | - | - | - |
465
+ | 2.6945 | 48000 | 0.0024 | 0.0012 | 0.7918 | 0.8071 |
466
+ | 2.7226 | 48500 | 0.0024 | - | - | - |
467
+ | 2.7507 | 49000 | 0.0024 | - | - | - |
468
+ | 2.7787 | 49500 | 0.0024 | - | - | - |
469
+ | 2.8068 | 50000 | 0.0024 | 0.0012 | 0.7945 | 0.8063 |
470
+ | 2.8349 | 50500 | 0.0024 | - | - | - |
471
+ | 2.8629 | 51000 | 0.0024 | - | - | - |
472
+ | 2.8910 | 51500 | 0.0024 | - | - | - |
473
+ | 2.9191 | 52000 | 0.0024 | 0.0012 | 0.7930 | 0.8078 |
474
+ | 2.9471 | 52500 | 0.0024 | - | - | - |
475
+ | 2.9752 | 53000 | 0.0024 | - | - | - |
476
+ | 3.0033 | 53500 | 0.0024 | - | - | - |
477
+ | 3.0313 | 54000 | 0.0024 | 0.0012 | 0.7947 | 0.8071 |
478
+ | 3.0594 | 54500 | 0.0024 | - | - | - |
479
+ | 3.0875 | 55000 | 0.0024 | - | - | - |
480
+ | 3.1155 | 55500 | 0.0024 | - | - | - |
481
+ | 3.1436 | 56000 | 0.0024 | 0.0012 | 0.7955 | 0.8077 |
482
+ | 3.1717 | 56500 | 0.0024 | - | - | - |
483
+ | 3.1997 | 57000 | 0.0024 | - | - | - |
484
+ | 3.2278 | 57500 | 0.0024 | - | - | - |
485
+ | 3.2559 | 58000 | 0.0024 | 0.0012 | 0.7969 | 0.8083 |
486
+ | 3.2839 | 58500 | 0.0024 | - | - | - |
487
+ | 3.3120 | 59000 | 0.0024 | - | - | - |
488
+ | 3.3401 | 59500 | 0.0024 | - | - | - |
489
+ | 3.3681 | 60000 | 0.0024 | 0.0012 | 0.7916 | 0.8089 |
490
+ | 3.3962 | 60500 | 0.0024 | - | - | - |
491
+ | 3.4243 | 61000 | 0.0024 | - | - | - |
492
+ | 3.4524 | 61500 | 0.0024 | - | - | - |
493
+ | 3.4804 | 62000 | 0.0024 | 0.0012 | 0.7941 | 0.8092 |
494
+ | 3.5085 | 62500 | 0.0024 | - | - | - |
495
+ | 3.5366 | 63000 | 0.0024 | - | - | - |
496
+ | 3.5646 | 63500 | 0.0024 | - | - | - |
497
+ | 3.5927 | 64000 | 0.0024 | 0.0012 | 0.7966 | 0.8112 |
498
+ | 3.6208 | 64500 | 0.0024 | - | - | - |
499
+ | 3.6488 | 65000 | 0.0024 | - | - | - |
500
+ | 3.6769 | 65500 | 0.0024 | - | - | - |
501
+ | 3.7050 | 66000 | 0.0024 | 0.0012 | 0.7957 | 0.8088 |
502
+ | 3.7330 | 66500 | 0.0024 | - | - | - |
503
+ | 3.7611 | 67000 | 0.0024 | - | - | - |
504
+ | 3.7892 | 67500 | 0.0024 | - | - | - |
505
+ | 3.8172 | 68000 | 0.0024 | 0.0012 | 0.7965 | 0.8104 |
506
+ | 3.8453 | 68500 | 0.0024 | - | - | - |
507
+ | 3.8734 | 69000 | 0.0024 | - | - | - |
508
+ | 3.9015 | 69500 | 0.0024 | - | - | - |
509
+ | 3.9295 | 70000 | 0.0024 | 0.0012 | 0.7948 | 0.8101 |
510
+ | 3.9576 | 70500 | 0.0024 | - | - | - |
511
+ | 3.9857 | 71000 | 0.0024 | - | - | - |
512
+ | 4.0137 | 71500 | 0.0024 | - | - | - |
513
+ | 4.0418 | 72000 | 0.0024 | 0.0012 | 0.7985 | 0.8129 |
514
+ | 4.0698 | 72500 | 0.0024 | - | - | - |
515
+ | 4.0979 | 73000 | 0.0024 | - | - | - |
516
+ | 4.1260 | 73500 | 0.0024 | - | - | - |
517
+ | 4.1540 | 74000 | 0.0024 | 0.0012 | 0.7964 | 0.8114 |
518
+ | 4.1821 | 74500 | 0.0024 | - | - | - |
519
+ | 4.2102 | 75000 | 0.0024 | - | - | - |
520
+ | 4.2382 | 75500 | 0.0024 | - | - | - |
521
+ | 4.2663 | 76000 | 0.0024 | 0.0012 | 0.7964 | 0.8105 |
522
+ | 4.2944 | 76500 | 0.0024 | - | - | - |
523
+ | 4.3225 | 77000 | 0.0024 | - | - | - |
524
+ | 4.3505 | 77500 | 0.0024 | - | - | - |
525
+ | 4.3786 | 78000 | 0.0024 | 0.0012 | 0.7975 | 0.8110 |
526
+ | 4.4067 | 78500 | 0.0024 | - | - | - |
527
+ | 4.4347 | 79000 | 0.0024 | - | - | - |
528
+ | 4.4628 | 79500 | 0.0024 | - | - | - |
529
+ | 4.4909 | 80000 | 0.0024 | 0.0012 | 0.7959 | 0.8113 |
530
+ | 4.5189 | 80500 | 0.0024 | - | - | - |
531
+ | 4.5470 | 81000 | 0.0024 | - | - | - |
532
+ | 4.5751 | 81500 | 0.0024 | - | - | - |
533
+ | 4.6031 | 82000 | 0.0024 | 0.0012 | 0.7979 | 0.8119 |
534
+ | 4.6312 | 82500 | 0.0024 | - | - | - |
535
+ | 4.6593 | 83000 | 0.0024 | - | - | - |
536
+ | 4.6873 | 83500 | 0.0024 | - | - | - |
537
+ | 4.7154 | 84000 | 0.0024 | 0.0012 | 0.7980 | 0.8123 |
538
+ | 4.7435 | 84500 | 0.0024 | - | - | - |
539
+ | 4.7715 | 85000 | 0.0024 | - | - | - |
540
+ | 4.7996 | 85500 | 0.0024 | - | - | - |
541
+ | 4.8277 | 86000 | 0.0024 | 0.0012 | 0.7963 | 0.8118 |
542
+ | 4.8558 | 86500 | 0.0024 | - | - | - |
543
+ | 4.8838 | 87000 | 0.0024 | - | - | - |
544
+ | 4.9119 | 87500 | 0.0024 | - | - | - |
545
+ | 4.9400 | 88000 | 0.0024 | 0.0012 | 0.7986 | 0.8126 |
546
+ | 4.9680 | 88500 | 0.0024 | - | - | - |
547
+ | 4.9961 | 89000 | 0.0024 | - | - | - |
548
+ | 5.0241 | 89500 | 0.0024 | - | - | - |
549
+ | 5.0522 | 90000 | 0.0024 | 0.0012 | 0.7994 | 0.8121 |
550
+ | 5.0803 | 90500 | 0.0024 | - | - | - |
551
+ | 5.1083 | 91000 | 0.0024 | - | - | - |
552
+ | 5.1364 | 91500 | 0.0024 | - | - | - |
553
+ | 5.1645 | 92000 | 0.0024 | 0.0012 | 0.7973 | 0.8120 |
554
+ | 5.1926 | 92500 | 0.0024 | - | - | - |
555
+ | 5.2206 | 93000 | 0.0024 | - | - | - |
556
+ | 5.2487 | 93500 | 0.0024 | - | - | - |
557
+ | 5.2768 | 94000 | 0.0024 | 0.0012 | 0.7970 | 0.8123 |
558
+ | 5.3048 | 94500 | 0.0024 | - | - | - |
559
+ | 5.3329 | 95000 | 0.0024 | - | - | - |
560
+ | 5.3610 | 95500 | 0.0024 | - | - | - |
561
+ | 5.3890 | 96000 | 0.0024 | 0.0012 | 0.7997 | 0.8126 |
562
+ | 5.4171 | 96500 | 0.0024 | - | - | - |
563
+ | 5.4452 | 97000 | 0.0024 | - | - | - |
564
+ | 5.4732 | 97500 | 0.0024 | - | - | - |
565
+ | 5.5013 | 98000 | 0.0024 | 0.0012 | 0.7957 | 0.8114 |
566
+ | 5.5294 | 98500 | 0.0024 | - | - | - |
567
+ | 5.5574 | 99000 | 0.0024 | - | - | - |
568
+ | 5.5855 | 99500 | 0.0024 | - | - | - |
569
+ | 5.6136 | 100000 | 0.0024 | 0.0012 | 0.7980 | 0.8132 |
570
+ | 5.6416 | 100500 | 0.0024 | - | - | - |
571
+ | 5.6697 | 101000 | 0.0024 | - | - | - |
572
+ | 5.6978 | 101500 | 0.0024 | - | - | - |
573
+ | 5.7259 | 102000 | 0.0024 | 0.0012 | 0.7984 | 0.8138 |
574
+ | 5.7539 | 102500 | 0.0024 | - | - | - |
575
+ | 5.7820 | 103000 | 0.0024 | - | - | - |
576
+ | 5.8101 | 103500 | 0.0024 | - | - | - |
577
+ | 5.8381 | 104000 | 0.0024 | 0.0012 | 0.7998 | 0.8134 |
578
+ | 5.8662 | 104500 | 0.0024 | - | - | - |
579
+ | 5.8943 | 105000 | 0.0024 | - | - | - |
580
+ | 5.9223 | 105500 | 0.0024 | - | - | - |
581
+ | 5.9504 | 106000 | 0.0024 | 0.0012 | 0.8013 | 0.8124 |
582
+ | 5.9785 | 106500 | 0.0024 | - | - | - |
583
+ | 6.0065 | 107000 | 0.0024 | - | - | - |
584
+ | 6.0346 | 107500 | 0.0024 | - | - | - |
585
+ | 6.0626 | 108000 | 0.0024 | 0.0012 | 0.7987 | 0.8134 |
586
+ | 6.0907 | 108500 | 0.0024 | - | - | - |
587
+ | 6.1188 | 109000 | 0.0024 | - | - | - |
588
+ | 6.1469 | 109500 | 0.0024 | - | - | - |
589
+ | 6.1749 | 110000 | 0.0024 | 0.0012 | 0.7986 | 0.8127 |
590
+ | 6.2030 | 110500 | 0.0024 | - | - | - |
591
+ | 6.2311 | 111000 | 0.0024 | - | - | - |
592
+ | 6.2591 | 111500 | 0.0024 | - | - | - |
593
+ | 6.2872 | 112000 | 0.0024 | 0.0012 | 0.7980 | 0.8128 |
594
+ | 6.3153 | 112500 | 0.0024 | - | - | - |
595
+ | 6.3433 | 113000 | 0.0024 | - | - | - |
596
+ | 6.3714 | 113500 | 0.0024 | - | - | - |
597
+ | 6.3995 | 114000 | 0.0024 | 0.0012 | 0.7980 | 0.8137 |
598
+ | 6.4275 | 114500 | 0.0024 | - | - | - |
599
+ | 6.4556 | 115000 | 0.0024 | - | - | - |
600
+ | 6.4837 | 115500 | 0.0024 | - | - | - |
601
+ | 6.5117 | 116000 | 0.0024 | 0.0012 | 0.7988 | 0.8129 |
602
+ | 6.5398 | 116500 | 0.0024 | - | - | - |
603
+ | 6.5679 | 117000 | 0.0024 | - | - | - |
604
+ | 6.5960 | 117500 | 0.0024 | - | - | - |
605
+ | 6.6240 | 118000 | 0.0024 | 0.0012 | 0.8007 | 0.8138 |
606
+ | 6.6521 | 118500 | 0.0024 | - | - | - |
607
+ | 6.6802 | 119000 | 0.0024 | - | - | - |
608
+ | 6.7082 | 119500 | 0.0024 | - | - | - |
609
+ | 6.7363 | 120000 | 0.0024 | 0.0012 | 0.8019 | 0.8143 |
610
+ | 6.7644 | 120500 | 0.0024 | - | - | - |
611
+ | 6.7924 | 121000 | 0.0024 | - | - | - |
612
+ | 6.8205 | 121500 | 0.0024 | - | - | - |
+ | 6.8486 | 122000 | 0.0024 | 0.0012 | 0.7980 | 0.8137 |
+ | 6.8766 | 122500 | 0.0024 | - | - | - |
+ | 6.9047 | 123000 | 0.0024 | - | - | - |
+ | 6.9328 | 123500 | 0.0024 | - | - | - |
+ | 6.9608 | 124000 | 0.0024 | 0.0012 | 0.8028 | 0.8142 |
+ | 6.9889 | 124500 | 0.0024 | - | - | - |
+ | 7.0170 | 125000 | 0.0024 | - | - | - |
+ | 7.0450 | 125500 | 0.0024 | - | - | - |
+ | 7.0731 | 126000 | 0.0024 | 0.0012 | 0.8002 | 0.8132 |
+ | 7.1012 | 126500 | 0.0024 | - | - | - |
+ | 7.1292 | 127000 | 0.0024 | - | - | - |
+ | 7.1573 | 127500 | 0.0024 | - | - | - |
+ | 7.1854 | 128000 | 0.0024 | 0.0012 | 0.8008 | 0.8137 |
+ | 7.2134 | 128500 | 0.0024 | - | - | - |
+ | 7.2415 | 129000 | 0.0024 | - | - | - |
+ | 7.2696 | 129500 | 0.0024 | - | - | - |
+ | 7.2976 | 130000 | 0.0024 | 0.0012 | 0.8005 | 0.8138 |
+ | 7.3257 | 130500 | 0.0024 | - | - | - |
+ | 7.3538 | 131000 | 0.0024 | - | - | - |
+ | 7.3818 | 131500 | 0.0024 | - | - | - |
+ | 7.4099 | 132000 | 0.0024 | 0.0012 | 0.7995 | 0.8140 |
+ | 7.4380 | 132500 | 0.0024 | - | - | - |
+ | 7.4661 | 133000 | 0.0024 | - | - | - |
+ | 7.4941 | 133500 | 0.0024 | - | - | - |
+ | 7.5222 | 134000 | 0.0024 | 0.0012 | 0.7999 | 0.8142 |
+ | 7.5503 | 134500 | 0.0024 | - | - | - |
+ | 7.5783 | 135000 | 0.0024 | - | - | - |
+ | 7.6064 | 135500 | 0.0024 | - | - | - |
+ | 7.6345 | 136000 | 0.0024 | 0.0012 | 0.8011 | 0.8138 |
+ | 7.6625 | 136500 | 0.0024 | - | - | - |
+ | 7.6906 | 137000 | 0.0024 | - | - | - |
+ | 7.7187 | 137500 | 0.0024 | - | - | - |
+ | 7.7467 | 138000 | 0.0024 | 0.0012 | 0.8015 | 0.8142 |
+ | 7.7748 | 138500 | 0.0024 | - | - | - |
+ | 7.8029 | 139000 | 0.0024 | - | - | - |
+ | 7.8309 | 139500 | 0.0024 | - | - | - |
+ | 7.8590 | 140000 | 0.0024 | 0.0012 | 0.8007 | 0.8141 |
+ | 7.8871 | 140500 | 0.0024 | - | - | - |
+ | 7.9151 | 141000 | 0.0024 | - | - | - |
+ | 7.9432 | 141500 | 0.0024 | - | - | - |
+ | 7.9713 | 142000 | 0.0024 | 0.0012 | 0.8010 | 0.8143 |
+ | 7.9994 | 142500 | 0.0024 | - | - | - |
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.16
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.51.3
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2019",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MSELoss
+ ```bibtex
+ @inproceedings{reimers-2020-multilingual-sentence-bert,
+ title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
+ author = "Reimers, Nils and Gurevych, Iryna",
+ booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
+ month = "11",
+ year = "2020",
+ publisher = "Association for Computational Linguistics",
+ url = "https://arxiv.org/abs/2004.09813",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,25 @@
+ {
+ "architectures": [
+ "BertModel"
+ ],
+ "attention_probs_dropout_prob": 0.1,
+ "classifier_dropout": null,
+ "gradient_checkpointing": false,
+ "hidden_act": "gelu",
+ "hidden_dropout_prob": 0.1,
+ "hidden_size": 384,
+ "initializer_range": 0.02,
+ "intermediate_size": 1536,
+ "layer_norm_eps": 1e-12,
+ "max_position_embeddings": 512,
+ "model_type": "bert",
+ "num_attention_heads": 12,
+ "num_hidden_layers": 6,
+ "pad_token_id": 0,
+ "position_embedding_type": "absolute",
+ "torch_dtype": "float32",
+ "transformers_version": "4.51.3",
+ "type_vocab_size": 2,
+ "use_cache": true,
+ "vocab_size": 80000
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+ "__version__": {
+ "sentence_transformers": "3.3.1",
+ "transformers": "4.51.3",
+ "pytorch": "2.5.1+cu124"
+ },
+ "prompts": {},
+ "default_prompt_name": null,
+ "similarity_fn_name": "cosine"
+ }
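`similarity_fn_name` is set to `cosine` in the file above. As a minimal, dependency-free sketch of what that score computes (the helper name `cosine_similarity` and the zero-vector convention are illustrative assumptions, not the library's API):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen for this sketch only
    return dot / (norm_a * norm_b)

# Same direction, different length: score close to 1.
parallel = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: score of 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

Since `modules.json` in this commit ends with a `Normalize` module, embeddings should already have unit length, in which case the cosine score reduces to a plain dot product.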
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:c950ef5954184e6208a0fca3b09119216f6a100ce0249257a25610289cfe6c1a
+ size 166862600
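As a rough sanity check, the sizes in `config.json` can be compared with the `model.safetensors` size recorded in the LFS pointer. The sketch below assumes a standard BERT layout (embeddings with LayerNorm, per-layer attention and feed-forward blocks with biases, plus a pooler head) stored entirely as float32; the function name `bert_param_count` is hypothetical, and none of this has been checked against the actual weight file.

```python
# Values copied from config.json in this commit.
CONFIG = {
    "vocab_size": 80000,
    "hidden_size": 384,
    "intermediate_size": 1536,
    "num_hidden_layers": 6,
    "max_position_embeddings": 512,
    "type_vocab_size": 2,
}

def bert_param_count(cfg):
    """Count weights and biases of a standard BERT encoder with a pooler head."""
    h, ff = cfg["hidden_size"], cfg["intermediate_size"]
    layer_norm = 2 * h  # scale and bias
    embeddings = (cfg["vocab_size"] + cfg["max_position_embeddings"]
                  + cfg["type_vocab_size"]) * h + layer_norm
    attention = 4 * (h * h + h) + layer_norm      # Q, K, V, output projections
    feed_forward = (h * ff + ff) + (ff * h + h) + layer_norm
    pooler = h * h + h
    return embeddings + cfg["num_hidden_layers"] * (attention + feed_forward) + pooler

params = bert_param_count(CONFIG)
float32_bytes = params * 4                 # torch_dtype is float32
header_bytes = 166862600 - float32_bytes   # LFS pointer size minus tensor data
```

Under these assumptions the count is 41,712,768 parameters, or 166,851,072 bytes in float32. That is 11,528 bytes below the 166,862,600 bytes in the pointer, a gap that would be consistent with file header metadata. Note that the 80,000-entry vocabulary alone accounts for 30,720,000 of the parameters.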
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+ {
+ "idx": 0,
+ "name": "0",
+ "path": "",
+ "type": "sentence_transformers.models.Transformer"
+ },
+ {
+ "idx": 1,
+ "name": "1",
+ "path": "1_Pooling",
+ "type": "sentence_transformers.models.Pooling"
+ },
+ {
+ "idx": 2,
+ "name": "2",
+ "path": "2_Normalize",
+ "type": "sentence_transformers.models.Normalize"
+ }
+ ]
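`modules.json` chains three stages: a Transformer, a Pooling module (configured for mean pooling in `1_Pooling/config.json`), and a Normalize module. The last two stages can be sketched in plain Python as follows. This is an illustration on toy 2-dimensional vectors, not the library's implementation; the helper names `mean_pool` and `l2_normalize` are assumptions.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vector):
                total[i] += value
    count = max(count, 1)  # avoid division by zero on an all-padding input
    return [t / count for t in total]

def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]

# Two real tokens followed by one padding position.
tokens = [[1.0, 2.0], [3.0, 6.0], [100.0, 100.0]]
mask = [1, 1, 0]

pooled = mean_pool(tokens, mask)      # padding vector is ignored
embedding = l2_normalize(pooled)      # unit-length sentence embedding
```

In the real model each token vector would have 384 dimensions (`word_embedding_dimension`), and the padding position would come from the tokenizer's attention mask.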
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:6909870f516db78744bd7d0c274bc29a6f732c7db0242d585ba05fd0a59f5d96
+ size 332604218
rng_state.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:366f3f326c9e515657aecaeccf4114b3a9148add41cfc367eadbec9568777cec
+ size 14244
scheduler.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:76eaf6c34ec06064ae51acdd7a19cb0ec0dbea77dbd311d0519eda428ad21e4c
+ size 1064
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+ "max_seq_length": 128,
+ "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+ "bos_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "cls_token": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "eos_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "mask_token": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "pad_token": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "sep_token": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ },
+ "unk_token": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false
+ }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+ "added_tokens_decoder": {
+ "0": {
+ "content": "<s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "1": {
+ "content": "<pad>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "2": {
+ "content": "</s>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "3": {
+ "content": "<unk>",
+ "lstrip": false,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ },
+ "4": {
+ "content": "<mask>",
+ "lstrip": true,
+ "normalized": false,
+ "rstrip": false,
+ "single_word": false,
+ "special": true
+ }
+ },
+ "bos_token": "<s>",
+ "clean_up_tokenization_spaces": false,
+ "cls_token": "<s>",
+ "eos_token": "</s>",
+ "extra_special_tokens": {},
+ "mask_token": "<mask>",
+ "model_max_length": 1000000000000000019884624838656,
+ "pad_token": "<pad>",
+ "sep_token": "</s>",
+ "tokenizer_class": "XLMRobertaTokenizerFast",
+ "unk_token": "<unk>"
+ }
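The `model_max_length` value above looks arbitrary, but it is exactly what Python prints for `int(1e30)`. A plausible reading (an assumption, not something this commit states) is that the tokenizer carries no explicit length limit, so the 128-token `max_seq_length` from `sentence_bert_config.json` is the limit that applies in practice. The sketch below illustrates that interplay with hypothetical helpers; the special-token ids come from `added_tokens_decoder`.

```python
VERY_LARGE_INTEGER = int(1e30)

MODEL_MAX_LENGTH = 1000000000000000019884624838656  # tokenizer_config.json
MAX_SEQ_LENGTH = 128                                # sentence_bert_config.json
CLS_ID, SEP_ID = 0, 2  # "<s>" and "</s>" per added_tokens_decoder

def effective_max_length(model_max_length, max_seq_length):
    """The tighter of the two limits wins."""
    return min(model_max_length, max_seq_length)

def build_input_ids(token_ids, max_length):
    """Wrap ids as <s> ... </s>, dropping trailing tokens to fit max_length."""
    body = token_ids[: max_length - 2]  # reserve two slots for special tokens
    return [CLS_ID] + body + [SEP_ID]

limit = effective_max_length(MODEL_MAX_LENGTH, MAX_SEQ_LENGTH)
# 300 made-up token ids, longer than the limit, to exercise truncation.
input_ids = build_input_ids(list(range(10, 310)), limit)
```

With these values, inputs longer than 126 content tokens would be cut so that the wrapped sequence is exactly 128 ids long.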
trainer_state.json ADDED
@@ -0,0 +1,3023 @@
+ {
+ "epoch": 7.999578971005136,
+ "global_step": 142504,
+ "max_steps": 142504,
+ "logging_steps": 500,
+ "eval_steps": 2000,
+ "save_steps": 2000,
+ "train_batch_size": 512,
+ "num_train_epochs": 8,
+ "num_input_tokens_seen": 0,
+ "total_flos": 0.0,
+ "log_history": [
+ {
+ "loss": 0.0057,
+ "grad_norm": 0.0016098980559036136,
+ "learning_rate": 7.003181137724551e-06,
+ "epoch": 0.028068599657563083,
+ "step": 500
+ },
+ {
+ "loss": 0.005,
+ "grad_norm": 0.0015317167853936553,
+ "learning_rate": 1.4020396706586825e-05,
+ "epoch": 0.05613719931512617,
+ "step": 1000
+ },
+ {
+ "loss": 0.0047,
+ "grad_norm": 0.0015495802508667111,
+ "learning_rate": 2.10376122754491e-05,
+ "epoch": 0.08420579897268925,
+ "step": 1500
+ },
+ {
+ "loss": 0.0045,
+ "grad_norm": 0.0022075821179896593,
+ "learning_rate": 2.8054827844311374e-05,
+ "epoch": 0.11227439863025233,
+ "step": 2000
+ },
+ {
+ "eval_loss": 0.002180776558816433,
+ "eval_evaluator_0": 0.0022395828273147345,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.22815565384644398,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.275689274915755,
+ "eval_JSTS_pearson_cosine": 0.24988924562808074,
+ "eval_JSTS_spearman_cosine": 0.28046156141859135,
+ "eval_sequential_score": 0.1861301397205537,
+ "eval_runtime": 73.4592,
+ "eval_samples_per_second": 2508.207,
+ "eval_steps_per_second": 4.901,
+ "epoch": 0.11227439863025233,
+ "step": 2000
+ },
+ {
+ "loss": 0.0043,
+ "grad_norm": 0.001968657597899437,
+ "learning_rate": 3.507204341317365e-05,
+ "epoch": 0.14034299828781543,
+ "step": 2500
+ },
+ {
+ "loss": 0.0042,
+ "grad_norm": 0.0018094776896759868,
+ "learning_rate": 4.208925898203593e-05,
+ "epoch": 0.1684115979453785,
+ "step": 3000
+ },
+ {
+ "loss": 0.004,
+ "grad_norm": 0.0013906272361055017,
+ "learning_rate": 4.91064745508982e-05,
+ "epoch": 0.19648019760294158,
+ "step": 3500
+ },
+ {
+ "loss": 0.0039,
+ "grad_norm": 0.001428323332220316,
+ "learning_rate": 5.6123690119760476e-05,
+ "epoch": 0.22454879726050467,
+ "step": 4000
+ },
+ {
+ "eval_loss": 0.0018520376179367304,
+ "eval_evaluator_0": 0.0019462420605123043,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.4499110986328231,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.49510970494174,
+ "eval_JSTS_pearson_cosine": 0.5097291615839368,
+ "eval_JSTS_spearman_cosine": 0.5122442820292972,
+ "eval_sequential_score": 0.3364334096771831,
+ "eval_runtime": 83.3187,
+ "eval_samples_per_second": 2211.401,
+ "eval_steps_per_second": 4.321,
+ "epoch": 0.22454879726050467,
+ "step": 4000
+ },
+ {
+ "loss": 0.0037,
+ "grad_norm": 0.0012093083932995796,
+ "learning_rate": 6.314090568862275e-05,
+ "epoch": 0.2526173969180678,
+ "step": 4500
+ },
+ {
+ "loss": 0.0036,
+ "grad_norm": 0.0011754471343010664,
+ "learning_rate": 7.015812125748502e-05,
+ "epoch": 0.28068599657563087,
+ "step": 5000
+ },
+ {
+ "loss": 0.0035,
+ "grad_norm": 0.001201385515742004,
+ "learning_rate": 7.71753368263473e-05,
+ "epoch": 0.3087545962331939,
+ "step": 5500
+ },
+ {
+ "loss": 0.0034,
+ "grad_norm": 0.001142949447967112,
+ "learning_rate": 8.419255239520957e-05,
+ "epoch": 0.336823195890757,
+ "step": 6000
+ },
+ {
+ "eval_loss": 0.0016247672028839588,
+ "eval_evaluator_0": 0.0017547985771670938,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.6065087781362954,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.6059881699518793,
+ "eval_JSTS_pearson_cosine": 0.675813930035037,
+ "eval_JSTS_spearman_cosine": 0.6543515532700886,
+ "eval_sequential_score": 0.42069817393304504,
+ "eval_runtime": 80.7633,
+ "eval_samples_per_second": 2281.371,
+ "eval_steps_per_second": 4.457,
+ "epoch": 0.336823195890757,
+ "step": 6000
+ },
+ {
+ "loss": 0.0033,
+ "grad_norm": 0.0012169757392257452,
+ "learning_rate": 9.120976796407185e-05,
+ "epoch": 0.3648917955483201,
+ "step": 6500
+ },
+ {
+ "loss": 0.0032,
+ "grad_norm": 0.000984999118372798,
+ "learning_rate": 9.822698353293412e-05,
+ "epoch": 0.39296039520588316,
+ "step": 7000
+ },
+ {
+ "loss": 0.0032,
+ "grad_norm": 0.0008541549323126674,
+ "learning_rate": 0.0001052441991017964,
+ "epoch": 0.42102899486344625,
+ "step": 7500
+ },
+ {
+ "loss": 0.0031,
+ "grad_norm": 0.0012339747045189142,
+ "learning_rate": 0.00011226141467065867,
+ "epoch": 0.44909759452100934,
+ "step": 8000
+ },
+ {
+ "eval_loss": 0.0014834599569439888,
+ "eval_evaluator_0": 0.0016358010470867157,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.6856139415411606,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.6802098455846184,
+ "eval_JSTS_pearson_cosine": 0.7611066837119786,
+ "eval_JSTS_spearman_cosine": 0.7233990218403792,
+ "eval_sequential_score": 0.46841488949069476,
+ "eval_runtime": 82.9975,
+ "eval_samples_per_second": 2219.96,
+ "eval_steps_per_second": 4.337,
+ "epoch": 0.44909759452100934,
+ "step": 8000
+ },
+ {
+ "loss": 0.003,
+ "grad_norm": 0.0011975348461419344,
+ "learning_rate": 0.00011927863023952095,
+ "epoch": 0.4771661941785724,
+ "step": 8500
+ },
+ {
+ "loss": 0.003,
+ "grad_norm": 0.0008851690217852592,
+ "learning_rate": 0.00012629584580838322,
+ "epoch": 0.5052347938361356,
+ "step": 9000
+ },
+ {
+ "loss": 0.003,
+ "grad_norm": 0.0007472793222405016,
+ "learning_rate": 0.0001333130613772455,
+ "epoch": 0.5333033934936986,
+ "step": 9500
+ },
+ {
+ "loss": 0.0029,
+ "grad_norm": 0.0007375231361947954,
+ "learning_rate": 0.00014033027694610776,
+ "epoch": 0.5613719931512617,
+ "step": 10000
+ },
+ {
+ "eval_loss": 0.0014046069700270891,
+ "eval_evaluator_0": 0.0015687322011217475,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.719395051149917,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7143717274119371,
+ "eval_JSTS_pearson_cosine": 0.7959507802550232,
+ "eval_JSTS_spearman_cosine": 0.7536892647020201,
+ "eval_sequential_score": 0.48987657477169294,
+ "eval_runtime": 82.4721,
+ "eval_samples_per_second": 2234.101,
+ "eval_steps_per_second": 4.365,
+ "epoch": 0.5613719931512617,
+ "step": 10000
+ },
+ {
+ "loss": 0.0029,
+ "grad_norm": 0.0006650229915976524,
+ "learning_rate": 0.00014734749251497006,
+ "epoch": 0.5894405928088248,
+ "step": 10500
+ },
+ {
+ "loss": 0.0029,
+ "grad_norm": 0.0006891472148708999,
+ "learning_rate": 0.00015436470808383233,
+ "epoch": 0.6175091924663878,
+ "step": 11000
+ },
+ {
+ "loss": 0.0028,
+ "grad_norm": 0.0007127522258087993,
+ "learning_rate": 0.0001613819236526946,
+ "epoch": 0.6455777921239509,
+ "step": 11500
+ },
+ {
+ "loss": 0.0028,
+ "grad_norm": 0.0006974066491238773,
+ "learning_rate": 0.00016839913922155688,
+ "epoch": 0.673646391781514,
+ "step": 12000
+ },
+ {
+ "eval_loss": 0.0013544464018195868,
+ "eval_evaluator_0": 0.001526530017144978,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.732440031968934,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7259782924564794,
+ "eval_JSTS_pearson_cosine": 0.81524683780267,
+ "eval_JSTS_spearman_cosine": 0.7690769793502338,
+ "eval_sequential_score": 0.49886060060795273,
+ "eval_runtime": 81.4058,
+ "eval_samples_per_second": 2263.365,
+ "eval_steps_per_second": 4.422,
+ "epoch": 0.673646391781514,
+ "step": 12000
+ },
+ {
+ "loss": 0.0028,
+ "grad_norm": 0.0007144405390135944,
+ "learning_rate": 0.00017541635479041915,
+ "epoch": 0.7017149914390771,
+ "step": 12500
+ },
+ {
+ "loss": 0.0028,
+ "grad_norm": 0.0006135280709713697,
+ "learning_rate": 0.00018243357035928144,
+ "epoch": 0.7297835910966401,
+ "step": 13000
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0006482451572082937,
+ "learning_rate": 0.0001894507859281437,
+ "epoch": 0.7578521907542033,
+ "step": 13500
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0006131622940301895,
+ "learning_rate": 0.00019646800149700596,
+ "epoch": 0.7859207904117663,
+ "step": 14000
+ },
+ {
+ "eval_loss": 0.0013185555581003428,
+ "eval_evaluator_0": 0.0014964122092351317,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7428962880442835,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7396057152300574,
+ "eval_JSTS_pearson_cosine": 0.8212214202225949,
+ "eval_JSTS_spearman_cosine": 0.7750900705329776,
+ "eval_sequential_score": 0.50539739932409,
+ "eval_runtime": 78.3836,
+ "eval_samples_per_second": 2350.632,
+ "eval_steps_per_second": 4.593,
+ "epoch": 0.7859207904117663,
+ "step": 14000
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0006473588873632252,
+ "learning_rate": 0.00020348521706586823,
+ "epoch": 0.8139893900693295,
+ "step": 14500
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.000711853732354939,
+ "learning_rate": 0.00021050243263473053,
+ "epoch": 0.8420579897268925,
+ "step": 15000
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0005788441631011665,
+ "learning_rate": 0.0002175196482035928,
+ "epoch": 0.8701265893844556,
+ "step": 15500
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0005827408167533576,
+ "learning_rate": 0.00022453686377245507,
+ "epoch": 0.8981951890420187,
+ "step": 16000
+ },
+ {
+ "eval_loss": 0.001293691573664546,
+ "eval_evaluator_0": 0.0014756449963897467,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7525848141469997,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7499182070037869,
+ "eval_JSTS_pearson_cosine": 0.8271822689658272,
+ "eval_JSTS_spearman_cosine": 0.7792775749489048,
+ "eval_sequential_score": 0.5102238089830271,
+ "eval_runtime": 77.0122,
+ "eval_samples_per_second": 2392.491,
+ "eval_steps_per_second": 4.675,
+ "epoch": 0.8981951890420187,
+ "step": 16000
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0005995088722556829,
+ "learning_rate": 0.00023155407934131734,
+ "epoch": 0.9262637886995818,
+ "step": 16500
+ },
+ {
+ "loss": 0.0027,
+ "grad_norm": 0.0006232665036804974,
+ "learning_rate": 0.00023857129491017964,
+ "epoch": 0.9543323883571448,
+ "step": 17000
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0004452952998690307,
+ "learning_rate": 0.0002455885104790419,
+ "epoch": 0.982400988014708,
+ "step": 17500
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.00046520173782482743,
+ "learning_rate": 0.0002526057260479042,
+ "epoch": 1.0104415190726135,
+ "step": 18000
+ },
+ {
+ "eval_loss": 0.0012740237871184945,
+ "eval_evaluator_0": 0.0014586352044716477,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7569989865371126,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.75423179602674,
+ "eval_JSTS_pearson_cosine": 0.8323697140474166,
+ "eval_JSTS_spearman_cosine": 0.7847176995466478,
+ "eval_sequential_score": 0.5134693769259532,
+ "eval_runtime": 75.7689,
+ "eval_samples_per_second": 2431.75,
+ "eval_steps_per_second": 4.751,
+ "epoch": 1.0104415190726135,
+ "step": 18000
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.000663595914375037,
+ "learning_rate": 0.0002596229416167664,
+ "epoch": 1.0385101187301766,
+ "step": 18500
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0004852344573009759,
+ "learning_rate": 0.0002666401571856287,
+ "epoch": 1.0665787183877395,
+ "step": 19000
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0004660378326661885,
+ "learning_rate": 0.000273657372754491,
+ "epoch": 1.0946473180453027,
+ "step": 19500
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0005068962927907705,
+ "learning_rate": 0.00028067458832335327,
+ "epoch": 1.1227159177028658,
+ "step": 20000
+ },
+ {
+ "eval_loss": 0.0012583525385707617,
+ "eval_evaluator_0": 0.0014450980816036463,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7685871450736268,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7685391031501245,
+ "eval_JSTS_pearson_cosine": 0.8366257302162161,
+ "eval_JSTS_spearman_cosine": 0.7883307684121021,
+ "eval_sequential_score": 0.5194383232146101,
+ "eval_runtime": 73.3462,
+ "eval_samples_per_second": 2512.072,
+ "eval_steps_per_second": 4.908,
+ "epoch": 1.1227159177028658,
+ "step": 20000
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0004811872495338321,
+ "learning_rate": 0.0002876918038922155,
+ "epoch": 1.150784517360429,
+ "step": 20500
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.00048696709563955665,
+ "learning_rate": 0.0002947090194610778,
+ "epoch": 1.178853117017992,
+ "step": 21000
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0005413197795860469,
+ "learning_rate": 0.00029969536358232613,
+ "epoch": 1.206921716675555,
+ "step": 21500
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.00045019047684036195,
+ "learning_rate": 0.00029845700416088764,
+ "epoch": 1.2349903163331182,
+ "step": 22000
+ },
+ {
+ "eval_loss": 0.0012464351020753384,
+ "eval_evaluator_0": 0.0014347253600135446,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7692492893947476,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7694627869363532,
+ "eval_JSTS_pearson_cosine": 0.8390020614430106,
+ "eval_JSTS_spearman_cosine": 0.7916375179002599,
+ "eval_sequential_score": 0.5208450100655422,
+ "eval_runtime": 70.9341,
+ "eval_samples_per_second": 2597.496,
+ "eval_steps_per_second": 5.075,
+ "epoch": 1.2349903163331182,
+ "step": 22000
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.0004873638099525124,
+ "learning_rate": 0.00029721864473944914,
+ "epoch": 1.263058915990681,
+ "step": 22500
+ },
+ {
+ "loss": 0.0026,
+ "grad_norm": 0.00047726804041303694,
+ "learning_rate": 0.0002959802853180107,
+ "epoch": 1.2911275156482442,
+ "step": 23000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004717214033007622,
+ "learning_rate": 0.0002947419258965722,
+ "epoch": 1.3191961153058074,
+ "step": 23500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00046010816004127264,
+ "learning_rate": 0.0002935035664751337,
+ "epoch": 1.3472647149633705,
+ "step": 24000
+ },
+ {
+ "eval_loss": 0.0012361396802589297,
+ "eval_evaluator_0": 0.00142592191696167,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7691914756523444,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7697667204746023,
+ "eval_JSTS_pearson_cosine": 0.8417145522124531,
+ "eval_JSTS_spearman_cosine": 0.7937078701918163,
+ "eval_sequential_score": 0.5216335041944601,
+ "eval_runtime": 75.1661,
+ "eval_samples_per_second": 2451.253,
+ "eval_steps_per_second": 4.789,
+ "epoch": 1.3472647149633705,
+ "step": 24000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004902255604974926,
+ "learning_rate": 0.00029226520705369526,
+ "epoch": 1.3753333146209337,
+ "step": 24500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004407005035318434,
+ "learning_rate": 0.00029102684763225676,
+ "epoch": 1.4034019142784966,
+ "step": 25000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00044140563113614917,
+ "learning_rate": 0.0002897884882108183,
+ "epoch": 1.4314705139360597,
+ "step": 25500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00041528986184857786,
+ "learning_rate": 0.0002885501287893798,
+ "epoch": 1.4595391135936229,
+ "step": 26000
+ },
+ {
+ "eval_loss": 0.0012283611577004194,
+ "eval_evaluator_0": 0.0014194094110280275,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7765957004921605,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7784908782504051,
+ "eval_JSTS_pearson_cosine": 0.8435137350349455,
+ "eval_JSTS_spearman_cosine": 0.7951288914596331,
+ "eval_sequential_score": 0.5250130597070221,
+ "eval_runtime": 75.7745,
+ "eval_samples_per_second": 2431.569,
+ "eval_steps_per_second": 4.751,
+ "epoch": 1.4595391135936229,
+ "step": 26000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004047435650136322,
+ "learning_rate": 0.0002873117693679413,
+ "epoch": 1.4876077132511858,
+ "step": 26500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004014256992377341,
+ "learning_rate": 0.0002860734099465028,
+ "epoch": 1.515676312908749,
+ "step": 27000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00044688646448776126,
+ "learning_rate": 0.0002848350505250644,
+ "epoch": 1.543744912566312,
+ "step": 27500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00040986225940287113,
+ "learning_rate": 0.0002835966911036259,
+ "epoch": 1.571813512223875,
+ "step": 28000
+ },
+ {
+ "eval_loss": 0.0012222524965181947,
+ "eval_evaluator_0": 0.001413983409292996,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7783094967374669,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7798272724894207,
+ "eval_JSTS_pearson_cosine": 0.847750129011231,
+ "eval_JSTS_spearman_cosine": 0.7994761284870242,
+ "eval_sequential_score": 0.5269057947952459,
+ "eval_runtime": 71.22,
+ "eval_samples_per_second": 2587.07,
+ "eval_steps_per_second": 5.055,
+ "epoch": 1.571813512223875,
+ "step": 28000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00041717709973454475,
+ "learning_rate": 0.00028235833168218744,
+ "epoch": 1.5998821118814384,
+ "step": 28500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00043241435196250677,
+ "learning_rate": 0.00028111997226074894,
+ "epoch": 1.6279507115390013,
+ "step": 29000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004293594683986157,
+ "learning_rate": 0.00027988161283931044,
+ "epoch": 1.6560193111965644,
+ "step": 29500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004401277983561158,
+ "learning_rate": 0.000278643253417872,
+ "epoch": 1.6840879108541276,
+ "step": 30000
+ },
+ {
+ "eval_loss": 0.0012164415093138814,
+ "eval_evaluator_0": 0.0014087973395362496,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7798749605001464,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7820560181229511,
+ "eval_JSTS_pearson_cosine": 0.8471103806180909,
+ "eval_JSTS_spearman_cosine": 0.7985064708274818,
+ "eval_sequential_score": 0.5273237620966564,
+ "eval_runtime": 73.8852,
+ "eval_samples_per_second": 2493.746,
+ "eval_steps_per_second": 4.872,
+ "epoch": 1.6840879108541276,
+ "step": 30000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00041988049633800983,
+ "learning_rate": 0.0002774048939964335,
+ "epoch": 1.7121565105116905,
+ "step": 30500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0003672194143291563,
+ "learning_rate": 0.000276166534574995,
+ "epoch": 1.7402251101692536,
+ "step": 31000
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.00039175362326204777,
+ "learning_rate": 0.00027492817515355656,
+ "epoch": 1.7682937098268168,
+ "step": 31500
+ },
+ {
+ "loss": 0.0025,
+ "grad_norm": 0.0004079992650076747,
+ "learning_rate": 0.00027368981573211806,
+ "epoch": 1.7963623094843797,
+ "step": 32000
+ },
+ {
+ "eval_loss": 0.0012125095818191767,
+ "eval_evaluator_0": 0.001405515824444592,
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7859915873862344,
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.785955305161636,
+ "eval_JSTS_pearson_cosine": 0.848154268566394,
+ "eval_JSTS_spearman_cosine": 0.7998758648147456,
+ "eval_sequential_score": 0.5290788952669421,
+ "eval_runtime": 72.8877,
+ "eval_samples_per_second": 2527.874,
681
+ "eval_steps_per_second": 4.939,
682
+ "epoch": 1.7963623094843797,
683
+ "step": 32000
684
+ },
685
+ {
686
+ "loss": 0.0025,
687
+ "grad_norm": 0.0003926429490093142,
688
+ "learning_rate": 0.0002724514563106796,
689
+ "epoch": 1.824430909141943,
690
+ "step": 32500
691
+ },
692
+ {
693
+ "loss": 0.0025,
694
+ "grad_norm": 0.0004074297030456364,
695
+ "learning_rate": 0.0002712130968892411,
696
+ "epoch": 1.852499508799506,
697
+ "step": 33000
698
+ },
699
+ {
700
+ "loss": 0.0025,
701
+ "grad_norm": 0.0003890927182510495,
702
+ "learning_rate": 0.0002699747374678026,
703
+ "epoch": 1.8805681084570691,
704
+ "step": 33500
705
+ },
706
+ {
707
+ "loss": 0.0025,
708
+ "grad_norm": 0.00040131420246325433,
709
+ "learning_rate": 0.0002687363780463641,
710
+ "epoch": 1.9086367081146323,
711
+ "step": 34000
712
+ },
713
+ {
714
+ "eval_loss": 0.0012091138632968068,
715
+ "eval_evaluator_0": 0.001402442343533039,
716
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7827178152250833,
717
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7859266574740394,
718
+ "eval_JSTS_pearson_cosine": 0.8484082672468725,
719
+ "eval_JSTS_spearman_cosine": 0.800852668744074,
720
+ "eval_sequential_score": 0.5293939228538821,
721
+ "eval_runtime": 75.221,
722
+ "eval_samples_per_second": 2449.463,
723
+ "eval_steps_per_second": 4.786,
724
+ "epoch": 1.9086367081146323,
725
+ "step": 34000
726
+ },
727
+ {
728
+ "loss": 0.0025,
729
+ "grad_norm": 0.00038372183917090297,
730
+ "learning_rate": 0.0002674980186249257,
731
+ "epoch": 1.9367053077721952,
732
+ "step": 34500
733
+ },
734
+ {
735
+ "loss": 0.0025,
736
+ "grad_norm": 0.00037665441050194204,
737
+ "learning_rate": 0.0002662596592034872,
738
+ "epoch": 1.9647739074297583,
739
+ "step": 35000
740
+ },
741
+ {
742
+ "loss": 0.0025,
743
+ "grad_norm": 0.0003966049407608807,
744
+ "learning_rate": 0.00026502129978204874,
745
+ "epoch": 1.9928425070873215,
746
+ "step": 35500
747
+ },
748
+ {
749
+ "loss": 0.0025,
750
+ "grad_norm": 0.0003766542940866202,
751
+ "learning_rate": 0.00026378294036061024,
752
+ "epoch": 2.020883038145227,
753
+ "step": 36000
754
+ },
755
+ {
756
+ "eval_loss": 0.0012054507387802005,
757
+ "eval_evaluator_0": 0.0013995040208101273,
758
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.784020333700642,
759
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7840106595360978,
760
+ "eval_JSTS_pearson_cosine": 0.8490527668314187,
761
+ "eval_JSTS_spearman_cosine": 0.800047720066079,
762
+ "eval_sequential_score": 0.5284859612076623,
763
+ "eval_runtime": 69.7452,
764
+ "eval_samples_per_second": 2641.773,
765
+ "eval_steps_per_second": 5.162,
766
+ "epoch": 2.020883038145227,
767
+ "step": 36000
768
+ },
769
+ {
770
+ "loss": 0.0025,
771
+ "grad_norm": 0.0004417026066221297,
772
+ "learning_rate": 0.00026254458093917174,
773
+ "epoch": 2.04895163780279,
774
+ "step": 36500
775
+ },
776
+ {
777
+ "loss": 0.0025,
778
+ "grad_norm": 0.00041861977661028504,
779
+ "learning_rate": 0.0002613062215177333,
780
+ "epoch": 2.0770202374603532,
781
+ "step": 37000
782
+ },
783
+ {
784
+ "loss": 0.0025,
785
+ "grad_norm": 0.0004018982872366905,
786
+ "learning_rate": 0.0002600678620962948,
787
+ "epoch": 2.105088837117916,
788
+ "step": 37500
789
+ },
790
+ {
791
+ "loss": 0.0025,
792
+ "grad_norm": 0.00039656515582464635,
793
+ "learning_rate": 0.0002588295026748563,
794
+ "epoch": 2.133157436775479,
795
+ "step": 38000
796
+ },
797
+ {
798
+ "eval_loss": 0.0012025837786495686,
799
+ "eval_evaluator_0": 0.0013965392718091607,
800
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7867175563214568,
801
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.788156882015082,
802
+ "eval_JSTS_pearson_cosine": 0.850585428356629,
803
+ "eval_JSTS_spearman_cosine": 0.8028791930427168,
804
+ "eval_sequential_score": 0.5308108714432027,
805
+ "eval_runtime": 75.0783,
806
+ "eval_samples_per_second": 2454.116,
807
+ "eval_steps_per_second": 4.795,
808
+ "epoch": 2.133157436775479,
809
+ "step": 38000
810
+ },
811
+ {
812
+ "loss": 0.0025,
813
+ "grad_norm": 0.0003948920639231801,
814
+ "learning_rate": 0.00025759114325341786,
815
+ "epoch": 2.1612260364330425,
816
+ "step": 38500
817
+ },
818
+ {
819
+ "loss": 0.0025,
820
+ "grad_norm": 0.0003850881475955248,
821
+ "learning_rate": 0.00025635278383197936,
822
+ "epoch": 2.1892946360906054,
823
+ "step": 39000
824
+ },
825
+ {
826
+ "loss": 0.0025,
827
+ "grad_norm": 0.0003732353507075459,
828
+ "learning_rate": 0.0002551144244105409,
829
+ "epoch": 2.2173632357481683,
830
+ "step": 39500
831
+ },
832
+ {
833
+ "loss": 0.0025,
834
+ "grad_norm": 0.0003862242156174034,
835
+ "learning_rate": 0.0002538760649891024,
836
+ "epoch": 2.2454318354057317,
837
+ "step": 40000
838
+ },
839
+ {
840
+ "eval_loss": 0.0011998420814052224,
841
+ "eval_evaluator_0": 0.0013942737132310867,
842
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7852403092359551,
843
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.786725095511986,
844
+ "eval_JSTS_pearson_cosine": 0.8514233172740598,
845
+ "eval_JSTS_spearman_cosine": 0.8029822066091388,
846
+ "eval_sequential_score": 0.5303671919447853,
847
+ "eval_runtime": 75.9955,
848
+ "eval_samples_per_second": 2424.497,
849
+ "eval_steps_per_second": 4.737,
850
+ "epoch": 2.2454318354057317,
851
+ "step": 40000
852
+ },
853
+ {
854
+ "loss": 0.0025,
855
+ "grad_norm": 0.000405432831030339,
856
+ "learning_rate": 0.0002526377055676639,
857
+ "epoch": 2.2735004350632946,
858
+ "step": 40500
859
+ },
860
+ {
861
+ "loss": 0.0025,
862
+ "grad_norm": 0.0003881326410919428,
863
+ "learning_rate": 0.0002513993461462254,
864
+ "epoch": 2.301569034720858,
865
+ "step": 41000
866
+ },
867
+ {
868
+ "loss": 0.0025,
869
+ "grad_norm": 0.0003565926162991673,
870
+ "learning_rate": 0.000250160986724787,
871
+ "epoch": 2.329637634378421,
872
+ "step": 41500
873
+ },
874
+ {
875
+ "loss": 0.0025,
876
+ "grad_norm": 0.0004187309823464602,
877
+ "learning_rate": 0.0002489226273033485,
878
+ "epoch": 2.357706234035984,
879
+ "step": 42000
880
+ },
881
+ {
882
+ "eval_loss": 0.0011972869979217649,
883
+ "eval_evaluator_0": 0.0013917966280132532,
884
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7895009562886788,
885
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7909140535786724,
886
+ "eval_JSTS_pearson_cosine": 0.8530308729488678,
887
+ "eval_JSTS_spearman_cosine": 0.8044463484338609,
888
+ "eval_sequential_score": 0.5322507328801822,
889
+ "eval_runtime": 72.3948,
890
+ "eval_samples_per_second": 2545.087,
891
+ "eval_steps_per_second": 4.973,
892
+ "epoch": 2.357706234035984,
893
+ "step": 42000
894
+ },
895
+ {
896
+ "loss": 0.0025,
897
+ "grad_norm": 0.0004115802585147321,
898
+ "learning_rate": 0.00024768426788191004,
899
+ "epoch": 2.385774833693547,
900
+ "step": 42500
901
+ },
902
+ {
903
+ "loss": 0.0025,
904
+ "grad_norm": 0.00038897068588994443,
905
+ "learning_rate": 0.00024644590846047154,
906
+ "epoch": 2.41384343335111,
907
+ "step": 43000
908
+ },
909
+ {
910
+ "loss": 0.0024,
911
+ "grad_norm": 0.0004322814638726413,
912
+ "learning_rate": 0.0002452075490390331,
913
+ "epoch": 2.4419120330086734,
914
+ "step": 43500
915
+ },
916
+ {
917
+ "loss": 0.0024,
918
+ "grad_norm": 0.0003841428260784596,
919
+ "learning_rate": 0.00024396918961759457,
920
+ "epoch": 2.4699806326662364,
921
+ "step": 44000
922
+ },
923
+ {
924
+ "eval_loss": 0.0011959928087890148,
925
+ "eval_evaluator_0": 0.0013907015090808272,
926
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7889312871704597,
927
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7924519964178153,
928
+ "eval_JSTS_pearson_cosine": 0.8532937048934948,
929
+ "eval_JSTS_spearman_cosine": 0.8046551411750555,
930
+ "eval_sequential_score": 0.5328326130339839,
931
+ "eval_runtime": 70.5626,
932
+ "eval_samples_per_second": 2611.17,
933
+ "eval_steps_per_second": 5.102,
934
+ "epoch": 2.4699806326662364,
935
+ "step": 44000
936
+ },
937
+ {
938
+ "loss": 0.0024,
939
+ "grad_norm": 0.0003754703502636403,
940
+ "learning_rate": 0.0002427308301961561,
941
+ "epoch": 2.4980492323237993,
942
+ "step": 44500
943
+ },
944
+ {
945
+ "loss": 0.0024,
946
+ "grad_norm": 0.000383255654014647,
947
+ "learning_rate": 0.00024149247077471763,
948
+ "epoch": 2.526117831981362,
949
+ "step": 45000
950
+ },
951
+ {
952
+ "loss": 0.0024,
953
+ "grad_norm": 0.00039129829383455217,
954
+ "learning_rate": 0.00024025411135327916,
955
+ "epoch": 2.5541864316389256,
956
+ "step": 45500
957
+ },
958
+ {
959
+ "loss": 0.0024,
960
+ "grad_norm": 0.00036208087112754583,
961
+ "learning_rate": 0.00023901575193184066,
962
+ "epoch": 2.5822550312964885,
963
+ "step": 46000
964
+ },
965
+ {
966
+ "eval_loss": 0.001193435164168477,
967
+ "eval_evaluator_0": 0.0013883529463782907,
968
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7905069652961602,
969
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7944868870697853,
970
+ "eval_JSTS_pearson_cosine": 0.8558051282512369,
971
+ "eval_JSTS_spearman_cosine": 0.8081201355340467,
972
+ "eval_sequential_score": 0.5346651251834035,
973
+ "eval_runtime": 75.5038,
974
+ "eval_samples_per_second": 2440.286,
975
+ "eval_steps_per_second": 4.768,
976
+ "epoch": 2.5822550312964885,
977
+ "step": 46000
978
+ },
979
+ {
980
+ "loss": 0.0024,
981
+ "grad_norm": 0.00036889282637275755,
982
+ "learning_rate": 0.0002377773925104022,
983
+ "epoch": 2.610323630954052,
984
+ "step": 46500
985
+ },
986
+ {
987
+ "loss": 0.0024,
988
+ "grad_norm": 0.0004052662698086351,
989
+ "learning_rate": 0.00023653903308896375,
990
+ "epoch": 2.6383922306116148,
991
+ "step": 47000
992
+ },
993
+ {
994
+ "loss": 0.0024,
995
+ "grad_norm": 0.00035753531847149134,
996
+ "learning_rate": 0.00023530067366752525,
997
+ "epoch": 2.6664608302691777,
998
+ "step": 47500
999
+ },
1000
+ {
1001
+ "loss": 0.0024,
1002
+ "grad_norm": 0.00040843107854016125,
1003
+ "learning_rate": 0.00023406231424608678,
1004
+ "epoch": 2.694529429926741,
1005
+ "step": 48000
1006
+ },
1007
+ {
1008
+ "eval_loss": 0.001191267161630094,
1009
+ "eval_evaluator_0": 0.0013865241780877113,
1010
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7895450351964287,
1011
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7918359978315925,
1012
+ "eval_JSTS_pearson_cosine": 0.8549595123958019,
1013
+ "eval_JSTS_spearman_cosine": 0.807145525822447,
1014
+ "eval_sequential_score": 0.5334560159440423,
1015
+ "eval_runtime": 75.509,
1016
+ "eval_samples_per_second": 2440.121,
1017
+ "eval_steps_per_second": 4.768,
1018
+ "epoch": 2.694529429926741,
1019
+ "step": 48000
1020
+ },
1021
+ {
1022
+ "loss": 0.0024,
1023
+ "grad_norm": 0.0003651395963970572,
1024
+ "learning_rate": 0.00023282395482464828,
1025
+ "epoch": 2.722598029584304,
1026
+ "step": 48500
1027
+ },
1028
+ {
1029
+ "loss": 0.0024,
1030
+ "grad_norm": 0.0004023007059004158,
1031
+ "learning_rate": 0.00023158559540320983,
1032
+ "epoch": 2.7506666292418673,
1033
+ "step": 49000
1034
+ },
1035
+ {
1036
+ "loss": 0.0024,
1037
+ "grad_norm": 0.00035017222398892045,
1038
+ "learning_rate": 0.00023034723598177134,
1039
+ "epoch": 2.7787352288994303,
1040
+ "step": 49500
1041
+ },
1042
+ {
1043
+ "loss": 0.0024,
1044
+ "grad_norm": 0.00035437452606856823,
1045
+ "learning_rate": 0.00022910887656033287,
1046
+ "epoch": 2.806803828556993,
1047
+ "step": 50000
1048
+ },
1049
+ {
1050
+ "eval_loss": 0.0011895851930603385,
1051
+ "eval_evaluator_0": 0.0013849218375980854,
1052
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7922481852931719,
1053
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7945407372089203,
1054
+ "eval_JSTS_pearson_cosine": 0.853618309736644,
1055
+ "eval_JSTS_spearman_cosine": 0.806310240190843,
1056
+ "eval_sequential_score": 0.5340786330791204,
1057
+ "eval_runtime": 69.7354,
1058
+ "eval_samples_per_second": 2642.143,
1059
+ "eval_steps_per_second": 5.162,
1060
+ "epoch": 2.806803828556993,
1061
+ "step": 50000
1062
+ },
1063
+ {
1064
+ "loss": 0.0024,
1065
+ "grad_norm": 0.00036025006556883454,
1066
+ "learning_rate": 0.00022787051713889437,
1067
+ "epoch": 2.8348724282145565,
1068
+ "step": 50500
1069
+ },
1070
+ {
1071
+ "loss": 0.0024,
1072
+ "grad_norm": 0.00039166337228380144,
1073
+ "learning_rate": 0.0002266321577174559,
1074
+ "epoch": 2.8629410278721195,
1075
+ "step": 51000
1076
+ },
1077
+ {
1078
+ "loss": 0.0024,
1079
+ "grad_norm": 0.00034316867822781205,
1080
+ "learning_rate": 0.00022539379829601743,
1081
+ "epoch": 2.8910096275296824,
1082
+ "step": 51500
1083
+ },
1084
+ {
1085
+ "loss": 0.0024,
1086
+ "grad_norm": 0.00034904375206679106,
1087
+ "learning_rate": 0.00022415543887457896,
1088
+ "epoch": 2.9190782271872457,
1089
+ "step": 52000
1090
+ },
1091
+ {
1092
+ "eval_loss": 0.0011883709812536836,
1093
+ "eval_evaluator_0": 0.0013841136824339628,
1094
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7910741852449901,
1095
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7930385547288906,
1096
+ "eval_JSTS_pearson_cosine": 0.8560901785956929,
1097
+ "eval_JSTS_spearman_cosine": 0.8078193158631177,
1098
+ "eval_sequential_score": 0.534080661424814,
1099
+ "eval_runtime": 73.0378,
1100
+ "eval_samples_per_second": 2522.681,
1101
+ "eval_steps_per_second": 4.929,
1102
+ "epoch": 2.9190782271872457,
1103
+ "step": 52000
1104
+ },
1105
+ {
1106
+ "loss": 0.0024,
1107
+ "grad_norm": 0.00038310152012854815,
1108
+ "learning_rate": 0.00022291707945314046,
1109
+ "epoch": 2.9471468268448087,
1110
+ "step": 52500
1111
+ },
1112
+ {
1113
+ "loss": 0.0024,
1114
+ "grad_norm": 0.00037686576251871884,
1115
+ "learning_rate": 0.000221678720031702,
1116
+ "epoch": 2.9752154265023716,
1117
+ "step": 53000
1118
+ },
1119
+ {
1120
+ "loss": 0.0024,
1121
+ "grad_norm": 0.0003636401961557567,
1122
+ "learning_rate": 0.0002204403606102635,
1123
+ "epoch": 3.0032559575602775,
1124
+ "step": 53500
1125
+ },
1126
+ {
1127
+ "loss": 0.0024,
1128
+ "grad_norm": 0.0003593047440517694,
1129
+ "learning_rate": 0.00021920200118882505,
1130
+ "epoch": 3.0313245572178404,
1131
+ "step": 54000
1132
+ },
1133
+ {
1134
+ "eval_loss": 0.001186997164040804,
1135
+ "eval_evaluator_0": 0.0013827680377289653,
1136
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7928793416853003,
1137
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7946997311878671,
1138
+ "eval_JSTS_pearson_cosine": 0.8561692207448726,
1139
+ "eval_JSTS_spearman_cosine": 0.8070635079461249,
1140
+ "eval_sequential_score": 0.5343820023905737,
1141
+ "eval_runtime": 74.9587,
1142
+ "eval_samples_per_second": 2458.035,
1143
+ "eval_steps_per_second": 4.803,
1144
+ "epoch": 3.0313245572178404,
1145
+ "step": 54000
1146
+ },
1147
+ {
1148
+ "loss": 0.0024,
1149
+ "grad_norm": 0.0003633006999734789,
1150
+ "learning_rate": 0.00021796364176738655,
1151
+ "epoch": 3.0593931568754034,
1152
+ "step": 54500
1153
+ },
1154
+ {
1155
+ "loss": 0.0024,
1156
+ "grad_norm": 0.00038046957342885435,
1157
+ "learning_rate": 0.00021672528234594808,
1158
+ "epoch": 3.0874617565329667,
1159
+ "step": 55000
1160
+ },
1161
+ {
1162
+ "loss": 0.0024,
1163
+ "grad_norm": 0.000375107309082523,
1164
+ "learning_rate": 0.00021548692292450958,
1165
+ "epoch": 3.1155303561905296,
1166
+ "step": 55500
1167
+ },
1168
+ {
1169
+ "loss": 0.0024,
1170
+ "grad_norm": 0.00037250894820317626,
1171
+ "learning_rate": 0.00021424856350307114,
1172
+ "epoch": 3.1435989558480926,
1173
+ "step": 56000
1174
+ },
1175
+ {
1176
+ "eval_loss": 0.001185316708870232,
1177
+ "eval_evaluator_0": 0.0013811604585498571,
1178
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7940863256555963,
1179
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7954504513319767,
1180
+ "eval_JSTS_pearson_cosine": 0.8560864211225326,
1181
+ "eval_JSTS_spearman_cosine": 0.8077153694140476,
1182
+ "eval_sequential_score": 0.5348489937348581,
1183
+ "eval_runtime": 69.9604,
1184
+ "eval_samples_per_second": 2633.647,
1185
+ "eval_steps_per_second": 5.146,
1186
+ "epoch": 3.1435989558480926,
1187
+ "step": 56000
1188
+ },
1189
+ {
1190
+ "loss": 0.0024,
1191
+ "grad_norm": 0.00036285867099650204,
1192
+ "learning_rate": 0.00021301020408163264,
1193
+ "epoch": 3.171667555505656,
1194
+ "step": 56500
1195
+ },
1196
+ {
1197
+ "loss": 0.0024,
1198
+ "grad_norm": 0.00036395539063960314,
1199
+ "learning_rate": 0.00021177184466019417,
1200
+ "epoch": 3.199736155163219,
1201
+ "step": 57000
1202
+ },
1203
+ {
1204
+ "loss": 0.0024,
1205
+ "grad_norm": 0.0003883049066644162,
1206
+ "learning_rate": 0.00021053348523875567,
1207
+ "epoch": 3.2278047548207818,
1208
+ "step": 57500
1209
+ },
1210
+ {
1211
+ "loss": 0.0024,
1212
+ "grad_norm": 0.00039045579615049064,
1213
+ "learning_rate": 0.0002092951258173172,
1214
+ "epoch": 3.255873354478345,
1215
+ "step": 58000
1216
+ },
1217
+ {
1218
+ "eval_loss": 0.0011840644292533398,
1219
+ "eval_evaluator_0": 0.0013800367014482617,
1220
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7945901145209169,
1221
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7969071310400232,
1222
+ "eval_JSTS_pearson_cosine": 0.8564336996988411,
1223
+ "eval_JSTS_spearman_cosine": 0.8082929419515673,
1224
+ "eval_sequential_score": 0.5355267032310129,
1225
+ "eval_runtime": 73.7991,
1226
+ "eval_samples_per_second": 2496.655,
1227
+ "eval_steps_per_second": 4.878,
1228
+ "epoch": 3.255873354478345,
1229
+ "step": 58000
1230
+ },
1231
+ {
1232
+ "loss": 0.0024,
1233
+ "grad_norm": 0.0003762798151001334,
1234
+ "learning_rate": 0.00020805676639587873,
1235
+ "epoch": 3.283941954135908,
1236
+ "step": 58500
1237
+ },
1238
+ {
1239
+ "loss": 0.0024,
1240
+ "grad_norm": 0.00034586797119118273,
1241
+ "learning_rate": 0.00020681840697444026,
1242
+ "epoch": 3.3120105537934714,
1243
+ "step": 59000
1244
+ },
1245
+ {
1246
+ "loss": 0.0024,
1247
+ "grad_norm": 0.0003964265051763505,
1248
+ "learning_rate": 0.00020558004755300176,
1249
+ "epoch": 3.3400791534510343,
1250
+ "step": 59500
1251
+ },
1252
+ {
1253
+ "loss": 0.0024,
1254
+ "grad_norm": 0.00035214261151850224,
1255
+ "learning_rate": 0.0002043416881315633,
1256
+ "epoch": 3.3681477531085973,
1257
+ "step": 60000
1258
+ },
1259
+ {
1260
+ "eval_loss": 0.001183054526336491,
1261
+ "eval_evaluator_0": 0.0013789632357656956,
1262
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.788845469490114,
1263
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7915900973042301,
1264
+ "eval_JSTS_pearson_cosine": 0.8573217830132864,
1265
+ "eval_JSTS_spearman_cosine": 0.8088596257787191,
1266
+ "eval_sequential_score": 0.5339428954395716,
1267
+ "eval_runtime": 74.7649,
1268
+ "eval_samples_per_second": 2464.404,
1269
+ "eval_steps_per_second": 4.815,
1270
+ "epoch": 3.3681477531085973,
1271
+ "step": 60000
1272
+ },
1273
+ {
1274
+ "loss": 0.0024,
1275
+ "grad_norm": 0.00036784267285838723,
1276
+ "learning_rate": 0.0002031033287101248,
1277
+ "epoch": 3.3962163527661606,
1278
+ "step": 60500
1279
+ },
1280
+ {
1281
+ "loss": 0.0024,
1282
+ "grad_norm": 0.000343677238561213,
1283
+ "learning_rate": 0.00020186496928868635,
1284
+ "epoch": 3.4242849524237235,
1285
+ "step": 61000
1286
+ },
1287
+ {
1288
+ "loss": 0.0024,
1289
+ "grad_norm": 0.00038110592868179083,
1290
+ "learning_rate": 0.00020062660986724785,
1291
+ "epoch": 3.452353552081287,
1292
+ "step": 61500
1293
+ },
1294
+ {
1295
+ "loss": 0.0024,
1296
+ "grad_norm": 0.0003722326655406505,
1297
+ "learning_rate": 0.00019938825044580938,
1298
+ "epoch": 3.48042215173885,
1299
+ "step": 62000
1300
+ },
1301
+ {
1302
+ "eval_loss": 0.0011825228575617075,
1303
+ "eval_evaluator_0": 0.00137852702755481,
1304
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7918929978349496,
1305
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7940915393599467,
1306
+ "eval_JSTS_pearson_cosine": 0.8575694411337575,
1307
+ "eval_JSTS_spearman_cosine": 0.8092433884345962,
1308
+ "eval_sequential_score": 0.5349044849406992,
1309
+ "eval_runtime": 73.3458,
1310
+ "eval_samples_per_second": 2512.087,
1311
+ "eval_steps_per_second": 4.908,
1312
+ "epoch": 3.48042215173885,
1313
+ "step": 62000
1314
+ },
1315
+ {
1316
+ "loss": 0.0024,
1317
+ "grad_norm": 0.00037806775071658194,
1318
+ "learning_rate": 0.00019814989102437088,
1319
+ "epoch": 3.5084907513964128,
1320
+ "step": 62500
1321
+ },
1322
+ {
1323
+ "loss": 0.0024,
1324
+ "grad_norm": 0.00033235494629479945,
1325
+ "learning_rate": 0.00019691153160293244,
1326
+ "epoch": 3.5365593510539757,
1327
+ "step": 63000
1328
+ },
1329
+ {
1330
+ "loss": 0.0024,
1331
+ "grad_norm": 0.0003531025140546262,
1332
+ "learning_rate": 0.00019567317218149394,
1333
+ "epoch": 3.564627950711539,
1334
+ "step": 63500
1335
+ },
1336
+ {
1337
+ "loss": 0.0024,
1338
+ "grad_norm": 0.00037601080839522183,
1339
+ "learning_rate": 0.00019443481276005547,
1340
+ "epoch": 3.592696550369102,
1341
+ "step": 64000
1342
+ },
1343
+ {
1344
+ "eval_loss": 0.0011816879268735647,
1345
+ "eval_evaluator_0": 0.0013777543790638447,
1346
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.793978583231683,
1347
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7965635420461006,
1348
+ "eval_JSTS_pearson_cosine": 0.8593126185258553,
1349
+ "eval_JSTS_spearman_cosine": 0.8111660595724337,
1350
+ "eval_sequential_score": 0.536369118665866,
1351
+ "eval_runtime": 73.2876,
1352
+ "eval_samples_per_second": 2514.083,
1353
+ "eval_steps_per_second": 4.912,
1354
+ "epoch": 3.592696550369102,
1355
+ "step": 64000
1356
+ },
1357
+ {
1358
+ "loss": 0.0024,
1359
+ "grad_norm": 0.00038335853605531156,
1360
+ "learning_rate": 0.00019319645333861697,
1361
+ "epoch": 3.6207651500266653,
1362
+ "step": 64500
1363
+ },
1364
+ {
1365
+ "loss": 0.0024,
1366
+ "grad_norm": 0.0003810620401054621,
1367
+ "learning_rate": 0.00019195809391717853,
1368
+ "epoch": 3.6488337496842282,
1369
+ "step": 65000
1370
+ },
1371
+ {
1372
+ "loss": 0.0024,
1373
+ "grad_norm": 0.00036980482400394976,
1374
+ "learning_rate": 0.00019071973449574003,
1375
+ "epoch": 3.676902349341791,
1376
+ "step": 65500
1377
+ },
1378
+ {
1379
+ "loss": 0.0024,
1380
+ "grad_norm": 0.00035592226777225733,
1381
+ "learning_rate": 0.00018948137507430156,
1382
+ "epoch": 3.7049709489993545,
1383
+ "step": 66000
1384
+ },
1385
+ {
1386
+ "eval_loss": 0.001180025632493198,
1387
+ "eval_evaluator_0": 0.0013761830050498247,
1388
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7942013155693763,
1389
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.795690335055399,
1390
+ "eval_JSTS_pearson_cosine": 0.8574208920400916,
1391
+ "eval_JSTS_spearman_cosine": 0.8088050540873077,
1392
+ "eval_sequential_score": 0.5352905240492521,
1393
+ "eval_runtime": 71.1029,
1394
+ "eval_samples_per_second": 2591.329,
1395
+ "eval_steps_per_second": 5.063,
1396
+ "epoch": 3.7049709489993545,
1397
+ "step": 66000
1398
+ },
1399
+ {
1400
+ "loss": 0.0024,
1401
+ "grad_norm": 0.0003833669179584831,
1402
+ "learning_rate": 0.00018824301565286306,
1403
+ "epoch": 3.7330395486569175,
1404
+ "step": 66500
1405
+ },
1406
+ {
1407
+ "loss": 0.0024,
1408
+ "grad_norm": 0.00034893417614512146,
1409
+ "learning_rate": 0.0001870046562314246,
1410
+ "epoch": 3.761108148314481,
1411
+ "step": 67000
1412
+ },
1413
+ {
1414
+ "loss": 0.0024,
1415
+ "grad_norm": 0.00034797278931364417,
1416
+ "learning_rate": 0.0001857662968099861,
1417
+ "epoch": 3.7891767479720437,
1418
+ "step": 67500
1419
+ },
1420
+ {
1421
+ "loss": 0.0024,
1422
+ "grad_norm": 0.00033796075149439275,
1423
+ "learning_rate": 0.00018452793738854765,
1424
+ "epoch": 3.8172453476296067,
1425
+ "step": 68000
1426
+ },
1427
+ {
1428
+ "eval_loss": 0.0011794335441663861,
1429
+ "eval_evaluator_0": 0.001375699182972312,
1430
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7927948755224404,
1431
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7964637470268982,
1432
+ "eval_JSTS_pearson_cosine": 0.8585216790750501,
1433
+ "eval_JSTS_spearman_cosine": 0.8104250125798427,
1434
+ "eval_sequential_score": 0.5360881529299043,
1435
+ "eval_runtime": 74.6048,
1436
+ "eval_samples_per_second": 2469.694,
1437
+ "eval_steps_per_second": 4.825,
1438
+ "epoch": 3.8172453476296067,
1439
+ "step": 68000
1440
+ },
1441
+ {
1442
+ "loss": 0.0024,
1443
+ "grad_norm": 0.00034984093508683145,
1444
+ "learning_rate": 0.00018328957796710915,
1445
+ "epoch": 3.8453139472871696,
1446
+ "step": 68500
1447
+ },
1448
+ {
1449
+ "loss": 0.0024,
1450
+ "grad_norm": 0.0003730449534486979,
1451
+ "learning_rate": 0.00018205121854567068,
1452
+ "epoch": 3.873382546944733,
1453
+ "step": 69000
1454
+ },
1455
+ {
1456
+ "loss": 0.0024,
1457
+ "grad_norm": 0.0003637449990492314,
1458
+ "learning_rate": 0.00018081285912423218,
1459
+ "epoch": 3.901451146602296,
1460
+ "step": 69500
1461
+ },
1462
+ {
1463
+ "loss": 0.0024,
1464
+ "grad_norm": 0.00034909258829429746,
1465
+ "learning_rate": 0.00017957449970279374,
1466
+ "epoch": 3.9295197462598592,
1467
+ "step": 70000
1468
+ },
1469
+ {
1470
+ "eval_loss": 0.0011782796354964375,
1471
+ "eval_evaluator_0": 0.0013747760094702244,
1472
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7922546321768296,
1473
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7948323017639329,
1474
+ "eval_JSTS_pearson_cosine": 0.8583284467056898,
1475
+ "eval_JSTS_spearman_cosine": 0.8101003363786926,
1476
+ "eval_sequential_score": 0.5354358047173653,
1477
+ "eval_runtime": 76.7277,
1478
+ "eval_samples_per_second": 2401.361,
1479
+ "eval_steps_per_second": 4.692,
1480
+ "epoch": 3.9295197462598592,
1481
+ "step": 70000
1482
+ },
1483
+ {
1484
+ "loss": 0.0024,
1485
+ "grad_norm": 0.0003403511946089566,
1486
+ "learning_rate": 0.00017833614028135524,
1487
+ "epoch": 3.957588345917422,
1488
+ "step": 70500
1489
+ },
1490
+ {
1491
+ "loss": 0.0024,
1492
+ "grad_norm": 0.0003837372059933841,
1493
+ "learning_rate": 0.00017709778085991677,
1494
+ "epoch": 3.985656945574985,
1495
+ "step": 71000
1496
+ },
1497
+ {
1498
+ "loss": 0.0024,
1499
+ "grad_norm": 0.0003609205596148968,
1500
+ "learning_rate": 0.00017585942143847827,
1501
+ "epoch": 4.013697476632891,
1502
+ "step": 71500
1503
+ },
1504
+ {
1505
+ "loss": 0.0024,
1506
+ "grad_norm": 0.00035595818189904094,
1507
+ "learning_rate": 0.00017462106201703983,
1508
+ "epoch": 4.041766076290454,
1509
+ "step": 72000
1510
+ },
1511
+ {
1512
+ "eval_loss": 0.0011778810294345021,
1513
+ "eval_evaluator_0": 0.00137452338822186,
1514
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7966269870347454,
1515
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7984801447329247,
1516
+ "eval_JSTS_pearson_cosine": 0.8607631663353562,
1517
+ "eval_JSTS_spearman_cosine": 0.8129269124974222,
1518
+ "eval_sequential_score": 0.5375938602061896,
1519
+ "eval_runtime": 74.4094,
1520
+ "eval_samples_per_second": 2476.18,
1521
+ "eval_steps_per_second": 4.838,
1522
+ "epoch": 4.041766076290454,
1523
+ "step": 72000
1524
+ },
1525
+ {
1526
+ "loss": 0.0024,
1527
+ "grad_norm": 0.0003813849180005491,
1528
+ "learning_rate": 0.00017338270259560133,
1529
+ "epoch": 4.069834675948017,
1530
+ "step": 72500
1531
+ },
1532
+ {
1533
+ "loss": 0.0024,
1534
+ "grad_norm": 0.0003693049948196858,
1535
+ "learning_rate": 0.00017214434317416286,
1536
+ "epoch": 4.09790327560558,
1537
+ "step": 73000
1538
+ },
1539
+ {
1540
+ "loss": 0.0024,
1541
+ "grad_norm": 0.00035990544711239636,
1542
+ "learning_rate": 0.00017090598375272436,
1543
+ "epoch": 4.125971875263144,
1544
+ "step": 73500
1545
+ },
1546
+ {
1547
+ "loss": 0.0024,
1548
+ "grad_norm": 0.0003601144708227366,
1549
+ "learning_rate": 0.0001696676243312859,
1550
+ "epoch": 4.1540404749207065,
1551
+ "step": 74000
1552
+ },
1553
+ {
1554
+ "eval_loss": 0.0011768318945541978,
1555
+ "eval_evaluator_0": 0.0013735340908169746,
1556
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7960225056174051,
1557
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.796425881500273,
1558
+ "eval_JSTS_pearson_cosine": 0.8595126900313504,
1559
+ "eval_JSTS_spearman_cosine": 0.8113933156160398,
1560
+ "eval_sequential_score": 0.5363975770690432,
1561
+ "eval_runtime": 72.397,
1562
+ "eval_samples_per_second": 2545.009,
1563
+ "eval_steps_per_second": 4.973,
1564
+ "epoch": 4.1540404749207065,
1565
+ "step": 74000
1566
+ },
1567
+ {
1568
+ "loss": 0.0024,
1569
+ "grad_norm": 0.00032057068892754614,
1570
+ "learning_rate": 0.00016842926490984742,
1571
+ "epoch": 4.182109074578269,
1572
+ "step": 74500
1573
+ },
1574
+ {
1575
+ "loss": 0.0024,
1576
+ "grad_norm": 0.00034604035317897797,
1577
+ "learning_rate": 0.00016719090548840895,
1578
+ "epoch": 4.210177674235832,
1579
+ "step": 75000
1580
+ },
1581
+ {
1582
+ "loss": 0.0024,
1583
+ "grad_norm": 0.0003545973158907145,
1584
+ "learning_rate": 0.00016595254606697045,
1585
+ "epoch": 4.238246273893395,
1586
+ "step": 75500
1587
+ },
1588
+ {
1589
+ "loss": 0.0024,
1590
+ "grad_norm": 0.00034211084130220115,
1591
+ "learning_rate": 0.00016471418664553198,
1592
+ "epoch": 4.266314873550958,
1593
+ "step": 76000
1594
+ },
1595
+ {
1596
+ "eval_loss": 0.0011760848574340343,
1597
+ "eval_evaluator_0": 0.0013727085897698998,
1598
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7948903122497631,
1599
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7964051915239347,
1600
+ "eval_JSTS_pearson_cosine": 0.8586153899615026,
1601
+ "eval_JSTS_spearman_cosine": 0.8105045038989925,
1602
+ "eval_sequential_score": 0.5360941346708991,
1603
+ "eval_runtime": 75.1808,
1604
+ "eval_samples_per_second": 2450.773,
1605
+ "eval_steps_per_second": 4.788,
1606
+ "epoch": 4.266314873550958,
1607
+ "step": 76000
1608
+ },
1609
+ {
1610
+ "loss": 0.0024,
1611
+ "grad_norm": 0.00035312894033268094,
1612
+ "learning_rate": 0.00016347582722409348,
1613
+ "epoch": 4.294383473208522,
1614
+ "step": 76500
1615
+ },
1616
+ {
1617
+ "loss": 0.0024,
1618
+ "grad_norm": 0.0003673464816529304,
1619
+ "learning_rate": 0.00016223746780265504,
1620
+ "epoch": 4.322452072866085,
1621
+ "step": 77000
1622
+ },
1623
+ {
1624
+ "loss": 0.0024,
1625
+ "grad_norm": 0.00035960401874035597,
1626
+ "learning_rate": 0.00016099910838121654,
1627
+ "epoch": 4.350520672523648,
1628
+ "step": 77500
1629
+ },
1630
+ {
1631
+ "loss": 0.0024,
1632
+ "grad_norm": 0.00036072355578653514,
1633
+ "learning_rate": 0.00015976074895977807,
1634
+ "epoch": 4.378589272181211,
1635
+ "step": 78000
1636
+ },
1637
+ {
1638
+ "eval_loss": 0.0011753218714147806,
1639
+ "eval_evaluator_0": 0.0013719868147745728,
1640
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.795324066980847,
1641
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7974887620221349,
1642
+ "eval_JSTS_pearson_cosine": 0.8592072333284577,
1643
+ "eval_JSTS_spearman_cosine": 0.8109905493632175,
1644
+ "eval_sequential_score": 0.5366170994000423,
1645
+ "eval_runtime": 75.244,
1646
+ "eval_samples_per_second": 2448.715,
1647
+ "eval_steps_per_second": 4.784,
1648
+ "epoch": 4.378589272181211,
1649
+ "step": 78000
1650
+ },
1651
+ {
1652
+ "loss": 0.0024,
1653
+ "grad_norm": 0.00036980511504225433,
1654
+ "learning_rate": 0.00015852238953833957,
1655
+ "epoch": 4.406657871838774,
1656
+ "step": 78500
1657
+ },
1658
+ {
1659
+ "loss": 0.0024,
1660
+ "grad_norm": 0.00036862987326458097,
1661
+ "learning_rate": 0.00015728403011690113,
1662
+ "epoch": 4.434726471496337,
1663
+ "step": 79000
1664
+ },
1665
+ {
1666
+ "loss": 0.0024,
1667
+ "grad_norm": 0.0003497784200590104,
1668
+ "learning_rate": 0.00015604567069546263,
1669
+ "epoch": 4.4627950711539,
1670
+ "step": 79500
1671
+ },
1672
+ {
1673
+ "loss": 0.0024,
1674
+ "grad_norm": 0.0003609252453316003,
1675
+ "learning_rate": 0.00015480731127402416,
1676
+ "epoch": 4.490863670811463,
1677
+ "step": 80000
1678
+ },
1679
+ {
1680
+ "eval_loss": 0.0011746675008907914,
1681
+ "eval_evaluator_0": 0.001371518475934863,
1682
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7943241357225923,
1683
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7958623517517603,
1684
+ "eval_JSTS_pearson_cosine": 0.8595840714983618,
1685
+ "eval_JSTS_spearman_cosine": 0.8113223465685817,
1686
+ "eval_sequential_score": 0.536185405598759,
1687
+ "eval_runtime": 71.8413,
1688
+ "eval_samples_per_second": 2564.696,
1689
+ "eval_steps_per_second": 5.011,
1690
+ "epoch": 4.490863670811463,
1691
+ "step": 80000
1692
+ },
1693
+ {
1694
+ "loss": 0.0024,
1695
+ "grad_norm": 0.0003620493516791612,
1696
+ "learning_rate": 0.00015356895185258566,
1697
+ "epoch": 4.518932270469026,
1698
+ "step": 80500
1699
+ },
1700
+ {
1701
+ "loss": 0.0024,
1702
+ "grad_norm": 0.0003764710854738951,
1703
+ "learning_rate": 0.0001523305924311472,
1704
+ "epoch": 4.547000870126589,
1705
+ "step": 81000
1706
+ },
1707
+ {
1708
+ "loss": 0.0024,
1709
+ "grad_norm": 0.0003772643976844847,
1710
+ "learning_rate": 0.00015109223300970875,
1711
+ "epoch": 4.575069469784152,
1712
+ "step": 81500
1713
+ },
1714
+ {
1715
+ "loss": 0.0024,
1716
+ "grad_norm": 0.00034990202402696013,
1717
+ "learning_rate": 0.00014985387358827025,
1718
+ "epoch": 4.603138069441716,
1719
+ "step": 82000
1720
+ },
1721
+ {
1722
+ "eval_loss": 0.001174184144474566,
1723
+ "eval_evaluator_0": 0.0013709234772250056,
1724
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7957272326156352,
1725
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7978851542982252,
1726
+ "eval_JSTS_pearson_cosine": 0.8601821520227879,
1727
+ "eval_JSTS_spearman_cosine": 0.8119161982838334,
1728
+ "eval_sequential_score": 0.5370574253530945,
1729
+ "eval_runtime": 73.1077,
1730
+ "eval_samples_per_second": 2520.268,
1731
+ "eval_steps_per_second": 4.924,
1732
+ "epoch": 4.603138069441716,
1733
+ "step": 82000
1734
+ },
1735
+ {
1736
+ "loss": 0.0024,
1737
+ "grad_norm": 0.000373341899830848,
1738
+ "learning_rate": 0.00014861551416683178,
1739
+ "epoch": 4.631206669099279,
1740
+ "step": 82500
1741
+ },
1742
+ {
1743
+ "loss": 0.0024,
1744
+ "grad_norm": 0.0003926403005607426,
1745
+ "learning_rate": 0.00014737715474539328,
1746
+ "epoch": 4.659275268756842,
1747
+ "step": 83000
1748
+ },
1749
+ {
1750
+ "loss": 0.0024,
1751
+ "grad_norm": 0.00034069083631038666,
1752
+ "learning_rate": 0.0001461387953239548,
1753
+ "epoch": 4.687343868414405,
1754
+ "step": 83500
1755
+ },
1756
+ {
1757
+ "loss": 0.0024,
1758
+ "grad_norm": 0.000384105573175475,
1759
+ "learning_rate": 0.00014490043590251634,
1760
+ "epoch": 4.715412468071968,
1761
+ "step": 84000
1762
+ },
1763
+ {
1764
+ "eval_loss": 0.0011737036984413862,
1765
+ "eval_evaluator_0": 0.0013708064798265696,
1766
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7956464956146001,
1767
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.798015078865507,
1768
+ "eval_JSTS_pearson_cosine": 0.8602937354382172,
1769
+ "eval_JSTS_spearman_cosine": 0.8122788292042248,
1770
+ "eval_sequential_score": 0.5372215715165195,
1771
+ "eval_runtime": 76.0545,
1772
+ "eval_samples_per_second": 2422.618,
1773
+ "eval_steps_per_second": 4.733,
1774
+ "epoch": 4.715412468071968,
1775
+ "step": 84000
1776
+ },
1777
+ {
1778
+ "loss": 0.0024,
1779
+ "grad_norm": 0.0003545938234310597,
1780
+ "learning_rate": 0.00014366207648107784,
1781
+ "epoch": 4.743481067729531,
1782
+ "step": 84500
1783
+ },
1784
+ {
1785
+ "loss": 0.0024,
1786
+ "grad_norm": 0.00034860073355957866,
1787
+ "learning_rate": 0.00014242371705963937,
1788
+ "epoch": 4.771549667387094,
1789
+ "step": 85000
1790
+ },
1791
+ {
1792
+ "loss": 0.0024,
1793
+ "grad_norm": 0.00032712245592847466,
1794
+ "learning_rate": 0.0001411853576382009,
1795
+ "epoch": 4.799618267044657,
1796
+ "step": 85500
1797
+ },
1798
+ {
1799
+ "loss": 0.0024,
1800
+ "grad_norm": 0.0003669277939479798,
1801
+ "learning_rate": 0.00013994699821676243,
1802
+ "epoch": 4.82768686670222,
1803
+ "step": 86000
1804
+ },
1805
+ {
1806
+ "eval_loss": 0.0011725523509085178,
1807
+ "eval_evaluator_0": 0.0013694484950974584,
1808
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7944287962917387,
1809
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7962719660375107,
1810
+ "eval_JSTS_pearson_cosine": 0.8603342393116733,
1811
+ "eval_JSTS_spearman_cosine": 0.8117996307099361,
1812
+ "eval_sequential_score": 0.5364803484141815,
1813
+ "eval_runtime": 72.4308,
1814
+ "eval_samples_per_second": 2543.82,
1815
+ "eval_steps_per_second": 4.97,
1816
+ "epoch": 4.82768686670222,
1817
+ "step": 86000
1818
+ },
1819
+ {
1820
+ "loss": 0.0024,
1821
+ "grad_norm": 0.0003452952078077942,
1822
+ "learning_rate": 0.00013870863879532393,
1823
+ "epoch": 4.855755466359783,
1824
+ "step": 86500
1825
+ },
1826
+ {
1827
+ "loss": 0.0024,
1828
+ "grad_norm": 0.000373579008737579,
1829
+ "learning_rate": 0.00013747027937388546,
1830
+ "epoch": 4.883824066017347,
1831
+ "step": 87000
1832
+ },
1833
+ {
1834
+ "loss": 0.0024,
1835
+ "grad_norm": 0.00036539442953653634,
1836
+ "learning_rate": 0.000136231919952447,
1837
+ "epoch": 4.91189266567491,
1838
+ "step": 87500
1839
+ },
1840
+ {
1841
+ "loss": 0.0024,
1842
+ "grad_norm": 0.00034126825630664825,
1843
+ "learning_rate": 0.00013499356053100852,
1844
+ "epoch": 4.939961265332473,
1845
+ "step": 88000
1846
+ },
1847
+ {
1848
+ "eval_loss": 0.0011726580560207367,
1849
+ "eval_evaluator_0": 0.0013694794615730643,
1850
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7947822441711181,
1851
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7985567920723605,
1852
+ "eval_JSTS_pearson_cosine": 0.8603459767041294,
1853
+ "eval_JSTS_spearman_cosine": 0.8125872790778134,
1854
+ "eval_sequential_score": 0.5375045168705823,
1855
+ "eval_runtime": 72.512,
1856
+ "eval_samples_per_second": 2540.973,
1857
+ "eval_steps_per_second": 4.965,
1858
+ "epoch": 4.939961265332473,
1859
+ "step": 88000
1860
+ },
1861
+ {
1862
+ "loss": 0.0024,
1863
+ "grad_norm": 0.0003895787231158465,
1864
+ "learning_rate": 0.00013375520110957002,
1865
+ "epoch": 4.968029864990036,
1866
+ "step": 88500
1867
+ },
1868
+ {
1869
+ "loss": 0.0024,
1870
+ "grad_norm": 0.00034253252670168877,
1871
+ "learning_rate": 0.00013251684168813155,
1872
+ "epoch": 4.9960984646475985,
1873
+ "step": 89000
1874
+ },
1875
+ {
1876
+ "loss": 0.0024,
1877
+ "grad_norm": 0.000371816277038306,
1878
+ "learning_rate": 0.00013127848226669308,
1879
+ "epoch": 5.0241389957055045,
1880
+ "step": 89500
1881
+ },
1882
+ {
1883
+ "loss": 0.0024,
1884
+ "grad_norm": 0.00035908666905015707,
1885
+ "learning_rate": 0.00013004012284525458,
1886
+ "epoch": 5.052207595363067,
1887
+ "step": 90000
1888
+ },
1889
+ {
1890
+ "eval_loss": 0.0011713592102751136,
1891
+ "eval_evaluator_0": 0.0013685176381841302,
1892
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7972527826834843,
1893
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7994197549835058,
1894
+ "eval_JSTS_pearson_cosine": 0.8598953052873622,
1895
+ "eval_JSTS_spearman_cosine": 0.8121179166649464,
1896
+ "eval_sequential_score": 0.5376353964288788,
1897
+ "eval_runtime": 73.9539,
1898
+ "eval_samples_per_second": 2491.431,
1899
+ "eval_steps_per_second": 4.868,
1900
+ "epoch": 5.052207595363067,
1901
+ "step": 90000
1902
+ },
1903
+ {
1904
+ "loss": 0.0024,
1905
+ "grad_norm": 0.0003359410329721868,
1906
+ "learning_rate": 0.0001288017634238161,
1907
+ "epoch": 5.08027619502063,
1908
+ "step": 90500
1909
+ },
1910
+ {
1911
+ "loss": 0.0024,
1912
+ "grad_norm": 0.0003694299957714975,
1913
+ "learning_rate": 0.00012756340400237764,
1914
+ "epoch": 5.108344794678193,
1915
+ "step": 91000
1916
+ },
1917
+ {
1918
+ "loss": 0.0024,
1919
+ "grad_norm": 0.00035323482006788254,
1920
+ "learning_rate": 0.00012632504458093917,
1921
+ "epoch": 5.136413394335756,
1922
+ "step": 91500
1923
+ },
1924
+ {
1925
+ "loss": 0.0024,
1926
+ "grad_norm": 0.00034838326973840594,
1927
+ "learning_rate": 0.00012508668515950067,
1928
+ "epoch": 5.16448199399332,
1929
+ "step": 92000
1930
+ },
1931
+ {
1932
+ "eval_loss": 0.001171564101241529,
1933
+ "eval_evaluator_0": 0.0013685659505426884,
1934
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7956367663644146,
1935
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7972514768821104,
1936
+ "eval_JSTS_pearson_cosine": 0.8603210424512124,
1937
+ "eval_JSTS_spearman_cosine": 0.812035751082205,
1938
+ "eval_sequential_score": 0.5368852646382861,
1939
+ "eval_runtime": 73.7319,
1940
+ "eval_samples_per_second": 2498.932,
1941
+ "eval_steps_per_second": 4.883,
1942
+ "epoch": 5.16448199399332,
1943
+ "step": 92000
1944
+ },
1945
+ {
1946
+ "loss": 0.0024,
1947
+ "grad_norm": 0.000366888998541981,
1948
+ "learning_rate": 0.0001238483257380622,
1949
+ "epoch": 5.192550593650883,
1950
+ "step": 92500
1951
+ },
1952
+ {
1953
+ "loss": 0.0024,
1954
+ "grad_norm": 0.0003418942214921117,
1955
+ "learning_rate": 0.00012260996631662373,
1956
+ "epoch": 5.220619193308446,
1957
+ "step": 93000
1958
+ },
1959
+ {
1960
+ "loss": 0.0024,
1961
+ "grad_norm": 0.0003668048302643001,
1962
+ "learning_rate": 0.00012137160689518524,
1963
+ "epoch": 5.248687792966009,
1964
+ "step": 93500
1965
+ },
1966
+ {
1967
+ "loss": 0.0024,
1968
+ "grad_norm": 0.0003521388862282038,
1969
+ "learning_rate": 0.00012013324747374676,
1970
+ "epoch": 5.276756392623572,
1971
+ "step": 94000
1972
+ },
1973
+ {
1974
+ "eval_loss": 0.0011708271922543645,
1975
+ "eval_evaluator_0": 0.0013678998220711946,
1976
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7961980730869951,
1977
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7969669668186664,
1978
+ "eval_JSTS_pearson_cosine": 0.8598902041888927,
1979
+ "eval_JSTS_spearman_cosine": 0.8122759372680154,
1980
+ "eval_sequential_score": 0.5368702679695844,
1981
+ "eval_runtime": 73.7553,
1982
+ "eval_samples_per_second": 2498.141,
1983
+ "eval_steps_per_second": 4.881,
1984
+ "epoch": 5.276756392623572,
1985
+ "step": 94000
1986
+ },
1987
+ {
1988
+ "loss": 0.0024,
1989
+ "grad_norm": 0.00038587721064686775,
1990
+ "learning_rate": 0.00011889488805230829,
1991
+ "epoch": 5.3048249922811355,
1992
+ "step": 94500
1993
+ },
1994
+ {
1995
+ "loss": 0.0024,
1996
+ "grad_norm": 0.0003625387034844607,
1997
+ "learning_rate": 0.0001176565286308698,
1998
+ "epoch": 5.332893591938698,
1999
+ "step": 95000
2000
+ },
2001
+ {
2002
+ "loss": 0.0024,
2003
+ "grad_norm": 0.000346001994330436,
2004
+ "learning_rate": 0.00011641816920943133,
2005
+ "epoch": 5.360962191596261,
2006
+ "step": 95500
2007
+ },
2008
+ {
2009
+ "loss": 0.0024,
2010
+ "grad_norm": 0.0003581918717827648,
2011
+ "learning_rate": 0.00011517980978799285,
2012
+ "epoch": 5.389030791253824,
2013
+ "step": 96000
2014
+ },
2015
+ {
2016
+ "eval_loss": 0.0011701997136697173,
2017
+ "eval_evaluator_0": 0.0013674315996468067,
2018
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7981789572820941,
2019
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.799695208936658,
2020
+ "eval_JSTS_pearson_cosine": 0.8598258650595708,
2021
+ "eval_JSTS_spearman_cosine": 0.8125775382281623,
2022
+ "eval_sequential_score": 0.5378800595881557,
2023
+ "eval_runtime": 72.2493,
2024
+ "eval_samples_per_second": 2550.21,
2025
+ "eval_steps_per_second": 4.983,
2026
+ "epoch": 5.389030791253824,
2027
+ "step": 96000
2028
+ },
2029
+ {
2030
+ "loss": 0.0024,
2031
+ "grad_norm": 0.00035948105505667627,
2032
+ "learning_rate": 0.00011394145036655436,
2033
+ "epoch": 5.417099390911387,
2034
+ "step": 96500
2035
+ },
2036
+ {
2037
+ "loss": 0.0024,
2038
+ "grad_norm": 0.0003364122530911118,
2039
+ "learning_rate": 0.0001127030909451159,
2040
+ "epoch": 5.445167990568951,
2041
+ "step": 97000
2042
+ },
2043
+ {
2044
+ "loss": 0.0024,
2045
+ "grad_norm": 0.0003634981403592974,
2046
+ "learning_rate": 0.00011146473152367741,
2047
+ "epoch": 5.473236590226514,
2048
+ "step": 97500
2049
+ },
2050
+ {
2051
+ "loss": 0.0024,
2052
+ "grad_norm": 0.0003649332211352885,
2053
+ "learning_rate": 0.00011022637210223895,
2054
+ "epoch": 5.501305189884077,
2055
+ "step": 98000
2056
+ },
2057
+ {
2058
+ "eval_loss": 0.0011697998270392418,
2059
+ "eval_evaluator_0": 0.0013669263571500778,
2060
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7931142327008134,
2061
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7957471539617756,
2062
+ "eval_JSTS_pearson_cosine": 0.8596991029296749,
2063
+ "eval_JSTS_spearman_cosine": 0.8113537089171092,
2064
+ "eval_sequential_score": 0.536155929745345,
2065
+ "eval_runtime": 73.393,
2066
+ "eval_samples_per_second": 2510.47,
2067
+ "eval_steps_per_second": 4.905,
2068
+ "epoch": 5.501305189884077,
2069
+ "step": 98000
2070
+ },
2071
+ {
2072
+ "loss": 0.0024,
2073
+ "grad_norm": 0.00038499030051752925,
2074
+ "learning_rate": 0.00010898801268080047,
2075
+ "epoch": 5.52937378954164,
2076
+ "step": 98500
2077
+ },
2078
+ {
2079
+ "loss": 0.0024,
2080
+ "grad_norm": 0.00034106100792996585,
2081
+ "learning_rate": 0.000107749653259362,
2082
+ "epoch": 5.557442389199203,
2083
+ "step": 99000
2084
+ },
2085
+ {
2086
+ "loss": 0.0024,
2087
+ "grad_norm": 0.00038382463390007615,
2088
+ "learning_rate": 0.00010651129383792351,
2089
+ "epoch": 5.585510988856766,
2090
+ "step": 99500
2091
+ },
2092
+ {
2093
+ "loss": 0.0024,
2094
+ "grad_norm": 0.00035632477374747396,
2095
+ "learning_rate": 0.00010527293441648504,
2096
+ "epoch": 5.613579588514329,
2097
+ "step": 100000
2098
+ },
2099
+ {
2100
+ "eval_loss": 0.0011689499951899052,
2101
+ "eval_evaluator_0": 0.0013659781543537974,
2102
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7955473991878141,
2103
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7979503915839332,
2104
+ "eval_JSTS_pearson_cosine": 0.861603427466545,
2105
+ "eval_JSTS_spearman_cosine": 0.8132431014099797,
2106
+ "eval_sequential_score": 0.5375198237160889,
2107
+ "eval_runtime": 73.285,
2108
+ "eval_samples_per_second": 2514.172,
2109
+ "eval_steps_per_second": 4.912,
2110
+ "epoch": 5.613579588514329,
2111
+ "step": 100000
2112
+ },
2113
+ {
2114
+ "loss": 0.0024,
2115
+ "grad_norm": 0.0003488284710329026,
2116
+ "learning_rate": 0.00010403457499504656,
2117
+ "epoch": 5.641648188171892,
2118
+ "step": 100500
2119
+ },
2120
+ {
2121
+ "loss": 0.0024,
2122
+ "grad_norm": 0.0003385685558896512,
2123
+ "learning_rate": 0.00010279621557360809,
2124
+ "epoch": 5.669716787829455,
2125
+ "step": 101000
2126
+ },
2127
+ {
2128
+ "loss": 0.0024,
2129
+ "grad_norm": 0.000358009448973462,
2130
+ "learning_rate": 0.0001015578561521696,
2131
+ "epoch": 5.697785387487018,
2132
+ "step": 101500
2133
+ },
2134
+ {
2135
+ "loss": 0.0024,
2136
+ "grad_norm": 0.00036567007191479206,
2137
+ "learning_rate": 0.00010031949673073112,
2138
+ "epoch": 5.725853987144581,
2139
+ "step": 102000
2140
+ },
2141
+ {
2142
+ "eval_loss": 0.0011688218219205737,
2143
+ "eval_evaluator_0": 0.001366003998555243,
2144
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.796658874864598,
2145
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7983816210608049,
2146
+ "eval_JSTS_pearson_cosine": 0.8614339209578082,
2147
+ "eval_JSTS_spearman_cosine": 0.8137949364794599,
2148
+ "eval_sequential_score": 0.53784752051294,
2149
+ "eval_runtime": 74.0336,
2150
+ "eval_samples_per_second": 2488.747,
2151
+ "eval_steps_per_second": 4.863,
2152
+ "epoch": 5.725853987144581,
2153
+ "step": 102000
2154
+ },
2155
+ {
2156
+ "loss": 0.0024,
2157
+ "grad_norm": 0.0003851432411465794,
2158
+ "learning_rate": 9.908113730929265e-05,
2159
+ "epoch": 5.753922586802144,
2160
+ "step": 102500
2161
+ },
2162
+ {
2163
+ "loss": 0.0024,
2164
+ "grad_norm": 0.0003519294841680676,
2165
+ "learning_rate": 9.784277788785416e-05,
2166
+ "epoch": 5.781991186459708,
2167
+ "step": 103000
2168
+ },
2169
+ {
2170
+ "loss": 0.0024,
2171
+ "grad_norm": 0.0003551561676431447,
2172
+ "learning_rate": 9.660441846641569e-05,
2173
+ "epoch": 5.810059786117271,
2174
+ "step": 103500
2175
+ },
2176
+ {
2177
+ "loss": 0.0024,
2178
+ "grad_norm": 0.00036788301076740026,
2179
+ "learning_rate": 9.536605904497721e-05,
2180
+ "epoch": 5.838128385774834,
2181
+ "step": 104000
2182
+ },
2183
+ {
2184
+ "eval_loss": 0.0011685107601806521,
2185
+ "eval_evaluator_0": 0.0013657337985932827,
2186
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7974990038670857,
2187
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7998327961795907,
2188
+ "eval_JSTS_pearson_cosine": 0.8615882286814495,
2189
+ "eval_JSTS_spearman_cosine": 0.8134121028192195,
2190
+ "eval_sequential_score": 0.5382035442658012,
2191
+ "eval_runtime": 75.725,
2192
+ "eval_samples_per_second": 2433.16,
2193
+ "eval_steps_per_second": 4.754,
2194
+ "epoch": 5.838128385774834,
2195
+ "step": 104000
2196
+ },
2197
+ {
2198
+ "loss": 0.0024,
2199
+ "grad_norm": 0.0003417586558498442,
2200
+ "learning_rate": 9.412769962353874e-05,
2201
+ "epoch": 5.8661969854323965,
2202
+ "step": 104500
2203
+ },
2204
+ {
2205
+ "loss": 0.0024,
2206
+ "grad_norm": 0.00036630959948524833,
2207
+ "learning_rate": 9.288934020210025e-05,
2208
+ "epoch": 5.8942655850899595,
2209
+ "step": 105000
2210
+ },
2211
+ {
2212
+ "loss": 0.0024,
2213
+ "grad_norm": 0.0003538629098329693,
2214
+ "learning_rate": 9.165098078066178e-05,
2215
+ "epoch": 5.922334184747523,
2216
+ "step": 105500
2217
+ },
2218
+ {
2219
+ "loss": 0.0024,
2220
+ "grad_norm": 0.0003364156873431057,
2221
+ "learning_rate": 9.04126213592233e-05,
2222
+ "epoch": 5.950402784405086,
2223
+ "step": 106000
2224
+ },
2225
+ {
2226
+ "eval_loss": 0.0011676481226459146,
2227
+ "eval_evaluator_0": 0.0013648421736434102,
2228
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7987319776822839,
2229
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8013060925089577,
2230
+ "eval_JSTS_pearson_cosine": 0.8606203102100304,
2231
+ "eval_JSTS_spearman_cosine": 0.8123940829683924,
2232
+ "eval_sequential_score": 0.5383550058836645,
2233
+ "eval_runtime": 71.0673,
2234
+ "eval_samples_per_second": 2592.628,
2235
+ "eval_steps_per_second": 5.066,
2236
+ "epoch": 5.950402784405086,
2237
+ "step": 106000
2238
+ },
2239
+ {
2240
+ "loss": 0.0024,
2241
+ "grad_norm": 0.00037549331318587065,
2242
+ "learning_rate": 8.917426193778481e-05,
2243
+ "epoch": 5.978471384062649,
2244
+ "step": 106500
2245
+ },
2246
+ {
2247
+ "loss": 0.0024,
2248
+ "grad_norm": 0.0003623282827902585,
2249
+ "learning_rate": 8.793590251634634e-05,
2250
+ "epoch": 6.006511915120555,
2251
+ "step": 107000
2252
+ },
2253
+ {
2254
+ "loss": 0.0024,
2255
+ "grad_norm": 0.0003772232448682189,
2256
+ "learning_rate": 8.669754309490786e-05,
2257
+ "epoch": 6.034580514778118,
2258
+ "step": 107500
2259
+ },
2260
+ {
2261
+ "loss": 0.0024,
2262
+ "grad_norm": 0.0003239116631448269,
2263
+ "learning_rate": 8.545918367346939e-05,
2264
+ "epoch": 6.062649114435681,
2265
+ "step": 108000
2266
+ },
2267
+ {
2268
+ "eval_loss": 0.0011675840942189097,
2269
+ "eval_evaluator_0": 0.0013649420579895377,
2270
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.797541186470268,
2271
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.798678131270505,
2272
+ "eval_JSTS_pearson_cosine": 0.8611829209171061,
2273
+ "eval_JSTS_spearman_cosine": 0.8134092826099162,
2274
+ "eval_sequential_score": 0.5378174519794703,
2275
+ "eval_runtime": 74.4646,
2276
+ "eval_samples_per_second": 2474.344,
2277
+ "eval_steps_per_second": 4.835,
2278
+ "epoch": 6.062649114435681,
2279
+ "step": 108000
2280
+ },
2281
+ {
2282
+ "loss": 0.0024,
2283
+ "grad_norm": 0.0003176493337377906,
2284
+ "learning_rate": 8.42208242520309e-05,
2285
+ "epoch": 6.090717714093244,
2286
+ "step": 108500
2287
+ },
2288
+ {
2289
+ "loss": 0.0024,
2290
+ "grad_norm": 0.0003452330129221082,
2291
+ "learning_rate": 8.298246483059243e-05,
2292
+ "epoch": 6.118786313750807,
2293
+ "step": 109000
2294
+ },
2295
+ {
2296
+ "loss": 0.0024,
2297
+ "grad_norm": 0.00034453245461918414,
2298
+ "learning_rate": 8.174410540915395e-05,
2299
+ "epoch": 6.14685491340837,
2300
+ "step": 109500
2301
+ },
2302
+ {
2303
+ "loss": 0.0024,
2304
+ "grad_norm": 0.0003421856090426445,
2305
+ "learning_rate": 8.050574598771546e-05,
2306
+ "epoch": 6.1749235130659335,
2307
+ "step": 110000
2308
+ },
2309
+ {
2310
+ "eval_loss": 0.001166862086392939,
2311
+ "eval_evaluator_0": 0.0013641597470268607,
2312
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7964846527892946,
2313
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7985939451864762,
2314
+ "eval_JSTS_pearson_cosine": 0.8611403607210577,
2315
+ "eval_JSTS_spearman_cosine": 0.8127204480758368,
2316
+ "eval_sequential_score": 0.5375595176697799,
2317
+ "eval_runtime": 76.0958,
2318
+ "eval_samples_per_second": 2421.305,
2319
+ "eval_steps_per_second": 4.731,
2320
+ "epoch": 6.1749235130659335,
2321
+ "step": 110000
2322
+ },
2323
+ {
2324
+ "loss": 0.0024,
2325
+ "grad_norm": 0.0003415006503928453,
2326
+ "learning_rate": 7.9267386566277e-05,
2327
+ "epoch": 6.202992112723496,
2328
+ "step": 110500
2329
+ },
2330
+ {
2331
+ "loss": 0.0024,
2332
+ "grad_norm": 0.0003598456096369773,
2333
+ "learning_rate": 7.802902714483851e-05,
2334
+ "epoch": 6.231060712381059,
2335
+ "step": 111000
2336
+ },
2337
+ {
2338
+ "loss": 0.0024,
2339
+ "grad_norm": 0.0003484071639832109,
2340
+ "learning_rate": 7.679066772340004e-05,
2341
+ "epoch": 6.259129312038622,
2342
+ "step": 111500
2343
+ },
2344
+ {
2345
+ "loss": 0.0024,
2346
+ "grad_norm": 0.000359120691427961,
2347
+ "learning_rate": 7.555230830196155e-05,
2348
+ "epoch": 6.287197911696185,
2349
+ "step": 112000
2350
+ },
2351
+ {
2352
+ "eval_loss": 0.0011666708160191774,
2353
+ "eval_evaluator_0": 0.0013638917589560151,
2354
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7961017872096734,
2355
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7979849882595907,
2356
+ "eval_JSTS_pearson_cosine": 0.8610830381621996,
2357
+ "eval_JSTS_spearman_cosine": 0.8127868051847031,
2358
+ "eval_sequential_score": 0.5373785617344167,
2359
+ "eval_runtime": 75.0848,
2360
+ "eval_samples_per_second": 2453.905,
2361
+ "eval_steps_per_second": 4.795,
2362
+ "epoch": 6.287197911696185,
2363
+ "step": 112000
2364
+ },
2365
+ {
2366
+ "loss": 0.0024,
2367
+ "grad_norm": 0.000345466221915558,
2368
+ "learning_rate": 7.431394888052308e-05,
2369
+ "epoch": 6.315266511353749,
2370
+ "step": 112500
2371
+ },
2372
+ {
2373
+ "loss": 0.0024,
2374
+ "grad_norm": 0.0003590380947571248,
2375
+ "learning_rate": 7.30755894590846e-05,
2376
+ "epoch": 6.343335111011312,
2377
+ "step": 113000
2378
+ },
2379
+ {
2380
+ "loss": 0.0024,
2381
+ "grad_norm": 0.00031623526592738926,
2382
+ "learning_rate": 7.183723003764611e-05,
2383
+ "epoch": 6.371403710668875,
2384
+ "step": 113500
2385
+ },
2386
+ {
2387
+ "loss": 0.0024,
2388
+ "grad_norm": 0.0003682223614305258,
2389
+ "learning_rate": 7.059887061620764e-05,
2390
+ "epoch": 6.399472310326438,
2391
+ "step": 114000
2392
+ },
2393
+ {
2394
+ "eval_loss": 0.001166442409157753,
2395
+ "eval_evaluator_0": 0.0013636148069053888,
2396
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7965842694288822,
2397
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7980305345551676,
2398
+ "eval_JSTS_pearson_cosine": 0.8614633327316192,
2399
+ "eval_JSTS_spearman_cosine": 0.8137131868068991,
2400
+ "eval_sequential_score": 0.5377024453896574,
2401
+ "eval_runtime": 74.1271,
2402
+ "eval_samples_per_second": 2485.609,
2403
+ "eval_steps_per_second": 4.857,
2404
+ "epoch": 6.399472310326438,
2405
+ "step": 114000
2406
+ },
2407
+ {
2408
+ "loss": 0.0024,
2409
+ "grad_norm": 0.0003290139138698578,
2410
+ "learning_rate": 6.936051119476916e-05,
2411
+ "epoch": 6.427540909984001,
2412
+ "step": 114500
2413
+ },
2414
+ {
2415
+ "loss": 0.0024,
2416
+ "grad_norm": 0.0003496033023111522,
2417
+ "learning_rate": 6.812215177333069e-05,
2418
+ "epoch": 6.4556095096415635,
2419
+ "step": 115000
2420
+ },
2421
+ {
2422
+ "loss": 0.0024,
2423
+ "grad_norm": 0.00037992914440110326,
2424
+ "learning_rate": 6.68837923518922e-05,
2425
+ "epoch": 6.483678109299127,
2426
+ "step": 115500
2427
+ },
2428
+ {
2429
+ "loss": 0.0024,
2430
+ "grad_norm": 0.00033238163450732827,
2431
+ "learning_rate": 6.564543293045373e-05,
2432
+ "epoch": 6.51174670895669,
2433
+ "step": 116000
2434
+ },
2435
+ {
2436
+ "eval_loss": 0.0011659334413707256,
2437
+ "eval_evaluator_0": 0.0013632605550810695,
2438
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7960308639337882,
2439
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7987892401333858,
2440
+ "eval_JSTS_pearson_cosine": 0.8609535858873032,
2441
+ "eval_JSTS_spearman_cosine": 0.8129194572102592,
2442
+ "eval_sequential_score": 0.5376906526329087,
2443
+ "eval_runtime": 75.8225,
2444
+ "eval_samples_per_second": 2430.031,
2445
+ "eval_steps_per_second": 4.748,
2446
+ "epoch": 6.51174670895669,
2447
+ "step": 116000
2448
+ },
2449
+ {
2450
+ "loss": 0.0024,
2451
+ "grad_norm": 0.00036796656786464155,
2452
+ "learning_rate": 6.440707350901525e-05,
2453
+ "epoch": 6.539815308614253,
2454
+ "step": 116500
2455
+ },
2456
+ {
2457
+ "loss": 0.0024,
2458
+ "grad_norm": 0.00036114981048740447,
2459
+ "learning_rate": 6.316871408757678e-05,
2460
+ "epoch": 6.567883908271816,
2461
+ "step": 117000
2462
+ },
2463
+ {
2464
+ "loss": 0.0024,
2465
+ "grad_norm": 0.0003341217234265059,
2466
+ "learning_rate": 6.19303546661383e-05,
2467
+ "epoch": 6.595952507929379,
2468
+ "step": 117500
2469
+ },
2470
+ {
2471
+ "loss": 0.0024,
2472
+ "grad_norm": 0.00033465935848653316,
2473
+ "learning_rate": 6.0691995244699816e-05,
2474
+ "epoch": 6.624021107586943,
2475
+ "step": 118000
2476
+ },
2477
+ {
2478
+ "eval_loss": 0.001165699097327888,
2479
+ "eval_evaluator_0": 0.0013630150351673365,
2480
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7981864061322079,
2481
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8006961439309341,
2482
+ "eval_JSTS_pearson_cosine": 0.8621061916067831,
2483
+ "eval_JSTS_spearman_cosine": 0.813845436526813,
2484
+ "eval_sequential_score": 0.5386348651643048,
2485
+ "eval_runtime": 74.5723,
2486
+ "eval_samples_per_second": 2470.771,
2487
+ "eval_steps_per_second": 4.828,
2488
+ "epoch": 6.624021107586943,
2489
+ "step": 118000
2490
+ },
2491
+ {
2492
+ "loss": 0.0024,
2493
+ "grad_norm": 0.0003290769236627966,
2494
+ "learning_rate": 5.945363582326134e-05,
2495
+ "epoch": 6.652089707244506,
2496
+ "step": 118500
2497
+ },
2498
+ {
2499
+ "loss": 0.0024,
2500
+ "grad_norm": 0.0003693581384140998,
2501
+ "learning_rate": 5.821527640182286e-05,
2502
+ "epoch": 6.680158306902069,
2503
+ "step": 119000
2504
+ },
2505
+ {
2506
+ "loss": 0.0024,
2507
+ "grad_norm": 0.000339480146067217,
2508
+ "learning_rate": 5.697691698038438e-05,
2509
+ "epoch": 6.708226906559632,
2510
+ "step": 119500
2511
+ },
2512
+ {
2513
+ "loss": 0.0024,
2514
+ "grad_norm": 0.0003198812191840261,
2515
+ "learning_rate": 5.57385575589459e-05,
2516
+ "epoch": 6.7362955062171945,
2517
+ "step": 120000
2518
+ },
2519
+ {
2520
+ "eval_loss": 0.0011652549728751183,
2521
+ "eval_evaluator_0": 0.0013625958235934377,
2522
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7999146186391144,
2523
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8018843128444009,
2524
+ "eval_JSTS_pearson_cosine": 0.8623027854132908,
2525
+ "eval_JSTS_spearman_cosine": 0.8142636672530339,
2526
+ "eval_sequential_score": 0.5391701919736761,
2527
+ "eval_runtime": 72.7378,
2528
+ "eval_samples_per_second": 2533.083,
2529
+ "eval_steps_per_second": 4.949,
2530
+ "epoch": 6.7362955062171945,
2531
+ "step": 120000
2532
+ },
2533
+ {
2534
+ "loss": 0.0024,
2535
+ "grad_norm": 0.00033788220025599003,
2536
+ "learning_rate": 5.450019813750742e-05,
2537
+ "epoch": 6.764364105874758,
2538
+ "step": 120500
2539
+ },
2540
+ {
2541
+ "loss": 0.0024,
2542
+ "grad_norm": 0.0003455994592513889,
2543
+ "learning_rate": 5.3261838716068944e-05,
2544
+ "epoch": 6.792432705532321,
2545
+ "step": 121000
2546
+ },
2547
+ {
2548
+ "loss": 0.0024,
2549
+ "grad_norm": 0.00035492057213559747,
2550
+ "learning_rate": 5.202347929463047e-05,
2551
+ "epoch": 6.820501305189884,
2552
+ "step": 121500
2553
+ },
2554
+ {
2555
+ "loss": 0.0024,
2556
+ "grad_norm": 0.0003294384223408997,
2557
+ "learning_rate": 5.078511987319199e-05,
2558
+ "epoch": 6.848569904847447,
2559
+ "step": 122000
2560
+ },
2561
+ {
2562
+ "eval_loss": 0.00116501294542104,
2563
+ "eval_evaluator_0": 0.001362400595098734,
2564
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7961266901173212,
2565
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7980347334366136,
2566
+ "eval_JSTS_pearson_cosine": 0.861779208629962,
2567
+ "eval_JSTS_spearman_cosine": 0.8137240607202665,
2568
+ "eval_sequential_score": 0.5377070649173262,
2569
+ "eval_runtime": 71.8999,
2570
+ "eval_samples_per_second": 2562.604,
2571
+ "eval_steps_per_second": 5.007,
2572
+ "epoch": 6.848569904847447,
2573
+ "step": 122000
2574
+ },
2575
+ {
2576
+ "loss": 0.0024,
2577
+ "grad_norm": 0.00034672432229854167,
2578
+ "learning_rate": 4.954676045175351e-05,
2579
+ "epoch": 6.87663850450501,
2580
+ "step": 122500
2581
+ },
2582
+ {
2583
+ "loss": 0.0024,
2584
+ "grad_norm": 0.00033899256959557533,
2585
+ "learning_rate": 4.830840103031503e-05,
2586
+ "epoch": 6.904707104162574,
2587
+ "step": 123000
2588
+ },
2589
+ {
2590
+ "loss": 0.0024,
2591
+ "grad_norm": 0.0003218873462174088,
2592
+ "learning_rate": 4.707004160887656e-05,
2593
+ "epoch": 6.932775703820137,
2594
+ "step": 123500
2595
+ },
2596
+ {
2597
+ "loss": 0.0024,
2598
+ "grad_norm": 0.0003494083066470921,
2599
+ "learning_rate": 4.583168218743808e-05,
2600
+ "epoch": 6.9608443034777,
2601
+ "step": 124000
2602
+ },
2603
+ {
2604
+ "eval_loss": 0.0011646621860563755,
2605
+ "eval_evaluator_0": 0.001362146227620542,
2606
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.8002158796274099,
2607
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8028449862236761,
2608
+ "eval_JSTS_pearson_cosine": 0.8621321684464223,
2609
+ "eval_JSTS_spearman_cosine": 0.8142098135048893,
2610
+ "eval_sequential_score": 0.5394723153187286,
2611
+ "eval_runtime": 70.6182,
2612
+ "eval_samples_per_second": 2609.113,
2613
+ "eval_steps_per_second": 5.098,
2614
+ "epoch": 6.9608443034777,
2615
+ "step": 124000
2616
+ },
2617
+ {
2618
+ "loss": 0.0024,
2619
+ "grad_norm": 0.0003382643044460565,
2620
+ "learning_rate": 4.45933227659996e-05,
2621
+ "epoch": 6.988912903135263,
2622
+ "step": 124500
2623
+ },
2624
+ {
2625
+ "loss": 0.0024,
2626
+ "grad_norm": 0.0003371954953763634,
2627
+ "learning_rate": 4.3354963344561124e-05,
2628
+ "epoch": 7.0169534341931685,
2629
+ "step": 125000
2630
+ },
2631
+ {
2632
+ "loss": 0.0024,
2633
+ "grad_norm": 0.0003601745702326298,
2634
+ "learning_rate": 4.2116603923122646e-05,
2635
+ "epoch": 7.0450220338507314,
2636
+ "step": 125500
2637
+ },
2638
+ {
2639
+ "loss": 0.0024,
2640
+ "grad_norm": 0.00032043090322986245,
2641
+ "learning_rate": 4.087824450168417e-05,
2642
+ "epoch": 7.073090633508294,
2643
+ "step": 126000
2644
+ },
2645
+ {
2646
+ "eval_loss": 0.0011643405305221677,
2647
+ "eval_evaluator_0": 0.0013617631047964096,
2648
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7977362496603029,
2649
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8001515194571482,
2650
+ "eval_JSTS_pearson_cosine": 0.861298094227186,
2651
+ "eval_JSTS_spearman_cosine": 0.813236448790595,
2652
+ "eval_sequential_score": 0.5382499104508466,
2653
+ "eval_runtime": 73.0053,
2654
+ "eval_samples_per_second": 2523.804,
2655
+ "eval_steps_per_second": 4.931,
2656
+ "epoch": 7.073090633508294,
2657
+ "step": 126000
2658
+ },
2659
+ {
2660
+ "loss": 0.0024,
2661
+ "grad_norm": 0.00031437003053724766,
2662
+ "learning_rate": 3.963988508024569e-05,
2663
+ "epoch": 7.101159233165857,
2664
+ "step": 126500
2665
+ },
2666
+ {
2667
+ "loss": 0.0024,
2668
+ "grad_norm": 0.0003345895966049284,
2669
+ "learning_rate": 3.8401525658807213e-05,
2670
+ "epoch": 7.12922783282342,
2671
+ "step": 127000
2672
+ },
2673
+ {
2674
+ "loss": 0.0024,
2675
+ "grad_norm": 0.0003360872797202319,
2676
+ "learning_rate": 3.716316623736873e-05,
2677
+ "epoch": 7.157296432480983,
2678
+ "step": 127500
2679
+ },
2680
+ {
2681
+ "loss": 0.0024,
2682
+ "grad_norm": 0.00033311580773442984,
2683
+ "learning_rate": 3.592480681593025e-05,
2684
+ "epoch": 7.185365032138547,
2685
+ "step": 128000
2686
+ },
2687
+ {
2688
+ "eval_loss": 0.0011641262099146843,
2689
+ "eval_evaluator_0": 0.0013615088537335396,
2690
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7990914678424127,
2691
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8007984953885947,
2692
+ "eval_JSTS_pearson_cosine": 0.8617710720510966,
2693
+ "eval_JSTS_spearman_cosine": 0.8136913826185042,
2694
+ "eval_sequential_score": 0.5386171289536108,
2695
+ "eval_runtime": 71.8195,
2696
+ "eval_samples_per_second": 2565.473,
2697
+ "eval_steps_per_second": 5.013,
2698
+ "epoch": 7.185365032138547,
2699
+ "step": 128000
2700
+ },
2701
+ {
2702
+ "loss": 0.0024,
2703
+ "grad_norm": 0.00034391821827739477,
2704
+ "learning_rate": 3.4686447394491774e-05,
2705
+ "epoch": 7.21343363179611,
2706
+ "step": 128500
2707
+ },
2708
+ {
2709
+ "loss": 0.0024,
2710
+ "grad_norm": 0.0003487596404738724,
2711
+ "learning_rate": 3.3448087973053296e-05,
2712
+ "epoch": 7.241502231453673,
2713
+ "step": 129000
2714
+ },
2715
+ {
2716
+ "loss": 0.0024,
2717
+ "grad_norm": 0.00034094389411620796,
2718
+ "learning_rate": 3.220972855161482e-05,
2719
+ "epoch": 7.269570831111236,
2720
+ "step": 129500
2721
+ },
2722
+ {
2723
+ "loss": 0.0024,
2724
+ "grad_norm": 0.0003401459543965757,
2725
+ "learning_rate": 3.097136913017634e-05,
2726
+ "epoch": 7.297639430768799,
2727
+ "step": 130000
2728
+ },
2729
+ {
2730
+ "eval_loss": 0.001163804205134511,
2731
+ "eval_evaluator_0": 0.0013612366747111082,
2732
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7982797286012763,
2733
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8005149758262927,
2734
+ "eval_JSTS_pearson_cosine": 0.8617189467206064,
2735
+ "eval_JSTS_spearman_cosine": 0.8137947848860295,
2736
+ "eval_sequential_score": 0.5385569991290111,
2737
+ "eval_runtime": 71.0982,
2738
+ "eval_samples_per_second": 2591.5,
2739
+ "eval_steps_per_second": 5.063,
2740
+ "epoch": 7.297639430768799,
2741
+ "step": 130000
2742
+ },
2743
+ {
2744
+ "loss": 0.0024,
2745
+ "grad_norm": 0.0003691680612973869,
2746
+ "learning_rate": 2.9733009708737864e-05,
2747
+ "epoch": 7.325708030426362,
2748
+ "step": 130500
2749
+ },
2750
+ {
2751
+ "loss": 0.0024,
2752
+ "grad_norm": 0.00033191803959198296,
2753
+ "learning_rate": 2.8494650287299383e-05,
2754
+ "epoch": 7.353776630083925,
2755
+ "step": 131000
2756
+ },
2757
+ {
2758
+ "loss": 0.0024,
2759
+ "grad_norm": 0.0003305823483970016,
2760
+ "learning_rate": 2.7256290865860905e-05,
2761
+ "epoch": 7.381845229741488,
2762
+ "step": 131500
2763
+ },
2764
+ {
2765
+ "loss": 0.0024,
2766
+ "grad_norm": 0.00032995129004120827,
2767
+ "learning_rate": 2.6017931444422428e-05,
2768
+ "epoch": 7.409913829399051,
2769
+ "step": 132000
2770
+ },
2771
+ {
2772
+ "eval_loss": 0.0011634527472779155,
2773
+ "eval_evaluator_0": 0.0013608216540887952,
2774
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7974230206646705,
2775
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.7995378161678208,
2776
+ "eval_JSTS_pearson_cosine": 0.8617639739167566,
2777
+ "eval_JSTS_spearman_cosine": 0.8140133727128173,
2778
+ "eval_sequential_score": 0.5383040035115756,
2779
+ "eval_runtime": 74.9306,
2780
+ "eval_samples_per_second": 2458.956,
2781
+ "eval_steps_per_second": 4.804,
2782
+ "epoch": 7.409913829399051,
2783
+ "step": 132000
2784
+ },
2785
+ {
2786
+ "loss": 0.0024,
2787
+ "grad_norm": 0.00034205676638521254,
2788
+ "learning_rate": 2.4779572022983947e-05,
2789
+ "epoch": 7.437982429056614,
2790
+ "step": 132500
2791
+ },
2792
+ {
2793
+ "loss": 0.0024,
2794
+ "grad_norm": 0.00034039837191812694,
2795
+ "learning_rate": 2.354121260154547e-05,
2796
+ "epoch": 7.466051028714178,
2797
+ "step": 133000
2798
+ },
2799
+ {
2800
+ "loss": 0.0024,
2801
+ "grad_norm": 0.0003149213152937591,
2802
+ "learning_rate": 2.230285318010699e-05,
2803
+ "epoch": 7.494119628371741,
2804
+ "step": 133500
2805
+ },
2806
+ {
2807
+ "loss": 0.0024,
2808
+ "grad_norm": 0.0003163942019455135,
2809
+ "learning_rate": 2.1064493758668514e-05,
2810
+ "epoch": 7.522188228029304,
2811
+ "step": 134000
2812
+ },
2813
+ {
2814
+ "eval_loss": 0.0011632349342107773,
2815
+ "eval_evaluator_0": 0.0013606171123683453,
2816
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7971261315068038,
2817
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.799940281130607,
2818
+ "eval_JSTS_pearson_cosine": 0.8619249668633946,
2819
+ "eval_JSTS_spearman_cosine": 0.8141984859650053,
2820
+ "eval_sequential_score": 0.5384997947359936,
2821
+ "eval_runtime": 70.7072,
2822
+ "eval_samples_per_second": 2605.83,
2823
+ "eval_steps_per_second": 5.091,
2824
+ "epoch": 7.522188228029304,
2825
+ "step": 134000
2826
+ },
2827
+ {
2828
+ "loss": 0.0024,
2829
+ "grad_norm": 0.00029837930924259126,
2830
+ "learning_rate": 1.9826134337230033e-05,
2831
+ "epoch": 7.550256827686867,
2832
+ "step": 134500
2833
+ },
2834
+ {
2835
+ "loss": 0.0024,
2836
+ "grad_norm": 0.0003313660272397101,
2837
+ "learning_rate": 1.858777491579156e-05,
2838
+ "epoch": 7.57832542734443,
2839
+ "step": 135000
2840
+ },
2841
+ {
2842
+ "loss": 0.0024,
2843
+ "grad_norm": 0.0003373105137143284,
2844
+ "learning_rate": 1.734941549435308e-05,
2845
+ "epoch": 7.6063940270019925,
2846
+ "step": 135500
2847
+ },
2848
+ {
2849
+ "loss": 0.0024,
2850
+ "grad_norm": 0.0003585618978831917,
2851
+ "learning_rate": 1.61110560729146e-05,
2852
+ "epoch": 7.634462626659556,
2853
+ "step": 136000
2854
+ },
2855
+ {
2856
+ "eval_loss": 0.0011632071109488606,
2857
+ "eval_evaluator_0": 0.0013605711283162236,
2858
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7991714861040964,
2859
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8010703574998989,
2860
+ "eval_JSTS_pearson_cosine": 0.8617085515076026,
2861
+ "eval_JSTS_spearman_cosine": 0.8138006037415473,
2862
+ "eval_sequential_score": 0.5387438441232542,
2863
+ "eval_runtime": 74.264,
2864
+ "eval_samples_per_second": 2481.026,
2865
+ "eval_steps_per_second": 4.848,
2866
+ "epoch": 7.634462626659556,
2867
+ "step": 136000
2868
+ },
2869
+ {
2870
+ "loss": 0.0024,
2871
+ "grad_norm": 0.00030662360950373113,
2872
+ "learning_rate": 1.4872696651476123e-05,
2873
+ "epoch": 7.662531226317119,
2874
+ "step": 136500
2875
+ },
2876
+ {
2877
+ "loss": 0.0024,
2878
+ "grad_norm": 0.0003228651185054332,
2879
+ "learning_rate": 1.3634337230037645e-05,
2880
+ "epoch": 7.690599825974682,
2881
+ "step": 137000
2882
+ },
2883
+ {
2884
+ "loss": 0.0024,
2885
+ "grad_norm": 0.0003200937353540212,
2886
+ "learning_rate": 1.2395977808599166e-05,
2887
+ "epoch": 7.718668425632245,
2888
+ "step": 137500
2889
+ },
2890
+ {
2891
+ "loss": 0.0024,
2892
+ "grad_norm": 0.0002969225461129099,
2893
+ "learning_rate": 1.1157618387160687e-05,
2894
+ "epoch": 7.746737025289808,
2895
+ "step": 138000
2896
+ },
2897
+ {
2898
+ "eval_loss": 0.0011628158390522003,
2899
+ "eval_evaluator_0": 0.0013602408580482006,
2900
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7994965640380185,
2901
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8014635757612357,
2902
+ "eval_JSTS_pearson_cosine": 0.8621887195083964,
2903
+ "eval_JSTS_spearman_cosine": 0.8142017976984064,
2904
+ "eval_sequential_score": 0.5390085381058968,
2905
+ "eval_runtime": 70.4499,
2906
+ "eval_samples_per_second": 2615.349,
2907
+ "eval_steps_per_second": 5.11,
2908
+ "epoch": 7.746737025289808,
2909
+ "step": 138000
2910
+ },
2911
+ {
2912
+ "loss": 0.0024,
2913
+ "grad_norm": 0.0003055589913856238,
2914
+ "learning_rate": 9.919258965722211e-06,
2915
+ "epoch": 7.774805624947371,
2916
+ "step": 138500
2917
+ },
2918
+ {
2919
+ "loss": 0.0024,
2920
+ "grad_norm": 0.00030320361838676035,
2921
+ "learning_rate": 8.680899544283732e-06,
2922
+ "epoch": 7.802874224604935,
2923
+ "step": 139000
2924
+ },
2925
+ {
2926
+ "loss": 0.0024,
2927
+ "grad_norm": 0.0003302599652670324,
2928
+ "learning_rate": 7.442540122845254e-06,
2929
+ "epoch": 7.830942824262498,
2930
+ "step": 139500
2931
+ },
2932
+ {
2933
+ "loss": 0.0024,
2934
+ "grad_norm": 0.00031279708491638303,
2935
+ "learning_rate": 6.204180701406776e-06,
2936
+ "epoch": 7.859011423920061,
2937
+ "step": 140000
2938
+ },
2939
+ {
2940
+ "eval_loss": 0.0011627456406131387,
2941
+ "eval_evaluator_0": 0.001360146445222199,
2942
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7987378429030463,
2943
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8007339865504044,
2944
+ "eval_JSTS_pearson_cosine": 0.8619350551916014,
2945
+ "eval_JSTS_spearman_cosine": 0.8140707704434627,
2946
+ "eval_sequential_score": 0.5387216344796965,
2947
+ "eval_runtime": 74.7841,
2948
+ "eval_samples_per_second": 2463.772,
2949
+ "eval_steps_per_second": 4.814,
2950
+ "epoch": 7.859011423920061,
2951
+ "step": 140000
2952
+ },
2953
+ {
2954
+ "loss": 0.0024,
2955
+ "grad_norm": 0.0003151698037981987,
2956
+ "learning_rate": 4.965821279968297e-06,
2957
+ "epoch": 7.8870800235776235,
2958
+ "step": 140500
2959
+ },
2960
+ {
2961
+ "loss": 0.0024,
2962
+ "grad_norm": 0.0003283790429122746,
2963
+ "learning_rate": 3.7274618585298198e-06,
2964
+ "epoch": 7.915148623235186,
2965
+ "step": 141000
2966
+ },
2967
+ {
2968
+ "loss": 0.0024,
2969
+ "grad_norm": 0.00029954445199109614,
2970
+ "learning_rate": 2.4891024370913414e-06,
2971
+ "epoch": 7.94321722289275,
2972
+ "step": 141500
2973
+ },
2974
+ {
2975
+ "loss": 0.0024,
2976
+ "grad_norm": 0.0003039216680917889,
2977
+ "learning_rate": 1.250743015652863e-06,
2978
+ "epoch": 7.971285822550313,
2979
+ "step": 142000
2980
+ },
2981
+ {
2982
+ "eval_loss": 0.0011625363258644938,
2983
+ "eval_evaluator_0": 0.0013599260710179806,
2984
+ "eval_stsb_multi_mt-en_pearson_cosine": 0.7988037559289333,
2985
+ "eval_stsb_multi_mt-en_spearman_cosine": 0.8009711557760016,
2986
+ "eval_JSTS_pearson_cosine": 0.8622404113206219,
2987
+ "eval_JSTS_spearman_cosine": 0.8142666349859583,
2988
+ "eval_sequential_score": 0.5388659056109927,
2989
+ "eval_runtime": 72.1595,
2990
+ "eval_samples_per_second": 2553.385,
2991
+ "eval_steps_per_second": 4.989,
2992
+ "epoch": 7.971285822550313,
2993
+ "step": 142000
2994
+ },
2995
+ {
2996
+ "loss": 0.0024,
2997
+ "grad_norm": 0.00028941588243469596,
2998
+ "learning_rate": 1.2383594214384782e-08,
2999
+ "epoch": 7.999354422207876,
3000
+ "step": 142500
3001
+ }
3002
+ ],
3003
+ "best_metric": null,
3004
+ "best_global_step": null,
3005
+ "best_model_checkpoint": null,
3006
+ "is_local_process_zero": true,
3007
+ "is_world_process_zero": true,
3008
+ "is_hyper_param_search": false,
3009
+ "trial_name": null,
3010
+ "trial_params": null,
3011
+ "stateful_callbacks": {
3012
+ "TrainerControl": {
3013
+ "args": {
3014
+ "should_training_stop": true,
3015
+ "should_epoch_stop": false,
3016
+ "should_save": true,
3017
+ "should_evaluate": false,
3018
+ "should_log": false
3019
+ },
3020
+ "attributes": {}
3021
+ }
3022
+ }
3023
+ }
training_args.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:023ece5731f0eed92e92135399cbe0b25b99b525ce8863c32c81f9a67e9ec300
3
+ size 5624