saikasyap committed verified commit 8e0e716 · 1 parent: 6783d39

Initial commit
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,524 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - dense
+ - generated_from_trainer
+ - dataset_size:257886
+ - loss:MultipleNegativesRankingLoss
+ base_model: intfloat/multilingual-e5-large
+ widget:
+ - source_sentence: Wherever and whenever they saw any creature, any dweller of the
+     Khandava, escaping from the fire, those two great heroes immediately shot it down.
+   sentences:
+   - वयं पठाम ।
+   - 'दि अमोङ्ग- अस् कुक्कुटस्य खण्डः पैरेडोलिया इत्यस्य उदाहरणम् अस्ति।
+
+     '
+   - यत्र यत्र च दृश्यन्ते प्राणिनः खाण्डवालयाः। पलायन्तः प्रवीरौ तौ तत्र तत्राभ्यधावताम्॥
+ - source_sentence: 'Residents were trapped in houses and elsewhere as the roads turned
+     into rivers.
+
+     '
+   sentences:
+   - वयमधुना षट्-लेबल्स् योजितवन्तः।
+   - 'पदवीषु नद्यायमानासु अन्यत्र गन्तुम् अकल्पाः वस्तव्याः गृहेष्वेव निबद्धाः आसन्।
+
+     '
+   - 'स्व॒स्ति न॒ इन्द्रो॑ वृ॒द्धश्र॑वाः स्व॒स्ति नः॑ पू॒षा वि॒श्ववे॑दाः । स्व॒स्ति
+     न॒स्तार्क्ष्यो॒ अरि॑ष्टनेमिः स्व॒स्ति नो॒ बृह॒स्पति॑र्दधातु '
+ - source_sentence: From this street the village is seen.
+   sentences:
+   - धर्मदण्डो न निर्दण्डो धर्मकार्यानुशासकः। यन्त्रितः कार्यकरणैः षड्भागकृतलक्षणः॥
+   - एतस्याः वीथ्याः ग्रामं दृश्यते ।
+   - 'भवता पत्रकर्त्रा नगरे सामुदायिकायाः हिंसायाः विषये मिथ्यावार्ताः प्रकाशिताः इत्यतः
+     जनाः भीताः सन्ति।
+
+     '
+ - source_sentence: 'Visitors have put poppies next to the names of their relatives
+     and friends.
+
+     '
+   sentences:
+   - 'परी॒तो षि॑ञ्चता सु॒तं सोमो॒ य उ॑त्त॒मं ह॒विः । द॒ध॒न्वाँ यो नर्यो॑ अ॒प्स्व१॒॑न्तरा
+     सु॒षाव॒ सोम॒मद्रि॑भिः '
+   - 'सन्दर्शकाः स्वीयानां सम्बन्धिनां, सुहृदां च नाम्नः पार्श्वे पोप्पीस् न्यक्षिपन्।
+
+     '
+   - 'बीबीगढ्-गृहं यत्र आङ्ग्लस्त्रियः, बालकाः च हताः, तथा च कूपः यस्मात् मृतानां शवाः
+     च प्राप्ताः।
+
+     '
+ - source_sentence: 'The majority of these nations are now republics or part of republics.
+
+     '
+   sentences:
+   - 'एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।
+
+     '
+   - तदिन्द्रजालप्रतिम बाणजालममित्रहा। विसृज्य दिक्षु सर्वासु महेन्द्र इव वज्रभृत्॥
+   - अत्र मूलसञ्चिका (source file) विद्यते। pdflatex इत्यादेशमुपयुज्य सङ्कलयामि।
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - src2trg_accuracy
+ - trg2src_accuracy
+ - mean_accuracy
+ model-index:
+ - name: SentenceTransformer based on intfloat/multilingual-e5-large
+   results:
+   - task:
+       type: translation
+       name: Translation
+     dataset:
+       name: eval en sa
+       type: eval-en-sa
+     metrics:
+     - type: src2trg_accuracy
+       value: 0.866
+       name: Src2Trg Accuracy
+     - type: trg2src_accuracy
+       value: 0.868
+       name: Trg2Src Accuracy
+     - type: mean_accuracy
+       value: 0.867
+       name: Mean Accuracy
+ ---
+
+ # SentenceTransformer based on intfloat/multilingual-e5-large
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) <!-- at revision 0dc5580a448e4284468b8909bae50fa925907bc5 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
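For reference, the stacked modules above can be reproduced with plain `transformers`. The following is a minimal sketch of the same pipeline (mean pooling over non-padding tokens, then L2 normalization); the repo id is a placeholder, since the card does not name the published checkpoint:

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

repo_id = "your-username/your-model-id"  # placeholder; substitute the real Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModel.from_pretrained(repo_id)  # loads the XLMRobertaModel backbone

batch = tokenizer(
    ["From this street the village is seen."],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    token_embeddings = model(**batch).last_hidden_state  # (batch, seq_len, 1024)

# (1) Pooling: mean over real tokens, masking out padding positions
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)

# (2) Normalize: unit-length vectors, so dot product equals cosine similarity
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([1, 1024])
```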
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("sentence_transformers_model_id")
+ # Run inference
+ sentences = [
+     'The majority of these nations are now republics or part of republics.\n',
+     'एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।\n',
+     'अत्र मूलसञ्चिका (source file) विद्यते। pdflatex इत्यादेशमुपयुज्य सङ्कलयामि।',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 1024]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities)
+ # tensor([[1.0000, 0.8049, 0.1296],
+ #         [0.8049, 1.0000, 0.1642],
+ #         [0.1296, 0.1642, 1.0000]])
+ ```
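Because the training pairs are English–Sanskrit, a natural downstream use is cross-lingual retrieval. A small sketch, reusing the `model` loaded above and sentences taken from the widget examples:

```python
# Match each English query against a small Sanskrit corpus
english = ["From this street the village is seen."]
sanskrit = [
    "एतस्याः वीथ्याः ग्रामं दृश्यते ।",
    "वयं पठाम ।",
]

query_embeddings = model.encode(english)
corpus_embeddings = model.encode(sanskrit)

# Cosine similarity matrix of shape (len(english), len(sanskrit))
scores = model.similarity(query_embeddings, corpus_embeddings)
best_match = scores.argmax(dim=1)
print(sanskrit[best_match[0]])  # expected: the matching translation
```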
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Translation
+
+ * Dataset: `eval-en-sa`
+ * Evaluated with [<code>TranslationEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TranslationEvaluator)
+
+ | Metric            | Value     |
+ |:------------------|:----------|
+ | src2trg_accuracy  | 0.866     |
+ | trg2src_accuracy  | 0.868     |
+ | **mean_accuracy** | **0.867** |
+
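The numbers above come from the bidirectional retrieval check performed by `TranslationEvaluator`: for each English sentence it tests whether the nearest Sanskrit embedding is the true translation (src2trg), and vice versa (trg2src). A sketch of reproducing it, assuming parallel lists `en_sentences` and `sa_sentences` (the held-out split itself is not shipped with this card):

```python
from sentence_transformers.evaluation import TranslationEvaluator

# en_sentences[i] and sa_sentences[i] are assumed to be mutual translations
evaluator = TranslationEvaluator(
    source_sentences=en_sentences,
    target_sentences=sa_sentences,
    name="eval-en-sa",
)
results = evaluator(model)
print(results)  # src2trg_accuracy, trg2src_accuracy, mean_accuracy
```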
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 257,886 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 |
+   |:--------|:-----------|:-----------|
+   | type    | string     | string     |
+   | details | <ul><li>min: 5 tokens</li><li>mean: 33.91 tokens</li><li>max: 403 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 37.33 tokens</li><li>max: 228 tokens</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>"For the purpose of this tutorial, we shall list these instructions in slides."</code> | <code>अस्य पाठस्य आनुकूल्याय स्लैड् द्वारा आदेशान् वदामः ।</code> |
+   | <code>Gandharva prajapati, Vishwakarma and mana swaroop. Please protect Gandharva Brahmins and Kshatriyas. Riku and Sama have an apsara named Ashti. Please protect us. This sacrifice is an offering for them. Swaha for them. (43)</code> | <code>प्र॒जाप॑तिर्वि॒श्वक॑र्मा॒ मनो॑ गन्ध॒र्वस्तस्य॑ऽऋ॒क्सा॒मान्य॑प्स॒रस॒ऽएष्ट॑यो॒ नाम॑। स न॑ऽइ॒दं ब्रह्म॑ क्ष॒त्रं पा॑तु॒ तस्मै॒ स्वाहा॒ वाट् ताभ्यः॒ स्वाहा॑ ॥ (४३)</code> |
+   | <code>Many things are sold to treat acne, the most popular being benzoyl peroxide.<br></code> | <code>आक्ने-चिकित्सार्थं नाइकानि वस्तूनि विक्रीयन्ते, तेषु अतिजनप्रियं बेन्ज़ोय्ल् पराक्सैड्।<br></code> |
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
+   ```json
+   {
+       "scale": 20.0,
+       "similarity_fct": "cos_sim"
+   }
+   ```
+
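For orientation, this is roughly how that loss would be constructed in Sentence Transformers; a sketch, not a record of the exact training script:

```python
from sentence_transformers import SentenceTransformer, losses

model = SentenceTransformer("intfloat/multilingual-e5-large")
# In-batch negatives: for each (sentence_0, sentence_1) pair, every other
# sentence_1 in the batch acts as a negative; scale=20.0 sharpens the softmax
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)
```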
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `num_train_epochs`: 15
+ - `multi_dataset_batch_sampler`: round_robin
+
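Wired into the trainer API, these non-default values would look roughly like the sketch below; `output_dir`, `train_dataset`, `loss`, and `evaluator` are assumptions, since the card does not record them:

```python
from sentence_transformers import (
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="output",                        # assumed, not recorded here
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=15,
    multi_dataset_batch_sampler="round_robin",
)
trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,                # the 257,886 (en, sa) pairs
    loss=loss,
    evaluator=evaluator,
)
trainer.train()
```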
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 4
+ - `per_device_eval_batch_size`: 4
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 15
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `hub_revision`: None
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `liger_kernel_config`: None
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+ - `router_mapping`: {}
+ - `learning_rate_mapping`: {}
+
+ </details>
+
+ ### Training Logs
+ | Epoch  | Step  | Training Loss | eval-en-sa_mean_accuracy |
+ |:------:|:-----:|:-------------:|:------------------------:|
+ | 0.0078 | 500 | 0.2715 | - |
+ | 0.0155 | 1000 | 0.0402 | - |
+ | 0.0233 | 1500 | 0.0323 | - |
+ | 0.0310 | 2000 | 0.0305 | - |
+ | 0.0388 | 2500 | 0.0169 | - |
+ | 0.0465 | 3000 | 0.0122 | - |
+ | 0.0543 | 3500 | 0.011 | - |
+ | 0.0620 | 4000 | 0.0134 | - |
+ | 0.0698 | 4500 | 0.0081 | - |
+ | 0.0776 | 5000 | 0.0177 | - |
+ | 0.0853 | 5500 | 0.0195 | - |
+ | 0.0931 | 6000 | 0.014 | - |
+ | 0.1008 | 6500 | 0.0226 | - |
+ | 0.1086 | 7000 | 0.0122 | - |
+ | 0.1163 | 7500 | 0.0156 | - |
+ | 0.1241 | 8000 | 0.0192 | - |
+ | 0.1318 | 8500 | 0.023 | - |
+ | 0.1396 | 9000 | 0.0153 | - |
+ | 0.1474 | 9500 | 0.0275 | - |
+ | 0.1551 | 10000 | 0.0272 | - |
+ | 0.1629 | 10500 | 0.0222 | - |
+ | 0.1706 | 11000 | 0.0134 | - |
+ | 0.1784 | 11500 | 0.0216 | - |
+ | 0.1861 | 12000 | 0.0152 | - |
+ | 0.1939 | 12500 | 0.0104 | - |
+ | 0.2016 | 13000 | 0.0178 | - |
+ | 0.2094 | 13500 | 0.0209 | - |
+ | 0.2171 | 14000 | 0.0211 | - |
+ | 0.2249 | 14500 | 0.0198 | - |
+ | 0.2327 | 15000 | 0.0212 | - |
+ | 0.2404 | 15500 | 0.0177 | - |
+ | 0.2482 | 16000 | 0.0221 | - |
+ | 0.2559 | 16500 | 0.0206 | - |
+ | 0.2637 | 17000 | 0.0181 | - |
+ | 0.2714 | 17500 | 0.0165 | - |
+ | 0.2792 | 18000 | 0.0145 | - |
+ | 0.2869 | 18500 | 0.0139 | - |
+ | 0.2947 | 19000 | 0.0198 | - |
+ | 0.3025 | 19500 | 0.0139 | - |
+ | 0.3102 | 20000 | 0.0177 | - |
+ | 0.3180 | 20500 | 0.0104 | - |
+ | 0.3257 | 21000 | 0.0149 | - |
+ | 0.3335 | 21500 | 0.0144 | - |
+ | 0.3412 | 22000 | 0.0168 | - |
+ | 0.3490 | 22500 | 0.0156 | - |
+ | 0.3567 | 23000 | 0.0132 | - |
+ | 0.3645 | 23500 | 0.0152 | - |
+ | 0.3723 | 24000 | 0.0147 | - |
+ | 0.3800 | 24500 | 0.0142 | - |
+ | 0.3878 | 25000 | 0.018 | - |
+ | 0.3955 | 25500 | 0.0246 | - |
+ | 0.4033 | 26000 | 0.0105 | - |
+ | 0.4110 | 26500 | 0.0097 | - |
+ | 0.4188 | 27000 | 0.0145 | - |
+ | 0.4265 | 27500 | 0.0136 | - |
+ | 0.4343 | 28000 | 0.0182 | - |
+ | 0.4421 | 28500 | 0.016 | - |
+ | 0.4498 | 29000 | 0.0088 | - |
+ | 0.4576 | 29500 | 0.0106 | - |
+ | 0.4653 | 30000 | 0.02 | - |
+ | 0.4731 | 30500 | 0.0153 | - |
+ | 0.4808 | 31000 | 0.0118 | - |
+ | 0.4886 | 31500 | 0.0141 | - |
+ | 0.4963 | 32000 | 0.0194 | - |
+ | 0.5041 | 32500 | 0.0149 | - |
+ | 0.5119 | 33000 | 0.0099 | - |
+ | 0.5196 | 33500 | 0.0212 | - |
+ | 0.5274 | 34000 | 0.0112 | - |
+ | 0.5351 | 34500 | 0.0175 | - |
+ | 0.5429 | 35000 | 0.0149 | - |
+ | 0.5506 | 35500 | 0.0142 | - |
+ | 0.5584 | 36000 | 0.0174 | - |
+ | 0.5661 | 36500 | 0.0146 | - |
+ | 0.5739 | 37000 | 0.0186 | - |
+ | 0.5816 | 37500 | 0.0167 | - |
+ | 0.5894 | 38000 | 0.0356 | - |
+ | 0.5972 | 38500 | 0.0195 | - |
+ | 0.6049 | 39000 | 0.0165 | - |
+ | 0.6127 | 39500 | 0.0202 | - |
+ | 0.6204 | 40000 | 0.0142 | - |
+ | 0.6282 | 40500 | 0.0104 | - |
+ | 0.6359 | 41000 | 0.0104 | - |
+ | 0.6437 | 41500 | 0.0155 | - |
+ | 0.6514 | 42000 | 0.0056 | - |
+ | 0.6592 | 42500 | 0.0102 | - |
+ | 0.6670 | 43000 | 0.0096 | - |
+ | 0.6747 | 43500 | 0.0219 | - |
+ | 0.6825 | 44000 | 0.0106 | - |
+ | 0.6902 | 44500 | 0.0129 | - |
+ | 0.6980 | 45000 | 0.0152 | - |
+ | 0.7057 | 45500 | 0.0158 | - |
+ | 0.7135 | 46000 | 0.0082 | - |
+ | 0.7212 | 46500 | 0.0159 | - |
+ | 0.7290 | 47000 | 0.0184 | - |
+ | 0.7368 | 47500 | 0.0101 | - |
+ | 0.7445 | 48000 | 0.0101 | - |
+ | 0.7523 | 48500 | 0.0115 | - |
+ | 0.7600 | 49000 | 0.0111 | - |
+ | 0.7678 | 49500 | 0.0116 | - |
+ | 0.7755 | 50000 | 0.0085 | 0.867 |
+
+
+ ### Framework Versions
+ - Python: 3.10.18
+ - Sentence Transformers: 5.0.0
+ - Transformers: 4.53.1
+ - PyTorch: 2.7.1+cu126
+ - Accelerate: 1.10.0
+ - Datasets: 3.6.0
+ - Tokenizers: 0.21.2
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "architectures": [
+     "XLMRobertaModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "bos_token_id": 0,
+   "classifier_dropout": null,
+   "eos_token_id": 2,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-05,
+   "max_position_embeddings": 514,
+   "model_type": "xlm-roberta",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "output_past": true,
+   "pad_token_id": 1,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.53.1",
+   "type_vocab_size": 1,
+   "use_cache": true,
+   "vocab_size": 250002
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,14 @@
+ {
+   "model_type": "SentenceTransformer",
+   "__version__": {
+     "sentence_transformers": "5.0.0",
+     "transformers": "4.53.1",
+     "pytorch": "2.7.1+cu126"
+   },
+   "prompts": {
+     "query": "",
+     "document": ""
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
eval/translation_evaluation_eval-en-sa_results.csv ADDED
@@ -0,0 +1,16 @@
+ epoch,steps,src2trg,trg2src
+ 1.0,64472,0.001,0.001
+ 2.0,128944,0.0,0.0
+ 3.0,193416,0.002,0.001
+ 4.0,257888,0.0,0.002
+ 5.0,322360,0.002,0.003
+ 6.0,386832,0.0,0.001
+ 7.0,451304,0.001,0.001
+ 8.0,515776,0.0,0.0
+ 9.0,580248,0.001,0.0
+ 10.0,644720,0.001,0.0
+ 11.0,709192,0.002,0.002
+ 12.0,773664,0.001,0.0
+ 13.0,838136,0.0,0.0
+ 14.0,902608,0.001,0.004
+ 15.0,967080,0.001,0.001
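For inspection, this per-epoch CSV can be loaded with pandas; a small sketch, with the path taken relative to the repository root:

```python
import pandas as pd

# Columns match the CSV header: epoch, steps, src2trg, trg2src
df = pd.read_csv("eval/translation_evaluation_eval-en-sa_results.csv")
df["mean"] = (df["src2trg"] + df["trg2src"]) / 2
print(df[["epoch", "src2trg", "trg2src", "mean"]])
```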
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ae5042748107120afb38b17df9f52c3651ae1854fad9bf2a50ba4b1ce4ac82bb
+ size 2239607176
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": true,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:883b037111086fd4dfebbbc9b7cee11e1517b5e0c0514879478661440f137085
+ size 17082987
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": true,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "extra_special_tokens": {},
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }