Commit 354ceec (verified) by tomaarsen (HF staff) · Parent(s): 0e81f4e

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,583 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:1000000
+ - loss:DenoisingAutoEncoderLoss
+ base_model: google-bert/bert-base-uncased
+ widget:
+ - source_sentence: He wound up homeless in the Mission District, playing for change
+     in the streets.
+   sentences:
+   - He wound up homeless, playing in streets
+   - It line-up of professional footballers,, firefighters and survivors.
+   - A (Dakota) belonging to the Dutch Air crash-landed near Beswick (Beswick Creek
+     now Barunga?
+ - source_sentence: The division remained near Arkhangelsk until the beginning of August,
+     when it was shipped across the White Sea to Murmansk.
+   sentences:
+   - The division remained near Arkhangelsk until the beginning of August, when it
+     was shipped across White Sea to Murmansk.
+   - The building is and.
+   - Maxim Triesman born October) is politician banker trade union leader.
+ - source_sentence: '"Leper," the last song on the album, was left as an instrumental
+     as Jourgensen had left the studio earlier than scheduled and did not care to write
+     any lyrics.'
+   sentences:
+   - There produced the viral host cells processes, more suitable environment for viral
+     replication transcription.
+   - As a the to
+   - Leper, the song on the album was left as an as Jourgensen had left the studio
+     scheduled and did care to any lyrics
+ - source_sentence: Prince and princess have given Gerda her their golden coach so
+     she can continue her search for Kay.
+   sentences:
+   - and princess given Gerda their golden coach so she can her search for Kay.
+   - handled the cinematography
+   - University Hoekstra was Professor of and Department of Multidisciplinary Water.
+ - source_sentence: While the early models stayed close to their original form, eight
+     subsequent generations varied substantially in size and styling.
+   sentences:
+   - While the stayed close their, eight generations varied substantially in size and
+   - Their influence, his's own tradition, his special organization all combined to
+     divert the young into a political career
+   - “ U ” cross of the river are a recent
+ datasets:
+ - princeton-nlp/datasets-for-simcse
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ co2_eq_emissions:
+   emissions: 556.5173349579181
+   energy_consumed: 1.4317326253991955
+   source: codecarbon
+   training_type: fine-tuning
+   on_cloud: false
+   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
+   ram_total_size: 31.777088165283203
+   hours_used: 4.403
+   hardware_used: 1 x NVIDIA GeForce RTX 3090
+ model-index:
+ - name: SentenceTransformer based on google-bert/bert-base-uncased
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev
+       type: sts-dev
+     metrics:
+     - type: pearson_cosine
+       value: 0.6732163313155011
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.6765812652563955
+       name: Spearman Cosine
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts test
+       type: sts-test
+     metrics:
+     - type: pearson_cosine
+       value: 0.6424591318281525
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.6322331484751982
+       name: Spearman Cosine
+ ---
+
+ # SentenceTransformer based on google-bert/bert-base-uncased
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) on the [datasets-for-simcse](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [google-bert/bert-base-uncased](https://huggingface.co/google-bert/bert-base-uncased) <!-- at revision 86b5e0934494bd15c9632b12f734a8a67f723594 -->
+ - **Maximum Sequence Length:** 75 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+     - [datasets-for-simcse](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse)
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 75, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("tomaarsen/bert-base-uncased-stsb-tsdae")
+ # Run inference
+ sentences = [
+     'While the early models stayed close to their original form, eight subsequent generations varied substantially in size and styling.',
+     'While the stayed close their, eight generations varied substantially in size and',
+     '“ U ” cross of the river are a recent',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
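+
+ Because pooling is CLS-based (see the architecture above), equivalent embeddings can be computed with plain `transformers`. The following is a minimal sketch of that equivalence, not a file shipped in this commit:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ tokenizer = AutoTokenizer.from_pretrained("tomaarsen/bert-base-uncased-stsb-tsdae")
+ model = AutoModel.from_pretrained("tomaarsen/bert-base-uncased-stsb-tsdae")
+
+ inputs = tokenizer(
+     ["While the early models stayed close to their original form, eight subsequent generations varied substantially in size and styling."],
+     padding=True,
+     truncation=True,
+     max_length=75,  # matches the model's maximum sequence length
+     return_tensors="pt",
+ )
+ with torch.no_grad():
+     output = model(**inputs)
+
+ # CLS pooling: the sentence embedding is the hidden state of the first ([CLS]) token,
+ # mirroring pooling_mode_cls_token=True in 1_Pooling/config.json
+ embeddings = output.last_hidden_state[:, 0]
+ print(embeddings.shape)  # torch.Size([1, 768])
+ ```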
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+
+ * Datasets: `sts-dev` and `sts-test`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric              | sts-dev    | sts-test   |
+ |:--------------------|:-----------|:-----------|
+ | pearson_cosine      | 0.6732     | 0.6425     |
+ | **spearman_cosine** | **0.6766** | **0.6322** |
+
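+ Scores like these can be recomputed with the same evaluator. The sketch below assumes the [sentence-transformers/stsb](https://huggingface.co/datasets/sentence-transformers/stsb) dataset, which matches the `sts-dev`/`sts-test` names, though the exact evaluation data isn't recorded in this card:
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SimilarityFunction
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+
+ model = SentenceTransformer("tomaarsen/bert-base-uncased-stsb-tsdae")
+ stsb = load_dataset("sentence-transformers/stsb", split="test")
+
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=stsb["sentence1"],
+     sentences2=stsb["sentence2"],
+     scores=stsb["score"],  # similarity labels normalized to [0, 1]
+     main_similarity=SimilarityFunction.COSINE,
+     name="sts-test",
+ )
+ # Returns a dict including sts-test_pearson_cosine and sts-test_spearman_cosine
+ print(evaluator(model))
+ ```
+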
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### datasets-for-simcse
+
+ * Dataset: [datasets-for-simcse](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse) at [e145e8b](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/tree/e145e8bb659b2aa2669f32ef79cb4cdef6c58fef)
+ * Size: 1,000,000 training samples
+ * Columns: <code>text</code> and <code>noisy</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | text                                                                              | noisy                                                                             |
+   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+   | type    | string                                                                            | string                                                                            |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 27.96 tokens</li><li>max: 75 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 17.68 tokens</li><li>max: 75 tokens</li></ul> |
+ * Samples:
+   | text                                                                                                                                         | noisy                                                               |
+   |:---------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------|
+   | <code>White was born in Iver, England.</code>                                                                                                | <code>White was born in Iver,</code>                                |
+   | <code>The common mangrove plants are "Rhizophora mucronata", "Sonneratia caseolaris", "Avicennia" spp., and "Aegiceras corniculatum".</code> | <code>plants are Rhizophora mucronata" "Sonneratia, spp.,".</code>  |
+   | <code>H3K9ac and H3K14ac have been shown to be part of the active promoter state.</code>                                                     | <code>H3K9ac been part of active promoter state.</code>             |
+ * Loss: [<code>DenoisingAutoEncoderLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
+
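+ The <code>noisy</code> column holds TSDAE-style corruptions of <code>text</code>. As a sketch, pairs like these can be produced with the word-deletion noise from `DenoisingAutoEncoderDataset`; the specific source file and deletion ratio below are assumptions, since the card doesn't record them:
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers.datasets import DenoisingAutoEncoderDataset
+
+ dataset = load_dataset(
+     "princeton-nlp/datasets-for-simcse",
+     data_files="wiki1m_for_simcse.txt",  # assumed file within the dataset repo
+     split="train",
+ )
+ # delete() randomly drops words; the default del_ratio=0.6 keeps roughly 40% of them.
+ # Note: it relies on nltk word tokenization (nltk.download("punkt")).
+ dataset = dataset.map(lambda row: {"noisy": DenoisingAutoEncoderDataset.delete(row["text"])})
+ print(dataset[0])  # {'text': ..., 'noisy': ...}
+ ```
+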
+ ### Evaluation Dataset
+
+ #### datasets-for-simcse
+
+ * Dataset: [datasets-for-simcse](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse) at [e145e8b](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse/tree/e145e8bb659b2aa2669f32ef79cb4cdef6c58fef)
+ * Size: 1,000,000 evaluation samples
+ * Columns: <code>text</code> and <code>noisy</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | text                                                                              | noisy                                                                             |
+   |:--------|:----------------------------------------------------------------------------------|:----------------------------------------------------------------------------------|
+   | type    | string                                                                            | string                                                                            |
+   | details | <ul><li>min: 3 tokens</li><li>mean: 28.12 tokens</li><li>max: 75 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 17.79 tokens</li><li>max: 66 tokens</li></ul> |
+ * Samples:
+   | text                                                                                                                                                                                                                             | noisy                                                                                                                                                                    |
+   |:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+   | <code>Philippe Hervé (born 16 April 1959) is a French water polo player.</code>                                                                                                                                                 | <code>Philippe Hervé born April 1959 is French</code>                                                                                                                    |
+   | <code>lies at the very edge of Scottish offshore waters, close to the maritime boundary with Norway.</code>                                                                                                                     | <code>the edge Scottish offshore waters close to maritime boundary with Norway</code>                                                                                    |
+   | <code>The place is an exceptional example of the forced migration of convicts (Vinegar Hill rebels) and the development associated with punishment and reform, particularly convict labour and the associated coal mines.</code> | <code>The is an example of forced migration of convicts (Vinegar rebels and the development punishment and reform, particularly convict and the associated coal.</code> |
+ * Loss: [<code>DenoisingAutoEncoderLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#denoisingautoencoderloss)
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `learning_rate`: 3e-05
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+
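+ These non-default values slot into a standard Sentence Transformers v3 training loop. The following is a condensed, hypothetical sketch of such a run (the source file, split sizes, and output path are assumptions, and the column order follows the `DenoisingAutoEncoderLoss` documentation; the actual training script isn't included in this commit):
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+     losses,
+     models,
+ )
+ from sentence_transformers.datasets import DenoisingAutoEncoderDataset
+
+ # bert-base-uncased with CLS pooling, matching the architecture section above
+ transformer = models.Transformer("google-bert/bert-base-uncased", max_seq_length=75)
+ pooling = models.Pooling(transformer.get_word_embedding_dimension(), pooling_mode="cls")
+ model = SentenceTransformer(modules=[transformer, pooling])
+
+ # Build (damaged, original) pairs; per its docs, DenoisingAutoEncoderLoss expects
+ # the damaged sentence first and the original second
+ dataset = load_dataset("princeton-nlp/datasets-for-simcse", data_files="wiki1m_for_simcse.txt", split="train")
+ dataset = dataset.map(lambda row: {"noisy": DenoisingAutoEncoderDataset.delete(row["text"])})
+ dataset = dataset.select_columns(["noisy", "text"])
+ splits = dataset.train_test_split(test_size=10_000, seed=42)  # held-out eval split (assumption)
+
+ # Decoder weights are tied to the encoder, as in the TSDAE paper
+ loss = losses.DenoisingAutoEncoderLoss(model, tie_encoder_decoder=True)
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="output/tsdae-bert-base-uncased",  # illustrative path
+     num_train_epochs=1,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=8,
+     learning_rate=3e-5,
+     warmup_ratio=0.1,
+     fp16=True,
+     eval_strategy="steps",
+     eval_steps=10_000,  # matches the validation cadence in the training logs below
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=splits["train"],
+     eval_dataset=splits["test"],
+     loss=loss,
+ )
+ trainer.train()
+ ```
+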
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 8
+ - `per_device_eval_batch_size`: 8
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 3e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`: 
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ <details><summary>Click to expand</summary>
+
+ | Epoch  | Step   | Training Loss | Validation Loss | sts-dev_spearman_cosine | sts-test_spearman_cosine |
+ |:------:|:------:|:-------------:|:---------------:|:-----------------------:|:------------------------:|
+ | -1     | -1     | -             | -               | 0.3173                  | -                        |
+ | 0.0081 | 1000   | 7.5472        | -               | -                       | -                        |
+ | 0.0162 | 2000   | 6.0196        | -               | -                       | -                        |
+ | 0.0242 | 3000   | 5.4872        | -               | -                       | -                        |
+ | 0.0323 | 4000   | 5.1452        | -               | -                       | -                        |
+ | 0.0404 | 5000   | 4.8099        | -               | -                       | -                        |
+ | 0.0485 | 6000   | 4.5211        | -               | -                       | -                        |
+ | 0.0566 | 7000   | 4.2967        | -               | -                       | -                        |
+ | 0.0646 | 8000   | 4.1411        | -               | -                       | -                        |
+ | 0.0727 | 9000   | 4.031         | -               | -                       | -                        |
+ | 0.0808 | 10000  | 3.9636        | 3.8297          | 0.7237                  | -                        |
+ | 0.0889 | 11000  | 3.9046        | -               | -                       | -                        |
+ | 0.0970 | 12000  | 3.8138        | -               | -                       | -                        |
+ | 0.1051 | 13000  | 3.7859        | -               | -                       | -                        |
+ | 0.1131 | 14000  | 3.7237        | -               | -                       | -                        |
+ | 0.1212 | 15000  | 3.6881        | -               | -                       | -                        |
+ | 0.1293 | 16000  | 3.6133        | -               | -                       | -                        |
+ | 0.1374 | 17000  | 3.5777        | -               | -                       | -                        |
+ | 0.1455 | 18000  | 3.5285        | -               | -                       | -                        |
+ | 0.1535 | 19000  | 3.4974        | -               | -                       | -                        |
+ | 0.1616 | 20000  | 3.4421        | 3.3523          | 0.6978                  | -                        |
+ | 0.1697 | 21000  | 3.416         | -               | -                       | -                        |
+ | 0.1778 | 22000  | 3.4143        | -               | -                       | -                        |
+ | 0.1859 | 23000  | 3.3661        | -               | -                       | -                        |
+ | 0.1939 | 24000  | 3.3408        | -               | -                       | -                        |
+ | 0.2020 | 25000  | 3.3079        | -               | -                       | -                        |
+ | 0.2101 | 26000  | 3.2873        | -               | -                       | -                        |
+ | 0.2182 | 27000  | 3.2639        | -               | -                       | -                        |
+ | 0.2263 | 28000  | 3.2323        | -               | -                       | -                        |
+ | 0.2343 | 29000  | 3.2416        | -               | -                       | -                        |
+ | 0.2424 | 30000  | 3.2117        | 3.1015          | 0.6895                  | -                        |
+ | 0.2505 | 31000  | 3.1868        | -               | -                       | -                        |
+ | 0.2586 | 32000  | 3.1576        | -               | -                       | -                        |
+ | 0.2667 | 33000  | 3.1619        | -               | -                       | -                        |
+ | 0.2747 | 34000  | 3.1445        | -               | -                       | -                        |
+ | 0.2828 | 35000  | 3.1387        | -               | -                       | -                        |
+ | 0.2909 | 36000  | 3.1159        | -               | -                       | -                        |
+ | 0.2990 | 37000  | 3.09          | -               | -                       | -                        |
+ | 0.3071 | 38000  | 3.0771        | -               | -                       | -                        |
+ | 0.3152 | 39000  | 3.065         | -               | -                       | -                        |
+ | 0.3232 | 40000  | 3.0589        | 2.9535          | 0.6885                  | -                        |
+ | 0.3313 | 41000  | 3.0539        | -               | -                       | -                        |
+ | 0.3394 | 42000  | 3.0211        | -               | -                       | -                        |
+ | 0.3475 | 43000  | 3.0158        | -               | -                       | -                        |
+ | 0.3556 | 44000  | 3.0172        | -               | -                       | -                        |
+ | 0.3636 | 45000  | 2.9912        | -               | -                       | -                        |
+ | 0.3717 | 46000  | 2.9776        | -               | -                       | -                        |
+ | 0.3798 | 47000  | 2.9539        | -               | -                       | -                        |
+ | 0.3879 | 48000  | 2.9753        | -               | -                       | -                        |
+ | 0.3960 | 49000  | 2.9467        | -               | -                       | -                        |
+ | 0.4040 | 50000  | 2.9429        | 2.8288          | 0.6830                  | -                        |
+ | 0.4121 | 51000  | 2.9243        | -               | -                       | -                        |
+ | 0.4202 | 52000  | 2.9273        | -               | -                       | -                        |
+ | 0.4283 | 53000  | 2.9118        | -               | -                       | -                        |
+ | 0.4364 | 54000  | 2.9068        | -               | -                       | -                        |
+ | 0.4444 | 55000  | 2.8961        | -               | -                       | -                        |
+ | 0.4525 | 56000  | 2.8621        | -               | -                       | -                        |
+ | 0.4606 | 57000  | 2.8825        | -               | -                       | -                        |
+ | 0.4687 | 58000  | 2.8466        | -               | -                       | -                        |
+ | 0.4768 | 59000  | 2.868         | -               | -                       | -                        |
+ | 0.4848 | 60000  | 2.8372        | 2.7335          | 0.6871                  | -                        |
+ | 0.4929 | 61000  | 2.8322        | -               | -                       | -                        |
+ | 0.5010 | 62000  | 2.8239        | -               | -                       | -                        |
+ | 0.5091 | 63000  | 2.8148        | -               | -                       | -                        |
+ | 0.5172 | 64000  | 2.8137        | -               | -                       | -                        |
+ | 0.5253 | 65000  | 2.8043        | -               | -                       | -                        |
+ | 0.5333 | 66000  | 2.7973        | -               | -                       | -                        |
+ | 0.5414 | 67000  | 2.7739        | -               | -                       | -                        |
+ | 0.5495 | 68000  | 2.7694        | -               | -                       | -                        |
+ | 0.5576 | 69000  | 2.755         | -               | -                       | -                        |
+ | 0.5657 | 70000  | 2.7846        | 2.6422          | 0.6773                  | -                        |
+ | 0.5737 | 71000  | 2.7246        | -               | -                       | -                        |
+ | 0.5818 | 72000  | 2.7438        | -               | -                       | -                        |
+ | 0.5899 | 73000  | 2.7314        | -               | -                       | -                        |
+ | 0.5980 | 74000  | 2.7213        | -               | -                       | -                        |
+ | 0.6061 | 75000  | 2.7402        | -               | -                       | -                        |
+ | 0.6141 | 76000  | 2.6955        | -               | -                       | -                        |
+ | 0.6222 | 77000  | 2.7131        | -               | -                       | -                        |
+ | 0.6303 | 78000  | 2.6951        | -               | -                       | -                        |
+ | 0.6384 | 79000  | 2.6812        | -               | -                       | -                        |
+ | 0.6465 | 80000  | 2.6844        | 2.5743          | 0.6827                  | -                        |
+ | 0.6545 | 81000  | 2.665         | -               | -                       | -                        |
+ | 0.6626 | 82000  | 2.6528        | -               | -                       | -                        |
+ | 0.6707 | 83000  | 2.6819        | -               | -                       | -                        |
+ | 0.6788 | 84000  | 2.6529        | -               | -                       | -                        |
+ | 0.6869 | 85000  | 2.6665        | -               | -                       | -                        |
+ | 0.6949 | 86000  | 2.6554        | -               | -                       | -                        |
+ | 0.7030 | 87000  | 2.6299        | -               | -                       | -                        |
+ | 0.7111 | 88000  | 2.659         | -               | -                       | -                        |
+ | 0.7192 | 89000  | 2.632         | -               | -                       | -                        |
+ | 0.7273 | 90000  | 2.6209        | 2.5051          | 0.6782                  | -                        |
+ | 0.7354 | 91000  | 2.6023        | -               | -                       | -                        |
+ | 0.7434 | 92000  | 2.6226        | -               | -                       | -                        |
+ | 0.7515 | 93000  | 2.6057        | -               | -                       | -                        |
+ | 0.7596 | 94000  | 2.601         | -               | -                       | -                        |
+ | 0.7677 | 95000  | 2.5888        | -               | -                       | -                        |
+ | 0.7758 | 96000  | 2.5811        | -               | -                       | -                        |
+ | 0.7838 | 97000  | 2.565         | -               | -                       | -                        |
+ | 0.7919 | 98000  | 2.5727        | -               | -                       | -                        |
+ | 0.8    | 99000  | 2.5863        | -               | -                       | -                        |
+ | 0.8081 | 100000 | 2.5534        | 2.4526          | 0.6799                  | -                        |
+ | 0.8162 | 101000 | 2.5423        | -               | -                       | -                        |
+ | 0.8242 | 102000 | 2.5655        | -               | -                       | -                        |
+ | 0.8323 | 103000 | 2.5394        | -               | -                       | -                        |
+ | 0.8404 | 104000 | 2.5217        | -               | -                       | -                        |
+ | 0.8485 | 105000 | 2.5534        | -               | -                       | -                        |
+ | 0.8566 | 106000 | 2.5264        | -               | -                       | -                        |
+ | 0.8646 | 107000 | 2.5481        | -               | -                       | -                        |
+ | 0.8727 | 108000 | 2.5508        | -               | -                       | -                        |
+ | 0.8808 | 109000 | 2.5302        | -               | -                       | -                        |
+ | 0.8889 | 110000 | 2.5223        | 2.4048          | 0.6771                  | -                        |
+ | 0.8970 | 111000 | 2.5274        | -               | -                       | -                        |
+ | 0.9051 | 112000 | 2.515         | -               | -                       | -                        |
+ | 0.9131 | 113000 | 2.5088        | -               | -                       | -                        |
+ | 0.9212 | 114000 | 2.5035        | -               | -                       | -                        |
+ | 0.9293 | 115000 | 2.495         | -               | -                       | -                        |
+ | 0.9374 | 116000 | 2.5066        | -               | -                       | -                        |
+ | 0.9455 | 117000 | 2.4858        | -               | -                       | -                        |
+ | 0.9535 | 118000 | 2.4803        | -               | -                       | -                        |
+ | 0.9616 | 119000 | 2.506         | -               | -                       | -                        |
+ | 0.9697 | 120000 | 2.4906        | 2.3738          | 0.6766                  | -                        |
+ | 0.9778 | 121000 | 2.5027        | -               | -                       | -                        |
+ | 0.9859 | 122000 | 2.4858        | -               | -                       | -                        |
+ | 0.9939 | 123000 | 2.4928        | -               | -                       | -                        |
+ | -1     | -1     | -             | -               | -                       | 0.6322                   |
+
+ </details>
+
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Energy Consumed**: 1.432 kWh
+ - **Carbon Emitted**: 0.557 kg of CO2
+ - **Hours Used**: 4.403 hours
+
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
+ - **RAM Size**: 31.78 GB
+
+ ### Framework Versions
+ - Python: 3.11.6
+ - Sentence Transformers: 3.4.0.dev0
+ - Transformers: 4.48.0.dev0
+ - PyTorch: 2.5.0+cu121
+ - Accelerate: 0.35.0.dev0
+ - Datasets: 2.20.0
+ - Tokenizers: 0.21.0
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### DenoisingAutoEncoderLoss
+ ```bibtex
+ @inproceedings{wang-2021-TSDAE,
+     title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning",
+     author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
+     month = nov,
+     year = "2021",
+     address = "Punta Cana, Dominican Republic",
+     publisher = "Association for Computational Linguistics",
+     pages = "671--688",
+     url = "https://arxiv.org/abs/2104.06979",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "output\\training_stsb_tsdae-bert-base-uncased-8-2025-01-16_15-52-31\\final",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 768,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.48.0.dev0",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.4.0.dev0",
+     "transformers": "4.48.0.dev0",
+     "pytorch": "2.5.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b9d060f2fbdf3859007a534d729e89bc3cb302649e9cd5eaa0cc6164e472d4c1
+ size 437951328
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 75,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": false,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 75,
+   "model_max_length": 75,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff