meandyou200175 committed
Commit f465c13 · verified · 1 Parent(s): f5ba522

Add new SentenceTransformer model

Files changed (2):
  1. README.md +81 -80
  2. model.safetensors +1 -1
README.md CHANGED
@@ -4,7 +4,7 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:9316
+- dataset_size:10356
 - loss:MultipleNegativesRankingLoss
 base_model: intfloat/multilingual-e5-large
 widget:
@@ -76,76 +76,76 @@ model-index:
       type: unknown
     metrics:
     - type: cosine_accuracy@1
-      value: 0.8108108108108109
+      value: 0.9073359073359073
       name: Cosine Accuracy@1
     - type: cosine_accuracy@2
-      value: 0.8957528957528957
+      value: 0.9739382239382239
       name: Cosine Accuracy@2
     - type: cosine_accuracy@5
-      value: 0.9382239382239382
+      value: 0.9942084942084942
       name: Cosine Accuracy@5
     - type: cosine_accuracy@10
-      value: 0.9642857142857143
+      value: 0.999034749034749
       name: Cosine Accuracy@10
     - type: cosine_accuracy@100
-      value: 0.9932432432432432
+      value: 1.0
       name: Cosine Accuracy@100
     - type: cosine_precision@1
-      value: 0.8108108108108109
+      value: 0.9073359073359073
       name: Cosine Precision@1
     - type: cosine_precision@2
-      value: 0.44787644787644787
+      value: 0.48696911196911197
       name: Cosine Precision@2
     - type: cosine_precision@5
-      value: 0.18764478764478765
+      value: 0.19884169884169883
       name: Cosine Precision@5
     - type: cosine_precision@10
-      value: 0.09642857142857143
+      value: 0.0999034749034749
       name: Cosine Precision@10
     - type: cosine_precision@100
-      value: 0.009932432432432433
+      value: 0.010000000000000002
      name: Cosine Precision@100
     - type: cosine_recall@1
-      value: 0.8108108108108109
+      value: 0.9073359073359073
       name: Cosine Recall@1
     - type: cosine_recall@2
-      value: 0.8957528957528957
+      value: 0.9739382239382239
       name: Cosine Recall@2
     - type: cosine_recall@5
-      value: 0.9382239382239382
+      value: 0.9942084942084942
       name: Cosine Recall@5
     - type: cosine_recall@10
-      value: 0.9642857142857143
+      value: 0.999034749034749
       name: Cosine Recall@10
     - type: cosine_recall@100
-      value: 0.9932432432432432
+      value: 1.0
       name: Cosine Recall@100
     - type: cosine_ndcg@10
-      value: 0.8923095558988695
+      value: 0.9601842774877813
       name: Cosine Ndcg@10
     - type: cosine_mrr@1
-      value: 0.8108108108108109
+      value: 0.9073359073359073
       name: Cosine Mrr@1
     - type: cosine_mrr@2
-      value: 0.8532818532818532
+      value: 0.9406370656370656
       name: Cosine Mrr@2
     - type: cosine_mrr@5
-      value: 0.8649292149292154
+      value: 0.9462837837837839
       name: Cosine Mrr@5
     - type: cosine_mrr@10
-      value: 0.8687695348409635
+      value: 0.946988570202856
       name: Cosine Mrr@10
     - type: cosine_mrr@100
-      value: 0.8700193430588538
+      value: 0.9470763202906061
       name: Cosine Mrr@100
     - type: cosine_map@100
-      value: 0.8700193430588539
+      value: 0.9470763202906061
       name: Cosine Map@100
 ---
 
 # SentenceTransformer based on intfloat/multilingual-e5-large
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large) on the [word_embedding](https://huggingface.co/datasets/meandyou200175/word_embedding) dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [intfloat/multilingual-e5-large](https://huggingface.co/intfloat/multilingual-e5-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
@@ -155,8 +155,7 @@ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [i
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
-- **Training Dataset:**
-    - [word_embedding](https://huggingface.co/datasets/meandyou200175/word_embedding)
+<!-- - **Training Dataset:** Unknown -->
 <!-- - **Language:** Unknown -->
 <!-- - **License:** Unknown -->
 
@@ -242,28 +241,28 @@ You can finetune this model on your own dataset.
 
 | Metric               | Value      |
 |:---------------------|:-----------|
-| cosine_accuracy@1    | 0.8108     |
-| cosine_accuracy@2    | 0.8958     |
-| cosine_accuracy@5    | 0.9382     |
-| cosine_accuracy@10   | 0.9643     |
-| cosine_accuracy@100  | 0.9932     |
-| cosine_precision@1   | 0.8108     |
-| cosine_precision@2   | 0.4479     |
-| cosine_precision@5   | 0.1876     |
-| cosine_precision@10  | 0.0964     |
-| cosine_precision@100 | 0.0099     |
-| cosine_recall@1      | 0.8108     |
-| cosine_recall@2      | 0.8958     |
-| cosine_recall@5      | 0.9382     |
-| cosine_recall@10     | 0.9643     |
-| cosine_recall@100    | 0.9932     |
-| **cosine_ndcg@10**   | **0.8923** |
-| cosine_mrr@1         | 0.8108     |
-| cosine_mrr@2         | 0.8533     |
-| cosine_mrr@5         | 0.8649     |
-| cosine_mrr@10        | 0.8688     |
-| cosine_mrr@100       | 0.87       |
-| cosine_map@100       | 0.87       |
+| cosine_accuracy@1    | 0.9073     |
+| cosine_accuracy@2    | 0.9739     |
+| cosine_accuracy@5    | 0.9942     |
+| cosine_accuracy@10   | 0.999      |
+| cosine_accuracy@100  | 1.0        |
+| cosine_precision@1   | 0.9073     |
+| cosine_precision@2   | 0.487      |
+| cosine_precision@5   | 0.1988     |
+| cosine_precision@10  | 0.0999     |
+| cosine_precision@100 | 0.01       |
+| cosine_recall@1      | 0.9073     |
+| cosine_recall@2      | 0.9739     |
+| cosine_recall@5      | 0.9942     |
+| cosine_recall@10     | 0.999      |
+| cosine_recall@100    | 1.0        |
+| **cosine_ndcg@10**   | **0.9602** |
+| cosine_mrr@1         | 0.9073     |
+| cosine_mrr@2         | 0.9406     |
+| cosine_mrr@5         | 0.9463     |
+| cosine_mrr@10        | 0.947      |
+| cosine_mrr@100       | 0.9471     |
+| cosine_map@100       | 0.9471     |
 
 <!--
 ## Bias, Risks and Limitations
@@ -281,10 +280,9 @@ You can finetune this model on your own dataset.
 
 ### Training Dataset
 
-#### word_embedding
+#### Unnamed Dataset
 
-* Dataset: [word_embedding](https://huggingface.co/datasets/meandyou200175/word_embedding) at [af76b11](https://huggingface.co/datasets/meandyou200175/word_embedding/tree/af76b11c1d93542ca76e864a60b1744d5e02b099)
-* Size: 9,316 training samples
+* Size: 10,356 training samples
 * Columns: <code>query</code> and <code>positive</code>
 * Approximate statistics based on the first 1000 samples:
 | | query | positive |
@@ -467,35 +465,38 @@ You can finetune this model on your own dataset.
 | Epoch  | Step | Training Loss | Validation Loss | cosine_ndcg@10 |
 |:------:|:----:|:-------------:|:---------------:|:--------------:|
 | -1     | -1   | -             | -               | 0.7166         |
-| 0.1715 | 100  | 0.8892        | -               | -              |
-| 0.3431 | 200  | 0.1724        | -               | -              |
-| 0.5146 | 300  | 0.1783        | -               | -              |
-| 0.6861 | 400  | 0.1393        | -               | -              |
-| 0.8576 | 500  | 0.1262        | -               | -              |
-| 1.0292 | 600  | 0.1046        | -               | -              |
-| 1.2007 | 700  | 0.0639        | -               | -              |
-| 1.3722 | 800  | 0.0692        | -               | -              |
-| 1.5437 | 900  | 0.043         | -               | -              |
-| 1.7153 | 1000 | 0.0614        | 0.0819          | 0.8774         |
-| 1.8868 | 1100 | 0.0538        | -               | -              |
-| 2.0583 | 1200 | 0.0414        | -               | -              |
-| 2.2298 | 1300 | 0.0146        | -               | -              |
-| 2.4014 | 1400 | 0.0164        | -               | -              |
-| 2.5729 | 1500 | 0.0225        | -               | -              |
-| 2.7444 | 1600 | 0.0215        | -               | -              |
-| 2.9160 | 1700 | 0.0271        | -               | -              |
-| 3.0875 | 1800 | 0.0202        | -               | -              |
-| 3.2590 | 1900 | 0.0194        | -               | -              |
-| 3.4305 | 2000 | 0.0144        | 0.0682          | 0.8923         |
-| 3.6021 | 2100 | 0.0118        | -               | -              |
-| 3.7736 | 2200 | 0.0155        | -               | -              |
-| 3.9451 | 2300 | 0.0177        | -               | -              |
-| 4.1166 | 2400 | 0.0059        | -               | -              |
-| 4.2882 | 2500 | 0.0099        | -               | -              |
-| 4.4597 | 2600 | 0.0056        | -               | -              |
-| 4.6312 | 2700 | 0.0153        | -               | -              |
-| 4.8027 | 2800 | 0.0069        | -               | -              |
-| 4.9743 | 2900 | 0.01          | -               | -              |
+| 0.1543 | 100  | 0.9191        | -               | -              |
+| 0.3086 | 200  | 0.1876        | -               | -              |
+| 0.4630 | 300  | 0.1547        | -               | -              |
+| 0.6173 | 400  | 0.1556        | -               | -              |
+| 0.7716 | 500  | 0.179         | -               | -              |
+| 0.9259 | 600  | 0.1234        | -               | -              |
+| 1.0802 | 700  | 0.087         | -               | -              |
+| 1.2346 | 800  | 0.0576        | -               | -              |
+| 1.3889 | 900  | 0.0564        | -               | -              |
+| 1.5432 | 1000 | 0.0583        | 0.0271          | 0.9198         |
+| 1.6975 | 1100 | 0.0764        | -               | -              |
+| 1.8519 | 1200 | 0.0493        | -               | -              |
+| 2.0062 | 1300 | 0.0481        | -               | -              |
+| 2.1605 | 1400 | 0.0222        | -               | -              |
+| 2.3148 | 1500 | 0.0234        | -               | -              |
+| 2.4691 | 1600 | 0.0283        | -               | -              |
+| 2.6235 | 1700 | 0.0236        | -               | -              |
+| 2.7778 | 1800 | 0.026         | -               | -              |
+| 2.9321 | 1900 | 0.0217        | -               | -              |
+| 3.0864 | 2000 | 0.0193        | 0.0061          | 0.9534         |
+| 3.2407 | 2100 | 0.0135        | -               | -              |
+| 3.3951 | 2200 | 0.0162        | -               | -              |
+| 3.5494 | 2300 | 0.0109        | -               | -              |
+| 3.7037 | 2400 | 0.0107        | -               | -              |
+| 3.8580 | 2500 | 0.0105        | -               | -              |
+| 4.0123 | 2600 | 0.0095        | -               | -              |
+| 4.1667 | 2700 | 0.0146        | -               | -              |
+| 4.3210 | 2800 | 0.0102        | -               | -              |
+| 4.4753 | 2900 | 0.0108        | -               | -              |
+| 4.6296 | 3000 | 0.01          | 0.0061          | 0.9602         |
+| 4.7840 | 3100 | 0.008         | -               | -              |
+| 4.9383 | 3200 | 0.0117        | -               | -              |
 
 
 ### Framework Versions
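A note on the evaluation numbers in this diff: in both the old and the new card, `cosine_recall@k` always equals `cosine_accuracy@k` and `cosine_precision@k` equals `cosine_accuracy@k / k` (e.g. precision@100 is exactly 0.01 when recall@100 is 1.0), which is the signature of an evaluation where each query has exactly one relevant document. A minimal sketch (not from the card; `retrieval_metrics` is a hypothetical helper) of how those metrics reduce in that case, given the 1-based rank of each query's single relevant document:

```python
import math

def retrieval_metrics(ranks, k):
    """Retrieval metrics when each query has exactly one relevant document.

    ranks: 1-based rank of the relevant document for each query.
    Returns (accuracy@k, precision@k, recall@k, mrr@k, ndcg@k).
    """
    n = len(ranks)
    hits = sum(1 for r in ranks if r <= k)
    accuracy = hits / n            # fraction of queries with the doc in the top k
    precision = hits / (n * k)     # one relevant doc -> accuracy@k / k
    recall = hits / n              # one relevant doc -> equals accuracy@k
    mrr = sum(1 / r for r in ranks if r <= k) / n
    # Single relevant doc: DCG is 1/log2(rank+1) and the ideal DCG is 1
    ndcg = sum(1 / math.log2(r + 1) for r in ranks if r <= k) / n
    return accuracy, precision, recall, mrr, ndcg

acc, prec, rec, mrr, ndcg = retrieval_metrics([1, 1, 2, 3, 11], k=10)
```

With this structure, improving accuracy@k (as this commit does, e.g. 0.8108 → 0.9073 at k=1) mechanically lifts precision@k and recall@k by the same factor.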
model.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:24854dafae65ef24432ea85eb056628daff94e85bcb8d60730361305c68126c1
+oid sha256:8a156fd71ddd697aac4e8c39e7714f28d7852c7d782f6b21c821d0f433937f07
 size 2239607176
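The `loss:MultipleNegativesRankingLoss` tag in the README refers to in-batch-negatives contrastive training: each (query, positive) pair in a batch treats every other pair's positive as a negative, and a cross-entropy loss is taken over scaled cosine similarities. A minimal NumPy sketch of that objective (an illustration of the idea, not the sentence-transformers implementation; the default scale of 20 is an assumption):

```python
import numpy as np

def multiple_negatives_ranking_loss(query_emb, positive_emb, scale=20.0):
    """In-batch-negatives ranking loss over cosine similarities.

    Row i of query_emb is paired with row i of positive_emb; all other
    rows of positive_emb serve as negatives for query i.
    """
    # L2-normalize so dot products are cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = positive_emb / np.linalg.norm(positive_emb, axis=1, keepdims=True)
    scores = scale * (q @ p.T)                   # (batch, batch) similarity matrix
    # Softmax cross-entropy with target class i for row i
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    log_probs = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    return -log_probs[idx, idx].mean()
```

The loss is near zero when each query embedding matches its own positive and is dissimilar to the others, which is the regime the training-loss column above converges toward (≈0.01 by epoch 5).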