rasyosef committed (verified)
Commit 0da044f · 1 Parent(s): f622aaa

Add new SparseEncoder model

Files changed (4):
  1. README.md +120 -160
  2. config.json +1 -1
  3. config_sentence_transformers.json +3 -3
  4. model.safetensors +1 -1
README.md CHANGED
@@ -5,42 +5,33 @@ tags:
  - sparse
  - splade
  - generated_from_trainer
- - dataset_size:1350000
  - loss:SpladeLoss
  - loss:SparseMarginMSELoss
  - loss:FlopsLoss
- base_model:
- - prajjwal1/bert-mini
  widget:
- - text: >-
- Coinsurance is a health care cost sharing between you and your insurance
- company. The cost sharing ranges from 80/20 to even 50/50. For example, if
- your coinsurance is 80/20, that means that your insurer covers 80% of annual
- medical expenses and you pay the remaining 20%. The cost sharing stops when
- medical expenses reach your out-of-pocket maximum, which usually is between
- $1,000 and $5,000.
- - text: >-
- The Definition of Success. 1 In 1806, the definition of Success in the
- Webster dictionary was to be fortunate, happy, kind and prosperous. In 2013
- the definition of success is the attainment of wealth, fame and power. 2
- The purpose of forming a company is not to obtain substantial wealth.
- - text: >-
- It wouldn't be completely accurate to say 10 syllables, because in English
- Sonnet writing, they are written in iambic pentameter, which is ten
- syllables, but it's not just any syllables, they have to be in
- rhythm.da-DA-da-DA-da-DA-da-DA-da-DA. And the rhymes are ABAB/CDCD/EFEF/GG
- for each stanza.hat makes a sonnet a sonnet is the rhyme scheme and the 10
- syllable lines. Check out this site and it may help you:
- http://www.elfwood.com/farp/thewriting/2... Sptfyr · 7 years ago. Thumbs up.
- - text: >-
- Dragon horn. A dragon horn is a sorcerous horn that is used to control
- dragons.
- - text: >-
- Social Sciences. Background research refers to accessing the collection of
- previously published and unpublished information about a site, region, or
- particular topic of interest and it is the first step of all good
- archaeological investigations, as well as that of all writers of any kind of
- research paper.
  pipeline_tag: feature-extraction
  library_name: sentence-transformers
  metrics:
@@ -74,89 +65,94 @@ model-index:
  type: unknown
  metrics:
  - type: dot_accuracy@1
- value: 0.4976
  name: Dot Accuracy@1
  - type: dot_accuracy@3
- value: 0.8154
  name: Dot Accuracy@3
  - type: dot_accuracy@5
- value: 0.9122
  name: Dot Accuracy@5
  - type: dot_accuracy@10
- value: 0.9684
  name: Dot Accuracy@10
  - type: dot_precision@1
- value: 0.4976
  name: Dot Precision@1
  - type: dot_precision@3
- value: 0.2791333333333333
  name: Dot Precision@3
  - type: dot_precision@5
- value: 0.18991999999999998
  name: Dot Precision@5
  - type: dot_precision@10
- value: 0.10178
  name: Dot Precision@10
  - type: dot_recall@1
- value: 0.4821
  name: Dot Recall@1
  - type: dot_recall@3
- value: 0.80205
  name: Dot Recall@3
  - type: dot_recall@5
- value: 0.9034833333333334
  name: Dot Recall@5
  - type: dot_recall@10
- value: 0.9639
  name: Dot Recall@10
  - type: dot_ndcg@10
- value: 0.739184491374207
  name: Dot Ndcg@10
  - type: dot_mrr@10
- value: 0.6690194444444474
  name: Dot Mrr@10
  - type: dot_map@100
- value: 0.6646610700105045
  name: Dot Map@100
  - type: query_active_dims
- value: 16.810400009155273
  name: Query Active Dims
  - type: query_sparsity_ratio
- value: 0.9994492366159113
  name: Query Sparsity Ratio
  - type: corpus_active_dims
- value: 100.62213478240855
  name: Corpus Active Dims
  - type: corpus_sparsity_ratio
- value: 0.996703291567315
  name: Corpus Sparsity Ratio
- license: mit
- datasets:
- - microsoft/ms_marco
- language:
- - en
  ---

- # SPLADE-BERT-Mini-Distil

- This is a SPLADE sparse retrieval model based on BERT-Mini (11M) that was trained by distilling a Cross-Encoder on the MSMARCO dataset. The cross-encoder used was [ms-marco-MiniLM-L6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L6-v2).

- This tiny SPLADE model is `6x` smaller than Naver's official `splade-v3-distilbert` while retaining `85%` of its performance on the MSMARCO benchmark. It is small enough to be used without a GPU on a dataset of a few thousand documents.

- - `Collection:` https://huggingface.co/collections/rasyosef/splade-tiny-msmarco-687c548c0691d95babf65b70
- - `Distillation Dataset:` https://huggingface.co/datasets/yosefw/msmarco-train-distil-v2
- - `Code:` https://github.com/rasyosef/splade-tiny-msmarco

- ## Performance

- The SPLADE models were evaluated on 55 thousand queries and 8.84 million documents from the [MSMARCO](https://huggingface.co/datasets/microsoft/ms_marco) dataset.

- | Model | Size (# Params) | MRR@10 (MS MARCO dev) |
- |:---|:----|:-------------------|
- | `BM25` | - | 18.0 |
- | `rasyosef/splade-tiny` | 4.4M | 30.9 |
- | `rasyosef/splade-mini` | 11.2M | 33.2 |
- | `naver/splade-v3-distilbert` | 67.0M | 38.7 |

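Since the card advertises CPU-friendliness, here is a hedged sketch of CPU-only ranking over a small corpus, using the same `SparseEncoder` calls shown in the Usage section below; the corpus strings and the `device` argument are illustrative assumptions, not part of the original card:

```python
from sentence_transformers import SparseEncoder

# Hedged sketch: rank a small corpus on CPU only. A few thousand short
# documents remains practical without a GPU for a model this size.
model = SparseEncoder("rasyosef/splade-mini", device="cpu")

corpus = [
    "Coinsurance is a health care cost sharing between you and your insurer.",
    "A dragon horn is a sorcerous horn that is used to control dragons.",
]

doc_emb = model.encode_document(corpus)
query_emb = model.encode_query(["what is coinsurance"])

scores = model.similarity(query_emb, doc_emb)    # dot products, shape (1, len(corpus))
top = scores[0].argsort(descending=True)[:2]     # best-matching document indices
print([corpus[int(i)] for i in top])
```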
  ## Usage

@@ -173,15 +169,15 @@ Then you can load this model and run inference.
  from sentence_transformers import SparseEncoder

  # Download from the 🤗 Hub
- model = SparseEncoder("rasyosef/splade-mini")
  # Run inference
  queries = [
- "research background definition",
  ]
  documents = [
- 'Social Sciences. Background research refers to accessing the collection of previously published and unpublished information about a site, region, or particular topic of interest and it is the first step of all good archaeological investigations, as well as that of all writers of any kind of research paper.',
- 'This Research Paper Background and Problem Definition and other 62,000+ term papers, college essay examples and free essays are available now on ReviewEssays.com. Autor: dharath1 July 22, 2014 • Research Paper • 442 Words (2 Pages) • 448 Views.',
- 'About the Month of February. February is the 2nd month of the year and has 28 or 29 days. The 29th day is every 4 years during leap year. Season (Northern Hemisphere): Winter. Holidays. Chinese New Year. National Freedom Day. Groundhog Day.',
  ]
  query_embeddings = model.encode_query(queries)
  document_embeddings = model.encode_document(documents)
@@ -191,7 +187,7 @@ print(query_embeddings.shape, document_embeddings.shape)
  # Get the similarity scores for the embeddings
  similarities = model.similarity(query_embeddings, document_embeddings)
  print(similarities)
- # tensor([[22.7011, 11.1635, 0.0000]])
  ```

  <!--
@@ -218,37 +214,6 @@ You can finetune this model on your own dataset.
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
  -->

- ## Model Details
-
- ### Model Description
- - **Model Type:** SPLADE Sparse Encoder
- - **Base model:** [prajjwal1/bert-mini](https://huggingface.co/prajjwal1/bert-mini)
- - **Maximum Sequence Length:** 512 tokens
- - **Output Dimensionality:** 30522 dimensions
- - **Similarity Function:** Dot Product
- <!-- - **Training Dataset:** Unknown -->
- <!-- - **Language:** Unknown -->
- <!-- - **License:** Unknown -->
-
- ### Model Sources
-
- - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
- - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)
-
- ### Full Model Architecture
-
- ```
- SparseEncoder(
- (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
- (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
- )
- ```
-
- ## More
- <details><summary>Click to expand</summary>
-
  ## Evaluation

  ### Metrics
@@ -259,25 +224,25 @@ SparseEncoder(

  | Metric | Value |
  |:----------------------|:-----------|
- | dot_accuracy@1 | 0.4976 |
- | dot_accuracy@3 | 0.8154 |
- | dot_accuracy@5 | 0.9122 |
- | dot_accuracy@10 | 0.9684 |
- | dot_precision@1 | 0.4976 |
- | dot_precision@3 | 0.2791 |
- | dot_precision@5 | 0.1899 |
- | dot_precision@10 | 0.1018 |
- | dot_recall@1 | 0.4821 |
- | dot_recall@3 | 0.8021 |
- | dot_recall@5 | 0.9035 |
- | dot_recall@10 | 0.9639 |
- | **dot_ndcg@10** | **0.7392** |
- | dot_mrr@10 | 0.669 |
- | dot_map@100 | 0.6647 |
- | query_active_dims | 16.8104 |
- | query_sparsity_ratio | 0.9994 |
- | corpus_active_dims | 100.6221 |
- | corpus_sparsity_ratio | 0.9967 |

  <!--
  ## Bias, Risks and Limitations
@@ -297,25 +262,25 @@ SparseEncoder(

  #### Unnamed Dataset

- * Size: 1,350,000 training samples
- * Columns: <code>query</code>, <code>positive</code>, <code>negative</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
- | | query | positive | negative | label |
- |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------|
- | type | string | string | string | list |
- | details | <ul><li>min: 4 tokens</li><li>mean: 8.95 tokens</li><li>max: 25 tokens</li></ul> | <ul><li>min: 15 tokens</li><li>mean: 79.36 tokens</li><li>max: 215 tokens</li></ul> | <ul><li>min: 21 tokens</li><li>mean: 78.39 tokens</li><li>max: 233 tokens</li></ul> | <ul><li>size: 1 elements</li></ul> |
  * Samples:
- | query | positive | negative | label |
- |:--------------------------------------------|:------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------|:----------------------------------|
- | <code>what causes protruding stomach</code> | <code>Some of the less common causes of Protruding abdomen may include: 1 Constipation. 2 Chronic constipation. 3 Poor muscle tone. Poor muscle tone after 1 childbirth. Lactose intolerance. Food 1 allergies. Food intolerances. 2 Pregnancy. 3 Hernia. Malabsorption. Irritable bowel 1 syndrome. Colonic bacterial fermentation. 2 Gastroparesis. Diabetic gastroparesis.</code> | <code>Protruding abdomen: Introduction. Protruding abdomen: abdominal distension. See detailed information below for a list of 56 causes of Protruding abdomen, Symptom Checker, including diseases and drug side effect causes. » Review Causes of Protruding abdomen: Causes | Symptom Checker ». Home Diagnostic Testing and Protruding abdomen.</code> | <code>[3.2738194465637207]</code> |
- | <code>what is bialys</code> | <code>The bialy is not a sub-type of bagel, it’s a thing all to itself. Round with a depressed middle filled with cooked onions and sometimes poppy seeds, it is simply baked (bagels are boiled then baked). Purists prefer them straight up, preferably no more than five hours after being pulled from the oven. Extinction.Like the Lowland gorilla, the cassette tape and Madagascar forest coconuts, the bialy is rapidly becoming extinct. Sure, if you live in New York (where the Jewish tenements on the Lower East Side once overflowed with Eastern European foodstuffs that are now hard to locate), you have a few decent options.he bialy is not a sub-type of bagel, it’s a thing all to itself. Round with a depressed middle filled with cooked onions and sometimes poppy seeds, it is simply baked (bagels are boiled then baked). Purists prefer them straight up, preferably no more than five hours after being pulled from the oven. Extinction.</code> | <code>This homemade bialy recipe is even easier to make than a bagel because it doesn’t require boiling prior to baking.his homemade bialy recipe is even easier to make than a bagel because it doesn’t require boiling prior to baking.</code> | <code>[5.632390975952148]</code> |
- | <code>dhow definition</code> | <code>Definition of dhow. : an Arab lateen-rigged boat usually having a long overhang forward, a high poop, and a low waist.</code> | <code>Freebase(0.00 / 0 votes)Rate this definition: Dhow. Dhow is the generic name of a number of traditional sailing vessels with one or more masts with lateen sails used in the Red Sea and Indian Ocean region. Historians are divided as to whether the dhow was invented by Arabs or Indians.</code> | <code>[0.8292264938354492]</code> |
  * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
  "loss": "SparseMarginMSELoss",
- "document_regularizer_weight": 0.2,
- "query_regularizer_weight": 0.3
  }
  ```
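The <code>label</code> column above holds the distillation targets for <code>SparseMarginMSELoss</code>. Under the MarginMSE convention these are margins between the teacher's positive and negative scores; a hypothetical sketch, assuming the `ms-marco-MiniLM-L6-v2` cross-encoder named at the top of this card as the teacher:

```python
from sentence_transformers import CrossEncoder

# Hypothetical label generation for MarginMSE distillation; the margin
# convention is an assumption based on the loss name, not stated in the card.
teacher = CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2")

def margin_labels(query: str, positive: str, negatives: list[str]) -> list[float]:
    pos = teacher.predict([(query, positive)])[0]
    negs = teacher.predict([(query, n) for n in negatives])
    return [float(pos - n) for n in negs]

# One negative yields a one-element label list, matching the samples above.
print(margin_labels("dhow definition", "Definition of dhow. ...", ["Freebase ..."]))
```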

@@ -323,16 +288,15 @@ SparseEncoder(
  #### Non-Default Hyperparameters

  - `eval_strategy`: epoch
- - `per_device_train_batch_size`: 32
- - `per_device_eval_batch_size`: 32
- - `learning_rate`: 6e-05
  - `num_train_epochs`: 4
  - `lr_scheduler_type`: cosine
- - `warmup_ratio`: 0.05
  - `fp16`: True
  - `load_best_model_at_end`: True
  - `optim`: adamw_torch_fused
- - `push_to_hub`: True

  #### All Hyperparameters
  <details><summary>Click to expand</summary>
@@ -341,14 +305,14 @@ SparseEncoder(
  - `do_predict`: False
  - `eval_strategy`: epoch
  - `prediction_loss_only`: True
- - `per_device_train_batch_size`: 32
- - `per_device_eval_batch_size`: 32
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
- - `learning_rate`: 6e-05
  - `weight_decay`: 0.0
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999
@@ -358,7 +322,7 @@ SparseEncoder(
  - `max_steps`: -1
  - `lr_scheduler_type`: cosine
  - `lr_scheduler_kwargs`: {}
- - `warmup_ratio`: 0.05
  - `warmup_steps`: 0
  - `log_level`: passive
  - `log_level_replica`: warning
@@ -415,7 +379,7 @@ SparseEncoder(
  - `dataloader_persistent_workers`: False
  - `skip_memory_metrics`: True
  - `use_legacy_prediction_loop`: False
- - `push_to_hub`: True
  - `resume_from_checkpoint`: None
  - `hub_model_id`: None
  - `hub_strategy`: every_save
@@ -458,21 +422,19 @@ SparseEncoder(
  </details>

  ### Training Logs
- | Epoch | Step | Training Loss | dot_ndcg@10 |
- |:-------:|:----------:|:-------------:|:-----------:|
- | 1.0 | 42188 | 8.6242 | 0.7262 |
- | 2.0 | 84376 | 7.0404 | 0.7362 |
- | 3.0 | 126564 | 5.3661 | 0.7388 |
- | **4.0** | **168752** | **4.4807** | **0.7392** |

- * The bold row denotes the saved checkpoint.

  ### Framework Versions
  - Python: 3.11.13
  - Sentence Transformers: 5.0.0
  - Transformers: 4.53.3
  - PyTorch: 2.6.0+cu124
- - Accelerate: 1.8.1
  - Datasets: 4.0.0
  - Tokenizers: 0.21.2

@@ -544,6 +506,4 @@ SparseEncoder(
  ## Model Card Contact

  *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
- -->
-
- </details>
 
  - sparse
  - splade
  - generated_from_trainer
+ - dataset_size:1000000
  - loss:SpladeLoss
  - loss:SparseMarginMSELoss
  - loss:FlopsLoss
+ base_model: yosefw/SPLADE-BERT-Mini-BS256
  widget:
+ - text: Caffeine is a central nervous system stimulant. It works by stimulating the
+ brain. Caffeine is found naturally in foods and beverages such as coffee, tea,
+ colas, energy and chocolate. Botanical sources of caffeine include kola nuts,
+ guarana, and yerba mate.
+ - text: Tim Hardaway, Jr. Compared To My 5ft 10in (177cm) Height. Tim Hardaway, Jr.'s
+ height is 6ft 6in or 198cm while I am 5ft 10in or 177cm. I am shorter compared
+ to him. To find out how much shorter I am, we would have to subtract my height
+ from Tim Hardaway, Jr.'s height. Therefore I am shorter to him for about 21cm.
+ - text: benefits of honey and lemon
+ - text: 'How To Cook Corn on the Cob in the Microwave What You Need. Ingredients 1
+ or more ears fresh, un-shucked sweet corn Equipment Microwave Cooling rack or
+ cutting board Instructions. Place 1 to 4 ears of corn in the microwave: Arrange
+ 1 to 4 ears of corn, un-shucked, in the microwave. If you prefer, you can set
+ them on a microwaveable plate or tray. If you need to cook more than 4 ears of
+ corn, cook them in batches. Microwave for 3 to 5 minutes: For just 1 or 2 ears
+ of corn, microwave for 3 minutes. For 3 or 4 ears, microwave for 4 minutes. If
+ you like softer corn or if your ears are particularly large, microwave for an
+ additional minute.'
+ - text: The law recognizes two basic kinds of warrantiesimplied warranties and express
+ warranties. Implied Warranties. Implied warranties are unspoken, unwritten promises,
+ created by state law, that go from you, as a seller or merchant, to your customers.

  pipeline_tag: feature-extraction
  library_name: sentence-transformers
  metrics:
 
  type: unknown
  metrics:
  - type: dot_accuracy@1
+ value: 0.5018
  name: Dot Accuracy@1
  - type: dot_accuracy@3
+ value: 0.8286
  name: Dot Accuracy@3
  - type: dot_accuracy@5
+ value: 0.9194
  name: Dot Accuracy@5
  - type: dot_accuracy@10
+ value: 0.9746
  name: Dot Accuracy@10
  - type: dot_precision@1
+ value: 0.5018
  name: Dot Precision@1
  - type: dot_precision@3
+ value: 0.2839333333333333
  name: Dot Precision@3
  - type: dot_precision@5
+ value: 0.19103999999999996
  name: Dot Precision@5
  - type: dot_precision@10
+ value: 0.10255999999999998
  name: Dot Precision@10
  - type: dot_recall@1
+ value: 0.4867666666666667
  name: Dot Recall@1
  - type: dot_recall@3
+ value: 0.81485
  name: Dot Recall@3
  - type: dot_recall@5
+ value: 0.9096166666666667
  name: Dot Recall@5
  - type: dot_recall@10
+ value: 0.9709333333333334
  name: Dot Recall@10
  - type: dot_ndcg@10
+ value: 0.7457042059559617
  name: Dot Ndcg@10
  - type: dot_mrr@10
+ value: 0.6749323809523842
  name: Dot Mrr@10
  - type: dot_map@100
+ value: 0.670785161566693
  name: Dot Map@100
  - type: query_active_dims
+ value: 22.584999084472656
  name: Query Active Dims
  - type: query_sparsity_ratio
+ value: 0.9992600419669592
  name: Query Sparsity Ratio
  - type: corpus_active_dims
+ value: 174.85202722777373
  name: Corpus Active Dims
  - type: corpus_sparsity_ratio
+ value: 0.9942712788405814
  name: Corpus Sparsity Ratio

  ---

+ # SPLADE Sparse Encoder

+ This is a [SPLADE Sparse Encoder](https://www.sbert.net/docs/sparse_encoder/usage/usage.html) model finetuned from [yosefw/SPLADE-BERT-Mini-BS256](https://huggingface.co/yosefw/SPLADE-BERT-Mini-BS256) using the [sentence-transformers](https://www.SBERT.net) library. It maps sentences & paragraphs to a 30522-dimensional sparse vector space and can be used for semantic search and sparse retrieval.
+ ## Model Details

+ ### Model Description
+ - **Model Type:** SPLADE Sparse Encoder
+ - **Base model:** [yosefw/SPLADE-BERT-Mini-BS256](https://huggingface.co/yosefw/SPLADE-BERT-Mini-BS256) <!-- at revision 986bc55b61d9f0559f86423fb5807b9f4a3b7094 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 30522 dimensions
+ - **Similarity Function:** Dot Product
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->

+ ### Model Sources

+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Documentation:** [Sparse Encoder Documentation](https://www.sbert.net/docs/sparse_encoder/usage/usage.html)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sparse Encoders on Hugging Face](https://huggingface.co/models?library=sentence-transformers&other=sparse-encoder)

+ ### Full Model Architecture

+ ```
+ SparseEncoder(
+ (0): MLMTransformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'BertForMaskedLM'})
+ (1): SpladePooling({'pooling_strategy': 'max', 'activation_function': 'relu', 'word_embedding_dimension': 30522})
+ )
+ ```

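The `SpladePooling` module above is what turns the per-token MLM logits into a single sparse vector. A conceptual sketch of its `relu` activation with `max` pooling (an illustration of the SPLADE formulation, not the library's exact code path):

```python
import torch

# Given per-token MLM logits of shape (seq_len, 30522), SPLADE-style pooling
# applies log-saturated ReLU per token and takes the max over the sequence,
# yielding one 30522-dim vector in which most entries are exactly zero.
def splade_pool(mlm_logits: torch.Tensor) -> torch.Tensor:
    return torch.log1p(torch.relu(mlm_logits)).amax(dim=0)

emb = splade_pool(torch.randn(12, 30522))
print(emb.shape)  # torch.Size([30522])
```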
  ## Usage

  from sentence_transformers import SparseEncoder

  # Download from the 🤗 Hub
+ model = SparseEncoder("yosefw/SPLADE-BERT-Mini-BS256-distil")
  # Run inference
  queries = [
+ "common law implied warranty",
  ]
  documents = [
+ 'The law recognizes two basic kinds of warrantiesimplied warranties and express warranties. Implied Warranties. Implied warranties are unspoken, unwritten promises, created by state law, that go from you, as a seller or merchant, to your customers.',
+ 'An implied warranty is a contract law term for certain assurances that are presumed in the sale of products or real property.',
+ 'The implied warranty of fitness for a particular purpose is a promise that the law says you, as a seller, make when your customer relies on your advice that a product can be used for some specific purpose.',
  ]
  query_embeddings = model.encode_query(queries)
  document_embeddings = model.encode_document(documents)

  # Get the similarity scores for the embeddings
  similarities = model.similarity(query_embeddings, document_embeddings)
  print(similarities)
+ # tensor([[22.4364, 22.7160, 21.7330]])
  ```
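Because the embeddings live in the 30522-token vocabulary space, the highest-weighted dimensions can be read back as tokens. A small follow-up sketch; the `decode` helper is part of recent sentence-transformers releases, and the printed pairs are illustrative:

```python
# Inspect the highest-weighted tokens in the first query embedding.
# Each entry is a (token, weight) pair drawn from the model's WordPiece vocabulary.
decoded = model.decode(query_embeddings[0], top_k=10)
print(decoded)
# e.g. [('warranty', 2.9), ('implied', 2.5), ('law', 1.8), ...]  (illustrative values)
```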

  <!--
  *List how the model may foreseeably be misused and address what users ought not to do with the model.*
  -->

  ## Evaluation

  ### Metrics

  | Metric | Value |
  |:----------------------|:-----------|
+ | dot_accuracy@1 | 0.5018 |
+ | dot_accuracy@3 | 0.8286 |
+ | dot_accuracy@5 | 0.9194 |
+ | dot_accuracy@10 | 0.9746 |
+ | dot_precision@1 | 0.5018 |
+ | dot_precision@3 | 0.2839 |
+ | dot_precision@5 | 0.191 |
+ | dot_precision@10 | 0.1026 |
+ | dot_recall@1 | 0.4868 |
+ | dot_recall@3 | 0.8148 |
+ | dot_recall@5 | 0.9096 |
+ | dot_recall@10 | 0.9709 |
+ | **dot_ndcg@10** | **0.7457** |
+ | dot_mrr@10 | 0.6749 |
+ | dot_map@100 | 0.6708 |
+ | query_active_dims | 22.585 |
+ | query_sparsity_ratio | 0.9993 |
+ | corpus_active_dims | 174.852 |
+ | corpus_sparsity_ratio | 0.9943 |
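The two sparsity ratios follow directly from the active-dimension counts and the 30522-token vocabulary, as a quick check shows:

```python
# Sparsity ratio = fraction of the 30522 vocabulary dimensions that are zero.
vocab_size = 30522
print(1 - 22.585 / vocab_size)   # ≈ 0.9993 -> query_sparsity_ratio
print(1 - 174.852 / vocab_size)  # ≈ 0.9943 -> corpus_sparsity_ratio
```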
 
  <!--
  ## Bias, Risks and Limitations
 

  #### Unnamed Dataset

+ * Size: 1,000,000 training samples
+ * Columns: <code>query</code>, <code>positive</code>, <code>negative_1</code>, <code>negative_2</code>, and <code>label</code>
  * Approximate statistics based on the first 1000 samples:
+ | | query | positive | negative_1 | negative_2 | label |
+ |:--------|:---------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|:-----------------------------------|
+ | type | string | string | string | string | list |
+ | details | <ul><li>min: 4 tokens</li><li>mean: 9.01 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 22 tokens</li><li>mean: 80.48 tokens</li><li>max: 247 tokens</li></ul> | <ul><li>min: 18 tokens</li><li>mean: 79.27 tokens</li><li>max: 213 tokens</li></ul> | <ul><li>min: 17 tokens</li><li>mean: 75.56 tokens</li><li>max: 190 tokens</li></ul> | <ul><li>size: 2 elements</li></ul> |
  * Samples:
+ | query | positive | negative_1 | negative_2 | label |
+ |:-----------------------------------------------------------|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:-----------------------------------------------------|
+ | <code>friendly home health care</code> | <code>Medicare Evaluation of the Quality of Care. The quality of care given at Friendly Care Home Health Services is periodically evaluated by Medicare. The results of the most recent evaluation period are listed below to help you compare home care agencies in your area. More Info.</code> | <code>Every participant took the same survey so it is a useful way to compare Friendly Care Home Health Services to other home care agencies.</code> | <code>It covers a wide range of services and can often delay the need for long-term nursing home care. More specifically, home health care may include occupational and physical therapy, speech therapy, and even skilled nursing.</code> | <code>[1.2647171020507812, 9.144136428833008]</code> |
+ | <code>how much does the xbox elite controller weigh</code> | <code>How much does an Xbox 360 weigh? A: The weight of an Xbox 360 depends on the different model purchased, with an original Xbox 360 or Xbox 360 Elite weighing 7.7 pounds with a hard drive and a newer Xbox 360 Slim weighing 6.3 pounds. An Xbox 360 without a hard drive weighs 7 pounds.</code> | <code>How much does 6 xbox 360 games/cases weigh? How much does an xbox 360 elite weigh (in the box)? How much does an xbox 360 weigh? im going to fedex one? I am considering purchasing an Xbox 360, or a Playstation 3...</code> | <code>1 You can only upload videos smaller than 600 MB. 2 You can only upload a photo (png, jpg, jpeg) or video (3gp, 3gpp, mp4, mov, avi, mpg, mpeg, rm). 3 You can only upload a photo or video. Video should be smaller than <b>600 MB/5 minutes</b>.</code> | <code>[4.903870582580566, 18.162578582763672]</code> |
+ | <code>what county is norfolk, ct in</code> | <code>Norfolk, Connecticut. Norfolk (local /ˈnɔːrfɔːrk/) is a town in Litchfield County, Connecticut, United States. The population was 1,787 at the 2010 census.</code> | <code>Norfolk Historic District. The Norfolk Historic District was listed on the National Register of Historic Places in 1979. Portions of the content on this web page were adapted from a copy of the original nomination document. [†] Adaptation copyright © 2010, The Gombach Group. Description.</code> | <code>Terms begin the first day of the month. Grand Juries, 1st and 3rd Wednesday of each month. Civil cases set by agreement of counsel and consent of the court; scheduling orders are mandatory in most cases. Civil and Criminal trials begin at 9:30 a.m.</code> | <code>[12.4237699508667, 21.46290397644043]</code> |
  * Loss: [<code>SpladeLoss</code>](https://sbert.net/docs/package_reference/sparse_encoder/losses.html#spladeloss) with these parameters:
  ```json
  {
  "loss": "SparseMarginMSELoss",
+ "document_regularizer_weight": 0.12,
+ "query_regularizer_weight": 0.2
  }
  ```
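The JSON above mirrors the constructor arguments of `SpladeLoss` in sentence-transformers. A hedged sketch of the equivalent setup (base model taken from this card; the surrounding trainer wiring is omitted):

```python
from sentence_transformers import SparseEncoder
from sentence_transformers.sparse_encoder import losses

# SpladeLoss wraps a ranking loss (here SparseMarginMSELoss) and adds FLOPS
# regularizers that push query and document embeddings toward sparsity.
model = SparseEncoder("yosefw/SPLADE-BERT-Mini-BS256")
loss = losses.SpladeLoss(
    model=model,
    loss=losses.SparseMarginMSELoss(model),
    document_regularizer_weight=0.12,
    query_regularizer_weight=0.2,
)
```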

  #### Non-Default Hyperparameters

  - `eval_strategy`: epoch
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
+ - `learning_rate`: 4e-05
  - `num_train_epochs`: 4
  - `lr_scheduler_type`: cosine
+ - `warmup_ratio`: 0.025
  - `fp16`: True
  - `load_best_model_at_end`: True
  - `optim`: adamw_torch_fused

  #### All Hyperparameters
  <details><summary>Click to expand</summary>
 
  - `do_predict`: False
  - `eval_strategy`: epoch
  - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 64
+ - `per_device_eval_batch_size`: 64
  - `per_gpu_train_batch_size`: None
  - `per_gpu_eval_batch_size`: None
  - `gradient_accumulation_steps`: 1
  - `eval_accumulation_steps`: None
  - `torch_empty_cache_steps`: None
+ - `learning_rate`: 4e-05
  - `weight_decay`: 0.0
  - `adam_beta1`: 0.9
  - `adam_beta2`: 0.999

  - `max_steps`: -1
  - `lr_scheduler_type`: cosine
  - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.025
  - `warmup_steps`: 0
  - `log_level`: passive
  - `log_level_replica`: warning

  - `dataloader_persistent_workers`: False
  - `skip_memory_metrics`: True
  - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
  - `resume_from_checkpoint`: None
  - `hub_model_id`: None
  - `hub_strategy`: every_save

  </details>

  ### Training Logs
+ | Epoch | Step | Training Loss | dot_ndcg@10 |
+ |:-----:|:-----:|:-------------:|:-----------:|
+ | 1.0 | 15625 | 9.3147 | 0.7353 |
+ | 2.0 | 31250 | 7.5267 | 0.7429 |
+ | 3.0 | 46875 | 6.3289 | 0.7457 |

  ### Framework Versions
  - Python: 3.11.13
  - Sentence Transformers: 5.0.0
  - Transformers: 4.53.3
  - PyTorch: 2.6.0+cu124
+ - Accelerate: 1.9.0
  - Datasets: 4.0.0
  - Tokenizers: 0.21.2

  ## Model Card Contact

  *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->

config.json CHANGED
@@ -17,7 +17,7 @@
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
- "transformers_version": "4.54.0",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522

  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "torch_dtype": "float32",
+ "transformers_version": "4.55.2",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 30522
config_sentence_transformers.json CHANGED
@@ -1,9 +1,9 @@
  {
  "model_type": "SparseEncoder",
  "__version__": {
- "sentence_transformers": "5.0.0",
- "transformers": "4.54.0",
- "pytorch": "2.6.0+cu124"
  },
  "prompts": {
  "query": "",

  {
  "model_type": "SparseEncoder",
  "__version__": {
+ "sentence_transformers": "5.1.0",
+ "transformers": "4.55.2",
+ "pytorch": "2.8.0+cu126"
  },
  "prompts": {
  "query": "",
model.safetensors CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:4050564a649d96030d5ec42b38ae323f47c9454d87af057a42721bf892ed32a7
  size 44814856

  version https://git-lfs.github.com/spec/v1
+ oid sha256:2a7f2791a6ae8c6c07c51f95d67f52c7514cecf0fe59f0381969e3add0751f9d
  size 44814856