manmah committed
Commit 00b2e7f · verified · 1 Parent(s): d5e6600

Add new SentenceTransformer model
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 1024,
+   "pooling_mode_cls_token": true,
+   "pooling_mode_mean_tokens": false,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,660 @@
+ ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:156
+ - loss:MatryoshkaLoss
+ - loss:MultipleNegativesRankingLoss
+ base_model: Snowflake/snowflake-arctic-embed-l
+ widget:
+ - source_sentence: What was the typical context length accepted by most models last
+     year?
+   sentences:
+   - 'Prompt injection is a natural consequence of this gulibility. I’ve seen precious
+     little progress on tackling that problem in 2024, and we’ve been talking about
+     it since September 2022.
+
+     I’m beginning to see the most popular idea of “agents” as dependent on AGI itself.
+     A model that’s robust against gulliblity is a very tall order indeed.
+
+     Evals really matter
+
+     Anthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):'
+   - 'Gemini 1.5 Pro also illustrated one of the key themes of 2024: increased context
+     lengths. Last year most models accepted 4,096 or 8,192 tokens, with the notable
+     exception of Claude 2.1 which accepted 200,000. Today every serious provider has
+     a 100,000+ token model, and Google’s Gemini series accepts up to 2 million.'
+   - 'Here’s the rest of the transcript. It’s bland and generic, but my phone can pitch
+     bland and generic Christmas movies to Netflix now!
+
+     LLM prices crashed, thanks to competition and increased efficiency
+
+     The past twelve months have seen a dramatic collapse in the cost of running a
+     prompt through the top tier hosted LLMs.
+
+     In December 2023 (here’s the Internet Archive for the OpenAI pricing page) OpenAI
+     were charging $30/million input tokens for GPT-4, $10/mTok for the then-new GPT-4
+     Turbo and $1/mTok for GPT-3.5 Turbo.'
+ - source_sentence: What challenges does the author face when trying to evaluate multiple
+     LLMs?
+   sentences:
+   - 'We don’t yet know how to build GPT-4
+
+     Frustratingly, despite the enormous leaps ahead we’ve had this year, we are yet
+     to see an alternative model that’s better than GPT-4.
+
+     OpenAI released GPT-4 in March, though it later turned out we had a sneak peak
+     of it in February when Microsoft used it as part of the new Bing.
+
+     This may well change in the next few weeks: Google’s Gemini Ultra has big claims,
+     but isn’t yet available for us to try out.
+
+     The team behind Mistral are working to beat GPT-4 as well, and their track record
+     is already extremely strong considering their first public model only came out
+     in September, and they’ve released two significant improvements since then.'
+   - 'I find I have to work with an LLM for a few weeks in order to get a good intuition
+     for it’s strengths and weaknesses. This greatly limits how many I can evaluate
+     myself!
+
+     The most frustrating thing for me is at the level of individual prompting.
+
+     Sometimes I’ll tweak a prompt and capitalize some of the words in it, to emphasize
+     that I really want it to OUTPUT VALID MARKDOWN or similar. Did capitalizing those
+     words make a difference? I still don’t have a good methodology for figuring that
+     out.
+
+     We’re left with what’s effectively Vibes Based Development. It’s vibes all the
+     way down.
+
+     I’d love to see us move beyond vibes in 2024!
+
+     LLMs are really smart, and also really, really dumb'
+   - 'Except... you can run generated code to see if it’s correct. And with patterns
+     like ChatGPT Code Interpreter the LLM can execute the code itself, process the
+     error message, then rewrite it and keep trying until it works!
+
+     So hallucination is a much lesser problem for code generation than for anything
+     else. If only we had the equivalent of Code Interpreter for fact-checking natural
+     language!
+
+     How should we feel about this as software engineers?
+
+     On the one hand, this feels like a threat: who needs a programmer if ChatGPT can
+     write code for you?'
+ - source_sentence: What are some ways mentioned to run local, private large language
+     models (LLMs) on personal devices?
+   sentences:
+   - 'A lot of people are excited about AI agents—an infuriatingly vague term that
+     seems to be converging on “AI systems that can go away and act on your behalf”.
+     We’ve been talking about them all year, but I’ve seen few if any examples of them
+     running in production, despite lots of exciting prototypes.
+
+     I think this is because of gullibility.
+
+     Can we solve this? Honestly, I’m beginning to suspect that you can’t fully solve
+     gullibility without achieving AGI. So it may be quite a while before those agent
+     dreams can really start to come true!
+
+     Code may be the best application
+
+     Over the course of the year, it’s become increasingly clear that writing code
+     is one of the things LLMs are most capable of.'
+   - 'I run a bunch of them on my laptop. I run Mistral 7B (a surprisingly great model)
+     on my iPhone. You can install several different apps to get your own, local, completely
+     private LLM. My own LLM project provides a CLI tool for running an array of different
+     models via plugins.
+
+     You can even run them entirely in your browser using WebAssembly and the latest
+     Chrome!
+
+     Hobbyists can build their own fine-tuned models
+
+     I said earlier that building an LLM was still out of reach of hobbyists. That
+     may be true for training from scratch, but fine-tuning one of those models is
+     another matter entirely.'
+   - 'Prompt injection is a natural consequence of this gulibility. I’ve seen precious
+     little progress on tackling that problem in 2024, and we’ve been talking about
+     it since September 2022.
+
+     I’m beginning to see the most popular idea of “agents” as dependent on AGI itself.
+     A model that’s robust against gulliblity is a very tall order indeed.
+
+     Evals really matter
+
+     Anthropic’s Amanda Askell (responsible for much of the work behind Claude’s Character):'
+ - source_sentence: How has the value of prompt-driven app generation changed from
+     2023 to 2024?
+   sentences:
+   - 'On paper, a 64GB Mac should be a great machine for running models due to the
+     way the CPU and GPU can share the same memory. In practice, many models are released
+     as model weights and libraries that reward NVIDIA’s CUDA over other platforms.
+
+     The llama.cpp ecosystem helped a lot here, but the real breakthrough has been
+     Apple’s MLX library, “an array framework for Apple Silicon”. It’s fantastic.
+
+     Apple’s mlx-lm Python library supports running a wide range of MLX-compatible
+     models on my Mac, with excellent performance. mlx-community on Hugging Face offers
+     more than 1,000 models that have been converted to the necessary format.'
+   - 'The environmental impact got much, much worse
+
+     The much bigger problem here is the enormous competitive buildout of the infrastructure
+     that is imagined to be necessary for these models in the future.
+
+     Companies like Google, Meta, Microsoft and Amazon are all spending billions of
+     dollars rolling out new datacenters, with a very material impact on the electricity
+     grid and the environment. There’s even talk of spinning up new nuclear power stations,
+     but those can take decades.
+
+     Is this infrastructure necessary? DeepSeek v3’s $6m training cost and the continued
+     crash in LLM prices might hint that it’s not. But would you want to be the big
+     tech executive that argued NOT to build out this infrastructure only to be proven
+     wrong in a few years’ time?'
+   - 'These abilities are just a few weeks old at this point, and I don’t think their
+     impact has been fully felt yet. If you haven’t tried them out yet you really should.
+
+     Both Gemini and OpenAI offer API access to these features as well. OpenAI started
+     with a WebSocket API that was quite challenging to use, but in December they announced
+     a new WebRTC API which is much easier to get started with. Building a web app
+     that a user can talk to via voice is easy now!
+
+     Prompt driven app generation is a commodity already
+
+     This was possible with GPT-4 in 2023, but the value it provides became evident
+     in 2024.'
+ - source_sentence: What makes the prompt-driven custom interface feature powerful
+     and easy to build despite the challenges of browser sandboxing?
+   sentences:
+   - 'This prompt-driven custom interface feature is so powerful and easy to build
+     (once you’ve figured out the gnarly details of browser sandboxing) that I expect
+     it to show up as a feature in a wide range of products in 2025.
+
+     Universal access to the best models lasted for just a few short months
+
+     For a few short months this year all three of the best available models—GPT-4o,
+     Claude 3.5 Sonnet and Gemini 1.5 Pro—were freely available to most of the world.'
+   - 'The environmental impact got much, much worse
+
+     The much bigger problem here is the enormous competitive buildout of the infrastructure
+     that is imagined to be necessary for these models in the future.
+
+     Companies like Google, Meta, Microsoft and Amazon are all spending billions of
+     dollars rolling out new datacenters, with a very material impact on the electricity
+     grid and the environment. There’s even talk of spinning up new nuclear power stations,
+     but those can take decades.
+
+     Is this infrastructure necessary? DeepSeek v3’s $6m training cost and the continued
+     crash in LLM prices might hint that it’s not. But would you want to be the big
+     tech executive that argued NOT to build out this infrastructure only to be proven
+     wrong in a few years’ time?'
+   - 'We don’t yet know how to build GPT-4
+
+     Frustratingly, despite the enormous leaps ahead we’ve had this year, we are yet
+     to see an alternative model that’s better than GPT-4.
+
+     OpenAI released GPT-4 in March, though it later turned out we had a sneak peak
+     of it in February when Microsoft used it as part of the new Bing.
+
+     This may well change in the next few weeks: Google’s Gemini Ultra has big claims,
+     but isn’t yet available for us to try out.
+
+     The team behind Mistral are working to beat GPT-4 as well, and their track record
+     is already extremely strong considering their first public model only came out
+     in September, and they’ve released two significant improvements since then.'
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - cosine_accuracy@1
+ - cosine_accuracy@3
+ - cosine_accuracy@5
+ - cosine_accuracy@10
+ - cosine_precision@1
+ - cosine_precision@3
+ - cosine_precision@5
+ - cosine_precision@10
+ - cosine_recall@1
+ - cosine_recall@3
+ - cosine_recall@5
+ - cosine_recall@10
+ - cosine_ndcg@10
+ - cosine_mrr@10
+ - cosine_map@100
+ model-index:
+ - name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+   results:
+   - task:
+       type: information-retrieval
+       name: Information Retrieval
+     dataset:
+       name: Unknown
+       type: unknown
+     metrics:
+     - type: cosine_accuracy@1
+       value: 0.875
+       name: Cosine Accuracy@1
+     - type: cosine_accuracy@3
+       value: 1.0
+       name: Cosine Accuracy@3
+     - type: cosine_accuracy@5
+       value: 1.0
+       name: Cosine Accuracy@5
+     - type: cosine_accuracy@10
+       value: 1.0
+       name: Cosine Accuracy@10
+     - type: cosine_precision@1
+       value: 0.875
+       name: Cosine Precision@1
+     - type: cosine_precision@3
+       value: 0.3333333333333333
+       name: Cosine Precision@3
+     - type: cosine_precision@5
+       value: 0.20000000000000004
+       name: Cosine Precision@5
+     - type: cosine_precision@10
+       value: 0.10000000000000002
+       name: Cosine Precision@10
+     - type: cosine_recall@1
+       value: 0.875
+       name: Cosine Recall@1
+     - type: cosine_recall@3
+       value: 1.0
+       name: Cosine Recall@3
+     - type: cosine_recall@5
+       value: 1.0
+       name: Cosine Recall@5
+     - type: cosine_recall@10
+       value: 1.0
+       name: Cosine Recall@10
+     - type: cosine_ndcg@10
+       value: 0.9538662191964322
+       name: Cosine Ndcg@10
+     - type: cosine_mrr@10
+       value: 0.9375
+       name: Cosine Mrr@10
+     - type: cosine_map@100
+       value: 0.9375
+       name: Cosine Map@100
+ ---
+
+ # SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 1024 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference:
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("manmah/legal-ft-2aefb51e-1a19-43c1-a5ff-7d28d65534da")
+ # Run inference
+ sentences = [
+     'What makes the prompt-driven custom interface feature powerful and easy to build despite the challenges of browser sandboxing?',
+     'This prompt-driven custom interface feature is so powerful and easy to build (once you’ve figured out the gnarly details of browser sandboxing) that I expect it to show up as a feature in a wide range of products in 2025.\nUniversal access to the best models lasted for just a few short months\nFor a few short months this year all three of the best available models—GPT-4o, Claude 3.5 Sonnet and Gemini 1.5 Pro—were freely available to most of the world.',
+     'We don’t yet know how to build GPT-4\nFrustratingly, despite the enormous leaps ahead we’ve had this year, we are yet to see an alternative model that’s better than GPT-4.\nOpenAI released GPT-4 in March, though it later turned out we had a sneak peak of it in February when Microsoft used it as part of the new Bing.\nThis may well change in the next few weeks: Google’s Gemini Ultra has big claims, but isn’t yet available for us to try out.\nThe team behind Mistral are working to beat GPT-4 as well, and their track record is already extremely strong considering their first public model only came out in September, and they’ve released two significant improvements since then.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 1024)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```
+
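+ Because `config_sentence_transformers.json` defines a `query` prompt ("Represent this sentence for searching relevant passages: "), queries should be encoded with that prompt while passages are encoded without one, following the arctic-embed convention. A minimal retrieval sketch under that assumption (the example texts are illustrative, not taken from the training data):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("manmah/legal-ft-2aefb51e-1a19-43c1-a5ff-7d28d65534da")
+
+ # Illustrative passages; any list of documents works the same way
+ documents = [
+     "Gemini 1.5 Pro accepts context windows of up to 2 million tokens.",
+     "LLM prices crashed in 2024 thanks to competition and increased efficiency.",
+ ]
+
+ # Encode the query with the configured "query" prompt; passages get no prompt
+ query_embedding = model.encode(["How long can a Gemini context window be?"], prompt_name="query")
+ doc_embeddings = model.encode(documents)
+
+ # Cosine similarity is the model's configured similarity function
+ scores = model.similarity(query_embedding, doc_embeddings)
+ print(scores)  # the first passage should score highest
+ ```
+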
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Information Retrieval
+
+ * Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
+
+ | Metric              | Value      |
+ |:--------------------|:-----------|
+ | cosine_accuracy@1   | 0.875      |
+ | cosine_accuracy@3   | 1.0        |
+ | cosine_accuracy@5   | 1.0        |
+ | cosine_accuracy@10  | 1.0        |
+ | cosine_precision@1  | 0.875      |
+ | cosine_precision@3  | 0.3333     |
+ | cosine_precision@5  | 0.2        |
+ | cosine_precision@10 | 0.1        |
+ | cosine_recall@1     | 0.875      |
+ | cosine_recall@3     | 1.0        |
+ | cosine_recall@5     | 1.0        |
+ | cosine_recall@10    | 1.0        |
+ | **cosine_ndcg@10**  | **0.9539** |
+ | cosine_mrr@10       | 0.9375     |
+ | cosine_map@100      | 0.9375     |
+
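+ The table above was produced by `InformationRetrievalEvaluator`. A sketch of how such an evaluation can be run (the queries, corpus, and relevance judgments below are hypothetical placeholders; the actual held-out split is not included in this card):
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import InformationRetrievalEvaluator
+
+ model = SentenceTransformer("manmah/legal-ft-2aefb51e-1a19-43c1-a5ff-7d28d65534da")
+
+ # Hypothetical evaluation data: id -> text, plus query id -> relevant doc ids
+ queries = {"q1": "When did OpenAI make GPT-4o free for all users?"}
+ corpus = {
+     "d1": "OpenAI made GPT-4o free for all users in May.",
+     "d2": "An unrelated passage about something else entirely.",
+ }
+ relevant_docs = {"q1": {"d1"}}
+
+ evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="dev")
+ results = evaluator(model)  # a dict of metrics keyed like "dev_cosine_ndcg@10"
+ print(results["dev_cosine_ndcg@10"])
+ ```
+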
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ * Size: 156 training samples
+ * Columns: <code>sentence_0</code> and <code>sentence_1</code>
+ * Approximate statistics based on the first 156 samples:
+   |         | sentence_0                                                                          | sentence_1                                                                            |
+   |:--------|:------------------------------------------------------------------------------------|:----------------------------------------------------------------------------------------|
+   | type    | string                                                                              | string                                                                                |
+   | details | <ul><li>min: 12 tokens</li><li>mean: 20.82 tokens</li><li>max: 32 tokens</li></ul>  | <ul><li>min: 43 tokens</li><li>mean: 135.28 tokens</li><li>max: 214 tokens</li></ul>  |
+ * Samples:
+   | sentence_0 | sentence_1 |
+   |:-----------|:-----------|
+   | <code>What new feature does ChatGPT voice mode offer as of December?</code> | <code>The most recent twist, again from December (December was a lot) is live video. ChatGPT voice mode now provides the option to share your camera feed with the model and talk about what you can see in real time. Google Gemini have a preview of the same feature, which they managed to ship the day before ChatGPT did.</code> |
+   | <code>Which company released a similar live video feature just before ChatGPT?</code> | <code>The most recent twist, again from December (December was a lot) is live video. ChatGPT voice mode now provides the option to share your camera feed with the model and talk about what you can see in real time. Google Gemini have a preview of the same feature, which they managed to ship the day before ChatGPT did.</code> |
+   | <code>When did OpenAI make GPT-4o free for all users?</code> | <code>OpenAI made GPT-4o free for all users in May, and Claude 3.5 Sonnet was freely available from its launch in June. This was a momentus change, because for the previous year free users had mostly been restricted to GPT-3.5 level models, meaning new users got a very inaccurate mental model of what a capable LLM could actually do.<br>That era appears to have ended, likely permanently, with OpenAI’s launch of ChatGPT Pro. This $200/month subscription service is the only way to access their most capable model, o1 Pro.<br>Since the trick behind the o1 series (and the future models it will undoubtedly inspire) is to expend more compute time to get better results, I don’t think those days of free access to the best available models are likely to return.</code> |
+ * Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
+   ```json
+   {
+       "loss": "MultipleNegativesRankingLoss",
+       "matryoshka_dims": [768, 512, 256, 128, 64],
+       "matryoshka_weights": [1, 1, 1, 1, 1],
+       "n_dims_per_step": -1
+   }
+   ```
+
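+ Because training used MatryoshkaLoss over the dimensions listed above, prefixes of each embedding (the first 768, 512, 256, 128, or 64 components) were also optimized as standalone embeddings. A sketch of trading a little accuracy for smaller vectors via the `truncate_dim` argument available in recent sentence-transformers releases:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Keep only the first 256 embedding dimensions. Truncated vectors are no
+ # longer unit-norm, but cosine similarity normalizes internally, so the
+ # model's configured similarity function still behaves as expected.
+ model = SentenceTransformer(
+     "manmah/legal-ft-2aefb51e-1a19-43c1-a5ff-7d28d65534da",
+     truncate_dim=256,
+ )
+ embeddings = model.encode(["smaller vectors, most of the retrieval quality"])
+ print(embeddings.shape)  # (1, 256)
+ ```
+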
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `num_train_epochs`: 10
+ - `multi_dataset_batch_sampler`: round_robin
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 10
+ - `per_device_eval_batch_size`: 10
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 10
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: False
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `tp_size`: 0
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: None
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `include_for_metrics`: []
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `use_liger_kernel`: False
+ - `eval_use_gather_object`: False
+ - `average_tokens_across_devices`: False
+ - `prompts`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
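+ Putting the pieces together, a run like the one summarized above can be reproduced roughly as follows. This is a sketch: the single training pair is a placeholder for the 156 (sentence_0, sentence_1) pairs, the output path is hypothetical, and only the non-default hyperparameters are set explicitly.
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss
+
+ model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")
+
+ # Placeholder for the real (question, passage) training pairs
+ train_dataset = Dataset.from_dict({
+     "sentence_0": ["When did OpenAI make GPT-4o free for all users?"],
+     "sentence_1": ["OpenAI made GPT-4o free for all users in May."],
+ })
+
+ # In-batch negatives ranking loss, wrapped so that truncated embedding
+ # prefixes (768/512/256/128/64 dims) are trained alongside the full vectors
+ inner_loss = MultipleNegativesRankingLoss(model)
+ loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="legal-ft",  # hypothetical output path
+     num_train_epochs=10,
+     per_device_train_batch_size=10,
+     per_device_eval_batch_size=10,
+ )
+ trainer = SentenceTransformerTrainer(
+     model=model, args=args, train_dataset=train_dataset, loss=loss
+ )
+ trainer.train()
+ ```
+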
+ ### Training Logs
+ | Epoch | Step | cosine_ndcg@10 |
+ |:-----:|:----:|:--------------:|
+ | 1.0   | 16   | 0.9484         |
+ | 2.0   | 32   | 0.9539         |
+ | 3.0   | 48   | 0.9692         |
+ | 3.125 | 50   | 0.9846         |
+ | 4.0   | 64   | 0.9692         |
+ | 5.0   | 80   | 0.9692         |
+ | 6.0   | 96   | 0.9539         |
+ | 6.25  | 100  | 0.9385         |
+ | 7.0   | 112  | 0.9539         |
+ | 8.0   | 128  | 0.9539         |
+ | 9.0   | 144  | 0.9539         |
+ | 9.375 | 150  | 0.9539         |
+ | 10.0  | 160  | 0.9539         |
+
+ ### Framework Versions
+ - Python: 3.13.2
+ - Sentence Transformers: 4.1.0
+ - Transformers: 4.51.3
+ - PyTorch: 2.7.0
+ - Accelerate: 1.6.0
+ - Datasets: 3.5.1
+ - Tokenizers: 0.21.1
+
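+ To reproduce this environment, pinning the versions listed above should suffice (a sketch; nearby compatible releases will generally also work):
+
+ ```bash
+ pip install "sentence-transformers==4.1.0" "transformers==4.51.3" \
+     "torch==2.7.0" "accelerate==1.6.0" "datasets==3.5.1" "tokenizers==0.21.1"
+ ```
+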
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### MatryoshkaLoss
+ ```bibtex
+ @misc{kusupati2024matryoshka,
+     title={Matryoshka Representation Learning},
+     author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
+     year={2024},
+     eprint={2205.13147},
+     archivePrefix={arXiv},
+     primaryClass={cs.LG}
+ }
+ ```
+
+ #### MultipleNegativesRankingLoss
+ ```bibtex
+ @misc{henderson2017efficient,
+     title={Efficient Natural Language Response Suggestion for Smart Reply},
+     author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
+     year={2017},
+     eprint={1705.00652},
+     archivePrefix={arXiv},
+     primaryClass={cs.CL}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 1024,
+   "initializer_range": 0.02,
+   "intermediate_size": 4096,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 16,
+   "num_hidden_layers": 24,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.51.3",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,12 @@
+ {
+   "__version__": {
+     "sentence_transformers": "4.1.0",
+     "transformers": "4.51.3",
+     "pytorch": "2.7.0"
+   },
+   "prompts": {
+     "query": "Represent this sentence for searching relevant passages: "
+   },
+   "default_prompt_name": null,
+   "similarity_fn_name": "cosine"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:905623312325c379d4fe8d7b8aef711b755649f16129291f5801d0f2f9565841
+ size 1336413848
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "extra_special_tokens": {},
+   "mask_token": "[MASK]",
+   "max_length": 512,
+   "model_max_length": 512,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff