---
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:370
- loss:MatryoshkaLoss
- loss:MultipleNegativesRankingLoss
base_model: Snowflake/snowflake-arctic-embed-l
widget:
- source_sentence: ' What is the proposed alternative to imposing random events purely
by chance on players?'
sentences:
- "half a day). \n• They are perceived to be new and innovative (despite being around\
\ since 1987). \n• They are easy to transport, requiring only pen and paper –\
\ with perhaps a few maps and \ncounters. \n• They work well in multi-domain,\
\ multi-agency contexts allowing all Actors to participate \nequally. \nA few\
\ Words of Warning \n• The fact that a Matrix Game requires little infrastructure\
\ can be a problem – it just \ndoesn't look sexy and the strengths that it can\
\ be done quickly with the minimum of \nfuss, can be reduced by efforts to make\
\ it look cool/expensive. \n• The non-quantitative nature of the game can frustrate\
\ analysts. \n• Matrix Games require an experience facilitator to run them."
- "inform the other players of their stated intentions. In many cases these are\
\ not really \n\"arguments\" as part of the game, so shouldn't count as their\
\ action for the turn, unless they \nwish to specify a measurable effect (such\
\ as increasing their approval ratings). \nTrade Agreements \nIn some games, trade\
\ forms a very important part of the game narrative. In most cases this \ncan\
\ be treated simply as part of the normal ebb and flow of the argument process.\
\ \nHowever, in some circumstances, particularly when timescales are long, trade\
\ can require \ngreater attention as to the nuances of the economic benefits and\
\ impacts. In these cases, it \nmay be necessary to get the two sides to make\
\ additional arguments as to what they expect"
- "possible throughout the game, having “random events” happen completely at random\
\ is \nproblematic. An Actor may be disadvantaged purely by chance, more than\
\ once during the \ngame, which can reduce their immersion and engagement. The\
\ narrative develops during \nthe game based on the decisions of the players and\
\ their reactions to the decisions of other \nplayers. Having random events imposed\
\ on them by chance breaks this “cause and effect” \ncycle and degrades the game\
\ flow. \nThe alternative is to give the random event to the participants. They\
\ will then make a \ndecision as to how this can contribute to the narrative being\
\ developed by the players. They"
- source_sentence: ' What is the primary purpose of the game described in the context?'
sentences:
- "If you are using voting systems, either as Diceless Adjudication or as Estimative\
\ Probability, \nyou should take great care to ensure that the players are being\
\ as professional as possible, \nand not merely \"voting for themselves\" in a\
\ competitive manner. Many players can be quite \nvery competitive, so it may\
\ be necessary to not allow them to vote on their argument – and \nequally it\
\ may be necessary to keep an eye on players who are in direct competition. The\
\ \nintention is to develop a narrative, generating insights – rather than trying\
\ to win at all \ncosts. \n \n7 An example is https://www.turningtechnologies.eu/turningpoint/\
\ \n8 An example is https://www.polleverywhere.com/\n\fVersion 15 \nPage 14 of\
\ 52 \n© Tom Mouat 2019, 2020, 2022, 2023"
- "spend the time piling markers on counters. Tracks can be generic (in that they\
\ simply record \nthe number of plusses or minuses applied) or they might have\
\ specific \"trigger levels\" (in \nthat when the morale of the infantry is reduced\
\ to -3, the \"raw\" units will desert and return \nto their homes. \nIt can also\
\ be useful to have a \"Press\" actor whose job it is to record the results of\
\ \narguments (both visible to the public and those not), as well as putting the\
\ \"Press spin\" on \nthe events. This role can be useful in looking after the\
\ \"Consequence Management\" \nelements mentioned earlier. \nThe Components (and\
\ Characters) Affect the Game \nWhen participants are thinking on their feet,\
\ what they can see will affect what they argue"
- "materials, a short game, and small numbers of participants. If they want to conduct\
\ a \"deep \ndive\", this isn't the appropriate game - the purpose is to identify\
\ the insights – so make a \nnote and move on. The \"deep dive\" should follow\
\ later or in a different type of game. You \nshould, therefore, make sure you\
\ include this point in your introductory briefing so that the \nplayers are clear\
\ from the outset. \nWhen dealing with dominant people, who continually interrupt\
\ and dominate the \nArguments, you need to take a harder line. You should interrupt\
\ them when they interrupt \nanother player making a point. Point out to them\
\ that they had their chance. This isn't a"
- source_sentence: ' Why should Big Projects or Long-Term Plans require no more than
three successful arguments in the game?'
sentences:
- "much on this single thing. \nThis does not mean that arguments have to only\
\ be about things that can happen within \nthe turn length of the game. It is\
\ possible to make \"long term\" arguments like anything else. \nIf, in a Baltic\
\ game with week-long turns, you want to argue that an electricity cable \nbetween\
\ Sweden and Lithuania is to be built with the aim of reducing Lithuania's \n\
dependence on Russian energy, this would be judged as normal. It just would not\
\ come to \n \n9 I am indebted to Prof Rex Brynen for this suggestion.\n\fVersion\
\ 15 \nPage 23 of 52 \n© Tom Mouat 2019, 2020, 2022, 2023 \nfruition in the length\
\ of the game – but, assuming the argument was successful, it would"
- "games.\n\fVersion 15 \nPage 36 of 52 \n© Tom Mouat 2019, 2020, 2022, 2023 \n\
Why I like Matrix Games \n• Designing a Matrix Game can be done quickly with the\
\ minimum of fuss. \n• Participating in a Matrix Game does not require an understanding\
\ of complex and \nunfamiliar rules. \n• Matrix games can cover a wide variety\
\ of possible scenarios, including conceptual \nconflicts like Cyber. \n• They\
\ are especially good in the non-kinetic, effects based, domain. \n• Matrix games\
\ deal with qualitative outputs so are especially useful for non-analysts. \n\
• The games work best with small groups, increasing immersion and buy-in to the\
\ game. \n• Matrix games are extremely inexpensive (and they work best with short\
\ sessions lasting \nhalf a day)."
- "protection: Its hidden location, its boundary fence, and the security guards,\
\ all of which \nmust be overcome by successful arguments before the base can\
\ be penetrated. \nAs a rule of thumb, nothing should have more than 3 levels\
\ of protection as it will simply \ntake too long and dominate the game to the\
\ exclusion of everything else. \nBig Projects or Long-Term Plans \nDepending\
\ on the level of the game, some actions and events represent such a large \n\
investment in time and effort that they require multiple arguments in order to\
\ bring them \nto fruition. As a rule of thumb, a Big Project should also take\
\ no more than 3 successful \narguments (like protected and hidden things above);\
\ otherwise, the game is focussed too"
- source_sentence: ' Which associations related to wargaming and simulation are mentioned
in the context?'
sentences:
- "out their objectives and explain why they though they succeeded or failed can\
\ be most \ninstructive. Also, if you then ask the assembled group \"who won?\"\
\ and they all agree, then \nthis can be a very powerful indicator of things that\
\ might need to be looked at more closely \nas a result of the game. \nFinally,\
\ the insights from the game can take a little time to come out. They might not\
\ be \nimmediately obvious, so taking time to consider what happened in the game\
\ and whether \nindividual events are noteworthy, is very useful. I am continually\
\ surprised at the predictive \npower of such a simple game. \n \n \n \n11 See:\
\ Game theory, simulated interaction, and unaided judgement for forecasting decisions\
\ in conflicts. Kesten C. Green."
- 'gaming vignettes
job opportunities/positions vacant
latest links
methodology
not-so-serious
playtesters needed
reader survey
request for proposals
scholarships and fellowships
simulation and game reports
simulation and game reviews
simulation and gaming debacles
simulation and gaming history
simulation and gaming ideas
simulation and gaming journals
simulation and gaming materials
simulation and gaming miscellany
simulation and gaming news
simulation and gaming publications
simulation and gaming software
Archives
M T W T F S S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
26 27 28 29 30
Associations
Australian Defence Force Wargaming Group
Connections Netherlands
Connections North (Canada)'
- "Senior Officers, Dominant People and Contentious Arguments \nIt is not uncommon\
\ in a Matrix Game that the participants want to \"debate\" the arguments. \n\
To a limited extent this is ok, but as stated elsewhere, the game needs to move\
\ at a pace, \ncreating an immersive narrative and forcing the players to have\
\ to live with the \nconsequences of their earlier decisions. \nIt can happen\
\ that a Senior Officer, used to \"seminar wargames\", will interrupt when you\
\ \nwant to move on and say \"wait a minute - this is a really valuable debate\
\ - let's just dig \ndown...\" You should try to point out that this is not that\
\ sort of game - Matrix Games are to \ngain an insight and understanding in a\
\ specific way. Short notice, minimal preparation and"
- source_sentence: ' Why is it important for player roles in a Matrix Game to operate
at broadly similar levels?'
sentences:
- 'The Basic Rule. The basic rule is as follows: 1 x 6-Sided Dice = 1 x Combat
Unit The size of that Combat Unit will, of course, vary from game to game. In
the boarding action it may be as little as 5-10 men; in a Map Game, it could be
as much as an entire Brigade, or even a Corps.
The Method. The dice on the opposing sides are rolled as follows: Roll the Dice.
Line them up, Highest vs Highest If one side has more dice than the other, any
dice that are extra, and score less than the lowest dice of the side with the
fewer dice, are ignored.'
- "Matrix Game Checklist ....................................................................................\
\ 38 \nSample Spendable Bonus Cards ......................................................................\
\ 40 \nSample Random Events ...................................................................................\
\ 41 \nSample Voting Cards for Diceless Adjudication ...............................................\
\ 43 \nSample Estimative Probability Cards ...............................................................\
\ 44 \nSample Turn Order Cards ................................................................................\
\ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........\
\ 46"
- 'When you are designing a Matrix Game it is worth thinking about the level at
which the players roles will be operating in the game. In is usually better, and
produces a more balanced game, when the level on which the player roles are operating
are broadly similar. It would be difficult to get a balanced game if 3 of the
players are playing Generals in command of vast Armies, and another player is
playing a simple individual soldier.
Levels of Protection and Hidden Things.'
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
- cosine_accuracy@1
- cosine_accuracy@3
- cosine_accuracy@5
- cosine_accuracy@10
- cosine_precision@1
- cosine_precision@3
- cosine_precision@5
- cosine_precision@10
- cosine_recall@1
- cosine_recall@3
- cosine_recall@5
- cosine_recall@10
- cosine_ndcg@10
- cosine_mrr@10
- cosine_map@100
model-index:
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
results:
- task:
type: information-retrieval
name: Information Retrieval
dataset:
name: Unknown
type: unknown
metrics:
- type: cosine_accuracy@1
value: 0.9347826086956522
name: Cosine Accuracy@1
- type: cosine_accuracy@3
value: 1.0
name: Cosine Accuracy@3
- type: cosine_accuracy@5
value: 1.0
name: Cosine Accuracy@5
- type: cosine_accuracy@10
value: 1.0
name: Cosine Accuracy@10
- type: cosine_precision@1
value: 0.9347826086956522
name: Cosine Precision@1
- type: cosine_precision@3
value: 0.33333333333333337
name: Cosine Precision@3
- type: cosine_precision@5
value: 0.1999999999999999
name: Cosine Precision@5
- type: cosine_precision@10
value: 0.09999999999999995
name: Cosine Precision@10
- type: cosine_recall@1
value: 0.9347826086956522
name: Cosine Recall@1
- type: cosine_recall@3
value: 1.0
name: Cosine Recall@3
- type: cosine_recall@5
value: 1.0
name: Cosine Recall@5
- type: cosine_recall@10
value: 1.0
name: Cosine Recall@10
- type: cosine_ndcg@10
value: 0.97023760333851
name: Cosine Ndcg@10
- type: cosine_mrr@10
value: 0.9601449275362318
name: Cosine Mrr@10
- type: cosine_map@100
value: 0.960144927536232
name: Cosine Map@100
---
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details
### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 1024 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->
### Model Sources
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture
```
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
```
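The configuration above pools with the `[CLS]` token (`pooling_mode_cls_token: True`) and then applies `Normalize()`, which rescales each vector to unit length. The following is a minimal NumPy sketch of those two steps, not the library's implementation. Random numbers stand in for real `BertModel` outputs, and the helper name `cls_pool_and_normalize` is hypothetical.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Take the first ([CLS]) token of each sequence and L2-normalize it.

    token_embeddings: array of shape (batch, seq_len, hidden).
    """
    cls = token_embeddings[:, 0, :]                      # (batch, hidden)
    norms = np.linalg.norm(cls, axis=1, keepdims=True)   # (batch, 1)
    return cls / np.clip(norms, 1e-12, None)

# Stand-in for transformer outputs: 2 sequences, 5 tokens, 1024 hidden units.
rng = np.random.default_rng(0)
fake_tokens = rng.normal(size=(2, 5, 1024))
sentence_vectors = cls_pool_and_normalize(fake_tokens)

# For unit-length vectors, cosine similarity reduces to a dot product.
cosine = sentence_vectors @ sentence_vectors.T
```

Because the output vectors are already unit length, a plain dot product gives the same ranking as the cosine similarity listed under Model Description.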
## Usage
### Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
```bash
pip install -U sentence-transformers
```
Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1")
# Run inference
sentences = [
' Why is it important for player roles in a Matrix Game to operate at broadly similar levels?',
'When you are designing a Matrix Game it is worth thinking about the level at which the players roles will be operating in the game. In is usually better, and produces a more balanced game, when the level on which the player roles are operating are broadly similar. It would be difficult to get a balanced game if 3 of the players are playing Generals in command of vast Armies, and another player is playing a simple individual soldier.\n\nLevels of Protection and Hidden Things.',
'Matrix Game Checklist .................................................................................... 38 \nSample Spendable Bonus Cards ...................................................................... 40 \nSample Random Events ................................................................................... 41 \nSample Voting Cards for Diceless Adjudication ............................................... 43 \nSample Estimative Probability Cards ............................................................... 44 \nSample Turn Order Cards ................................................................................ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........ 46',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
```
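The model was trained with `MatryoshkaLoss` at 768, 512, 256, 128, and 64 dimensions (see Training Details), so the leading components of each embedding are intended to remain useful when the vector is shortened. This card reports no retrieval scores at reduced sizes, so treat the sketch below as illustrative only. Random unit vectors stand in for `model.encode(...)` output, and `truncate_embeddings` is a hypothetical helper.

```python
import numpy as np

MATRYOSHKA_DIMS = (768, 512, 256, 128, 64)  # from the loss configuration in this card

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and re-normalize each row to unit length."""
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    return truncated / np.clip(norms, 1e-12, None)

# Random unit vectors standing in for `model.encode(sentences)` output.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 1024))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 256)
similarities = small @ small.T  # cosine similarity, since rows are unit length
```

Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`, which should make manual slicing unnecessary; check the library documentation for whether it re-normalizes for you.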
## Evaluation
### Metrics
#### Information Retrieval
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)
| Metric | Value |
|:--------------------|:-----------|
| cosine_accuracy@1 | 0.9348 |
| cosine_accuracy@3 | 1.0 |
| cosine_accuracy@5 | 1.0 |
| cosine_accuracy@10 | 1.0 |
| cosine_precision@1 | 0.9348 |
| cosine_precision@3 | 0.3333 |
| cosine_precision@5 | 0.2 |
| cosine_precision@10 | 0.1 |
| cosine_recall@1 | 0.9348 |
| cosine_recall@3 | 1.0 |
| cosine_recall@5 | 1.0 |
| cosine_recall@10 | 1.0 |
| **cosine_ndcg@10** | **0.9702** |
| cosine_mrr@10 | 0.9601 |
| cosine_map@100 | 0.9601 |
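The card does not state how many queries were evaluated or how many relevant passages each query has. The numbers are nonetheless consistent with one relevant passage per query (precision@3 is exactly 1/3 while recall@3 is 1.0) and with 46 queries, of which 43 rank the correct passage first, one ranks it second, and two rank it third. That distribution is an inference from the reported values, not a documented fact. The sketch below shows how accuracy@1, MRR@10, and NDCG@10 follow from it.

```python
import math

def retrieval_metrics(ranks, k=10):
    """Accuracy@1, MRR@k and NDCG@k for queries with one relevant document each.

    ranks: 1-based rank of the relevant document for every query.
    """
    n = len(ranks)
    accuracy_at_1 = sum(r == 1 for r in ranks) / n
    mrr = sum(1.0 / r for r in ranks if r <= k) / n
    # With a single relevant document the ideal DCG is 1,
    # so each query contributes 1 / log2(rank + 1).
    ndcg = sum(1.0 / math.log2(r + 1) for r in ranks if r <= k) / n
    return accuracy_at_1, mrr, ndcg

# Hypothetical rank distribution: 43 hits at rank 1, one at rank 2, two at rank 3.
ranks = [1] * 43 + [2] + [3, 3]
accuracy_at_1, mrr, ndcg = retrieval_metrics(ranks)
print(f"{accuracy_at_1:.4f} {mrr:.4f} {ndcg:.4f}")
# prints: 0.9348 0.9601 0.9702
```

With only one relevant passage per query, MAP@100 equals MRR whenever every passage is found within the cutoff, which matches the identical 0.9601 values in the table.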
## Training Details
### Training Dataset
#### Unnamed Dataset
* Size: 370 training samples
* Columns: <code>sentence_0</code> and <code>sentence_1</code>
* Approximate statistics based on the first 370 samples:
| | sentence_0 | sentence_1 |
|:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
| type | string | string |
| details | <ul><li>min: 11 tokens</li><li>mean: 20.19 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 150.83 tokens</li><li>max: 512 tokens</li></ul> |
* Samples:
| sentence_0 | sentence_1 |
|:------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <code> What distinguishes "established facts" from other types of facts in the game briefings or play?</code> | <code>Forces soldiers are going to be much more effective in combat than untrained protestors; <br>and "established facts" which are facts that have been specifically mentioned in the game <br>briefings or have become established during play as the result of successful arguments. <br>The latter can be immediately deployed as supporting reasons (Pros and Cons), but the <br>former need to have been argued successfully in order for them to be specifically included. <br>Many inexperienced players will make vast all-encompassing arguments full of assumptions <br>that are not reasonable. For example: It is not a reasonable assumption that unarmed <br>Protestors could fight off trained Police. It is reasonable to assume that the Police are</code> |
| <code> Why is it unreasonable to assume that unarmed protestors could fight off trained police according to the context?</code> | <code>Forces soldiers are going to be much more effective in combat than untrained protestors; <br>and "established facts" which are facts that have been specifically mentioned in the game <br>briefings or have become established during play as the result of successful arguments. <br>The latter can be immediately deployed as supporting reasons (Pros and Cons), but the <br>former need to have been argued successfully in order for them to be specifically included. <br>Many inexperienced players will make vast all-encompassing arguments full of assumptions <br>that are not reasonable. For example: It is not a reasonable assumption that unarmed <br>Protestors could fight off trained Police. It is reasonable to assume that the Police are</code> |
| <code> What was the outcome of the initial Russian attack against the German units, and how did it affect the ammunition status of both sides?</code> | <code>The Russians succeed in pushing back one of the German units and forcing and already depleted unit to use up ammunition, (but are pushed back themselves and 2 units use a lot of ammo (one of which becomes combat ineffective on -3)). Overall, as the success is matched by failure, the line itself holds. The Russians attack again, the next day:<br><br>Initial Dice Throw: RUSSIAN: 6 5 5 4 2 4 GERMAN: 1 2 4 4 Lined Up and Modified: RUSSIAN: 5 4 3 3 2 1 (two of the Russians = -2) GERMAN: 4 3 3 2 (one of the Germans = -1) Result of Third Day; lose: (one of the Germans = +0) RUSSIAN: 5 4 3 3 GERMAN: 3 lose: 4 3 2</code> |
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
```json
{
"loss": "MultipleNegativesRankingLoss",
"matryoshka_dims": [
768,
512,
256,
128,
64
],
"matryoshka_weights": [
1,
1,
1,
1,
1
],
"n_dims_per_step": -1
}
```
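To make the configuration above concrete, here is a rough NumPy sketch of how the two losses combine: the in-batch-negatives loss is computed on each truncated prefix of the embeddings and the results are summed with the listed weights. It is a simplified illustration, not the Sentence Transformers code. The scale of 20 and the use of cosine similarity are assumptions based on the library defaults as I understand them, and the function names are hypothetical.

```python
import numpy as np

def mnr_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives cross-entropy: row i should score highest at column i."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)                           # scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_loss(anchors, positives,
                    dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of the base loss on each truncated prefix of the embeddings."""
    return sum(w * mnr_loss(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(10, 1024))                    # batch of 10, as in training
positives = anchors + 0.1 * rng.normal(size=(10, 1024))  # noisy copies act as matches
shuffled = positives[::-1].copy()                        # deliberately mismatched pairs

matched = matryoshka_loss(anchors, positives)
mismatched = matryoshka_loss(anchors, shuffled)
```

Correctly paired batches yield a much lower loss than the shuffled ones at every prefix length, which is the pressure that pushes useful information into the first components of the vector.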
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `num_train_epochs`: 10
- `multi_dataset_batch_sampler`: round_robin
#### All Hyperparameters
<details><summary>Click to expand</summary>
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 10
- `per_device_eval_batch_size`: 10
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `tp_size`: 0
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: batch_sampler
- `multi_dataset_batch_sampler`: round_robin
</details>
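With `lr_scheduler_type: linear`, `learning_rate: 5e-05`, and no warmup, the learning rate should fall in a straight line from its initial value to zero over the run. The sketch below mirrors that shape under the assumption of 370 total optimizer steps (the final step in the training logs). It is a hand-written approximation, not the Transformers scheduler itself.

```python
def linear_lr(step: int, base_lr: float = 5e-5,
              total_steps: int = 370, warmup_steps: int = 0) -> float:
    """Learning rate after `step` updates under linear warmup then linear decay."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

# Learning rate at the start, midpoint, and end of training.
start, halfway, end = linear_lr(0), linear_lr(185), linear_lr(370)
```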
### Training Logs
| Epoch | Step | cosine_ndcg@10 |
|:------:|:----:|:--------------:|
| 1.0 | 37 | 0.9273 |
| 1.3514 | 50 | 0.9490 |
| 2.0 | 74 | 0.9462 |
| 2.7027 | 100 | 0.9527 |
| 3.0 | 111 | 0.9527 |
| 4.0 | 148 | 0.9783 |
| 4.0541 | 150 | 0.9811 |
| 5.0 | 185 | 0.9622 |
| 5.4054 | 200 | 0.9622 |
| 6.0 | 222 | 0.9702 |
| 6.7568 | 250 | 0.9622 |
| 7.0 | 259 | 0.9622 |
| 8.0 | 296 | 0.9702 |
| 8.1081 | 300 | 0.9702 |
| 9.0 | 333 | 0.9702 |
| 9.4595 | 350 | 0.9702 |
| 10.0 | 370 | 0.9702 |
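The step and epoch columns follow directly from the dataset size and batch size. As a quick check, assuming a single device (`gradient_accumulation_steps` is 1 per the hyperparameters above):

```python
import math

train_samples = 370      # dataset size from this card
batch_size = 10          # per_device_train_batch_size
epochs = 10              # num_train_epochs

# Assumes a single device and gradient_accumulation_steps = 1.
steps_per_epoch = math.ceil(train_samples / batch_size)
total_steps = steps_per_epoch * epochs

def epoch_at(step: int) -> float:
    """Fractional epoch reached after `step` optimizer updates."""
    return round(step / steps_per_epoch, 4)

print(steps_per_epoch, total_steps, epoch_at(50), epoch_at(150))
# prints: 37 370 1.3514 4.0541
```

Rows at whole epochs come from the end-of-epoch evaluation, and rows at multiples of 50 steps come from the step-based evaluation (`eval_strategy: steps`).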
### Framework Versions
- Python: 3.13.2
- Sentence Transformers: 4.1.0
- Transformers: 4.51.3
- PyTorch: 2.7.0+cu126
- Accelerate: 1.6.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1
## Citation
### BibTeX
#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
```
#### MatryoshkaLoss
```bibtex
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
```
#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```