	---
	tags:
	- sentence-transformers
	- sentence-similarity
	- feature-extraction
	- generated_from_trainer
	- dataset_size:370
	- loss:MatryoshkaLoss
	- loss:MultipleNegativesRankingLoss
	base_model: Snowflake/snowflake-arctic-embed-l
	widget:
	- source_sentence: ' What is the proposed alternative to imposing random events purely
	by chance on players?'
	sentences:
	- "half a day). \n• They are perceived to be new and innovative (despite being around\
	\ since 1987). \n• They are easy to transport, requiring only pen and paper –\
	\ with perhaps a few maps and \ncounters. \n• They work well in multi-domain,\
	\ multi-agency contexts allowing all Actors to participate \nequally. \nA few\
	\ Words of Warning \n• The fact that a Matrix Game requires little infrastructure\
	\ can be a problem – it just \ndoesn't look sexy and the strengths that it can\
	\ be done quickly with the minimum of \nfuss, can be reduced by efforts to make\
	\ it look cool/expensive. \n• The non-quantitative nature of the game can frustrate\
	\ analysts. \n• Matrix Games require an experience facilitator to run them."
	- "inform the other players of their stated intentions. In many cases these are\
	\ not really \n\"arguments\" as part of the game, so shouldn't count as their\
	\ action for the turn, unless they \nwish to specify a measurable effect (such\
	\ as increasing their approval ratings). \nTrade Agreements \nIn some games, trade\
	\ forms a very important part of the game narrative. In most cases this \ncan\
	\ be treated simply as part of the normal ebb and flow of the argument process.\
	\ \nHowever, in some circumstances, particularly when timescales are long, trade\
	\ can require \ngreater attention as to the nuances of the economic benefits and\
	\ impacts. In these cases, it \nmay be necessary to get the two sides to make\
	\ additional arguments as to what they expect"
	- "possible throughout the game, having “random events” happen completely at random\
	\ is \nproblematic. An Actor may be disadvantaged purely by chance, more than\
	\ once during the \ngame, which can reduce their immersion and engagement. The\
	\ narrative develops during \nthe game based on the decisions of the players and\
	\ their reactions to the decisions of other \nplayers. Having random events imposed\
	\ on them by chance breaks this “cause and effect” \ncycle and degrades the game\
	\ flow. \nThe alternative is to give the random event to the participants. They\
	\ will then make a \ndecision as to how this can contribute to the narrative being\
	\ developed by the players. They"
	- source_sentence: ' What is the primary purpose of the game described in the context?'
	sentences:
	- "If you are using voting systems, either as Diceless Adjudication or as Estimative\
	\ Probability, \nyou should take great care to ensure that the players are being\
	\ as professional as possible, \nand not merely \"voting for themselves\" in a\
	\ competitive manner. Many players can be quite \nvery competitive, so it may\
	\ be necessary to not allow them to vote on their argument – and \nequally it\
	\ may be necessary to keep an eye on players who are in direct competition. The\
	\ \nintention is to develop a narrative, generating insights – rather than trying\
	\ to win at all \ncosts. \n \n7 An example is https://www.turningtechnologies.eu/turningpoint/\
	\ \n8 An example is https://www.polleverywhere.com/\n\fVersion 15 \nPage 14 of\
	\ 52 \n© Tom Mouat 2019, 2020, 2022, 2023"
	- "spend the time piling markers on counters. Tracks can be generic (in that they\
	\ simply record \nthe number of plusses or minuses applied) or they might have\
	\ specific \"trigger levels\" (in \nthat when the morale of the infantry is reduced\
	\ to -3, the \"raw\" units will desert and return \nto their homes. \nIt can also\
	\ be useful to have a \"Press\" actor whose job it is to record the results of\
	\ \narguments (both visible to the public and those not), as well as putting the\
	\ \"Press spin\" on \nthe events. This role can be useful in looking after the\
	\ \"Consequence Management\" \nelements mentioned earlier. \nThe Components (and\
	\ Characters) Affect the Game \nWhen participants are thinking on their feet,\
	\ what they can see will affect what they argue"
	- "materials, a short game, and small numbers of participants. If they want to conduct\
	\ a \"deep \ndive\", this isn't the appropriate game - the purpose is to identify\
	\ the insights – so make a \nnote and move on. The \"deep dive\" should follow\
	\ later or in a different type of game. You \nshould, therefore, make sure you\
	\ include this point in your introductory briefing so that the \nplayers are clear\
	\ from the outset. \nWhen dealing with dominant people, who continually interrupt\
	\ and dominate the \nArguments, you need to take a harder line. You should interrupt\
	\ them when they interrupt \nanother player making a point. Point out to them\
	\ that they had their chance. This isn't a"
	- source_sentence: ' Why should Big Projects or Long-Term Plans require no more than
	three successful arguments in the game?'
	sentences:
	- "much on this single thing. \nThis does not mean that arguments have to only\
	\ be about things that can happen within \nthe turn length of the game. It is\
	\ possible to make \"long term\" arguments like anything else. \nIf, in a Baltic\
	\ game with week-long turns, you want to argue that an electricity cable \nbetween\
	\ Sweden and Lithuania is to be built with the aim of reducing Lithuania's \n\
	dependence on Russian energy, this would be judged as normal. It just would not\
	\ come to \n \n9 I am indebted to Prof Rex Brynen for this suggestion.\n\fVersion\
	\ 15 \nPage 23 of 52 \n© Tom Mouat 2019, 2020, 2022, 2023 \nfruition in the length\
	\ of the game – but, assuming the argument was successful, it would"
	- "games.\n\fVersion 15 \nPage 36 of 52 \n© Tom Mouat 2019, 2020, 2022, 2023 \n\
	Why I like Matrix Games \n• Designing a Matrix Game can be done quickly with the\
	\ minimum of fuss. \n• Participating in a Matrix Game does not require an understanding\
	\ of complex and \nunfamiliar rules. \n• Matrix games can cover a wide variety\
	\ of possible scenarios, including conceptual \nconflicts like Cyber. \n• They\
	\ are especially good in the non-kinetic, effects based, domain. \n• Matrix games\
	\ deal with qualitative outputs so are especially useful for non-analysts. \n\
	• The games work best with small groups, increasing immersion and buy-in to the\
	\ game. \n• Matrix games are extremely inexpensive (and they work best with short\
	\ sessions lasting \nhalf a day)."
	- "protection: Its hidden location, its boundary fence, and the security guards,\
	\ all of which \nmust be overcome by successful arguments before the base can\
	\ be penetrated. \nAs a rule of thumb, nothing should have more than 3 levels\
	\ of protection as it will simply \ntake too long and dominate the game to the\
	\ exclusion of everything else. \nBig Projects or Long-Term Plans \nDepending\
	\ on the level of the game, some actions and events represent such a large \n\
	investment in time and effort that they require multiple arguments in order to\
	\ bring them \nto fruition. As a rule of thumb, a Big Project should also take\
	\ no more than 3 successful \narguments (like protected and hidden things above);\
	\ otherwise, the game is focussed too"
	- source_sentence: ' Which associations related to wargaming and simulation are mentioned
	in the context?'
	sentences:
	- "out their objectives and explain why they though they succeeded or failed can\
	\ be most \ninstructive. Also, if you then ask the assembled group \"who won?\"\
	\ and they all agree, then \nthis can be a very powerful indicator of things that\
	\ might need to be looked at more closely \nas a result of the game. \nFinally,\
	\ the insights from the game can take a little time to come out. They might not\
	\ be \nimmediately obvious, so taking time to consider what happened in the game\
	\ and whether \nindividual events are noteworthy, is very useful. I am continually\
	\ surprised at the predictive \npower of such a simple game. \n \n \n \n11 See:\
	\ Game theory, simulated interaction, and unaided judgement for forecasting decisions\
	\ in conflicts. Kesten C. Green."
	- 'gaming vignettes


	job opportunities/positions vacant


	latest links


	methodology


	not-so-serious


	playtesters needed


	reader survey


	request for proposals


	scholarships and fellowships


	simulation and game reports


	simulation and game reviews


	simulation and gaming debacles


	simulation and gaming history


	simulation and gaming ideas


	simulation and gaming journals


	simulation and gaming materials


	simulation and gaming miscellany


	simulation and gaming news


	simulation and gaming publications


	simulation and gaming software


	Archives


	M T W T F S S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
	26 27 28 29 30


	Associations


	Australian Defence Force Wargaming Group


	Connections Netherlands


	Connections North (Canada)'
	- "Senior Officers, Dominant People and Contentious Arguments \nIt is not uncommon\
	\ in a Matrix Game that the participants want to \"debate\" the arguments. \n\
	To a limited extent this is ok, but as stated elsewhere, the game needs to move\
	\ at a pace, \ncreating an immersive narrative and forcing the players to have\
	\ to live with the \nconsequences of their earlier decisions. \nIt can happen\
	\ that a Senior Officer, used to \"seminar wargames\", will interrupt when you\
	\ \nwant to move on and say \"wait a minute - this is a really valuable debate\
	\ - let's just dig \ndown...\" You should try to point out that this is not that\
	\ sort of game - Matrix Games are to \ngain an insight and understanding in a\
	\ specific way. Short notice, minimal preparation and"
	- source_sentence: ' Why is it important for player roles in a Matrix Game to operate
	at broadly similar levels?'
	sentences:
	- 'The Basic Rule. The basic rule is as follows: 1 x 6-Sided Dice = 1 x Combat
	Unit The size of that Combat Unit will, of course, vary from game to game. In
	the boarding action it may be as little as 5-10 men; in a Map Game, it could be
	as much as an entire Brigade, or even a Corps.


	The Method. The dice on the opposing sides are rolled as follows: Roll the Dice.
	Line them up, Highest vs Highest If one side has more dice than the other, any
	dice that are extra, and score less than the lowest dice of the side with the
	fewer dice, are ignored.'
	- "Matrix Game Checklist ....................................................................................\
	\ 38 \nSample Spendable Bonus Cards ......................................................................\
	\ 40 \nSample Random Events ...................................................................................\
	\ 41 \nSample Voting Cards for Diceless Adjudication ...............................................\
	\ 43 \nSample Estimative Probability Cards ...............................................................\
	\ 44 \nSample Turn Order Cards ................................................................................\
	\ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........\
	\ 46"
	- 'When you are designing a Matrix Game it is worth thinking about the level at
	which the players roles will be operating in the game. In is usually better, and
	produces a more balanced game, when the level on which the player roles are operating
	are broadly similar. It would be difficult to get a balanced game if 3 of the
	players are playing Generals in command of vast Armies, and another player is
	playing a simple individual soldier.


	Levels of Protection and Hidden Things.'
	pipeline_tag: sentence-similarity
	library_name: sentence-transformers
	metrics:
	- cosine_accuracy@1
	- cosine_accuracy@3
	- cosine_accuracy@5
	- cosine_accuracy@10
	- cosine_precision@1
	- cosine_precision@3
	- cosine_precision@5
	- cosine_precision@10
	- cosine_recall@1
	- cosine_recall@3
	- cosine_recall@5
	- cosine_recall@10
	- cosine_ndcg@10
	- cosine_mrr@10
	- cosine_map@100
	model-index:
	- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l
	  results:
	  - task:
	      type: information-retrieval
	      name: Information Retrieval
	    dataset:
	      name: Unknown
	      type: unknown
	    metrics:
	    - type: cosine_accuracy@1
	      value: 0.9347826086956522
	      name: Cosine Accuracy@1
	    - type: cosine_accuracy@3
	      value: 1.0
	      name: Cosine Accuracy@3
	    - type: cosine_accuracy@5
	      value: 1.0
	      name: Cosine Accuracy@5
	    - type: cosine_accuracy@10
	      value: 1.0
	      name: Cosine Accuracy@10
	    - type: cosine_precision@1
	      value: 0.9347826086956522
	      name: Cosine Precision@1
	    - type: cosine_precision@3
	      value: 0.33333333333333337
	      name: Cosine Precision@3
	    - type: cosine_precision@5
	      value: 0.1999999999999999
	      name: Cosine Precision@5
	    - type: cosine_precision@10
	      value: 0.09999999999999995
	      name: Cosine Precision@10
	    - type: cosine_recall@1
	      value: 0.9347826086956522
	      name: Cosine Recall@1
	    - type: cosine_recall@3
	      value: 1.0
	      name: Cosine Recall@3
	    - type: cosine_recall@5
	      value: 1.0
	      name: Cosine Recall@5
	    - type: cosine_recall@10
	      value: 1.0
	      name: Cosine Recall@10
	    - type: cosine_ndcg@10
	      value: 0.97023760333851
	      name: Cosine Ndcg@10
	    - type: cosine_mrr@10
	      value: 0.9601449275362318
	      name: Cosine Mrr@10
	    - type: cosine_map@100
	      value: 0.960144927536232
	      name: Cosine Map@100
	---

	# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l

	This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

	## Model Details

	### Model Description
	- Model Type: Sentence Transformer
	- Base model: [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b -->
	- Maximum Sequence Length: 512 tokens
	- Output Dimensionality: 1024 dimensions
	- Similarity Function: Cosine Similarity
	<!-- - Training Dataset: Unknown -->
	<!-- - Language: Unknown -->
	<!-- - License: Unknown -->

	### Model Sources

	- Documentation: [Sentence Transformers Documentation](https://sbert.net)
	- Repository: [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
	- Hugging Face: [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

	### Full Model Architecture

	```
	SentenceTransformer(
	  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
	  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
	  (2): Normalize()
	)
	```
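The module stack above produces each text embedding in two steps. First, it takes the final hidden state of the `[CLS]` token (`pooling_mode_cls_token: True`). Then `Normalize()` rescales that vector to unit L2 length. The NumPy sketch below walks through both steps and shows why cosine similarity then reduces to a plain dot product. It does not load the real model: a random array stands in for the BertModel token outputs, and the helper name `cls_pool_and_normalize` is hypothetical.

```python
import numpy as np

def cls_pool_and_normalize(token_embeddings: np.ndarray) -> np.ndarray:
    """Mimic Pooling(CLS) followed by Normalize() (hypothetical helper).

    token_embeddings: shape (batch, seq_len, hidden), standing in for the
    transformer's last hidden states. Position 0 is assumed to hold the
    [CLS] token, as in BERT-style tokenization.
    """
    cls = token_embeddings[:, 0, :]                     # (batch, hidden)
    norms = np.linalg.norm(cls, axis=1, keepdims=True)  # (batch, 1)
    return cls / norms                                  # unit-length rows

# Fake hidden states: 3 texts, 8 tokens each, 1024 hidden units
# (the word_embedding_dimension listed above).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(3, 8, 1024))
emb = cls_pool_and_normalize(hidden)

# Cosine similarity computed from the raw, unnormalized CLS vectors...
raw = hidden[:, 0, :]
raw_norms = np.linalg.norm(raw, axis=1)
cosine = (raw @ raw.T) / np.outer(raw_norms, raw_norms)
# ...matches the dot product of the normalized embeddings.
dot = emb @ emb.T
print(emb.shape, np.allclose(dot, cosine))
```

Because the embeddings leave the pipeline already normalized, cosine similarity and dot-product similarity rank results identically for this model.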

	## Usage

	### Direct Usage (Sentence Transformers)

	First install the Sentence Transformers library:

	```bash
	pip install -U sentence-transformers
	```

	Then you can load this model and run inference.
	```python
	from sentence_transformers import SentenceTransformer

	# Download from the 🤗 Hub
	model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1")
	# Run inference
	sentences = [
	' Why is it important for player roles in a Matrix Game to operate at broadly similar levels?',
	'When you are designing a Matrix Game it is worth thinking about the level at which the players roles will be operating in the game. In is usually better, and produces a more balanced game, when the level on which the player roles are operating are broadly similar. It would be difficult to get a balanced game if 3 of the players are playing Generals in command of vast Armies, and another player is playing a simple individual soldier.\n\nLevels of Protection and Hidden Things.',
	'Matrix Game Checklist .................................................................................... 38 \nSample Spendable Bonus Cards ...................................................................... 40 \nSample Random Events ................................................................................... 41 \nSample Voting Cards for Diceless Adjudication ............................................... 43 \nSample Estimative Probability Cards ............................................................... 44 \nSample Turn Order Cards ................................................................................ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........ 46',
	]
	embeddings = model.encode(sentences)
	print(embeddings.shape)
	# [3, 1024]

	# Get the similarity scores for the embeddings
	similarities = model.similarity(embeddings, embeddings)
	print(similarities.shape)
	# [3, 3]
	```
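The model was trained with `MatryoshkaLoss` over dimensions 768, 512, 256, 128 and 64 (see Training Details). The leading components of each embedding are therefore intended to remain usable on their own. One way to use them is to keep the first `dim` columns of the `model.encode` output and re-normalize before computing cosine similarity, as in the minimal sketch below. `truncate_embeddings` is a hypothetical helper, not part of the library, and random unit vectors stand in for real model outputs. Sentence Transformers also accepts a `truncate_dim` argument when constructing `SentenceTransformer`. The evaluation in this card does not report results per dimension, so retrieval quality at smaller sizes is untested here.

```python
import numpy as np

def truncate_embeddings(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components of each row, then re-normalize.

    Hypothetical helper for illustration; not a sentence-transformers API.
    """
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError(f"dim must be in 1..{embeddings.shape[1]}, got {dim}")
    truncated = embeddings[:, :dim]
    return truncated / np.linalg.norm(truncated, axis=1, keepdims=True)

# Random unit vectors stand in for real 1024-dimensional model outputs.
rng = np.random.default_rng(42)
full = rng.normal(size=(3, 1024))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_embeddings(full, 256)  # one of the Matryoshka training sizes
scores = small @ small.T                # cosine similarity of unit vectors
print(small.shape, scores.shape)
```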

	<!--
	### Direct Usage (Transformers)

	<details><summary>Click to see the direct usage in Transformers</summary>

	</details>
	-->

	<!--
	### Downstream Usage (Sentence Transformers)

	You can finetune this model on your own dataset.

	<details><summary>Click to expand</summary>

	</details>
	-->

	<!--
	### Out-of-Scope Use

	List how the model may foreseeably be misused and address what users ought not to do with the model.
	-->

	## Evaluation

	### Metrics

	#### Information Retrieval

	* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator)

	| Metric              | Value  |
	|:--------------------|:-------|
	| cosine_accuracy@1   | 0.9348 |
	| cosine_accuracy@3   | 1.0    |
	| cosine_accuracy@5   | 1.0    |
	| cosine_accuracy@10  | 1.0    |
	| cosine_precision@1  | 0.9348 |
	| cosine_precision@3  | 0.3333 |
	| cosine_precision@5  | 0.2    |
	| cosine_precision@10 | 0.1    |
	| cosine_recall@1     | 0.9348 |
	| cosine_recall@3     | 1.0    |
	| cosine_recall@5     | 1.0    |
	| cosine_recall@10    | 1.0    |
	| cosine_ndcg@10      | 0.9702 |
	| cosine_mrr@10       | 0.9601 |
	| cosine_map@100      | 0.9601 |
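In the table, accuracy@3 is 1.0 while precision@3 is 1/3. This indicates that each query has exactly one relevant passage. Under that assumption, every metric depends only on the rank at which the relevant passage is retrieved, and MAP@100 coincides with MRR@10. The sketch below recomputes the metrics from a list of ranks, using the standard binary-relevance definitions (NDCG discount `1 / log2(rank + 1)`). The rank distribution it uses is inferred from the reported values, not taken from the evaluation logs: 46 queries, with 43 relevant passages at rank 1, one at rank 2 and two at rank 3. `ir_metrics` is a hypothetical helper.

```python
import math

def ir_metrics(ranks, k_values=(1, 3, 5, 10), mrr_k=10, ndcg_k=10):
    """Retrieval metrics when each query has exactly one relevant document.

    ranks: 1-based rank of the relevant document for each query,
           or None if it was not retrieved.
    """
    n = len(ranks)
    out = {}
    for k in k_values:
        hits = sum(1 for r in ranks if r is not None and r <= k)
        out[f"accuracy@{k}"] = hits / n
        out[f"precision@{k}"] = hits / (k * n)  # one relevant doc among k slots
        out[f"recall@{k}"] = hits / n           # one relevant doc per query
    out[f"mrr@{mrr_k}"] = sum(
        1 / r for r in ranks if r is not None and r <= mrr_k
    ) / n
    # With a single relevant document the ideal DCG is 1, so NDCG equals DCG.
    out[f"ndcg@{ndcg_k}"] = sum(
        1 / math.log2(r + 1) for r in ranks if r is not None and r <= ndcg_k
    ) / n
    return out

# Inferred rank distribution consistent with the table above.
ranks = [1] * 43 + [2] + [3, 3]
metrics = ir_metrics(ranks)
for name, value in metrics.items():
    print(f"{name:>14}: {value:.4f}")
```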

	<!--
	## Bias, Risks and Limitations

	What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.
	-->

	<!--
	### Recommendations

	What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.
	-->

	## Training Details

	### Training Dataset

	#### Unnamed Dataset

	* Size: 370 training samples
	* Columns: <code>sentence_0</code> and <code>sentence_1</code>
	* Approximate statistics based on the first 370 samples:
	|         | sentence_0 | sentence_1 |
	|:--------|:-----------|:-----------|
	| type    | string     | string     |
	| details | <ul><li>min: 11 tokens</li><li>mean: 20.19 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 150.83 tokens</li><li>max: 512 tokens</li></ul> |
	* Samples:
	| sentence_0 | sentence_1 |
	|:-----------|:-----------|
	| <code> What distinguishes "established facts" from other types of facts in the game briefings or play?</code> | <code>Forces soldiers are going to be much more effective in combat than untrained protestors; <br>and "established facts" which are facts that have been specifically mentioned in the game <br>briefings or have become established during play as the result of successful arguments. <br>The latter can be immediately deployed as supporting reasons (Pros and Cons), but the <br>former need to have been argued successfully in order for them to be specifically included. <br>Many inexperienced players will make vast all-encompassing arguments full of assumptions <br>that are not reasonable. For example: It is not a reasonable assumption that unarmed <br>Protestors could fight off trained Police. It is reasonable to assume that the Police are</code> |
	| <code> Why is it unreasonable to assume that unarmed protestors could fight off trained police according to the context?</code> | <code>Forces soldiers are going to be much more effective in combat than untrained protestors; <br>and "established facts" which are facts that have been specifically mentioned in the game <br>briefings or have become established during play as the result of successful arguments. <br>The latter can be immediately deployed as supporting reasons (Pros and Cons), but the <br>former need to have been argued successfully in order for them to be specifically included. <br>Many inexperienced players will make vast all-encompassing arguments full of assumptions <br>that are not reasonable. For example: It is not a reasonable assumption that unarmed <br>Protestors could fight off trained Police. It is reasonable to assume that the Police are</code> |
	| <code> What was the outcome of the initial Russian attack against the German units, and how did it affect the ammunition status of both sides?</code> | <code>The Russians succeed in pushing back one of the German units and forcing and already depleted unit to use up ammunition, (but are pushed back themselves and 2 units use a lot of ammo (one of which becomes combat ineffective on -3)). Overall, as the success is matched by failure, the line itself holds. The Russians attack again, the next day:<br><br>Initial Dice Throw: RUSSIAN: 6 5 5 4 2 4 GERMAN: 1 2 4 4 Lined Up and Modified: RUSSIAN: 5 4 3 3 2 1 (two of the Russians = -2) GERMAN: 4 3 3 2 (one of the Germans = -1) Result of Third Day; lose: (one of the Germans = +0) RUSSIAN: 5 4 3 3 GERMAN: 3 lose: 4 3 2</code> |
	* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters:
	```json
	{
	"loss": "MultipleNegativesRankingLoss",
	"matryoshka_dims": [
	768,
	512,
	256,
	128,
	64
	],
	"matryoshka_weights": [
	1,
	1,
	1,
	1,
	1
	],
	"n_dims_per_step": -1
	}
	```
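`MultipleNegativesRankingLoss` treats each `(sentence_0, sentence_1)` row as a positive pair. It uses the other `sentence_1` passages in the same batch as negatives, so with `per_device_train_batch_size` 10 each question is scored against nine in-batch negatives. `MatryoshkaLoss` applies that loss to embeddings truncated to each listed dimension and adds the results using the given weights. Because `n_dims_per_step` is -1, every dimension is used at every step. The NumPy sketch below illustrates only the forward computation. It assumes the library defaults for `MultipleNegativesRankingLoss` (cosine similarity, scale 20) and omits gradients and the library's internal optimizations. The function names are hypothetical, and random vectors stand in for model outputs.

```python
import numpy as np

def cos_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the rows of a and b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

def mnrl(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch negatives loss: cross-entropy where row i's target is column i."""
    logits = scale * cos_sim(anchors, positives)  # (batch, batch)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

def matryoshka_mnrl(anchors, positives,
                    dims=(768, 512, 256, 128, 64), weights=(1, 1, 1, 1, 1)):
    """Weighted sum of MNRL computed on the first d dimensions, for each d."""
    return sum(w * mnrl(anchors[:, :d], positives[:, :d])
               for d, w in zip(dims, weights))

rng = np.random.default_rng(0)
batch, hidden = 10, 1024  # batch size and embedding size from this card
questions = rng.normal(size=(batch, hidden))
# Positives are noisy copies of the questions, so each pair is clearly matched.
passages = questions + 0.1 * rng.normal(size=(batch, hidden))
# Shifting the passages by one row breaks every pairing.
shifted = np.roll(passages, 1, axis=0)

matched = matryoshka_mnrl(questions, passages)
mismatched = matryoshka_mnrl(questions, shifted)
print(f"matched: {matched:.4f}, mismatched: {mismatched:.1f}")
```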

	### Training Hyperparameters
	#### Non-Default Hyperparameters

	- `eval_strategy`: steps
	- `per_device_train_batch_size`: 10
	- `per_device_eval_batch_size`: 10
	- `num_train_epochs`: 10
	- `multi_dataset_batch_sampler`: round_robin

	#### All Hyperparameters
	<details><summary>Click to expand</summary>

	- `overwrite_output_dir`: False
	- `do_predict`: False
	- `eval_strategy`: steps
	- `prediction_loss_only`: True
	- `per_device_train_batch_size`: 10
	- `per_device_eval_batch_size`: 10
	- `per_gpu_train_batch_size`: None
	- `per_gpu_eval_batch_size`: None
	- `gradient_accumulation_steps`: 1
	- `eval_accumulation_steps`: None
	- `torch_empty_cache_steps`: None
	- `learning_rate`: 5e-05
	- `weight_decay`: 0.0
	- `adam_beta1`: 0.9
	- `adam_beta2`: 0.999
	- `adam_epsilon`: 1e-08
	- `max_grad_norm`: 1
	- `num_train_epochs`: 10
	- `max_steps`: -1
	- `lr_scheduler_type`: linear
	- `lr_scheduler_kwargs`: {}
	- `warmup_ratio`: 0.0
	- `warmup_steps`: 0
	- `log_level`: passive
	- `log_level_replica`: warning
	- `log_on_each_node`: True
	- `logging_nan_inf_filter`: True
	- `save_safetensors`: True
	- `save_on_each_node`: False
	- `save_only_model`: False
	- `restore_callback_states_from_checkpoint`: False
	- `no_cuda`: False
	- `use_cpu`: False
	- `use_mps_device`: False
	- `seed`: 42
	- `data_seed`: None
	- `jit_mode_eval`: False
	- `use_ipex`: False
	- `bf16`: False
	- `fp16`: False
	- `fp16_opt_level`: O1
	- `half_precision_backend`: auto
	- `bf16_full_eval`: False
	- `fp16_full_eval`: False
	- `tf32`: None
	- `local_rank`: 0
	- `ddp_backend`: None
	- `tpu_num_cores`: None
	- `tpu_metrics_debug`: False
	- `debug`: []
	- `dataloader_drop_last`: False
	- `dataloader_num_workers`: 0
	- `dataloader_prefetch_factor`: None
	- `past_index`: -1
	- `disable_tqdm`: False
	- `remove_unused_columns`: True
	- `label_names`: None
	- `load_best_model_at_end`: False
	- `ignore_data_skip`: False
	- `fsdp`: []
	- `fsdp_min_num_params`: 0
	- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
	- `tp_size`: 0
	- `fsdp_transformer_layer_cls_to_wrap`: None
	- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
	- `deepspeed`: None
	- `label_smoothing_factor`: 0.0
	- `optim`: adamw_torch
	- `optim_args`: None
	- `adafactor`: False
	- `group_by_length`: False
	- `length_column_name`: length
	- `ddp_find_unused_parameters`: None
	- `ddp_bucket_cap_mb`: None
	- `ddp_broadcast_buffers`: False
	- `dataloader_pin_memory`: True
	- `dataloader_persistent_workers`: False
	- `skip_memory_metrics`: True
	- `use_legacy_prediction_loop`: False
	- `push_to_hub`: False
	- `resume_from_checkpoint`: None
	- `hub_model_id`: None
	- `hub_strategy`: every_save
	- `hub_private_repo`: None
	- `hub_always_push`: False
	- `gradient_checkpointing`: False
	- `gradient_checkpointing_kwargs`: None
	- `include_inputs_for_metrics`: False
	- `include_for_metrics`: []
	- `eval_do_concat_batches`: True
	- `fp16_backend`: auto
	- `push_to_hub_model_id`: None
	- `push_to_hub_organization`: None
	- `mp_parameters`:
	- `auto_find_batch_size`: False
	- `full_determinism`: False
	- `torchdynamo`: None
	- `ray_scope`: last
	- `ddp_timeout`: 1800
	- `torch_compile`: False
	- `torch_compile_backend`: None
	- `torch_compile_mode`: None
	- `include_tokens_per_second`: False
	- `include_num_input_tokens_seen`: False
	- `neftune_noise_alpha`: None
	- `optim_target_modules`: None
	- `batch_eval_metrics`: False
	- `eval_on_start`: False
	- `use_liger_kernel`: False
	- `eval_use_gather_object`: False
	- `average_tokens_across_devices`: False
	- `prompts`: None
	- `batch_sampler`: batch_sampler
	- `multi_dataset_batch_sampler`: round_robin

	</details>
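The settings `lr_scheduler_type` linear, `warmup_steps` 0 and `learning_rate` 5e-05 imply a learning rate that decays linearly from 5e-05 to 0 over the run. The minimal sketch below uses the standard linear warmup-then-decay formula. It assumes 370 optimizer steps in total, which matches 37 steps per epoch over 10 epochs and the final row of the training logs. `linear_lr` is a hypothetical helper, not the Transformers scheduler itself.

```python
def linear_lr(step: int, base_lr: float = 5e-05,
              total_steps: int = 370, warmup_steps: int = 0) -> float:
    """Linear warmup to base_lr, then linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

for step in (0, 37, 185, 370):
    print(step, f"{linear_lr(step):.2e}")
```

With no warmup, the rate is halved (2.5e-05) at step 185, the midpoint of training.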

	### Training Logs
	| Epoch  | Step | cosine_ndcg@10 |
	|:------:|:----:|:--------------:|
	| 1.0    | 37   | 0.9273         |
	| 1.3514 | 50   | 0.9490         |
	| 2.0    | 74   | 0.9462         |
	| 2.7027 | 100  | 0.9527         |
	| 3.0    | 111  | 0.9527         |
	| 4.0    | 148  | 0.9783         |
	| 4.0541 | 150  | 0.9811         |
	| 5.0    | 185  | 0.9622         |
	| 5.4054 | 200  | 0.9622         |
	| 6.0    | 222  | 0.9702         |
	| 6.7568 | 250  | 0.9622         |
	| 7.0    | 259  | 0.9622         |
	| 8.0    | 296  | 0.9702         |
	| 8.1081 | 300  | 0.9702         |
	| 9.0    | 333  | 0.9702         |
	| 9.4595 | 350  | 0.9702         |
	| 10.0   | 370  | 0.9702         |
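The Epoch column follows from the dataset size and batch size. 370 samples at `per_device_train_batch_size` 10 give ceil(370 / 10) = 37 steps per epoch, so step `s` corresponds to epoch `s / 37`. The logged rows fall at multiples of 50 steps and at each epoch boundary. A small sketch reproducing the fractional epochs, where `step_to_epoch` is a hypothetical helper:

```python
import math

samples, batch_size = 370, 10
steps_per_epoch = math.ceil(samples / batch_size)

def step_to_epoch(step: int) -> float:
    """Fractional epoch for a global training step, rounded like the log."""
    return round(step / steps_per_epoch, 4)

for step in (37, 50, 100, 150, 200, 250, 300, 350, 370):
    print(step, step_to_epoch(step))
```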


	### Framework Versions
	- Python: 3.13.2
	- Sentence Transformers: 4.1.0
	- Transformers: 4.51.3
	- PyTorch: 2.7.0+cu126
	- Accelerate: 1.6.0
	- Datasets: 3.6.0
	- Tokenizers: 0.21.1

	## Citation

	### BibTeX

	#### Sentence Transformers
	```bibtex
	@inproceedings{reimers-2019-sentence-bert,
	title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
	author = "Reimers, Nils and Gurevych, Iryna",
	booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
	month = "11",
	year = "2019",
	publisher = "Association for Computational Linguistics",
	url = "https://arxiv.org/abs/1908.10084",
	}
	```

	#### MatryoshkaLoss
	```bibtex
	@misc{kusupati2024matryoshka,
	title={Matryoshka Representation Learning},
	author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
	year={2024},
	eprint={2205.13147},
	archivePrefix={arXiv},
	primaryClass={cs.LG}
	}
	```

	#### MultipleNegativesRankingLoss
	```bibtex
	@misc{henderson2017efficient,
	title={Efficient Natural Language Response Suggestion for Smart Reply},
	author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
	year={2017},
	eprint={1705.00652},
	archivePrefix={arXiv},
	primaryClass={cs.CL}
	}
	```

	<!--
	## Glossary

	Clearly define terms in order to be accessible across audiences.
	-->

	<!--
	## Model Card Authors

	Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.
	-->

	<!--
	## Model Card Contact

	Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.
	-->