|
--- |
|
tags: |
|
- sentence-transformers |
|
- sentence-similarity |
|
- feature-extraction |
|
- generated_from_trainer |
|
- dataset_size:370 |
|
- loss:MatryoshkaLoss |
|
- loss:MultipleNegativesRankingLoss |
|
base_model: Snowflake/snowflake-arctic-embed-l |
|
widget: |
|
- source_sentence: ' What is the proposed alternative to imposing random events purely |
|
by chance on players?' |
|
sentences: |
|
- "half a day). \n• They are perceived to be new and innovative (despite being around\ |
|
\ since 1987). \n• They are easy to transport, requiring only pen and paper –\ |
|
\ with perhaps a few maps and \ncounters. \n• They work well in multi-domain,\ |
|
\ multi-agency contexts allowing all Actors to participate \nequally. \nA few\ |
|
\ Words of Warning \n• The fact that a Matrix Game requires little infrastructure\ |
|
\ can be a problem – it just \ndoesn't look sexy and the strengths that it can\ |
|
\ be done quickly with the minimum of \nfuss, can be reduced by efforts to make\ |
|
\ it look cool/expensive. \n• The non-quantitative nature of the game can frustrate\ |
|
\ analysts. \n• Matrix Games require an experience facilitator to run them." |
|
- "inform the other players of their stated intentions. In many cases these are\ |
|
\ not really \n\"arguments\" as part of the game, so shouldn't count as their\ |
|
\ action for the turn, unless they \nwish to specify a measurable effect (such\ |
|
\ as increasing their approval ratings). \nTrade Agreements \nIn some games, trade\ |
|
\ forms a very important part of the game narrative. In most cases this \ncan\ |
|
\ be treated simply as part of the normal ebb and flow of the argument process.\ |
|
\ \nHowever, in some circumstances, particularly when timescales are long, trade\ |
|
\ can require \ngreater attention as to the nuances of the economic benefits and\ |
|
\ impacts. In these cases, it \nmay be necessary to get the two sides to make\ |
|
\ additional arguments as to what they expect" |
|
- "possible throughout the game, having “random events” happen completely at random\ |
|
\ is \nproblematic. An Actor may be disadvantaged purely by chance, more than\ |
|
\ once during the \ngame, which can reduce their immersion and engagement. The\ |
|
\ narrative develops during \nthe game based on the decisions of the players and\ |
|
\ their reactions to the decisions of other \nplayers. Having random events imposed\ |
|
\ on them by chance breaks this “cause and effect” \ncycle and degrades the game\ |
|
\ flow. \nThe alternative is to give the random event to the participants. They\ |
|
\ will then make a \ndecision as to how this can contribute to the narrative being\ |
|
\ developed by the players. They" |
|
- source_sentence: ' What is the primary purpose of the game described in the context?' |
|
sentences: |
|
- "If you are using voting systems, either as Diceless Adjudication or as Estimative\ |
|
\ Probability, \nyou should take great care to ensure that the players are being\ |
|
\ as professional as possible, \nand not merely \"voting for themselves\" in a\ |
|
\ competitive manner. Many players can be quite \nvery competitive, so it may\ |
|
\ be necessary to not allow them to vote on their argument – and \nequally it\ |
|
\ may be necessary to keep an eye on players who are in direct competition. The\ |
|
\ \nintention is to develop a narrative, generating insights – rather than trying\ |
|
\ to win at all \ncosts. \n \n7 An example is https://www.turningtechnologies.eu/turningpoint/\ |
|
\ \n8 An example is https://www.polleverywhere.com/\n\fVersion 15 \nPage 14 of\ |
|
\ 52 \n© Tom Mouat 2019, 2020, 2022, 2023" |
|
- "spend the time piling markers on counters. Tracks can be generic (in that they\ |
|
\ simply record \nthe number of plusses or minuses applied) or they might have\ |
|
\ specific \"trigger levels\" (in \nthat when the morale of the infantry is reduced\ |
|
\ to -3, the \"raw\" units will desert and return \nto their homes. \nIt can also\ |
|
\ be useful to have a \"Press\" actor whose job it is to record the results of\ |
|
\ \narguments (both visible to the public and those not), as well as putting the\ |
|
\ \"Press spin\" on \nthe events. This role can be useful in looking after the\ |
|
\ \"Consequence Management\" \nelements mentioned earlier. \nThe Components (and\ |
|
\ Characters) Affect the Game \nWhen participants are thinking on their feet,\ |
|
\ what they can see will affect what they argue" |
|
- "materials, a short game, and small numbers of participants. If they want to conduct\ |
|
\ a \"deep \ndive\", this isn't the appropriate game - the purpose is to identify\ |
|
\ the insights – so make a \nnote and move on. The \"deep dive\" should follow\ |
|
\ later or in a different type of game. You \nshould, therefore, make sure you\ |
|
\ include this point in your introductory briefing so that the \nplayers are clear\ |
|
\ from the outset. \nWhen dealing with dominant people, who continually interrupt\ |
|
\ and dominate the \nArguments, you need to take a harder line. You should interrupt\ |
|
\ them when they interrupt \nanother player making a point. Point out to them\ |
|
\ that they had their chance. This isn't a" |
|
- source_sentence: ' Why should Big Projects or Long-Term Plans require no more than |
|
three successful arguments in the game?' |
|
sentences: |
|
- "much on this single thing. \nThis does not mean that arguments have to only\ |
|
\ be about things that can happen within \nthe turn length of the game. It is\ |
|
\ possible to make \"long term\" arguments like anything else. \nIf, in a Baltic\ |
|
\ game with week-long turns, you want to argue that an electricity cable \nbetween\ |
|
\ Sweden and Lithuania is to be built with the aim of reducing Lithuania's \n\ |
|
dependence on Russian energy, this would be judged as normal. It just would not\ |
|
\ come to \n \n9 I am indebted to Prof Rex Brynen for this suggestion.\n\fVersion\ |
|
\ 15 \nPage 23 of 52 \n© Tom Mouat 2019, 2020, 2022, 2023 \nfruition in the length\ |
|
\ of the game – but, assuming the argument was successful, it would" |
|
- "games.\n\fVersion 15 \nPage 36 of 52 \n© Tom Mouat 2019, 2020, 2022, 2023 \n\ |
|
Why I like Matrix Games \n• Designing a Matrix Game can be done quickly with the\ |
|
\ minimum of fuss. \n• Participating in a Matrix Game does not require an understanding\ |
|
\ of complex and \nunfamiliar rules. \n• Matrix games can cover a wide variety\ |
|
\ of possible scenarios, including conceptual \nconflicts like Cyber. \n• They\ |
|
\ are especially good in the non-kinetic, effects based, domain. \n• Matrix games\ |
|
\ deal with qualitative outputs so are especially useful for non-analysts. \n\ |
|
• The games work best with small groups, increasing immersion and buy-in to the\ |
|
\ game. \n• Matrix games are extremely inexpensive (and they work best with short\ |
|
\ sessions lasting \nhalf a day)." |
|
- "protection: Its hidden location, its boundary fence, and the security guards,\ |
|
\ all of which \nmust be overcome by successful arguments before the base can\ |
|
\ be penetrated. \nAs a rule of thumb, nothing should have more than 3 levels\ |
|
\ of protection as it will simply \ntake too long and dominate the game to the\ |
|
\ exclusion of everything else. \nBig Projects or Long-Term Plans \nDepending\ |
|
\ on the level of the game, some actions and events represent such a large \n\ |
|
investment in time and effort that they require multiple arguments in order to\ |
|
\ bring them \nto fruition. As a rule of thumb, a Big Project should also take\ |
|
\ no more than 3 successful \narguments (like protected and hidden things above);\ |
|
\ otherwise, the game is focussed too" |
|
- source_sentence: ' Which associations related to wargaming and simulation are mentioned |
|
in the context?' |
|
sentences: |
|
- "out their objectives and explain why they though they succeeded or failed can\ |
|
\ be most \ninstructive. Also, if you then ask the assembled group \"who won?\"\ |
|
\ and they all agree, then \nthis can be a very powerful indicator of things that\ |
|
\ might need to be looked at more closely \nas a result of the game. \nFinally,\ |
|
\ the insights from the game can take a little time to come out. They might not\ |
|
\ be \nimmediately obvious, so taking time to consider what happened in the game\ |
|
\ and whether \nindividual events are noteworthy, is very useful. I am continually\ |
|
\ surprised at the predictive \npower of such a simple game. \n \n \n \n11 See:\ |
|
\ Game theory, simulated interaction, and unaided judgement for forecasting decisions\ |
|
\ in conflicts. Kesten C. Green." |
|
- 'gaming vignettes |
|
|
|
|
|
job opportunities/positions vacant |
|
|
|
|
|
latest links |
|
|
|
|
|
methodology |
|
|
|
|
|
not-so-serious |
|
|
|
|
|
playtesters needed |
|
|
|
|
|
reader survey |
|
|
|
|
|
request for proposals |
|
|
|
|
|
scholarships and fellowships |
|
|
|
|
|
simulation and game reports |
|
|
|
|
|
simulation and game reviews |
|
|
|
|
|
simulation and gaming debacles |
|
|
|
|
|
simulation and gaming history |
|
|
|
|
|
simulation and gaming ideas |
|
|
|
|
|
simulation and gaming journals |
|
|
|
|
|
simulation and gaming materials |
|
|
|
|
|
simulation and gaming miscellany |
|
|
|
|
|
simulation and gaming news |
|
|
|
|
|
simulation and gaming publications |
|
|
|
|
|
simulation and gaming software |
|
|
|
|
|
Archives |
|
|
|
|
|
M T W T F S S 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 |
|
26 27 28 29 30 |
|
|
|
|
|
Associations |
|
|
|
|
|
Australian Defence Force Wargaming Group |
|
|
|
|
|
Connections Netherlands |
|
|
|
|
|
Connections North (Canada)' |
|
- "Senior Officers, Dominant People and Contentious Arguments \nIt is not uncommon\ |
|
\ in a Matrix Game that the participants want to \"debate\" the arguments. \n\ |
|
To a limited extent this is ok, but as stated elsewhere, the game needs to move\ |
|
\ at a pace, \ncreating an immersive narrative and forcing the players to have\ |
|
\ to live with the \nconsequences of their earlier decisions. \nIt can happen\ |
|
\ that a Senior Officer, used to \"seminar wargames\", will interrupt when you\ |
|
\ \nwant to move on and say \"wait a minute - this is a really valuable debate\ |
|
\ - let's just dig \ndown...\" You should try to point out that this is not that\ |
|
\ sort of game - Matrix Games are to \ngain an insight and understanding in a\ |
|
\ specific way. Short notice, minimal preparation and" |
|
- source_sentence: ' Why is it important for player roles in a Matrix Game to operate |
|
at broadly similar levels?' |
|
sentences: |
|
- 'The Basic Rule. The basic rule is as follows: 1 x 6-Sided Dice = 1 x Combat |
|
Unit The size of that Combat Unit will, of course, vary from game to game. In |
|
the boarding action it may be as little as 5-10 men; in a Map Game, it could be |
|
as much as an entire Brigade, or even a Corps. |
|
|
|
|
|
The Method. The dice on the opposing sides are rolled as follows: Roll the Dice. |
|
Line them up, Highest vs Highest If one side has more dice than the other, any |
|
dice that are extra, and score less than the lowest dice of the side with the |
|
fewer dice, are ignored.' |
|
- "Matrix Game Checklist ....................................................................................\ |
|
\ 38 \nSample Spendable Bonus Cards ......................................................................\ |
|
\ 40 \nSample Random Events ...................................................................................\ |
|
\ 41 \nSample Voting Cards for Diceless Adjudication ...............................................\ |
|
\ 43 \nSample Estimative Probability Cards ...............................................................\ |
|
\ 44 \nSample Turn Order Cards ................................................................................\ |
|
\ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........\ |
|
\ 46" |
|
- 'When you are designing a Matrix Game it is worth thinking about the level at |
|
which the players roles will be operating in the game. In is usually better, and |
|
produces a more balanced game, when the level on which the player roles are operating |
|
are broadly similar. It would be difficult to get a balanced game if 3 of the |
|
players are playing Generals in command of vast Armies, and another player is |
|
playing a simple individual soldier. |
|
|
|
|
|
Levels of Protection and Hidden Things.' |
|
pipeline_tag: sentence-similarity |
|
library_name: sentence-transformers |
|
metrics: |
|
- cosine_accuracy@1 |
|
- cosine_accuracy@3 |
|
- cosine_accuracy@5 |
|
- cosine_accuracy@10 |
|
- cosine_precision@1 |
|
- cosine_precision@3 |
|
- cosine_precision@5 |
|
- cosine_precision@10 |
|
- cosine_recall@1 |
|
- cosine_recall@3 |
|
- cosine_recall@5 |
|
- cosine_recall@10 |
|
- cosine_ndcg@10 |
|
- cosine_mrr@10 |
|
- cosine_map@100 |
|
model-index: |
|
- name: SentenceTransformer based on Snowflake/snowflake-arctic-embed-l |
|
results: |
|
- task: |
|
type: information-retrieval |
|
name: Information Retrieval |
|
dataset: |
|
name: Unknown |
|
type: unknown |
|
metrics: |
|
- type: cosine_accuracy@1 |
|
value: 0.9347826086956522 |
|
name: Cosine Accuracy@1 |
|
- type: cosine_accuracy@3 |
|
value: 1.0 |
|
name: Cosine Accuracy@3 |
|
- type: cosine_accuracy@5 |
|
value: 1.0 |
|
name: Cosine Accuracy@5 |
|
- type: cosine_accuracy@10 |
|
value: 1.0 |
|
name: Cosine Accuracy@10 |
|
- type: cosine_precision@1 |
|
value: 0.9347826086956522 |
|
name: Cosine Precision@1 |
|
- type: cosine_precision@3 |
|
value: 0.33333333333333337 |
|
name: Cosine Precision@3 |
|
- type: cosine_precision@5 |
|
value: 0.1999999999999999 |
|
name: Cosine Precision@5 |
|
- type: cosine_precision@10 |
|
value: 0.09999999999999995 |
|
name: Cosine Precision@10 |
|
- type: cosine_recall@1 |
|
value: 0.9347826086956522 |
|
name: Cosine Recall@1 |
|
- type: cosine_recall@3 |
|
value: 1.0 |
|
name: Cosine Recall@3 |
|
- type: cosine_recall@5 |
|
value: 1.0 |
|
name: Cosine Recall@5 |
|
- type: cosine_recall@10 |
|
value: 1.0 |
|
name: Cosine Recall@10 |
|
- type: cosine_ndcg@10 |
|
value: 0.97023760333851 |
|
name: Cosine Ndcg@10 |
|
- type: cosine_mrr@10 |
|
value: 0.9601449275362318 |
|
name: Cosine Mrr@10 |
|
- type: cosine_map@100 |
|
value: 0.960144927536232 |
|
name: Cosine Map@100 |
|
--- |
|
|
|
# SentenceTransformer based on Snowflake/snowflake-arctic-embed-l |
|
|
|
This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. |
|
|
|
## Model Details |
|
|
|
### Model Description |
|
- **Model Type:** Sentence Transformer |
|
- **Base model:** [Snowflake/snowflake-arctic-embed-l](https://huggingface.co/Snowflake/snowflake-arctic-embed-l) <!-- at revision d8fb21ca8d905d2832ee8b96c894d3298964346b --> |
|
- **Maximum Sequence Length:** 512 tokens |
|
- **Output Dimensionality:** 1024 dimensions |
|
- **Similarity Function:** Cosine Similarity |
|
<!-- - **Training Dataset:** Unknown --> |
|
<!-- - **Language:** Unknown --> |
|
<!-- - **License:** Unknown --> |
|
|
|
### Model Sources |
|
|
|
- **Documentation:** [Sentence Transformers Documentation](https://sbert.net) |
|
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers) |
|
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers) |
|
|
|
### Full Model Architecture |
|
|
|
``` |
|
SentenceTransformer( |
|
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel |
|
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True}) |
|
(2): Normalize() |
|
) |
|
``` |
|
|
|
## Usage |
|
|
|
### Direct Usage (Sentence Transformers) |
|
|
|
First install the Sentence Transformers library: |
|
|
|
```bash |
|
pip install -U sentence-transformers |
|
``` |
|
|
|
Then you can load this model and run inference. |
|
```python |
|
from sentence_transformers import SentenceTransformer |
|
|
|
# Download from the 🤗 Hub |
|
model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1") |
|
# Run inference |
|
sentences = [ |
|
' Why is it important for player roles in a Matrix Game to operate at broadly similar levels?', |
|
'When you are designing a Matrix Game it is worth thinking about the level at which the players roles will be operating in the game. In is usually better, and produces a more balanced game, when the level on which the player roles are operating are broadly similar. It would be difficult to get a balanced game if 3 of the players are playing Generals in command of vast Armies, and another player is playing a simple individual soldier.\n\nLevels of Protection and Hidden Things.', |
|
'Matrix Game Checklist .................................................................................... 38 \nSample Spendable Bonus Cards ...................................................................... 40 \nSample Random Events ................................................................................... 41 \nSample Voting Cards for Diceless Adjudication ............................................... 43 \nSample Estimative Probability Cards ............................................................... 44 \nSample Turn Order Cards ................................................................................ 45 \nSample Markers for Matrix Games for Effects and Conventional Forces ........ 46', |
|
] |
|
embeddings = model.encode(sentences) |
|
print(embeddings.shape) |
|
# (3, 1024)
|
|
|
# Get the similarity scores for the embeddings |
|
similarities = model.similarity(embeddings, embeddings) |
|
print(similarities.shape) |
|
# torch.Size([3, 3])
|
``` |
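The same two calls also cover the query/passage ranking workflow the model was finetuned for. Below is a minimal sketch; the query and passages are illustrative paraphrases of the Matrix Game source material, not entries from the training set:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1")

query = "How many successful arguments should a Big Project require?"
passages = [
    "As a rule of thumb, a Big Project should take no more than 3 successful arguments.",
    "Matrix games are extremely inexpensive and work best with short sessions.",
    "The dice on the opposing sides are rolled and lined up, highest vs highest.",
]

# Embed the query and the candidate passages
query_embedding = model.encode(query)
passage_embeddings = model.encode(passages)

# Cosine similarity between the query and every passage -> shape [1, len(passages)]
scores = model.similarity(query_embedding, passage_embeddings)

# Print passages from most to least similar to the query
for idx in scores[0].argsort(descending=True).tolist():
    print(f"{scores[0][idx]:.4f}  {passages[idx]}")
```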
|
|
|
<!-- |
|
### Direct Usage (Transformers) |
|
|
|
<details><summary>Click to see the direct usage in Transformers</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Downstream Usage (Sentence Transformers) |
|
|
|
You can finetune this model on your own dataset. |
|
|
|
<details><summary>Click to expand</summary> |
|
|
|
</details> |
|
--> |
|
|
|
<!-- |
|
### Out-of-Scope Use |
|
|
|
*List how the model may foreseeably be misused and address what users ought not to do with the model.* |
|
--> |
|
|
|
## Evaluation |
|
|
|
### Metrics |
|
|
|
#### Information Retrieval |
|
|
|
* Evaluated with [<code>InformationRetrievalEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.InformationRetrievalEvaluator) |
|
|
|
| Metric | Value | |
|
|:--------------------|:-----------| |
|
| cosine_accuracy@1 | 0.9348 | |
|
| cosine_accuracy@3 | 1.0 | |
|
| cosine_accuracy@5 | 1.0 | |
|
| cosine_accuracy@10 | 1.0 | |
|
| cosine_precision@1 | 0.9348 | |
|
| cosine_precision@3 | 0.3333 | |
|
| cosine_precision@5 | 0.2 | |
|
| cosine_precision@10 | 0.1 | |
|
| cosine_recall@1 | 0.9348 | |
|
| cosine_recall@3 | 1.0 | |
|
| cosine_recall@5 | 1.0 | |
|
| cosine_recall@10 | 1.0 | |
|
| **cosine_ndcg@10** | **0.9702** | |
|
| cosine_mrr@10 | 0.9601 | |
|
| cosine_map@100 | 0.9601 | |
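These figures were produced with the library's `InformationRetrievalEvaluator`. The held-out evaluation split is not published with this card, but the setup can be reproduced with a sketch like the one below, in which the query and corpus entries are hypothetical placeholders standing in for the real question/context pairs:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("CalebMaresca/matrix-game-embeddings-ft-v1")

# Placeholder evaluation data: ids are arbitrary, texts stand in for the
# held-out question/context pairs used to compute the table above.
queries = {
    "q1": "Why should Big Projects require no more than three successful arguments?",
}
corpus = {
    "d1": "A Big Project should take no more than 3 successful arguments; otherwise the game is focussed too much on this single thing.",
    "d2": "Matrix games are extremely inexpensive and work best with short sessions lasting half a day.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="matrix-game-eval",
)
results = evaluator(model)
print(results)  # e.g. {"matrix-game-eval_cosine_ndcg@10": ..., ...}
```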
|
|
|
<!-- |
|
## Bias, Risks and Limitations |
|
|
|
*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.* |
|
--> |
|
|
|
<!-- |
|
### Recommendations |
|
|
|
*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.* |
|
--> |
|
|
|
## Training Details |
|
|
|
### Training Dataset |
|
|
|
#### Unnamed Dataset |
|
|
|
* Size: 370 training samples |
|
* Columns: <code>sentence_0</code> and <code>sentence_1</code> |
|
* Approximate statistics based on the first 370 samples: |
|
| | sentence_0 | sentence_1 | |
|
|:--------|:-----------------------------------------------------------------------------------|:------------------------------------------------------------------------------------| |
|
| type | string | string | |
|
| details | <ul><li>min: 11 tokens</li><li>mean: 20.19 tokens</li><li>max: 34 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 150.83 tokens</li><li>max: 512 tokens</li></ul> | |
|
* Samples: |
|
| sentence_0 | sentence_1 | |
|
|:------------------------------------------------------------------------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| |
|
| <code> What distinguishes "established facts" from other types of facts in the game briefings or play?</code> | <code>Forces soldiers are going to be much more effective in combat than untrained protestors; <br>and "established facts" which are facts that have been specifically mentioned in the game <br>briefings or have become established during play as the result of successful arguments. <br>The latter can be immediately deployed as supporting reasons (Pros and Cons), but the <br>former need to have been argued successfully in order for them to be specifically included. <br>Many inexperienced players will make vast all-encompassing arguments full of assumptions <br>that are not reasonable. For example: It is not a reasonable assumption that unarmed <br>Protestors could fight off trained Police. It is reasonable to assume that the Police are</code> | |
|
| <code> Why is it unreasonable to assume that unarmed protestors could fight off trained police according to the context?</code> | <code>Forces soldiers are going to be much more effective in combat than untrained protestors; <br>and "established facts" which are facts that have been specifically mentioned in the game <br>briefings or have become established during play as the result of successful arguments. <br>The latter can be immediately deployed as supporting reasons (Pros and Cons), but the <br>former need to have been argued successfully in order for them to be specifically included. <br>Many inexperienced players will make vast all-encompassing arguments full of assumptions <br>that are not reasonable. For example: It is not a reasonable assumption that unarmed <br>Protestors could fight off trained Police. It is reasonable to assume that the Police are</code> | |
|
| <code> What was the outcome of the initial Russian attack against the German units, and how did it affect the ammunition status of both sides?</code> | <code>The Russians succeed in pushing back one of the German units and forcing and already depleted unit to use up ammunition, (but are pushed back themselves and 2 units use a lot of ammo (one of which becomes combat ineffective on -3)). Overall, as the success is matched by failure, the line itself holds. The Russians attack again, the next day:<br><br>Initial Dice Throw: RUSSIAN: 6 5 5 4 2 4 GERMAN: 1 2 4 4 Lined Up and Modified: RUSSIAN: 5 4 3 3 2 1 (two of the Russians = -2) GERMAN: 4 3 3 2 (one of the Germans = -1) Result of Third Day; lose: (one of the Germans = +0) RUSSIAN: 5 4 3 3 GERMAN: 3 lose: 4 3 2</code> | |
|
* Loss: [<code>MatryoshkaLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#matryoshkaloss) with these parameters: |
|
```json |
|
{ |
|
"loss": "MultipleNegativesRankingLoss", |
|
"matryoshka_dims": [ |
|
768, |
|
512, |
|
256, |
|
128, |
|
64 |
|
], |
|
"matryoshka_weights": [ |
|
1, |
|
1, |
|
1, |
|
1, |
|
1 |
|
], |
|
"n_dims_per_step": -1 |
|
} |
|
``` |
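Because the Matryoshka objective trains the leading 768/512/256/128/64 dimensions to remain useful on their own, the embeddings can be truncated for cheaper storage and search using the library's standard `truncate_dim` option. A minimal sketch, with 256 chosen purely as an example (retrieval quality at truncated sizes has not been measured for this model):

```python
from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional embeddings
# instead of the full 1024 dimensions.
model = SentenceTransformer(
    "CalebMaresca/matrix-game-embeddings-ft-v1",
    truncate_dim=256,
)

embeddings = model.encode([
    "Why is it important for player roles in a Matrix Game to operate at broadly similar levels?",
    "Matrix Games require an experienced facilitator to run them.",
])
print(embeddings.shape)  # (2, 256)
```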
|
|
|
### Training Hyperparameters |
|
#### Non-Default Hyperparameters |
|
|
|
- `eval_strategy`: steps |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `num_train_epochs`: 10 |
|
- `multi_dataset_batch_sampler`: round_robin |
|
|
|
#### All Hyperparameters |
|
<details><summary>Click to expand</summary> |
|
|
|
- `overwrite_output_dir`: False |
|
- `do_predict`: False |
|
- `eval_strategy`: steps |
|
- `prediction_loss_only`: True |
|
- `per_device_train_batch_size`: 10 |
|
- `per_device_eval_batch_size`: 10 |
|
- `per_gpu_train_batch_size`: None |
|
- `per_gpu_eval_batch_size`: None |
|
- `gradient_accumulation_steps`: 1 |
|
- `eval_accumulation_steps`: None |
|
- `torch_empty_cache_steps`: None |
|
- `learning_rate`: 5e-05 |
|
- `weight_decay`: 0.0 |
|
- `adam_beta1`: 0.9 |
|
- `adam_beta2`: 0.999 |
|
- `adam_epsilon`: 1e-08 |
|
- `max_grad_norm`: 1 |
|
- `num_train_epochs`: 10 |
|
- `max_steps`: -1 |
|
- `lr_scheduler_type`: linear |
|
- `lr_scheduler_kwargs`: {} |
|
- `warmup_ratio`: 0.0 |
|
- `warmup_steps`: 0 |
|
- `log_level`: passive |
|
- `log_level_replica`: warning |
|
- `log_on_each_node`: True |
|
- `logging_nan_inf_filter`: True |
|
- `save_safetensors`: True |
|
- `save_on_each_node`: False |
|
- `save_only_model`: False |
|
- `restore_callback_states_from_checkpoint`: False |
|
- `no_cuda`: False |
|
- `use_cpu`: False |
|
- `use_mps_device`: False |
|
- `seed`: 42 |
|
- `data_seed`: None |
|
- `jit_mode_eval`: False |
|
- `use_ipex`: False |
|
- `bf16`: False |
|
- `fp16`: False |
|
- `fp16_opt_level`: O1 |
|
- `half_precision_backend`: auto |
|
- `bf16_full_eval`: False |
|
- `fp16_full_eval`: False |
|
- `tf32`: None |
|
- `local_rank`: 0 |
|
- `ddp_backend`: None |
|
- `tpu_num_cores`: None |
|
- `tpu_metrics_debug`: False |
|
- `debug`: [] |
|
- `dataloader_drop_last`: False |
|
- `dataloader_num_workers`: 0 |
|
- `dataloader_prefetch_factor`: None |
|
- `past_index`: -1 |
|
- `disable_tqdm`: False |
|
- `remove_unused_columns`: True |
|
- `label_names`: None |
|
- `load_best_model_at_end`: False |
|
- `ignore_data_skip`: False |
|
- `fsdp`: [] |
|
- `fsdp_min_num_params`: 0 |
|
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False} |
|
- `tp_size`: 0 |
|
- `fsdp_transformer_layer_cls_to_wrap`: None |
|
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None} |
|
- `deepspeed`: None |
|
- `label_smoothing_factor`: 0.0 |
|
- `optim`: adamw_torch |
|
- `optim_args`: None |
|
- `adafactor`: False |
|
- `group_by_length`: False |
|
- `length_column_name`: length |
|
- `ddp_find_unused_parameters`: None |
|
- `ddp_bucket_cap_mb`: None |
|
- `ddp_broadcast_buffers`: False |
|
- `dataloader_pin_memory`: True |
|
- `dataloader_persistent_workers`: False |
|
- `skip_memory_metrics`: True |
|
- `use_legacy_prediction_loop`: False |
|
- `push_to_hub`: False |
|
- `resume_from_checkpoint`: None |
|
- `hub_model_id`: None |
|
- `hub_strategy`: every_save |
|
- `hub_private_repo`: None |
|
- `hub_always_push`: False |
|
- `gradient_checkpointing`: False |
|
- `gradient_checkpointing_kwargs`: None |
|
- `include_inputs_for_metrics`: False |
|
- `include_for_metrics`: [] |
|
- `eval_do_concat_batches`: True |
|
- `fp16_backend`: auto |
|
- `push_to_hub_model_id`: None |
|
- `push_to_hub_organization`: None |
|
- `mp_parameters`: |
|
- `auto_find_batch_size`: False |
|
- `full_determinism`: False |
|
- `torchdynamo`: None |
|
- `ray_scope`: last |
|
- `ddp_timeout`: 1800 |
|
- `torch_compile`: False |
|
- `torch_compile_backend`: None |
|
- `torch_compile_mode`: None |
|
- `include_tokens_per_second`: False |
|
- `include_num_input_tokens_seen`: False |
|
- `neftune_noise_alpha`: None |
|
- `optim_target_modules`: None |
|
- `batch_eval_metrics`: False |
|
- `eval_on_start`: False |
|
- `use_liger_kernel`: False |
|
- `eval_use_gather_object`: False |
|
- `average_tokens_across_devices`: False |
|
- `prompts`: None |
|
- `batch_sampler`: batch_sampler |
|
- `multi_dataset_batch_sampler`: round_robin |
|
|
|
</details> |
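Putting the dataset, loss, and hyperparameters above together, a comparable finetuning run could be sketched as follows. The two training rows are illustrative placeholders rather than the actual 370-sample dataset, and the original run additionally used `eval_strategy="steps"` with an evaluation split, which is omitted here for brevity:

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Placeholder (question, context) rows; the real run used 370 such pairs.
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "What is the primary purpose of the game described in the context?",
        "Why should Big Projects require no more than three successful arguments?",
    ],
    "sentence_1": [
        "The purpose is to identify the insights - so make a note and move on.",
        "A Big Project should take no more than 3 successful arguments; otherwise the game is focussed too much on this single thing.",
    ],
})

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-l")

# MultipleNegativesRankingLoss wrapped in MatryoshkaLoss, as configured above
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(model, inner_loss, matryoshka_dims=[768, 512, 256, 128, 64])

args = SentenceTransformerTrainingArguments(
    output_dir="matrix-game-embeddings-ft",
    num_train_epochs=10,
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=loss,
)
trainer.train()
model.save("matrix-game-embeddings-ft/final")
```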
|
|
|
### Training Logs |
|
| Epoch | Step | cosine_ndcg@10 | |
|
|:------:|:----:|:--------------:| |
|
| 1.0 | 37 | 0.9273 | |
|
| 1.3514 | 50 | 0.9490 | |
|
| 2.0 | 74 | 0.9462 | |
|
| 2.7027 | 100 | 0.9527 | |
|
| 3.0 | 111 | 0.9527 | |
|
| 4.0 | 148 | 0.9783 | |
|
| 4.0541 | 150 | 0.9811 | |
|
| 5.0 | 185 | 0.9622 | |
|
| 5.4054 | 200 | 0.9622 | |
|
| 6.0 | 222 | 0.9702 | |
|
| 6.7568 | 250 | 0.9622 | |
|
| 7.0 | 259 | 0.9622 | |
|
| 8.0 | 296 | 0.9702 | |
|
| 8.1081 | 300 | 0.9702 | |
|
| 9.0 | 333 | 0.9702 | |
|
| 9.4595 | 350 | 0.9702 | |
|
| 10.0 | 370 | 0.9702 | |
|
|
|
|
|
### Framework Versions |
|
- Python: 3.13.2 |
|
- Sentence Transformers: 4.1.0 |
|
- Transformers: 4.51.3 |
|
- PyTorch: 2.7.0+cu126 |
|
- Accelerate: 1.6.0 |
|
- Datasets: 3.6.0 |
|
- Tokenizers: 0.21.1 |
|
|
|
## Citation |
|
|
|
### BibTeX |
|
|
|
#### Sentence Transformers |
|
```bibtex |
|
@inproceedings{reimers-2019-sentence-bert, |
|
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks", |
|
author = "Reimers, Nils and Gurevych, Iryna", |
|
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing", |
|
month = "11", |
|
year = "2019", |
|
publisher = "Association for Computational Linguistics", |
|
url = "https://arxiv.org/abs/1908.10084", |
|
} |
|
``` |
|
|
|
#### MatryoshkaLoss |
|
```bibtex |
|
@misc{kusupati2024matryoshka, |
|
title={Matryoshka Representation Learning}, |
|
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi}, |
|
year={2024}, |
|
eprint={2205.13147}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
#### MultipleNegativesRankingLoss |
|
```bibtex |
|
@misc{henderson2017efficient, |
|
title={Efficient Natural Language Response Suggestion for Smart Reply}, |
|
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil}, |
|
year={2017}, |
|
eprint={1705.00652}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |
|
|
|
<!-- |
|
## Glossary |
|
|
|
*Clearly define terms in order to be accessible across audiences.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Authors |
|
|
|
*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.* |
|
--> |
|
|
|
<!-- |
|
## Model Card Contact |
|
|
|
*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.* |
|
--> |