Stella EN 400M v5 Matryoshka Finetune
This is a sentence-transformers model finetuned from NovaSearch/stella_en_400M_v5 on the json dataset. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: NovaSearch/stella_en_400M_v5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 1024 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- json
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: NewModel
(1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Dense({'in_features': 1024, 'out_features': 1024, 'bias': True, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("cristiano-sartori/stella_finetuned1")
# Run inference
sentences = [
'Describe the techniques that typical dynamically scheduled\n processors use to achieve the same purpose of the following features\n of Intel Itanium: (a) Predicated execution; (b) advanced\n loads---that is, loads moved before a store and explicit check for\n RAW hazards; (c) speculative loads---that is, loads moved before a\n branch and explicit check for exceptions; (d) rotating register\n file.',
'Dynamically scheduled processors are designed to improve the efficiency of instruction execution by allowing the CPU to make decisions at runtime about the order of instruction execution. Let\'s break down each feature you mentioned from the Intel Itanium architecture and see how typical dynamically scheduled processors achieve similar goals.\n\n### (a) Predicated Execution\n\n**Intuition:**\nPredicated execution allows the processor to execute instructions based on certain conditions without using traditional branching (like `if` statements). This helps to avoid pipeline stalls that can occur when a branch is taken.\n\n**Example:**\nImagine you have the following pseudo-code:\n```c\nif (x > 0) {\n y = z + 1;\n} else {\n y = z - 1;\n}\n```\n\nIn a predicated execution model, instead of branching, the processor can execute both instructions but use a predicate (a boolean condition) to determine which result to keep:\n```assembly\np1 = (x > 0)\ny1 = z + 1; // Execute regardless\ny2 = z - 1; // Execute regardless\ny = p1 ? y1 : y2; // Keep the result based on p1\n```\n\n**Dynamically Scheduled Processors:**\nThese processors use techniques like "instruction scheduling" and "register renaming" to allow for instructions to be executed out of order while avoiding the pitfalls of branches. The hardware can evaluate conditions ahead of time and execute the necessary instructions while keeping track of which values are valid.\n\n### (b) Advanced Loads\n\n**Intuition:**\nAdvanced loads allow the processor to move load instructions (fetching data from memory) ahead of store instructions (writing data to memory), while also checking for Read After Write (RAW) hazards to ensure data correctness.\n\n**Example:**\nConsider the following pseudo-code:\n```c\na = b; // Store b into a\nc = a; // Load a into c\n```\n\nIf `a` is stored before it\'s loaded again, there can be a dependency. Advanced load techniques allow the processor to load `c = a` even before it knows if the value of `a` has been updated, as long as it checks that no store operation that modifies `a` has occurred.\n\n**Dynamically Scheduled Processors:**\nThese processors often use a technique called "out-of-order execution." They keep track of the status of loads and stores in a structure like a reorder buffer. When a load is scheduled, the processor checks if any previous instructions modify the value it needs (checking for RAW hazards) before executing the load.\n\n### (c) Speculative Loads\n\n**Intuition:**\nSpeculative loads allow the processor to execute load instructions before it knows the outcome of branches, which can keep the pipeline filled and reduce stalls.\n\n**Example:**\nConsider a branch that depends on a condition:\n```c\nif (x > 0) {\n a = b + c;\n}\n```\n\nInstead of waiting for the branch to be resolved, a speculative load might execute `load a` regardless of the branch\'s result. If the branch is taken, the processor can discard this load if it\'s not necessary.\n\n**Dynamically Scheduled Processors:**\nThese processors utilize "speculative execution," where they predict the likely path of execution based on past behavior. 
They perform loads and even entire blocks of instructions speculatively and have mechanisms to roll back if they guessed wrong while checking for exceptions (like accessing invalid memory).\n\n### (d) Rotating Register File\n\n**Intuition:**\nA rotating register file allows the processor to efficiently manage registers, effectively giving it more registers to work with by cycling through them for different contexts or states.\n\n**Example:**\nIn a simple program, if you have a limited number of registers but multiple functions, rotating registers means that as soon as one function completes, its registers can be reused for the next function without needing to save them to memory.\n\n**Dynamically Scheduled Processors:**\nMany dynamically scheduled processors use a "register renaming" technique, which allows them to allocate physical registers dynamically. When an instruction is ready to execute, it checks which registers are free and assigns one, effectively "rotating" the use of registers without the programmer needing to manage this directly.\n\n### Summary\n\nIn summary, dynamically scheduled processors use advanced techniques like out-of-order execution, speculative execution, and register renaming to achieve efficiency and performance similar to the features found in Intel Itanium. These techniques help to minimize stalls and maximize instruction throughput by allowing more flexibility in how instructions are executed relative to their dependencies and branch outcomes.',
"The question at hand explores whether it is possible to add new documents to a collection such that one document, , is ranked higher than another document, , based on a specific query, while also allowing for the possibility of ranking higher than simultaneously.\n\nTo analyze this problem, we begin by examining the two documents in question: \n\n- Document contains three occurrences of 'a', one occurrence of 'b', and none of 'c' (represented as ).\n- Document has one occurrence each of 'a', 'b', and 'c' (represented as ).\n\nGiven the query , our focus lies on the occurrences of 'a' and 'b' in both documents.\n\nNext, we calculate the term frequencies for the relevant terms in each document:\n\n- For , the term frequencies are:\n - \n - \n - \n\n- For , the term frequencies are:\n - \n - \n - \n\nThe total number of terms in each document is calculated as follows:\n\n- Total terms in (3 'a's + 1 'b' + 0 'c's).\n- Total terms in (1 'a' + 1 'b' + 1 'c').\n\nWe will apply the smoothed probabilistic retrieval model using the formula:\n\\[\nP(w | d) = \\frac{f_{d}(w) + \\lambda \\cdot P(w | C)}{N + \\lambda \\cdot |V|}\n\\]\nwhere is the total number of terms in the document, is the size of the vocabulary (which is 3 in this case), and is the probability of the word in the overall collection.\n\nAssuming a uniform distribution for the collection, we calculate:\n- \n- \n- \n\nNow, we compute the probabilities for the query terms for each document.\n\nFor document :\n- Probability of 'a':\n\\[\nP(a | d_1) = \\frac{3 + 0.5 \\cdot 0.4}{4 + 0.5 \\cdot 3} = \\frac{3 + 0.2}{4 + 1.5} = \\frac{3.2}{5.5} \\approx 0.5818\n\\]\n- Probability of 'b':\n\\[\nP(b | d_1) = \\frac{1 + 0.5 \\cdot 0.2}{4 + 0.5 \\cdot 3} = \\frac{1 + 0.1}{5.5} = \\frac{1.1}{5.5} \\approx 0.2\n\\]\n- Combined score for for the query :\n\\[\nP(q | d_1) = P(a | d_1) \\cdot P(b | d_1) \\approx 0.5818 \\cdot 0.2 \\approx 0.1164\n\\]\n\nFor document :\n- Probability of 'a':\n\\[\nP(a | d_2) = \\frac{1 + 0.5 \\cdot 0.4}{3 + 0.5 \\cdot 3} = \\frac{1 + 0.2}{4.5} = \\frac{1.2}{4.5} \\approx 0.2667\n\\]\n- Probability of 'b':\n\\[\nP(b | d_2) = \\frac{1 + 0.5 \\cdot 0.2}{3 + 0.5 \\cdot 3} = \\frac{1 + 0.1}{4.5} = \\frac{1.1}{4.5} \\approx 0.2444\n\\]\n- Combined score for for the query :\n\\[\nP(q | d_2) = P(a | d_2) \\cdot P(b | d_2) \\approx 0.2667 \\cdot 0.2444 \\approx 0.0652\n\\]\n\nAt this stage, we find that and , indicating that currently ranks higher than .\n\nTo explore the possibility of achieving both and , we consider the addition of new documents. While it is theoretically possible to manipulate rankings by introducing documents that alter the frequency of terms, the fundamental nature of probabilistic scoring means that achieving both conditions simultaneously is implausible. Specifically, any document that increases the score of will likely decrease the score of and vice versa due to the competitive nature of the scoring based on term frequencies.\n\nIn conclusion, while document addition can influence individual rankings, the inherent constraints of probabilistic retrieval prevent the simultaneous fulfillment of both ranking conditions. Therefore, the answer is **no, it is not possible** to enforce both rankings as required.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
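Because the model is trained with a Matryoshka objective over the dimensions 768/512/256/128/64, its 1024-dimensional embeddings can be truncated to a smaller size with only a modest drop in retrieval quality (see the evaluation tables below). A minimal sketch, assuming the `truncate_dim` constructor argument available in recent Sentence Transformers releases; the example sentences are placeholders, not training data:

```python
from sentence_transformers import SentenceTransformer

# Load the model so that encode() returns 256-dimensional (truncated) embeddings.
model = SentenceTransformer("cristiano-sartori/stella_finetuned1", truncate_dim=256)

sentences = [
    "How does register renaming emulate a rotating register file?",
    "Register renaming dynamically maps architectural registers to free physical registers.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (2, 256)

# Cosine similarities between the truncated embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# (2, 2)
```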
Evaluation
Metrics
Information Retrieval
- Dataset: `dim_768`
- Evaluated with `InformationRetrievalEvaluator` with these parameters: `{ "truncate_dim": 768 }`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.2947 |
cosine_accuracy@3 | 0.8702 |
cosine_accuracy@5 | 0.9333 |
cosine_accuracy@10 | 0.9789 |
cosine_precision@1 | 0.2947 |
cosine_precision@3 | 0.2901 |
cosine_precision@5 | 0.1867 |
cosine_precision@10 | 0.0979 |
cosine_recall@1 | 0.2947 |
cosine_recall@3 | 0.8702 |
cosine_recall@5 | 0.9333 |
cosine_recall@10 | 0.9789 |
cosine_ndcg@10 | 0.661 |
cosine_mrr@10 | 0.5552 |
cosine_map@100 | 0.5566 |
Information Retrieval
- Dataset: `dim_512`
- Evaluated with `InformationRetrievalEvaluator` with these parameters: `{ "truncate_dim": 512 }`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.3053 |
cosine_accuracy@3 | 0.8702 |
cosine_accuracy@5 | 0.9228 |
cosine_accuracy@10 | 0.9719 |
cosine_precision@1 | 0.3053 |
cosine_precision@3 | 0.2901 |
cosine_precision@5 | 0.1846 |
cosine_precision@10 | 0.0972 |
cosine_recall@1 | 0.3053 |
cosine_recall@3 | 0.8702 |
cosine_recall@5 | 0.9228 |
cosine_recall@10 | 0.9719 |
cosine_ndcg@10 | 0.6643 |
cosine_mrr@10 | 0.5616 |
cosine_map@100 | 0.5636 |
Information Retrieval
- Dataset: `dim_256`
- Evaluated with `InformationRetrievalEvaluator` with these parameters: `{ "truncate_dim": 256 }`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.2912 |
cosine_accuracy@3 | 0.8702 |
cosine_accuracy@5 | 0.9263 |
cosine_accuracy@10 | 0.9684 |
cosine_precision@1 | 0.2912 |
cosine_precision@3 | 0.2901 |
cosine_precision@5 | 0.1853 |
cosine_precision@10 | 0.0968 |
cosine_recall@1 | 0.2912 |
cosine_recall@3 | 0.8702 |
cosine_recall@5 | 0.9263 |
cosine_recall@10 | 0.9684 |
cosine_ndcg@10 | 0.6575 |
cosine_mrr@10 | 0.5535 |
cosine_map@100 | 0.5558 |
Information Retrieval
- Dataset: `dim_128`
- Evaluated with `InformationRetrievalEvaluator` with these parameters: `{ "truncate_dim": 128 }`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.2632 |
cosine_accuracy@3 | 0.8456 |
cosine_accuracy@5 | 0.9088 |
cosine_accuracy@10 | 0.9614 |
cosine_precision@1 | 0.2632 |
cosine_precision@3 | 0.2819 |
cosine_precision@5 | 0.1818 |
cosine_precision@10 | 0.0961 |
cosine_recall@1 | 0.2632 |
cosine_recall@3 | 0.8456 |
cosine_recall@5 | 0.9088 |
cosine_recall@10 | 0.9614 |
cosine_ndcg@10 | 0.6377 |
cosine_mrr@10 | 0.5299 |
cosine_map@100 | 0.5326 |
Information Retrieval
- Dataset: `dim_64`
- Evaluated with `InformationRetrievalEvaluator` with these parameters: `{ "truncate_dim": 64 }`
Metric | Value |
---|---|
cosine_accuracy@1 | 0.2596 |
cosine_accuracy@3 | 0.8386 |
cosine_accuracy@5 | 0.9088 |
cosine_accuracy@10 | 0.9474 |
cosine_precision@1 | 0.2596 |
cosine_precision@3 | 0.2795 |
cosine_precision@5 | 0.1818 |
cosine_precision@10 | 0.0947 |
cosine_recall@1 | 0.2596 |
cosine_recall@3 | 0.8386 |
cosine_recall@5 | 0.9088 |
cosine_recall@10 | 0.9474 |
cosine_ndcg@10 | 0.6306 |
cosine_mrr@10 | 0.5248 |
cosine_map@100 | 0.5286 |
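The tables in this section come from running `InformationRetrievalEvaluator` once per truncation dimension. A rough sketch of how such a run can be reproduced; the query, corpus, and relevance dictionaries below are hypothetical placeholders, since the evaluation split is not published with this card:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("cristiano-sartori/stella_finetuned1")

# Placeholder data: query id -> text, doc id -> text, query id -> relevant doc ids.
queries = {"q1": "Is a failure detector needed for very weak termination?"}
corpus = {
    "d1": "Very weak termination can be implemented with best-effort broadcast alone.",
    "d2": "Total-order broadcast delivers messages in the same order at all processes.",
}
relevant_docs = {"q1": {"d1"}}

# One evaluator per Matryoshka dimension, mirroring the dim_768 ... dim_64 tables above.
for dim in [768, 512, 256, 128, 64]:
    evaluator = InformationRetrievalEvaluator(
        queries=queries,
        corpus=corpus,
        relevant_docs=relevant_docs,
        name=f"dim_{dim}",
        truncate_dim=dim,
    )
    print(evaluator(model))
```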
Training Details
Training Dataset
json
- Dataset: json
- Size: 1,140 training samples
- Columns: `anchor` and `positive`
- Approximate statistics based on the first 1000 samples:

| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 5 tokens, mean: 167.15 tokens, max: 512 tokens | min: 3 tokens, mean: 375.41 tokens, max: 512 tokens |
- Samples (first three shown):
  - **anchor:** Devise an algorithm that, without consensus, implements a weaker specification of NBAC by replacing the termination property with very weak termination. Very weak termination: If no process crashes, then all processes decide. Is a failure detector needed to implement this algorithm?

    **positive:** To implement a weaker specification of Non-blocking Atomic Commit (NBAC) with a focus on very weak termination, we can devise a straightforward algorithm leveraging Best-Effort Broadcast. The key here is to ensure that if no processes crash, all processes should arrive at a decision, either COMMIT or ABORT. First, each process will broadcast its proposal to all other processes. Upon receiving proposals, each process will keep track of the received messages. If a process receives only COMMIT proposals from all other processes, it decides to COMMIT; otherwise, it decides to ABORT. This design assumes that no failures occur, which is a critical part of the specification. Watch out for the assumption that no processes crash. If even one process fails, the termination property is broken as some processes may not receive all necessary proposals, leading to a situation where decisions cannot be made consistently. This highlights that failure detection is not required in this scenar...
  - **anchor:** The "Consensus-Based Total-Order Broadcast" algorithm transforms a consensus abstraction (together with a reliable broadcast abstraction) into a total-order broadcast abstraction. Describe a transformation between these two primitives in the other direction, that is, implement a (uniform) consensus abstraction from a (uniform) total-order broadcast abstraction.

    **positive:** To implement a (uniform) consensus abstraction from a (uniform) total-order broadcast abstraction, we can follow these steps: 1. Initialize a variable `decided` to `false` to track if a consensus value has been reached. 2. When a process invokes `propose(v)`, it uses the total-order broadcast (TO) to send the value `v`. 3. Upon receiving a TO-delivered message with a value `v`, if `decided` is still `false`, the process sets `decided` to `true` and calls `decide(v)`. This approach works because the total-order broadcast ensures that all processes deliver messages in the same order, allowing them to reach consensus on the first value that is delivered. Thus, the consensus is achieved by agreeing on the first proposed value that is TO-delivered.
  - **anchor:** We learnt in the lecture that terms are typically stored in an inverted list. Now, in the inverted list, instead of only storing document identifiers of the documents in which the term appears, assume we also store an offset of the appearance of a term in a document. An $offset$ of a term $l_k$ given a document is defined as the number of words between the start of the document and $l_k$. Thus our inverted list is now: $l_k= \langle f_k: {d_{i_1} \rightarrow [o_1,\ldots,o_{n_{i_1}}]}, {d_{i_2} \rightarrow [o_1,\ldots,o_{n_{i_2}}]}, \ldots, {d_{i_k} \rightarrow [o_1,\ldots,o_{n_{i_k}}]} \rangle$ This means that in document $d_{i_1}$ term $l_k$ appears $n_{i_1}$ times and at offset $[o_1,\ldots,o_{n_{i_1}}]$, where $[o_1,\ldots,o_{n_{i_1}}]$ are sorted in ascending order, these type of indices are also known as term-offset indices. An example of a term-offset index is as follows: Obama = $⟨4 : {1 → [3]},{2 → [6]},{3 → [2,17]},{4 → [1]}⟩$ Governor = $⟨2 : {4 → [3]}, ...

    **positive:** **Understanding the Problem.** We are tasked with analyzing a query involving the SLOP operator between two terms, "Obama" and "Election." The SLOP operator allows for flexibility in the proximity of terms within a specified number of words. Specifically, for a query of the form QueryTerm1 SLOP/x QueryTerm2, we need to find occurrences of QueryTerm1 within x words of QueryTerm2, regardless of word order. **Term-Offset Indexes.** We have the following term-offset indexes for the relevant terms: Obama = $\langle 4 : \{1 \rightarrow [3], 2 \rightarrow [6], 3 \rightarrow [2, 17], 4 \rightarrow [1]\} \rangle$; Election = $\langle 4 : \{1 \rightarrow [1], 2 \rightarrow [1, 21], 3 \rightarrow [3], 5 \rightarrow [16, 22, 51]\} \rangle$. From these indexes, we can interpret: "Obama" appears in documents 1, 2, 3, and 4 at the specified offsets; "Election" appears in documents 1, 2, 3, and 5 at its respective offsets. **Analyzing the SLOP Operator.** We n...
- Loss: `MatryoshkaLoss` with these parameters (see the sketch after this list): `{ "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [ 768, 512, 256, 128, 64 ], "matryoshka_weights": [ 1, 1, 1, 1, 1 ], "n_dims_per_step": -1 }`
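The training loss wraps `MultipleNegativesRankingLoss` in `MatryoshkaLoss`, so the in-batch contrastive objective is applied at every truncated dimension. A minimal sketch of that configuration; loading the base model is assumed to need `trust_remote_code=True` because stella ships a custom architecture:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Base model; trust_remote_code is assumed to be required for the custom "NewModel" class.
model = SentenceTransformer("NovaSearch/stella_en_400M_v5", trust_remote_code=True)

# Inner loss: in-batch negatives over (anchor, positive) pairs.
inner_loss = MultipleNegativesRankingLoss(model)

# Outer loss: apply the inner loss at each truncated dimension with equal weight.
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
)
```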
Training Hyperparameters
Non-Default Hyperparameters
- `eval_strategy`: epoch
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 16
- `learning_rate`: 2e-05
- `lr_scheduler_type`: cosine
- `warmup_ratio`: 0.1
- `bf16`: True
- `tf32`: False
- `load_best_model_at_end`: True
- `optim`: adamw_torch_fused
- `batch_sampler`: no_duplicates
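These values map directly onto the Sentence Transformers trainer arguments. A minimal sketch reproducing the non-default settings above (the output directory is a placeholder, `save_strategy` is assumed to match `eval_strategy` so that `load_best_model_at_end` works, and dataset, loss, and evaluator wiring are omitted):

```python
from sentence_transformers.training_args import (
    BatchSamplers,
    SentenceTransformerTrainingArguments,
)

args = SentenceTransformerTrainingArguments(
    output_dir="stella_finetuned1",        # placeholder output path
    num_train_epochs=3,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    eval_strategy="epoch",
    save_strategy="epoch",                 # assumed, to allow load_best_model_at_end
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```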
All Hyperparameters
Click to expand
- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: epoch
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 2
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 16
- `eval_accumulation_steps`: None
- `torch_empty_cache_steps`: None
- `learning_rate`: 2e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 3
- `max_steps`: -1
- `lr_scheduler_type`: cosine
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: True
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: False
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: True
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch_fused
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: None
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `include_for_metrics`: []
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `use_liger_kernel`: False
- `eval_use_gather_object`: False
- `average_tokens_across_devices`: False
- `prompts`: None
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0.2807 | 10 | 0.1056 | - | - | - | - | - |
0.5614 | 20 | 0.6075 | - | - | - | - | - |
0.8421 | 30 | 0.272 | - | - | - | - | - |
1.0 | 36 | - | 0.6633 | 0.6597 | 0.6581 | 0.6378 | 0.6330 |
1.1123 | 40 | 0.1235 | - | - | - | - | - |
1.3930 | 50 | 0.3118 | - | - | - | - | - |
1.6737 | 60 | 0.2751 | - | - | - | - | - |
1.9544 | 70 | 0.0067 | - | - | - | - | - |
2.0 | 72 | - | 0.6605 | 0.6679 | 0.6592 | 0.6441 | 0.6326 |
2.2246 | 80 | 0.0981 | - | - | - | - | - |
2.5053 | 90 | 0.0005 | - | - | - | - | - |
2.7860 | 100 | 0.5609 | - | - | - | - | - |
3.0 | 108 | - | 0.6610 | 0.6643 | 0.6575 | 0.6377 | 0.6306 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.8
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.7.0+cu126
- Accelerate: 1.3.0
- Datasets: 3.6.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}