metadata
base_model: mixedbread-ai/mxbai-embed-large-v1
datasets: []
language:
  - en
library_name: sentence-transformers
license: apache-2.0
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
pipeline_tag: sentence-similarity
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:3550
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
widget:
  - source_sentence: >-
      At the end of 2023, Alphabet Inc. reported total debts amounting to $14.2
      billion, compared to $10.9 billion at the end of 2022.
    sentences:
      - What was the total debt of Alphabet Inc. as of the end of 2023?
      - >-
        What was ExxonMobil's contribution to the energy production in the
        Energy sector during 2020?
      - Describe Amazon's revenue growth in 2023?
  - source_sentence: >-
      In 2022, Pfizer strategically managed cash flow from investments by
      utilizing operating cash flow, issuing new debt, and through the
      monetization of certain non-core assets. This approach of diversifying the
      source of funding for investments was done to minimize risk and
      uncertainty in economic conditions.
    sentences:
      - >-
        How much capital expenditure did AUX Energy invest in renewable energy
        projects in 2022?
      - >-
        What effect did the 2023 market downturn have on Amazon's retail and
        cloud segments?
      - How did Pfizer manage cash flows from investments in 2022?
  - source_sentence: >-
      The primary revenue generators for JPMorgan Chase for the fiscal year 2023
      were the Corporate & Investment Bank (CIB) and the Asset & Wealth
      Management (AWM) sectors. The CIB sector benefited from a rise in merger
      and acquisition activities, while AWM saw large net inflows.
    sentences:
      - >-
        What is General Electric's strategic priority for its Aviation business
        segment?
      - >-
        Which sectors contributed the most to the revenue of JPMorgan Chase for
        FY 2023?
      - What is the principal activity of Apple Inc.?
  - source_sentence: >-
      For the fiscal year 2023, Microsoft's Intelligent Cloud segment generated
      revenues of $58 billion, demonstrating solid growth fueled by strong
      demand for cloud services and server products.
    sentences:
      - >-
        What is the primary strategy of McDonald’s to drive growth in the
        future?
      - >-
        What impact did the increase in gold prices have on Newmont
        Corporation's revenue in 2023?
      - >-
        What was the revenue generated by Microsoft's Intelligent Cloud segment
        for fiscal year 2023?
  - source_sentence: >-
      Microsoft, in their latest press release, revealed that they are
      anticipating a revenue growth of approximately 12% for the fiscal year
      ending in 2024.
    sentences:
      - What is Microsoft's projected revenue growth for fiscal year 2024?
      - >-
        What is the fair value of equity method investments of Microsoft in the
        fiscal year 2025?
      - What was the impact of COVID-19 on Zoom's profits?
model-index:
  - name: mxbai-embed-large-v1-financial-rag-matryoshka
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 1024
          type: dim_1024
        metrics:
          - type: cosine_accuracy@1
            value: 0.8455696202531645
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9392405063291139
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9670886075949368
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9898734177215189
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8455696202531645
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.31308016877637135
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19341772151898737
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0989873417721519
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8455696202531645
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9392405063291139
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9670886075949368
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9898734177215189
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9212281141643793
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.898873819570022
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8993853803492357
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.8455696202531645
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9392405063291139
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9670886075949368
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9898734177215189
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8455696202531645
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3130801687763713
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.1934177215189873
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0989873417721519
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8455696202531645
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9392405063291139
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9670886075949368
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9898734177215189
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9217284365901642
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8994826200522402
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8999494134557425
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.8405063291139241
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9367088607594937
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9645569620253165
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9898734177215189
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8405063291139241
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.31223628691983124
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19291139240506328
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0989873417721519
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8405063291139241
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9367088607594937
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9645569620253165
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9898734177215189
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9186273598847787
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8954631303998389
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8958871142668611
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.8455696202531645
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9392405063291139
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9645569620253165
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9898734177215189
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8455696202531645
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3130801687763713
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19291139240506328
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.0989873417721519
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8455696202531645
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9392405063291139
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9645569620253165
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9898734177215189
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9201161947922436
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8975597749648381
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8979721416614026
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.8405063291139241
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9417721518987342
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9645569620253165
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9848101265822785
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8405063291139241
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3139240506329114
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19291139240506328
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09848101265822784
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8405063291139241
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9417721518987342
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9645569620253165
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9848101265822785
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9170562815583235
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8948693992364878
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8957325656059834
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.8405063291139241
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.9316455696202531
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.9569620253164557
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9822784810126582
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.8405063291139241
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.3105485232067511
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.19139240506329114
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.09822784810126582
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.8405063291139241
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.9316455696202531
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.9569620253164557
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.9822784810126582
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.9153318022971121
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8934589109905566
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.8943102728098851
            name: Cosine Map@100

mxbai-embed-large-v1-financial-rag-matryoshka

This is a sentence-transformers model fine-tuned from mixedbread-ai/mxbai-embed-large-v1. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: mixedbread-ai/mxbai-embed-large-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Language: en
  • License: apache-2.0

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
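
The Pooling module above enables only pooling_mode_cls_token, so the sentence embedding is the hidden state of the first ([CLS]) token rather than an average over tokens. A minimal sketch of that difference on toy data follows; the 3-token, 4-dimensional hidden states are made up for illustration (the real model uses 1024 dimensions), and the mean-pooling helper ignores attention masks for brevity.

```python
# Toy token-level hidden states for one sentence: 3 tokens x 4 dims.
# Values are hypothetical; row 0 plays the role of the [CLS] token.
token_embeddings = [
    [0.5, -1.0, 2.0, 0.0],   # [CLS]
    [1.5, 0.0, 0.0, 2.0],    # first word piece
    [-0.5, 1.0, 1.0, 1.0],   # second word piece
]

def cls_pooling(tokens):
    """Return the first token's vector (what pooling_mode_cls_token selects)."""
    return list(tokens[0])

def mean_pooling(tokens):
    """Average over tokens; shown only for contrast (disabled in this model)."""
    n = len(tokens)
    return [sum(col) / n for col in zip(*tokens)]

cls_vec = cls_pooling(token_embeddings)
mean_vec = mean_pooling(token_embeddings)
print(cls_vec)
print(mean_vec)
```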

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("rbhatia46/mxbai-embed-large-v1-financial-rag-matryoshka")
# Run inference
sentences = [
    'Microsoft, in their latest press release, revealed that they are anticipating a revenue growth of approximately 12% for the fiscal year ending in 2024.',
    "What is Microsoft's projected revenue growth for fiscal year 2024?",
    "What was the impact of COVID-19 on Zoom's profits?",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
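
Because the model was trained with MatryoshkaLoss at 1024, 768, 512, 256, 128 and 64 dimensions, embeddings can be shortened by keeping only the leading components and re-normalizing before computing cosine similarity. The sketch below shows the idea in plain Python; the 8-dimensional vectors are hypothetical stand-ins for real 1024-dimensional embeddings. Sentence Transformers also offers a truncate_dim argument when loading a model, which should perform the same slicing; check it against your installed version.

```python
import math

def truncate_and_normalize(vec, dim):
    """Keep the first `dim` components and rescale to unit length."""
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Hypothetical 8-dim embeddings (not produced by the model).
passage = [0.9, 0.1, 0.3, -0.2, 0.05, 0.0, 0.1, -0.1]
query_related = [0.8, 0.2, 0.25, -0.1, -0.3, 0.2, 0.0, 0.1]
query_unrelated = [-0.1, 0.9, -0.4, 0.3, 0.2, -0.1, 0.3, 0.0]

# Compare the ranking at full size and at two truncated sizes.
for dim in (8, 4, 2):
    p = truncate_and_normalize(passage, dim)
    r = truncate_and_normalize(query_related, dim)
    u = truncate_and_normalize(query_unrelated, dim)
    print(dim, round(cosine(p, r), 3), round(cosine(p, u), 3))
```

Smaller dimensions reduce index size and search cost; the evaluation tables in this card indicate how much retrieval quality is given up at each size.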

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.8456
cosine_accuracy@3 0.9392
cosine_accuracy@5 0.9671
cosine_accuracy@10 0.9899
cosine_precision@1 0.8456
cosine_precision@3 0.3131
cosine_precision@5 0.1934
cosine_precision@10 0.099
cosine_recall@1 0.8456
cosine_recall@3 0.9392
cosine_recall@5 0.9671
cosine_recall@10 0.9899
cosine_ndcg@10 0.9212
cosine_mrr@10 0.8989
cosine_map@100 0.8994

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.8456
cosine_accuracy@3 0.9392
cosine_accuracy@5 0.9671
cosine_accuracy@10 0.9899
cosine_precision@1 0.8456
cosine_precision@3 0.3131
cosine_precision@5 0.1934
cosine_precision@10 0.099
cosine_recall@1 0.8456
cosine_recall@3 0.9392
cosine_recall@5 0.9671
cosine_recall@10 0.9899
cosine_ndcg@10 0.9217
cosine_mrr@10 0.8995
cosine_map@100 0.8999

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.8405
cosine_accuracy@3 0.9367
cosine_accuracy@5 0.9646
cosine_accuracy@10 0.9899
cosine_precision@1 0.8405
cosine_precision@3 0.3122
cosine_precision@5 0.1929
cosine_precision@10 0.099
cosine_recall@1 0.8405
cosine_recall@3 0.9367
cosine_recall@5 0.9646
cosine_recall@10 0.9899
cosine_ndcg@10 0.9186
cosine_mrr@10 0.8955
cosine_map@100 0.8959

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.8456
cosine_accuracy@3 0.9392
cosine_accuracy@5 0.9646
cosine_accuracy@10 0.9899
cosine_precision@1 0.8456
cosine_precision@3 0.3131
cosine_precision@5 0.1929
cosine_precision@10 0.099
cosine_recall@1 0.8456
cosine_recall@3 0.9392
cosine_recall@5 0.9646
cosine_recall@10 0.9899
cosine_ndcg@10 0.9201
cosine_mrr@10 0.8976
cosine_map@100 0.898

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.8405
cosine_accuracy@3 0.9418
cosine_accuracy@5 0.9646
cosine_accuracy@10 0.9848
cosine_precision@1 0.8405
cosine_precision@3 0.3139
cosine_precision@5 0.1929
cosine_precision@10 0.0985
cosine_recall@1 0.8405
cosine_recall@3 0.9418
cosine_recall@5 0.9646
cosine_recall@10 0.9848
cosine_ndcg@10 0.9171
cosine_mrr@10 0.8949
cosine_map@100 0.8957

Information Retrieval (dim_64)

Metric Value
cosine_accuracy@1 0.8405
cosine_accuracy@3 0.9316
cosine_accuracy@5 0.957
cosine_accuracy@10 0.9823
cosine_precision@1 0.8405
cosine_precision@3 0.3105
cosine_precision@5 0.1914
cosine_precision@10 0.0982
cosine_recall@1 0.8405
cosine_recall@3 0.9316
cosine_recall@5 0.957
cosine_recall@10 0.9823
cosine_ndcg@10 0.9153
cosine_mrr@10 0.8935
cosine_map@100 0.8943
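
In the tables above, recall@k equals accuracy@k and precision@k equals accuracy@k divided by k (for example, 0.9392 / 3 ≈ 0.3131). That pattern is what one would expect if every query has exactly one relevant passage. Under that assumption, the metrics can be sketched from the 1-based rank of the relevant passage for each query. The ranks below are invented for illustration and are not taken from the evaluation run.

```python
import math

def ir_metrics(ranks, k):
    """Metrics@k when each query has exactly one relevant document.

    `ranks` holds the 1-based position of the relevant document for each
    query, or None if it was not retrieved at all.
    """
    n = len(ranks)
    hits = [r is not None and r <= k for r in ranks]
    accuracy = sum(hits) / n
    precision = accuracy / k   # at most one relevant item among k results
    recall = accuracy          # one relevant item per query in total
    mrr = sum(1.0 / r for r, h in zip(ranks, hits) if h) / n
    # With a single relevant document the ideal DCG is 1, so NDCG equals DCG.
    ndcg = sum(1.0 / math.log2(r + 1) for r, h in zip(ranks, hits) if h) / n
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "mrr": mrr,
        "ndcg": ndcg,
    }

# Hypothetical ranks for ten queries.
ranks = [1, 1, 2, 1, 4, None, 1, 3, 1, 12]
for k in (1, 3, 10):
    print(k, ir_metrics(ranks, k))
```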

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,550 training samples
  • Columns: positive and anchor
  • Approximate statistics based on the first 1000 samples:
    positive (string): min 17 tokens, mean 44.69 tokens, max 105 tokens
    anchor (string): min 10 tokens, mean 18.26 tokens, max 30 tokens
  • Samples:
    positive: The total revenue for Google as of 2021 stands at approximately $181 billion, primarily driven by the performance of its advertising and cloud segments, hailing from the Information Technology sector.
    anchor: What is the total revenue of Google as of 2021?
    positive: In Q4 2021, Amazon.com Inc. reported a significant increase in net income, reaching $14.3 billion, due to the surge in online shopping during the pandemic.
    anchor: What was the Net Income of Amazon.com Inc. in Q4 2021?
    positive: Coca-Cola reported full-year 2021 revenue of $37.3 billion, a rise of 13% compared to $33.0 billion in 2020. This was primarily due to strong volume growth as well as improved pricing and mix.
    anchor: How did Coca-Cola's revenue performance in 2021 measure against its previous year?
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
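
The configuration above applies MultipleNegativesRankingLoss to the embeddings truncated at each listed dimension and adds the results using the listed weights (all 1 here). The following is a minimal plain-Python sketch of that combination, not the library's implementation. It assumes cosine similarity scaled by 20 (believed to be the library default, stated here as an assumption) and uses a made-up batch of two 4-dimensional pairs with dimensions [4, 2] standing in for [1024, ..., 64].

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch-negatives ranking loss: for anchor i, positive i is the
    target and every other positive in the batch acts as a negative."""
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [scale * cosine(a, p) for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]   # cross-entropy with target i
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the base loss over prefix-truncated embeddings."""
    return sum(
        w * mnr_loss([a[:d] for a in anchors], [p[:d] for p in positives])
        for d, w in zip(dims, weights)
    )

# Hypothetical batch of two (anchor, positive) pairs.
anchors = [[1.0, 0.2, 0.1, 0.0], [0.1, 1.0, 0.0, 0.3]]
positives = [[0.9, 0.1, 0.2, 0.1], [0.0, 0.8, 0.1, 0.4]]
loss = matryoshka_loss(anchors, positives, dims=[4, 2], weights=[1, 1])
print(loss)
```

Because each truncated prefix contributes its own ranking loss, the leading components are pushed to be useful on their own, which is what allows the shorter embeddings reported in the evaluation tables.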
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
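
These settings imply a large effective batch and very few optimizer steps. The arithmetic below is a sketch that assumes a single device and that the final partial batch of each epoch is kept; under those assumptions it reproduces the Epoch and Step columns of the training logs (for example, step 6 at epoch 0.8649 and step 60 at epoch 8.6486).

```python
import math

train_samples = 3550     # "Size: 3,550 training samples"
per_device_batch = 32    # per_device_train_batch_size
grad_accum = 16          # gradient_accumulation_steps
epochs = 10              # num_train_epochs

# Assumption: one device, last partial batch kept.
batches_per_epoch = math.ceil(train_samples / per_device_batch)
effective_batch = per_device_batch * grad_accum
steps_per_epoch = batches_per_epoch // grad_accum
total_steps = steps_per_epoch * epochs

def epoch_at(step):
    """Fractional epoch reached after `step` optimizer steps."""
    return step * grad_accum / batches_per_epoch

print(batches_per_epoch, effective_batch, steps_per_epoch, total_steps)
print(round(epoch_at(6), 4), round(epoch_at(10), 4), round(epoch_at(60), 4))
```

With an effective batch of 512 pairs, each anchor is contrasted against in-batch negatives only within its 32-sample micro-batch, since gradient accumulation does not enlarge the pool of negatives.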

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional
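
The combination of lr_scheduler_type: cosine, warmup_ratio: 0.1 and learning_rate: 2e-05 can be pictured as a linear warmup followed by a half-cosine decay. The sketch below assumes 60 total optimizer steps (the last step shown in the training logs) and therefore 6 warmup steps; it follows the usual warmup-plus-cosine formula and is an approximation of the trainer's behavior, not code taken from it.

```python
import math

BASE_LR = 2e-05
TOTAL_STEPS = 60    # assumption: last step shown in the training logs
WARMUP_STEPS = 6    # assumption: warmup_ratio 0.1 applied to 60 steps

def lr_at(step):
    """Learning rate after `step` optimizer steps."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the base learning rate.
        return BASE_LR * step / WARMUP_STEPS
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

for step in (0, 3, 6, 33, 60):
    print(step, lr_at(step))
```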

Training Logs

Epoch Step Training Loss dim_1024_cosine_map@100 dim_128_cosine_map@100 dim_256_cosine_map@100 dim_512_cosine_map@100 dim_64_cosine_map@100 dim_768_cosine_map@100
0.8649 6 - 0.8783 0.8651 0.8713 0.8783 0.8439 0.8809
1.4414 10 0.7682 - - - - - -
1.8739 13 - 0.8918 0.8827 0.8875 0.8918 0.8729 0.8933
2.8829 20 0.1465 0.8948 0.8896 0.8928 0.8961 0.8884 0.8953
3.8919 27 - 0.8930 0.8884 0.8917 0.8959 0.8900 0.8945
4.3243 30 0.0646 - - - - - -
4.9009 34 - 0.8972 0.8883 0.8947 0.8955 0.8925 0.8970
5.7658 40 0.0397 - - - - - -
5.9099 41 - 0.8964 0.8915 0.8953 0.8943 0.8926 0.8979
6.9189 48 - 0.8994 0.8930 0.8966 0.8955 0.8932 0.8974
7.2072 50 0.0319 - - - - - -
7.9279 55 - 0.8998 0.8945 0.8967 0.8961 0.8943 0.8999
8.6486 60 0.0296 0.8994 0.8957 0.898 0.8959 0.8943 0.8999
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.10.6
  • Sentence Transformers: 3.0.1
  • Transformers: 4.41.2
  • PyTorch: 2.1.2+cu121
  • Accelerate: 0.32.1
  • Datasets: 2.19.1
  • Tokenizers: 0.19.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning}, 
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply}, 
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}