---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - dense
  - generated_from_trainer
  - dataset_size:257886
  - loss:MultipleNegativesRankingLoss
base_model: intfloat/multilingual-e5-large
widget:
  - source_sentence: >-
      Wherever and whenever they saw any creature, any dweller of the Khandava,
      escaping from the fire, those two great heroes immediately shot it down.
    sentences:
      - वयं पठाम 
      - |
        दि अमोङ्ग- अस् कुक्कुटस्य खण्डः पैरेडोलिया इत्यस्य उदाहरणम् अस्ति।
      - >-
        यत्र यत्र च दृश्यन्ते प्राणिनः खाण्डवालयाः। पलायन्तः प्रवीरौ तौ तत्र
        तत्राभ्यधावताम्॥
  - source_sentence: >
      Residents were trapped in houses and elsewhere as the roads turned into
      rivers.
    sentences:
      - वयमधुना षट्-लेबल्स् योजितवन्तः।
      - >
        पदवीषु नद्यायमानासु अन्यत्र गन्तुम् अकल्पाः वस्तव्याः गृहेष्वेव निबद्धाः
        आसन्।
      - >-
        स्व॒स्ति न॒ इन्द्रो॑ वृ॒द्धश्र॑वाः स्व॒स्ति नः॑ पू॒षा वि॒श्ववे॑दाः ।
        स्व॒स्ति न॒स्तार्क्ष्यो॒ अरि॑ष्टनेमिः स्व॒स्ति नो॒ बृह॒स्पति॑र्दधातु  
  - source_sentence: From this street the village is seen.
    sentences:
      - >-
        धर्मदण्डो न निर्दण्डो धर्मकार्यानुशासकः। यन्त्रितः कार्यकरणैः
        षड्भागकृतलक्षणः॥
      - एतस्याः वीथ्याः ग्रामं दृश्यते 
      - >
        भवता पत्रकर्त्रा नगरे सामुदायिकायाः हिंसायाः विषये मिथ्यावार्ताः
        प्रकाशिताः इत्यतः जनाः भीताः सन्ति।
  - source_sentence: >
      Visitors have put poppies next to the names of their relatives and
      friends.
    sentences:
      - >-
        परी॒तो षि॑ञ्चता सु॒तं सोमो॒ य उ॑त्त॒मं ह॒विः । द॒ध॒न्वाँ यो नर्यो॑
        अ॒प्स्व१॒॑न्तरा सु॒षाव॒ सोम॒मद्रि॑भिः  
      - >
        सन्दर्शकाः स्वीयानां सम्बन्धिनां, सुहृदां च नाम्नः पार्श्वे पोप्पीस्
        न्यक्षिपन्।
      - >
        बीबीगढ्-गृहं यत्र आङ्ग्लस्त्रियः, बालकाः च हताः, तथा च कूपः यस्मात्
        मृतानां शवाः च प्राप्ताः।
  - source_sentence: |
      The majority of these nations are now republics or part of republics.
    sentences:
      - |
        एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।
      - >-
        तदिन्द्रजालप्रतिम बाणजालममित्रहा। विसृज्य दिक्षु सर्वासु महेन्द्र इव
        वज्रभृत्॥
      - >-
        अत्र मूलसञ्चिका (source file) विद्यते। pdflatex इत्यादेशमुपयुज्य
        सङ्कलयामि।
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - src2trg_accuracy
  - trg2src_accuracy
  - mean_accuracy
model-index:
  - name: SentenceTransformer based on intfloat/multilingual-e5-large
    results:
      - task:
          type: translation
          name: Translation
        dataset:
          name: eval en sa
          type: eval-en-sa
        metrics:
          - type: src2trg_accuracy
            value: 0.866
            name: Src2Trg Accuracy
          - type: trg2src_accuracy
            value: 0.868
            name: Trg2Src Accuracy
          - type: mean_accuracy
            value: 0.867
            name: Mean Accuracy
---

SentenceTransformer based on intfloat/multilingual-e5-large

This is a sentence-transformers model finetuned from intfloat/multilingual-e5-large on English-Sanskrit (en-sa) sentence pairs. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: intfloat/multilingual-e5-large
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
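
The stack above is a standard encode, mean-pool, L2-normalize pipeline: XLM-RoBERTa token embeddings are averaged over non-padding tokens and the result is scaled to unit length. As a rough sketch of what the three modules do, the snippet below reproduces the same steps with plain transformers; the repo ID is a placeholder, and the SentenceTransformer API shown in the Usage section remains the recommended way to run the model.

import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

repo_id = "sentence_transformers_model_id"  # placeholder for this model's Hub repo ID
tokenizer = AutoTokenizer.from_pretrained(repo_id)
encoder = AutoModel.from_pretrained(repo_id)  # XLMRobertaModel, hidden size 1024

sentences = ["From this street the village is seen.", "एतस्याः वीथ्याः ग्रामं दृश्यते"]
batch = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state        # (batch, seq_len, 1024)

# Mean pooling over non-padding tokens, then L2 normalization (modules (1) and (2) above)
mask = batch["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
embeddings = F.normalize(embeddings, p=2, dim=1)
print(embeddings.shape)  # torch.Size([2, 1024])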

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub ("sentence_transformers_model_id" is a placeholder for this model's repo ID)
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    'The majority of these nations are now republics or part of republics.\n',
    'एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।\n',
    'अत्र मूलसञ्चिका (source file) विद्यते। pdflatex इत्यादेशमुपयुज्य सङ्कलयामि।',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.8049, 0.1296],
#         [0.8049, 1.0000, 0.1642],
#         [0.1296, 0.1642, 1.0000]])
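
Because English and Sanskrit sentences are embedded into the same space, the model can also be used for cross-lingual retrieval. A minimal sketch (reusing the placeholder repo ID; the Sanskrit passages are taken from the widget examples above):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder repo ID

# Sanskrit candidate passages (taken from the widget examples)
corpus = [
    "एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।",
    "एतस्याः वीथ्याः ग्रामं दृश्यते",
    "अत्र मूलसञ्चिका (source file) विद्यते। pdflatex इत्यादेशमुपयुज्य सङ्कलयामि।",
]
corpus_embeddings = model.encode(corpus)

# Embed an English query and retrieve the closest Sanskrit passage by cosine similarity
query_embedding = model.encode(["From this street the village is seen."])
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=1)
best = hits[0][0]
print(corpus[best["corpus_id"]], best["score"])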

Evaluation

Metrics

Translation

Metric             Value
src2trg_accuracy   0.866
trg2src_accuracy   0.868
mean_accuracy      0.867
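
src2trg_accuracy is the fraction of English (source) sentences whose nearest neighbour among all Sanskrit (target) sentences is the correct translation; trg2src_accuracy is the same in the opposite direction, and mean_accuracy is their average. A hedged sketch of how such scores can be computed with the library's TranslationEvaluator (the two sentence pairs below are illustrative, taken from the widget examples):

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TranslationEvaluator

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder repo ID

# Parallel sentences: english_sentences[i] must be the translation of sanskrit_sentences[i]
english_sentences = [
    "From this street the village is seen.",
    "The majority of these nations are now republics or part of republics.",
]
sanskrit_sentences = [
    "एतस्याः वीथ्याः ग्रामं दृश्यते",
    "एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।",
]

evaluator = TranslationEvaluator(english_sentences, sanskrit_sentences, name="eval-en-sa")
results = evaluator(model)
print(results)  # accuracies keyed by the evaluator name, e.g. eval-en-sa_mean_accuracy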

Training Details

Training Dataset

Unnamed Dataset

  • Size: 257,886 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    • sentence_0: string; min: 5 tokens, mean: 33.91 tokens, max: 403 tokens
    • sentence_1: string; min: 6 tokens, mean: 37.33 tokens, max: 228 tokens
  • Samples (sentence_0 → sentence_1):
    • "For the purpose of this tutorial, we shall list these instructions in slides." → अस्य पाठस्य आनुकूल्याय स्लैड् द्वारा आदेशान् वदामः ।
    • "Gandharva prajapati, Vishwakarma and mana swaroop. Please protect Gandharva Brahmins and Kshatriyas. Riku and Sama have an apsara named Ashti. Please protect us. This sacrifice is an offering for them. Swaha for them. (43)" → प्र॒जाप॑तिर्वि॒श्वक॑र्मा॒ मनो॑ गन्ध॒र्वस्तस्य॑ऽऋ॒क्सा॒मान्य॑प्स॒रस॒ऽएष्ट॑यो॒ नाम॑। स न॑ऽइ॒दं ब्रह्म॑ क्ष॒त्रं पा॑तु॒ तस्मै॒ स्वाहा॒ वाट् ताभ्यः॒ स्वाहा॑ ॥ (४३)
    • "Many things are sold to treat acne, the most popular being benzoyl peroxide." → आक्ने-चिकित्सार्थं नाइकानि वस्तूनि विक्रीयन्ते, तेषु अतिजनप्रियं बेन्ज़ोय्ल् पराक्सैड्।
  • Loss: MultipleNegativesRankingLoss with these parameters (see the training sketch after this list):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
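
As a rough sketch of how pairs like these are used with MultipleNegativesRankingLoss (each sentence_0 is pulled toward its own sentence_1 while every other sentence_1 in the batch acts as an in-batch negative), assuming training starts from the base model and uses the scale and similarity function listed above; the three pairs are taken from the widget examples:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("intfloat/multilingual-e5-large")

# Two-column (anchor, positive) pairs; column names match the dataset described above
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "From this street the village is seen.",
        "The majority of these nations are now republics or part of republics.",
        "Residents were trapped in houses and elsewhere as the roads turned into rivers.",
    ],
    "sentence_1": [
        "एतस्याः वीथ्याः ग्रामं दृश्यते",
        "एतेषु अधिकांशाः देशाः अधुना गणराज्यानि उत गणराज्यानां भागाः वा सन्ति।",
        "पदवीषु नद्यायमानासु अन्यत्र गन्तुम् अकल्पाः वस्तव्याः गृहेष्वेव निबद्धाः आसन्।",
    ],
})

# In-batch negatives ranking loss with the parameters listed above
loss = MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)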
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • num_train_epochs: 15
  • multi_dataset_batch_sampler: round_robin
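
Continuing the sketch from the Training Dataset section, a hedged example of passing these non-default values to the trainer (the output directory is illustrative, and the eval dataset here simply reuses the training pairs; everything else keeps its default):

from sentence_transformers import SentenceTransformerTrainer, SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="e5large-en-sa-v1",              # illustrative output path
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    num_train_epochs=15,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,                    # model, train_dataset and loss from the sketch above
    args=args,
    train_dataset=train_dataset,
    eval_dataset=train_dataset,     # illustrative only; use a held-out split in practice
    loss=loss,
)
trainer.train()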

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 15
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss eval-en-sa_mean_accuracy
0.0078 500 0.2715 -
0.0155 1000 0.0402 -
0.0233 1500 0.0323 -
0.0310 2000 0.0305 -
0.0388 2500 0.0169 -
0.0465 3000 0.0122 -
0.0543 3500 0.011 -
0.0620 4000 0.0134 -
0.0698 4500 0.0081 -
0.0776 5000 0.0177 -
0.0853 5500 0.0195 -
0.0931 6000 0.014 -
0.1008 6500 0.0226 -
0.1086 7000 0.0122 -
0.1163 7500 0.0156 -
0.1241 8000 0.0192 -
0.1318 8500 0.023 -
0.1396 9000 0.0153 -
0.1474 9500 0.0275 -
0.1551 10000 0.0272 -
0.1629 10500 0.0222 -
0.1706 11000 0.0134 -
0.1784 11500 0.0216 -
0.1861 12000 0.0152 -
0.1939 12500 0.0104 -
0.2016 13000 0.0178 -
0.2094 13500 0.0209 -
0.2171 14000 0.0211 -
0.2249 14500 0.0198 -
0.2327 15000 0.0212 -
0.2404 15500 0.0177 -
0.2482 16000 0.0221 -
0.2559 16500 0.0206 -
0.2637 17000 0.0181 -
0.2714 17500 0.0165 -
0.2792 18000 0.0145 -
0.2869 18500 0.0139 -
0.2947 19000 0.0198 -
0.3025 19500 0.0139 -
0.3102 20000 0.0177 -
0.3180 20500 0.0104 -
0.3257 21000 0.0149 -
0.3335 21500 0.0144 -
0.3412 22000 0.0168 -
0.3490 22500 0.0156 -
0.3567 23000 0.0132 -
0.3645 23500 0.0152 -
0.3723 24000 0.0147 -
0.3800 24500 0.0142 -
0.3878 25000 0.018 -
0.3955 25500 0.0246 -
0.4033 26000 0.0105 -
0.4110 26500 0.0097 -
0.4188 27000 0.0145 -
0.4265 27500 0.0136 -
0.4343 28000 0.0182 -
0.4421 28500 0.016 -
0.4498 29000 0.0088 -
0.4576 29500 0.0106 -
0.4653 30000 0.02 -
0.4731 30500 0.0153 -
0.4808 31000 0.0118 -
0.4886 31500 0.0141 -
0.4963 32000 0.0194 -
0.5041 32500 0.0149 -
0.5119 33000 0.0099 -
0.5196 33500 0.0212 -
0.5274 34000 0.0112 -
0.5351 34500 0.0175 -
0.5429 35000 0.0149 -
0.5506 35500 0.0142 -
0.5584 36000 0.0174 -
0.5661 36500 0.0146 -
0.5739 37000 0.0186 -
0.5816 37500 0.0167 -
0.5894 38000 0.0356 -
0.5972 38500 0.0195 -
0.6049 39000 0.0165 -
0.6127 39500 0.0202 -
0.6204 40000 0.0142 -
0.6282 40500 0.0104 -
0.6359 41000 0.0104 -
0.6437 41500 0.0155 -
0.6514 42000 0.0056 -
0.6592 42500 0.0102 -
0.6670 43000 0.0096 -
0.6747 43500 0.0219 -
0.6825 44000 0.0106 -
0.6902 44500 0.0129 -
0.6980 45000 0.0152 -
0.7057 45500 0.0158 -
0.7135 46000 0.0082 -
0.7212 46500 0.0159 -
0.7290 47000 0.0184 -
0.7368 47500 0.0101 -
0.7445 48000 0.0101 -
0.7523 48500 0.0115 -
0.7600 49000 0.0111 -
0.7678 49500 0.0116 -
0.7755 50000 0.0085 0.867

Framework Versions

  • Python: 3.10.18
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.1
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.10.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2
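
To approximate this environment, the versions above can be pinned at install time (a hedged example; pick the PyTorch build matching your CUDA setup):

pip install "sentence-transformers==5.0.0" "transformers==4.53.1" "accelerate==1.10.0" "datasets==3.6.0" "tokenizers==0.21.2" "torch==2.7.1"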

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}