SentenceTransformer based on rasyosef/roberta-amharic-text-embedding-medium

This is a sentence-transformers model finetuned from rasyosef/roberta-amharic-text-embedding-medium. It maps sentences & paragraphs to a 512-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: rasyosef/roberta-amharic-text-embedding-medium
  • Maximum Sequence Length: 510 tokens
  • Output Dimensionality: 512 dimensions
  • Similarity Function: Cosine Similarity
  • Model Size: 42.1M parameters (F32)

Model Sources

  • Documentation: https://www.sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 510, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 512, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
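
The Pooling module averages the transformer's token embeddings (masking out padding) and the Normalize module scales each sentence vector to unit length. As a rough illustration of what these two modules compute (not the library's internal code):

import torch
import torch.nn.functional as F

def mean_pool_and_normalize(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    # token_embeddings: (batch, seq_len, 512) hidden states from the XLM-RoBERTa encoder
    # attention_mask: (batch, seq_len), 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).float()      # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)    # sum embeddings of real tokens only
    counts = mask.sum(dim=1).clamp(min=1e-9)         # number of real tokens per sentence
    mean_pooled = summed / counts                    # (batch, 512) mean pooling
    return F.normalize(mean_pooled, p=2, dim=1)      # unit length, so cosine similarity == dot product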

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mogesa/Roberta-amharic-news-sentence-transformer")
# Run inference
sentences = [
    'የግሉ ወረት እና አፍሪቃ',  # roughly: "Private capital and Africa"
    '« በፊናንሱ ተቋማዊ ተሀድሶ የተነሳ የአፍሪቃ ህብረት ለቀጣዩ በጀቱ 12 ከመቶ ቁጠባ አድርጓል በዚህ አባል ሀገራት ያበረከቱት አስተዋፅኦ ትልቅ ነው',  # roughly: "Thanks to its institutional finance reform, the African Union saved 12 percent of its next budget; the member states' contribution to this is large"
    'በሱዳን ጉዳይ ጣልቃ በመግባት የነዳጅ የሌሎች የተፈጥሮ ሀብቷን የመቀራመት እድል ሊፈጠር ሰበብ የሚሰጡ ሀገራት መኖራቸው ደግሞ ሁለተኛው ምክንያት ነው',  # roughly: "The second reason is that some countries' meddling in Sudan's affairs gives a pretext for carving up its oil and other natural resources"
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 512]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
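
Since the Normalize module makes every embedding unit length, cosine similarity reduces to a dot product, and the model can be used directly for semantic search. A short sketch continuing from the snippet above (the query string is an illustrative example, not from the card):

from sentence_transformers import util

# Hypothetical query: "the African Union budget"
query_embedding = model.encode("የአፍሪቃ ህብረት በጀት", convert_to_tensor=True)
corpus_embeddings = model.encode(sentences, convert_to_tensor=True)

# Rank the three sentences above against the query
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(sentences[hit["corpus_id"]], round(hit["score"], 4))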

Training Details

Training Dataset

Unnamed Dataset

  • Size: 217,850 training samples
  • Columns: sentence_0, sentence_1, and label
  • Approximate statistics based on the first 1000 samples:
    • sentence_0 (string): min 6 tokens, mean 11.96 tokens, max 35 tokens
    • sentence_1 (string): min 8 tokens, mean 21.1 tokens, max 57 tokens
    • label (float): min -0.17, mean 0.37, max 0.9
  • Samples (sentence_0 | sentence_1 | label):
    • በማእከላዊ ጎንደር ዞን ጠገዴ የታገቱት ስድስት ታዳጊዎች ለምን ተገደሉ | "ቦታው ዘወር ያለ ነበር ኮከራ ቀበሌ የሚባል ድሮም 'የሽፍታ መጠጊያ' ይባላል | 0.33186144
    • የኢትዮ-ምህዳር ጋዜጣ ዋና አዘጋጅ ታሰረ | ዋና አዘጋጁ በወንጀል ህግ በአንቀፅ 613 “ስማ ማጥፋት የሀሰት ሀሜት” በሚል የተቀመጠውን ተላልፏል በሚል የተከሰሰው | 0.50249875
    • አምባሳደር ሺን ፤ ኢትዮጵያና ኤርትራ | አምባሳደሩ ቀደም በአለም አቀፍ ፍርድ ቤት በተደረገ ድርድር ውጤት ባድመ የኤርትራ መሆኗን እትዮጵያውያን መቀበል ይኖርባቸዋል ብለዋል | 0.54789203
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
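
MultipleNegativesRankingLoss treats each (sentence_0, sentence_1) pair as a positive and uses the other sentence_1 entries in the same batch as in-batch negatives, scoring pairs with scaled cosine similarity. A minimal sketch of instantiating it with the parameters above, starting from the base model named at the top of this card:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("rasyosef/roberta-amharic-text-embedding-medium")
# scale=20.0 and cosine similarity match the parameters listed above
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)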
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • multi_dataset_batch_sampler: round_robin
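
Combined with the defaults listed under "All Hyperparameters" below, a comparable training run could be set up roughly as follows. This is a sketch, not the author's exact script; the output directory is hypothetical and the single training pair (taken from the samples above) is a stand-in for the full 217,850-pair dataset:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
    losses,
)
from sentence_transformers.training_args import MultiDatasetBatchSamplers

model = SentenceTransformer("rasyosef/roberta-amharic-text-embedding-medium")

# Stand-in for the full training set; column names match the dataset section above
train_dataset = Dataset.from_dict({
    "sentence_0": ["የኢትዮ-ምህዳር ጋዜጣ ዋና አዘጋጅ ታሰረ"],
    "sentence_1": ["ዋና አዘጋጁ በወንጀል ህግ በአንቀፅ 613 “ስማ ማጥፋት የሀሰት ሀሜት” በሚል የተቀመጠውን ተላልፏል በሚል የተከሰሰው"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="roberta-amharic-news-sentence-transformer",  # hypothetical path
    num_train_epochs=3,
    learning_rate=5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    multi_dataset_batch_sampler=MultiDatasetBatchSamplers.ROUND_ROBIN,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=losses.MultipleNegativesRankingLoss(model, scale=20.0),
)
trainer.train()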

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss
0.0367 500 1.2372
0.0734 1000 1.0754
0.1102 1500 1.0128
0.1469 2000 0.9841
0.1836 2500 0.944
0.2203 3000 0.9168
0.2571 3500 0.8863
0.2938 4000 0.8685
0.3305 4500 0.8575
0.3672 5000 0.8637
0.4039 5500 0.8353
0.4407 6000 0.8147
0.4774 6500 0.7913
0.5141 7000 0.7751
0.5508 7500 0.7719
0.5875 8000 0.7605
0.6243 8500 0.7206
0.6610 9000 0.7219
0.6977 9500 0.7302
0.7344 10000 0.7307
0.7712 10500 0.7019
0.8079 11000 0.7127
0.8446 11500 0.6693
0.8813 12000 0.6934
0.9180 12500 0.6721
0.9548 13000 0.6657
0.9915 13500 0.6696
1.0282 14000 0.5583
1.0649 14500 0.5335
1.1016 15000 0.5234
1.1384 15500 0.5192
1.1751 16000 0.5317
1.2118 16500 0.5325
1.2485 17000 0.5201
1.2853 17500 0.5096
1.3220 18000 0.5001
1.3587 18500 0.5015
1.3954 19000 0.4862
1.4321 19500 0.4901
1.4689 20000 0.5168
1.5056 20500 0.499
1.5423 21000 0.4937
1.5790 21500 0.4772
1.6157 22000 0.4709
1.6525 22500 0.4971
1.6892 23000 0.485
1.7259 23500 0.4689
1.7626 24000 0.4789
1.7994 24500 0.4606
1.8361 25000 0.4711
1.8728 25500 0.4774
1.9095 26000 0.4649
1.9462 26500 0.4779
1.9830 27000 0.4703
2.0197 27500 0.4202
2.0564 28000 0.389
2.0931 28500 0.3824
2.1298 29000 0.3682
2.1666 29500 0.3764
2.2033 30000 0.366
2.2400 30500 0.3723
2.2767 31000 0.38
2.3135 31500 0.3632
2.3502 32000 0.3817
2.3869 32500 0.3894
2.4236 33000 0.3844
2.4603 33500 0.3761
2.4971 34000 0.3871
2.5338 34500 0.3672
2.5705 35000 0.3621
2.6072 35500 0.3907
2.6439 36000 0.3688
2.6807 36500 0.3653
2.7174 37000 0.3632
2.7541 37500 0.3698
2.7908 38000 0.3696
2.8276 38500 0.3624
2.8643 39000 0.3731
2.9010 39500 0.3634
2.9377 40000 0.3504
2.9744 40500 0.3643

Framework Versions

  • Python: 3.11.11
  • Sentence Transformers: 4.1.0
  • Transformers: 4.48.3
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.5.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}