SentenceTransformer based on abkimc/distilroberta-base-sentence-transformer

This is a sentence-transformers model finetuned from abkimc/distilroberta-base-sentence-transformer. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: abkimc/distilroberta-base-sentence-transformer
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False, 'architecture': 'RobertaModel'})
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
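
The architecture is a RoBERTa encoder followed by mean pooling over the token embeddings. If you prefer to work with the transformers library directly, the pooling step can be reproduced by hand. The snippet below is a minimal sketch, assuming the Hub repository also loads as a plain RobertaModel checkpoint:

import torch
from transformers import AutoTokenizer, AutoModel

model_id = "abkimc/distilroberta-base-sentence-transformer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = AutoModel.from_pretrained(model_id)

sentences = ["HTC Legend makes official debut in India"]
encoded = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")

with torch.no_grad():
    token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 768)

# Mean pooling: average the token embeddings, ignoring padding positions
mask = encoded["attention_mask"].unsqueeze(-1).float()
embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
print(embeddings.shape)  # torch.Size([1, 768])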

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("abkimc/distilroberta-base-sentence-transformer")
# Run inference
sentences = [
    'The HTC Legend has made its official debut in India days after it was informally launched .',
    'HTC Legend makes official debut in India',
    'Britain, Bill Gates join forces',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[ 1.0000,  0.9061, -0.0382],
#         [ 0.9061,  1.0000, -0.0170],
#         [-0.0382, -0.0170,  1.0000]])
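
The same embeddings support the other use cases mentioned above, such as semantic search. Below is a minimal sketch with a hypothetical corpus and query chosen purely for illustration:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("abkimc/distilroberta-base-sentence-transformer")

# Hypothetical corpus and query, for illustration only
corpus = [
    "HTC Legend makes official debut in India",
    "Sammons launches ninth salary survey",
    "Tennessee sees major spike in foreclosure filings in 2008",
]
query = "When did the HTC Legend launch in India?"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Rank corpus sentences by cosine similarity to the query
scores = model.similarity(query_embedding, corpus_embeddings)[0]
best = scores.argmax().item()
print(corpus[best], scores[best].item())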

Training Details

Training Dataset

Unnamed Dataset

  • Size: 180,000 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    sentence_0: string; min: 12 tokens, mean: 33.68 tokens, max: 293 tokens
    sentence_1: string; min: 5 tokens, mean: 10.98 tokens, max: 28 tokens
  • Samples:
    sentence_0: Content is the king in today's world of journalism and a newspaper cannot survive if it compromises on the quality of the content, said Abhilash Khandekar, Maharashtra state head of Dainik Bhaskar Group on Tuesday.
    sentence_1: 'Content is king in today's journalism'
    sentence_0: Sammons Pensions has launched its ninth annual salary survey which aims to document remuneration packages across the industry.
    sentence_1: Sammons launches ninth salary survey
    sentence_0: The state of Tennessee saw a major spike in foreclosure filings in 2008, according to a report by the Tennessee Housing Development Agency.
    sentence_1: Tennessee sees major spike in foreclosure filings in 2008
  • Loss: MultipleNegativesRankingLoss with these parameters (a construction sketch follows after this list):
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim",
        "gather_across_devices": false
    }
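
A minimal sketch of how this loss can be instantiated with the parameters listed above. With this loss, each (sentence_0, sentence_1) pair acts as a positive, and the other pairs in the same batch serve as in-batch negatives:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("abkimc/distilroberta-base-sentence-transformer")

# scale and similarity_fct mirror the parameters listed above
loss = MultipleNegativesRankingLoss(model=model, scale=20.0, similarity_fct=cos_sim)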
    

Training Hyperparameters

Non-Default Hyperparameters

  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
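
A minimal sketch of a training run that applies the non-default hyperparameters above. The two-row dataset (reusing samples shown earlier) is a toy stand-in for the real 180,000-pair dataset, and the output directory is hypothetical:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("abkimc/distilroberta-base-sentence-transformer")

# Toy stand-in for the 180,000-pair dataset; column names match the dataset description above
train_dataset = Dataset.from_dict({
    "sentence_0": [
        "Sammons Pensions has launched its ninth annual salary survey which aims to document remuneration packages across the industry.",
        "The state of Tennessee saw a major spike in foreclosure filings in 2008, according to a report by the Tennessee Housing Development Agency.",
    ],
    "sentence_1": [
        "Sammons launches ninth salary survey",
        "Tennessee sees major spike in foreclosure filings in 2008",
    ],
})

args = SentenceTransformerTrainingArguments(
    output_dir="outputs",  # hypothetical output directory
    per_device_train_batch_size=64,
    per_device_eval_batch_size=64,
    num_train_epochs=10,
    multi_dataset_batch_sampler="round_robin",
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    loss=MultipleNegativesRankingLoss(model),
)
trainer.train()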

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 64
  • per_device_eval_batch_size: 64
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss
0.1777 500 2.8662
0.3555 1000 0.0631
0.5332 1500 0.0149
0.7110 2000 0.0097
0.8887 2500 0.0079
1.0665 3000 0.0062
1.2442 3500 0.0041
1.4220 4000 0.0037
1.5997 4500 0.0038
1.7775 5000 0.0034
1.9552 5500 0.0038
2.1330 6000 0.0021
2.3107 6500 0.0015
2.4884 7000 0.0016
2.6662 7500 0.0015
2.8439 8000 0.0018
3.0217 8500 0.0015
3.1994 9000 0.0013
3.3772 9500 0.001
3.5549 10000 0.0011
3.7327 10500 0.0011
3.9104 11000 0.0014
4.0882 11500 0.0011
4.2659 12000 0.0007
4.4437 12500 0.0009
4.6214 13000 0.0009
4.7991 13500 0.0008
4.9769 14000 0.0008
5.1546 14500 0.0009
5.3324 15000 0.0007
5.5101 15500 0.0007
5.6879 16000 0.0007
5.8656 16500 0.0006
6.0434 17000 0.0007
6.2211 17500 0.0007
6.3989 18000 0.0005
6.5766 18500 0.0007
6.7544 19000 0.0005
6.9321 19500 0.0005
7.1098 20000 0.0005
7.2876 20500 0.0006
7.4653 21000 0.0005
7.6431 21500 0.0004
7.8208 22000 0.0004
7.9986 22500 0.0004
8.1763 23000 0.0004
8.3541 23500 0.0004
8.5318 24000 0.0005
8.7096 24500 0.0004
8.8873 25000 0.0004
9.0651 25500 0.0005
9.2428 26000 0.0004
9.4205 26500 0.0005
9.5983 27000 0.0004
9.7760 27500 0.0004
9.9538 28000 0.0004

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.55.4
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}