SentenceTransformer based on sentence-transformers/paraphrase-mpnet-base-v2

This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-mpnet-base-v2 on the train dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/paraphrase-mpnet-base-v2
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • train

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
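
For readers who want to see what the two modules above compute, here is a minimal sketch of the Transformer + mean-pooling pipeline using the base model directly through transformers. It is illustrative only: the mean_pool helper and the example sentence are ours, and the fine-tuned weights are loaded via SentenceTransformer as shown under Usage below.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
encoder = AutoModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

def mean_pool(last_hidden_state, attention_mask):
    # Average the token embeddings, ignoring padding positions (pooling_mode_mean_tokens)
    mask = attention_mask.unsqueeze(-1).float()
    return (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)

batch = tokenizer(["An example sentence."], padding=True, truncation=True,
                  max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)
sentence_embedding = mean_pool(token_embeddings, batch["attention_mask"])
print(sentence_embedding.shape)
# torch.Size([1, 768])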

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Rich740804/st-scale70")
# Run inference
sentences = [
    'The marriage of Baptiste and Hannah demonstrates their commitment to sharing their lives and supporting one another.',
    'By getting married, Baptiste and Hannah take on a duty to care for each other, both emotionally and materially.',
    'If the marriage brings happiness to Baptiste and Hannah, then they are pursuing their right to happiness.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Training Details

Training Dataset

train

  • Dataset: train
  • Size: 18,963 training samples
  • Columns: anchor, positive, and negative
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 10 tokens, mean: 25.92 tokens, max: 51 tokens
    • positive: string; min: 9 tokens, mean: 28.31 tokens, max: 60 tokens
    • negative: string; min: 11 tokens, mean: 28.69 tokens, max: 67 tokens
  • Samples:
    • anchor: Saving the group of people from harm by diverting the trolley supports the value of preserving life.
      positive: The group of people tied to the tracks have a right to life, which is protected when the trolley is diverted to save them.
      negative: Diverting the trolley reduces overall harm by preventing the deaths of many people at the cost of one person's life.
    • anchor: The bake sale could be seen as an expression of support for a particular cause, and the right to freely express oneself and associate with others who share the same views is important.
      positive: The bake sale might be seen as a form of protest or support for a specific cause, and individuals have the right to engage in peaceful protest or show support.
      negative: If the bake sale directly or indirectly promotes religious discrimination, this can infringe on the fundamental right of individuals to be free from discrimination or harm due to their religious beliefs.
    • anchor: Children have a right to life, and saving them from danger upholds this right.
      positive: Children should be protected from harm, abuse, and danger, and saving them ensures this right is respected.
      negative: Children have a right to grow up with access to healthcare, education, and a nurturing environment. Saving them may help secure these rights.
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 40,
        "similarity_fct": "cos_sim"
    }
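
To make the triplet format and the loss configuration above concrete, here is a minimal, hedged sketch of how such a dataset and loss are typically wired together with Sentence Transformers. The three toy sentences are illustrative stand-ins, not rows from the actual training data.

from datasets import Dataset
from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesRankingLoss

model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")

# Toy triplets using the same column layout as the train dataset
train_dataset = Dataset.from_dict({
    "anchor":   ["Saving the group of people supports preserving life."],
    "positive": ["The people tied to the tracks have a right to life."],
    "negative": ["Diverting the trolley reduces overall harm."],
})

# Loss configuration reported above: scale=40, cosine similarity
loss = MultipleNegativesRankingLoss(model, scale=40, similarity_fct=util.cos_sim)

In-batch negatives come for free with this loss: within a batch, every other sample's positive also serves as a negative for a given anchor.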
    

Training Hyperparameters

Non-Default Hyperparameters

  • overwrite_output_dir: True
  • per_device_train_batch_size: 32
  • learning_rate: 2.1456771788455288e-05
  • num_train_epochs: 2
  • warmup_ratio: 0.03254893834779507
  • fp16: True
  • dataloader_num_workers: 4
  • remove_unused_columns: False
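
These non-default values map onto SentenceTransformerTrainingArguments roughly as follows. This is a hedged reconstruction, not the original training script; the output directory name is a placeholder.

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="st-scale70-output",  # placeholder, not the original path
    overwrite_output_dir=True,
    per_device_train_batch_size=32,
    learning_rate=2.1456771788455288e-05,
    num_train_epochs=2,
    warmup_ratio=0.03254893834779507,
    fp16=True,
    dataloader_num_workers=4,
    remove_unused_columns=False,
)

Together with the dataset and loss sketched earlier, these arguments would be passed to SentenceTransformerTrainer(model=..., args=..., train_dataset=..., loss=...).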

All Hyperparameters

  • overwrite_output_dir: True
  • do_predict: False
  • eval_strategy: no
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2.1456771788455288e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.03254893834779507
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 4
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: False
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss
0.0337 20 0.2448
0.0675 40 0.1918
0.1012 60 0.14
0.1349 80 0.186
0.1686 100 0.1407
0.2024 120 0.1672
0.2361 140 0.1832
0.2698 160 0.116
0.3035 180 0.1341
0.3373 200 0.2118
0.3710 220 0.1274
0.4047 240 0.1993
0.4384 260 0.1561
0.4722 280 0.1517
0.5059 300 0.1635
0.5396 320 0.1646
0.5734 340 0.1337
0.6071 360 0.1406
0.6408 380 0.1114
0.6745 400 0.1314
0.7083 420 0.1481
0.7420 440 0.1932
0.7757 460 0.1568
0.8094 480 0.1319
0.8432 500 0.1536
0.8769 520 0.1462
0.9106 540 0.1336
0.9444 560 0.1453
0.9781 580 0.2005
1.0118 600 0.1265
1.0455 620 0.0702
1.0793 640 0.0739
1.1130 660 0.049
1.1467 680 0.0613
1.1804 700 0.0663
1.2142 720 0.0726
1.2479 740 0.0822
1.2816 760 0.0651
1.3153 780 0.0603
1.3491 800 0.0468
1.3828 820 0.061
1.4165 840 0.0891
1.4503 860 0.0607
1.4840 880 0.0673
1.5177 900 0.0728
1.5514 920 0.065
1.5852 940 0.0824
1.6189 960 0.0695
1.6526 980 0.0626
1.6863 1000 0.0525
1.7201 1020 0.0482
1.7538 1040 0.0968
1.7875 1060 0.0717
1.8212 1080 0.0704
1.8550 1100 0.0666
1.8887 1120 0.0841
1.9224 1140 0.0682
1.9562 1160 0.0584
1.9899 1180 0.0423

Framework Versions

  • Python: 3.9.21
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.5.2
  • Datasets: 3.4.1
  • Tokenizers: 0.21.1
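
To approximate this environment, the versions above can be pinned with pip. This is a convenience sketch rather than the exact original setup; the CUDA 12.4 PyTorch wheel is normally installed from the PyTorch index, as noted below.

pip install "sentence-transformers==4.1.0" "transformers==4.52.4" "accelerate==1.5.2" "datasets==3.4.1" "tokenizers==0.21.1"
# PyTorch 2.6.0 built for CUDA 12.4 typically comes from the PyTorch wheel index:
pip install "torch==2.6.0" --index-url https://download.pytorch.org/whl/cu124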

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}