SentenceTransformer based on sentence-transformers/paraphrase-mpnet-base-v2
This is a sentence-transformers model finetuned from sentence-transformers/paraphrase-mpnet-base-v2 on the train dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/paraphrase-mpnet-base-v2
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- train
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
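Conceptually, the Transformer module emits one 768-dimensional vector per token, and the Pooling module (with pooling_mode_mean_tokens set to True) averages those vectors under the attention mask so padding tokens are ignored. Below is a minimal sketch of that computation using the transformers library directly; the input sentence is an illustrative placeholder:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")
model = AutoModel.from_pretrained("sentence-transformers/paraphrase-mpnet-base-v2")

encoded = tokenizer(["An example sentence."], padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # [batch, seq_len, 768]

# Mean pooling: zero out padding positions, then average over the sequence.
mask = encoded["attention_mask"].unsqueeze(-1).float()
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])

In practice, SentenceTransformer wires these two modules together, so model.encode performs both steps in a single call.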
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
'The marriage of Baptiste and Hannah demonstrates their commitment to sharing their lives and supporting one another.',
'By getting married, Baptiste and Hannah take on a duty to care for each other, both emotionally and materially.',
'If the marriage brings happiness to Baptiste and Hannah, then they are pursuing their right to happiness.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Training Details
Training Dataset
train
- Dataset: train
- Size: 18,963 training samples
- Columns: anchor, positive, and negative
- Approximate statistics based on the first 1000 samples:

  | | anchor | positive | negative |
  |---|---|---|---|
  | type | string | string | string |
  | details | min: 10 tokens, mean: 25.92 tokens, max: 51 tokens | min: 9 tokens, mean: 28.31 tokens, max: 60 tokens | min: 11 tokens, mean: 28.69 tokens, max: 67 tokens |
- Samples:

  | anchor | positive | negative |
  |---|---|---|
  | Saving the group of people from harm by diverting the trolley supports the value of preserving life. | The group of people tied to the tracks have a right to life, which is protected when the trolley is diverted to save them. | Diverting the trolley reduces overall harm by preventing the deaths of many people at the cost of one person's life. |
  | The bake sale could be seen as an expression of support for a particular cause, and the right to freely express oneself and associate with others who share the same views is important. | The bake sale might be seen as a form of protest or support for a specific cause, and individuals have the right to engage in peaceful protest or show support. | If the bake sale directly or indirectly promotes religious discrimination, this can infringe on the fundamental right of individuals to be free from discrimination or harm due to their religious beliefs. |
  | Children have a right to life, and saving them from danger upholds this right. | Children should be protected from harm, abuse, and danger, and saving them ensures this right is respected. | Children have a right to grow up with access to healthcare, education, and a nurturing environment. Saving them may help secure these rights. |
- Loss: MultipleNegativesRankingLoss with these parameters (see the sketch below):
  { "scale": 40, "similarity_fct": "cos_sim" }
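With (anchor, positive, negative) triplets, this loss treats each anchor's own positive as the correct match and scores it against the other in-batch positives plus the explicit negatives. A minimal sketch of recreating the loss, assuming the sentence-transformers training API; scale=40 and cosine similarity match the parameters above:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.util import cos_sim

model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")
# scale multiplies the similarity scores before the cross-entropy step.
loss = MultipleNegativesRankingLoss(model, scale=40, similarity_fct=cos_sim)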
Training Hyperparameters
Non-Default Hyperparameters
- overwrite_output_dir: True
- per_device_train_batch_size: 32
- learning_rate: 2.1456771788455288e-05
- num_train_epochs: 2
- warmup_ratio: 0.03254893834779507
- fp16: True
- dataloader_num_workers: 4
- remove_unused_columns: False
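Expressed as code, the non-default values above correspond roughly to the following; a sketch assuming the SentenceTransformerTrainingArguments API, with output_dir as a placeholder since the card does not specify it:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder; the actual path is not given in this card
    overwrite_output_dir=True,
    per_device_train_batch_size=32,
    learning_rate=2.1456771788455288e-05,
    num_train_epochs=2,
    warmup_ratio=0.03254893834779507,
    fp16=True,
    dataloader_num_workers=4,
    remove_unused_columns=False,
)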
All Hyperparameters
- overwrite_output_dir: True
- do_predict: False
- eval_strategy: no
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 8
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2.1456771788455288e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 2
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.03254893834779507
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 4
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: False
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss |
---|---|---|
0.0337 | 20 | 0.2448 |
0.0675 | 40 | 0.1918 |
0.1012 | 60 | 0.14 |
0.1349 | 80 | 0.186 |
0.1686 | 100 | 0.1407 |
0.2024 | 120 | 0.1672 |
0.2361 | 140 | 0.1832 |
0.2698 | 160 | 0.116 |
0.3035 | 180 | 0.1341 |
0.3373 | 200 | 0.2118 |
0.3710 | 220 | 0.1274 |
0.4047 | 240 | 0.1993 |
0.4384 | 260 | 0.1561 |
0.4722 | 280 | 0.1517 |
0.5059 | 300 | 0.1635 |
0.5396 | 320 | 0.1646 |
0.5734 | 340 | 0.1337 |
0.6071 | 360 | 0.1406 |
0.6408 | 380 | 0.1114 |
0.6745 | 400 | 0.1314 |
0.7083 | 420 | 0.1481 |
0.7420 | 440 | 0.1932 |
0.7757 | 460 | 0.1568 |
0.8094 | 480 | 0.1319 |
0.8432 | 500 | 0.1536 |
0.8769 | 520 | 0.1462 |
0.9106 | 540 | 0.1336 |
0.9444 | 560 | 0.1453 |
0.9781 | 580 | 0.2005 |
1.0118 | 600 | 0.1265 |
1.0455 | 620 | 0.0702 |
1.0793 | 640 | 0.0739 |
1.1130 | 660 | 0.049 |
1.1467 | 680 | 0.0613 |
1.1804 | 700 | 0.0663 |
1.2142 | 720 | 0.0726 |
1.2479 | 740 | 0.0822 |
1.2816 | 760 | 0.0651 |
1.3153 | 780 | 0.0603 |
1.3491 | 800 | 0.0468 |
1.3828 | 820 | 0.061 |
1.4165 | 840 | 0.0891 |
1.4503 | 860 | 0.0607 |
1.4840 | 880 | 0.0673 |
1.5177 | 900 | 0.0728 |
1.5514 | 920 | 0.065 |
1.5852 | 940 | 0.0824 |
1.6189 | 960 | 0.0695 |
1.6526 | 980 | 0.0626 |
1.6863 | 1000 | 0.0525 |
1.7201 | 1020 | 0.0482 |
1.7538 | 1040 | 0.0968 |
1.7875 | 1060 | 0.0717 |
1.8212 | 1080 | 0.0704 |
1.8550 | 1100 | 0.0666 |
1.8887 | 1120 | 0.0841 |
1.9224 | 1140 | 0.0682 |
1.9562 | 1160 | 0.0584 |
1.9899 | 1180 | 0.0423 |
Framework Versions
- Python: 3.9.21
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.6.0+cu124
- Accelerate: 1.5.2
- Datasets: 3.4.1
- Tokenizers: 0.21.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}