---
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:10635
  - loss:MultipleNegativesSymmetricRankingLoss
base_model: sentence-transformers/multi-qa-mpnet-base-dot-v1
widget:
  - source_sentence: '12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson'
    sentences:
      - Books on Investing
      - Books on Resilience
      - Books on Motivational
  - source_sentence: >-
      Get the Guy: Learn Secrets of the Male Mind to Find the Man You Want and
      the Love You Deserve by Matthew Hussey
    sentences:
      - Books on Complexity
      - Books on Decision Making
      - Books on Self-Help for Women
  - source_sentence: >-
      The Magic of Tiny Business (You Don’t Have to Go Big to Make a Great
      Living) by Sharon Rowe
    sentences:
      - Books on Vegetarianism
      - Books on Personal Development
      - Books on Emotions
  - source_sentence: >-
      The Dorito Effect: The Surprising New Truth About Food and Flavor by Mark
      Schatzker
    sentences:
      - Books on Skincare
      - Books on Work-Life Balance
      - Books on Problem Solving
  - source_sentence: '12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson'
    sentences:
      - Books on Psychology
      - Books on Positive Thinking
      - Books on Investing
pipeline_tag: sentence-similarity
library_name: sentence-transformers
---

# SentenceTransformer based on sentence-transformers/multi-qa-mpnet-base-dot-v1

This is a sentence-transformers model finetuned from sentence-transformers/multi-qa-mpnet-base-dot-v1 on the train dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description

  • Model Type: Sentence Transformer
  • Base model: sentence-transformers/multi-qa-mpnet-base-dot-v1
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset: train

### Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
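Note that, like the base model, this model uses CLS-token pooling (`pooling_mode_cls_token: True`) rather than mean pooling. If you want to verify the configuration after loading, a minimal sketch (using the same placeholder model id as the usage example below):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

# The Transformer module caps inputs at 512 tokens
print(model.max_seq_length)  # 512

# The Pooling module takes the CLS-token embedding, not a mean over tokens
print(model[1].pooling_mode_cls_token)  # True

# Embeddings live in a 768-dimensional space
print(model.get_sentence_embedding_dimension())  # 768
```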

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.

```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("sentence_transformers_model_id")
# Run inference
sentences = [
    '12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson',
    'Books on Psychology',
    'Books on Positive Thinking',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
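Since the model was trained to match book titles to "Books on ..." category labels, a natural application is ranking candidate categories for a title. A minimal sketch, reusing one of the widget examples above and the same placeholder model id:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence_transformers_model_id")  # placeholder id

title = "The Dorito Effect: The Surprising New Truth About Food and Flavor by Mark Schatzker"
categories = [
    "Books on Skincare",
    "Books on Work-Life Balance",
    "Books on Problem Solving",
]

# Embed the title and the candidate categories, then rank by similarity
title_embedding = model.encode([title])
category_embeddings = model.encode(categories)
scores = model.similarity(title_embedding, category_embeddings)[0]

for score, category in sorted(zip(scores.tolist(), categories), reverse=True):
    print(f"{score:.3f}  {category}")
```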

## Training Details

### Training Dataset

#### train

  • Dataset: train
  • Size: 10,635 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

|         | anchor                                             | positive                                         |
|:--------|:---------------------------------------------------|:-------------------------------------------------|
| type    | string                                              | string                                           |
| details | min: 11 tokens, mean: 24.11 tokens, max: 60 tokens  | min: 5 tokens, mean: 5.89 tokens, max: 10 tokens |

  • Samples:

| anchor                                                                                                 | positive              |
|:--------------------------------------------------------------------------------------------------------|:----------------------|
| The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing by Marie Kondō    | Books on Organization |
| The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing by Marie Kondō    | Books on Minimalism   |
| The Life-Changing Magic of Tidying Up: The Japanese Art of Decluttering and Organizing by Marie Kondō    | Books on Japanese Art |

  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
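For reference, this loss treats the other positives in a batch as in-batch negatives and applies the ranking objective in both directions (anchor → positive and positive → anchor). A minimal sketch of constructing it with the parameters above:

```python
from sentence_transformers import SentenceTransformer, util
from sentence_transformers.losses import MultipleNegativesSymmetricRankingLoss

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")

# scale=20.0 and cosine similarity match the parameters listed above
loss = MultipleNegativesSymmetricRankingLoss(model, scale=20.0, similarity_fct=util.cos_sim)
```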
    

### Evaluation Dataset

#### train

  • Dataset: train
  • Size: 5,359 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:

|         | anchor                                           | positive                                         |
|:--------|:--------------------------------------------------|:-------------------------------------------------|
| type    | string                                             | string                                           |
| details | min: 8 tokens, mean: 22.0 tokens, max: 38 tokens   | min: 4 tokens, mean: 5.85 tokens, max: 13 tokens |

  • Samples:

| anchor                                                        | positive                      |
|:----------------------------------------------------------------|:------------------------------|
| 12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson    | Books on Psychology           |
| 12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson    | Books on Self-Help            |
| 12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson    | Books on Personal Development |

  • Loss: MultipleNegativesSymmetricRankingLoss with these parameters:

```json
{
    "scale": 20.0,
    "similarity_fct": "cos_sim"
}
```
    

### Training Hyperparameters

#### Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • learning_rate: 2e-05
  • num_train_epochs: 10
  • warmup_ratio: 0.1
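A minimal sketch of a fine-tuning run using these non-default values with the SentenceTransformerTrainer API (the inline dataset is an illustrative stand-in for the actual anchor/positive pairs, borrowed from the evaluation samples above):

```python
from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesSymmetricRankingLoss

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")

# Illustrative stand-in for the actual train split (columns: anchor, positive)
pairs = Dataset.from_dict({
    "anchor": ["12 Rules For Life: An Antidote to Chaos by Jordan B. Peterson"] * 3,
    "positive": ["Books on Psychology", "Books on Self-Help", "Books on Personal Development"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="output",
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=10,
    warmup_ratio=0.1,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=pairs,
    eval_dataset=pairs,
    loss=MultipleNegativesSymmetricRankingLoss(model),
)
trainer.train()
```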

#### All Hyperparameters

<details><summary>Click to expand</summary>
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

</details>

### Training Logs

| Epoch  | Step | Training Loss | train loss |
|:------:|:----:|:-------------:|:----------:|
| 0.6006 | 200  | 2.6385        | 2.4890     |
| 1.2012 | 400  | 2.3324        | 2.4199     |
| 1.8018 | 600  | 2.1772        | 2.3891     |
| 2.4024 | 800  | 2.0635        | 2.3691     |
| 3.0030 | 1000 | 1.9915        | 2.3609     |
| 3.6036 | 1200 | 1.9008        | 2.3689     |
| 4.2042 | 1400 | 1.8603        | 2.3850     |
| 4.8048 | 1600 | 1.8421        | 2.3468     |
| 5.4054 | 1800 | 1.7850        | 2.3649     |
| 6.0060 | 2000 | 1.7860        | 2.3783     |
| 6.6066 | 2200 | 1.7331        | 2.3782     |
| 7.2072 | 2400 | 1.7062        | 2.3826     |
| 7.8078 | 2600 | 1.6929        | 2.3926     |
| 8.4084 | 2800 | 1.6618        | 2.4069     |
| 9.0090 | 3000 | 1.6348        | 2.4155     |
| 9.6096 | 3200 | 1.6553        | 2.4060     |

### Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

## Citation

### BibTeX

#### Sentence Transformers

```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```