SentenceTransformer based on dbourget/philai-embeddings-2.0

This is a sentence-transformers model finetuned from dbourget/philai-embeddings-2.0. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: dbourget/philai-embeddings-2.0
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
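
The same two-module stack (a BERT encoder followed by CLS-token pooling) can be rebuilt from its components. A minimal sketch, assuming the base checkpoint dbourget/philai-embeddings-2.0 exposes the 1024-dimensional word embeddings listed above:

from sentence_transformers import SentenceTransformer, models

# Transformer module: BERT encoder with a 512-token window, no lowercasing
word_embedding_model = models.Transformer("dbourget/philai-embeddings-2.0", max_seq_length=512)

# Pooling module: use only the CLS token as the sentence embedding (1024 dimensions)
pooling_model = models.Pooling(
    word_embedding_model.get_word_embedding_dimension(),
    pooling_mode_cls_token=True,
    pooling_mode_mean_tokens=False,
)

model = SentenceTransformer(modules=[word_embedding_model, pooling_model])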

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-30e")
# Run inference
sentences = [
    'scientific revolutions',
    'paradigm shifts',
    'scientific realism',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
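
Beyond pairwise similarity, the embeddings can drive semantic search over a small corpus. A minimal sketch using the library's util.semantic_search helper; the corpus sentences below are illustrative, not drawn from the training data:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-30e")

# Illustrative corpus and query
corpus = [
    "paradigm shifts",
    "scientific realism",
    "the problem of induction",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("scientific revolutions", convert_to_tensor=True)

# Top-2 most similar corpus entries for the query (cosine similarity)
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], hit["score"])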

Evaluation

Metrics

Triplet

Evaluated on the beatai-dev triplet set.

Metric               Value
cosine_accuracy      0.8215
dot_accuracy         0.2449
manhattan_accuracy   0.835
euclidean_accuracy   0.8342
max_accuracy         0.835
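
These numbers report how often a triplet's anchor lands closer to its positive than to its negative under each distance function; max_accuracy is the best of the four. A minimal sketch of recomputing them with the library's TripletEvaluator, using illustrative triplets rather than the actual beatai-dev data:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import TripletEvaluator

model = SentenceTransformer("dbourget/pb-small-10e-tsdae6e-philsim-cosine-6e-beatai-30e")

# Illustrative triplets; the real beatai-dev split is not reproduced here
evaluator = TripletEvaluator(
    anchors=["scientific revolutions"],
    positives=["paradigm shifts"],
    negatives=["scientific realism"],
    name="beatai-dev",
)
results = evaluator(model)
print(results)  # e.g. {'beatai-dev_cosine_accuracy': ..., 'beatai-dev_max_accuracy': ...}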

Training Details

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • learning_rate: 1e-06
  • weight_decay: 0.01
  • num_train_epochs: 20
  • lr_scheduler_type: constant
  • bf16: True
  • dataloader_drop_last: True
  • resume_from_checkpoint: True
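
The non-default values above map directly onto SentenceTransformerTrainingArguments. A minimal sketch, with output_dir as a placeholder:

from sentence_transformers import SentenceTransformerTrainingArguments

args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=138,
    per_device_eval_batch_size=138,
    learning_rate=1e-6,
    weight_decay=0.01,
    num_train_epochs=20,
    lr_scheduler_type="constant",
    bf16=True,
    dataloader_drop_last=True,
)
# resume_from_checkpoint=True is typically passed to trainer.train(resume_from_checkpoint=True)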

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 138
  • per_device_eval_batch_size: 138
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-06
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 20
  • max_steps: -1
  • lr_scheduler_type: constant
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: True
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: 2
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: True
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: False
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
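
The TripletLoss citation below and the triplet-style evaluation suggest the model was trained on (anchor, positive, negative) triplets. A minimal, hypothetical wiring of the trainer under that assumption; the one-row dataset stands in for the real training data, which is not included in this card:

from datasets import Dataset
from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer
from sentence_transformers.losses import TripletLoss

model = SentenceTransformer("dbourget/philai-embeddings-2.0")

# Stand-in triplet dataset with anchor/positive/negative columns
train_dataset = Dataset.from_dict({
    "anchor": ["scientific revolutions"],
    "positive": ["paradigm shifts"],
    "negative": ["scientific realism"],
})

trainer = SentenceTransformerTrainer(
    model=model,
    # args=... could take the SentenceTransformerTrainingArguments sketched above
    train_dataset=train_dataset,
    loss=TripletLoss(model),
)
trainer.train()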

Training Logs

Epoch   Step   Training Loss   Validation Loss   beatai-dev_max_accuracy
0 0 - - 0.8308
0.1471 10 1.056 - -
0.2941 20 1.0992 - -
0.4412 30 1.1678 - -
0.5882 40 1.1586 - -
0.7353 50 1.1777 2.0793 0.8291
0.8824 60 1.1344 - -
1.0294 70 1.0578 - -
1.1765 80 1.0981 - -
1.3235 90 1.1216 - -
1.4706 100 1.0436 2.0826 0.8283
1.6176 110 1.0422 - -
1.7647 120 1.0857 - -
1.9118 130 1.0502 - -
2.0588 140 1.0363 - -
2.2059 150 1.081 2.0763 0.8316
2.3529 160 1.1764 - -
2.5 170 1.0393 - -
2.6471 180 0.9586 - -
2.7941 190 1.0537 - -
2.9412 200 1.0313 2.0645 0.8325
3.0882 210 1.0401 - -
3.2353 220 1.0389 - -
3.3824 230 1.0225 - -
3.5294 240 1.0131 - -
3.6765 250 0.9565 2.0705 0.8308
3.8235 260 1.0059 - -
3.9706 270 0.9629 - -
4.1176 280 0.9546 - -
4.2647 290 0.989 - -
4.4118 300 1.0573 2.0514 0.8375
4.5588 310 0.894 - -
4.7059 320 1.0082 - -
4.8529 330 0.969 - -
5.0 340 0.9187 - -
5.1471 350 0.9034 2.0663 0.8350
5.2941 360 0.9043 - -
5.4412 370 0.9517 - -
5.5882 380 1.0272 - -
5.7353 390 0.95 - -
5.8824 400 0.8288 2.0400 0.8367
6.0294 410 0.9809 - -
6.1765 420 0.8776 - -
6.3235 430 0.9744 - -
6.4706 440 0.9982 - -
6.6176 450 0.9076 2.0429 0.8350
6.7647 460 0.8792 - -
6.9118 470 0.787 - -
7.0588 480 0.9506 - -
7.2059 490 0.927 - -
7.3529 500 0.9464 2.0487 0.8316
7.5 510 0.886 - -
7.6471 520 0.9142 - -
7.7941 530 0.8741 - -
7.9412 540 0.8703 - -
8.0882 550 0.8947 2.0411 0.8333
8.2353 560 0.8742 - -
8.3824 570 0.8083 - -
8.5294 580 0.9134 - -
8.6765 590 0.8197 - -
8.8235 600 0.8253 2.0272 0.8367
8.9706 610 0.8665 - -
9.1176 620 0.8853 - -
9.2647 630 0.7566 - -
9.4118 640 0.9101 - -
9.5588 650 0.801 2.0243 0.8350
9.7059 660 0.8551 - -
9.8529 670 0.8748 - -
10.0 680 0.9798 - -
10.1471 690 1.0544 - -
10.2941 700 1.2077 2.0128 0.8367
10.4412 710 1.0386 - -
10.5882 720 1.0508 - -
10.7353 730 1.0063 - -
10.8824 740 1.0758 - -
11.0294 750 1.1552 2.0031 0.8367
11.1765 760 1.0259 - -
11.3235 770 1.0724 - -
11.4706 780 1.0524 - -
11.6176 790 0.9957 - -
11.7647 800 1.0697 2.0022 0.8367
11.9118 810 1.0544 - -
12.0588 820 1.0762 - -
12.2059 830 1.0858 - -
12.3529 840 1.0418 - -
12.5 850 1.0041 1.9936 0.8392
12.6471 860 0.998 - -
12.7941 870 1.0737 - -
12.9412 880 1.0637 - -
13.0882 890 0.9689 - -
13.2353 900 1.001 1.9818 0.8392
13.3824 910 1.0418 - -
13.5294 920 1.0097 - -
13.6765 930 1.0244 - -
13.8235 940 1.0383 - -
13.9706 950 1.034 1.9798 0.8367
14.1176 960 0.9609 - -
14.2647 970 1.049 - -
14.4118 980 1.0012 - -
14.5588 990 0.9008 - -
14.7059 1000 1.0131 1.9741 0.8384
14.8529 1010 0.9714 - -
15.0 1020 0.9987 - -
15.1471 1030 1.1139 - -
15.2941 1040 1.005 - -
15.4412 1050 0.9074 1.9761 0.8359
15.5882 1060 0.9298 - -
15.7353 1070 0.9335 - -
15.8824 1080 0.9445 - -
16.0294 1090 1.0087 - -
16.1765 1100 0.9187 1.9679 0.8384
16.3235 1110 0.8502 - -
16.4706 1120 0.9924 - -
16.6176 1130 0.9982 - -
16.7647 1140 0.9643 - -
16.9118 1150 0.9491 1.9727 0.8333
17.0588 1160 0.9801 - -
17.2059 1170 0.9374 - -
17.3529 1180 0.8309 - -
17.5 1190 0.9524 - -
17.6471 1200 0.886 1.9797 0.8350
17.7941 1210 0.9026 - -
17.9412 1220 0.8859 - -
18.0882 1230 0.8745 - -
18.2353 1240 0.9474 - -
18.3824 1250 0.878 1.9737 0.8342
18.5294 1260 0.8372 - -
18.6765 1270 0.833 - -
18.8235 1280 0.9648 - -
18.9706 1290 0.918 - -
19.1176 1300 0.9588 1.9669 0.8359
19.2647 1310 1.0334 - -
19.4118 1320 0.8347 - -
19.5588 1330 0.828 - -
19.7059 1340 0.9117 - -
19.8529 1350 0.9123 1.9666 0.8350
20.0 1360 0.8538 - -

Framework Versions

  • Python: 3.8.18
  • Sentence Transformers: 3.1.1
  • Transformers: 4.45.0
  • PyTorch: 1.13.1+cu117
  • Accelerate: 0.34.2
  • Datasets: 3.0.1
  • Tokenizers: 0.20.0
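
To reproduce this environment, the library versions above can be pinned at install time. A sketch, assuming a CUDA 11.7 build of PyTorch 1.13.1 is installed separately from the appropriate index:

pip install sentence-transformers==3.1.1 transformers==4.45.0 accelerate==0.34.2 datasets==3.0.1 tokenizers==0.20.0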

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

TripletLoss

@misc{hermans2017defense,
    title={In Defense of the Triplet Loss for Person Re-Identification},
    author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
    year={2017},
    eprint={1703.07737},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}