SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
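
The three modules run in sequence: the XLM-RoBERTa encoder produces token embeddings for inputs of up to 8192 tokens, the Pooling module keeps only the CLS token, and Normalize scales the result to unit length, so cosine similarity reduces to a dot product. Below is a minimal sketch of that pipeline using plain transformers; it loads the BAAI/bge-m3 base encoder as a stand-in for illustration, whereas in practice you would load the fine-tuned model as shown under Usage.

import torch
from transformers import AutoTokenizer, AutoModel

# Stand-in weights for illustration; the fine-tuned weights live in
# KatjaK/gnd_retriever_full
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
encoder = AutoModel.from_pretrained("BAAI/bge-m3")  # XLMRobertaModel

batch = tokenizer(["Das Silberkomplott"], padding=True, truncation=True,
                  max_length=8192, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state         # (0): Transformer
cls = hidden[:, 0]                                      # (1): CLS-token pooling
embedding = torch.nn.functional.normalize(cls, dim=-1) # (2): Normalize
print(embedding.shape)  # torch.Size([1, 1024])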

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("KatjaK/gnd_retriever_full")
# Run inference
sentences = [
    'Das Silberkomplott',
    'Manipulation',
    'Vergangenheitsbewältigung',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2744, 0.1445],
#         [0.2744, 1.0000, 0.0990],
#         [0.1445, 0.0990, 1.0000]])
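
Since the model was trained on pairs of titles and GND subject headings, a natural workflow is ranking candidate headings for a new title. A hedged example of that workflow follows; the candidate list is purely illustrative and not part of this card.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("KatjaK/gnd_retriever_full")

# Hypothetical shortlist of candidate subject headings
candidates = ["Manipulation", "Vergangenheitsbewältigung", "Kraftfahrzeugindustrie"]
candidate_embeddings = model.encode(candidates)

query_embedding = model.encode(["Das Silberkomplott"])
scores = model.similarity(query_embedding, candidate_embeddings)  # shape (1, 3)

# Print candidates from most to least similar
for idx in scores[0].argsort(descending=True):
    print(f"{scores[0][idx].item():.4f}  {candidates[int(idx)]}")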

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,627,253 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 20.23 tokens, max: 74 tokens
    • positive: string; min: 3 tokens, mean: 5.24 tokens, max: 20 tokens
  • Samples:
    anchor | positive
    Technikphilosophie zur Einführung | Technikphilosophie
    Anreizsysteme zur Steuerung der Hersteller-Händler-Beziehung in der Automobilindustrie | Kraftfahrzeugindustrie
    Anreizsysteme zur Steuerung der Hersteller-Händler-Beziehung in der Automobilindustrie | Beziehungsmanagement
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
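MultipleNegativesRankingLoss trains with in-batch negatives: for each (anchor, positive) pair, every other positive in the same batch serves as a negative, and a cross-entropy objective pushes the anchor's own positive to the top; the scale of 20.0 multiplies the cosine similarities before the softmax. A minimal sketch of how this loss is constructed in sentence-transformers, assuming the model is already loaded:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-m3")  # stand-in; training started from the base model
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,                  # temperature applied to the cosine similarities
    similarity_fct=util.cos_sim,
)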

Evaluation Dataset

Unnamed Dataset

  • Size: 3,203 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 22.29 tokens, max: 81 tokens
    • positive: string; min: 3 tokens, mean: 6.16 tokens, max: 26 tokens
  • Samples:
    anchor | positive
    Synökologische Studien zum simultanen Befall von Winterweizen (Triticum aestivum L.) mit Aphiden und getreidepathogenen Pilzen | Ernteertrag
    Synökologische Studien zum simultanen Befall von Winterweizen (Triticum aestivum L.) mit Aphiden und getreidepathogenen Pilzen | Phytopathogene Pilze
    Synökologische Studien zum simultanen Befall von Winterweizen (Triticum aestivum L.) mit Aphiden und getreidepathogenen Pilzen | Winterweizen
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 1e-05
  • num_train_epochs: 2
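
As a hedged reconstruction, these values plug into a sentence-transformers training run roughly as follows; the output directory and the tiny placeholder dataset are illustrative, not taken from this card.

from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments, losses)

model = SentenceTransformer("BAAI/bge-m3")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

# Placeholder dataset with the anchor/positive column layout described above
train_dataset = Dataset.from_dict({
    "anchor": ["Technikphilosophie zur Einführung"],
    "positive": ["Technikphilosophie"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="gnd_retriever_full",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=1e-5,
    num_train_epochs=2,
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset,
                                     eval_dataset=train_dataset, loss=loss)
trainer.train()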

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss
0.0061 500 1.1036 -
0.0122 1000 1.0041 1.0189
0.0183 1500 0.945 -
0.0244 2000 0.9385 0.9852
0.0304 2500 0.9184 -
0.0365 3000 0.8971 0.9426
0.0426 3500 0.8749 -
0.0487 4000 0.8655 0.9245
0.0548 4500 0.8616 -
0.0609 5000 0.8459 0.9042
0.0670 5500 0.8372 -
0.0731 6000 0.8311 0.9032
0.0792 6500 0.8385 -
0.0853 7000 0.8295 0.8817
0.0913 7500 0.824 -
0.0974 8000 0.8309 0.8769
0.1035 8500 0.8093 -
0.1096 9000 0.8038 0.8593
0.1157 9500 0.7933 -
0.1218 10000 0.7978 0.8567
0.1279 10500 0.7832 -
0.1340 11000 0.7789 0.8536
0.1401 11500 0.784 -
0.1462 12000 0.783 0.8428
0.1522 12500 0.7695 -
0.1583 13000 0.7805 0.8412
0.1644 13500 0.7727 -
0.1705 14000 0.7642 0.8276
0.1766 14500 0.7578 -
0.1827 15000 0.7555 0.8285
0.1888 15500 0.759 -
0.1949 16000 0.7464 0.8125
0.2010 16500 0.7317 -
0.2071 17000 0.7341 0.8087
0.2131 17500 0.7564 -
0.2192 18000 0.7329 0.8105
0.2253 18500 0.7266 -
0.2314 19000 0.7404 0.8094
0.2375 19500 0.7334 -
0.2436 20000 0.7436 0.8065
0.2497 20500 0.7453 -
0.2558 21000 0.7201 0.7896
0.2619 21500 0.7223 -
0.2680 22000 0.7183 0.7864
0.2740 22500 0.7097 -
0.2801 23000 0.7132 0.7980
0.2862 23500 0.7107 -
0.2923 24000 0.7217 0.7940
0.2984 24500 0.7019 -
0.3045 25000 0.7183 0.7903
0.3106 25500 0.6922 -
0.3167 26000 0.7096 0.7818
0.3228 26500 0.7062 -
0.3289 27000 0.7184 0.7869
0.3349 27500 0.7002 -
0.3410 28000 0.708 0.7813
0.3471 28500 0.7117 -
0.3532 29000 0.7128 0.7715
0.3593 29500 0.7046 -
0.3654 30000 0.6814 0.7755
0.3715 30500 0.6898 -
0.3776 31000 0.6773 0.7884
0.3837 31500 0.6991 -
0.3898 32000 0.703 0.7697
0.3958 32500 0.688 -
0.4019 33000 0.7101 0.7813
0.4080 33500 0.6873 -
0.4141 34000 0.6866 0.7658
0.4202 34500 0.6803 -
0.4263 35000 0.6748 0.7574
0.4324 35500 0.6844 -
0.4385 36000 0.6719 0.7483
0.4446 36500 0.6738 -
0.4507 37000 0.6798 0.7524
0.4567 37500 0.6834 -
0.4628 38000 0.6748 0.7434
0.4689 38500 0.6711 -
0.4750 39000 0.6748 0.7425
0.4811 39500 0.6813 -
0.4872 40000 0.6721 0.7470
0.4933 40500 0.6537 -
0.4994 41000 0.6783 0.7540
0.5055 41500 0.6691 -
0.5116 42000 0.6426 0.7547
0.5176 42500 0.6608 -
0.5237 43000 0.6612 0.7517
0.5298 43500 0.6551 -
0.5359 44000 0.6578 0.7391
0.5420 44500 0.6557 -
0.5481 45000 0.6421 0.7398
0.5542 45500 0.6672 -
0.5603 46000 0.6511 0.7325
0.5664 46500 0.6568 -
0.5725 47000 0.673 0.7238
0.5785 47500 0.6648 -
0.5846 48000 0.6465 0.7280
0.5907 48500 0.6683 -
0.5968 49000 0.6533 0.7261
0.6029 49500 0.661 -
0.6090 50000 0.647 0.7210
0.6151 50500 0.6554 -
0.6212 51000 0.6426 0.7165
0.6273 51500 0.6527 -
0.6334 52000 0.6427 0.7204
0.6394 52500 0.643 -
0.6455 53000 0.6528 0.7115
0.6516 53500 0.6266 -
0.6577 54000 0.6498 0.7143
0.6638 54500 0.6542 -
0.6699 55000 0.631 0.7141
0.6760 55500 0.6421 -
0.6821 56000 0.6457 0.7107
0.6882 56500 0.646 -
0.6943 57000 0.6483 0.7102
0.7003 57500 0.6531 -
0.7064 58000 0.6436 0.7127
0.7125 58500 0.6177 -
0.7186 59000 0.635 0.7073
0.7247 59500 0.6388 -
0.7308 60000 0.6205 0.7067
0.7369 60500 0.6121 -
0.7430 61000 0.6337 0.7020
0.7491 61500 0.6239 -
0.7552 62000 0.6306 0.7058
0.7612 62500 0.6188 -
0.7673 63000 0.6152 0.7022
0.7734 63500 0.6255 -
0.7795 64000 0.6115 0.7012
0.7856 64500 0.6536 -
0.7917 65000 0.6188 0.6899
0.7978 65500 0.6255 -
0.8039 66000 0.6182 0.6920
0.8100 66500 0.6278 -
0.8161 67000 0.6204 0.6921
0.8221 67500 0.6281 -
0.8282 68000 0.6265 0.6890
0.8343 68500 0.624 -
0.8404 69000 0.6067 0.6973
0.8465 69500 0.6199 -
0.8526 70000 0.6195 0.6841
0.8587 70500 0.6272 -
0.8648 71000 0.6224 0.6851
0.8709 71500 0.6326 -
0.8770 72000 0.607 0.6747
0.8830 72500 0.612 -
0.8891 73000 0.6187 0.6717
0.8952 73500 0.6094 -
0.9013 74000 0.6112 0.6811
0.9074 74500 0.6212 -
0.9135 75000 0.5992 0.6767
0.9196 75500 0.6206 -
0.9257 76000 0.6099 0.6853
0.9318 76500 0.6108 -
0.9379 77000 0.6037 0.6767
0.9439 77500 0.6055 -
0.9500 78000 0.5952 0.6811
0.9561 78500 0.5947 -
0.9622 79000 0.6082 0.6704
0.9683 79500 0.6037 -
0.9744 80000 0.604 0.6717
0.9805 80500 0.6034 -
0.9866 81000 0.6034 0.6776
0.9927 81500 0.5965 -
0.9988 82000 0.6094 0.6748
1.0048 82500 0.5564 -
1.0109 83000 0.5471 0.6782
1.0170 83500 0.5518 -
1.0231 84000 0.5467 0.6738
1.0292 84500 0.5582 -
1.0353 85000 0.5394 0.6714
1.0414 85500 0.5395 -
1.0475 86000 0.5561 0.6668
1.0536 86500 0.5438 -
1.0597 87000 0.5488 0.6615
1.0657 87500 0.5347 -
1.0718 88000 0.5331 0.6616
1.0779 88500 0.5454 -
1.0840 89000 0.5442 0.6622
1.0901 89500 0.5535 -
1.0962 90000 0.5321 0.6612
1.1023 90500 0.5432 -
1.1084 91000 0.5418 0.6635
1.1145 91500 0.5308 -
1.1206 92000 0.5555 0.6514
1.1266 92500 0.5342 -
1.1327 93000 0.5321 0.6592
1.1388 93500 0.5482 -
1.1449 94000 0.5275 0.6525
1.1510 94500 0.5478 -
1.1571 95000 0.5343 0.6516
1.1632 95500 0.5391 -
1.1693 96000 0.5403 0.6463
1.1754 96500 0.5293 -
1.1815 97000 0.5375 0.6542
1.1875 97500 0.5463 -
1.1936 98000 0.529 0.6528
1.1997 98500 0.5377 -
1.2058 99000 0.5329 0.6534
1.2119 99500 0.5572 -
1.2180 100000 0.5323 0.6532
1.2241 100500 0.5323 -
1.2302 101000 0.5412 0.6651
1.2363 101500 0.546 -
1.2424 102000 0.5367 0.6606
1.2484 102500 0.5371 -
1.2545 103000 0.5369 0.6571
1.2606 103500 0.5331 -
1.2667 104000 0.5362 0.6483
1.2728 104500 0.532 -
1.2789 105000 0.5405 0.6535
1.2850 105500 0.5205 -
1.2911 106000 0.5378 0.6550
1.2972 106500 0.5392 -
1.3033 107000 0.5261 0.6504
1.3093 107500 0.533 -
1.3154 108000 0.5384 0.6575
1.3215 108500 0.5239 -
1.3276 109000 0.5311 0.6509
1.3337 109500 0.5288 -
1.3398 110000 0.5253 0.6550
1.3459 110500 0.5305 -
1.3520 111000 0.507 0.6527
1.3581 111500 0.5217 -
1.3642 112000 0.541 0.6499
1.3702 112500 0.5226 -
1.3763 113000 0.5337 0.6497
1.3824 113500 0.5275 -
1.3885 114000 0.538 0.6495
1.3946 114500 0.5209 -
1.4007 115000 0.5345 0.6466
1.4068 115500 0.5355 -
1.4129 116000 0.5451 0.6465
1.4190 116500 0.5125 -
1.4251 117000 0.5345 0.6463
1.4311 117500 0.5119 -
1.4372 118000 0.5165 0.6444
1.4433 118500 0.5189 -
1.4494 119000 0.537 0.6451
1.4555 119500 0.5273 -
1.4616 120000 0.5187 0.6447
1.4677 120500 0.536 -
1.4738 121000 0.5301 0.6406
1.4799 121500 0.5291 -
1.4860 122000 0.5211 0.6359
1.4920 122500 0.5175 -
1.4981 123000 0.5341 0.6300
1.5042 123500 0.5227 -
1.5103 124000 0.517 0.6311
1.5164 124500 0.5062 -
1.5225 125000 0.5127 0.6346
1.5286 125500 0.535 -
1.5347 126000 0.5159 0.6302
1.5408 126500 0.5301 -
1.5469 127000 0.5197 0.6301
1.5529 127500 0.5195 -
1.5590 128000 0.5197 0.6274
1.5651 128500 0.5205 -
1.5712 129000 0.5141 0.6268
1.5773 129500 0.5255 -
1.5834 130000 0.517 0.6226
1.5895 130500 0.5204 -
1.5956 131000 0.527 0.6200
1.6017 131500 0.5233 -
1.6078 132000 0.5211 0.6229
1.6138 132500 0.5083 -
1.6199 133000 0.517 0.6215
1.6260 133500 0.5192 -
1.6321 134000 0.5114 0.6244
1.6382 134500 0.5147 -
1.6443 135000 0.5197 0.6247
1.6504 135500 0.5212 -
1.6565 136000 0.5234 0.6252
1.6626 136500 0.5269 -
1.6687 137000 0.5144 0.6223
1.6747 137500 0.509 -
1.6808 138000 0.5164 0.6194
1.6869 138500 0.5196 -
1.6930 139000 0.5101 0.6202
1.6991 139500 0.5192 -
1.7052 140000 0.5083 0.6195
1.7113 140500 0.512 -
1.7174 141000 0.504 0.6232
1.7235 141500 0.5175 -
1.7296 142000 0.5149 0.6221
1.7356 142500 0.5167 -
1.7417 143000 0.5168 0.6197
1.7478 143500 0.51 -
1.7539 144000 0.5107 0.6176
1.7600 144500 0.5005 -
1.7661 145000 0.5058 0.6195
1.7722 145500 0.5062 -
1.7783 146000 0.5032 0.6168
1.7844 146500 0.5311 -
1.7905 147000 0.5016 0.6173
1.7965 147500 0.5205 -
1.8026 148000 0.4971 0.6163
1.8087 148500 0.5121 -
1.8148 149000 0.5188 0.6145
1.8209 149500 0.5077 -
1.8270 150000 0.5213 0.6146
1.8331 150500 0.5133 -
1.8392 151000 0.5071 0.6118
1.8453 151500 0.5097 -
1.8514 152000 0.5151 0.6123
1.8574 152500 0.5158 -
1.8635 153000 0.5124 0.6130
1.8696 153500 0.5042 -
1.8757 154000 0.498 0.6138
1.8818 154500 0.5159 -
1.8879 155000 0.5023 0.6127
1.8940 155500 0.5031 -
1.9001 156000 0.4981 0.6140
1.9062 156500 0.5078 -
1.9123 157000 0.507 0.6144
1.9183 157500 0.4967 -
1.9244 158000 0.5215 0.6127
1.9305 158500 0.5104 -
1.9366 159000 0.5171 0.6134
1.9427 159500 0.512 -
1.9488 160000 0.5088 0.6122
1.9549 160500 0.4961 -
1.9610 161000 0.5056 0.6119
1.9671 161500 0.508 -
1.9732 162000 0.5119 0.6121
1.9792 162500 0.5002 -
1.9853 163000 0.51 0.6119
1.9914 163500 0.4835 -
1.9975 164000 0.5014 0.6118

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}