SentenceTransformer based on BAAI/bge-m3

This is a sentence-transformers model finetuned from BAAI/bge-m3. It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://www.sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False, 'architecture': 'XLMRobertaModel'})
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
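
The three modules run in sequence: the XLM-RoBERTa encoder produces token embeddings for inputs of up to 8192 tokens, the Pooling module keeps only the CLS token, and Normalize scales the result to unit length, so cosine similarity reduces to a dot product. Below is a minimal sketch of that pipeline using plain transformers; it loads the BAAI/bge-m3 base encoder as a stand-in for illustration, whereas in practice you would load the fine-tuned model as shown under Usage.

import torch
from transformers import AutoTokenizer, AutoModel

# Stand-in weights for illustration; the fine-tuned weights live in
# KatjaK/gnd_retriever_full
tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-m3")
encoder = AutoModel.from_pretrained("BAAI/bge-m3")  # XLMRobertaModel

batch = tokenizer(["Das Silberkomplott"], padding=True, truncation=True,
                  max_length=8192, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state         # (0): Transformer
cls = hidden[:, 0]                                      # (1): CLS-token pooling
embedding = torch.nn.functional.normalize(cls, dim=-1) # (2): Normalize
print(embedding.shape)  # torch.Size([1, 1024])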

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("KatjaK/gnd_retriever_full")
# Run inference
sentences = [
    'Das Silberkomplott',
    'Manipulation',
    'Vergangenheitsbewältigung',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 1024)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities)
# tensor([[1.0000, 0.2744, 0.1445],
#         [0.2744, 1.0000, 0.0990],
#         [0.1445, 0.0990, 1.0000]])
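
Since the model was trained on pairs of titles and GND subject headings, a natural workflow is ranking candidate headings for a new title. A hedged example of that workflow follows; the candidate list is purely illustrative and not part of this card.

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("KatjaK/gnd_retriever_full")

# Hypothetical shortlist of candidate subject headings
candidates = ["Manipulation", "Vergangenheitsbewältigung", "Kraftfahrzeugindustrie"]
candidate_embeddings = model.encode(candidates)

query_embedding = model.encode(["Das Silberkomplott"])
scores = model.similarity(query_embedding, candidate_embeddings)  # shape (1, 3)

# Print candidates from most to least similar
for idx in scores[0].argsort(descending=True):
    print(f"{scores[0][idx].item():.4f}  {candidates[int(idx)]}")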

Training Details

Training Dataset

Unnamed Dataset

  • Size: 2,627,253 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 20.23 tokens, max: 74 tokens
    • positive: string; min: 3 tokens, mean: 5.24 tokens, max: 20 tokens
  • Samples:
    anchor | positive
    Technikphilosophie zur Einführung | Technikphilosophie
    Anreizsysteme zur Steuerung der Hersteller-Händler-Beziehung in der Automobilindustrie | Kraftfahrzeugindustrie
    Anreizsysteme zur Steuerung der Hersteller-Händler-Beziehung in der Automobilindustrie | Beziehungsmanagement
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    
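MultipleNegativesRankingLoss trains with in-batch negatives: for each (anchor, positive) pair, every other positive in the same batch serves as a negative, and a cross-entropy objective pushes the anchor's own positive to the top; the scale of 20.0 multiplies the cosine similarities before the softmax. A minimal sketch of how this loss is constructed in sentence-transformers, assuming the model is already loaded:

from sentence_transformers import SentenceTransformer, losses, util

model = SentenceTransformer("BAAI/bge-m3")  # stand-in; training started from the base model
loss = losses.MultipleNegativesRankingLoss(
    model,
    scale=20.0,                  # temperature applied to the cosine similarities
    similarity_fct=util.cos_sim,
)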

Evaluation Dataset

Unnamed Dataset

  • Size: 3,203 evaluation samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string; min: 3 tokens, mean: 22.29 tokens, max: 81 tokens
    • positive: string; min: 3 tokens, mean: 6.16 tokens, max: 26 tokens
  • Samples:
    anchor | positive
    Synökologische Studien zum simultanen Befall von Winterweizen (Triticum aestivum L.) mit Aphiden und getreidepathogenen Pilzen | Ernteertrag
    Synökologische Studien zum simultanen Befall von Winterweizen (Triticum aestivum L.) mit Aphiden und getreidepathogenen Pilzen | Phytopathogene Pilze
    Synökologische Studien zum simultanen Befall von Winterweizen (Triticum aestivum L.) mit Aphiden und getreidepathogenen Pilzen | Winterweizen
  • Loss: MultipleNegativesRankingLoss with these parameters:
    {
        "scale": 20.0,
        "similarity_fct": "cos_sim"
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • learning_rate: 1e-05
  • num_train_epochs: 2
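
As a hedged reconstruction, these values plug into a sentence-transformers training run roughly as follows; the output directory and the tiny placeholder dataset are illustrative, not taken from this card.

from datasets import Dataset
from sentence_transformers import (SentenceTransformer, SentenceTransformerTrainer,
                                   SentenceTransformerTrainingArguments, losses)

model = SentenceTransformer("BAAI/bge-m3")
loss = losses.MultipleNegativesRankingLoss(model, scale=20.0)

# Placeholder dataset with the anchor/positive column layout described above
train_dataset = Dataset.from_dict({
    "anchor": ["Technikphilosophie zur Einführung"],
    "positive": ["Technikphilosophie"],
})

args = SentenceTransformerTrainingArguments(
    output_dir="gnd_retriever_full",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    learning_rate=1e-5,
    num_train_epochs=2,
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset,
                                     eval_dataset=train_dataset, loss=loss)
trainer.train()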

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 32
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 1e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 2
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss
0.0061 500 1.1036 -
0.0122 1000 1.0041 1.0189
0.0183 1500 0.945 -
0.0244 2000 0.9385 0.9852
0.0304 2500 0.9184 -
0.0365 3000 0.8971 0.9426
0.0426 3500 0.8749 -
0.0487 4000 0.8655 0.9245
0.0548 4500 0.8616 -
0.0609 5000 0.8459 0.9042
0.0670 5500 0.8372 -
0.0731 6000 0.8311 0.9032
0.0792 6500 0.8385 -
0.0853 7000 0.8295 0.8817
0.0913 7500 0.824 -
0.0974 8000 0.8309 0.8769
0.1035 8500 0.8093 -
0.1096 9000 0.8038 0.8593
0.1157 9500 0.7933 -
0.1218 10000 0.7978 0.8567
0.1279 10500 0.7832 -
0.1340 11000 0.7789 0.8536
0.1401 11500 0.784 -
0.1462 12000 0.783 0.8428
0.1522 12500 0.7695 -
0.1583 13000 0.7805 0.8412
0.1644 13500 0.7727 -
0.1705 14000 0.7642 0.8276
0.1766 14500 0.7578 -
0.1827 15000 0.7555 0.8285
0.1888 15500 0.759 -
0.1949 16000 0.7464 0.8125
0.2010 16500 0.7317 -
0.2071 17000 0.7341 0.8087
0.2131 17500 0.7564 -
0.2192 18000 0.7329 0.8105
0.2253 18500 0.7266 -
0.2314 19000 0.7404 0.8094
0.2375 19500 0.7334 -
0.2436 20000 0.7436 0.8065
0.2497 20500 0.7453 -
0.2558 21000 0.7201 0.7896
0.2619 21500 0.7223 -
0.2680 22000 0.7183 0.7864
0.2740 22500 0.7097 -
0.2801 23000 0.7132 0.7980
0.2862 23500 0.7107 -
0.2923 24000 0.7217 0.7940
0.2984 24500 0.7019 -
0.3045 25000 0.7183 0.7903
0.3106 25500 0.6922 -
0.3167 26000 0.7096 0.7818
0.3228 26500 0.7062 -
0.3289 27000 0.7184 0.7869
0.3349 27500 0.7002 -
0.3410 28000 0.708 0.7813
0.3471 28500 0.7117 -
0.3532 29000 0.7128 0.7715
0.3593 29500 0.7046 -
0.3654 30000 0.6814 0.7755
0.3715 30500 0.6898 -
0.3776 31000 0.6773 0.7884
0.3837 31500 0.6991 -
0.3898 32000 0.703 0.7697
0.3958 32500 0.688 -
0.4019 33000 0.7101 0.7813
0.4080 33500 0.6873 -
0.4141 34000 0.6866 0.7658
0.4202 34500 0.6803 -
0.4263 35000 0.6748 0.7574
0.4324 35500 0.6844 -
0.4385 36000 0.6719 0.7483
0.4446 36500 0.6738 -
0.4507 37000 0.6798 0.7524
0.4567 37500 0.6834 -
0.4628 38000 0.6748 0.7434
0.4689 38500 0.6711 -
0.4750 39000 0.6748 0.7425
0.4811 39500 0.6813 -
0.4872 40000 0.6721 0.7470
0.4933 40500 0.6537 -
0.4994 41000 0.6783 0.7540
0.5055 41500 0.6691 -
0.5116 42000 0.6426 0.7547
0.5176 42500 0.6608 -
0.5237 43000 0.6612 0.7517
0.5298 43500 0.6551 -
0.5359 44000 0.6578 0.7391
0.5420 44500 0.6557 -
0.5481 45000 0.6421 0.7398
0.5542 45500 0.6672 -
0.5603 46000 0.6511 0.7325
0.5664 46500 0.6568 -
0.5725 47000 0.673 0.7238
0.5785 47500 0.6648 -
0.5846 48000 0.6465 0.7280
0.5907 48500 0.6683 -
0.5968 49000 0.6533 0.7261
0.6029 49500 0.661 -
0.6090 50000 0.647 0.7210
0.6151 50500 0.6554 -
0.6212 51000 0.6426 0.7165
0.6273 51500 0.6527 -
0.6334 52000 0.6427 0.7204
0.6394 52500 0.643 -
0.6455 53000 0.6528 0.7115
0.6516 53500 0.6266 -
0.6577 54000 0.6498 0.7143
0.6638 54500 0.6542 -
0.6699 55000 0.631 0.7141
0.6760 55500 0.6421 -
0.6821 56000 0.6457 0.7107
0.6882 56500 0.646 -
0.6943 57000 0.6483 0.7102
0.7003 57500 0.6531 -
0.7064 58000 0.6436 0.7127
0.7125 58500 0.6177 -
0.7186 59000 0.635 0.7073
0.7247 59500 0.6388 -
0.7308 60000 0.6205 0.7067
0.7369 60500 0.6121 -
0.7430 61000 0.6337 0.7020
0.7491 61500 0.6239 -
0.7552 62000 0.6306 0.7058
0.7612 62500 0.6188 -
0.7673 63000 0.6152 0.7022
0.7734 63500 0.6255 -
0.7795 64000 0.6115 0.7012
0.7856 64500 0.6536 -
0.7917 65000 0.6188 0.6899
0.7978 65500 0.6255 -
0.8039 66000 0.6182 0.6920
0.8100 66500 0.6278 -
0.8161 67000 0.6204 0.6921
0.8221 67500 0.6281 -
0.8282 68000 0.6265 0.6890
0.8343 68500 0.624 -
0.8404 69000 0.6067 0.6973
0.8465 69500 0.6199 -
0.8526 70000 0.6195 0.6841
0.8587 70500 0.6272 -
0.8648 71000 0.6224 0.6851
0.8709 71500 0.6326 -
0.8770 72000 0.607 0.6747
0.8830 72500 0.612 -
0.8891 73000 0.6187 0.6717
0.8952 73500 0.6094 -
0.9013 74000 0.6112 0.6811
0.9074 74500 0.6212 -
0.9135 75000 0.5992 0.6767
0.9196 75500 0.6206 -
0.9257 76000 0.6099 0.6853
0.9318 76500 0.6108 -
0.9379 77000 0.6037 0.6767
0.9439 77500 0.6055 -
0.9500 78000 0.5952 0.6811
0.9561 78500 0.5947 -
0.9622 79000 0.6082 0.6704
0.9683 79500 0.6037 -
0.9744 80000 0.604 0.6717
0.9805 80500 0.6034 -
0.9866 81000 0.6034 0.6776
0.9927 81500 0.5965 -
0.9988 82000 0.6094 0.6748
1.0048 82500 0.5564 -
1.0109 83000 0.5471 0.6782
1.0170 83500 0.5518 -
1.0231 84000 0.5467 0.6738
1.0292 84500 0.5582 -
1.0353 85000 0.5394 0.6714
1.0414 85500 0.5395 -
1.0475 86000 0.5561 0.6668
1.0536 86500 0.5438 -
1.0597 87000 0.5488 0.6615
1.0657 87500 0.5347 -
1.0718 88000 0.5331 0.6616
1.0779 88500 0.5454 -
1.0840 89000 0.5442 0.6622
1.0901 89500 0.5535 -
1.0962 90000 0.5321 0.6612
1.1023 90500 0.5432 -
1.1084 91000 0.5418 0.6635
1.1145 91500 0.5308 -
1.1206 92000 0.5555 0.6514
1.1266 92500 0.5342 -
1.1327 93000 0.5321 0.6592
1.1388 93500 0.5482 -
1.1449 94000 0.5275 0.6525
1.1510 94500 0.5478 -
1.1571 95000 0.5343 0.6516
1.1632 95500 0.5391 -
1.1693 96000 0.5403 0.6463
1.1754 96500 0.5293 -
1.1815 97000 0.5375 0.6542
1.1875 97500 0.5463 -
1.1936 98000 0.529 0.6528
1.1997 98500 0.5377 -
1.2058 99000 0.5329 0.6534
1.2119 99500 0.5572 -
1.2180 100000 0.5323 0.6532
1.2241 100500 0.5323 -
1.2302 101000 0.5412 0.6651
1.2363 101500 0.546 -
1.2424 102000 0.5367 0.6606
1.2484 102500 0.5371 -
1.2545 103000 0.5369 0.6571
1.2606 103500 0.5331 -
1.2667 104000 0.5362 0.6483
1.2728 104500 0.532 -
1.2789 105000 0.5405 0.6535
1.2850 105500 0.5205 -
1.2911 106000 0.5378 0.6550
1.2972 106500 0.5392 -
1.3033 107000 0.5261 0.6504
1.3093 107500 0.533 -
1.3154 108000 0.5384 0.6575
1.3215 108500 0.5239 -
1.3276 109000 0.5311 0.6509
1.3337 109500 0.5288 -
1.3398 110000 0.5253 0.6550
1.3459 110500 0.5305 -
1.3520 111000 0.507 0.6527
1.3581 111500 0.5217 -
1.3642 112000 0.541 0.6499
1.3702 112500 0.5226 -
1.3763 113000 0.5337 0.6497
1.3824 113500 0.5275 -
1.3885 114000 0.538 0.6495
1.3946 114500 0.5209 -
1.4007 115000 0.5345 0.6466
1.4068 115500 0.5355 -
1.4129 116000 0.5451 0.6465
1.4190 116500 0.5125 -
1.4251 117000 0.5345 0.6463
1.4311 117500 0.5119 -
1.4372 118000 0.5165 0.6444
1.4433 118500 0.5189 -
1.4494 119000 0.537 0.6451
1.4555 119500 0.5273 -
1.4616 120000 0.5187 0.6447
1.4677 120500 0.536 -
1.4738 121000 0.5301 0.6406
1.4799 121500 0.5291 -
1.4860 122000 0.5211 0.6359
1.4920 122500 0.5175 -
1.4981 123000 0.5341 0.6300
1.5042 123500 0.5227 -
1.5103 124000 0.517 0.6311
1.5164 124500 0.5062 -
1.5225 125000 0.5127 0.6346
1.5286 125500 0.535 -
1.5347 126000 0.5159 0.6302
1.5408 126500 0.5301 -
1.5469 127000 0.5197 0.6301
1.5529 127500 0.5195 -
1.5590 128000 0.5197 0.6274
1.5651 128500 0.5205 -
1.5712 129000 0.5141 0.6268
1.5773 129500 0.5255 -
1.5834 130000 0.517 0.6226
1.5895 130500 0.5204 -
1.5956 131000 0.527 0.6200
1.6017 131500 0.5233 -
1.6078 132000 0.5211 0.6229
1.6138 132500 0.5083 -
1.6199 133000 0.517 0.6215
1.6260 133500 0.5192 -
1.6321 134000 0.5114 0.6244
1.6382 134500 0.5147 -
1.6443 135000 0.5197 0.6247
1.6504 135500 0.5212 -
1.6565 136000 0.5234 0.6252
1.6626 136500 0.5269 -
1.6687 137000 0.5144 0.6223
1.6747 137500 0.509 -
1.6808 138000 0.5164 0.6194
1.6869 138500 0.5196 -
1.6930 139000 0.5101 0.6202
1.6991 139500 0.5192 -
1.7052 140000 0.5083 0.6195
1.7113 140500 0.512 -
1.7174 141000 0.504 0.6232
1.7235 141500 0.5175 -
1.7296 142000 0.5149 0.6221
1.7356 142500 0.5167 -
1.7417 143000 0.5168 0.6197
1.7478 143500 0.51 -
1.7539 144000 0.5107 0.6176
1.7600 144500 0.5005 -
1.7661 145000 0.5058 0.6195
1.7722 145500 0.5062 -
1.7783 146000 0.5032 0.6168
1.7844 146500 0.5311 -
1.7905 147000 0.5016 0.6173
1.7965 147500 0.5205 -
1.8026 148000 0.4971 0.6163
1.8087 148500 0.5121 -
1.8148 149000 0.5188 0.6145
1.8209 149500 0.5077 -
1.8270 150000 0.5213 0.6146
1.8331 150500 0.5133 -
1.8392 151000 0.5071 0.6118
1.8453 151500 0.5097 -
1.8514 152000 0.5151 0.6123
1.8574 152500 0.5158 -
1.8635 153000 0.5124 0.6130
1.8696 153500 0.5042 -
1.8757 154000 0.498 0.6138
1.8818 154500 0.5159 -
1.8879 155000 0.5023 0.6127
1.8940 155500 0.5031 -
1.9001 156000 0.4981 0.6140
1.9062 156500 0.5078 -
1.9123 157000 0.507 0.6144
1.9183 157500 0.4967 -
1.9244 158000 0.5215 0.6127
1.9305 158500 0.5104 -
1.9366 159000 0.5171 0.6134
1.9427 159500 0.512 -
1.9488 160000 0.5088 0.6122
1.9549 160500 0.4961 -
1.9610 161000 0.5056 0.6119
1.9671 161500 0.508 -
1.9732 162000 0.5119 0.6121
1.9792 162500 0.5002 -
1.9853 163000 0.51 0.6119
1.9914 163500 0.4835 -
1.9975 164000 0.5014 0.6118

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.0.0
  • Transformers: 4.53.0
  • PyTorch: 2.7.1+cu126
  • Accelerate: 1.8.1
  • Datasets: 3.6.0
  • Tokenizers: 0.21.2

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}