BGE base banking-domain

This is a sentence-transformers model fine-tuned from BAAI/bge-m3 on the json dataset. It maps sentences and paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-m3
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 1024 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: vi (Vietnamese)
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: XLMRobertaModel 
  (1): Pooling({'word_embedding_dimension': 1024, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
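
The pipeline above encodes with XLM-RoBERTa, pools the CLS token, and L2-normalizes the output, so every embedding has unit length and the dot product of two embeddings equals their cosine similarity. A quick, illustrative check (the probe sentence is arbitrary):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("splendor1811/BGE-base-banking-ONE")
print(model.get_sentence_embedding_dimension())  # 1024
print(model.max_seq_length)                      # 8192

# Because of the final Normalize() module, embeddings come out unit-norm
emb = model.encode("kiểm tra số dư tài khoản")   # arbitrary Vietnamese probe sentence
print(np.linalg.norm(emb))                       # ~1.0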

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("splendor1811/BGE-base-banking-ONE")
# Run inference
sentences = [
    'Các giao dịch sử dụng thẻ tín dụng của tôi ',  # "My credit card transactions"
    'Hướng dẫn xem lịch sử ',  # "Guide to viewing history"
    # Warranty notice for a speaker (12-month warranty, Phong Vũ service centres)
    'Thiết bị loa sẽ được bảo hành trong 12 tháng. Nếu có vấn đề về sản phẩm trong quá trình sử dụng, Bạn vui lòng đến Trung Tâm Bảo Hành Phong Vũ gần nhất hoặc liên hệ hotline: 1800 6865 để được hỗ trợ bảo hành.\nThông tin về cửa hàng bảo hành Phong Vũ như sau:\n+ Miền Bắc: Tầng 3, số 62 Trần Đại Nghĩa, Phường Đồng Tâm, Quận Hai Bà Trưng, TP. Hà Nội.\n+ Miền Nam: 132E Cách Mạng Tháng 8, Phường 9, Quận 3, TP. Hồ Chí Minh.\n+ Miền Trung: Tầng 2, 14-16-18 Nguyễn Văn Linh, Phường Nam Dương, Quận Hải Châu, TP. Đà Nẵng.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 1024]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
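
Because the model was trained with MatryoshkaLoss (see Training Details), embeddings can also be used at the smaller dimensions 768, 512, 256, or 128, trading a little retrieval quality (see the per-dimension metrics below) for faster search and smaller indexes. A minimal sketch, assuming the truncate_dim argument of SentenceTransformer:

from sentence_transformers import SentenceTransformer

# Load the same model but truncate embeddings to 256 dimensions
model_256 = SentenceTransformer("splendor1811/BGE-base-banking-ONE", truncate_dim=256)

embeddings = model_256.encode([
    "Các giao dịch sử dụng thẻ tín dụng của tôi",
    "Hướng dẫn xem lịch sử",
])
print(embeddings.shape)
# [2, 256]

# Similarity works the same way on truncated embeddings
print(model_256.similarity(embeddings, embeddings).shape)
# [2, 2]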

Evaluation

Metrics

Information Retrieval (dim_1024)

Metric Value
cosine_accuracy@1 0.6794
cosine_accuracy@3 0.6794
cosine_accuracy@5 0.6794
cosine_accuracy@10 0.7495
cosine_precision@1 0.6794
cosine_precision@3 0.6794
cosine_precision@5 0.6794
cosine_precision@10 0.6476
cosine_recall@1 0.0645
cosine_recall@3 0.1935
cosine_recall@5 0.3226
cosine_recall@10 0.6094
cosine_ndcg@10 0.6841
cosine_mrr@10 0.6864
cosine_map@100 0.7415

Information Retrieval (dim_768)

Metric Value
cosine_accuracy@1 0.6833
cosine_accuracy@3 0.6833
cosine_accuracy@5 0.6833
cosine_accuracy@10 0.7502
cosine_precision@1 0.6833
cosine_precision@3 0.6833
cosine_precision@5 0.6833
cosine_precision@10 0.6506
cosine_recall@1 0.065
cosine_recall@3 0.1951
cosine_recall@5 0.3252
cosine_recall@10 0.6137
cosine_ndcg@10 0.6878
cosine_mrr@10 0.69
cosine_map@100 0.7446

Information Retrieval (dim_512)

Metric Value
cosine_accuracy@1 0.6742
cosine_accuracy@3 0.6742
cosine_accuracy@5 0.6742
cosine_accuracy@10 0.745
cosine_precision@1 0.6742
cosine_precision@3 0.6742
cosine_precision@5 0.6742
cosine_precision@10 0.6426
cosine_recall@1 0.064
cosine_recall@3 0.1919
cosine_recall@5 0.3198
cosine_recall@10 0.6041
cosine_ndcg@10 0.679
cosine_mrr@10 0.6813
cosine_map@100 0.7379

Information Retrieval (dim_256)

Metric Value
cosine_accuracy@1 0.6684
cosine_accuracy@3 0.6684
cosine_accuracy@5 0.6684
cosine_accuracy@10 0.7346
cosine_precision@1 0.6684
cosine_precision@3 0.6684
cosine_precision@5 0.6684
cosine_precision@10 0.6369
cosine_recall@1 0.0631
cosine_recall@3 0.1892
cosine_recall@5 0.3153
cosine_recall@10 0.5953
cosine_ndcg@10 0.6728
cosine_mrr@10 0.675
cosine_map@100 0.7314

Information Retrieval (dim_128)

Metric Value
cosine_accuracy@1 0.636
cosine_accuracy@3 0.636
cosine_accuracy@5 0.636
cosine_accuracy@10 0.7099
cosine_precision@1 0.636
cosine_precision@3 0.636
cosine_precision@5 0.636
cosine_precision@10 0.6071
cosine_recall@1 0.0601
cosine_recall@3 0.1803
cosine_recall@5 0.3005
cosine_recall@10 0.5685
cosine_ndcg@10 0.6409
cosine_mrr@10 0.6433
cosine_map@100 0.7035
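
The five tables above correspond, in order, to evaluation at the Matryoshka dimensions 1024, 768, 512, 256, and 128; their ndcg@10 values match the final-epoch columns in the Training Logs. A hedged sketch of how such metrics are typically produced with InformationRetrievalEvaluator; the queries, corpus, and relevance judgments here are hypothetical placeholders, not the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

# Hypothetical toy data; replace with your own query/corpus/qrels dictionaries
queries = {"q1": "Các giao dịch sử dụng thẻ tín dụng của tôi"}
corpus = {
    "d1": "Hướng dẫn tra cứu lịch sử giao dịch thẻ tín dụng trên ứng dụng ngân hàng.",
    "d2": "Thông tin bảo hành thiết bị loa trong 12 tháng.",
}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    name="dim_1024",
    truncate_dim=1024,  # evaluate at a chosen Matryoshka dimension
)

model = SentenceTransformer("splendor1811/BGE-base-banking-ONE")
results = evaluator(model)
print(results)  # includes cosine_accuracy@k, cosine_ndcg@10, cosine_mrr@10, cosine_map@100, ...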

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 13,863 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor: string, min: 3 tokens, mean: 22.35 tokens, max: 61 tokens
    • positive: string, min: 3 tokens, mean: 225.69 tokens, max: 419 tokens
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            1024,
            768,
            512,
            256,
            128
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
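
A minimal sketch of how a loss with these parameters is typically constructed in Sentence Transformers (the base-model load is shown for context; dataset loading and the trainer are omitted):

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("BAAI/bge-m3")

# In-batch negatives over (anchor, positive) pairs
inner_loss = MultipleNegativesRankingLoss(model)

# Apply the same objective at several embedding sizes, as configured above
train_loss = MatryoshkaLoss(
    model,
    loss=inner_loss,
    matryoshka_dims=[1024, 768, 512, 256, 128],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,  # -1 means every dimension is used at every step
)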
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 6
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
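
A sketch of how the non-default values above map onto SentenceTransformerTrainingArguments; the output_dir is a placeholder, and save_strategy="epoch" is an assumption added so that load_best_model_at_end can pair with per-epoch evaluation:

from sentence_transformers.training_args import SentenceTransformerTrainingArguments, BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge-base-banking-one",  # placeholder path
    eval_strategy="epoch",
    save_strategy="epoch",              # assumed; must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=32,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=6,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,                          # requires a GPU with bf16 support, as in the original run
    tf32=True,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)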

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 6
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_1024_cosine_ndcg@10 dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10
0.3687 10 48.8875 - - - - -
0.7373 20 23.8518 - - - - -
1.0 28 - 0.6421 0.6376 0.6334 0.6215 0.5950
1.0737 30 16.242 - - - - -
1.4424 40 13.0298 - - - - -
1.8111 50 12.8472 - - - - -
2.0 56 - 0.6764 0.6663 0.6589 0.6487 0.6127
2.1475 60 9.3195 - - - - -
2.5161 70 9.0553 - - - - -
2.8848 80 9.8082 - - - - -
3.0 84 - 0.6801 0.6792 0.6749 0.6679 0.6279
3.2212 90 7.864 - - - - -
3.5899 100 7.6955 - - - - -
3.9585 110 8.0813 - - - - -
4.0 112 - 0.6879 0.6888 0.6779 0.6645 0.6361
4.2949 120 6.899 - - - - -
4.6636 130 7.1247 - - - - -
5.0 140 6.2173 0.6841 0.6859 0.6770 0.6702 0.6410
5.3687 150 6.741 - - - - -
5.7373 160 6.5777 - - - - -
6.0 168 - 0.6841 0.6878 0.6790 0.6728 0.6409
  • The bold row denotes the saved checkpoint.

Framework Versions

  • Python: 3.11.12
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.6.0+cu124
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1
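
To approximately reproduce this environment, pinning the listed versions is one option (the PyTorch CUDA build will depend on your platform):

pip install "sentence-transformers==4.1.0" "transformers==4.52.4" "accelerate==1.7.0" "datasets==3.6.0" "tokenizers==0.21.1" "torch==2.6.0"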

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}