SentenceTransformer based on Snowflake/snowflake-arctic-embed-m

This is a sentence-transformers model finetuned from Snowflake/snowflake-arctic-embed-m. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: Snowflake/snowflake-arctic-embed-m
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity

Model Sources

  • Documentation: Sentence Transformers Documentation (https://sbert.net)
  • Repository: Sentence Transformers on GitHub (https://github.com/UKPLab/sentence-transformers)
  • Hugging Face: Sentence Transformers on Hugging Face (https://huggingface.co/models?library=sentence-transformers)

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
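
As the module list shows, pooling takes the CLS token and a final Normalize() step L2-normalizes the output. A quick sketch to confirm both properties, reusing the model id from the Usage section below:

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("vijayarulmuthu/finetuned_arctic_ft-a85433c9-6284-4afb-8e87-e110823d565c")
embedding = model.encode("a short test sentence")
print(embedding.shape)            # (768,), matching the Pooling config above
print(np.linalg.norm(embedding))  # ~1.0, because of the Normalize() module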

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("vijayarulmuthu/finetuned_arctic_ft-a85433c9-6284-4afb-8e87-e110823d565c")
# Run inference
sentences = [
    'Whom did David smite and subdue, taking Gath and her towns from their control?',
    'Now after this it came to pass, that David smote the Philistines, and subdued them, and took Gath and her towns out of the hand of the Philistines. And he smote Moab; and the Moabites became David’s servants, [and] brought gifts. And David smote Hadarezer king of Zobah unto Hamath, as he went to stablish his dominion by the river Euphrates. And David took from him a thousand chariots, and seven thousand horsemen, and twenty thousand footmen: David also houghed all the chariot [horses], but reserved of them an hundred chariots. And when the Syrians of Damascus came to help Hadarezer king of Zobah, David slew of the Syrians two and twenty thousand men. Then David put [garrisons] in Syriadamascus; and the Syrians became David’s servants, [and] brought gifts. Thus the LORD preserved David whithersoever he went. And David took the shields of gold that were on the servants of Hadarezer, and brought them to Jerusalem. Likewise from Tibhath, and from Chun, cities of Hadarezer, brought David very much brass, wherewith Solomon made the brasen sea, and the pillars, and the vessels of brass.',
    'So Shishak king of Egypt came up against Jerusalem, and took away the treasures of the house of the LORD, and the treasures of the king’s house; he took all: he carried away also the shields of gold which Solomon had made. Instead of which king Rehoboam made shields of brass, and committed [them] to the hands of the chief of the guard, that kept the entrance of the king’s house. And when the king entered into the house of the LORD, the guard came and fetched them, and brought them again into the guard chamber. And when he humbled himself, the wrath of the LORD turned from him, that he would not destroy [him] altogether: and also in Judah things went well. So king Rehoboam strengthened himself in Jerusalem, and reigned: for Rehoboam [was] one and forty years old when he began to reign, and he reigned seventeen years in Jerusalem, the city which the LORD had chosen out of all the tribes of Israel, to put his name there. And his mother’s name [was] Naamah an Ammonitess. And he did evil, because he prepared not his heart to seek the LORD. Now the acts of Rehoboam, first and last, [are] they not written in the book of Shemaiah the prophet, and of Iddo the seer concerning genealogies? And [there were] wars between Rehoboam and Jeroboam continually. And Rehoboam slept with his fathers, and was buried in the city of David: and Abijah his son reigned in his stead.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
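
Since the first entry in sentences is a question and the other two are candidate passages, the same embeddings can be used for retrieval-style ranking. A small follow-on sketch, reusing model and sentences from the block above:

# Rank the two passages against the question. model.similarity applies the
# model's similarity function, which is cosine similarity for this model.
query_embedding = model.encode(sentences[0])
passage_embeddings = model.encode(sentences[1:])
scores = model.similarity(query_embedding, passage_embeddings)  # shape [1, 2]
best = int(scores.argmax())
print(best, float(scores[0, best]))  # likely index 0, the matching David passage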

Evaluation

Metrics

Information Retrieval

Metric Value
cosine_accuracy@1 0.6479
cosine_accuracy@3 0.833
cosine_accuracy@5 0.873
cosine_accuracy@10 0.9238
cosine_precision@1 0.6479
cosine_precision@3 0.2777
cosine_precision@5 0.1746
cosine_precision@10 0.0924
cosine_recall@1 0.018
cosine_recall@3 0.0231
cosine_recall@5 0.0242
cosine_recall@10 0.0257
cosine_ndcg@10 0.1739
cosine_mrr@10 0.7469
cosine_map@100 0.0208
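
These figures come from an information-retrieval evaluation (validation_cosine_ndcg@10 in the training logs below is the same metric). A hedged sketch of how such numbers are produced with the library's InformationRetrievalEvaluator; the queries, corpus, and relevance judgments here are hypothetical placeholders, not the actual evaluation split:

from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("vijayarulmuthu/finetuned_arctic_ft-a85433c9-6284-4afb-8e87-e110823d565c")
queries = {"q1": "Whom did David smite and subdue?"}  # hypothetical query
corpus = {
    "d1": "Now after this it came to pass, that David smote the Philistines...",  # hypothetical
    "d2": "So Shishak king of Egypt came up against Jerusalem...",                # hypothetical
}
relevant_docs = {"q1": {"d1"}}  # hypothetical relevance judgments

evaluator = InformationRetrievalEvaluator(queries, corpus, relevant_docs, name="validation")
results = evaluator(model)  # dict of metrics, e.g. results["validation_cosine_ndcg@10"]
print(results)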

Training Details

Training Dataset

Unnamed Dataset

  • Size: 6,612 training samples
  • Columns: sentence_0 and sentence_1
  • Approximate statistics based on the first 1000 samples:
    sentence_0 (string): min 7 tokens, mean 17.56 tokens, max 42 tokens
    sentence_1 (string): min 13 tokens, mean 250.95 tokens, max 504 tokens
  • Samples:
    sentence_0: What was the reason given by Elijah the prophet for the LORD's punishment on Jehoram?
    sentence_1: Then Jehoram went forth with his princes, and all his chariots with him: and he rose up by night, and smote the Edomites which compassed him in, and the captains of the chariots. So the Edomites revolted from under the hand of Judah unto this day. The same time [also] did Libnah revolt from under his hand; because he had forsaken the LORD God of his fathers. Moreover he made high places in the mountains of Judah, and caused the inhabitants of Jerusalem to commit fornication, and compelled Judah [thereto]. And there came a writing to him from Elijah the prophet, saying, Thus saith the LORD God of David thy father, Because thou hast not walked in the ways of Jehoshaphat thy father, nor in the ways of Asa king of Judah, But hast walked in the way of the kings of Israel, and hast made Judah and the inhabitants of Jerusalem to go a whoring, like to the whoredoms of the house of Ahab, and also hast slain thy brethren of thy father’s house, [which were] better than thyself: Behold, with a gre...

    sentence_0: What happened at the sixth hour until the ninth hour according to the passage?
    sentence_1: And we indeed justly; for we receive the due reward of our deeds: but this man hath done nothing amiss. And he said unto Jesus, Lord, remember me when thou comest into thy kingdom. And Jesus said unto him, Verily I say unto thee, To day shalt thou be with me in paradise. And it was about the sixth hour, and there was a darkness over all the earth until the ninth hour. And the sun was darkened, and the veil of the temple was rent in the midst. And when Jesus had cried with a loud voice, he said, Father, into thy hands I commend my spirit: and having said thus, he gave up the ghost. Now when the centurion saw what was done, he glorified God, saying, Certainly this was a righteous man. And all the people that came together to that sight, beholding the things which were done, smote their breasts, and returned.

    sentence_0: Who is commanded by the Lord to set a watchman and declare what he sees?
    sentence_1: The burden of the desert of the sea. As whirlwinds in the south pass through; [so] it cometh from the desert, from a terrible land. A grievous vision is declared unto me; the treacherous dealer dealeth treacherously, and the spoiler spoileth. Go up, O Elam: besiege, O Media; all the sighing thereof have I made to cease. Therefore are my loins filled with pain: pangs have taken hold upon me, as the pangs of a woman that travaileth: I was bowed down at the hearing [of it]; I was dismayed at the seeing [of it]. My heart panted, fearfulness affrighted me: the night of my pleasure hath he turned into fear unto me. Prepare the table, watch in the watchtower, eat, drink: arise, ye princes, [and] anoint the shield. For thus hath the Lord said unto me, Go, set a watchman, let him declare what he seeth. And he saw a chariot [with] a couple of horsemen, a chariot of asses, [and] a chariot of camels; and he hearkened diligently with much heed: And he cried, A lion: My lord, I stand continually upo...
  • Loss: MatryoshkaLoss (see the construction sketch after this list) with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
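
In code, this configuration corresponds to a MatryoshkaLoss wrapping MultipleNegativesRankingLoss, so the model is trained to embed well at 768 dimensions all the way down to 64. A minimal construction sketch against the base model:

from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")
base_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    base_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
    n_dims_per_step=-1,
)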
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • num_train_epochs: 10
  • multi_dataset_batch_sampler: round_robin
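
A hedged sketch of how these non-default values plug into a SentenceTransformerTrainer run; the two-row train_dataset is a hypothetical stand-in for the 6,612-pair dataset described above, and the "steps" evaluation is left out so the sketch runs without an evaluation split:

from datasets import Dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-m")
loss = MatryoshkaLoss(model, MultipleNegativesRankingLoss(model),
                      matryoshka_dims=[768, 512, 256, 128, 64])

# Hypothetical stand-in for the (sentence_0, sentence_1) training pairs.
train_dataset = Dataset.from_dict({
    "sentence_0": ["Whom did David smite and subdue?",
                   "Who is commanded by the Lord to set a watchman?"],
    "sentence_1": ["Now after this it came to pass, that David smote the Philistines...",
                   "For thus hath the Lord said unto me, Go, set a watchman..."],
})

args = SentenceTransformerTrainingArguments(
    output_dir="finetuned_arctic_ft",  # hypothetical output path
    per_device_train_batch_size=10,
    per_device_eval_batch_size=10,
    num_train_epochs=10,
    multi_dataset_batch_sampler="round_robin",
    # eval_strategy="steps" was also set, paired with an IR evaluator
    # (omitted here so the sketch is runnable without an eval split).
)

trainer = SentenceTransformerTrainer(model=model, args=args,
                                     train_dataset=train_dataset, loss=loss)
trainer.train()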

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 10
  • per_device_eval_batch_size: 10
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1
  • num_train_epochs: 10
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: round_robin

Training Logs

Epoch Step Training Loss validation_cosine_ndcg@10
0.0755 50 - 0.0982
0.1511 100 - 0.1408
0.2266 150 - 0.1546
0.3021 200 - 0.1612
0.3776 250 - 0.1655
0.4532 300 - 0.1663
0.5287 350 - 0.1710
0.6042 400 - 0.1704
0.6798 450 - 0.1713
0.7553 500 2.378 0.1702
0.8308 550 - 0.1727
0.9063 600 - 0.1734
0.9819 650 - 0.1741
1.0 662 - 0.1745
1.0574 700 - 0.1752
1.1329 750 - 0.1761
1.2085 800 - 0.1750
1.2840 850 - 0.1719
1.3595 900 - 0.1730
1.4350 950 - 0.1760
1.5106 1000 0.7402 0.1776
1.5861 1050 - 0.1757
1.6616 1100 - 0.1774
1.7372 1150 - 0.1757
1.8127 1200 - 0.1749
1.8882 1250 - 0.1745
1.9637 1300 - 0.1758
2.0 1324 - 0.1776
2.0393 1350 - 0.1772
2.1148 1400 - 0.1751
2.1903 1450 - 0.1757
2.2659 1500 0.467 0.1742
2.3414 1550 - 0.1748
2.4169 1600 - 0.1738
2.4924 1650 - 0.1749
2.5680 1700 - 0.1772
2.6435 1750 - 0.1772
2.7190 1800 - 0.1772
2.7946 1850 - 0.1774
2.8701 1900 - 0.1770
2.9456 1950 - 0.1757
3.0 1986 - 0.1771
3.0211 2000 0.2653 0.1762
3.0967 2050 - 0.1745
3.1722 2100 - 0.1748
3.2477 2150 - 0.1749
3.3233 2200 - 0.1766
3.3988 2250 - 0.1746
3.4743 2300 - 0.1749
3.5498 2350 - 0.1766
3.6254 2400 - 0.1752
3.7009 2450 - 0.1749
3.7764 2500 0.1809 0.1746
3.8520 2550 - 0.1751
3.9275 2600 - 0.1755
4.0 2648 - 0.1744
4.0030 2650 - 0.1747
4.0785 2700 - 0.1747
4.1541 2750 - 0.1766
4.2296 2800 - 0.1761
4.3051 2850 - 0.1745
4.3807 2900 - 0.1748
4.4562 2950 - 0.1753
4.5317 3000 0.1368 0.1741
4.6073 3050 - 0.1718
4.6828 3100 - 0.1730
4.7583 3150 - 0.1735
4.8338 3200 - 0.1753
4.9094 3250 - 0.1744
4.9849 3300 - 0.1752
5.0 3310 - 0.1758
5.0604 3350 - 0.1771
5.1360 3400 - 0.1758
5.2115 3450 - 0.1741
5.2870 3500 0.1178 0.1741
5.3625 3550 - 0.1746
5.4381 3600 - 0.1744
5.5136 3650 - 0.1740
5.5891 3700 - 0.1743
5.6647 3750 - 0.1744
5.7402 3800 - 0.1733
5.8157 3850 - 0.1747
5.8912 3900 - 0.1755
5.9668 3950 - 0.1734
6.0 3972 - 0.1740
6.0423 4000 0.0878 0.1745
6.1178 4050 - 0.1734
6.1934 4100 - 0.1725
6.2689 4150 - 0.1748
6.3444 4200 - 0.1743
6.4199 4250 - 0.1742
6.4955 4300 - 0.1738
6.5710 4350 - 0.1756
6.6465 4400 - 0.1746
6.7221 4450 - 0.1754
6.7976 4500 0.0697 0.1756
6.8731 4550 - 0.1755
6.9486 4600 - 0.1755
7.0 4634 - 0.1755
7.0242 4650 - 0.1752
7.0997 4700 - 0.1766
7.1752 4750 - 0.1745
7.2508 4800 - 0.1751
7.3263 4850 - 0.1746
7.4018 4900 - 0.1747
7.4773 4950 - 0.1742
7.5529 5000 0.0643 0.1743
7.6284 5050 - 0.1736
7.7039 5100 - 0.1739
7.7795 5150 - 0.1737
7.8550 5200 - 0.1736
7.9305 5250 - 0.1744
8.0 5296 - 0.1750
8.0060 5300 - 0.1751
8.0816 5350 - 0.1742
8.1571 5400 - 0.1739
8.2326 5450 - 0.1745
8.3082 5500 0.0521 0.1745
8.3837 5550 - 0.1746
8.4592 5600 - 0.1743
8.5347 5650 - 0.1744
8.6103 5700 - 0.1750
8.6858 5750 - 0.1749
8.7613 5800 - 0.1748
8.8369 5850 - 0.1747
8.9124 5900 - 0.1747
8.9879 5950 - 0.1746
9.0 5958 - 0.1746
9.0634 6000 0.044 0.1745
9.1390 6050 - 0.1742
9.2145 6100 - 0.1740
9.2900 6150 - 0.1742
9.3656 6200 - 0.1744
9.4411 6250 - 0.1739
9.5166 6300 - 0.1737
9.5921 6350 - 0.1740
9.6677 6400 - 0.1738
9.7432 6450 - 0.1739
9.8187 6500 0.043 0.1738
9.8943 6550 - 0.1738
9.9698 6600 - 0.1739
10.0 6620 - 0.1739

Framework Versions

  • Python: 3.13.3
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.3
  • PyTorch: 2.7.0
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}