BGE base Financial Matryoshka
This is a sentence-transformers model finetuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: BAAI/bge-base-en-v1.5
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Cosine Similarity
- Training Dataset:
- json
- Language: en
- License: apache-2.0
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
(2): Normalize()
)
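The Pooling block keeps the CLS token embedding and the final Normalize() module rescales it to unit length, so cosine similarity and dot product give identical rankings. A minimal sketch to check this property (using the model id from the Usage section below):

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cristiano-sartori/bge_ft2")

# Encode a couple of sentences; the result has shape (2, 768).
emb = model.encode(["a first sentence", "a second sentence"])

# Because of the Normalize() module, every embedding has (numerically) unit L2 norm,
# so cosine similarity reduces to a plain dot product.
print(np.linalg.norm(emb, axis=1))  # ~[1.0, 1.0]
print(emb @ emb.T)                  # same values model.similarity would report
```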
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("cristiano-sartori/bge_ft2")
# Run inference
sentences = [
"Which of the following statements about coverage-guided fuzzing is/are correct?\nA. [\nB. '\nC. R\nD. e\nE. d\nD. u\nF. n\nG. d\nH. a\nI. n",
'To determine which statements about coverage-guided fuzzing are correct, let\'s analyze each option step by step.\n\n1. **Redundant seeds in the corpus will reduce fuzzing efficiency.**\n - **Analysis:** This statement is generally true. In coverage-guided fuzzing, the goal is to explore as many different paths and code branches as possible. If the corpus contains many redundant seeds (i.e., inputs that lead to the same code paths), it can lead to wasted effort and reduced efficiency since the fuzzer may spend more time exploring the same paths rather than discovering new ones.\n\n2. **Counting the number of times the covered code has been executed provides a more fine-grained view of program behavior than only "covered/not covered" binary code coverage.**\n - **Analysis:** This statement is correct. While binary code coverage only tells you whether a particular part of the code has been executed, counting the number of times each part of the code is executed (also known as edge or path coverage) provides deeper insights into the program\'s behavior. This finer granularity can help the fuzzer prioritize certain inputs that might lead to new or interesting behaviors.\n\n3. **Due to the coverage feedback, a small random perturbation of a seed can have a significant impact on further exploration.**\n - **Analysis:** This statement is also correct. Coverage-guided fuzzers utilize feedback about which parts of the code are executed to guide their exploration. Even a small change in input can lead to different execution paths being taken, which may uncover new code that wasn\'t reached with the original seed. As such, small perturbations can indeed have a large impact on the exploration of the input space.\n\n4. **Fuzzers that have higher code coverage always find more bugs.**\n - **Analysis:** This statement is misleading and generally false. While higher code coverage can increase the likelihood of finding bugs, it does not guarantee that more bugs will be found. Some parts of the code may be covered but not contain any bugs, while other areas might have bugs that are difficult to reach, regardless of coverage. Thus, while there is a correlation between coverage and bug discovery, it is not a strict rule that higher coverage will always lead to more bugs being found.\n\nBased on this analysis, the correct statements about coverage-guided fuzzing are:\n\n- **1. True**\n- **2. True**\n- **3. True**\n- **4. False**\n\nIn summary, statements 1, 2, and 3 are correct, while statement 4 is not.',
"To decrypt the ciphertext in RSA, we first need to find the private key such that , where and . \n\nGiven , we need to find such that:\n\n\\[\n13d \\equiv 1 \\mod 60\n\\]\n\nUsing the Extended Euclidean Algorithm, we find :\n\n1. \n2. \n3. \n4. \n5. \n6. \n\nBack substituting to find :\n\n\\[\n1 = 3 - (5 - 1 \\cdot 3) = 2 \\cdot 3 - 5\n\\]\n\\[\n1 = 2 \\cdot (8 - 1 \\cdot 5) - 5 = 2 \\cdot 8 - 3 \\cdot 5\n\\]\n\\[\n= 2 \\cdot 8 - 3 \\cdot (13 - 1 \\cdot 8) = 5 \\cdot 8 - 3 \\cdot 13\n\\]\n\\[\n= 5 \\cdot (60 - 4 \\cdot 13) - 3 \\cdot 13 = 5 \\cdot 60 - 23 \\cdot 13\n\\]\n\nThus, , or .\n\nNow we can decrypt the ciphertext :\n\n\\[\nm \\equiv c^d \\mod n\n\\]\n\\[\nm \\equiv 14^{37} \\mod 77\n\\]\n\nTo simplify this computation, we can use the Chinese Remainder Theorem by calculating and :\n\n1. Calculate :\n \\[\n 14 \\equiv 0 \\mod 7 \\implies 14^{37} \\equiv 0 \\mod 7\n \\]\n\n2. Calculate :\n \\[\n 14 \\equiv 3 \\mod 11\n \\]\n Using Fermat's Little Theorem, . Thus:\n \\[\n 37 \\mod 10 = 7 \\implies 3^{37} \\equiv 3^7 \\mod 11\n \\]\n We calculate :\n \\[\n 3^2 = 9, \\quad 3^4 = 81 \\equiv 4 \\mod 11\n \\]\n \\[\n 3^6 = 3^4 \\cdot 3^2 = 4 \\cdot 9 = 36 \\equiv 3 \\mod 11\n \\]\n \\[\n 3^7 = 3^6 \\cdot 3 = 3 \\cdot 3 = 9 \\mod 11\n \\]\n\nNow we have:\n- \n- \n\nWe can solve these congruences using the method of successive substitutions or direct computation. \n\nLet . Then:\n\n\\[\n7k \\equiv 9 \\mod 11 \\implies 7k = 9 + 11j\n\\]\nSolving for modulo 11, we need the modular inverse of 7 mod 11, which is 8 (since ). Thus:\n\n\\[\nk \\equiv 8 \\cdot 9 \\mod 11 \\equiv 72 \\mod 11 \\equiv 6 \\mod 11\n\\]\n\nSo . Substituting back, we have:\n\n\\[\nm = 7(11m + 6) = 77m + 42\n\\]\nThus, .\n\nThe message sent was .\n\nTherefore, the correct answer is:\n\n**$t = 42$**.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
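Beyond pairwise similarity, the same embeddings can be used for retrieval: encode a query and a set of passages, then rank the passages by cosine score. A minimal sketch with made-up query and passage texts:

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("cristiano-sartori/bge_ft2")

# Illustrative query and corpus; replace with your own texts.
query = "How can a simple substitution cipher be broken?"
passages = [
    "Frequency analysis exploits the letter statistics of the underlying language.",
    "Pulsed lasers emit optical power in short bursts at some repetition rate.",
]

query_emb = model.encode(query)
passage_embs = model.encode(passages)

# model.similarity uses cosine similarity for this model.
scores = model.similarity(query_emb, passage_embs)  # shape [1, len(passages)]
best = scores.argmax().item()
print(passages[best], float(scores[0, best]))
```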
Evaluation
Metrics
Information Retrieval
- Dataset: dim_768
- Evaluated with InformationRetrievalEvaluator with these parameters: { "truncate_dim": 768 } (a construction sketch follows the metrics table below)
Metric | Value |
---|---|
cosine_accuracy@1 | 0.748 |
cosine_accuracy@3 | 0.9134 |
cosine_accuracy@5 | 0.9291 |
cosine_accuracy@10 | 0.9528 |
cosine_precision@1 | 0.748 |
cosine_precision@3 | 0.3045 |
cosine_precision@5 | 0.1858 |
cosine_precision@10 | 0.0953 |
cosine_recall@1 | 0.748 |
cosine_recall@3 | 0.9134 |
cosine_recall@5 | 0.9291 |
cosine_recall@10 | 0.9528 |
cosine_ndcg@10 | 0.8627 |
cosine_mrr@10 | 0.8326 |
cosine_map@100 | 0.8333 |
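The table above was produced by an InformationRetrievalEvaluator with the embeddings kept at their full 768 dimensions. The held-out queries and corpus are not published with this card, so the following is only an illustrative sketch of how such an evaluation is wired up, with placeholder data:

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.evaluation import InformationRetrievalEvaluator

model = SentenceTransformer("cristiano-sartori/bge_ft2")

# Placeholder evaluation data: query id -> text, doc id -> text,
# and the set of relevant doc ids for each query.
queries = {"q1": "A simple substitution cipher can be broken ..."}
corpus = {"d1": "It can be broken by analysing the letter frequencies of the language."}
relevant_docs = {"q1": {"d1"}}

evaluator = InformationRetrievalEvaluator(
    queries=queries,
    corpus=corpus,
    relevant_docs=relevant_docs,
    truncate_dim=768,  # matches the dim_768 setting above
    name="dim_768",
)
results = evaluator(model)
print(results)  # accuracy@k, precision@k, recall@k, ndcg@10, mrr@10, map@100, ...
```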
Information Retrieval
- Dataset: dim_512
- Evaluated with InformationRetrievalEvaluator with these parameters: { "truncate_dim": 512 }
Metric | Value |
---|---|
cosine_accuracy@1 | 0.7638 |
cosine_accuracy@3 | 0.9055 |
cosine_accuracy@5 | 0.9291 |
cosine_accuracy@10 | 0.9449 |
cosine_precision@1 | 0.7638 |
cosine_precision@3 | 0.3018 |
cosine_precision@5 | 0.1858 |
cosine_precision@10 | 0.0945 |
cosine_recall@1 | 0.7638 |
cosine_recall@3 | 0.9055 |
cosine_recall@5 | 0.9291 |
cosine_recall@10 | 0.9449 |
cosine_ndcg@10 | 0.8659 |
cosine_mrr@10 | 0.8394 |
cosine_map@100 | 0.8408 |
Information Retrieval
- Dataset: dim_256
- Evaluated with InformationRetrievalEvaluator with these parameters: { "truncate_dim": 256 }
Metric | Value |
---|---|
cosine_accuracy@1 | 0.7323 |
cosine_accuracy@3 | 0.9055 |
cosine_accuracy@5 | 0.9134 |
cosine_accuracy@10 | 0.9449 |
cosine_precision@1 | 0.7323 |
cosine_precision@3 | 0.3018 |
cosine_precision@5 | 0.1827 |
cosine_precision@10 | 0.0945 |
cosine_recall@1 | 0.7323 |
cosine_recall@3 | 0.9055 |
cosine_recall@5 | 0.9134 |
cosine_recall@10 | 0.9449 |
cosine_ndcg@10 | 0.8492 |
cosine_mrr@10 | 0.8173 |
cosine_map@100 | 0.8184 |
Information Retrieval
- Dataset: dim_128
- Evaluated with InformationRetrievalEvaluator with these parameters: { "truncate_dim": 128 }
Metric | Value |
---|---|
cosine_accuracy@1 | 0.7244 |
cosine_accuracy@3 | 0.8898 |
cosine_accuracy@5 | 0.9134 |
cosine_accuracy@10 | 0.937 |
cosine_precision@1 | 0.7244 |
cosine_precision@3 | 0.2966 |
cosine_precision@5 | 0.1827 |
cosine_precision@10 | 0.0937 |
cosine_recall@1 | 0.7244 |
cosine_recall@3 | 0.8898 |
cosine_recall@5 | 0.9134 |
cosine_recall@10 | 0.937 |
cosine_ndcg@10 | 0.8372 |
cosine_mrr@10 | 0.8045 |
cosine_map@100 | 0.806 |
Information Retrieval
- Dataset: dim_64
- Evaluated with InformationRetrievalEvaluator with these parameters: { "truncate_dim": 64 }
Metric | Value |
---|---|
cosine_accuracy@1 | 0.6929 |
cosine_accuracy@3 | 0.8661 |
cosine_accuracy@5 | 0.9134 |
cosine_accuracy@10 | 0.9291 |
cosine_precision@1 | 0.6929 |
cosine_precision@3 | 0.2887 |
cosine_precision@5 | 0.1827 |
cosine_precision@10 | 0.0929 |
cosine_recall@1 | 0.6929 |
cosine_recall@3 | 0.8661 |
cosine_recall@5 | 0.9134 |
cosine_recall@10 | 0.9291 |
cosine_ndcg@10 | 0.8202 |
cosine_mrr@10 | 0.784 |
cosine_map@100 | 0.7859 |
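Because the model was trained with a Matryoshka objective, its embeddings can be truncated to any of the evaluated sizes (768, 512, 256, 128, 64) with only a gradual loss of retrieval quality, as the tables above show. A minimal sketch of running inference with truncated embeddings:

```python
from sentence_transformers import SentenceTransformer

# truncate_dim keeps only the leading embedding dimensions;
# 768, 512, 256, 128 and 64 correspond to the evaluation settings above.
model = SentenceTransformer("cristiano-sartori/bge_ft2", truncate_dim=256)

embeddings = model.encode(["first sentence", "second sentence"])
print(embeddings.shape)  # (2, 256)
```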
Training Details
Training Dataset
json
- Dataset: json
- Size: 1,137 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:

 | anchor | positive |
---|---|---|
type | string | string |
details | min: 5 tokens, mean: 107.02 tokens, max: 512 tokens | min: 3 tokens, mean: 353.32 tokens, max: 512 tokens |
- Samples (anchor / positive pairs):

Anchor: A simple substitution cipher can be broken \dots A. 1

Positive: The correct answer is: A. by analysing the probability occurrence of the language.
A simple substitution cipher replaces each letter in the plaintext with another letter, which means that the frequency of letters in the ciphertext will still reflect the frequency of letters in the original language. For instance, in English, the letter 'E' is the most commonly used letter, followed by 'T', 'A', 'O', etc. By analyzing the frequency of letters and patterns in the ciphertext, one can deduce which letters correspond to which, thereby breaking the cipher.
Options B, C, and D are not relevant to breaking a simple substitution cipher:
- B. only by using a quantum computer. Quantum computers are not necessary for breaking simple substitution ciphers, as they can be solved with classical techniques.
- C. by using the ENIGMA machine. The ENIGMA machine was used for a more complex form of encryption during World War II and is not applicable to simple substitution ciphers.
- **D...

Anchor: Consider a Generative Adversarial Network (GAN) which successfully produces images of goats. Which of the following statements is false?
A. T
B. h
C. e
D.
E. d
D. i
F. s
G. c
H. r
I. i

Positive: To determine which statement is false regarding the Generative Adversarial Network (GAN) that produces images of goats, it's essential to clarify the roles of the generator and the discriminator within the GAN framework.
1. Generator: The generator's main function is to learn the distribution of the training data, which consists of images of goats, and to generate new images that resemble this distribution. The goal is to create synthetic images that are indistinguishable from real goat images.
2. Discriminator: The discriminator's role is to differentiate between real images (from the training dataset) and fake images (produced by the generator). Its primary task is to classify images as real or fake, not to categorize them into specific classes like "goat" or "non-goat." The discriminator is trained to recognize whether an image comes from the real dataset or is a synthetic creation, regardless of the specific type of image.
Now, let's analyze each option provided in the q...

Anchor: Consider the following toy learning corpus of 59 tokens (using a tokenizer that splits on whitespaces and punctuation), out of a possible vocabulary of $N=100$ different tokens:
Pulsed operation of lasers refers to any laser not classified as continuous wave, so that the optical power appears in pulses of some duration at some repetition rate. This\linebreak encompasses a wide range of technologies addressing a number of different motivations. Some lasers are pulsed simply because they cannot be run in continuous wave mode.
Using a 2-gram language model, what are the values of the parameters corresponding to "continuous wave" and to "pulsed laser" using Maximum-Likelihood estimates?

Positive: The probability of "continuous wave" is calculated as $P(\text{continuous wave})=\frac{2}{58}$ because the phrase appears twice in the bigram analysis of the 59-token corpus. In contrast, the phrase "pulsed laser" has a probability of $P(\text{pulsed laser})=0$, as it does not appear at all in the dataset, making it impossible to derive a maximum likelihood estimate for it.
- Loss: MatryoshkaLoss with these parameters:
  { "loss": "MultipleNegativesRankingLoss", "matryoshka_dims": [768, 512, 256, 128, 64], "matryoshka_weights": [1, 1, 1, 1, 1], "n_dims_per_step": -1 }
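Putting the two bullets above together: the training file is a JSON dataset of (anchor, positive) pairs, and the loss wraps MultipleNegativesRankingLoss in a MatryoshkaLoss over the listed dimensions. A hedged sketch of how this is typically assembled in Sentence Transformers (the data file name is a placeholder, since the actual training file is not distributed with this card):

```python
from datasets import load_dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import MatryoshkaLoss, MultipleNegativesRankingLoss

# Placeholder path; expects records with "anchor" and "positive" text fields.
train_dataset = load_dataset("json", data_files="train.jsonl", split="train")

model = SentenceTransformer("BAAI/bge-base-en-v1.5")

# In-batch-negatives ranking loss, applied at every Matryoshka truncation size.
inner_loss = MultipleNegativesRankingLoss(model)
loss = MatryoshkaLoss(
    model,
    inner_loss,
    matryoshka_dims=[768, 512, 256, 128, 64],
    matryoshka_weights=[1, 1, 1, 1, 1],
)
```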
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: epoch
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 16
- learning_rate: 2e-05
- num_train_epochs: 5
- lr_scheduler_type: cosine
- warmup_ratio: 0.1
- bf16: True
- tf32: False
- load_best_model_at_end: True
- optim: adamw_torch_fused
- batch_sampler: no_duplicates
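These settings map onto SentenceTransformerTrainingArguments roughly as follows. This is a hedged sketch: output_dir is a placeholder, and save_strategy="epoch" is an assumption added so that load_best_model_at_end is valid; everything else is taken from the list above.

```python
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="bge_ft2",        # placeholder
    eval_strategy="epoch",
    save_strategy="epoch",       # assumption: must match eval_strategy for load_best_model_at_end
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=16,
    learning_rate=2e-5,
    num_train_epochs=5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    bf16=True,
    tf32=False,
    load_best_model_at_end=True,
    optim="adamw_torch_fused",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```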
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: epoch
- prediction_loss_only: True
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 16
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 2e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 5
- max_steps: -1
- lr_scheduler_type: cosine
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: False
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch_fused
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | dim_768_cosine_ndcg@10 | dim_512_cosine_ndcg@10 | dim_256_cosine_ndcg@10 | dim_128_cosine_ndcg@10 | dim_64_cosine_ndcg@10 |
---|---|---|---|---|---|---|---|
0.2812 | 10 | 5.8639 | - | - | - | - | - |
0.5624 | 20 | 3.1297 | - | - | - | - | - |
0.8436 | 30 | 2.5823 | - | - | - | - | - |
1.0 | 36 | - | 0.8431 | 0.8461 | 0.8367 | 0.8263 | 0.8052 |
1.1125 | 40 | 0.8878 | - | - | - | - | - |
1.3937 | 50 | 1.1603 | - | - | - | - | - |
1.6749 | 60 | 0.6109 | - | - | - | - | - |
1.9561 | 70 | 1.7633 | - | - | - | - | - |
2.0 | 72 | - | 0.8590 | 0.8583 | 0.8336 | 0.8280 | 0.8039 |
2.2250 | 80 | 0.3261 | - | - | - | - | - |
2.5062 | 90 | 0.3084 | - | - | - | - | - |
2.7873 | 100 | 0.2973 | - | - | - | - | - |
3.0 | 108 | - | 0.8628 | 0.8713 | 0.8519 | 0.8421 | 0.8165 |
3.0562 | 110 | 0.2864 | - | - | - | - | - |
3.3374 | 120 | 0.1124 | - | - | - | - | - |
3.6186 | 130 | 0.8529 | - | - | - | - | - |
3.8998 | 140 | 0.3042 | - | - | - | - | - |
4.0 | 144 | - | 0.8612 | 0.8659 | 0.8502 | 0.8349 | 0.8171 |
4.1687 | 150 | 0.4779 | - | - | - | - | - |
4.4499 | 160 | 0.2737 | - | - | - | - | - |
4.7311 | 170 | 0.5733 | - | - | - | - | - |
5.0 | 180 | 0.0481 | 0.8627 | 0.8659 | 0.8492 | 0.8372 | 0.8202 |
- The bold row denotes the saved checkpoint.
Framework Versions
- Python: 3.12.8
- Sentence Transformers: 4.1.0
- Transformers: 4.52.4
- PyTorch: 2.7.0+cu126
- Accelerate: 1.3.0
- Datasets: 3.6.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
MatryoshkaLoss
@misc{kusupati2024matryoshka,
title={Matryoshka Representation Learning},
author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
year={2024},
eprint={2205.13147},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
MultipleNegativesRankingLoss
@misc{henderson2017efficient,
title={Efficient Natural Language Response Suggestion for Smart Reply},
author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
year={2017},
eprint={1705.00652},
archivePrefix={arXiv},
primaryClass={cs.CL}
}