language:
  - en
license: apache-2.0
tags:
  - sentence-transformers
  - sentence-similarity
  - feature-extraction
  - generated_from_trainer
  - dataset_size:4012
  - loss:MatryoshkaLoss
  - loss:MultipleNegativesRankingLoss
base_model: BAAI/bge-base-en-v1.5
widget:
  - source_sentence: Do cephalopods use RNA editing less frequently than other species?
    sentences:
      - >-
        Extensive messenger RNA editing generates transcript and protein
        diversity in genes involved in neural excitability, as previously
        described, as well as in genes participating in a broad range of other
        cellular functions. 
      - >-
        GV1001 is a 16-amino-acid vaccine peptide derived from the human
        telomerase reverse transcriptase sequence. It has been developed as a
        vaccine against various cancers.
      - >-
        Using acetyl-specific K516 antibodies, we show that acetylation of
        endogenous S6K1 at this site is potently induced upon growth factor
        stimulation. We propose that K516 acetylation may serve to modulate
        important kinase-independent functions of S6K1 in response to growth
        factor signalling. Following mitogen stimulation, S6Ks interact with the
        p300 and p300/CBP-associated factor (PCAF) acetyltransferases. S6Ks can
        be acetylated by p300 and PCAF in vitro and S6K acetylation is detected
        in cells expressing p300
  - source_sentence: Can pets affect infant microbiomed?
    sentences:
      - >-
        Yes, exposure to household furry pets influences the gut microbiota of
        infants.
      - >-
        Thiazovivin is a selective small molecule that directly targets
        Rho-associated kinase (ROCK) and increases expression of pluripotency
        factors.
      - ' Here, we present evidence that the calcium/calmodulin-dependent protein kinase IV (CaMK4) is increased and required during Th17 cell differentiation. Inhibition of CaMK4 reduced Il17 transcription through decreased activation of the cAMP response element modulator a (CREM-a) and reduced activation of the AKT/mTOR pathway, which is known to enhance Th17 differentiation. CAMK4 knockdown and kinase-dead mutant inhibited crocin-mediated HO-1 expression, Nrf2 activation, and phosphorylation of Akt, indicating that HO-1 expression is mediated by CAMK4 and that Akt is a downstream mediator of CAMK4 in crocin signaling'
  - source_sentence: >-
      In what proportion of children with heart failure has Enalapril been shown
      to be safe and effective?
    sentences:
      - >-
        5-HT2A (5-hydroxytryptamine type 2a) receptor can be evaluated with the
        [18F]altanserin.
      - >-
        In children with heart failure evidence of the effect of enalapril is
        empirical. Enalapril was clinically safe and effective in 50% to 80% of
        for children with cardiac failure secondary to congenital heart
        malformations before and after cardiac surgery,  impaired ventricular
        function , valvar regurgitation,  congestive cardiomyopathy,  , arterial
        hypertension, life-threatening arrhythmias coexisting with circulatory
        insufficiency.   

        ACE inhibitors have shown a transient beneficial effect on heart failure
        due to anticancer drugs and possibly a beneficial effect in muscular
        dystrophy-associated cardiomyopathy, which deserves further studies.
      - |-
        necroptosis
        apoptosis  
        pro-survival/inflammation NF-κB activation
  - source_sentence: How are SAHFS created?
    sentences:
      - >-
        In particular, up to 17% of neutrophil nuclei of healthy women exhibit a
        drumstick-shaped appendage that contains the inactive X chromosome.
      - >-
        miR-1, miR-133, miR-208a, miR-206, miR-494, miR-146a, miR-222, miR-21,
        miR-221, miR-20a, miR-133a, miR-133b, miR-23, miR-107 and miR-181 are
        involved in exercise adaptation
      - >-
        Cellular senescence-associated heterochromatic foci (SAHFS) are a novel
        type of chromatin condensation involving alterations of linker histone
        H1 and linker DNA-binding proteins. SAHFS can be formed by a variety of
        cell types, but their mechanism of action remains unclear.
  - source_sentence: >-
      What are the effects of the deletion of all three Pcdh clusters
      (tricluster deletion) in mice?
    sentences:
      - >-
        Multicluster Pcdh diversity is required for mouse olfactory neural
        circuit assembly. The vertebrate clustered protocadherin (Pcdh) cell
        surface proteins are encoded by three closely linked gene clusters
        (Pcdhα, Pcdhβ, and Pcdhγ). Although deletion of individual Pcdh clusters
        had subtle phenotypic consequences, the loss of all three clusters
        (tricluster deletion) led to a severe axonal arborization defect and
        loss of self-avoidance.
      - >-
        The myocyte enhancer factor-2 (MEF2) proteins are MADS-box transcription
        factors that are essential for differentiation of all muscle lineages
        but their mechanisms of action remain largely undefined. MEF2C
        expression initiates cardiomyogenesis, resulting in the up-regulation of
        Brachyury T, bone morphogenetic protein-4, Nkx2-5, GATA-4, cardiac
        alpha-actin, and myosin heavy chain expression. Inactivation of the
        MEF2C gene causes cardiac developmental arrest and severe downregulation
        of a number of cardiac markers including atrial natriuretic factor
        (ANF). BMP-2, a regulator of cardiac development during embryogenesis,
        was shown to increase PI 3-kinase activity in cardiac precursor cells,
        resulting in increased expression of sarcomeric myosin heavy chain (MHC)
        and MEF-2A. Furthermore, expression of MEF-2A increased MHC expression
        in a PI 3-kinase-dependent manner. Other studies showed that Gli2 and
        MEF2C proteins form a complex, capable of synergizing on
        cardiomyogenesis-related promoters. Dominant interference of
        calcineurin/mAKAP binding blunts the increase in MEF2 transcriptional
        activity seen during myoblast differentiation, as well as the expression
        of endogenous MEF2-target genes. These findings show that MEF-2 can
        direct early stages of cell differentiation into a cardiomyogenic
        pathway.
      - >-
        Investigators proposed that there have been three extended periods in
        the evolution of gene regulatory elements. Early vertebrate evolution
        was characterized by regulatory gains near transcription factors and
        developmental genes, but this trend was replaced by innovations near
        extracellular signaling genes, and then innovations near
        posttranslational protein modifiers.
pipeline_tag: sentence-similarity
library_name: sentence-transformers
metrics:
  - cosine_accuracy@1
  - cosine_accuracy@3
  - cosine_accuracy@5
  - cosine_accuracy@10
  - cosine_precision@1
  - cosine_precision@3
  - cosine_precision@5
  - cosine_precision@10
  - cosine_recall@1
  - cosine_recall@3
  - cosine_recall@5
  - cosine_recall@10
  - cosine_ndcg@10
  - cosine_mrr@10
  - cosine_map@100
model-index:
  - name: BGE Base Biomedical MRL
    results:
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 768
          type: dim_768
        metrics:
          - type: cosine_accuracy@1
            value: 0.7524752475247525
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8628005657708628
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8995756718528995
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9222065063649222
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7524752475247525
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5973597359735974
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5162659123055162
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3977369165487977
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2341252729014147
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3973567239272255
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4854465714352775
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.6062286842357961
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6940262144509974
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.813453896410049
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6257720133395309
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 512
          type: dim_512
        metrics:
          - type: cosine_accuracy@1
            value: 0.7538896746817539
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8585572842998586
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8953323903818954
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9207920792079208
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7538896746817539
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5964167845355963
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5142857142857143
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3977369165487976
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2333750448173818
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.39469849211764985
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4795534995350502
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.604605995019471
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6913260437859404
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.8125008419209268
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6197252995126041
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 256
          type: dim_256
        metrics:
          - type: cosine_accuracy@1
            value: 0.7355021216407355
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8486562942008486
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8868458274398868
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.9137199434229137
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7355021216407355
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5818010372465817
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.5018387553041018
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.38896746817538896
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.22793567434972276
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.37898311614248786
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4645337797167325
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5878379619993058
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6742555106189646
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7975213847915401
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.6002622001635138
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 128
          type: dim_128
        metrics:
          - type: cosine_accuracy@1
            value: 0.7057991513437057
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.8132956152758133
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.8500707213578501
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.8953323903818954
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.7057991513437057
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5535124941065536
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.47355021216407356
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.36605374823196607
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.2151445774205944
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.3572108621267904
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.4326304442151515
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.5469428314195238
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.6357672212173915
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7700377180575202
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.5565977979127998
            name: Cosine Map@100
      - task:
          type: information-retrieval
          name: Information Retrieval
        dataset:
          name: dim 64
          type: dim_64
        metrics:
          - type: cosine_accuracy@1
            value: 0.6265912305516266
            name: Cosine Accuracy@1
          - type: cosine_accuracy@3
            value: 0.7666195190947667
            name: Cosine Accuracy@3
          - type: cosine_accuracy@5
            value: 0.809052333804809
            name: Cosine Accuracy@5
          - type: cosine_accuracy@10
            value: 0.85997171145686
            name: Cosine Accuracy@10
          - type: cosine_precision@1
            value: 0.6265912305516266
            name: Cosine Precision@1
          - type: cosine_precision@3
            value: 0.5002357378595002
            name: Cosine Precision@3
          - type: cosine_precision@5
            value: 0.4291371994342292
            name: Cosine Precision@5
          - type: cosine_precision@10
            value: 0.3312588401697313
            name: Cosine Precision@10
          - type: cosine_recall@1
            value: 0.18851019998558088
            name: Cosine Recall@1
          - type: cosine_recall@3
            value: 0.31756777149198423
            name: Cosine Recall@3
          - type: cosine_recall@5
            value: 0.38736111738995704
            name: Cosine Recall@5
          - type: cosine_recall@10
            value: 0.49729865330882483
            name: Cosine Recall@10
          - type: cosine_ndcg@10
            value: 0.5709082950268725
            name: Cosine Ndcg@10
          - type: cosine_mrr@10
            value: 0.7040951033878895
            name: Cosine Mrr@10
          - type: cosine_map@100
            value: 0.4804032390680516
            name: Cosine Map@100

BGE Base Biomedical MRL

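This is a sentence-transformers model fine-tuned from BAAI/bge-base-en-v1.5 on the json dataset. It maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more. Because it was trained with MatryoshkaLoss, the embeddings can also be truncated to 512, 256, 128, or 64 dimensions; retrieval metrics for each of these sizes are listed in the model-index metadata and in the Evaluation section.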

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: BAAI/bge-base-en-v1.5
  • Maximum Sequence Length: 512 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • json
  • Language: en
  • License: apache-2.0

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': True}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
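
The architecture above ends with CLS pooling (`pooling_mode_cls_token: True`) followed by `Normalize()`. As a rough illustration of what those two stages do to the transformer output, here is a minimal pure-Python sketch. The function names and the toy 3-token, 4-dimensional input are hypothetical; the real model works on tensors of up to 512 tokens by 768 dimensions.

```python
import math

def cls_pool(token_embeddings):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_embeddings[0])

def l2_normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

# Hypothetical transformer output: one row per token, first row is [CLS].
toy_token_embeddings = [
    [3.0, 0.0, 4.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.5, 2.0, 0.0, 1.0],
]

sentence_embedding = l2_normalize(cls_pool(toy_token_embeddings))
print(sentence_embedding)  # [0.6, 0.0, 0.8, 0.0]
```

Because the output is unit length, the dot product of two embeddings equals their cosine similarity, which matches the similarity function listed in the model description.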

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("potsu-potsu/bge-base-biomedical-matryoshka")
# Run inference
sentences = [
    'What are the effects of the deletion of all three Pcdh clusters (tricluster deletion) in mice?',
    'Multicluster Pcdh diversity is required for mouse olfactory neural circuit assembly. The vertebrate clustered protocadherin (Pcdh) cell surface proteins are encoded by three closely linked gene clusters (Pcdhα, Pcdhβ, and Pcdhγ). Although deletion of individual Pcdh clusters had subtle phenotypic consequences, the loss of all three clusters (tricluster deletion) led to a severe axonal arborization defect and loss of self-avoidance.',
    'Investigators proposed that there have been three extended periods in the evolution of gene regulatory elements. Early vertebrate evolution was characterized by regulatory gains near transcription factors and developmental genes, but this trend was replaced by innovations near extracellular signaling genes, and then innovations near posttranslational protein modifiers.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# (3, 768)

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# torch.Size([3, 3])
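
Since the model was trained with MatryoshkaLoss over the dimensions 768, 512, 256, 128, and 64, a shorter embedding can be obtained by keeping only the leading components and rescaling to unit length. The sketch below shows that operation on hypothetical 8-dimensional toy vectors in plain Python; the helper names are illustrative and not part of the library. Recent Sentence Transformers releases also accept a `truncate_dim` argument when loading a model, which should do the truncation for you; check the documentation for your installed version.

```python
import math

def truncate_and_normalize(embedding, dim):
    """Keep the first `dim` components, then rescale to unit length."""
    head = list(embedding[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical full-size embeddings (8 dims here; 768 for the real model).
query = [0.6, 0.5, 0.4, 0.3, 0.2, 0.2, 0.1, 0.1]
passage = [0.5, 0.6, 0.3, 0.4, 0.1, 0.3, 0.2, 0.0]

for dim in (8, 4, 2):
    q = truncate_and_normalize(query, dim)
    p = truncate_and_normalize(passage, dim)
    print(dim, round(cosine_similarity(q, p), 4))
```

Shorter vectors reduce storage and search cost at the price of some retrieval quality, as the per-dimension metrics in the Evaluation section indicate.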

Evaluation

Metrics

Information Retrieval (dataset: dim_768)

Metric Value
cosine_accuracy@1 0.7525
cosine_accuracy@3 0.8628
cosine_accuracy@5 0.8996
cosine_accuracy@10 0.9222
cosine_precision@1 0.7525
cosine_precision@3 0.5974
cosine_precision@5 0.5163
cosine_precision@10 0.3977
cosine_recall@1 0.2341
cosine_recall@3 0.3974
cosine_recall@5 0.4854
cosine_recall@10 0.6062
cosine_ndcg@10 0.694
cosine_mrr@10 0.8135
cosine_map@100 0.6258
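
For readers unfamiliar with the @k metrics in these tables, the following sketch gives the standard textbook definitions for a single query with binary relevance. It is not copied from the library's evaluator, and the document IDs are hypothetical. The reported figures would be averages of such per-query values over all evaluation queries.

```python
import math

def accuracy_at_k(ranked, relevant, k):
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return 1.0 if any(doc in relevant for doc in ranked[:k]) else 0.0

def precision_at_k(ranked, relevant, k):
    """Fraction of the top k results that are relevant."""
    return sum(doc in relevant for doc in ranked[:k]) / k

def recall_at_k(ranked, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    return sum(doc in relevant for doc in ranked[:k]) / len(relevant)

def mrr_at_k(ranked, relevant, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked, relevant, k):
    """Discounted cumulative gain divided by the best achievable gain."""
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(ranked[:k], start=1) if doc in relevant)
    ideal = sum(1.0 / math.log2(rank + 1)
                for rank in range(1, min(k, len(relevant)) + 1))
    return dcg / ideal if ideal > 0 else 0.0

# Hypothetical ranking for one query with three relevant passages.
ranked = ["d3", "d1", "d7", "d2", "d9"]
relevant = {"d1", "d2", "d4"}

print(accuracy_at_k(ranked, relevant, 3))
print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 5))
print(mrr_at_k(ranked, relevant, 5))
print(ndcg_at_k(ranked, relevant, 5))
```

Under these definitions accuracy@1 and precision@1 are the same quantity, which is why the two rows carry identical values in every table. Recall@k stays well below accuracy@k when a query has several relevant passages, since one hit is enough for accuracy but not for recall.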

Information Retrieval (dataset: dim_512)

Metric Value
cosine_accuracy@1 0.7539
cosine_accuracy@3 0.8586
cosine_accuracy@5 0.8953
cosine_accuracy@10 0.9208
cosine_precision@1 0.7539
cosine_precision@3 0.5964
cosine_precision@5 0.5143
cosine_precision@10 0.3977
cosine_recall@1 0.2334
cosine_recall@3 0.3947
cosine_recall@5 0.4796
cosine_recall@10 0.6046
cosine_ndcg@10 0.6913
cosine_mrr@10 0.8125
cosine_map@100 0.6197

Information Retrieval (dataset: dim_256)

Metric Value
cosine_accuracy@1 0.7355
cosine_accuracy@3 0.8487
cosine_accuracy@5 0.8868
cosine_accuracy@10 0.9137
cosine_precision@1 0.7355
cosine_precision@3 0.5818
cosine_precision@5 0.5018
cosine_precision@10 0.389
cosine_recall@1 0.2279
cosine_recall@3 0.379
cosine_recall@5 0.4645
cosine_recall@10 0.5878
cosine_ndcg@10 0.6743
cosine_mrr@10 0.7975
cosine_map@100 0.6003

Information Retrieval (dataset: dim_128)

Metric Value
cosine_accuracy@1 0.7058
cosine_accuracy@3 0.8133
cosine_accuracy@5 0.8501
cosine_accuracy@10 0.8953
cosine_precision@1 0.7058
cosine_precision@3 0.5535
cosine_precision@5 0.4736
cosine_precision@10 0.3661
cosine_recall@1 0.2151
cosine_recall@3 0.3572
cosine_recall@5 0.4326
cosine_recall@10 0.5469
cosine_ndcg@10 0.6358
cosine_mrr@10 0.77
cosine_map@100 0.5566

Information Retrieval (dataset: dim_64)

Metric Value
cosine_accuracy@1 0.6266
cosine_accuracy@3 0.7666
cosine_accuracy@5 0.8091
cosine_accuracy@10 0.86
cosine_precision@1 0.6266
cosine_precision@3 0.5002
cosine_precision@5 0.4291
cosine_precision@10 0.3313
cosine_recall@1 0.1885
cosine_recall@3 0.3176
cosine_recall@5 0.3874
cosine_recall@10 0.4973
cosine_ndcg@10 0.5709
cosine_mrr@10 0.7041
cosine_map@100 0.4804
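
To make the size versus quality trade-off easier to read, the short script below relates each embedding size to the full 768-dimensional model, using the NDCG@10 values reported for dim_768 through dim_64 in the model-index metadata (rounded to four decimals). It is only arithmetic on the card's own numbers; the variable names are illustrative.

```python
# NDCG@10 per embedding size, rounded from the values reported in this card.
ndcg_at_10 = {768: 0.6940, 512: 0.6913, 256: 0.6743, 128: 0.6358, 64: 0.5709}
full_dim = 768

retained = {}    # share of the full-size NDCG@10 kept at each size
size_ratio = {}  # share of the full-size vector length

for dim, score in ndcg_at_10.items():
    size_ratio[dim] = dim / full_dim
    retained[dim] = score / ndcg_at_10[full_dim]
    print(f"{dim:>4} dims: {size_ratio[dim]:6.1%} of full size, "
          f"{retained[dim]:6.1%} of full NDCG@10")
```

On these figures, 256 dimensions keep roughly 97% of the full-size NDCG@10 at a third of the storage, while 64 dimensions keep roughly 82% at one twelfth of the storage.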

Training Details

Training Dataset

json

  • Dataset: json
  • Size: 4,012 training samples
  • Columns: anchor and positive
  • Approximate statistics based on the first 1000 samples:
    • anchor (type: string): min 5 tokens, mean 16.13 tokens, max 49 tokens
    • positive (type: string): min 3 tokens, mean 63.38 tokens, max 485 tokens
  • Samples:
    • anchor: What is the implication of histone lysine methylation in medulloblastoma?
      positive: Aberrant patterns of H3K4, H3K9, and H3K27 histone lysine methylation were shown to result in histone code alterations, which induce changes in gene expression, and affect the proliferation rate of cells in medulloblastoma.
    • anchor: What is the role of STAG1/STAG2 proteins in differentiation?
      positive: STAG1/STAG2 proteins are tumour suppressor proteins that suppress cell proliferation and are essential for differentiation.
    • anchor: What is the association between cell phone use and glioblastoma?
      positive: The association between cell phone use and incident glioblastoma remains unclear. Some studies have reported that cell phone use was associated with incident glioblastoma, and with reduced survival of patients diagnosed with glioblastoma. However, other studies have repeatedly replicated to find an association between cell phone use and glioblastoma.
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "MultipleNegativesRankingLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
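
The parameters above describe a MultipleNegativesRankingLoss wrapped in MatryoshkaLoss, with equal weights for the five dimensions. The pure-Python sketch below illustrates the idea under stated assumptions: embeddings are truncated to each dimension and re-normalized, the inner loss is a softmax cross-entropy over in-batch cosine similarities, and the similarity scale of 20 is assumed rather than taken from this card. The function names and toy vectors are hypothetical, and this is not the library's implementation.

```python
import math

def normalize(vector):
    """Scale a vector to unit length; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mnr_loss(anchors, positives, scale=20.0):
    """In-batch ranking loss: anchor i should score highest against positive i.

    Inputs are expected to be unit length, so the dot product is the cosine.
    `scale` is an assumed value, not one stated in this card.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [scale * dot(anchor, positive) for positive in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(anchors)

def matryoshka_loss(anchors, positives, dims, weights):
    """Weighted sum of the inner loss over truncated, re-normalized embeddings."""
    total = 0.0
    for dim, weight in zip(dims, weights):
        a = [normalize(v[:dim]) for v in anchors]
        p = [normalize(v[:dim]) for v in positives]
        total += weight * mnr_loss(a, p)
    return total

# Hypothetical 4-dim batch of two (anchor, positive) pairs; dims shrunk to [4, 2].
toy_anchors = [[1.0, 0.0, 0.2, 0.1], [0.0, 1.0, 0.1, 0.3]]
toy_positives = [[0.9, 0.1, 0.3, 0.0], [0.1, 0.8, 0.0, 0.4]]

aligned = matryoshka_loss(toy_anchors, toy_positives, dims=[4, 2], weights=[1, 1])
shuffled = matryoshka_loss(toy_anchors, toy_positives[::-1], dims=[4, 2], weights=[1, 1])
print(aligned, shuffled)
```

Correctly paired batches give a much smaller loss than mismatched ones at every truncation, which is what pushes the leading components to carry most of the ranking signal.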
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: epoch
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 16
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • lr_scheduler_type: cosine
  • warmup_ratio: 0.1
  • bf16: True
  • tf32: True
  • load_best_model_at_end: True
  • optim: adamw_torch_fused
  • batch_sampler: no_duplicates
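
The hyperparameters above imply a fairly short run. The sketch below works out the step counts and the learning-rate curve, assuming a single device, that the final partial batch is kept, that a trailing partial accumulation group still counts as one optimizer step, and that warmup steps are rounded up. The cosine-with-linear-warmup formula is the common textbook form and is not copied from the trainer; all names are illustrative.

```python
import math

num_samples = 4012          # training samples
per_device_batch = 32       # per_device_train_batch_size
grad_accum = 16             # gradient_accumulation_steps
epochs = 4                  # num_train_epochs
base_lr = 2e-5              # learning_rate
warmup_ratio = 0.1          # warmup_ratio

batches_per_epoch = math.ceil(num_samples / per_device_batch)
steps_per_epoch = math.ceil(batches_per_epoch / grad_accum)
total_steps = steps_per_epoch * epochs
warmup_steps = math.ceil(total_steps * warmup_ratio)
effective_batch = per_device_batch * grad_accum

def lr_at(step):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(batches_per_epoch, steps_per_epoch, total_steps, warmup_steps, effective_batch)
for step in (0, 2, 4, 18, 32):
    print(step, f"{lr_at(step):.2e}")
```

These assumptions yield 8 optimizer steps per epoch and 32 in total, which agrees with the step column of the training logs. Note that gradient accumulation enlarges the batch used for each weight update to 512 pairs, but each forward pass still contains only 32 pairs, so the in-batch negatives for the ranking loss come from those 32.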

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: epoch
  • prediction_loss_only: True
  • per_device_train_batch_size: 32
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 16
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: cosine
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: True
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: no_duplicates
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss dim_768_cosine_ndcg@10 dim_512_cosine_ndcg@10 dim_256_cosine_ndcg@10 dim_128_cosine_ndcg@10 dim_64_cosine_ndcg@10
1.0 8 - 0.7106 0.7071 0.683 0.6384 0.5326
1.2540 10 25.4992 - - - - -
2.0 16 - 0.6976 0.6942 0.6763 0.6375 0.5635
2.5079 20 11.3871 - - - - -
3.0 24 - 0.6940 0.6907 0.6745 0.6365 0.5697
3.7619 30 8.6795 - - - - -
4.0 32 - 0.6940 0.6913 0.6743 0.6358 0.5709
  • The bold row denotes the saved checkpoint.
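
The fractional epoch values in the log (for example 1.2540 at step 10) can be reproduced from the batch and accumulation settings. The sketch below assumes 126 batches per epoch (4,012 samples in batches of 32, last partial batch kept), 16 batches per optimizer step, and 8 optimizer steps per epoch; the function name is hypothetical and the calculation is a consistency check, not the trainer's code.

```python
batches_per_epoch = 126   # assumed: ceil(4012 / 32)
grad_accum = 16           # gradient_accumulation_steps
steps_per_epoch = 8       # assumed: ceil(126 / 16)

def epoch_at_step(step):
    """Fractional epoch reached after `step` optimizer steps."""
    full_epochs, steps_into_epoch = divmod(step, steps_per_epoch)
    batches_into_epoch = steps_into_epoch * grad_accum
    return full_epochs + batches_into_epoch / batches_per_epoch

for step in (8, 10, 16, 20, 24, 30, 32):
    print(step, round(epoch_at_step(step), 4))
```

The results line up with the Epoch column of the table: evaluation rows fall on whole epochs (steps 8, 16, 24, 32), while the training-loss rows are logged every 10 steps and therefore land partway through an epoch.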

Framework Versions

  • Python: 3.12.5
  • Sentence Transformers: 4.1.0
  • Transformers: 4.52.4
  • PyTorch: 2.7.1+cu128
  • Accelerate: 1.7.0
  • Datasets: 3.6.0
  • Tokenizers: 0.21.1

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

MultipleNegativesRankingLoss

@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}