SentenceTransformer based on nomic-ai/modernbert-embed-base

This is a sentence-transformers model finetuned from nomic-ai/modernbert-embed-base on the stsb_multi_es_augmented (private) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

Model Details

Model Description

  • Model Type: Sentence Transformer
  • Base model: nomic-ai/modernbert-embed-base
  • Maximum Sequence Length: 8192 tokens
  • Output Dimensionality: 768 dimensions
  • Similarity Function: Cosine Similarity
  • Training Dataset:
    • Private stsb dataset

Model Sources

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel 
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64-5e")
# Run inference
sentences = [
    'El cordero está mirando hacia la cámara.',
    'Un gato está mirando hacia la cámara también.',
    '"Sí, no deseo estar presente durante este testimonio", declaró tranquilamente Peterson, de 31 años, al juez cuando fue devuelto a su celda.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]

Evaluation

Metrics

Semantic Similarity

  • Datasets: sts-dev-768, sts-dev-512, sts-dev-256, sts-dev-128, sts-dev-64, sts-test-768, sts-test-512, sts-test-256, sts-test-128 and sts-test-64
  • Evaluated with EmbeddingSimilarityEvaluator
Metric sts-dev-768 sts-dev-512 sts-dev-256 sts-dev-128 sts-dev-64 sts-test-768 sts-test-512 sts-test-256 sts-test-128 sts-test-64
pearson_cosine 0.7499 0.7468 0.7419 0.7263 0.6973 0.8673 0.8665 0.8568 0.8485 0.8194
spearman_cosine 0.7532 0.7482 0.7451 0.7304 0.707 0.8767 0.8752 0.8702 0.8617 0.842

Training Details

Training Dataset

stsb_multi_es_augmented (private)

  • Size: 2,697 training samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 1000 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 9 tokens
    • mean: 28.42 tokens
    • max: 96 tokens
    • min: 10 tokens
    • mean: 28.01 tokens
    • max: 92 tokens
    • min: 0.0
    • mean: 2.72
    • max: 5.0
  • Samples:
    sentence1 sentence2 score
    El pájaro de tamaño reducido se posó con delicadeza en una rama cubierta de escarcha. Un ave de color amarillo descansaba tranquilamente en una rama. 3.200000047683716
    Una chica está tocando la flauta en un parque. Un grupo de músicos está tocando en un escenario al aire libre. 1.286
    La aclamada escritora británica, Doris Lessing, galardonada con el premio Nobel, fallece La destacada autora británica, Doris Lessing, reconocida con el prestigioso Premio Nobel, muere 4.199999809265137
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Evaluation Dataset

stsb_multi_es_augmented (private)

  • Size: 697 evaluation samples
  • Columns: sentence1, sentence2, and score
  • Approximate statistics based on the first 697 samples:
    sentence1 sentence2 score
    type string string float
    details
    • min: 9 tokens
    • mean: 29.35 tokens
    • max: 87 tokens
    • min: 9 tokens
    • mean: 28.52 tokens
    • max: 81 tokens
    • min: 0.0
    • mean: 2.3
    • max: 5.0
  • Samples:
    sentence1 sentence2 score
    Un incendio ocurrido en un hospital psiquiátrico ruso resultó en la trágica muerte de 38 personas. Se teme que el incendio en un hospital psiquiátrico ruso cause la pérdida de la vida de 38 individuos. 4.199999809265137
    "Street dijo que el otro individuo a veces se siente avergonzado de su fiesta, lo cual provoca risas en la multitud" "A veces, el otro tipo se encuentra avergonzado de su fiesta y no se le puede culpar." 3.5
    El veterano diplomático de Malasia tuvo un encuentro con Suu Kyi el miércoles en la casa del lago en Yangon donde permanece bajo arresto domiciliario. Razali Ismail tuvo una reunión de 90 minutos con Suu Kyi, quien ganó el Premio Nobel de la Paz en 1991, en su casa del lago donde está recluida. 3.691999912261963
  • Loss: MatryoshkaLoss with these parameters:
    {
        "loss": "CoSENTLoss",
        "matryoshka_dims": [
            768,
            512,
            256,
            128,
            64
        ],
        "matryoshka_weights": [
            1,
            1,
            1,
            1,
            1
        ],
        "n_dims_per_step": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • num_train_epochs: 5
  • warmup_ratio: 0.1
  • bf16: True

All Hyperparameters

Click to expand
  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 16
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 5e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss sts-dev-768_spearman_cosine sts-dev-512_spearman_cosine sts-dev-256_spearman_cosine sts-dev-128_spearman_cosine sts-dev-64_spearman_cosine sts-test-768_spearman_cosine sts-test-512_spearman_cosine sts-test-256_spearman_cosine sts-test-128_spearman_cosine sts-test-64_spearman_cosine
0.5917 100 23.7709 22.5494 0.7185 0.7146 0.7055 0.6794 0.6570 - - - - -
1.1834 200 22.137 22.7634 0.7449 0.7412 0.7439 0.7287 0.7027 - - - - -
1.7751 300 21.5527 22.6985 0.7321 0.7281 0.7243 0.7063 0.6862 - - - - -
2.3669 400 20.5745 24.0021 0.7302 0.7264 0.7221 0.7097 0.6897 - - - - -
2.9586 500 20.0861 24.0091 0.7392 0.7361 0.7293 0.7124 0.6906 - - - - -
3.5503 600 18.8191 26.9012 0.7502 0.7462 0.7399 0.7207 0.6960 - - - - -
4.1420 700 18.3 29.0209 0.7496 0.7454 0.7432 0.7284 0.7065 - - - - -
4.7337 800 17.6496 28.9536 0.7532 0.7482 0.7451 0.7304 0.7070 - - - - -
5.0 845 - - - - - - - 0.8767 0.8752 0.8702 0.8617 0.8420

Framework Versions

  • Python: 3.10.12
  • Sentence Transformers: 3.3.1
  • Transformers: 4.48.0
  • PyTorch: 2.5.1+cu121
  • Accelerate: 1.2.1
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

MatryoshkaLoss

@misc{kusupati2024matryoshka,
    title={Matryoshka Representation Learning},
    author={Aditya Kusupati and Gantavya Bhatt and Aniket Rege and Matthew Wallingford and Aditya Sinha and Vivek Ramanujan and William Howard-Snyder and Kaifeng Chen and Sham Kakade and Prateek Jain and Ali Farhadi},
    year={2024},
    eprint={2205.13147},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

CoSENTLoss

@online{kexuefm-8847,
    title={CoSENT: A more efficient sentence vector scheme than Sentence-BERT},
    author={Su Jianlin},
    year={2022},
    month={Jan},
    url={https://kexue.fm/archives/8847},
}
Downloads last month
72
Safetensors
Model size
149M params
Tensor type
F32
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Model tree for mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64

Finetuned
(12)
this model

Collection including mrm8488/modernbert-embed-base-ft-sts-spanish-matryoshka-768-64

Evaluation results