CrossEncoder based on answerdotai/ModernBERT-base

This is a Cross Encoder model finetuned from answerdotai/ModernBERT-base on the ms_marco dataset using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: answerdotai/ModernBERT-base
  • Maximum Sequence Length: 8192 tokens
  • Number of Output Labels: 1 label
  • Training Dataset: ms_marco
  • Language: en

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Studeni/reranker-msmarco-v1.1-ModernBERT-base-listnet")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
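
The ranked output above can be reproduced from raw scores by sorting document indices by descending score. The helper below is a hypothetical illustration (the name rank_from_scores and the example scores are made up), not the library's implementation:

```python
def rank_from_scores(scores, top_k=None):
    """Sort document indices by descending score.

    Returns a list of dicts with the same keys as the ranked output shown
    above ('corpus_id' and 'score'). Illustrative helper only.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    if top_k is not None:
        order = order[:top_k]
    return [{"corpus_id": i, "score": float(scores[i])} for i in order]


# Made-up scores for three documents; real values would come from model.predict(pairs).
example_scores = [0.2, 1.7, -0.4]
ranked = rank_from_scores(example_scores)
print(ranked)
```

Each corpus_id is the position of the document in the input list, so the original text can be looked up by index.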

Evaluation

Metrics

Cross Encoder Reranking

Metric NanoMSMARCO NanoNFCorpus NanoNQ
map 0.4674 (-0.0222) 0.3153 (+0.0449) 0.5727 (+0.1520)
mrr@10 0.4580 (-0.0195) 0.4976 (-0.0023) 0.5714 (+0.1447)
ndcg@10 0.5335 (-0.0069) 0.3530 (+0.0280) 0.6278 (+0.1272)
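
For readers unfamiliar with the three metrics, here is a minimal sketch of how each is computed for a single query from relevance labels listed in ranked order. It assumes binary labels and linear gain for DCG; the evaluator in the library may differ in details, so treat this as an illustration of the definitions rather than the exact code behind the table.

```python
import math


def mrr_at_k(ranked_labels, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, label in enumerate(ranked_labels[:k], start=1):
        if label > 0:
            return 1.0 / rank
    return 0.0


def dcg_at_k(labels, k):
    """Discounted cumulative gain with linear gain (an assumption)."""
    return sum(label / math.log2(rank + 1)
               for rank, label in enumerate(labels[:k], start=1))


def ndcg_at_k(ranked_labels, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    ideal = dcg_at_k(sorted(ranked_labels, reverse=True), k)
    return dcg_at_k(ranked_labels, k) / ideal if ideal > 0 else 0.0


def average_precision(ranked_labels):
    """Mean of precision values at the rank of each relevant document."""
    hits, total = 0, 0.0
    for rank, label in enumerate(ranked_labels, start=1):
        if label > 0:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


# Made-up ranking: relevant documents sit at positions 2 and 4.
labels_in_ranked_order = [0, 1, 0, 1, 0]
print(mrr_at_k(labels_in_ranked_order))
print(ndcg_at_k(labels_in_ranked_order))
print(average_precision(labels_in_ranked_order))
```

The dataset-level map, mrr@10, and ndcg@10 are then averages of these per-query values.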

Cross Encoder Nano BEIR

Metric Value
map 0.4518 (+0.0582)
mrr@10 0.5090 (+0.0410)
ndcg@10 0.5048 (+0.0494)
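
The values in this table appear to be the unweighted mean of the three per-dataset values in the Cross Encoder Reranking table. A quick arithmetic check on the reported numbers:

```python
# Per-dataset values copied from the Cross Encoder Reranking table
# (order: NanoMSMARCO, NanoNFCorpus, NanoNQ).
per_dataset = {
    "map": [0.4674, 0.3153, 0.5727],
    "mrr@10": [0.4580, 0.4976, 0.5714],
    "ndcg@10": [0.5335, 0.3530, 0.6278],
}

# Unweighted mean over the three datasets, rounded to four decimals.
nano_beir_mean = {
    name: round(sum(values) / len(values), 4)
    for name, values in per_dataset.items()
}
print(nano_beir_mean)
```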

Training Details

Training Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 82,326 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 11 characters, mean 34.16 characters, max 96 characters
    • docs (list): size 10 elements
    • labels (list): size 10 elements
  • Samples:
    query docs labels
    what does a bursa do ['Bursae (plural for bursa) are flattened fluid-filled sacs that function as cushions between your bones and the muscles (deep bursae) or bones and tendons (superficial bursae). Your bursae play an important role in leading a healthy, active life. When the bursae are not irritated and working properly, your joints move smoothly and painlessly. However, when a bursa becomes swollen and inflamed, the condition is known as bursitis.', 'A bursa is a small, fluid-filled sac that acts as a cushion between a bone and other moving parts: muscles, tendons, or skin. Bursae are found throughout the body. Bursitis occurs when a bursa becomes inflamed (redness and increased fluid in the bursa). A tendon is a flexible band of fibrous tissue that connects muscles to bones. Tendinitis is inflammation of a tendon. Tendons transmit the pull of the muscle to the bone to cause movement.', 'A bursa (plural bursae or bursas) is a small fluid-filled sac lined by synovial membrane with an inner capillary laye... [1, 1, 0, 0, 0, ...]
    what is gluten in ['Gluten is a general name for the proteins found in wheat (durum, emmer, spelt, farina, farro, KAMUT® khorasan wheat and einkorn), rye, barley and triticale. Gluten helps foods maintain their shape, acting as a glue that holds food together.', 'Definition. A gluten-free diet is a diet that excludes the protein gluten. Gluten is found in grains such as wheat, barley, rye, and a cross between wheat and rye called triticale. A gluten-free diet is primarily used to treat celiac disease. Gluten causes inflammation in the small intestines of people with', 'A gluten-free diet is a diet that excludes the protein gluten. Gluten is found in grains such as wheat, barley, rye, and a cross between wheat and rye called triticale. A gluten-free diet is primarily used to treat celiac disease. Gluten causes inflammation in the small intestines of people with', 'Gluten is found in wheat, rye, barley and any foods made with these grains. Avoiding wheat can be especially hard because this means you shoul... [1, 0, 0, 0, 0, ...]
    what is a payaway ['Playaway is the name of a solid-state prerecorded audio player introduced in 2005 by Findaway World, LLC, based in Solon, Ohio. About the size of a deck of playing cards and weighing 2 ounces, it can store up to 80 hours of audio. As of March 2010, the audiobooks are all produced in high definition audio. The digital content (audiobook or music compilation) is preloaded at the factory and cannot be changed or copied by the end user. A 3.5 mm stereo jack provides output to earphones or an external amplifier. Playaway was specifically designed to use most commonly available cassette adaptors and FM transmitters. Power is provided by a changeable 1.5V AAA cell, which the manufacturer claims allows it to operate approximately 20 hours before battery depletion, 30 hours for the newer versions.', "Playaway Audio Books. Playaway® is the simplest way to listen to audio on the go. Each Playaway is a self-contained audiobook, weighs only two ounces, and comes with a battery. Just plug in your ... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "eps": 1e-10,
        "pad_value": -1
    }
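
As a rough illustration of what this loss optimizes, the following is a minimal pure-Python sketch of the top-one ListNet objective from Cao et al. (2007): cross-entropy between the softmax of the labels and the softmax of the predicted scores for one query. How eps and pad_value are used here (eps inside the logarithm, pad_value marking positions to skip) is an assumption about their role, not a copy of the library's code.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_top1_loss(scores, labels, eps=1e-10, pad_value=-1):
    """Top-one ListNet loss for a single query.

    Positions whose label equals pad_value are ignored (assumed meaning).
    """
    valid = [(s, l) for s, l in zip(scores, labels) if l != pad_value]
    if not valid:
        return 0.0
    predicted = softmax([s for s, _ in valid])
    target = softmax([float(l) for _, l in valid])
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


# Made-up scores: ranking the relevant document first gives a lower loss.
good = listnet_top1_loss([3.0, 0.0, 0.0], [1, 0, 0])
bad = listnet_top1_loss([0.0, 3.0, 0.0], [1, 0, 0])
print(good, bad)
```

Because the target distribution is a softmax over the labels rather than a one-hot vector, the loss does not reach zero even for a perfect ranking.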
    

Evaluation Dataset

ms_marco

  • Dataset: ms_marco at a47ee7a
  • Size: 82,326 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query (string): min 10 characters, mean 34.53 characters, max 93 characters
    • docs (list): size 10 elements
    • labels (list): size 10 elements
  • Samples:
    query docs labels
    how long does ehic card take to arrive ['The quickest way is to apply online. Your EHIC will normally arrive within seven days and will usually be valid for five years. This means at a reduced cost or sometimes free of charge. Even with an EHIC, you may have to pay towards your treatment, depending on the rules of the country you’re visiting. You may be able to claim the money back – always try to apply for a refund before you return home. Find out how to do this in the country-by-country guide for the EHIC', "What is the EHIC? The European Health Insurance Card or EHIC was introduced in 2004 across the European Union. It allows Irish residents to access health services in any EU country and in Switzerland, Iceland, Liechtenstein and Norway, if they become ill or injured while on a temporary stay in that country. No. Your card will be valid for 4 to 5 years. Check that you and your family's cards are valid before you travel, and if they have expired, it's easy to renew them online at www.ehic.ie or at your Local Health Offi... [1, 0, 0, 0, 0, ...]
    what are the muscles that dorsiflex the foot ['Dorsiflexion of the foot uses four muscles. These are the tibialis anterior, extensor digitorum longus, extensor hallucis longus, and the peroneus tertius. ', 'There are four muscles in the anterior compartment of the leg; tibialis anterior, extensor digitorun longus, extensor hallucis longus and fibularis tertius. Collectively, they act to dorsiflex and invert the foot at the ankle joint. The extensor digitorum longus and extensor hallucis longus also extend the toes. The muscles in this compartment are innervated by the deep fibular nerve (L4-L5), and blood is supplied via the anterior tibial artery.', 'Many muscles do the work of moving the ankle and foot. Some of the muscles that move the foot start higher up in the leg, and smaller muscles work right in the foot itself. The leg is divided into compartments: the anterior, lateral, and posterior compartments. The muscles in these compartments help move the ankle and the foot: Anterior compartment: This compartment lies in front ... [1, 0, 0, 0, 0, ...]
    What does the thailand flag mean ["Thai Flag Meaning: The red stripes mean Thailand's nation. The white stands for the country's main religion. Blue is Thailand's national color and it represents the Thai monarchy/Royal Family. Hoped This helped Brought to you by: Firescream66 and the post starter. B … lue is Thailand's national color and it represents the Thai monarchy. The blue is also used to honor Thailand's World War I allies, Great Britain, France, United States and Russia, who all had red, white and blue flags.", "The red represents the blood spilled to maintain Thailand's independence. The white stands for purity and is the color of Buddhism. And the Blue represents the Thai monarchy. The pattern repeats so that the flag can be flown without ever appearing upside down. The old Siam (former name of Thailand) flag was a solid red with a white elephant in the middle.", 'The flag of the Kingdom of Thailand (Thai: ธงไตรรงค์, Thong Trairong, meaning tricolour flag”) shows five horizontal stripes in the colours red,... [1, 0, 0, 0, 0, ...]
  • Loss: ListNetLoss with these parameters:
    {
        "eps": 1e-10,
        "pad_value": -1
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 6
  • per_device_eval_batch_size: 16
  • torch_empty_cache_steps: 2000
  • learning_rate: 4e-06
  • warmup_ratio: 0.1
  • seed: 12
  • bf16: True
  • load_best_model_at_end: True
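
To make the learning-rate settings concrete, here is a sketch of a linear schedule with warmup using learning_rate 4e-06 and warmup_ratio 0.1 from the list above (lr_scheduler_type is linear in the full hyperparameter list). The function name and the total step count of 1000 are hypothetical; the real total depends on dataset size, batch size, and epochs.

```python
import math


def linear_warmup_lr(step, total_steps, base_lr=4e-06, warmup_ratio=0.1):
    """Learning rate at a given step: linear warmup, then linear decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up from 0 to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run of 1000 steps, so warmup covers the first 100 steps.
TOTAL_STEPS = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, TOTAL_STEPS))
```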

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 6
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 1
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: 2000
  • learning_rate: 4e-06
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 12
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • dispatch_batches: None
  • split_batches: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional

Training Logs

Epoch Step Training Loss Validation Loss NanoMSMARCO_ndcg@10 NanoNFCorpus_ndcg@10 NanoNQ_ndcg@10 NanoBEIR_mean_ndcg@10
-1 -1 - - 0.0264 (-0.5140) 0.2585 (-0.0665) 0.0606 (-0.4401) 0.1152 (-0.3402)
0.0001 1 1.8458 - - - - -
0.0430 500 2.1043 - - - - -
0.0861 1000 2.0906 - - - - -
0.1291 1500 2.0873 - - - - -
0.1721 2000 2.0848 2.0847 0.0655 (-0.4749) 0.2309 (-0.0941) 0.1177 (-0.3830) 0.1380 (-0.3174)
0.2152 2500 2.0864 - - - - -
0.2582 3000 2.0884 - - - - -
0.3013 3500 2.0783 - - - - -
0.3443 4000 2.0792 2.0791 0.3223 (-0.2181) 0.3229 (-0.0021) 0.2919 (-0.2088) 0.3124 (-0.1430)
0.3873 4500 2.0817 - - - - -
0.4304 5000 2.0828 - - - - -
0.4734 5500 2.0785 - - - - -
0.5164 6000 2.0751 2.0740 0.4743 (-0.0661) 0.3450 (+0.0200) 0.5233 (+0.0226) 0.4475 (-0.0078)
0.5595 6500 2.0719 - - - - -
0.6025 7000 2.0726 - - - - -
0.6456 7500 2.0734 - - - - -
0.6886 8000 2.0769 2.0722 0.5006 (-0.0398) 0.3449 (+0.0198) 0.4920 (-0.0087) 0.4458 (-0.0095)
0.7316 8500 2.0722 - - - - -
0.7747 9000 2.0669 - - - - -
0.8177 9500 2.0787 - - - - -
0.8607 10000 2.0661 2.0710 0.5646 (+0.0242) 0.3363 (+0.0113) 0.5672 (+0.0666) 0.4894 (+0.0340)
0.9038 10500 2.0754 - - - - -
0.9468 11000 2.0717 - - - - -
0.9898 11500 2.0779 - - - - -
1.0329 12000 2.0703 2.0706 0.5609 (+0.0205) 0.3107 (-0.0144) 0.5817 (+0.0811) 0.4844 (+0.0291)
1.0759 12500 2.0692 - - - - -
1.1190 13000 2.0665 - - - - -
1.1620 13500 2.0801 - - - - -
1.2050 14000 2.0723 2.0702 0.5413 (+0.0009) 0.3249 (-0.0001) 0.5961 (+0.0954) 0.4874 (+0.0321)
1.2481 14500 2.0707 - - - - -
1.2911 15000 2.0715 - - - - -
1.3341 15500 2.0664 - - - - -
1.3772 16000 2.0736 2.0700 0.5234 (-0.0171) 0.3314 (+0.0064) 0.6068 (+0.1061) 0.4872 (+0.0318)
1.4202 16500 2.0733 - - - - -
1.4632 17000 2.0728 - - - - -
1.5063 17500 2.068 - - - - -
1.5493 18000 2.0669 2.0699 0.5335 (-0.0069) 0.3530 (+0.0280) 0.6278 (+0.1272) 0.5048 (+0.0494)
1.5924 18500 2.0713 - - - - -
1.6354 19000 2.0689 - - - - -
1.6784 19500 2.07 - - - - -
1.7215 20000 2.0731 2.0696 0.5365 (-0.0039) 0.3497 (+0.0247) 0.5845 (+0.0838) 0.4902 (+0.0349)
1.7645 20500 2.0678 - - - - -
1.8075 21000 2.0646 - - - - -
1.8506 21500 2.0631 - - - - -
1.8936 22000 2.0714 2.0694 0.5340 (-0.0064) 0.3490 (+0.0239) 0.5653 (+0.0646) 0.4828 (+0.0274)
1.9367 22500 2.059 - - - - -
1.9797 23000 2.068 - - - - -
2.0227 23500 2.0664 - - - - -
2.0658 24000 2.0719 2.0699 0.5442 (+0.0038) 0.3531 (+0.0281) 0.5879 (+0.0873) 0.4951 (+0.0397)
2.1088 24500 2.0621 - - - - -
2.1518 25000 2.0669 - - - - -
2.1949 25500 2.067 - - - - -
2.2379 26000 2.0676 2.0700 0.5449 (+0.0044) 0.3334 (+0.0084) 0.5656 (+0.0649) 0.4813 (+0.0259)
2.2809 26500 2.0621 - - - - -
2.3240 27000 2.0634 - - - - -
2.3670 27500 2.065 - - - - -
2.4101 28000 2.0669 2.0704 0.5128 (-0.0276) 0.3495 (+0.0244) 0.5751 (+0.0744) 0.4791 (+0.0237)
2.4531 28500 2.0636 - - - - -
2.4961 29000 2.0623 - - - - -
2.5392 29500 2.0669 - - - - -
2.5822 30000 2.0615 2.0698 0.5448 (+0.0044) 0.3406 (+0.0156) 0.5768 (+0.0762) 0.4874 (+0.0321)
2.6252 30500 2.0708 - - - - -
2.6683 31000 2.0655 - - - - -
2.7113 31500 2.0511 - - - - -
2.7543 32000 2.0623 2.0699 0.5377 (-0.0027) 0.3505 (+0.0255) 0.5854 (+0.0847) 0.4912 (+0.0358)
2.7974 32500 2.0651 - - - - -
2.8404 33000 2.0675 - - - - -
2.8835 33500 2.0689 - - - - -
2.9265 34000 2.067 2.0699 0.5221 (-0.0184) 0.3605 (+0.0354) 0.5695 (+0.0688) 0.4840 (+0.0286)
2.9695 34500 2.0634 - - - - -
-1 -1 - - 0.5335 (-0.0069) 0.3530 (+0.0280) 0.6278 (+0.1272) 0.5048 (+0.0494)
  • The saved checkpoint is the row at step 18000 (epoch 1.5493); its metrics match the final evaluation row.

Framework Versions

  • Python: 3.10.13
  • Sentence Transformers: 3.5.0.dev0
  • Transformers: 4.48.1
  • PyTorch: 2.5.1+cu124
  • Accelerate: 1.3.0
  • Datasets: 3.2.0
  • Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}