CrossEncoder based on answerdotai/ModernBERT-base

This is a Cross Encoder model fine-tuned from answerdotai/ModernBERT-base on the ms_marco dataset using the sentence-transformers library. It computes a score for each pair of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

Model Type: Cross Encoder
Base model: answerdotai/ModernBERT-base
Maximum Sequence Length: 8192 tokens
Number of Output Labels: 1 label
Training Dataset:
- ms_marco
Language: en

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Studeni/reranker-msmarco-v1.1-ModernBERT-base-listnet")
# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
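The list returned by `rank` holds `corpus_id` and `score` entries, as the comment above indicates. A minimal sketch of mapping those entries back to passage text is shown here. The `top_passages` helper and the example scores are hypothetical and were not produced by the model.

```python
def top_passages(ranks, passages, k=2):
    """Return the k highest-scoring (score, passage) tuples.

    `ranks` is assumed to be a list of dicts with 'corpus_id'
    (an index into `passages`) and 'score' keys.
    """
    ordered = sorted(ranks, key=lambda r: r["score"], reverse=True)
    return [(r["score"], passages[r["corpus_id"]]) for r in ordered[:k]]


passages = [
    "There are on average between 55 and 80 calories in an egg depending on its size.",
    "Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.",
    "Most of the calories in an egg come from the yellow yolk in the center.",
]

# Made-up scores for illustration only; real values come from model.rank(...)
example_ranks = [
    {"corpus_id": 2, "score": 0.41},
    {"corpus_id": 0, "score": 0.93},
    {"corpus_id": 1, "score": 0.12},
]

best = top_passages(example_ranks, passages, k=2)
for score, text in best:
    print(f"{score:.2f}  {text}")
```

Sorting inside the helper keeps it correct even if the input list is not already ordered by score.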

Evaluation

Metrics

Cross Encoder Reranking

Datasets: NanoMSMARCO, NanoNFCorpus and NanoNQ
Evaluated with CERerankingEvaluator

Metric	NanoMSMARCO	NanoNFCorpus	NanoNQ
map	0.4674 (-0.0222)	0.3153 (+0.0449)	0.5727 (+0.1520)
mrr@10	0.4580 (-0.0195)	0.4976 (-0.0023)	0.5714 (+0.1447)
ndcg@10	0.5335 (-0.0069)	0.3530 (+0.0280)	0.6278 (+0.1272)
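For readers unfamiliar with the metrics in this table, the following is a rough sketch of how mrr@10 and ndcg@10 can be computed for a single query from relevance labels listed in the order the reranker returned them. The function names are illustrative, linear gain is an assumption, and the evaluator's own implementation may differ in details such as tie handling. Reported values are averages over all queries.

```python
import math


def mrr_at_k(labels_in_ranked_order, k=10):
    """Reciprocal rank of the first relevant item within the top k, else 0."""
    for position, label in enumerate(labels_in_ranked_order[:k], start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def ndcg_at_k(labels_in_ranked_order, k=10):
    """NDCG@k with linear gain: DCG of this ranking over DCG of the ideal one."""

    def dcg(labels):
        return sum(
            label / math.log2(position + 1)
            for position, label in enumerate(labels[:k], start=1)
        )

    ideal = dcg(sorted(labels_in_ranked_order, reverse=True))
    return dcg(labels_in_ranked_order) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: relevant passages ended up at positions 2 and 4
ranked_labels = [0, 1, 0, 1, 0]
print(mrr_at_k(ranked_labels), ndcg_at_k(ranked_labels))
```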

Cross Encoder Nano BEIR

Dataset: NanoBEIR_mean
Evaluated with CENanoBEIREvaluator

Metric	Value
map	0.4518 (+0.0582)
mrr@10	0.5090 (+0.0410)
ndcg@10	0.5048 (+0.0494)
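The NanoBEIR_mean values are consistent with an unweighted mean of the three per-dataset scores in the Cross Encoder Reranking table. A small check, with the numbers copied from that table:

```python
from statistics import mean

# Scores copied from the Cross Encoder Reranking table
per_dataset = {
    "map": {"NanoMSMARCO": 0.4674, "NanoNFCorpus": 0.3153, "NanoNQ": 0.5727},
    "mrr@10": {"NanoMSMARCO": 0.4580, "NanoNFCorpus": 0.4976, "NanoNQ": 0.5714},
    "ndcg@10": {"NanoMSMARCO": 0.5335, "NanoNFCorpus": 0.3530, "NanoNQ": 0.6278},
}

# Unweighted mean across datasets, rounded to four decimals like the table
nano_beir_mean = {
    metric: round(mean(scores.values()), 4)
    for metric, scores in per_dataset.items()
}
print(nano_beir_mean)
```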

Training Details

Training Dataset

ms_marco

Dataset: ms_marco at a47ee7a
Size: 82,326 training samples
Columns: query, docs, and labels
Approximate statistics based on the first 1000 samples:
	query	docs	labels
type	string	list	list
details	min: 11 characters mean: 34.16 characters max: 96 characters	size: 10 elements	size: 10 elements

Samples:

query	docs	labels
`what does a bursa do`	['Bursae (plural for bursa) are flattened fluid-filled sacs that function as cushions between your bones and the muscles (deep bursae) or bones and tendons (superficial bursae). Your bursae play an important role in leading a healthy, active life. When the bursae are not irritated and working properly, your joints move smoothly and painlessly. However, when a bursa becomes swollen and inflamed, the condition is known as bursitis.', 'A bursa is a small, fluid-filled sac that acts as a cushion between a bone and other moving parts: muscles, tendons, or skin. Bursae are found throughout the body. Bursitis occurs when a bursa becomes inflamed (redness and increased fluid in the bursa). A tendon is a flexible band of fibrous tissue that connects muscles to bones. Tendinitis is inflammation of a tendon. Tendons transmit the pull of the muscle to the bone to cause movement.', 'A bursa (plural bursae or bursas) is a small fluid-filled sac lined by synovial membrane with an inner capillary laye...	`[1, 1, 0, 0, 0, ...]`
`what is gluten in`	['Gluten is a general name for the proteins found in wheat (durum, emmer, spelt, farina, farro, KAMUT® khorasan wheat and einkorn), rye, barley and triticale. Gluten helps foods maintain their shape, acting as a glue that holds food together.', 'Definition. A gluten-free diet is a diet that excludes the protein gluten. Gluten is found in grains such as wheat, barley, rye, and a cross between wheat and rye called triticale. A gluten-free diet is primarily used to treat celiac disease. Gluten causes inflammation in the small intestines of people with', 'A gluten-free diet is a diet that excludes the protein gluten. Gluten is found in grains such as wheat, barley, rye, and a cross between wheat and rye called triticale. A gluten-free diet is primarily used to treat celiac disease. Gluten causes inflammation in the small intestines of people with', 'Gluten is found in wheat, rye, barley and any foods made with these grains. Avoiding wheat can be especially hard because this means you shoul...	`[1, 0, 0, 0, 0, ...]`
`what is a payaway`	['Playaway is the name of a solid-state prerecorded audio player introduced in 2005 by Findaway World, LLC, based in Solon, Ohio. About the size of a deck of playing cards and weighing 2 ounces, it can store up to 80 hours of audio. As of March 2010, the audiobooks are all produced in high definition audio. The digital content (audiobook or music compilation) is preloaded at the factory and cannot be changed or copied by the end user. A 3.5 mm stereo jack provides output to earphones or an external amplifier. Playaway was specifically designed to use most commonly available cassette adaptors and FM transmitters. Power is provided by a changeable 1.5V AAA cell, which the manufacturer claims allows it to operate approximately 20 hours before battery depletion, 30 hours for the newer versions.', "Playaway Audio Books. Playaway® is the simplest way to listen to audio on the go. Each Playaway is a self-contained audiobook, weighs only two ounces, and comes with a battery. Just plug in your ...	`[1, 0, 0, 0, 0, ...]`
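Each sample is listwise: one query, a list of candidate passages, and a parallel list of relevance labels. As a rough illustration of that layout, the sketch below flattens one sample into (query, doc, label) triples. The `to_pairs` helper and the placeholder passages are hypothetical and are not taken from the dataset.

```python
def to_pairs(sample):
    """Flatten one listwise sample into (query, doc, label) triples."""
    if len(sample["docs"]) != len(sample["labels"]):
        raise ValueError("docs and labels must have the same length")
    return [
        (sample["query"], doc, label)
        for doc, label in zip(sample["docs"], sample["labels"])
    ]


# Placeholder sample with the same columns as the rows above
sample = {
    "query": "what does a bursa do",
    "docs": ["passage A", "passage B", "passage C"],
    "labels": [1, 1, 0],
}

pairs = to_pairs(sample)
for query, doc, label in pairs:
    print(label, query, "|", doc)
```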

Loss: ListNetLoss with these parameters:

{
    "eps": 1e-10,
    "pad_value": -1
}
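As a rough illustration of the listwise objective, the sketch below computes a top-one ListNet loss (Cao et al., 2007, cited under Citation) for a single query in plain Python. It is not the library's implementation. The assumption here is that `eps` guards the logarithm and that positions whose label equals `pad_value` are ignored.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores, labels, eps=1e-10, pad_value=-1):
    """Cross entropy between softmax(labels) and softmax(scores) for one query."""
    # Assumption: padded positions are dropped from both distributions
    kept = [(s, l) for s, l in zip(scores, labels) if l != pad_value]
    if not kept:
        return 0.0
    predicted = softmax([s for s, _ in kept])
    target = softmax([float(l) for _, l in kept])
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted))


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
uniform = listnet_loss([0.0] * 10, labels)          # model cannot tell docs apart
informed = listnet_loss([1.0] + [0.0] * 9, labels)  # relevant doc scored higher
print(uniform, informed)
```

With ten candidates and identical scores the loss equals ln(10), about 2.30. Scoring the relevant passage higher lowers it.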

Evaluation Dataset

ms_marco

Dataset: ms_marco at a47ee7a
Size: 82,326 evaluation samples
Columns: query, docs, and labels
Approximate statistics based on the first 1000 samples:
	query	docs	labels
type	string	list	list
details	min: 10 characters mean: 34.53 characters max: 93 characters	size: 10 elements	size: 10 elements

Samples:

query	docs	labels
`how long does ehic card take to arrive`	['The quickest way is to apply online. Your EHIC will normally arrive within seven days and will usually be valid for five years. This means at a reduced cost or sometimes free of charge. Even with an EHIC, you may have to pay towards your treatment, depending on the rules of the country you’re visiting. You may be able to claim the money back – always try to apply for a refund before you return home. Find out how to do this in the country-by-country guide for the EHIC', "What is the EHIC? The European Health Insurance Card or EHIC was introduced in 2004 across the European Union. It allows Irish residents to access health services in any EU country and in Switzerland, Iceland, Liechtenstein and Norway, if they become ill or injured while on a temporary stay in that country. No. Your card will be valid for 4 to 5 years. Check that you and your family's cards are valid before you travel, and if they have expired, it's easy to renew them online at www.ehic.ie or at your Local Health Offi...	`[1, 0, 0, 0, 0, ...]`
`what are the muscles that dorsiflex the foot`	['Dorsiflexion of the foot uses four muscles. These are the tibialis anterior, extensor digitorum longus, extensor hallucis longus, and the peroneus tertius. ', 'There are four muscles in the anterior compartment of the leg; tibialis anterior, extensor digitorun longus, extensor hallucis longus and fibularis tertius. Collectively, they act to dorsiflex and invert the foot at the ankle joint. The extensor digitorum longus and extensor hallucis longus also extend the toes. The muscles in this compartment are innervated by the deep fibular nerve (L4-L5), and blood is supplied via the anterior tibial artery.', 'Many muscles do the work of moving the ankle and foot. Some of the muscles that move the foot start higher up in the leg, and smaller muscles work right in the foot itself. The leg is divided into compartments: the anterior, lateral, and posterior compartments. The muscles in these compartments help move the ankle and the foot: Anterior compartment: This compartment lies in front ...	`[1, 0, 0, 0, 0, ...]`
`What does the thailand flag mean`	["Thai Flag Meaning: The red stripes mean Thailand's nation. The white stands for the country's main religion. Blue is Thailand's national color and it represents the Thai monarchy/Royal Family. Hoped This helped Brought to you by: Firescream66 and the post starter. B … lue is Thailand's national color and it represents the Thai monarchy. The blue is also used to honor Thailand's World War I allies, Great Britain, France, United States and Russia, who all had red, white and blue flags.", "The red represents the blood spilled to maintain Thailand's independence. The white stands for purity and is the color of Buddhism. And the Blue represents the Thai monarchy. The pattern repeats so that the flag can be flown without ever appearing upside down. The old Siam (former name of Thailand) flag was a solid red with a white elephant in the middle.", 'The flag of the Kingdom of Thailand (Thai: ธงไตรรงค์, Thong Trairong, meaning tricolour flag”) shows five horizontal stripes in the colours red,...	`[1, 0, 0, 0, 0, ...]`

Loss: ListNetLoss with these parameters:

{
    "eps": 1e-10,
    "pad_value": -1
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 6
per_device_eval_batch_size: 16
torch_empty_cache_steps: 2000
learning_rate: 4e-06
warmup_ratio: 0.1
seed: 12
bf16: True
load_best_model_at_end: True
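To make `learning_rate` and `warmup_ratio` concrete, here is a minimal sketch of a linear schedule with warmup, matching the `lr_scheduler_type: linear` entry in the full list below. The function name and the `total_steps` value are hypothetical, and the trainer's exact rounding of the warmup length may differ.

```python
import math


def linear_warmup_lr(step, total_steps, peak_lr=4e-06, warmup_ratio=0.1):
    """Learning rate at `step`: linear ramp up to peak_lr, then linear decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


total_steps = 1000  # hypothetical run length, chosen for easy arithmetic
for step in (0, 50, 100, 550, 1000):
    print(step, linear_warmup_lr(step, total_steps))
```

With these settings the rate climbs during the first 10% of steps, peaks at 4e-06, and falls to zero by the final step.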

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 6
per_device_eval_batch_size: 16
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: 2000
learning_rate: 4e-06
weight_decay: 0.0
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 3
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 12
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: True
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: False
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: False
resume_from_checkpoint: None
hub_model_id: None
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
dispatch_batches: None
split_batches: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
eval_use_gather_object: False
average_tokens_across_devices: False
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional

Training Logs

Epoch	Step	Training Loss	Validation Loss	NanoMSMARCO_ndcg@10	NanoNFCorpus_ndcg@10	NanoNQ_ndcg@10	NanoBEIR_mean_ndcg@10
-1	-1	-	-	0.0264 (-0.5140)	0.2585 (-0.0665)	0.0606 (-0.4401)	0.1152 (-0.3402)
0.0001	1	1.8458	-	-	-	-	-
0.0430	500	2.1043	-	-	-	-	-
0.0861	1000	2.0906	-	-	-	-	-
0.1291	1500	2.0873	-	-	-	-	-
0.1721	2000	2.0848	2.0847	0.0655 (-0.4749)	0.2309 (-0.0941)	0.1177 (-0.3830)	0.1380 (-0.3174)
0.2152	2500	2.0864	-	-	-	-	-
0.2582	3000	2.0884	-	-	-	-	-
0.3013	3500	2.0783	-	-	-	-	-
0.3443	4000	2.0792	2.0791	0.3223 (-0.2181)	0.3229 (-0.0021)	0.2919 (-0.2088)	0.3124 (-0.1430)
0.3873	4500	2.0817	-	-	-	-	-
0.4304	5000	2.0828	-	-	-	-	-
0.4734	5500	2.0785	-	-	-	-	-
0.5164	6000	2.0751	2.0740	0.4743 (-0.0661)	0.3450 (+0.0200)	0.5233 (+0.0226)	0.4475 (-0.0078)
0.5595	6500	2.0719	-	-	-	-	-
0.6025	7000	2.0726	-	-	-	-	-
0.6456	7500	2.0734	-	-	-	-	-
0.6886	8000	2.0769	2.0722	0.5006 (-0.0398)	0.3449 (+0.0198)	0.4920 (-0.0087)	0.4458 (-0.0095)
0.7316	8500	2.0722	-	-	-	-	-
0.7747	9000	2.0669	-	-	-	-	-
0.8177	9500	2.0787	-	-	-	-	-
0.8607	10000	2.0661	2.0710	0.5646 (+0.0242)	0.3363 (+0.0113)	0.5672 (+0.0666)	0.4894 (+0.0340)
0.9038	10500	2.0754	-	-	-	-	-
0.9468	11000	2.0717	-	-	-	-	-
0.9898	11500	2.0779	-	-	-	-	-
1.0329	12000	2.0703	2.0706	0.5609 (+0.0205)	0.3107 (-0.0144)	0.5817 (+0.0811)	0.4844 (+0.0291)
1.0759	12500	2.0692	-	-	-	-	-
1.1190	13000	2.0665	-	-	-	-	-
1.1620	13500	2.0801	-	-	-	-	-
1.2050	14000	2.0723	2.0702	0.5413 (+0.0009)	0.3249 (-0.0001)	0.5961 (+0.0954)	0.4874 (+0.0321)
1.2481	14500	2.0707	-	-	-	-	-
1.2911	15000	2.0715	-	-	-	-	-
1.3341	15500	2.0664	-	-	-	-	-
1.3772	16000	2.0736	2.0700	0.5234 (-0.0171)	0.3314 (+0.0064)	0.6068 (+0.1061)	0.4872 (+0.0318)
1.4202	16500	2.0733	-	-	-	-	-
1.4632	17000	2.0728	-	-	-	-	-
1.5063	17500	2.068	-	-	-	-	-
1.5493	18000	2.0669	2.0699	0.5335 (-0.0069)	0.3530 (+0.0280)	0.6278 (+0.1272)	0.5048 (+0.0494)
1.5924	18500	2.0713	-	-	-	-	-
1.6354	19000	2.0689	-	-	-	-	-
1.6784	19500	2.07	-	-	-	-	-
1.7215	20000	2.0731	2.0696	0.5365 (-0.0039)	0.3497 (+0.0247)	0.5845 (+0.0838)	0.4902 (+0.0349)
1.7645	20500	2.0678	-	-	-	-	-
1.8075	21000	2.0646	-	-	-	-	-
1.8506	21500	2.0631	-	-	-	-	-
1.8936	22000	2.0714	2.0694	0.5340 (-0.0064)	0.3490 (+0.0239)	0.5653 (+0.0646)	0.4828 (+0.0274)
1.9367	22500	2.059	-	-	-	-	-
1.9797	23000	2.068	-	-	-	-	-
2.0227	23500	2.0664	-	-	-	-	-
2.0658	24000	2.0719	2.0699	0.5442 (+0.0038)	0.3531 (+0.0281)	0.5879 (+0.0873)	0.4951 (+0.0397)
2.1088	24500	2.0621	-	-	-	-	-
2.1518	25000	2.0669	-	-	-	-	-
2.1949	25500	2.067	-	-	-	-	-
2.2379	26000	2.0676	2.0700	0.5449 (+0.0044)	0.3334 (+0.0084)	0.5656 (+0.0649)	0.4813 (+0.0259)
2.2809	26500	2.0621	-	-	-	-	-
2.3240	27000	2.0634	-	-	-	-	-
2.3670	27500	2.065	-	-	-	-	-
2.4101	28000	2.0669	2.0704	0.5128 (-0.0276)	0.3495 (+0.0244)	0.5751 (+0.0744)	0.4791 (+0.0237)
2.4531	28500	2.0636	-	-	-	-	-
2.4961	29000	2.0623	-	-	-	-	-
2.5392	29500	2.0669	-	-	-	-	-
2.5822	30000	2.0615	2.0698	0.5448 (+0.0044)	0.3406 (+0.0156)	0.5768 (+0.0762)	0.4874 (+0.0321)
2.6252	30500	2.0708	-	-	-	-	-
2.6683	31000	2.0655	-	-	-	-	-
2.7113	31500	2.0511	-	-	-	-	-
2.7543	32000	2.0623	2.0699	0.5377 (-0.0027)	0.3505 (+0.0255)	0.5854 (+0.0847)	0.4912 (+0.0358)
2.7974	32500	2.0651	-	-	-	-	-
2.8404	33000	2.0675	-	-	-	-	-
2.8835	33500	2.0689	-	-	-	-	-
2.9265	34000	2.067	2.0699	0.5221 (-0.0184)	0.3605 (+0.0354)	0.5695 (+0.0688)	0.4840 (+0.0286)
2.9695	34500	2.0634	-	-	-	-	-
-1	-1	-	-	0.5335 (-0.0069)	0.3530 (+0.0280)	0.6278 (+0.1272)	0.5048 (+0.0494)

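The saved checkpoint is the row at step 18000 (shown in bold in the original table); its scores match the final evaluation row (Epoch -1, Step -1).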
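The card does not state which metric selected the checkpoint. As a quick check, and assuming selection by NanoBEIR_mean_ndcg@10, the sketch below picks the best evaluated step from the values in the table above.

```python
# (step, NanoBEIR_mean_ndcg@10) for every evaluated step in the training log
eval_rows = [
    (2000, 0.1380), (4000, 0.3124), (6000, 0.4475), (8000, 0.4458),
    (10000, 0.4894), (12000, 0.4844), (14000, 0.4874), (16000, 0.4872),
    (18000, 0.5048), (20000, 0.4902), (22000, 0.4828), (24000, 0.4951),
    (26000, 0.4813), (28000, 0.4791), (30000, 0.4874), (32000, 0.4912),
    (34000, 0.4840),
]

# Highest mean NDCG@10 wins; this is an assumed selection rule
best_step, best_ndcg = max(eval_rows, key=lambda row: row[1])
print(best_step, best_ndcg)
```

Under that assumption the selected step is 18000, whose per-dataset scores are the ones reported in the Evaluation section.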

Framework Versions

Python: 3.10.13
Sentence Transformers: 3.5.0.dev0
Transformers: 4.48.1
PyTorch: 2.5.1+cu124
Accelerate: 1.3.0
Datasets: 3.2.0
Tokenizers: 0.21.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to rank: from pairwise approach to listwise approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}
