CrossEncoder based on answerdotai/ModernBERT-base
This is a Cross Encoder model fine-tuned from answerdotai/ModernBERT-base on the ms_marco dataset using the sentence-transformers library. It computes a relevance score for each pair of texts, which can be used for text reranking and semantic search.
Model Details
Model Description
- Model Type: Cross Encoder
- Base model: answerdotai/ModernBERT-base
- Maximum Sequence Length: 8192 tokens
- Number of Output Labels: 1 label
- Training Dataset: ms_marco
- Language: en
Model Sources
- Documentation: Sentence Transformers Documentation
- Documentation: Cross Encoder Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Cross Encoders on Hugging Face
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Studeni/reranker-msmarco-v1.1-ModernBERT-base-listnet")

# Get scores for pairs of texts
pairs = [
    ['How many calories in an egg', 'There are on average between 55 and 80 calories in an egg depending on its size.'],
    ['How many calories in an egg', 'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.'],
    ['How many calories in an egg', 'Most of the calories in an egg come from the yellow yolk in the center.'],
]
scores = model.predict(pairs)
print(scores.shape)
# (3,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'How many calories in an egg',
    [
        'There are on average between 55 and 80 calories in an egg depending on its size.',
        'Egg whites are very low in calories, have no fat, no cholesterol, and are loaded with protein.',
        'Most of the calories in an egg come from the yellow yolk in the center.',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
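The list returned by `rank` is ordered from most to least relevant. As a rough illustration of that ordering step only, the sketch below sorts a list of scores into the same `corpus_id`/`score` structure. The scores are made-up placeholders rather than model output, and `rank_by_score` is a hypothetical helper, not part of the library.

```python
def rank_by_score(scores):
    """Return [{'corpus_id': i, 'score': s}, ...] sorted by descending score."""
    ranked = [{"corpus_id": i, "score": s} for i, s in enumerate(scores)]
    # A higher score means the document is judged more relevant to the query.
    ranked.sort(key=lambda item: item["score"], reverse=True)
    return ranked


# Placeholder scores for three documents (not produced by the model).
placeholder_scores = [0.2, 0.9, 0.5]
ranking = rank_by_score(placeholder_scores)
print([item["corpus_id"] for item in ranking])
```

Each `corpus_id` is the index of a document in the input list, so it can be used to look the original text back up.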
Evaluation
Metrics
Cross Encoder Reranking
- Datasets: NanoMSMARCO, NanoNFCorpus and NanoNQ
- Evaluated with CERerankingEvaluator
Metric | NanoMSMARCO | NanoNFCorpus | NanoNQ |
---|---|---|---|
map | 0.4674 (-0.0222) | 0.3153 (+0.0449) | 0.5727 (+0.1520) |
mrr@10 | 0.4580 (-0.0195) | 0.4976 (-0.0023) | 0.5714 (+0.1447) |
ndcg@10 | 0.5335 (-0.0069) | 0.3530 (+0.0280) | 0.6278 (+0.1272) |
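For readers unfamiliar with the metrics in this table, the following is a minimal sketch of mrr@10 and ndcg@10 for a single query, using the common textbook definitions with linear gain. It is not the evaluator's code, whose details may differ, and the example ranking is hypothetical.

```python
import math


def mrr_at_k(labels_in_ranked_order, k=10):
    """Reciprocal rank of the first relevant document within the top k."""
    for position, label in enumerate(labels_in_ranked_order[:k], start=1):
        if label > 0:
            return 1.0 / position
    return 0.0


def dcg_at_k(labels_in_ranked_order, k=10):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(
        label / math.log2(position + 1)
        for position, label in enumerate(labels_in_ranked_order[:k], start=1)
    )


def ndcg_at_k(labels_in_ranked_order, k=10):
    """DCG normalised by the DCG of the ideal (label-sorted) ordering."""
    ideal = dcg_at_k(sorted(labels_in_ranked_order, reverse=True), k)
    return dcg_at_k(labels_in_ranked_order, k) / ideal if ideal > 0 else 0.0


# Hypothetical ranking: the only relevant document ends up in second place.
example = [0, 1, 0, 0]
print(mrr_at_k(example), ndcg_at_k(example))
```

The reported numbers are averages of such per-query values over each dataset.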
Cross Encoder Nano BEIR
- Dataset: NanoBEIR_mean
- Evaluated with CENanoBEIREvaluator
Metric | Value |
---|---|
map | 0.4518 (+0.0582) |
mrr@10 | 0.5090 (+0.0410) |
ndcg@10 | 0.5048 (+0.0494) |
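The NanoBEIR_mean values appear to be the unweighted mean of the three per-dataset results from the Cross Encoder Reranking table. The short check below recomputes them from the numbers reported above.

```python
# Per-dataset values copied from the Cross Encoder Reranking table,
# in the order NanoMSMARCO, NanoNFCorpus, NanoNQ.
per_dataset = {
    "map": [0.4674, 0.3153, 0.5727],
    "mrr@10": [0.4580, 0.4976, 0.5714],
    "ndcg@10": [0.5335, 0.3530, 0.6278],
}

# Unweighted mean over the three datasets, rounded to four decimals.
means = {name: round(sum(vals) / len(vals), 4) for name, vals in per_dataset.items()}
print(means)
```

The results agree with the table: 0.4518 for map, 0.5090 for mrr@10 and 0.5048 for ndcg@10.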
Training Details
Training Dataset
ms_marco
- Dataset: ms_marco at a47ee7a
- Size: 82,326 training samples
- Columns: query, docs, and labels
- Approximate statistics based on the first 1000 samples:
| | query | docs | labels |
|---|---|---|---|
| type | string | list | list |
| details | min: 11 characters, mean: 34.16 characters, max: 96 characters | size: 10 elements | size: 10 elements |
- Samples (each sample spans three lines: query, docs, labels):
what does a bursa do
['Bursae (plural for bursa) are flattened fluid-filled sacs that function as cushions between your bones and the muscles (deep bursae) or bones and tendons (superficial bursae). Your bursae play an important role in leading a healthy, active life. When the bursae are not irritated and working properly, your joints move smoothly and painlessly. However, when a bursa becomes swollen and inflamed, the condition is known as bursitis.', 'A bursa is a small, fluid-filled sac that acts as a cushion between a bone and other moving parts: muscles, tendons, or skin. Bursae are found throughout the body. Bursitis occurs when a bursa becomes inflamed (redness and increased fluid in the bursa). A tendon is a flexible band of fibrous tissue that connects muscles to bones. Tendinitis is inflammation of a tendon. Tendons transmit the pull of the muscle to the bone to cause movement.', 'A bursa (plural bursae or bursas) is a small fluid-filled sac lined by synovial membrane with an inner capillary laye...
[1, 1, 0, 0, 0, ...]
what is gluten in
['Gluten is a general name for the proteins found in wheat (durum, emmer, spelt, farina, farro, KAMUT® khorasan wheat and einkorn), rye, barley and triticale. Gluten helps foods maintain their shape, acting as a glue that holds food together.', 'Definition. A gluten-free diet is a diet that excludes the protein gluten. Gluten is found in grains such as wheat, barley, rye, and a cross between wheat and rye called triticale. A gluten-free diet is primarily used to treat celiac disease. Gluten causes inflammation in the small intestines of people with', 'A gluten-free diet is a diet that excludes the protein gluten. Gluten is found in grains such as wheat, barley, rye, and a cross between wheat and rye called triticale. A gluten-free diet is primarily used to treat celiac disease. Gluten causes inflammation in the small intestines of people with', 'Gluten is found in wheat, rye, barley and any foods made with these grains. Avoiding wheat can be especially hard because this means you shoul...
[1, 0, 0, 0, 0, ...]
what is a payaway
['Playaway is the name of a solid-state prerecorded audio player introduced in 2005 by Findaway World, LLC, based in Solon, Ohio. About the size of a deck of playing cards and weighing 2 ounces, it can store up to 80 hours of audio. As of March 2010, the audiobooks are all produced in high definition audio. The digital content (audiobook or music compilation) is preloaded at the factory and cannot be changed or copied by the end user. A 3.5 mm stereo jack provides output to earphones or an external amplifier. Playaway was specifically designed to use most commonly available cassette adaptors and FM transmitters. Power is provided by a changeable 1.5V AAA cell, which the manufacturer claims allows it to operate approximately 20 hours before battery depletion, 30 hours for the newer versions.', "Playaway Audio Books. Playaway® is the simplest way to listen to audio on the go. Each Playaway is a self-contained audiobook, weighs only two ounces, and comes with a battery. Just plug in your ...
[1, 0, 0, 0, 0, ...]
- Loss: ListNetLoss with these parameters: { "eps": 1e-10, "pad_value": -1 }
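To make the loss concrete, here is a minimal pure-Python sketch of the top-one ListNet objective from Cao et al. (2007) for a single query. It is not the sentence-transformers implementation. It assumes that `eps` is added inside the logarithm for numerical stability and that positions whose label equals `pad_value` are padding and are ignored. The scores are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def listnet_loss(scores, labels, eps=1e-10, pad_value=-1):
    """Top-one ListNet loss for a single query (sketch, not the library code)."""
    # Assumption: positions whose label equals pad_value are padding and are skipped.
    kept = [(s, l) for s, l in zip(scores, labels) if l != pad_value]
    pred = softmax([s for s, _ in kept])
    target = softmax([float(l) for _, l in kept])
    # Cross-entropy between the target and predicted top-one distributions.
    return -sum(t * math.log(p + eps) for t, p in zip(target, pred))


labels = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
uniform = listnet_loss([0.0] * 10, labels)          # every document scored equally
matched = listnet_loss([1.0] + [0.0] * 9, labels)   # scores equal to the labels
print(uniform, matched)
```

Under these assumptions, uniform scores over 10 documents give a loss of ln(10), about 2.30, and the loss is smallest when the predicted distribution matches the softmax of the labels.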
Evaluation Dataset
ms_marco
- Dataset: ms_marco at a47ee7a
- Size: 82,326 evaluation samples
- Columns: query, docs, and labels
- Approximate statistics based on the first 1000 samples:
| | query | docs | labels |
|---|---|---|---|
| type | string | list | list |
| details | min: 10 characters, mean: 34.53 characters, max: 93 characters | size: 10 elements | size: 10 elements |
- Samples (each sample spans three lines: query, docs, labels):
how long does ehic card take to arrive
['The quickest way is to apply online. Your EHIC will normally arrive within seven days and will usually be valid for five years. This means at a reduced cost or sometimes free of charge. Even with an EHIC, you may have to pay towards your treatment, depending on the rules of the country you’re visiting. You may be able to claim the money back – always try to apply for a refund before you return home. Find out how to do this in the country-by-country guide for the EHIC', "What is the EHIC? The European Health Insurance Card or EHIC was introduced in 2004 across the European Union. It allows Irish residents to access health services in any EU country and in Switzerland, Iceland, Liechtenstein and Norway, if they become ill or injured while on a temporary stay in that country. No. Your card will be valid for 4 to 5 years. Check that you and your family's cards are valid before you travel, and if they have expired, it's easy to renew them online at www.ehic.ie or at your Local Health Offi...
[1, 0, 0, 0, 0, ...]
what are the muscles that dorsiflex the foot
['Dorsiflexion of the foot uses four muscles. These are the tibialis anterior, extensor digitorum longus, extensor hallucis longus, and the peroneus tertius. ', 'There are four muscles in the anterior compartment of the leg; tibialis anterior, extensor digitorun longus, extensor hallucis longus and fibularis tertius. Collectively, they act to dorsiflex and invert the foot at the ankle joint. The extensor digitorum longus and extensor hallucis longus also extend the toes. The muscles in this compartment are innervated by the deep fibular nerve (L4-L5), and blood is supplied via the anterior tibial artery.', 'Many muscles do the work of moving the ankle and foot. Some of the muscles that move the foot start higher up in the leg, and smaller muscles work right in the foot itself. The leg is divided into compartments: the anterior, lateral, and posterior compartments. The muscles in these compartments help move the ankle and the foot: Anterior compartment: This compartment lies in front ...
[1, 0, 0, 0, 0, ...]
What does the thailand flag mean
["Thai Flag Meaning: The red stripes mean Thailand's nation. The white stands for the country's main religion. Blue is Thailand's national color and it represents the Thai monarchy/Royal Family. Hoped This helped Brought to you by: Firescream66 and the post starter. B … lue is Thailand's national color and it represents the Thai monarchy. The blue is also used to honor Thailand's World War I allies, Great Britain, France, United States and Russia, who all had red, white and blue flags.", "The red represents the blood spilled to maintain Thailand's independence. The white stands for purity and is the color of Buddhism. And the Blue represents the Thai monarchy. The pattern repeats so that the flag can be flown without ever appearing upside down. The old Siam (former name of Thailand) flag was a solid red with a white elephant in the middle.", 'The flag of the Kingdom of Thailand (Thai: ธงไตรรงค์, Thong Trairong, meaning tricolour flag”) shows five horizontal stripes in the colours red,...
[1, 0, 0, 0, 0, ...]
- Loss: ListNetLoss with these parameters: { "eps": 1e-10, "pad_value": -1 }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 6
- per_device_eval_batch_size: 16
- torch_empty_cache_steps: 2000
- learning_rate: 4e-06
- warmup_ratio: 0.1
- seed: 12
- bf16: True
- load_best_model_at_end: True
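The learning rate and warmup ratio above interact with the scheduler. Assuming the linear scheduler listed under All Hyperparameters, and assuming the warmup length is the total step count times `warmup_ratio`, rounded up, the schedule can be sketched as follows. The function name and `total_steps=1000` are illustrative choices, not values from the training run.

```python
import math


def linear_schedule_lr(step, total_steps, base_lr=4e-06, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# total_steps=1000 is a hypothetical value chosen for round numbers.
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule_lr(step, total_steps=1000))
```

With these settings the rate peaks at 4e-06 once the first 10% of steps have elapsed and then falls linearly to zero.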
All Hyperparameters
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 6
- per_device_eval_batch_size: 16
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: 2000
- learning_rate: 4e-06
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 12
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: True
- fp16: False
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: True
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: None
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- include_for_metrics: []
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- use_liger_kernel: False
- eval_use_gather_object: False
- average_tokens_across_devices: False
- prompts: None
- batch_sampler: batch_sampler
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss | NanoMSMARCO_ndcg@10 | NanoNFCorpus_ndcg@10 | NanoNQ_ndcg@10 | NanoBEIR_mean_ndcg@10 |
---|---|---|---|---|---|---|---|
-1 | -1 | - | - | 0.0264 (-0.5140) | 0.2585 (-0.0665) | 0.0606 (-0.4401) | 0.1152 (-0.3402) |
0.0001 | 1 | 1.8458 | - | - | - | - | - |
0.0430 | 500 | 2.1043 | - | - | - | - | - |
0.0861 | 1000 | 2.0906 | - | - | - | - | - |
0.1291 | 1500 | 2.0873 | - | - | - | - | - |
0.1721 | 2000 | 2.0848 | 2.0847 | 0.0655 (-0.4749) | 0.2309 (-0.0941) | 0.1177 (-0.3830) | 0.1380 (-0.3174) |
0.2152 | 2500 | 2.0864 | - | - | - | - | - |
0.2582 | 3000 | 2.0884 | - | - | - | - | - |
0.3013 | 3500 | 2.0783 | - | - | - | - | - |
0.3443 | 4000 | 2.0792 | 2.0791 | 0.3223 (-0.2181) | 0.3229 (-0.0021) | 0.2919 (-0.2088) | 0.3124 (-0.1430) |
0.3873 | 4500 | 2.0817 | - | - | - | - | - |
0.4304 | 5000 | 2.0828 | - | - | - | - | - |
0.4734 | 5500 | 2.0785 | - | - | - | - | - |
0.5164 | 6000 | 2.0751 | 2.0740 | 0.4743 (-0.0661) | 0.3450 (+0.0200) | 0.5233 (+0.0226) | 0.4475 (-0.0078) |
0.5595 | 6500 | 2.0719 | - | - | - | - | - |
0.6025 | 7000 | 2.0726 | - | - | - | - | - |
0.6456 | 7500 | 2.0734 | - | - | - | - | - |
0.6886 | 8000 | 2.0769 | 2.0722 | 0.5006 (-0.0398) | 0.3449 (+0.0198) | 0.4920 (-0.0087) | 0.4458 (-0.0095) |
0.7316 | 8500 | 2.0722 | - | - | - | - | - |
0.7747 | 9000 | 2.0669 | - | - | - | - | - |
0.8177 | 9500 | 2.0787 | - | - | - | - | - |
0.8607 | 10000 | 2.0661 | 2.0710 | 0.5646 (+0.0242) | 0.3363 (+0.0113) | 0.5672 (+0.0666) | 0.4894 (+0.0340) |
0.9038 | 10500 | 2.0754 | - | - | - | - | - |
0.9468 | 11000 | 2.0717 | - | - | - | - | - |
0.9898 | 11500 | 2.0779 | - | - | - | - | - |
1.0329 | 12000 | 2.0703 | 2.0706 | 0.5609 (+0.0205) | 0.3107 (-0.0144) | 0.5817 (+0.0811) | 0.4844 (+0.0291) |
1.0759 | 12500 | 2.0692 | - | - | - | - | - |
1.1190 | 13000 | 2.0665 | - | - | - | - | - |
1.1620 | 13500 | 2.0801 | - | - | - | - | - |
1.2050 | 14000 | 2.0723 | 2.0702 | 0.5413 (+0.0009) | 0.3249 (-0.0001) | 0.5961 (+0.0954) | 0.4874 (+0.0321) |
1.2481 | 14500 | 2.0707 | - | - | - | - | - |
1.2911 | 15000 | 2.0715 | - | - | - | - | - |
1.3341 | 15500 | 2.0664 | - | - | - | - | - |
1.3772 | 16000 | 2.0736 | 2.0700 | 0.5234 (-0.0171) | 0.3314 (+0.0064) | 0.6068 (+0.1061) | 0.4872 (+0.0318) |
1.4202 | 16500 | 2.0733 | - | - | - | - | - |
1.4632 | 17000 | 2.0728 | - | - | - | - | - |
1.5063 | 17500 | 2.068 | - | - | - | - | - |
**1.5493** | **18000** | **2.0669** | **2.0699** | **0.5335 (-0.0069)** | **0.3530 (+0.0280)** | **0.6278 (+0.1272)** | **0.5048 (+0.0494)** |
1.5924 | 18500 | 2.0713 | - | - | - | - | - |
1.6354 | 19000 | 2.0689 | - | - | - | - | - |
1.6784 | 19500 | 2.07 | - | - | - | - | - |
1.7215 | 20000 | 2.0731 | 2.0696 | 0.5365 (-0.0039) | 0.3497 (+0.0247) | 0.5845 (+0.0838) | 0.4902 (+0.0349) |
1.7645 | 20500 | 2.0678 | - | - | - | - | - |
1.8075 | 21000 | 2.0646 | - | - | - | - | - |
1.8506 | 21500 | 2.0631 | - | - | - | - | - |
1.8936 | 22000 | 2.0714 | 2.0694 | 0.5340 (-0.0064) | 0.3490 (+0.0239) | 0.5653 (+0.0646) | 0.4828 (+0.0274) |
1.9367 | 22500 | 2.059 | - | - | - | - | - |
1.9797 | 23000 | 2.068 | - | - | - | - | - |
2.0227 | 23500 | 2.0664 | - | - | - | - | - |
2.0658 | 24000 | 2.0719 | 2.0699 | 0.5442 (+0.0038) | 0.3531 (+0.0281) | 0.5879 (+0.0873) | 0.4951 (+0.0397) |
2.1088 | 24500 | 2.0621 | - | - | - | - | - |
2.1518 | 25000 | 2.0669 | - | - | - | - | - |
2.1949 | 25500 | 2.067 | - | - | - | - | - |
2.2379 | 26000 | 2.0676 | 2.0700 | 0.5449 (+0.0044) | 0.3334 (+0.0084) | 0.5656 (+0.0649) | 0.4813 (+0.0259) |
2.2809 | 26500 | 2.0621 | - | - | - | - | - |
2.3240 | 27000 | 2.0634 | - | - | - | - | - |
2.3670 | 27500 | 2.065 | - | - | - | - | - |
2.4101 | 28000 | 2.0669 | 2.0704 | 0.5128 (-0.0276) | 0.3495 (+0.0244) | 0.5751 (+0.0744) | 0.4791 (+0.0237) |
2.4531 | 28500 | 2.0636 | - | - | - | - | - |
2.4961 | 29000 | 2.0623 | - | - | - | - | - |
2.5392 | 29500 | 2.0669 | - | - | - | - | - |
2.5822 | 30000 | 2.0615 | 2.0698 | 0.5448 (+0.0044) | 0.3406 (+0.0156) | 0.5768 (+0.0762) | 0.4874 (+0.0321) |
2.6252 | 30500 | 2.0708 | - | - | - | - | - |
2.6683 | 31000 | 2.0655 | - | - | - | - | - |
2.7113 | 31500 | 2.0511 | - | - | - | - | - |
2.7543 | 32000 | 2.0623 | 2.0699 | 0.5377 (-0.0027) | 0.3505 (+0.0255) | 0.5854 (+0.0847) | 0.4912 (+0.0358) |
2.7974 | 32500 | 2.0651 | - | - | - | - | - |
2.8404 | 33000 | 2.0675 | - | - | - | - | - |
2.8835 | 33500 | 2.0689 | - | - | - | - | - |
2.9265 | 34000 | 2.067 | 2.0699 | 0.5221 (-0.0184) | 0.3605 (+0.0354) | 0.5695 (+0.0688) | 0.4840 (+0.0286) |
2.9695 | 34500 | 2.0634 | - | - | - | - | - |
-1 | -1 | - | - | 0.5335 (-0.0069) | 0.3530 (+0.0280) | 0.6278 (+0.1272) | 0.5048 (+0.0494) |
- The bold row (step 18000) denotes the saved checkpoint.
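As a cross-check on the table, the snippet below picks the evaluation step with the highest NanoBEIR_mean_ndcg@10. This is only a reading of the logged values, not a statement of how the trainer selected the checkpoint.

```python
# (step, NanoBEIR_mean_ndcg@10) pairs copied from the evaluation rows of the table.
eval_rows = [
    (2000, 0.1380), (4000, 0.3124), (6000, 0.4475), (8000, 0.4458),
    (10000, 0.4894), (12000, 0.4844), (14000, 0.4874), (16000, 0.4872),
    (18000, 0.5048), (20000, 0.4902), (22000, 0.4828), (24000, 0.4951),
    (26000, 0.4813), (28000, 0.4791), (30000, 0.4874), (32000, 0.4912),
    (34000, 0.4840),
]

# Select the row with the largest mean ndcg@10.
best_step, best_ndcg = max(eval_rows, key=lambda row: row[1])
print(best_step, best_ndcg)
```

The maximum falls at step 18000, whose metrics match the final evaluation row of the table.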
Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.5.0.dev0
- Transformers: 4.48.1
- PyTorch: 2.5.1+cu124
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.21.0
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
ListNetLoss
@inproceedings{cao2007learning,
title={Learning to rank: from pairwise approach to listwise approach},
author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
booktitle={Proceedings of the 24th international conference on Machine learning},
pages={129--136},
year={2007}
}