CrossEncoder based on sentence-transformers/all-mpnet-base-v2

This is a Cross Encoder model finetuned from sentence-transformers/all-mpnet-base-v2 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: sentence-transformers/all-mpnet-base-v2
  • Number of Parameters: 109M
  • Tensor Type: F32

Model Sources

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference:

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Pranjal2002/all-mpnet-base-v3")
# Get scores for pairs of texts
pairs = [
    ['What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?', '10-K'],
    ['What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?', 'Earnings'],
    ['What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?', 'DEF14A'],
    ['What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?', '8-K'],
    ['What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?', '10-Q'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?',
    [
        '10-K',
        'Earnings',
        'DEF14A',
        '8-K',
        '10-Q',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
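
model.rank returns the candidates already sorted from most to least relevant, so a readable ranking is one line per entry. A minimal sketch (docs here simply restates the candidate texts passed above):

# Map each ranked entry back to its text via corpus_id.
docs = ['10-K', 'Earnings', 'DEF14A', '8-K', '10-Q']
for entry in ranks:
    print(f"{entry['score']:.4f}\t{docs[entry['corpus_id']]}")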

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,190 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
              query                      docs               labels
    type      string                     list               list
    details   min: 55 characters         size: 5 elements   size: 5 elements
              mean: 103.12 characters
              max: 180 characters
  • Samples:
    • query: What year over year growth rate was shown for paid memberships in the same table
      docs: ['10-Q', '10-K', '8-K', 'Earnings', 'DEF14A']
      labels: [4, 3, 2, 1, 0]
    • query: How did non‑GAAP EPS growth align with the incentive metrics set for management?
      docs: ['DEF14A', '8-K', '10-K', '10-Q', 'Earnings']
      labels: [2, 1, 0, 0, 0]
    • query: What questions were raised regarding Xcel Energy Inc.’s risk factors and mitigation plans related to the integration of renewable energy sources into their grid?
      docs: ['10-K', 'Earnings', '8-K', '10-Q', 'DEF14A']
      labels: [4, 3, 2, 1, 0]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": null
    }
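
For reference, a dataset with this schema and the loss above can be put together roughly as follows. This is a minimal sketch: the sample row is taken from the table above, and the model is initialized from the base checkpoint rather than from this one.

from datasets import Dataset
from sentence_transformers import CrossEncoder
from sentence_transformers.cross_encoder.losses import ListNetLoss

# One row per query: a list of candidate texts plus graded relevance labels.
train_dataset = Dataset.from_dict({
    "query": ["What year over year growth rate was shown for paid memberships in the same table"],
    "docs": [["10-Q", "10-K", "8-K", "Earnings", "DEF14A"]],
    "labels": [[4, 3, 2, 1, 0]],
})

# ListNetLoss defaults match the parameters above: Identity activation, no mini-batching.
model = CrossEncoder("sentence-transformers/all-mpnet-base-v2", num_labels=1)
loss = ListNetLoss(model)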
    

Evaluation Dataset

Unnamed Dataset

  • Size: 798 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 798 samples:
              query                      docs               labels
    type      string                     list               list
    details   min: 53 characters         size: 5 elements   size: 5 elements
              mean: 102.91 characters
              max: 179 characters
  • Samples:
    • query: What consolidation trends among competitors are highlighted in disclosures affecting Regions Financial Corporation’s regional banking operations?
      docs: ['10-K', 'Earnings', 'DEF14A', '8-K', '10-Q']
      labels: [4, 3, 2, 1, 0]
    • query: How does Pentair manage equity award burn rate or share pool availability?
      docs: ['10-K', 'DEF14A', '10-Q', 'Earnings', '8-K']
      labels: [4, 3, 2, 1, 0]
    • query: What key takeaways emerged from Valero Energy Corporation’s most recent earnings announcement?
      docs: ['10-Q', '10-K', 'Earnings', '8-K', 'DEF14A']
      labels: [4, 3, 2, 1, 0]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 16
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 4
  • warmup_steps: 100
  • bf16: True
  • load_best_model_at_end: True
  • optim: adamw_torch
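
With these settings, the training run can be reproduced roughly as in the sketch below. It assumes model, loss, train_dataset, and eval_dataset are built as in the Training Details section; the output directory is hypothetical, and the logging/eval/save cadence is inferred from the Training Logs section rather than listed above.

from sentence_transformers.cross_encoder import CrossEncoderTrainer, CrossEncoderTrainingArguments

args = CrossEncoderTrainingArguments(
    output_dir="models/all-mpnet-base-v3",  # hypothetical path
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=4,
    warmup_steps=100,
    bf16=True,
    load_best_model_at_end=True,
    optim="adamw_torch",
    # Inferred from the training logs, not from the hyperparameter list:
    logging_steps=50,
    eval_steps=200,
    save_steps=200,
)

trainer = CrossEncoderTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()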

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 16
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 4
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch    Step   Training Loss   Validation Loss
0.1253   50     1.5991          -
0.2506   100    1.5027          -
0.3759   150    1.4395          -
0.5013   200    1.3949          1.3894
0.6266   250    1.3640          -
0.7519   300    1.3699          -
0.8772   350    1.3892          -
1.0025   400    1.3444          1.4115
1.1278   450    1.3640          -
1.2531   500    1.3373          -
1.3784   550    1.3416          -
1.5038   600    1.3046          1.3500
1.6291   650    1.3394          -
1.7544   700    1.3523          -
1.8797   750    1.3320          -
2.0050   800    1.3380          1.3421
2.1303   850    1.3231          -
2.2556   900    1.3357          -
2.3810   950    1.2984          -
2.5063   1000   1.3052          1.3538
2.6316   1050   1.3177          -
2.7569   1100   1.3195          -
2.8822   1150   1.3114          -
3.0075   1200   1.3212          1.3506
3.1328   1250   1.2981          -
3.2581   1300   1.3051          -
3.3835   1350   1.2787          -
3.5088   1400   1.3250          1.3410  *
3.6341   1450   1.2989          -
3.7594   1500   1.3105          -
3.8847   1550   1.2830          -
  • The row marked with * is the saved checkpoint (the lowest validation loss, per load_best_model_at_end).

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0
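
To approximate this environment, pinned installs along these lines should work (the listed PyTorch is a CUDA 12.6 build, so the exact torch wheel depends on your platform):

pip install "sentence-transformers==5.1.0" "transformers==4.56.1" "accelerate==1.10.1" "datasets==4.0.0" "tokenizers==0.22.0" "torch==2.8.0"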

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}