CrossEncoder based on colbert-ir/colbertv2.0

This is a Cross Encoder model finetuned from colbert-ir/colbertv2.0 using the sentence-transformers library. It computes scores for pairs of texts, which can be used for text reranking and semantic search.

Model Details

Model Description

  • Model Type: Cross Encoder
  • Base model: colbert-ir/colbertv2.0
  • Maximum Sequence Length: 512 tokens
  • Number of Output Labels: 1 label
  • Model Size: 109M parameters (F32 safetensors)

Model Sources

  • Documentation: https://sbert.net
  • Repository: https://github.com/UKPLab/sentence-transformers
  • Hugging Face: https://huggingface.co/Pranjal2002/finetuned_colbert_finance_v2

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("Pranjal2002/finetuned_colbert_finance_v2")
# Get scores for pairs of texts
pairs = [
    ['What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?', 'Earnings'],
    ['What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?', '8-K'],
    ['What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?', 'DEF14A'],
    ['What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?', '10-K'],
    ['What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?', '10-Q'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    'What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?',
    [
        'Earnings',
        '8-K',
        'DEF14A',
        '10-K',
        '10-Q',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
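
To read the ranking, the corpus_id indices can be mapped back to the candidate texts. A small follow-up sketch, continuing the example above (the documents list simply repeats the candidates passed to rank):

# Map each rank entry back to the candidate text it refers to
documents = ['Earnings', '8-K', 'DEF14A', '10-K', '10-Q']
for entry in ranks:
    print(f"{entry['score']:.4f}\t{documents[entry['corpus_id']]}")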

Training Details

Training Dataset

Unnamed Dataset

  • Size: 3,988 training samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 1000 samples:
    • query: string (min: 53 characters, mean: 101.87 characters, max: 197 characters)
    • docs: list (size: 5 elements)
    • labels: list (size: 5 elements)
  • Samples:
    • query: How has Keurig Dr Pepper’s beverage segment profitability trended over recent periods?
      docs: ['10-Q', '10-K', 'Earnings', '8-K', 'DEF14A'], labels: [4, 3, 2, 1, 0]
    • query: How does management describe competitive advantages in generative AI developer tooling
      docs: ['Earnings', '10-K', 'DEF14A', '8-K', '10-Q'], labels: [4, 3, 2, 1, 0]
    • query: What did Mohawk Industries’ leadership say about Mohawk Industries’ share repurchase plans?
      docs: ['10-K', '10-Q', 'Earnings', 'DEF14A', '8-K'], labels: [2, 2, 1, 0, 0]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": null
    }
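
For reference, a single listwise sample in this format can be constructed and paired with ListNetLoss roughly as follows. This is a minimal sketch rather than the exact training script; the sample values are copied from the table above.

from datasets import Dataset
from sentence_transformers.cross_encoder import CrossEncoder
from sentence_transformers.cross_encoder.losses import ListNetLoss

# One listwise sample: a query, five candidate source types, and graded
# relevance labels (higher = more relevant)
train_dataset = Dataset.from_dict({
    "query": ["How has Keurig Dr Pepper’s beverage segment profitability trended over recent periods?"],
    "docs": [["10-Q", "10-K", "Earnings", "8-K", "DEF14A"]],
    "labels": [[4, 3, 2, 1, 0]],
})

model = CrossEncoder("colbert-ir/colbertv2.0", num_labels=1)
loss = ListNetLoss(model)  # activation_fn defaults to Identity, matching the parameters above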
    

Evaluation Dataset

Unnamed Dataset

  • Size: 998 evaluation samples
  • Columns: query, docs, and labels
  • Approximate statistics based on the first 998 samples:
    • query: string (min: 43 characters, mean: 102.97 characters, max: 203 characters)
    • docs: list (size: 5 elements)
    • labels: list (size: 5 elements)
  • Samples:
    • query: What guidance was offered on The Estée Lauder Companies Inc.’s inventory management or supply chain efficiency targets?
      docs: ['Earnings', '8-K', 'DEF14A', '10-K', '10-Q'], labels: [4, 3, 2, 1, 0]
    • query: What questions were asked about Live Nation Entertainment’s concert attendance and ticket sales engagement metrics?
      docs: ['Earnings', '10-K', '8-K', '10-Q', 'DEF14A'], labels: [4, 3, 2, 1, 0]
    • query: How has the ratio of AvalonBay Communities’ recurring to one-time rental income evolved in the latest reporting period?
      docs: ['10-Q', '10-K', 'Earnings', '8-K', 'DEF14A'], labels: [4, 3, 2, 1, 0]
  • Loss: ListNetLoss with these parameters:
    {
        "activation_fn": "torch.nn.modules.linear.Identity",
        "mini_batch_size": null
    }
    

Training Hyperparameters

Non-Default Hyperparameters

  • eval_strategy: steps
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • gradient_accumulation_steps: 2
  • learning_rate: 2e-05
  • num_train_epochs: 5
  • warmup_steps: 100
  • fp16: True
  • load_best_model_at_end: True
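
Wired together, these settings correspond to a trainer setup roughly like the sketch below. It reuses model, loss, and train_dataset from the dataset sketch above; output_dir is a hypothetical placeholder, and eval_dataset is assumed to be the 998-sample split built the same way.

from sentence_transformers.cross_encoder import (
    CrossEncoderTrainer,
    CrossEncoderTrainingArguments,
)

# Training arguments mirroring the non-default hyperparameters listed above
args = CrossEncoderTrainingArguments(
    output_dir="finetuned_colbert_finance_v2",  # hypothetical output path
    eval_strategy="steps",
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=2,
    learning_rate=2e-5,
    num_train_epochs=5,
    warmup_steps=100,
    fp16=True,
    load_best_model_at_end=True,
)

trainer = CrossEncoderTrainer(
    model=model,                # CrossEncoder from the dataset sketch above
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,  # assumed: the evaluation split, same columns
    loss=loss,
)
trainer.train()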

All Hyperparameters

  • overwrite_output_dir: False
  • do_predict: False
  • eval_strategy: steps
  • prediction_loss_only: True
  • per_device_train_batch_size: 4
  • per_device_eval_batch_size: 4
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 2
  • eval_accumulation_steps: None
  • torch_empty_cache_steps: None
  • learning_rate: 2e-05
  • weight_decay: 0.0
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 5
  • max_steps: -1
  • lr_scheduler_type: linear
  • lr_scheduler_kwargs: {}
  • warmup_ratio: 0.0
  • warmup_steps: 100
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_nan_inf_filter: True
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 42
  • data_seed: None
  • jit_mode_eval: False
  • use_ipex: False
  • bf16: False
  • fp16: True
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: adamw_torch_fused
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: False
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: every_save
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: False
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: False
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: False
  • prompts: None
  • batch_sampler: batch_sampler
  • multi_dataset_batch_sampler: proportional
  • router_mapping: {}
  • learning_rate_mapping: {}

Training Logs

Epoch Step Training Loss Validation Loss
0.1003 50 1.5717 -
0.2006 100 1.4575 -
0.3009 150 1.4404 -
0.4012 200 1.408 1.3705
0.5015 250 1.3936 -
0.6018 300 1.3719 -
0.7021 350 1.3777 -
0.8024 400 1.3689 1.3444
0.9027 450 1.3612 -
1.0020 500 1.3263 -
1.1023 550 1.3493 -
1.2026 600 1.3602 1.3374
1.3029 650 1.3181 -
1.4032 700 1.3217 -
1.5035 750 1.3431 -
1.6038 800 1.3234 1.3374
1.7041 850 1.3317 -
1.8044 900 1.34 -
1.9047 950 1.3467 -
2.0040 1000 1.3236 1.3325
2.1043 1050 1.2743 -
2.2046 1100 1.3177 -
2.3049 1150 1.3004 -
2.4052 1200 1.3114 1.3274 (saved checkpoint)
2.5055 1250 1.3138 -
2.6058 1300 1.3263 -
2.7061 1350 1.3175 -
2.8064 1400 1.3033 1.3462
2.9067 1450 1.3112 -
3.0060 1500 1.3025 -
3.1063 1550 1.2818 -
3.2066 1600 1.2768 1.3426
3.3069 1650 1.275 -
3.4072 1700 1.3024 -
3.5075 1750 1.2765 -
3.6078 1800 1.2932 1.3467
3.7081 1850 1.2774 -
3.8084 1900 1.2759 -
3.9087 1950 1.2991 -
4.0080 2000 1.2763 1.3368
4.1083 2050 1.253 -
4.2086 2100 1.243 -
4.3089 2150 1.2719 -
4.4092 2200 1.256 1.3448
4.5095 2250 1.2718 -
4.6098 2300 1.2536 -
4.7101 2350 1.2696 -
4.8104 2400 1.2626 1.3456
4.9107 2450 1.2736 -
  • The marked row (step 1200, validation loss 1.3274) denotes the saved checkpoint, selected via load_best_model_at_end.

Framework Versions

  • Python: 3.12.11
  • Sentence Transformers: 5.1.0
  • Transformers: 4.56.1
  • PyTorch: 2.8.0+cu126
  • Accelerate: 1.10.1
  • Datasets: 4.0.0
  • Tokenizers: 0.22.0

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}

ListNetLoss

@inproceedings{cao2007learning,
    title={Learning to Rank: From Pairwise Approach to Listwise Approach},
    author={Cao, Zhe and Qin, Tao and Liu, Tie-Yan and Tsai, Ming-Feng and Li, Hang},
    booktitle={Proceedings of the 24th international conference on Machine learning},
    pages={129--136},
    year={2007}
}