Redis fine-tuned CrossEncoder model for semantic caching on LangCache

This is a Cross Encoder model fine-tuned from Alibaba-NLP/gte-reranker-modernbert-base on the LangCache Sentence Pairs (all) dataset using the sentence-transformers library. It computes a score for each pair of texts, which can be used for sentence pair classification.

Model Details

Model Description

Model Type: Cross Encoder
Base model: Alibaba-NLP/gte-reranker-modernbert-base
Maximum Sequence Length: 8192 tokens
Number of Output Labels: 1 label
Training Dataset:
- LangCache Sentence Pairs (all)
Language: en
License: apache-2.0

Model Sources

Documentation: Sentence Transformers Documentation
Documentation: Cross Encoder Documentation
Repository: Sentence Transformers on GitHub
Hugging Face: Cross Encoders on Hugging Face

Usage

Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

pip install -U sentence-transformers

Then you can load this model and run inference.

from sentence_transformers import CrossEncoder

# Download from the 🤗 Hub
model = CrossEncoder("aditeyabaral-redis/langcache-reranker-v1-wdwr")
# Get scores for pairs of texts
pairs = [
    ["He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .", '" The foodservice pie business does not fit our long-term growth strategy .'],
    ['Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .', 'His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .'],
    ['The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .', 'The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .'],
    ['The AFL-CIO is waiting until October to decide if it will endorse a candidate .', 'The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .'],
    ['No dates have been set for the civil or the criminal trial .', 'No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .'],
]
scores = model.predict(pairs)
print(scores.shape)
# (5,)

# Or rank different texts based on similarity to a single text
ranks = model.rank(
    "He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .",
    [
        '" The foodservice pie business does not fit our long-term growth strategy .',
        'His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .',
        'The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .',
        'The AFL-CIO announced Wednesday that it will decide in October whether to endorse a candidate before the primaries .',
        'No dates have been set for the criminal or civil cases , but Shanley has pleaded not guilty .',
    ]
)
# [{'corpus_id': ..., 'score': ...}, {'corpus_id': ..., 'score': ...}, ...]
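For semantic caching, the ranked output can be turned into a hit-or-miss decision. The following is a minimal sketch, not part of the model's API: `pick_cache_hit` is a hypothetical helper, and `threshold` is an assumed cut-off that should be tuned on your own data. It relies only on the list-of-dicts shape returned by `model.rank` shown above, and the scores in the example are made up for illustration.

```python
from typing import Optional


def pick_cache_hit(ranks: list[dict], threshold: float) -> Optional[int]:
    """Return the corpus_id of the best-scoring candidate, or None on a cache miss.

    `ranks` has the shape returned by `CrossEncoder.rank`:
    [{'corpus_id': int, 'score': float}, ...].
    `threshold` is a hypothetical cut-off, not a value shipped with the model.
    """
    if not ranks:
        return None
    best = max(ranks, key=lambda r: r["score"])
    return best["corpus_id"] if best["score"] >= threshold else None


# Illustrative scores only (made up, not produced by the model)
example_ranks = [
    {"corpus_id": 0, "score": 0.97},
    {"corpus_id": 4, "score": 0.41},
    {"corpus_id": 2, "score": 0.08},
]
print(pick_cache_hit(example_ranks, threshold=0.9))   # 0
print(pick_cache_hit(example_ranks, threshold=0.99))  # None
```

A stricter threshold trades cache hits for fewer wrong answers served from the cache, so the value is best chosen against the precision you need.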

Evaluation

Metrics

Cross Encoder Classification

Datasets: val and test
Evaluated with CrossEncoderClassificationEvaluator

Metric	val	test
accuracy	0.7731	0.723
accuracy_threshold	0.7637	0.9352
f1	0.6951	0.7144
f1_threshold	0.0464	0.9143
precision	0.6455	0.6303
recall	0.7529	0.8245
average_precision	0.7833	0.6907
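As a quick sanity check, F1 is the harmonic mean of precision and recall, so the f1 row can be recomputed from the precision and recall rows in the same column. The sketch below does that arithmetic; `f1_score` is a local helper written for this example, not a library call.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall copied from the table above
print(round(f1_score(0.6455, 0.7529), 4))  # val:  0.6951
print(round(f1_score(0.6303, 0.8245), 4))  # test: 0.7144
```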

Training Details

Training Dataset

LangCache Sentence Pairs (all)

Dataset: LangCache Sentence Pairs (all)
Size: 8,405 training samples
Columns: sentence1, sentence2, and label

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	int
details	min: 28 characters, mean: 116.35 characters, max: 227 characters	min: 15 characters, mean: 113.13 characters, max: 243 characters	0: ~45.80%, 1: ~54.20%

Samples:

sentence1	sentence2	label
`He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .`	`" The foodservice pie business does not fit our long-term growth strategy .`	`1`
`Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .`	`His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .`	`0`
`The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .`	`The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .`	`0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}
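With an identity activation, the loss is computed on the raw model output (a logit) rather than on a probability. The sketch below writes out binary cross-entropy on a logit in plain Python for a single pair, assuming no positive-class weighting (`pos_weight` is null). It illustrates the formula only and is not the sentence-transformers implementation; `bce_with_logits` is a hypothetical name.

```python
import math


def bce_with_logits(logit: float, label: int) -> float:
    """Binary cross-entropy for one raw logit and a 0/1 label.

    Uses the numerically stable form
        max(x, 0) - x * y + log(1 + exp(-|x|)),
    which equals -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    """
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


# A logit of 0 means the model is undecided, so the loss is log(2)
print(round(bce_with_logits(0.0, 1), 4))  # 0.6931
# A confident, correct prediction costs less than a confident, wrong one
print(bce_with_logits(4.0, 1) < bce_with_logits(4.0, 0))  # True
```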

Evaluation Dataset

LangCache Sentence Pairs (all)

Dataset: LangCache Sentence Pairs (all)
Size: 8,405 evaluation samples
Columns: sentence1, sentence2, and label

Approximate statistics based on the first 1000 samples:

	sentence1	sentence2	label
type	string	string	int
details	min: 28 characters, mean: 116.35 characters, max: 227 characters	min: 15 characters, mean: 113.13 characters, max: 243 characters	0: ~45.80%, 1: ~54.20%

Samples:

sentence1	sentence2	label
`He said the foodservice pie business doesn 't fit the company 's long-term growth strategy .`	`" The foodservice pie business does not fit our long-term growth strategy .`	`1`
`Magnarelli said Racicot hated the Iraqi regime and looked forward to using his long years of training in the war .`	`His wife said he was " 100 percent behind George Bush " and looked forward to using his years of training in the war .`	`0`
`The dollar was at 116.92 yen against the yen , flat on the session , and at 1.2891 against the Swiss franc , also flat .`	`The dollar was at 116.78 yen JPY = , virtually flat on the session , and at 1.2871 against the Swiss franc CHF = , down 0.1 percent .`	`0`

Loss: BinaryCrossEntropyLoss with these parameters:

{
    "activation_fn": "torch.nn.modules.linear.Identity",
    "pos_weight": null
}

Training Hyperparameters

Non-Default Hyperparameters

eval_strategy: steps
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
learning_rate: 0.0002
weight_decay: 0.01
num_train_epochs: 20
warmup_ratio: 0.1
load_best_model_at_end: True
optim: adamw_torch
push_to_hub: True
hub_model_id: aditeyabaral-redis/langcache-reranker-v1-wdwr
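Taken together, `learning_rate`, `warmup_ratio`, and the linear scheduler (`lr_scheduler_type: linear` in the full list) describe a ramp up to the peak rate followed by a linear decay to zero. A rough sketch of that shape follows; `linear_lr` is a hypothetical helper, and `total_steps` is an assumed round number for illustration, not a value stated in this card. The trainer's own scheduler may round the warmup length differently.

```python
def linear_lr(step: int, total_steps: int, peak_lr: float = 2e-4,
              warmup_ratio: float = 0.1) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup: grow linearly from 0 to peak_lr
        return peak_lr * step / max(1, warmup_steps)
    # Decay: shrink linearly from peak_lr to 0 at total_steps
    return peak_lr * max(0, total_steps - step) / max(1, total_steps - warmup_steps)


total = 100_000  # assumed total number of optimizer steps (illustrative)
for s in (0, 5_000, 10_000, 55_000, 100_000):
    print(s, linear_lr(s, total))
```

With these assumptions the rate peaks at step 10,000 (10% of the run) and is back at half the peak by step 55,000.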

All Hyperparameters

overwrite_output_dir: False
do_predict: False
eval_strategy: steps
prediction_loss_only: True
per_device_train_batch_size: 48
per_device_eval_batch_size: 48
per_gpu_train_batch_size: None
per_gpu_eval_batch_size: None
gradient_accumulation_steps: 1
eval_accumulation_steps: None
torch_empty_cache_steps: None
learning_rate: 0.0002
weight_decay: 0.01
adam_beta1: 0.9
adam_beta2: 0.999
adam_epsilon: 1e-08
max_grad_norm: 1.0
num_train_epochs: 20
max_steps: -1
lr_scheduler_type: linear
lr_scheduler_kwargs: {}
warmup_ratio: 0.1
warmup_steps: 0
log_level: passive
log_level_replica: warning
log_on_each_node: True
logging_nan_inf_filter: True
save_safetensors: True
save_on_each_node: False
save_only_model: False
restore_callback_states_from_checkpoint: False
no_cuda: False
use_cpu: False
use_mps_device: False
seed: 42
data_seed: None
jit_mode_eval: False
use_ipex: False
bf16: False
fp16: False
fp16_opt_level: O1
half_precision_backend: auto
bf16_full_eval: False
fp16_full_eval: False
tf32: None
local_rank: 0
ddp_backend: None
tpu_num_cores: None
tpu_metrics_debug: False
debug: []
dataloader_drop_last: True
dataloader_num_workers: 0
dataloader_prefetch_factor: None
past_index: -1
disable_tqdm: False
remove_unused_columns: True
label_names: None
load_best_model_at_end: True
ignore_data_skip: False
fsdp: []
fsdp_min_num_params: 0
fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
fsdp_transformer_layer_cls_to_wrap: None
accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
deepspeed: None
label_smoothing_factor: 0.0
optim: adamw_torch
optim_args: None
adafactor: False
group_by_length: False
length_column_name: length
ddp_find_unused_parameters: None
ddp_bucket_cap_mb: None
ddp_broadcast_buffers: False
dataloader_pin_memory: True
dataloader_persistent_workers: False
skip_memory_metrics: True
use_legacy_prediction_loop: False
push_to_hub: True
resume_from_checkpoint: None
hub_model_id: aditeyabaral-redis/langcache-reranker-v1-wdwr
hub_strategy: every_save
hub_private_repo: None
hub_always_push: False
hub_revision: None
gradient_checkpointing: False
gradient_checkpointing_kwargs: None
include_inputs_for_metrics: False
include_for_metrics: []
eval_do_concat_batches: True
fp16_backend: auto
push_to_hub_model_id: None
push_to_hub_organization: None
mp_parameters:
auto_find_batch_size: False
full_determinism: False
torchdynamo: None
ray_scope: last
ddp_timeout: 1800
torch_compile: False
torch_compile_backend: None
torch_compile_mode: None
include_tokens_per_second: False
include_num_input_tokens_seen: False
neftune_noise_alpha: None
optim_target_modules: None
batch_eval_metrics: False
eval_on_start: False
use_liger_kernel: False
liger_kernel_config: None
eval_use_gather_object: False
average_tokens_across_devices: True
prompts: None
batch_sampler: batch_sampler
multi_dataset_batch_sampler: proportional
router_mapping: {}
learning_rate_mapping: {}

Training Logs

Epoch	Step	Training Loss	Validation Loss	val_average_precision	test_average_precision
-1	-1	-	-	0.7676	0.6907
0.1833	1000	0.3563	0.4805	0.7831	-
0.3666	2000	0.2065	0.5394	0.8221	-
0.5499	3000	0.1983	0.5019	0.8178	-
0.7331	4000	0.1923	0.5109	0.7960	-
0.9164	5000	0.1886	0.4726	0.8058	-
1.0997	6000	0.183	0.5062	0.8032	-
1.2830	7000	0.1838	0.5152	0.8021	-
1.4663	8000	0.1858	0.5105	0.7926	-
1.6496	9000	0.1905	0.5052	0.7859	-
1.8328	10000	0.1926	0.5316	0.7895	-
2.0161	11000	0.1951	0.5340	0.7681	-
2.1994	12000	0.1853	0.5573	0.7577	-
2.3827	13000	0.1848	0.5530	0.7946	-
2.5660	14000	0.1813	0.5754	0.7655	-
2.7493	15000	0.1793	0.5316	0.7514	-
2.9326	16000	0.1778	0.5230	0.7868	-
3.1158	17000	0.1681	0.5246	0.7816	-
3.2991	18000	0.1662	0.4946	0.7732	-
3.4824	19000	0.1648	0.5262	0.7853	-
3.6657	20000	0.1649	0.5007	0.7871	-
3.8490	21000	0.1633	0.5368	0.7807	-
4.0323	22000	0.1602	0.5559	0.7769	-
4.2155	23000	0.149	0.5796	0.7697	-
4.3988	24000	0.1486	0.5322	0.7608	-
4.5821	25000	0.1495	0.5142	0.7713	-
4.7654	26000	0.1493	0.5203	0.7866	-
4.9487	27000	0.1498	0.5433	0.7738	-
5.1320	28000	0.1391	0.5589	0.7803	-
5.3152	29000	0.1346	0.5267	0.7713	-
5.4985	30000	0.1367	0.5657	0.7803	-
5.6818	31000	0.1358	0.5631	0.7646	-
5.8651	32000	0.136	0.5444	0.7753	-
6.0484	33000	0.1346	0.5605	0.7703	-
6.2317	34000	0.1222	0.5399	0.7776	-
6.4150	35000	0.1241	0.5272	0.7899	-
6.5982	36000	0.1243	0.6096	0.7723	-
6.7815	37000	0.1266	0.5661	0.7609	-
6.9648	38000	0.1246	0.5341	0.7889	-
7.1481	39000	0.1128	0.6223	0.7884	-
7.3314	40000	0.1124	0.5485	0.7743	-
7.5147	41000	0.1127	0.5375	0.7842	-
7.6979	42000	0.1122	0.5231	0.7939	-
7.8812	43000	0.1141	0.5608	0.7705	-
8.0645	44000	0.1088	0.6511	0.7813	-
8.2478	45000	0.0998	0.6217	0.7648	-
8.4311	46000	0.1017	0.6000	0.7822	-
8.6144	47000	0.1031	0.5469	0.7866	-
8.7977	48000	0.1012	0.5862	0.7790	-
8.9809	49000	0.1031	0.5527	0.7876	-
9.1642	50000	0.0921	0.5460	0.7788	-
9.3475	51000	0.0909	0.5820	0.7815	-
9.5308	52000	0.0919	0.5589	0.7841	-
9.7141	53000	0.0939	0.5521	0.7821	-
9.8974	54000	0.0925	0.6942	0.7797	-
10.0806	55000	0.0863	0.6208	0.7729	-
10.2639	56000	0.0803	0.6632	0.7911	-
10.4472	57000	0.0797	0.6583	0.7833	-
10.6305	58000	0.0824	0.6194	0.7862	-
10.8138	59000	0.0829	0.6136	0.7783	-
10.9971	60000	0.0819	0.5833	0.7727	-
11.1804	61000	0.0693	0.6491	0.7881	-
11.3636	62000	0.0709	0.6449	0.7784	-
11.5469	63000	0.0721	0.6158	0.7838	-
11.7302	64000	0.0721	0.6649	0.7841	-
11.9135	65000	0.0732	0.6403	0.7702	-
12.0968	66000	0.0679	0.6079	0.7817	-
12.2801	67000	0.0615	0.6862	0.7787	-
12.4633	68000	0.0629	0.7239	0.7824	-
12.6466	69000	0.0643	0.6419	0.7897	-
12.8299	70000	0.0635	0.6743	0.7762	-
13.0132	71000	0.064	0.7135	0.7741	-
13.1965	72000	0.0545	0.6643	0.7723	-
13.3798	73000	0.0548	0.6508	0.7758	-
13.5630	74000	0.0547	0.7003	0.7785	-
13.7463	75000	0.0548	0.7170	0.7846	-
13.9296	76000	0.0553	0.6917	0.7722	-
14.1129	77000	0.0508	0.7000	0.7767	-
14.2962	78000	0.0474	0.7336	0.7730	-
14.4795	79000	0.0465	0.7122	0.7795	-
14.6628	80000	0.0478	0.7321	0.7779	-
14.8460	81000	0.0468	0.7112	0.7796	-
15.0293	82000	0.0465	0.7534	0.7788	-
15.2126	83000	0.0395	0.7238	0.7808	-
15.3959	84000	0.0401	0.7686	0.7905	-
15.5792	85000	0.0408	0.7296	0.7900	-
15.7625	86000	0.0414	0.7533	0.7822	-
15.9457	87000	0.0402	0.7748	0.7867	-
16.1290	88000	0.0352	0.8267	0.7844	-
16.3123	89000	0.0354	0.7488	0.7912	-
16.4956	90000	0.0337	0.7850	0.7857	-
16.6789	91000	0.0333	0.7812	0.7815	-
16.8622	92000	0.0341	0.8184	0.7786	-
17.0455	93000	0.0333	0.8166	0.7781	-
17.2287	94000	0.0288	0.7980	0.7803	-
17.4120	95000	0.0282	0.8195	0.7774	-
17.5953	96000	0.0285	0.7864	0.7829	-
17.7786	97000	0.0284	0.8000	0.7838	-
17.9619	98000	0.0279	0.8118	0.7873	-
18.1452	99000	0.0245	0.8727	0.7866	-
18.3284	100000	0.0235	0.8695	0.7836	-
18.5117	101000	0.0236	0.8246	0.7820	-
18.6950	102000	0.0232	0.8543	0.7828	-
18.8783	103000	0.0234	0.8840	0.7793	-
19.0616	104000	0.0219	0.8804	0.7783	-
19.2449	105000	0.0201	0.8885	0.7812	-
19.4282	106000	0.0194	0.8901	0.7821	-
19.6114	107000	0.0197	0.8850	0.7824	-
19.7947	108000	0.0196	0.8835	0.7830	-
19.9780	109000	0.0197	0.8803	0.7833	-

The bold row denotes the saved checkpoint.

Framework Versions

Python: 3.12.3
Sentence Transformers: 5.1.0
Transformers: 4.55.0
PyTorch: 2.8.0+cu128
Accelerate: 1.10.0
Datasets: 4.0.0
Tokenizers: 0.21.4

Citation

BibTeX

Sentence Transformers

@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
