---
language:
- en
tags:
- ColBERT
- PyLate
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:533177
- loss:Distillation
base_model: jhu-clsp/ettin-encoder-17m
datasets:
- lightonai/ms-marco-en-bge-gemma
pipeline_tag: sentence-similarity
library_name: PyLate
metrics:
- MaxSim_accuracy@1
- MaxSim_accuracy@3
- MaxSim_accuracy@5
- MaxSim_accuracy@10
- MaxSim_precision@1
- MaxSim_precision@3
- MaxSim_precision@5
- MaxSim_precision@10
- MaxSim_recall@1
- MaxSim_recall@3
- MaxSim_recall@5
- MaxSim_recall@10
- MaxSim_ndcg@10
- MaxSim_mrr@10
- MaxSim_map@100
model-index:
- name: PyLate model based on jhu-clsp/ettin-encoder-17m
results:
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoClimateFEVER
type: NanoClimateFEVER
metrics:
- type: MaxSim_accuracy@1
value: 0.26
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.44
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.48
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.72
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.26
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.1733333333333333
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.12000000000000002
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.102
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.11999999999999998
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.23
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.25666666666666665
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.3999999999999999
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.30588764137829927
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.38180158730158725
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.23047723328383551
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoDBPedia
type: NanoDBPedia
metrics:
- type: MaxSim_accuracy@1
value: 0.72
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.86
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.9
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.94
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.72
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.6066666666666667
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.5720000000000001
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.49
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.08124424335875133
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.1639789174109182
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.2286688535902389
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.3339202593691046
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.5943825623026429
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.7955555555555555
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.47610983586026223
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoFEVER
type: NanoFEVER
metrics:
- type: MaxSim_accuracy@1
value: 0.84
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.94
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.96
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.98
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.84
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.33333333333333326
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.20799999999999996
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.10599999999999998
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.7766666666666667
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.9033333333333333
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.93
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.95
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.8863719088415238
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.8903333333333333
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.8591045425163073
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoFiQA2018
type: NanoFiQA2018
metrics:
- type: MaxSim_accuracy@1
value: 0.42
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.6
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.72
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.76
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.42
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.26666666666666666
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.22799999999999998
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.14
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.21591269841269842
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.35584920634920636
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.4972857142857143
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.5840079365079365
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.4728764591299225
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.5343571428571429
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.3849491712909313
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoHotpotQA
type: NanoHotpotQA
metrics:
- type: MaxSim_accuracy@1
value: 0.92
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.98
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 1.0
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1.0
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.92
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.54
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.3399999999999999
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.17999999999999997
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.46
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.81
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.85
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.9
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.8633780841984157
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.9506666666666667
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.8068729210481762
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoMSMARCO
type: NanoMSMARCO
metrics:
- type: MaxSim_accuracy@1
value: 0.5
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.66
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.68
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.78
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.5
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.22
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.136
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.07800000000000001
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.5
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.66
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.68
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.78
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.6350694238626255
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.5893809523809523
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.6008352347387884
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoNFCorpus
type: NanoNFCorpus
metrics:
- type: MaxSim_accuracy@1
value: 0.4
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.52
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.58
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.68
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.4
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.34
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.32
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.28
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.033468066221703355
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.07710152452729603
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.09567130100917189
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.14399069709040951
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.33024968480063904
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.47657936507936505
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.14186780303718874
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoNQ
type: NanoNQ
metrics:
- type: MaxSim_accuracy@1
value: 0.54
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.76
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.8
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.84
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.54
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.25333333333333335
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.16799999999999998
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.09
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.51
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.7
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.76
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.81
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.6699201254277886
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.6447222222222222
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.62006789085707
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoQuoraRetrieval
type: NanoQuoraRetrieval
metrics:
- type: MaxSim_accuracy@1
value: 0.82
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.96
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 1.0
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1.0
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.82
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.38666666666666655
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.244
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.12799999999999997
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.7340000000000001
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.912
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.956
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.9726666666666668
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.9086308248836141
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.9
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.8813997853997853
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoSCIDOCS
type: NanoSCIDOCS
metrics:
- type: MaxSim_accuracy@1
value: 0.42
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.6
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.66
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.74
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.42
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.2866666666666667
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.22399999999999995
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.15
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.08666666666666666
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.17666666666666664
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.2286666666666666
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.30666666666666664
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.31422844901617714
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.5303571428571429
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.24770788410611272
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoArguAna
type: NanoArguAna
metrics:
- type: MaxSim_accuracy@1
value: 0.2
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.48
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.64
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.76
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.2
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.15999999999999998
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.128
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.07600000000000001
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.2
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.48
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.64
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.76
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.45449277481893957
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.3582460317460317
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.36532756317756315
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoSciFact
type: NanoSciFact
metrics:
- type: MaxSim_accuracy@1
value: 0.6
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.76
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.82
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.88
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.6
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.26666666666666666
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.18
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.09799999999999999
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.575
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.74
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.815
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.87
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.7356545211627262
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.6952222222222221
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.691199074074074
name: Maxsim Map@100
- task:
type: py-late-information-retrieval
name: Py Late Information Retrieval
dataset:
name: NanoTouche2020
type: NanoTouche2020
metrics:
- type: MaxSim_accuracy@1
value: 0.6938775510204082
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.9183673469387755
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.9591836734693877
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 1.0
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.6938775510204082
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.6394557823129251
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.636734693877551
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.49387755102040815
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.04942817268713302
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.13043451476387394
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.2136324859904483
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.3155916739971445
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.5612872974984163
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.8085519922254615
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.41548604773283626
name: Maxsim Map@100
- task:
type: nano-beir
name: Nano BEIR
dataset:
name: NanoBEIR mean
type: NanoBEIR_mean
metrics:
- type: MaxSim_accuracy@1
value: 0.5641444270015699
name: Maxsim Accuracy@1
- type: MaxSim_accuracy@3
value: 0.729105180533752
name: Maxsim Accuracy@3
- type: MaxSim_accuracy@5
value: 0.7845525902668761
name: Maxsim Accuracy@5
- type: MaxSim_accuracy@10
value: 0.8523076923076923
name: Maxsim Accuracy@10
- type: MaxSim_precision@1
value: 0.5641444270015699
name: Maxsim Precision@1
- type: MaxSim_precision@3
value: 0.34406070120355836
name: Maxsim Precision@3
- type: MaxSim_precision@5
value: 0.26959497645211933
name: Maxsim Precision@5
- type: MaxSim_precision@10
value: 0.18552904238618523
name: Maxsim Precision@10
- type: MaxSim_recall@1
value: 0.3340297318472015
name: Maxsim Recall@1
- type: MaxSim_recall@3
value: 0.487643397157792
name: Maxsim Recall@3
- type: MaxSim_recall@5
value: 0.5501224375545312
name: Maxsim Recall@5
- type: MaxSim_recall@10
value: 0.6251418384844561
name: Maxsim Recall@10
- type: MaxSim_ndcg@10
value: 0.5948022890247485
name: Maxsim Ndcg@10
- type: MaxSim_mrr@10
value: 0.6581364780344372
name: Maxsim Mrr@10
- type: MaxSim_map@100
value: 0.5170311528556101
name: Maxsim Map@100
---
# PyLate model based on jhu-clsp/ettin-encoder-17m
This is a [PyLate](https://github.com/lightonai/pylate) model fine-tuned from [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m) on the [ms-marco-en-bge-gemma](https://huggingface.co/datasets/lightonai/ms-marco-en-bge-gemma) dataset. It maps sentences and paragraphs to sequences of 128-dimensional dense vectors and can be used for semantic textual similarity using the MaxSim operator.
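As a rough illustration of the MaxSim operator mentioned above, the following minimal sketch scores a query against a document by taking, for each query token vector, its best dot product with any document token vector, then summing those maxima. The 2-dimensional vectors are toy values chosen for readability, not outputs of this model, and the helper names `dot` and `maxsim` are hypothetical.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim(query_embeddings, document_embeddings):
    """Sum, over query tokens, of the best match among the document tokens."""
    return sum(
        max(dot(q, d) for d in document_embeddings)
        for q in query_embeddings
    )


# Toy token embeddings (one inner list per token); real ones are 128-dimensional.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
doc_b = [[0.5, 0.25]]

print(maxsim(query, doc_a))  # 2.0
print(maxsim(query, doc_b))  # 0.75
```

Because every query token is matched independently, a document scores highly only when it covers all parts of the query, which is why `doc_a` outranks `doc_b` here.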
## Model Details
### Model Description
- **Model Type:** PyLate model
- **Base model:** [jhu-clsp/ettin-encoder-17m](https://huggingface.co/jhu-clsp/ettin-encoder-17m)
- **Document Length:** 300 tokens
- **Query Length:** 32 tokens
- **Output Dimensionality:** 128 dimensions
- **Similarity Function:** MaxSim
- **Training Dataset:**
- [ms-marco-en-bge-gemma](https://huggingface.co/datasets/lightonai/ms-marco-en-bge-gemma)
- **Language:** en
### Model Sources
- **Documentation:** [PyLate Documentation](https://lightonai.github.io/pylate/)
- **Repository:** [PyLate on GitHub](https://github.com/lightonai/pylate)
- **Hugging Face:** [PyLate models on Hugging Face](https://huggingface.co/models?library=PyLate)
### Full Model Architecture
```
ColBERT(
(0): Transformer({'max_seq_length': 299, 'do_lower_case': False}) with Transformer model: ModernBertModel
(1): Dense({'in_features': 256, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
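To make the shapes in the architecture above concrete, here is a small sketch of what the final `Dense` layer does according to its printed configuration: a bias-free linear map from 256 to 128 features with an identity activation, applied to each token independently. The weights and encoder outputs below are random stand-ins, and `project_tokens` is a hypothetical helper, not part of PyLate.

```python
import random

IN_FEATURES, OUT_FEATURES = 256, 128  # taken from the Dense layer config above


def project_tokens(token_states, weight):
    """Apply a bias-free linear map (identity activation) to each token vector."""
    return [
        [sum(w * x for w, x in zip(row, state)) for row in weight]
        for state in token_states
    ]


rng = random.Random(0)
# Hypothetical stand-ins: random weights and random encoder outputs for 5 tokens.
weight = [
    [rng.gauss(0.0, 0.02) for _ in range(IN_FEATURES)]
    for _ in range(OUT_FEATURES)
]
token_states = [[rng.gauss(0.0, 1.0) for _ in range(IN_FEATURES)] for _ in range(5)]

token_embeddings = project_tokens(token_states, weight)
print(len(token_embeddings), len(token_embeddings[0]))  # 5 128
```

The number of vectors follows the number of tokens, while each vector has the fixed output dimensionality of 128.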
## Usage
First install the PyLate library:
```bash
pip install -U pylate
```
### Retrieval
PyLate provides a streamlined interface to index and retrieve documents using ColBERT models. The index leverages the Voyager HNSW index to efficiently handle document embeddings and enable fast retrieval.
#### Indexing documents
First, load the ColBERT model and initialize the Voyager index, then encode and index your documents:
```python
from pylate import indexes, models, retrieve

# Step 1: Load the ColBERT model
# `pylate_model_id` is a placeholder: set it to this model's Hugging Face repository ID
model = models.ColBERT(
    model_name_or_path=pylate_model_id,
)

# Step 2: Initialize the Voyager index
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
    override=True,  # This overwrites the existing index if any
)

# Step 3: Encode the documents
documents_ids = ["1", "2", "3"]
documents = ["document 1 text", "document 2 text", "document 3 text"]

documents_embeddings = model.encode(
    documents,
    batch_size=32,
    is_query=False,  # Ensure that it is set to False to indicate that these are documents, not queries
    show_progress_bar=True,
)

# Step 4: Add document embeddings to the index by providing embeddings and corresponding ids
index.add_documents(
    documents_ids=documents_ids,
    documents_embeddings=documents_embeddings,
)
```
Note that you do not have to recreate the index and encode the documents every time. Once you have created an index and added the documents, you can re-use the index later by loading it:
```python
# To load an index, simply instantiate it with the correct folder/name and without overriding it
index = indexes.Voyager(
    index_folder="pylate-index",
    index_name="index",
)
```
#### Retrieving top-k documents for queries
Once the documents are indexed, you can retrieve the top-k most relevant documents for a given set of queries.
To do so, initialize the ColBERT retriever with the index you want to search, encode the queries, and then retrieve the top-k documents to get the IDs and relevance scores of the best matches:
```python
# Step 1: Initialize the ColBERT retriever
retriever = retrieve.ColBERT(index=index)

# Step 2: Encode the queries
queries_embeddings = model.encode(
    ["query for document 3", "query for document 1"],
    batch_size=32,
    is_query=True,  # Ensure that it is set to True to indicate that these are queries
    show_progress_bar=True,
)

# Step 3: Retrieve top-k documents
scores = retriever.retrieve(
    queries_embeddings=queries_embeddings,
    k=10,  # Retrieve the top 10 matches for each query
)
```
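The retrieval result can then be mapped back to the document texts. The sketch below assumes that the returned value holds one list per query, each containing dictionaries with an `id` and a `score` key; that shape is an assumption here, so check it against the PyLate documentation for your installed version. The `scores` literal is a hand-written stand-in with illustrative values, not real model output.

```python
# Hypothetical stand-in for the value returned by retriever.retrieve(...);
# the shape (list of per-query lists of {"id", "score"} dicts) is assumed.
scores = [
    [{"id": "3", "score": 11.2}, {"id": "1", "score": 10.3}],
    [{"id": "1", "score": 10.9}, {"id": "2", "score": 9.7}],
]

queries = ["query for document 3", "query for document 1"]
documents_by_id = {
    "1": "document 1 text",
    "2": "document 2 text",
    "3": "document 3 text",
}

# Keep the best-scoring match for each query and look up its text.
top_hits = {}
for query, matches in zip(queries, scores):
    best = max(matches, key=lambda match: match["score"])
    top_hits[query] = (best["id"], documents_by_id[best["id"]])

for query, (doc_id, text) in top_hits.items():
    print(f"{query!r} -> id={doc_id}: {text}")
```

Keeping a separate id-to-text mapping is needed in this setup because the index is populated with embeddings and ids only.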
### Reranking
If you only want to use the ColBERT model to rerank the results of your first-stage retrieval pipeline without building an index, you can use the `rank.rerank` function and pass it the queries and documents to rerank:
```python
from pylate import rank, models

queries = [
    "query A",
    "query B",
]

# One list of candidate documents per query
documents = [
    ["document A", "document B"],
    ["document 1", "document C", "document B"],
]

# IDs aligned with the candidate documents above
documents_ids = [
    [1, 2],
    [1, 3, 2],
]

# `pylate_model_id` is a placeholder: set it to this model's Hugging Face repository ID
model = models.ColBERT(
    model_name_or_path=pylate_model_id,
)

queries_embeddings = model.encode(
    queries,
    is_query=True,
)

documents_embeddings = model.encode(
    documents,
    is_query=False,
)

reranked_documents = rank.rerank(
    documents_ids=documents_ids,
    queries_embeddings=queries_embeddings,
    documents_embeddings=documents_embeddings,
)
```
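Conceptually, reranking without an index amounts to scoring each query's candidates and sorting them. The following sketch shows that idea on toy embeddings, using the same nested id layout as the example above. It is not PyLate's implementation: `rerank_sketch` and `maxsim` are hypothetical helpers, and the output format is chosen for illustration only.

```python
def maxsim(query_tokens, document_tokens):
    """Sum over query tokens of the best dot product with any document token."""
    return sum(
        max(sum(a * b for a, b in zip(q, d)) for d in document_tokens)
        for q in query_tokens
    )


def rerank_sketch(documents_ids, queries_embeddings, documents_embeddings):
    """Score every candidate of each query, then sort candidates by score."""
    results = []
    for ids, query, candidates in zip(
        documents_ids, queries_embeddings, documents_embeddings
    ):
        scored = [
            {"id": doc_id, "score": maxsim(query, doc)}
            for doc_id, doc in zip(ids, candidates)
        ]
        results.append(sorted(scored, key=lambda item: item["score"], reverse=True))
    return results


# Toy data: one query token per query, one token per candidate document.
documents_ids = [[1, 2], [1, 3, 2]]
queries_embeddings = [[[1.0, 0.0]], [[0.0, 1.0]]]
documents_embeddings = [
    [[[0.5, 0.5]], [[1.0, 0.0]]],
    [[[0.25, 0.0]], [[0.0, 1.0]], [[0.0, 0.5]]],
]

reranked = rerank_sketch(documents_ids, queries_embeddings, documents_embeddings)
print([[item["id"] for item in per_query] for per_query in reranked])
# [[2, 1], [3, 2, 1]]
```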
## Evaluation
### Metrics
#### PyLate Information Retrieval
* Dataset: `['NanoClimateFEVER', 'NanoDBPedia', 'NanoFEVER', 'NanoFiQA2018', 'NanoHotpotQA', 'NanoMSMARCO', 'NanoNFCorpus', 'NanoNQ', 'NanoQuoraRetrieval', 'NanoSCIDOCS', 'NanoArguAna', 'NanoSciFact', 'NanoTouche2020']`
* Evaluated with `pylate.evaluation.pylate_information_retrieval_evaluator.PyLateInformationRetrievalEvaluator`
| Metric | NanoClimateFEVER | NanoDBPedia | NanoFEVER | NanoFiQA2018 | NanoHotpotQA | NanoMSMARCO | NanoNFCorpus | NanoNQ | NanoQuoraRetrieval | NanoSCIDOCS | NanoArguAna | NanoSciFact | NanoTouche2020 |
|:--------------------|:-----------------|:------------|:-----------|:-------------|:-------------|:------------|:-------------|:-----------|:-------------------|:------------|:------------|:------------|:---------------|
| MaxSim_accuracy@1 | 0.26 | 0.72 | 0.84 | 0.42 | 0.92 | 0.5 | 0.4 | 0.54 | 0.82 | 0.42 | 0.2 | 0.6 | 0.6939 |
| MaxSim_accuracy@3 | 0.44 | 0.86 | 0.94 | 0.6 | 0.98 | 0.66 | 0.52 | 0.76 | 0.96 | 0.6 | 0.48 | 0.76 | 0.9184 |
| MaxSim_accuracy@5 | 0.48 | 0.9 | 0.96 | 0.72 | 1.0 | 0.68 | 0.58 | 0.8 | 1.0 | 0.66 | 0.64 | 0.82 | 0.9592 |
| MaxSim_accuracy@10 | 0.72 | 0.94 | 0.98 | 0.76 | 1.0 | 0.78 | 0.68 | 0.84 | 1.0 | 0.74 | 0.76 | 0.88 | 1.0 |
| MaxSim_precision@1 | 0.26 | 0.72 | 0.84 | 0.42 | 0.92 | 0.5 | 0.4 | 0.54 | 0.82 | 0.42 | 0.2 | 0.6 | 0.6939 |
| MaxSim_precision@3 | 0.1733 | 0.6067 | 0.3333 | 0.2667 | 0.54 | 0.22 | 0.34 | 0.2533 | 0.3867 | 0.2867 | 0.16 | 0.2667 | 0.6395 |
| MaxSim_precision@5 | 0.12 | 0.572 | 0.208 | 0.228 | 0.34 | 0.136 | 0.32 | 0.168 | 0.244 | 0.224 | 0.128 | 0.18 | 0.6367 |
| MaxSim_precision@10 | 0.102 | 0.49 | 0.106 | 0.14 | 0.18 | 0.078 | 0.28 | 0.09 | 0.128 | 0.15 | 0.076 | 0.098 | 0.4939 |
| MaxSim_recall@1 | 0.12 | 0.0812 | 0.7767 | 0.2159 | 0.46 | 0.5 | 0.0335 | 0.51 | 0.734 | 0.0867 | 0.2 | 0.575 | 0.0494 |
| MaxSim_recall@3 | 0.23 | 0.164 | 0.9033 | 0.3558 | 0.81 | 0.66 | 0.0771 | 0.7 | 0.912 | 0.1767 | 0.48 | 0.74 | 0.1304 |
| MaxSim_recall@5 | 0.2567 | 0.2287 | 0.93 | 0.4973 | 0.85 | 0.68 | 0.0957 | 0.76 | 0.956 | 0.2287 | 0.64 | 0.815 | 0.2136 |
| MaxSim_recall@10 | 0.4 | 0.3339 | 0.95 | 0.584 | 0.9 | 0.78 | 0.144 | 0.81 | 0.9727 | 0.3067 | 0.76 | 0.87 | 0.3156 |
| **MaxSim_ndcg@10** | **0.3059** | **0.5944** | **0.8864** | **0.4729** | **0.8634** | **0.6351** | **0.3302** | **0.6699** | **0.9086** | **0.3142** | **0.4545** | **0.7357** | **0.5613** |
| MaxSim_mrr@10 | 0.3818 | 0.7956 | 0.8903 | 0.5344 | 0.9507 | 0.5894 | 0.4766 | 0.6447 | 0.9 | 0.5304 | 0.3582 | 0.6952 | 0.8086 |
| MaxSim_map@100 | 0.2305 | 0.4761 | 0.8591 | 0.3849 | 0.8069 | 0.6008 | 0.1419 | 0.6201 | 0.8814 | 0.2477 | 0.3653 | 0.6912 | 0.4155 |
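For readers unfamiliar with the ranking metrics in this table, the sketch below computes recall@k, MRR@k, and nDCG@k for a single query under a binary-relevance assumption. It is a simplified illustration rather than the evaluator's actual code; the function names and the example ranking are hypothetical, and the reported numbers are averages of such per-query values over each dataset.

```python
import math


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def mrr_at_k(ranked_ids, relevant_ids, k):
    """Reciprocal rank of the first relevant document in the top k, else 0."""
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def ndcg_at_k(ranked_ids, relevant_ids, k):
    """Binary-relevance nDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(rank + 1)
        for rank, doc_id in enumerate(ranked_ids[:k], start=1)
        if doc_id in relevant_ids
    )
    ideal_hits = min(len(relevant_ids), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for one query with two relevant documents.
ranked = ["d7", "d2", "d9", "d4"]
relevant = {"d2", "d4"}

print(recall_at_k(ranked, relevant, 3))          # 0.5
print(mrr_at_k(ranked, relevant, 10))            # 0.5
print(round(ndcg_at_k(ranked, relevant, 10), 4))  # 0.6509
```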
#### Nano BEIR
* Dataset: `NanoBEIR_mean`
* Evaluated with `pylate.evaluation.nano_beir_evaluator.NanoBEIREvaluator`
| Metric | Value |
|:--------------------|:-----------|
| MaxSim_accuracy@1 | 0.5641 |
| MaxSim_accuracy@3 | 0.7291 |
| MaxSim_accuracy@5 | 0.7846 |
| MaxSim_accuracy@10 | 0.8523 |
| MaxSim_precision@1 | 0.5641 |
| MaxSim_precision@3 | 0.3441 |
| MaxSim_precision@5 | 0.2696 |
| MaxSim_precision@10 | 0.1855 |
| MaxSim_recall@1 | 0.334 |
| MaxSim_recall@3 | 0.4876 |
| MaxSim_recall@5 | 0.5501 |
| MaxSim_recall@10 | 0.6251 |
| **MaxSim_ndcg@10** | **0.5948** |
| MaxSim_mrr@10 | 0.6581 |
| MaxSim_map@100 | 0.517 |
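The `NanoBEIR_mean` values are consistent with an unweighted average over the 13 datasets. As a quick check, the snippet below averages the rounded per-dataset `MaxSim_ndcg@10` values from the table in the previous section and recovers the reported mean; the variable names are illustrative.

```python
# Per-dataset MaxSim_ndcg@10 values, copied from the per-dataset table (rounded).
ndcg_at_10 = {
    "NanoClimateFEVER": 0.3059,
    "NanoDBPedia": 0.5944,
    "NanoFEVER": 0.8864,
    "NanoFiQA2018": 0.4729,
    "NanoHotpotQA": 0.8634,
    "NanoMSMARCO": 0.6351,
    "NanoNFCorpus": 0.3302,
    "NanoNQ": 0.6699,
    "NanoQuoraRetrieval": 0.9086,
    "NanoSCIDOCS": 0.3142,
    "NanoArguAna": 0.4545,
    "NanoSciFact": 0.7357,
    "NanoTouche2020": 0.5613,
}

# Unweighted mean across datasets.
mean_ndcg = sum(ndcg_at_10.values()) / len(ndcg_at_10)
print(f"{mean_ndcg:.4f}")  # 0.5948
```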
## Training Details
### Training Dataset
#### ms-marco-en-bge-gemma
* Dataset: [ms-marco-en-bge-gemma](https://huggingface.co/datasets/lightonai/ms-marco-en-bge-gemma) at [d8bad49](https://huggingface.co/datasets/lightonai/ms-marco-en-bge-gemma/tree/d8bad497c8bd698c868a49721999c386d5e6ae8f)
* Size: 533,177 training samples
* Columns: `query_id`, `document_ids`, and `scores`
* Approximate statistics based on the first 1000 samples:
| | query_id | document_ids | scores |
|:--------|:---------|:-------------|:-------|
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|:------------------------------------|:------------------------------------|
| type | int | list | list |
* Samples:

| query_id | document_ids | scores |
|:---------|:-------------|:-------|
| `685613` | `[7546874, 1176459, 197677, 2306318, 8541504, ...]` | `[0.9999999992804947, 0.24845418756716053, 0.7594154013647826, 0.26644182105618575, 0.390668914839766, ...]` |
| `237784` | `[6366584, 4034101, 2325374, 6914618, 6042146, ...]` | `[0.9999999991784339, 0.42233632827946693, 0.5956354295491569, 0.12644415907455164, 0.6636713730105909, ...]` |
| `904294` | `[448408, 8743975, 49600, 7339401, 2714261, ...]` | `[0.9999999991841937, 0.877629062381539, 0.8330146583389045, 0.3116634796692611, 0.4633524534142185, ...]` |

* Loss: `pylate.losses.distillation.Distillation`
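
For intuition about the `Distillation` loss named above, here is a minimal sketch. It assumes the common knowledge-distillation recipe: the student's MaxSim scores over a query's candidate documents are pushed toward the teacher's `scores` by minimising a KL divergence between the two softmax distributions. The helper names (`maxsim`, `log_softmax`, `kl_distillation_loss`) and the toy data are illustrative only and are not PyLate's API; the library's actual implementation may normalise scores or batch differently.

```python
import math


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token, take the best-matching
    document token (dot product), then sum over query tokens."""
    return sum(
        max(sum(q * d for q, d in zip(q_vec, d_vec)) for d_vec in doc_vecs)
        for q_vec in query_vecs
    )


def log_softmax(scores):
    """Numerically stable log-softmax over a list of floats."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def kl_distillation_loss(student_scores, teacher_scores):
    """KL(teacher || student) over the candidate documents of one query."""
    log_p_teacher = log_softmax(teacher_scores)
    log_p_student = log_softmax(student_scores)
    return sum(
        math.exp(lt) * (lt - ls)
        for lt, ls in zip(log_p_teacher, log_p_student)
    )


# Toy data (hypothetical): one query with 2 token embeddings, 3 candidate documents.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = [
    [[1.0, 0.0], [0.0, 1.0]],  # matches both query tokens
    [[1.0, 0.0], [0.5, 0.0]],  # matches only the first query token
    [[0.0, 0.5]],              # weak match on the second query token
]
student_scores = [maxsim(query, d) for d in docs]
teacher_scores = [1.0, 0.25, 0.76]  # made-up teacher scores for illustration

loss = kl_distillation_loss(student_scores, teacher_scores)
print(student_scores, loss)
```

The loss is zero only when the student and teacher induce the same distribution over the candidates, so training rewards matching the teacher's relative ranking rather than its absolute score values.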
### Training Hyperparameters
#### Non-Default Hyperparameters
- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `learning_rate`: 3e-05
- `num_train_epochs`: 1
- `bf16`: True
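
As a rough sanity check on what these settings imply, the sketch below estimates the number of optimizer steps for the run. The dataset size comes from the `dataset_size:533177` tag in this card's metadata; the device count, the absence of gradient accumulation, and keeping the last partial batch are assumptions made here for illustration, not facts stated in this section.

```python
import math

# Values taken from this card.
dataset_size = 533_177
per_device_train_batch_size = 16
num_train_epochs = 1

# Assumptions (not stated in this section): single device, no accumulation.
num_devices = 1
gradient_accumulation_steps = 1

effective_batch_size = (
    per_device_train_batch_size * num_devices * gradient_accumulation_steps
)
# The last partial batch is assumed to be kept, hence the ceiling.
steps_per_epoch = math.ceil(dataset_size / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

print(effective_batch_size, steps_per_epoch, total_steps)
```

Under different assumptions (more devices or gradient accumulation), the effective batch size grows and the step count shrinks proportionally.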
#### All Hyperparameters