## ConTEB Models Collection

This model is part of the ConTEB models collection: our models trained with the InSeNT approach, the checkpoints we used to run the evaluations reported in our paper.
This is a contextual model fine-tuned from `lightonai/GTE-ModernColBERT-v1` on the ConTEB training dataset. It was trained with the InSeNT training approach, detailed in the corresponding paper.

This experimental model stems from the paper *Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings*. While results are promising, we have observed regressions on standard embedding tasks, and using it in production will probably require further work on extending the training set to improve robustness and out-of-distribution (OOD) generalization.
First, install the `contextual-embeddings` package:

```shell
pip install git+https://github.com/illuin-tech/contextual-embeddings
```
To run inference with a contextual model, you can use the following example:

```python
from contextual_embeddings import LongContextEmbeddingModel
from pylate.models import ColBERT

# Each document is a list of chunks embedded jointly, in context
documents = [
    [
        "The old lighthouse keeper trimmed his lamp, its beam cutting a lonely path through the fog.",
        "He remembered nights of violent storms, when the ocean seemed to swallow the sky whole.",
        "Still, he found comfort in his duty, a silent guardian against the treacherous sea.",
    ],
    [
        "A curious fox cub, all rust and wonder, ventured out from its den for the first time.",
        "Each rustle of leaves, every chirping bird, was a new symphony to its tiny ears.",
        "Under the watchful eye of its mother, it began to learn the secrets of the whispering forest.",
    ],
]

base_model = ColBERT("illuin-conteb/modern-colbert-insent")
contextual_model = LongContextEmbeddingModel(
    base_model=base_model,
    pooling_mode="tokens",
)

embeddings = contextual_model.embed_documents(documents)

print("Number of documents:", len(embeddings))     # 2
print("Chunks in first document:", len(embeddings[0]))  # 3
print(f"Shape of first chunk embedding: {embeddings[0][0].shape}")  # torch.Size([22, 128])
```
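Each chunk embedding above is a matrix of per-token vectors, so query-chunk relevance is typically computed with ColBERT-style late interaction (MaxSim): each query token is matched to its most similar chunk token, and the similarities are summed. The helper below is a minimal sketch in plain torch using random embeddings; `maxsim_score` is a hypothetical name, not part of the `contextual-embeddings` API:

```python
import torch

def maxsim_score(query_emb: torch.Tensor, doc_emb: torch.Tensor) -> float:
    """ColBERT-style MaxSim: sum over query tokens of the best cosine match."""
    # query_emb: [num_query_tokens, dim], doc_emb: [num_doc_tokens, dim]
    q = torch.nn.functional.normalize(query_emb, dim=-1)
    d = torch.nn.functional.normalize(doc_emb, dim=-1)
    # For each query token, take its best-matching document token, then sum
    return (q @ d.T).max(dim=-1).values.sum().item()

# Toy example with random token embeddings (dim 128, as in the output above)
torch.manual_seed(0)
query = torch.randn(5, 128)   # e.g. 5 query tokens
chunk = torch.randn(22, 128)  # e.g. 22 chunk tokens
score = maxsim_score(query, chunk)
```

Scoring a query against every chunk embedding in `embeddings` and ranking by this score recovers the usual ColBERT retrieval setup.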
Loading the checkpoint yields the following ColBERT architecture:

```
ColBERT(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
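The Dense head in the printout above projects ModernBERT's 768-dimensional token states down to 128-dimensional ColBERT embeddings, with no bias and an identity activation. A minimal torch stand-in (not the actual module, just an illustration of the shapes involved):

```python
import torch

# Bias-free linear projection from 768 to 128 dims, identity activation,
# mirroring the Dense layer shown in the architecture printout.
proj = torch.nn.Linear(in_features=768, out_features=128, bias=False)

hidden_states = torch.randn(22, 768)    # e.g. 22 token states from ModernBERT
token_embeddings = proj(hidden_states)  # per-token ColBERT embeddings
print(token_embeddings.shape)           # torch.Size([22, 128])
```

This explains the `torch.Size([22, 128])` chunk-embedding shape in the inference example: one 128-dimensional vector per token of the chunk.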
If you use this model, please cite:

```bibtex
@misc{conti2025contextgoldgoldpassage,
  title={Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings},
  author={Max Conti and Manuel Faysse and Gautier Viaud and Antoine Bosselut and Céline Hudelot and Pierre Colombo},
  year={2025},
  eprint={2505.24782},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2505.24782},
}
```