---
tags:
- ColBERT
- PyLate
- contextual-embeddings
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
---
# ModernColBERT + InSeNT

This is a contextual embedding model fine-tuned from [lightonai/GTE-ModernColBERT-v1](https://huggingface.co/lightonai/GTE-ModernColBERT-v1) on the ConTEB training dataset, using the InSeNT training approach detailed in the corresponding paper.

This experimental model stems from the paper *Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings* ([arXiv:2505.24782](https://arxiv.org/abs/2505.24782)). While results are promising, we have observed regressions on standard embedding tasks, so using it in production will likely require further work, in particular extending the training set to improve robustness and out-of-distribution (OOD) generalization.
## Usage

### Direct Usage
First, install the [contextual-embeddings](https://github.com/illuin-tech/contextual-embeddings) package:

```bash
pip install git+https://github.com/illuin-tech/contextual-embeddings
```
To run inference with a contextual model, you can use the following example:

```python
from contextual_embeddings import LongContextEmbeddingModel
from pylate.models import ColBERT

# Each document is a list of chunks; chunks are embedded jointly,
# in the context of their document.
documents = [
    [
        "The old lighthouse keeper trimmed his lamp, its beam cutting a lonely path through the fog.",
        "He remembered nights of violent storms, when the ocean seemed to swallow the sky whole.",
        "Still, he found comfort in his duty, a silent guardian against the treacherous sea.",
    ],
    [
        "A curious fox cub, all rust and wonder, ventured out from its den for the first time.",
        "Each rustle of leaves, every chirping bird, was a new symphony to its tiny ears.",
        "Under the watchful eye of its mother, it began to learn the secrets of the whispering forest.",
    ],
]

base_model = ColBERT("illuin-conteb/modern-colbert-insent")
contextual_model = LongContextEmbeddingModel(
    base_model=base_model,
    pooling_mode="tokens",
)

# Returns one list of chunk embeddings (one tensor of token vectors per chunk)
# for each input document.
embeddings = contextual_model.embed_documents(documents)

print("Number of documents:", len(embeddings))  # 2
print("Number of chunks in first document:", len(embeddings[0]))  # 3
print(f"Shape of first chunk embedding: {embeddings[0][0].shape}")  # torch.Size([22, 128])
```
## Model Details

### Model Description

- Model Type: Sentence Transformer
- Base model: lightonai/GTE-ModernColBERT-v1
- Maximum Sequence Length: 8192 tokens
- Output Dimensionality: 128 dimensions
- Similarity Function: MaxSim (see the formula below)
- Training Dataset:
  - train
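The MaxSim score used here is the standard ColBERT late-interaction score (not specific to this model): given token embedding matrices $E_q$ for a query and $E_d$ for a chunk,

$$
\mathrm{MaxSim}(q, d) = \sum_{i=1}^{|q|} \max_{1 \le j \le |d|} E_{q,i} \cdot E_{d,j}
$$

This is what the `maxsim` helper in the usage sketch above computes.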
### Model Sources

- Repository: [Contextual Embeddings](https://github.com/illuin-tech/contextual-embeddings)
- Hugging Face: Contextual Embeddings
### Full Model Architecture

```
ColBERT(
  (0): Transformer({'max_seq_length': 8192, 'do_lower_case': False}) with Transformer model: ModernBertModel
  (1): Dense({'in_features': 768, 'out_features': 128, 'bias': False, 'activation_function': 'torch.nn.modules.linear.Identity'})
)
```
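Concretely, ModernBERT produces 768-dimensional token embeddings, and the bias-free `Dense` layer with identity activation projects each token down to a 128-dimensional ColBERT vector. A minimal sketch of that projection head, with shapes taken from the printout above:

```python
import torch

# Bias-free 768 -> 128 linear projection with identity activation,
# mirroring the Dense module in the architecture printout.
projection = torch.nn.Linear(768, 128, bias=False)

token_embeddings = torch.randn(22, 768)  # e.g. one 22-token chunk from ModernBERT
colbert_vectors = projection(token_embeddings)
print(colbert_vectors.shape)  # torch.Size([22, 128])
```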
## Citation

```bibtex
@misc{conti2025contextgoldgoldpassage,
  title={Context is Gold to find the Gold Passage: Evaluating and Training Contextual Document Embeddings},
  author={Max Conti and Manuel Faysse and Gautier Viaud and Antoine Bosselut and Céline Hudelot and Pierre Colombo},
  year={2025},
  eprint={2505.24782},
  archivePrefix={arXiv},
  primaryClass={cs.IR},
  url={https://arxiv.org/abs/2505.24782},
}
```