ColBERT-Zero 🐶 Collection First large-scale fully pre-trained ColBERT model using only public data, outperforming GTE-ModernColBERT and GTE-ModernBERT • 10 items • Updated 6 days ago • 16
Bharat-NanoBEIR: Indian Language Retrieval Benchmarks Collection NanoBEIR retrieval benchmarks translated into 22 Indian languages across 13 datasets. • 22 items • Updated Dec 13, 2025 • 5
CoRNStack Collection State-of-the-art code retrieval and re-ranking models and datasets • 9 items • Updated Mar 26, 2025 • 20
NanoBEIR datasets Collection These datasets are compatible with the (Sparse)NanoBEIREvaluator with Sentence Transformers v5.2+. Also CrossEncoderNanoBEIREvaluator if bm25 column • 18 items • Updated 26 days ago • 14
Embedding Model Datasets Collection A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers • 70 items • Updated Dec 10, 2025 • 163
view article Article Provence: efficient and robust context pruning for retrieval-augmented generation Jan 28, 2025 • 25
view article Article huggingface_hub v1.0: Five Years of Building the Foundation of Open Machine Learning +2 Oct 27, 2025 • 75
view article Article Training and Finetuning Sparse Embedding Models with Sentence Transformers v5 Jul 1, 2025 • 133