AI & ML interests

The models below are tuned for generating sentence and text embeddings. They can be used with the sentence-transformers package.
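As a rough illustration of what using one of these models with the package can look like, the sketch below ranks candidate sentences by cosine similarity to a query. The helper names (`cosine_similarity`, `rank_by_similarity`, `demo`) are hypothetical, and the model ID `sentence-transformers/all-MiniLM-L6-v2` is an assumed example; any embedding model from this organization should work the same way. The only library calls relied on are the `SentenceTransformer` constructor and its `encode` method.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(model, query, candidates):
    """Return (score, text) pairs for the candidates, most similar first.

    `model` is any object whose encode(list_of_str) returns one vector
    per input text, such as a SentenceTransformer instance.
    """
    vectors = model.encode([query] + candidates)
    query_vec, candidate_vecs = vectors[0], vectors[1:]
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(candidate_vecs, candidates)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def demo():
    """Needs `pip install sentence-transformers`; downloads the model on first use."""
    from sentence_transformers import SentenceTransformer

    # Assumed example model ID; swap in any model from this organization.
    model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
    candidates = [
        "A recipe for sourdough loaves.",
        "Quarterly earnings rose by four percent.",
    ]
    for score, text in rank_by_similarity(model, "How do I bake bread?", candidates):
        print(f"{score:.3f}  {text}")
```

`demo()` is deliberately not called at import time, since it requires the package and a model download. The ranking helper only depends on the `encode` interface, so it can be exercised with any object that returns one vector per text.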

sentence-transformers collections (4)

Embedding Model Datasets
A curated subset of the datasets that work out of the box with Sentence Transformers: https://huggingface.co/datasets?other=sentence-transformers
Parallel Sentences Datasets
These datasets all have "english" and "non_english" columns. They can be used to make embedding models multilingual.
MS MARCO Mined Triplets
These datasets contain MS MARCO Triplets gathered by mining hard negatives using various models. Each dataset has various subsets.
NanoBEIR 🍺 with BM25 Rankings
NanoBEIR by Zeta Alpha, extended with BM25 scores. These datasets are used in the Sentence Transformers Cross Encoder NanoBEIR Evaluator.
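
To make the role of the "english" and "non_english" columns in the Parallel Sentences Datasets more concrete, here is a minimal sketch of one common way such pairs are used: knowledge distillation, where a multilingual student model is trained so that its embeddings of both the English sentence and its translation match a teacher model's embedding of the English sentence. This page does not state which training method is intended, so treat the technique as an assumption. The names `teacher`, `student`, `mean_squared_error`, and `distillation_loss` are hypothetical, and the sketch only computes the loss value rather than running a training loop.

```python
def mean_squared_error(u, v):
    """Mean squared difference between two equal-length vectors."""
    return sum((float(a) - float(b)) ** 2 for a, b in zip(u, v)) / len(u)


def distillation_loss(teacher, student, rows):
    """Average distillation loss over rows of parallel sentences.

    Each row is a dict with "english" and "non_english" keys, mirroring
    the dataset columns. `teacher` and `student` are any objects whose
    encode(list_of_str) returns one vector per input text.
    """
    english = [row["english"] for row in rows]
    non_english = [row["non_english"] for row in rows]

    targets = teacher.encode(english)          # fixed reference embeddings
    student_en = student.encode(english)       # student on the English side
    student_xx = student.encode(non_english)   # student on the translation

    total = 0.0
    for target, s_en, s_xx in zip(targets, student_en, student_xx):
        # Pull both student embeddings toward the teacher's English embedding.
        total += mean_squared_error(s_en, target) + mean_squared_error(s_xx, target)
    return total / len(rows)
```

A loss of zero means the student already places each translation at the same point in the embedding space as the teacher places the English sentence, which is the property that makes the resulting model usable across languages.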