Gemma 3 QAT Collection Quantization Aware Trained (QAT) Gemma 3 checkpoints. The models preserve quality similar to half precision while using 3x less memory • 15 items • Updated 18 days ago • 185
Describe Anything Collection Multimodal Large Language Models for Detailed Localized Image and Video Captioning • 7 items • Updated about 8 hours ago • 48
Multimodal DSE Retrievers Collection A collection of DSE models for multimodal retrieval • 5 items • Updated 21 days ago • 14
PubMedBERT Embeddings M2V Collection Models distilled with Model2Vec - 100K / 500K / 1M / 2M / 8M parameter variants. • 5 items • Updated Jan 26 • 4
Orpheus Multilingual Research Release Collection Beta Release of multilingual models. • 12 items • Updated 25 days ago • 77
Article Training and Finetuning Reranker Models with Sentence Transformers v4 Mar 26 • 125
📚 LLM pretraining datasets Collection A collection of datasets for LLM pretraining • 9 items • Updated about 12 hours ago • 7
Dar Datasets Collection datasets uploaded by https://github.com/ARBML/dar • 200 items • Updated Aug 22, 2024 • 11
KITAB-Bench Collection A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding • 24 items • Updated Feb 24 • 11
ModernBERT Collection Bringing BERT into modernity via both architecture changes and scaling • 3 items • Updated Dec 19, 2024 • 141
Reranker & Retrieval Arabic Datasets & Models Collection This collection contains different Arabic datasets and models for retrieval and reranking tasks. • 8 items • Updated Dec 7, 2024 • 3
AIMv2 Collection A collection of AIMv2 vision encoders that support a number of resolutions, native resolution, and a distilled checkpoint. • 19 items • Updated Nov 22, 2024 • 77