SYNTHETIC-1 Collection A collection of tasks & verifiers for reasoning datasets • 9 items • Updated 1 day ago • 43
Gemma-APS Release Collection Gemma models for text-to-propositions segmentation. The models are distilled from a fine-tuned Gemini Pro model applied to multi-domain synthetic data. • 3 items • Updated Dec 13, 2024 • 20
Gemma 2 JPN Release Collection A Gemma 2 2B model fine-tuned on Japanese text. It supports the Japanese language at the same level of performance as English-only queries on Gemma 2. • 3 items • Updated Dec 13, 2024 • 28
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 11 days ago • 296
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Dec 13, 2024 • 85
Reranking Model Collection A collection of Korean-specific reranking models • 2 items • Updated Aug 16, 2024 • 3
Minitron Collection A family of compressed models obtained via pruning and knowledge distillation • 12 items • Updated Jan 17 • 60
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems Paper • 2407.01370 • Published Jul 1, 2024 • 86
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12, 2024 • 68
Article Training and Finetuning Embedding Models with Sentence Transformers v3 May 28, 2024 • 186