Magpie Conversation Ko Collection Korean translation of the Magpie dataset (translated with @nayohan's translation model) • 10 items • Updated Nov 6, 2024 • 2
Common Pile v0.1 Filtered Data Collection An LLM pre-training dataset produced by filtering and deduplicating the raw text collected in the Common Pile v0.1 • 31 items • Updated 4 days ago • 11
Common Pile v0.1 Collection All resources related to Common Pile v0.1, an 8TB dataset of public domain and openly licensed text • 4 items • Updated 4 days ago • 21
MiniCPM4 Collection MiniCPM4: Ultra-Efficient LLMs on End Devices • 17 items • Updated 39 minutes ago • 45
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12, 2024 • 134
Article Cosmopedia: how to create large-scale synthetic data for pre-training Large Language Models By loubnabnl and 2 others • Mar 20, 2024 • 94
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer Paper • 1910.10683 • Published Oct 23, 2019 • 14
Article Exploring Quantization Backends in Diffusers By derekl35 and 2 others • 20 days ago • 33
UQFF Collection UQFF models. Examples for each in the model card! • 31 items • Updated 4 days ago • 17
Article Mitigating False Negatives in Multiple Negatives Ranking Loss for Retriever Training By dragonkue • 16 days ago • 7