Catwalk: A Unified Language Model Evaluation Framework for Many Datasets Paper • 2312.10253 • Published Dec 15, 2023 • 7
Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math Paper • 2312.17120 • Published Dec 28, 2023 • 25
Awesome feedback datasets Collection A curated list of datasets with human or AI feedback. Useful for training reward models or applying techniques like DPO. • 19 items • Updated Apr 12 • 65
Awesome SFT datasets Collection A curated list of interesting datasets to fine-tune language models with. • 43 items • Updated Apr 12 • 117