Instruction Pre-Training: Language Models are Supervised Multitask Learners Paper • 2406.14491 • Published Jun 20 • 85
Scaling Smart: Accelerating Large Language Model Pre-training with Small Model Initialization Paper • 2409.12903 • Published 3 days ago • 13