ZClip: Adaptive Spike Mitigation for LLM Pre-Training Paper • 2504.02507 • Published 18 days ago • 76
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22, 2024 • 130