Taming LLMs by Scaling Learning Rates with Gradient Grouping • Paper 2506.01049 • Published 11 days ago