Optimization Benchmark for Diffusion Models on Dynamical Systems Paper • 2510.19376 • Published Oct 22
The Surprising Agreement Between Convex Optimization Theory and Learning-Rate Scheduling for Large Model Training Paper • 2501.18965 • Published Jan 31
SGD with Clipping is Secretly Estimating the Median Gradient Paper • 2402.12828 • Published Feb 20, 2024