Training - a zk67 Collection

Training

Updated Jan 18

A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

Paper • 2102.06356 • Published Feb 12, 2021

Note: Optimizer (Google)

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

Paper • 1904.00962 • Published Apr 1, 2019

Note: Optimizer (LAMB)
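
A minimal pure-Python sketch of one LAMB step for a single layer, to make the layer-wise trust ratio concrete. Assumptions: the scaling function φ is taken as the identity, the trust ratio falls back to 1.0 when either norm is zero (an implementation choice), and the helper name `lamb_step` and its default hyperparameters are hypothetical and illustrative only.

```python
import math


def lamb_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-6, wd=0.01):
    """One LAMB update for a single layer (hypothetical helper; lists of floats).

    Returns the new weights and the updated first/second moment estimates.
    """
    # Adam-style moment estimates with bias correction (t starts at 1).
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]

    # Adam direction plus decoupled weight decay.
    r = [mh / (math.sqrt(vh) + eps) + wd * wi for mh, vh, wi in zip(m_hat, v_hat, w)]

    # Layer-wise trust ratio ||w|| / ||r||, assuming phi is the identity.
    w_norm = math.sqrt(sum(x * x for x in w))
    r_norm = math.sqrt(sum(x * x for x in r))
    trust = w_norm / r_norm if w_norm > 0 and r_norm > 0 else 1.0

    w = [wi - lr * trust * ri for wi, ri in zip(w, r)]
    return w, m, v


# Usage: one step on a toy two-parameter layer.
w, m, v = lamb_step([1.0, 2.0], [0.1, 0.2], [0.0, 0.0], [0.0, 0.0], t=1, lr=0.01, wd=0.0)
print(w)
```

Because the trust ratio rescales r to the norm of w, each layer's step has norm lr·||w|| whenever both norms are nonzero, independent of the raw gradient scale; this per-layer normalization is the motivation for layer-wise adaptation in LAMB.
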
Decoupled Weight Decay Regularization

Paper • 1711.05101 • Published Nov 14, 2017

Note: Optimizer (AdamW). Related: https://arxiv.org/abs/2410.05192 "Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective" (Stanford), on applying the warmup-stable-decay (WSD) schedule to the learning rate.
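
A minimal pure-Python sketch pairing the two ideas in this entry: AdamW's decoupled weight decay (the decay term is applied to the weights directly instead of being added to the gradient) and a warmup-stable-decay (WSD) learning-rate schedule. Assumptions: the helper names `adamw_step` and `wsd_lr`, the default hyperparameters, and the linear warmup and linear decay shapes are illustrative choices, not taken from either paper; WSD variants also use other decay shapes.

```python
import math


def adamw_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update (hypothetical helper; lists of floats)."""
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, g)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, g)]
    m_hat = [mi / (1 - beta1 ** t) for mi in m]
    v_hat = [vi / (1 - beta2 ** t) for vi in v]
    # Decoupled weight decay: wd * w is applied to the weights directly,
    # not folded into g, so it bypasses Adam's per-coordinate rescaling.
    w = [wi - lr * (mh / (math.sqrt(vh) + eps) + wd * wi)
         for wi, mh, vh in zip(w, m_hat, v_hat)]
    return w, m, v


def wsd_lr(step, total_steps, peak_lr, warmup_steps, decay_steps, min_lr=0.0):
    """Warmup-stable-decay schedule with linear warmup and linear decay (assumed shapes)."""
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup phase
    decay_start = total_steps - decay_steps
    if step < decay_start:
        return peak_lr  # stable phase: constant peak learning rate
    frac = min(1.0, (step - decay_start) / decay_steps)
    return peak_lr + (min_lr - peak_lr) * frac  # decay phase


# Usage: drive AdamW with the WSD schedule on a toy quadratic loss 0.5 * ||w||^2.
w, m, v = [1.0, -2.0], [0.0, 0.0], [0.0, 0.0]
total = 1000
for step in range(total):
    g = list(w)  # gradient of 0.5 * ||w||^2 is w itself
    lr = wsd_lr(step, total, peak_lr=0.01, warmup_steps=100, decay_steps=200)
    w, m, v = adamw_step(w, g, m, v, t=step + 1, lr=lr)
print(w)
```

Scaling the decay term by the scheduled `lr` means weight decay also anneals during the decay phase; the sketch follows the convention used by PyTorch's `torch.optim.AdamW`, which multiplies the decay by `lr`.
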