zk67's Collections

Training

updated Jan 18

  • A Large Batch Optimizer Reality Check: Traditional, Generic Optimizers Suffice Across Batch Sizes

    Paper • 2102.06356 • Published Feb 12, 2021

    Note Optimizer-Google


  • Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

    Paper • 1904.00962 • Published Apr 1, 2019

    Note Optimizer-lamb


  • Decoupled Weight Decay Regularization

    Paper • 1711.05101 • Published Nov 14, 2017

    Note Optimizer-adamw. Related: https://arxiv.org/abs/2410.05192, "Understanding Warmup-Stable-Decay Learning Rates: A River Valley Loss Landscape Perspective" (Stanford), on the Warmup-Stable-Decay (WSD) learning-rate schedule.
