Cautious Optimizers: Improving Training with One Line of Code Paper • 2411.16085 • Published Nov 25, 2024
Memory-Efficient LLM Training with Online Subspace Descent Paper • 2408.12857 • Published Aug 23, 2024
LongNet: Scaling Transformers to 1,000,000,000 Tokens Paper • 2307.02486 • Published Jul 5, 2023