FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language • Paper • 2506.20920 • Published Jun 2025
The Ultra-Scale Playbook 🌌 • Space • The ultimate guide to training LLMs on large GPU clusters
SmolLM2 Collection • State-of-the-art compact LLMs for on-device applications: 1.7B, 360M, 135M • 16 items • Updated May 5
Evaluating the Search Phase of Neural Architecture Search • Paper • 1902.08142 • Published Feb 21, 2019
Landmark Attention: Random-Access Infinite Context Length for Transformers • Paper • 2305.16300 • Published May 25, 2023
Faster Causal Attention Over Large Sequences Through Sparse Flash Attention • Paper • 2306.01160 • Published Jun 1, 2023