The Ultra-Scale Playbook 🌌: The ultimate guide to training LLMs on large GPU clusters
Infinite-LLM: Efficient LLM Service for Long Context with DistAttention and Distributed KVCache • Paper • 2401.02669 • Published Jan 5, 2024