Running 2.22k 2.22k The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters
deepseek-ai/DeepSeek-R1-Distill-Llama-70B Text Generation • Updated 17 days ago • 384k • • 627
view article Article LLM Comparison/Test: Llama 3 Instruct 70B + 8B HF/GGUF/EXL2 (20 versions tested and compared!) By wolfram • Apr 24, 2024 • 62
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning Paper • 2410.02884 • Published Oct 3, 2024 • 54