From DeepSpeed to FSDP and Back Again with Hugging Face Accelerate Article • Published Jun 13, 2024 • 45
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads Paper • 2401.10774 • Published Jan 19, 2024 • 54
DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models Paper • 2401.06066 • Published Jan 11, 2024 • 47
Improving Text Embeddings with Large Language Models Paper • 2401.00368 • Published Dec 31, 2023 • 80
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws Paper • 2401.00448 • Published Dec 31, 2023 • 29
Understanding LLMs: A Comprehensive Overview from Training to Inference Paper • 2401.02038 • Published Jan 4, 2024 • 63
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU Paper • 2312.12456 • Published Dec 16, 2023 • 41
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 45
LLM in a flash: Efficient Large Language Model Inference with Limited Memory Paper • 2312.11514 • Published Dec 12, 2023 • 259