Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) • By natolambert and 3 others • Dec 9, 2022 • 296
Article: KV Caching Explained: Optimizing Transformer Inference Efficiency • By not-lain • Jan 30 • 91
Article: Prefill and Decode for Concurrent Requests - Optimizing LLM Performance • By tngtech • Apr 16 • 20
Llama 3.2 Collection • This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 models. • 15 items • Updated Dec 6, 2024 • 622
Phi-4 Collection • Phi-4 family of small language, multi-modal and reasoning models. • 17 items • Updated 3 days ago • 168
LLM in a flash: Efficient Large Language Model Inference with Limited Memory • Paper • 2312.11514 • Published Dec 12, 2023 • 257