Article: Illustrating Reinforcement Learning from Human Feedback (RLHF) • By natolambert and 3 others • Dec 9, 2022 • 294
Article: KV Caching Explained: Optimizing Transformer Inference Efficiency • By not-lain • Jan 30 • 90
Article: Prefill and Decode for Concurrent Requests - Optimizing LLM Performance • By tngtech • Apr 16 • 19
Collection: Llama 3.2 • This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 15 items • Updated Dec 6, 2024 • 622
Collection: Phi-4 • Phi-4 family of small language, multimodal, and reasoning models • 16 items • Updated 26 days ago • 165
Paper: LLM in a flash: Efficient Large Language Model Inference with Limited Memory • arXiv 2312.11514 • Published Dec 12, 2023 • 257