LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! By medmekk and 1 other • 18 days ago • 45
Fine-tuning LLMs to 1.58bit: extreme quantization made easy By medmekk and 5 others • Sep 18, 2024 • 227
Llama 3.1 - 405B, 70B & 8B with multilinguality and long context By philschmid and 7 others • Jul 23, 2024 • 231
quanto: a pytorch quantization toolkit By dacorvo and 2 others • Mar 18, 2024 • 35
Overview of natively supported quantization schemes in 🤗 Transformers By ybelkada and 4 others • Sep 12, 2023 • 12
Making LLMs lighter with AutoGPTQ and transformers By marcsun13 and 5 others • Aug 23, 2023 • 45