Article: Fine-tuning LLMs to 1.58bit: extreme quantization made easy • Sep 18, 2024 • 243
Paper: Optimizing Large Language Model Training Using FP4 Quantization • 2501.17116 • Published Jan 28 • 39
Collection: Mamba2-In-Llama3 (Mamba2 distilled from Llama3 8B Instruct; The Mamba in the Llama: Distilling and Accelerating Hybrid Models, https://arxiv.org/abs/2408.15237) • 4 items • Updated Sep 9, 2024 • 2
Paper: The Mamba in the Llama: Distilling and Accelerating Hybrid Models • 2408.15237 • Published Aug 27, 2024 • 42
Paper: QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models • 2310.16795 • Published Oct 25, 2023 • 27