Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens Paper • 2411.17691 • Published Nov 26, 2024 • 11 • 5
Thanos: Enhancing Conversational Agents with Skill-of-Mind-Infused Large Language Model Paper • 2411.04496 • Published Nov 7, 2024 • 22 • 3
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments Paper • 2410.23918 • Published Oct 31, 2024 • 19 • 6
BitStack: Fine-Grained Size Control for Compressed Large Language Models in Variable Memory Environments Paper • 2410.23918 • Published Oct 31, 2024 • 19 • 6
Erasing Conceptual Knowledge from Language Models Paper • 2410.02760 • Published Oct 3, 2024 • 14 • 4
Tutor CoPilot: A Human-AI Approach for Scaling Real-Time Expertise Paper • 2410.03017 • Published Oct 3, 2024 • 27 • 5
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices Paper • 2410.00531 • Published Oct 1, 2024 • 30 • 6
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models Paper • 2404.07839 • Published Apr 11, 2024 • 44 • 2