Winner-Take-All Column Row Sampling for Memory Efficient Adaptation of Language Model Paper • 2305.15265 • Published May 24, 2023 • 1
DIVISION: Memory Efficient Training via Dual Activation Precision Paper • 2208.04187 • Published Aug 5, 2022
KV Cache Compression, But What Must We Give in Return? A Comprehensive Benchmark of Long Context Capable Approaches Paper • 2407.01527 • Published Jul 1, 2024
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models Paper • 2503.16419 • Published Mar 20, 2025 • 74
Assessing and Enhancing Large Language Models in Rare Disease Question-answering Paper • 2408.08422 • Published Aug 15, 2024
AutoL2S: Auto Long-Short Reasoning for Efficient Large Language Models Paper • 2505.22662 • Published May 2025 • 5
70% Size, 100% Accuracy: Lossless LLM Compression for Efficient GPU Inference via Dynamic-Length Float Paper • 2504.11651 • Published Apr 15, 2025 • 28