OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning Paper • 2405.18380 • Published May 28, 2024 • 1
FFN-SkipLLM: A Hidden Gem for Autoregressive Decoding with Adaptive Feed Forward Skipping Paper • 2404.03865 • Published Apr 5, 2024
Found in the Middle: How Language Models Use Long Contexts Better via Plug-and-Play Positional Encoding Paper • 2403.04797 • Published Mar 5, 2024 • 1
The Unreasonable Effectiveness of Random Pruning: Return of the Most Naive Baseline for Sparse Training Paper • 2202.02643 • Published Feb 5, 2022 • 1
Sparse Training via Boosting Pruning Plasticity with Neuroregeneration Paper • 2106.10404 • Published Jun 19, 2021 • 1
The Emergence of Essential Sparsity in Large Pre-trained Models: The Weights that Matter Paper • 2306.03805 • Published Jun 6, 2023 • 1
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs Paper • 2310.08915 • Published Oct 13, 2023
AdaMerging: Adaptive Model Merging for Multi-Task Learning Paper • 2310.02575 • Published Oct 4, 2023 • 1
Sparse MoE as the New Dropout: Scaling Dense and Self-Slimmable Transformers Paper • 2303.01610 • Published Mar 2, 2023
Q-GaLore: Quantized GaLore with INT4 Projection and Layer-Adaptive Low-Rank Gradients Paper • 2407.08296 • Published Jul 11, 2024 • 33
From GaLore to WeLore: How Low-Rank Weights Non-uniformly Emerge from Low-Rank Gradients Paper • 2407.11239 • Published Jul 15, 2024 • 8
O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning Paper • 2501.12570 • Published Jan 22, 2025 • 24
Mask-Enhanced Autoregressive Prediction: Pay Less Attention to Learn More Paper • 2502.07490 • Published 27 days ago • 9
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam Paper • 2502.17055 • Published 14 days ago • 16
SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers Paper • 2502.20545 • Published 10 days ago • 20