WINA: Weight Informed Neuron Activation for Accelerating Large Language Model Inference Paper • 2505.19427 • Published May 26 • 10
OTOv3: Automatic Architecture-Agnostic Neural Network Training and Compression from Structured Pruning to Erasing Operators Paper • 2312.09411 • Published Dec 15, 2023
DistiLLM: Towards Streamlined Distillation for Large Language Models Paper • 2402.03898 • Published Feb 6, 2024 • 3
FORA: Fast-Forward Caching in Diffusion Transformer Acceleration Paper • 2407.01425 • Published Jul 1, 2024
HESSO: Towards Automatic Efficient and User Friendly Any Neural Network Training and Pruning Paper • 2409.09085 • Published Sep 11, 2024
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs Paper • 2503.07067 • Published Mar 10 • 32
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision Paper • 2312.09390 • Published Dec 14, 2023 • 33