SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19 • 17
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer Paper • 2303.17605 • Published Mar 30, 2023
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing Paper • 2005.14187 • Published May 28, 2020 • 2
MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models Paper • 2308.12963 • Published Aug 24, 2023
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 53
AMC: AutoML for Model Compression and Acceleration on Mobile Devices Paper • 1802.03494 • Published Feb 10, 2018
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy Paper • 2006.08509 • Published Jun 15, 2020
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge Paper • 2411.12915 • Published Nov 19, 2024
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 42
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 95