SparseLoRA: Accelerating LLM Fine-Tuning with Contextual Sparsity Paper • 2506.16500 • Published Jun 19 • 17
SparseViT: Revisiting Activation Sparsity for Efficient High-Resolution Vision Transformer Paper • 2303.17605 • Published Mar 30, 2023
HAT: Hardware-Aware Transformers for Efficient Natural Language Processing Paper • 2005.14187 • Published May 28, 2020 • 2
MapPrior: Bird's-Eye View Map Layout Estimation with Generative Models Paper • 2308.12963 • Published Aug 24, 2023
FlatFormer: Flattened Window Attention for Efficient Point Cloud Transformer Paper • 2301.08739 • Published Jan 20, 2023
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19, 2024 • 53
AMC: AutoML for Model Compression and Acceleration on Mobile Devices Paper • 1802.03494 • Published Feb 10, 2018
APQ: Joint Search for Network Architecture, Pruning and Quantization Policy Paper • 2006.08509 • Published Jun 15, 2020
VILA-M3: Enhancing Vision-Language Models with Medical Expert Knowledge Paper • 2411.12915 • Published Nov 19, 2024
Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding Paper • 2505.22618 • Published May 28 • 42
Token-Efficient Long Video Understanding for Multimodal LLMs Paper • 2503.04130 • Published Mar 6 • 95
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention Paper • 2502.14866 • Published Feb 20 • 13
StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation Paper • 2312.12491 • Published Dec 19, 2023 • 73
LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models Paper • 2309.12307 • Published Sep 21, 2023 • 89
BEVFusion: Multi-Task Multi-Sensor Fusion with Unified Bird's-Eye View Representation Paper • 2205.13542 • Published May 26, 2022