Radial Attention: $O(n\log n)$ Sparse Attention with Energy Decay for Long Video Generation • arXiv:2506.19852 • Published Jun 2025
Jetfire: Efficient and Accurate Transformer Pretraining with INT8 Data Flow and Per-Block Quantization • arXiv:2403.12422 • Published Mar 19, 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training • arXiv:2410.19313 • Published Oct 25, 2024
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity • arXiv:2502.01776 • Published Feb 3, 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache • arXiv:2502.10424 • Published Feb 5, 2025
Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation • arXiv:2505.18875 • Published May 24, 2025