Lynx: Towards High-Fidelity Personalized Video Generation Paper • 2509.15496 • Published 18 days ago • 12
Seeing Voices: Generating A-Roll Video from Audio with Mirage Paper • 2506.08279 • Published Jun 9 • 27
Efficient LLaMA-3.2-Vision by Trimming Cross-attended Visual Features Paper • 2504.00557 • Published Apr 1 • 15
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation Paper • 2503.09641 • Published Mar 12 • 40
SANA-Sprint Collection 🏃 SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation • 6 items • Updated 24 days ago • 43
Fourier Position Embedding: Enhancing Attention's Periodic Extension for Length Generalization Paper • 2412.17739 • Published Dec 23, 2024 • 41
FastVLM: Efficient Vision Encoding for Vision Language Models Paper • 2412.13303 • Published Dec 17, 2024 • 70
FashionComposer: Compositional Fashion Image Generation Paper • 2412.14168 • Published Dec 18, 2024 • 16
FitDiT: Advancing the Authentic Garment Details for High-fidelity Virtual Try-on Paper • 2411.10499 • Published Nov 15, 2024 • 13
RedPajama: an Open Dataset for Training Large Language Models Paper • 2411.12372 • Published Nov 19, 2024 • 56
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality Paper • 2410.19355 • Published Oct 25, 2024 • 23
LLaVA-o1: Let Vision Language Models Reason Step-by-Step Paper • 2411.10440 • Published Nov 15, 2024 • 129
MarDini: Masked Autoregressive Diffusion for Video Generation at Scale Paper • 2410.20280 • Published Oct 26, 2024 • 23
What Matters in Transformers? Not All Attention is Needed Paper • 2406.15786 • Published Jun 22, 2024 • 31
Efficient Diffusion Models: A Comprehensive Survey from Principles to Practices Paper • 2410.11795 • Published Oct 15, 2024 • 18