SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL Paper • 2504.11455 • Published 5 days ago
Video-T1: Test-Time Scaling for Video Generation Paper • 2503.18942 • Published 27 days ago
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Paper • 2503.19462 • Published 27 days ago
Personalize Anything for Free with Diffusion Transformer Paper • 2503.12590 • Published Mar 16
FlowTok: Flowing Seamlessly Across Text and Image Tokens Paper • 2503.10772 • Published Mar 13
Kolmogorov-Arnold Attention: Is Learnable Attention Better For Vision Transformers? Paper • 2503.10632 • Published Mar 13
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10
Automated Movie Generation via Multi-Agent CoT Planning Paper • 2503.07314 • Published Mar 10
ObjectMover: Generative Object Movement with Video Prior Paper • 2503.08037 • Published Mar 11
One-step Diffusion Models with f-Divergence Distribution Matching Paper • 2502.15681 • Published Feb 21
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention Paper • 2502.11089 • Published Feb 16
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12
Small Models Struggle to Learn from Strong Reasoners Paper • 2502.12143 • Published Feb 17
MME-CoT: Benchmarking Chain-of-Thought in Large Multimodal Models for Reasoning Quality, Robustness, and Efficiency Paper • 2502.09621 • Published Feb 13