VLM4D: Towards Spatiotemporal Awareness in Vision Language Models Paper • 2508.02095 • Published 20 days ago • 6
Universal Reasoner: A Single, Composable Plug-and-Play Reasoner for Frozen LLMs Paper • 2505.19075 • Published May 25 • 21
Zero4D: Training-Free 4D Video Generation From Single Video Using Off-the-Shelf Video Diffusion Model Paper • 2503.22622 • Published Mar 28 • 18
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity Paper • 2503.07677 • Published Mar 10 • 87
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation Paper • 2503.09151 • Published Mar 12 • 32