Long Video Diffusion Generation with Segmented Cross-Attention and Content-Rich Video Data Curation • Paper • 2412.01316 • Published Dec 2, 2024
Centroid-centered Modeling for Efficient Vision Transformer Pre-training • Paper • 2303.04664 • Published Mar 8, 2023
ContPhy: Continuum Physical Concept Learning and Reasoning from Videos • Paper • 2402.06119 • Published Feb 9, 2024
3D-VLA: A 3D Vision-Language-Action Generative World Model • Paper • 2403.09631 • Published Mar 14, 2024