Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment Paper • 2502.04328 • Published 15 days ago • 26
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 29 days ago • 24
Video-MMMU: Evaluating Knowledge Acquisition from Multi-Discipline Professional Videos Paper • 2501.13826 • Published 29 days ago • 24
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities Paper • 2501.08983 • Published Jan 15 • 20
RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published Jan 15 • 15
CityDreamer4D: Compositional Generative Model of Unbounded 4D Cities Paper • 2501.08983 • Published Jan 15 • 20
RepVideo: Rethinking Cross-Layer Representation for Video Generation Paper • 2501.08994 • Published Jan 15 • 15
Are VLMs Ready for Autonomous Driving? An Empirical Study from the Reliability, Data, and Metric Perspectives Paper • 2501.04003 • Published Jan 7 • 25
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published Jan 7 • 23
Diffusion as Shader: 3D-aware Video Diffusion for Versatile Video Generation Control Paper • 2501.03847 • Published Jan 7 • 23
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published Dec 10, 2024 • 35
Evaluation Agent: Efficient and Promptable Evaluation Framework for Visual Generative Models Paper • 2412.09645 • Published Dec 10, 2024 • 35
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion Paper • 2412.09626 • Published Dec 12, 2024 • 20