LongVie: Multimodal-Guided Controllable Ultra-Long Video Generation Paper • 2508.03694 • Published 18 days ago • 50
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding Paper • 2507.15028 • Published Jul 20 • 20
FreeMorph: Tuning-Free Generalized Image Morphing with Diffusion Model Paper • 2507.01953 • Published Jul 2 • 19
VBench: Comprehensive Benchmark Suite for Video Generative Models Paper • 2311.17982 • Published Nov 29, 2023 • 9
FSAR: Federated Skeleton-based Action Recognition with Adaptive Topology Structure and Knowledge Distillation Paper • 2306.11046 • Published Jun 19, 2023
Towards Language-Driven Video Inpainting via Multimodal Large Language Models Paper • 2401.10226 • Published Jan 18, 2024 • 1
Mugs: A Multi-Granular Self-Supervised Learning Framework Paper • 2203.14415 • Published Mar 27, 2022
Scaling Supervised Local Learning with Augmented Auxiliary Networks Paper • 2402.17318 • Published Feb 27, 2024
VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models Paper • 2411.13503 • Published Nov 20, 2024 • 35
Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT Paper • 2502.06782 • Published Feb 10 • 14
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models Paper • 2501.08453 • Published Jan 14 • 1
IVY-FAKE: A Unified Explainable Framework and Benchmark for Image and Video AIGC Detection Paper • 2506.00979 • Published Jun 1 • 13
DCM: Dual-Expert Consistency Model for Efficient and High-Quality Video Generation Paper • 2506.03123 • Published Jun 3 • 14
Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers Paper • 2506.07986 • Published Jun 9 • 19