Learning Video Generation for Robotic Manipulation with Collaborative Trajectory Control Paper • 2506.01943 • Published 5 days ago • 23
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction Paper • 2505.22613 • Published 10 days ago • 7
MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios Paper • 2505.21333 • Published 11 days ago • 39
Training-Free Efficient Video Generation via Dynamic Token Carving Paper • 2505.16864 • Published 16 days ago • 21
Flow-GRPO: Training Flow Matching Models via Online RL Paper • 2505.05470 • Published 30 days ago • 78
Koala-36M: A Large-scale Video Dataset Improving Consistency between Fine-grained Conditions and Video Content Paper • 2410.08260 • Published Oct 10, 2024
SPF-Portrait: Towards Pure Portrait Customization with Semantic Pollution-Free Fine-tuning Paper • 2504.00396 • Published Apr 1 • 4
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment Paper • 2503.23907 • Published Mar 31 • 2
Position: Interactive Generative Video as Next-Generation Game Engine Paper • 2503.17359 • Published Mar 21 • 62
FullDiT: Multi-Task Video Generative Foundation Model with Full Attention Paper • 2503.19907 • Published Mar 25 • 8
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 77
DiffMoE: Dynamic Token Selection for Scalable Diffusion Transformers Paper • 2503.14487 • Published Mar 18 • 27
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs Paper • 2408.11813 • Published Aug 21, 2024 • 12