An Empirical Study of GPT-4o Image Generation Capabilities Paper • 2504.05979 • Published 7 days ago • 60
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published 12 days ago • 32
OmniSVG: A Unified Scalable Vector Graphics Generation Model Paper • 2504.06263 • Published 6 days ago • 139
Concept Lancet: Image Editing with Compositional Representation Transplant Paper • 2504.02828 • Published 11 days ago • 16
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning Paper • 2504.02949 • Published 11 days ago • 18
Comprehensive Relighting: Generalizable and Consistent Monocular Human Relighting and Harmonization Paper • 2504.03011 • Published 11 days ago • 9
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published 12 days ago • 40
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published 12 days ago • 35
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization Paper • 2503.23377 • Published 16 days ago • 49
GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors Paper • 2504.01016 • Published 13 days ago • 28
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published 14 days ago • 74
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published 15 days ago • 92
SketchVideo: Sketch-based Video Generation and Editing Paper • 2503.23284 • Published 16 days ago • 22
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published 16 days ago • 120