EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published 8 days ago • 12
Alchemist: Turning Public Text-to-Image Data into Generative Gold Paper • 2505.19297 • Published 12 days ago • 73
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action Paper • 2505.01583 • Published May 2 • 9
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published Apr 29 • 11
VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model Paper • 2504.07615 • Published Apr 10 • 32
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published Apr 11 • 125
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation Paper • 2504.02160 • Published Apr 2 • 37
SkyReels-A2: Compose Anything in Video Diffusion Transformers Paper • 2504.02436 • Published Apr 3 • 37
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published Apr 2 • 40
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation Paper • 2503.20672 • Published Mar 26 • 14
Latent Space Super-Resolution for Higher-Resolution Image Generation with Diffusion Models Paper • 2503.18446 • Published Mar 24 • 12
Unleashing Vecset Diffusion Model for Fast Shape Generation Paper • 2503.16302 • Published Mar 20 • 44
Concat-ID: Towards Universal Identity-Preserving Video Synthesis Paper • 2503.14151 • Published Mar 18 • 10
Personalize Anything for Free with Diffusion Transformer Paper • 2503.12590 • Published Mar 16 • 44