GRE Suite: Geo-localization Inference via Fine-Tuned Vision-Language Models and Enhanced Reasoning Chains Paper • 2505.18700 • Published 14 days ago • 4
EasyText: Controllable Diffusion Transformer for Multilingual Text Rendering Paper • 2505.24417 • Published 8 days ago • 12
GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control Paper • 2505.22421 • Published 9 days ago • 11
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published Apr 24 • 88
InstantCharacter: Personalize Any Characters with a Scalable Diffusion Transformer Framework Paper • 2504.12395 • Published Apr 16 • 17
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks Paper • 1910.01279 • Published Oct 3, 2019