PointArena: Probing Multimodal Grounding Through Language-Guided Pointing • Paper • 2505.09990 • Published May 15 • 11
Style Customization of Text-to-Vector Generation with Image Diffusion Priors • Paper • 2505.10558 • Published May 15 • 15
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis • Paper • 2505.10046 • Published May 15 • 9