InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions Paper • 2506.09984 • Published Jun 11 • 15
Unleashing Vecset Diffusion Model for Fast Shape Generation Paper • 2503.16302 • Published Mar 20 • 44
Vision Search Assistant: Empower Vision-Language Models as Multimodal Search Engines Paper • 2410.21220 • Published Oct 28, 2024 • 10