LLaVA-Interactive: An All-in-One Demo for Image Chat, Segmentation, Generation and Editing Paper • 2311.00571 • Published Nov 1, 2023 • 43
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation Paper • 2505.04512 • Published May 7, 2025 • 35
M^3GPT: An Advanced Multimodal, Multitask Framework for Motion Comprehension and Generation Paper • 2405.16273 • Published May 25, 2024 • 1