ShareGPT-4o-Image: Aligning Multimodal Models with GPT-4o-Level Image Generation Paper • 2506.18095 • Published 5 days ago • 59
FusionAudio-1.2M: Towards Fine-grained Audio Captioning with Multimodal Contextual Fusion Paper • 2506.01111 • Published 26 days ago • 29
S2S-Arena, Evaluating Speech2Speech Protocols on Instruction Following with Paralinguistic Information Paper • 2503.05085 • Published Mar 7 • 48
Soundwave: Less is More for Speech-Text Alignment in LLMs Paper • 2502.12900 • Published Feb 18 • 86
On the Compositional Generalization of Multimodal LLMs for Medical Imaging Paper • 2412.20070 • Published Dec 28, 2024 • 47
HuatuoGPT-o1, Towards Medical Complex Reasoning with LLMs Paper • 2412.18925 • Published Dec 25, 2024 • 105