CapSpeech: Enabling Downstream Applications in Style-Captioned Text-to-Speech Paper • 2506.02863 • Published 3 days ago • 6
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder Paper • 2505.07916 • Published 25 days ago • 124
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published 8 days ago • 66
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline Paper • 2505.19314 • Published 12 days ago • 4
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data Paper • 2505.18445 • Published 14 days ago • 63
Shifting AI Efficiency From Model-Centric to Data-Centric Compression Paper • 2505.19147 • Published 12 days ago • 144
Tool-Star: Empowering LLM-Brained Multi-Tool Reasoner via Reinforcement Learning Paper • 2505.16410 • Published 16 days ago • 55
Vox-Profile: A Speech Foundation Model Benchmark for Characterizing Diverse Speaker and Speech Traits Paper • 2505.14648 • Published 17 days ago • 8