Wan: Open and Advanced Large-Scale Video Generative Models Paper • 2503.20314 • Published 1 day ago • 29
Ditto: A Simple and Efficient Approach to Improve Sentence Embeddings Paper • 2305.10786 • Published May 18, 2023
MossFormer2: Combining Transformer and RNN-Free Recurrent Network for Enhanced Time-Domain Monaural Speech Separation Paper • 2312.11825 • Published Dec 19, 2023
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 49
HiFi-SR: A Unified Generative Transformer-Convolutional Adversarial Network for High-Fidelity Speech Super-Resolution Paper • 2501.10045 • Published Jan 17 • 9
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation Paper • 2503.00084 • Published 28 days ago
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 49
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 49
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Paper • 2501.06282 • Published Jan 10 • 49
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Paper • 2412.10117 • Published Dec 13, 2024 • 3
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Paper • 2412.10117 • Published Dec 13, 2024 • 3