Unified Multimodal Chain-of-Thought Reward Model through Reinforcement Fine-Tuning • Paper • 2505.03318 • Published 6 days ago • 84
Absolute Zero: Reinforced Self-play Reasoning with Zero Data • Paper • 2505.03335 • Published 6 days ago • 107
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play • Paper • 2505.02707 • Published 6 days ago • 78