Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published May 5 • 83
Audio-Aware Large Language Models as Judges for Speaking Styles Paper • 2506.05984 • Published 5 days ago • 14
Article ScreenSuite - The most comprehensive evaluation suite for GUI Agents! 5 days ago • 34
CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training Paper • 2505.17589 • Published 19 days ago • 3
Perception, Reason, Think, and Plan: A Survey on Large Multimodal Reasoning Models Paper • 2505.04921 • Published May 8 • 176
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published Apr 18 • 128
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation Paper • 2504.09454 • Published Apr 13 • 12
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published Apr 11 • 47
Article Hugging Face and Cloudflare Partner to Make Real-Time Speech and Video Seamless with FastRTC By freddyaboulton • Apr 9 • 26
Article The NLP Course is becoming the LLM Course! By burtenshaw and 9 others • Apr 3 • 97
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 92