SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics Paper • 2506.01844 • Published 8 days ago • 86
EmergentTTS-Eval: Evaluating TTS Models on Complex Prosodic, Expressiveness, and Linguistic Challenges Using Model-as-a-Judge Paper • 2505.23009 • Published 13 days ago • 17
OWSM v4: Improving Open Whisper-Style Speech Models via Data Scaling and Cleaning Paper • 2506.00338 • Published 11 days ago • 8
MaziyarPanahi/Llama-Nemotron-Post-Training-Dataset-v1-ShareGPT Viewer • Updated 9 days ago • 30.2M • 618 • 39
Article nanoVLM: The simplest repository to train your VLM in pure PyTorch By ariG23498 and 6 others • 21 days ago • 145
Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers Paper • 2505.21497 • Published 14 days ago • 96