LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis Paper • 2505.02625 • Published 1 day ago • 10
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play Paper • 2505.02707 • Published 1 day ago • 56
Training and Inference Efficiency of Encoder-Decoder Speech Models Paper • 2503.05931 • Published Mar 7 • 3
LLMVoX: Autoregressive Streaming Text-to-Speech Model for Any LLM Paper • 2503.04724 • Published Mar 6 • 70
Article LLM Inference on Edge: A Fun and Easy Guide to run LLMs via React Native on your Phone! Mar 7 • 53
OWLS: Scaling Laws for Speech Recognition and Translation Collection 🦉 A suite of Whisper-style models from 250M to 18B parameters. Trained on up to 360K hours of data. 16 kHz sampling rate. • 8 items • Updated 4 days ago • 5
Open Whisper-style Speech Models (OWSM) Collection Fully open Whisper-style speech foundation models developed by CMU WAVLab: https://www.wavlab.org/activities/2024/owsm/ • 15 items • Updated Feb 6 • 5
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19 • 70
Article Introducing Three New Serverless Inference Providers: Hyperbolic, Nebius AI Studio, and Novita 🔥 Feb 18 • 99