Smart Turn v3
Smart Turn v3 is an open-source semantic Voice Activity Detection (VAD) model. It analyses the raw audio waveform, rather than a transcript, to determine whether a speaker has finished their turn.
Links
- Blog post: Smart Turn v3
- GitHub repo with training and inference code
- Datasets used for training
Model architecture
- Backbone: Whisper Tiny encoder
- Head: shallow linear classifier
- Params: 8 M (int8)
- Checkpoint: 8 MB ONNX
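The parameter count and checkpoint size above are consistent with each other: at int8, each weight takes one byte, so roughly 8 M parameters come to roughly 8 MB. The short sketch below works through that arithmetic. The helper name `checkpoint_size_mb` and the float32/float16 rows are illustrative only and are not taken from the model card; the estimate also ignores any file-format overhead.

```python
# Rough checkpoint-size estimate from parameter count and bytes per weight.
# The dtype table and helper name are illustrative, not part of Smart Turn.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def checkpoint_size_mb(num_params: int, dtype: str) -> float:
    """Approximate weight storage in MB (10^6 bytes), ignoring file overhead."""
    return num_params * BYTES_PER_PARAM[dtype] / 1_000_000


params = 8_000_000  # "8 M" from the model card
for dtype in ("float32", "float16", "int8"):
    print(f"{dtype}: ~{checkpoint_size_mb(params, dtype):.0f} MB")
# float32: ~32 MB
# float16: ~16 MB
# int8: ~8 MB
```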
How to use
Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
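As a rough orientation before reading the repo, the sketch below shows the kind of glue code that typically surrounds a turn-detection model: fixing the audio to a constant-length window and turning the model's output into a yes/no decision. It does not call the model. The sample rate, window length, left-padding choice, threshold, and both function names are assumptions made for illustration; the GitHub repo is the authority on the actual input format and inference API.

```python
from typing import List

# All three constants are assumptions for illustration; check the repo
# for the values the model actually expects.
SAMPLE_RATE_HZ = 16_000
WINDOW_SECONDS = 8
THRESHOLD = 0.5


def prepare_window(
    samples: List[float],
    sample_rate: int = SAMPLE_RATE_HZ,
    window_seconds: int = WINDOW_SECONDS,
) -> List[float]:
    """Keep the most recent audio and left-pad with silence to a fixed length."""
    target = sample_rate * window_seconds
    tail = samples[-target:]                 # most recent `target` samples
    padding = [0.0] * (target - len(tail))   # silence in front if too short
    return padding + tail


def is_turn_complete(probability: float, threshold: float = THRESHOLD) -> bool:
    """Map a (hypothetical) completion probability to a boolean decision."""
    return probability >= threshold


# One second of audio is padded up to the full window length.
window = prepare_window([0.1] * SAMPLE_RATE_HZ)
print(len(window))             # 128000
print(is_turn_complete(0.83))  # True
```

Keeping the end of the audio rather than the start reflects the idea that the cues for turn completion sit in the most recent speech; whether the model pads at the front or the back is something to confirm against the inference code.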