Voice Activity Detection
ONNX
speech-processing
semantic-vad
multilingual

Smart Turn v3

Smart Turn v3 is an open-source semantic Voice Activity Detection (VAD) model. It predicts whether a speaker has finished their turn by analysing the raw audio waveform rather than a transcript.

Model architecture

  • Backbone: Whisper Tiny encoder
  • Head: shallow linear classifier
  • Parameters: 8 M (int8 quantized)
  • Checkpoint: 8 MB ONNX
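
The parameter count and checkpoint size listed above are consistent with each other: at int8 precision each weight occupies one byte, so roughly 8 M parameters come to roughly 8 MB. A minimal sketch of that arithmetic follows. The function name is illustrative, and the float32 figure is a hypothetical comparison added for contrast, not a number from this card.

```python
def checkpoint_size_mb(num_params: int, bytes_per_param: int) -> float:
    """Approximate weight storage in MB (1 MB = 10**6 bytes), ignoring file-format overhead."""
    return num_params * bytes_per_param / 1_000_000


NUM_PARAMS = 8_000_000  # "8 M" from the list above (approximate)

int8_mb = checkpoint_size_mb(NUM_PARAMS, 1)  # int8: 1 byte per weight
fp32_mb = checkpoint_size_mb(NUM_PARAMS, 4)  # float32: 4 bytes per weight (hypothetical comparison)

print(f"int8: {int8_mb:.0f} MB, float32: {fp32_mb:.0f} MB")
```

Under these assumptions, the same weights stored as float32 would take about four times the space.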

How to use

Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
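
Because the model works on the raw waveform, a caller will usually need to hand it a fixed-length audio window. The sketch below shows one plausible way to prepare such a window: keep the most recent audio and left-pad shorter clips with silence. The 16 kHz sample rate, the 8-second window, the left-padding convention, and the helper name `fit_window` are all assumptions made for illustration, not values stated in this card; the GitHub repo is the authority on the actual input requirements.

```python
from typing import List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumption: not stated in this card
WINDOW_SECONDS = 8       # assumption: not stated in this card


def fit_window(samples: Sequence[float],
               sample_rate: int = SAMPLE_RATE_HZ,
               seconds: int = WINDOW_SECONDS) -> List[float]:
    """Hypothetical helper: return exactly sample_rate * seconds samples."""
    target = sample_rate * seconds
    if len(samples) >= target:
        # Keep the most recent audio, since that is where a turn would end.
        return list(samples[len(samples) - target:])
    # Left-pad short clips with silence so the speech sits at the end of the window.
    return [0.0] * (target - len(samples)) + list(samples)


# One second of (dummy) audio becomes a full window with leading silence.
window = fit_window([0.1] * SAMPLE_RATE_HZ)
print(len(window))
```

The resulting window could then be passed through whatever feature extraction and ONNX inference steps the repo describes.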
