metadata
pipeline_tag: voice-activity-detection
license: bsd-2-clause
tags:
- speech-processing
- semantic-vad
- multilingual
datasets:
- pipecat-ai/smart-turn-data-v3-train
- pipecat-ai/smart-turn-data-v3-test
Smart Turn v3
Smart Turn v3 is an open‑source semantic Voice Activity Detection (VAD) model. It predicts whether a speaker has finished their turn directly from the raw audio waveform, without relying on a transcript.
Links
- Blog post: Smart Turn v3
- GitHub repo with training and inference code
- Datasets: training set (pipecat-ai/smart-turn-data-v3-train) and test set (pipecat-ai/smart-turn-data-v3-test)
Model architecture
- Backbone: Whisper Tiny encoder
- Head: shallow linear classifier
- Parameters: 8M, quantized to int8
- Checkpoint: 8 MB ONNX
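The parameter count and checkpoint size listed above are consistent with each other: int8 weights take one byte each, so roughly 8M parameters come to about 8 MB. The short calculation below illustrates this. The helper name `checkpoint_size_mb` is made up for this example, and the estimate ignores any ONNX graph metadata stored alongside the weights.

```python
def checkpoint_size_mb(num_params: int, bits_per_param: int) -> float:
    """Estimate the weight storage in megabytes (1 MB = 1,000,000 bytes)."""
    return num_params * bits_per_param / 8 / 1_000_000


# 8M parameters stored as int8 (8 bits each)
print(checkpoint_size_mb(8_000_000, 8))   # 8.0

# The same parameter count stored as float32, for comparison
print(checkpoint_size_mb(8_000_000, 32))  # 32.0
```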
How to use
Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
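As a rough illustration of where an end-of-turn model sits in an application, the sketch below wraps a prediction call in a simple decision function. It is not the inference code from the GitHub repo. The 16 kHz sample rate, the 8 second window, the 0.5 threshold, and the `predict` callable (a stand-in for whatever runs the ONNX checkpoint and returns a completion probability) are all assumptions made for this example and are not stated in this card.

```python
from typing import Callable, List, Sequence

SAMPLE_RATE_HZ = 16_000  # assumed input sample rate (not stated in this card)
MAX_WINDOW_S = 8         # assumed maximum audio context (not stated in this card)


def last_window(
    samples: Sequence[float],
    sample_rate: int = SAMPLE_RATE_HZ,
    max_seconds: float = MAX_WINDOW_S,
) -> List[float]:
    """Keep only the most recent `max_seconds` of audio."""
    limit = int(sample_rate * max_seconds)
    return list(samples[-limit:]) if limit > 0 else []


def turn_is_complete(
    samples: Sequence[float],
    predict: Callable[[List[float]], float],
    threshold: float = 0.5,
) -> bool:
    """Return True when the (hypothetical) model call reports a finished turn.

    `predict` stands in for real model inference and must return a
    probability in [0, 1] that the speaker has finished.
    """
    probability = predict(last_window(samples))
    return probability >= threshold
```

In a voice pipeline, a function like `turn_is_complete` would typically be called once a conventional silence-based VAD detects a pause, so that the model decides whether the pause is the end of the turn or just a hesitation.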