---
pipeline_tag: voice-activity-detection
license: bsd-2-clause
tags:
  - speech-processing
  - semantic-vad
  - multilingual
datasets:
  - pipecat-ai/smart-turn-data-v3-train
  - pipecat-ai/smart-turn-data-v3-test
---

Smart Turn v3

Smart Turn v3 is an open‑source semantic Voice Activity Detection (VAD) model that tells you whether a speaker has finished their turn by analysing the raw waveform, not the transcript.

Links

Model architecture

  • Backbone: Whisper Tiny encoder
  • Head: shallow linear classifier
  • Params: 8 M (int8)
  • Checkpoint: 8 MB ONNX
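
The parameter count and checkpoint size listed above agree with each other: at int8 precision each parameter occupies one byte, so roughly 8 M parameters come to roughly 8 MB of weights. A quick back-of-the-envelope check follows; the helper name is illustrative, and the estimate ignores graph metadata and any tensors stored at higher precision.

```python
def weight_size_mb(num_params: int, bits_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    return num_params * bits_per_param / 8 / 1_000_000


print(weight_size_mb(8_000_000, 8))   # int8    -> 8.0
print(weight_size_mb(8_000_000, 32))  # float32 -> 32.0, for comparison
```

The same parameter count stored as float32 would need about four times the space.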

How to use

Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
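
For a first standalone look at the checkpoint, a minimal sketch with ONNX Runtime might look like the following. It assumes the model has been downloaded locally as `smart-turn-v3.onnx` (a hypothetical filename) and deliberately assumes nothing about input names or shapes: it only lists what the graph declares, so that preprocessing can be matched to it. The blog post and GitHub repo remain the reference for audio preprocessing and for interpreting the output.

```python
from typing import List, Sequence, Tuple


def format_node(name: str, dtype: str, shape: Sequence) -> str:
    """Render one graph input or output as 'name: dtype [shape]'."""
    return f"{name}: {dtype} {list(shape)}"


def describe_io(model_path: str) -> Tuple[List[str], List[str]]:
    """Return descriptions of the inputs and outputs declared by an ONNX model."""
    # Imported lazily so this module can be imported without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inputs = [format_node(n.name, n.type, n.shape) for n in session.get_inputs()]
    outputs = [format_node(n.name, n.type, n.shape) for n in session.get_outputs()]
    return inputs, outputs
```

Calling `describe_io("smart-turn-v3.onnx")` returns two lists of strings, one for inputs and one for outputs, which show the tensor names, element types, and shapes the model expects before any audio is fed to it.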