metadata

pipeline_tag: voice-activity-detection
license: bsd-2-clause
tags:
  - speech-processing
  - semantic-vad
  - multilingual
datasets:
  - pipecat-ai/smart-turn-data-v3-train
  - pipecat-ai/smart-turn-data-v3-test

Smart Turn v3

Smart Turn v3 is an open-source semantic Voice Activity Detection (VAD) model. It predicts whether a speaker has finished their turn directly from the raw audio waveform, rather than from a transcript.

Model architecture

Backbone: Whisper Tiny encoder
Head: shallow linear classifier
Params: 8 M (int8)
Checkpoint: 8 MB ONNX
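
The parameter count and checkpoint size above are consistent with each other: int8 stores one byte per weight, so roughly 8 M parameters come to roughly 8 MB. The short calculation below illustrates this. The parameter count is the rounded figure from the list, and the float32 line is only a comparison point, not a statement about any published checkpoint. A real ONNX file also carries graph metadata and quantization scales, so treat the result as an estimate.

```python
# Back-of-the-envelope weight size for a quantized model.
PARAMS = 8_000_000  # "8 M" from the list above (rounded figure)


def weights_size_mb(params, bytes_per_param):
    """Size of the raw weights in megabytes (1 MB = 1,000,000 bytes)."""
    return params * bytes_per_param / 1_000_000


int8_mb = weights_size_mb(PARAMS, 1)  # int8: 1 byte per parameter
fp32_mb = weights_size_mb(PARAMS, 4)  # float32: 4 bytes per parameter, for comparison

print(f"int8 weights:    ~{int8_mb:.0f} MB")
print(f"float32 weights: ~{fp32_mb:.0f} MB")
```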

How to use

Please see the blog post and GitHub repo for more information on using the model, either standalone or with Pipecat.
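
As a starting point before reading those resources, the sketch below shows one way to inspect a downloaded ONNX checkpoint with ONNX Runtime and to turn a model score into a yes/no decision. It makes several assumptions that this README does not confirm: the file path passed to `describe_onnx_model` is whatever you saved the checkpoint as, the model output is assumed to be a turn-completion probability between 0 and 1, and the 0.5 threshold and both helper names are hypothetical. The required audio preprocessing is not shown here; follow the GitHub repo for that.

```python
def describe_onnx_model(model_path):
    """Return (name, shape, dtype) tuples for the model's inputs and outputs.

    Reading these from the file avoids guessing tensor names or shapes.
    """
    # Imported lazily so is_turn_complete() works without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    inputs = [(node.name, node.shape, node.type) for node in session.get_inputs()]
    outputs = [(node.name, node.shape, node.type) for node in session.get_outputs()]
    return inputs, outputs


def is_turn_complete(probability, threshold=0.5):
    """Decide whether the speaker has finished their turn.

    Assumption: `probability` is a completion score in [0, 1].
    The default threshold is illustrative, not a recommended value.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    return probability >= threshold
```

Calling `describe_onnx_model` on your local checkpoint path lists the tensors the model expects, and `is_turn_complete(0.9)` shows how a score above the threshold would be treated as the end of a turn.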
