Identify who speaks when with pyannote

💚 Simply detect, segment, label, and separate speakers in any language

Github Hugging Face Discord LinkedIn X
Playground Documentation

pyannoteAI helps developers understand speakers and conversation context. We focus on identifying speakers and conversation metadata under conditions that reflect real conversations rather than controlled recordings.

🎤 What is speaker diarization?

Speaker diarization is the process of automatically partitioning the audio recording of a conversation into segments and labeling them by speaker, answering the question "who spoke when?". As the foundational layer of conversational AI, speaker diarization provides high-level insights for human-human and human-machine conversations, and unlocks a wide range of downstream applications: meeting transcription, call center analytics, voice agents, video dubbing.
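
Concretely, a diarization result boils down to a list of speaker-labeled time segments, possibly overlapping. A toy illustration with hypothetical data (not actual pyannote output; see the real pipeline below):

# A diarization result is a list of speaker-labeled time segments.
# Hypothetical data for illustration; overlapping speech is allowed.
segments = [
    (0.0, 3.2, "SPEAKER_00"),   # (start in seconds, end in seconds, speaker)
    (3.2, 7.8, "SPEAKER_01"),
    (7.5, 9.1, "SPEAKER_00"),   # overlaps with SPEAKER_01's turn
]

for start, end, speaker in segments:
    print(f"{speaker} speaks between t={start}s and t={end}s")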

▶️ Getting started

Install the latest pyannote.audio release with either uv (recommended) or pip:

$ uv add pyannote.audio
$ pip install pyannote.audio

Enjoy state-of-the-art speaker diarization:

# download the pretrained pipeline from Hugging Face
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-community-1', token="HUGGINGFACE_TOKEN")

# perform speaker diarization locally
output = pipeline('/path/to/audio.wav')

# print who speaks when
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")

Read the community-1 model card to make the most of it.
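
If you already know how many people are speaking, you can constrain the pipeline at call time. A short sketch, assuming community-1 keeps the num_speakers / min_speakers / max_speakers call options used by previous pyannote.audio diarization pipelines:

# force an exact number of speakers when it is known a priori
output = pipeline('/path/to/audio.wav', num_speakers=2)

# ... or just bound it
output = pipeline('/path/to/audio.wav', min_speakers=2, max_speakers=5)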

πŸ† State-of-the-art models

The pyannoteAI research team trains cutting-edge speaker diarization models, thanks to the Jean Zay 🇫🇷 supercomputer managed by GENCI 💚. They come in two flavors:

  • pyannote.audio open models, available on Hugging Face and used by 140k+ developers around the world;
  • premium models, available on the pyannoteAI cloud (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.
| Benchmark (last updated 2025-09) | legacy (3.1) | community-1 | precision-2 |
|----------------------------------|-------------:|------------:|------------:|
| AISHELL-4                        | 12.2         | 11.7        | 11.4 🏆     |
| AliMeeting (channel 1)           | 24.5         | 20.3        | 15.2 🏆     |
| AMI (IHM)                        | 18.8         | 17.0        | 12.9 🏆     |
| AMI (SDM)                        | 22.7         | 19.9        | 15.6 🏆     |
| AVA-AVD                          | 49.7         | 44.6        | 37.1 🏆     |
| CALLHOME (part 2)                | 28.5         | 26.7        | 16.6 🏆     |
| DIHARD 3 (full)                  | 21.4         | 20.2        | 14.7 🏆     |
| Ego4D (dev.)                     | 51.2         | 46.8        | 39.0 🏆     |
| MSDWild                          | 25.4         | 22.8        | 17.3 🏆     |
| RAMC                             | 22.2         | 20.8        | 10.5 🏆     |
| REPERE (phase 2)                 | 7.9          | 8.9         | 7.4 🏆      |
| VoxConverse (v0.3)               | 11.2         | 11.2        | 8.5 🏆      |

Diarization error rate (in %; the lower, the better)

Our models achieve competitive performance across multiple public diarization datasets. Explore the pyannoteAI performance benchmark ➡️ https://www.pyannote.ai/benchmark
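
Diarization error rate (DER) adds up missed speech, false alarms, and speaker confusion, divided by the total duration of reference speech. You can compute it yourself with the companion pyannote.metrics package; a minimal sketch with toy annotations:

# pip install pyannote.metrics
from pyannote.core import Annotation, Segment
from pyannote.metrics.diarization import DiarizationErrorRate

# toy ground truth and toy system output
reference = Annotation()
reference[Segment(0.0, 5.0)] = "alice"
reference[Segment(5.0, 9.0)] = "bob"

hypothesis = Annotation()
hypothesis[Segment(0.0, 4.5)] = "SPEAKER_00"
hypothesis[Segment(4.5, 9.0)] = "SPEAKER_01"

# DER = (missed detection + false alarm + confusion) / total reference speech
metric = DiarizationErrorRate()
print(f"DER = {metric(reference, hypothesis):.1%}")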

⏩️ Going further, better, and faster

The precision-2 premium model further improves accuracy and processing speed, and brings additional features.

| Features                                           | community-1 | precision-2 |
|----------------------------------------------------|:-----------:|:-----------:|
| Set exact/min/max number of speakers               | ✅          | ✅          |
| Exclusive speaker diarization (for transcription)  | ✅          | ✅          |
| Segmentation confidence scores                     | ❌          | ✅          |
| Speaker confidence scores                          | ❌          | ✅          |
| Voiceprinting                                      | ❌          | ✅          |
| Speaker identification                             | ❌          | ✅          |
| STT orchestration                                  | ❌          | ✅          |
| Time to process 1h of audio (on H100)              | 37s         | 14s         |
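
Exclusive speaker diarization remaps overlapping speech so that at most one speaker is active at any instant, which makes it unambiguous to attach word timestamps from a transcript to a speaker. A hedged sketch, assuming the pipeline output exposes it as an exclusive_speaker_diarization attribute next to speaker_diarization (check the model card for the exact name):

# exclusive diarization: at most one active speaker per instant,
# convenient for attributing transcript words to speakers
# (attribute name assumed; verify against the model card)
for turn, speaker in output.exclusive_speaker_diarization:
    print(f"{speaker} speaks between t={turn.start}s and t={turn.end}s")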

Create a pyannoteAI account, change one line of code, and enjoy free cloud credits to try precision-2 premium diarization:

# perform premium speaker diarization on the pyannoteAI cloud
# (same code as before, with a different checkpoint and your pyannoteAI API key)
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
better_output = pipeline('/path/to/audio.wav')

🔌 Get speaker-attributed transcripts

We host open-source transcription models such as NVIDIA parakeet-tdt-0.6b-v3 and OpenAI whisper-large-v3-turbo, with specialized STT + diarization reconciliation logic that produces speaker-attributed transcripts.

STT orchestration combines pyannoteAI precision-2 diarization with transcription services. Instead of running diarization and transcription separately and then reconciling the outputs manually, you make one API call and receive speaker-attributed transcripts.
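
For intuition about what this replaces, the manual reconciliation step boils down to assigning each transcribed word to the speaker whose diarization turn overlaps it most. A toy sketch with hypothetical data, not the pyannoteAI implementation:

# manual STT + diarization reconciliation, reduced to its simplest form
# (toy data; the orchestration below does this for you, server-side)
turns = [(0.0, 4.0, "SPEAKER_00"), (4.0, 9.0, "SPEAKER_01")]  # diarization
words = [(0.2, 0.5, "hello"), (3.9, 4.3, "there"), (5.0, 5.4, "hi")]  # STT

def overlap(start, end, turn):
    return max(0.0, min(end, turn[1]) - max(start, turn[0]))

for start, end, text in words:
    speaker = max(turns, key=lambda turn: overlap(start, end, turn))[2]
    print(f"{speaker}: {text}")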

To use this feature, make a request to the diarize API endpoint with the transcription:true flag.

# pip install pyannoteai-sdk

from pyannoteai.sdk import Client
client = Client("your-api-key")

job_id = client.diarize(
    "[https://www.example/audio.wav](https://www.example/audio.wav)",
    transcription=True)

job_output = client.retrieve(job_id)

for word in job_output['output']['wordLevelTranscription']:
    print(word['start'], word['end'], word['speaker'], word['text'])

for turn in job_output['output']['turnLevelTranscription']:
    print(turn['start'], turn['end'], turn['speaker'], turn['text'])
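
Turn-level output maps directly onto captioning formats. A small sketch that writes it out as an SRT subtitle file, assuming start and end are seconds as in the snippet above:

# write speaker-attributed turns as a .srt subtitle file
# (assumes 'start'/'end' are in seconds, as printed above)
def srt_timestamp(seconds):
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1_000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

with open("audio.srt", "w") as srt:
    for index, turn in enumerate(job_output['output']['turnLevelTranscription'], start=1):
        srt.write(f"{index}\n")
        srt.write(f"{srt_timestamp(turn['start'])} --> {srt_timestamp(turn['end'])}\n")
        srt.write(f"{turn['speaker']}: {turn['text']}\n\n")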
