---
license: cc-by-nc-4.0
---
# ECAPA2 Speaker Embedding Extractor

ECAPA2 is a hybrid neural network architecture and training strategy for speaker recognition. The provided model is pre-trained and has an easy-to-use API to extract speaker embeddings.
## Model Details

## Usage Guide

### Download model

You need to install the huggingface_hub package to download the ECAPA2 model:
```bash
pip install --upgrade huggingface_hub
```

Or with Conda:

```bash
conda install -c conda-forge huggingface_hub
```
Now you can download the model by executing the following code:

```python
import torch
from huggingface_hub import hf_hub_download

# Download the model checkpoint (cached locally after the first call).
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='model.pt')
model = torch.jit.load(model_file, map_location='cpu')
```

Subsequent calls will load the previously downloaded model automatically.
### Speaker Embedding Extraction

Extracting speaker embeddings is easy and only requires a few lines of code:

```python
import torch
import torchaudio

# torchaudio.load returns both the waveform and its sample rate.
audio, sample_rate = torchaudio.load('sample.wav')
embedding = model.extract_embedding(audio)
```
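Extracted embeddings are typically compared with cosine similarity for speaker verification. A minimal sketch, using random tensors in place of real ECAPA2 embeddings (the 192-dimensional size is an assumption for illustration, not taken from the model card):

```python
import torch
import torch.nn.functional as F

# Stand-ins for two embeddings from model.extract_embedding();
# the dimensionality (192) is an assumption, not from the model card.
emb_a = torch.randn(1, 192)
emb_b = torch.randn(1, 192)

# Cosine similarity lies in [-1, 1]; higher scores suggest the same speaker.
score = F.cosine_similarity(emb_a, emb_b).item()

# Comparing an embedding with itself yields a score of 1.0.
self_score = F.cosine_similarity(emb_a, emb_a).item()
```

In practice, a decision threshold on this score is tuned on a held-out verification set.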
### Hierarchical Feature Extraction

For the extraction of other hierarchical features, a separate model function is provided:

```python
feature = model.extract_feature(label='gfe1', type='mean')
```
The following table describes the available features:

| Feature Type | Description | Usage | Labels |
|---|---|---|---|
| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | lfe1, lfe2, lfe3, lfe4 |
| Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally capture intra-speaker variance better than speaker embeddings, e.g. speaker profiling, emotion recognition. | gfe1, gfe2, gfe3, pool |
| Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks directly depending on the speaker identity (as opposed to speaker characteristics), e.g. speaker verification, speaker diarization. | embedding |
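The `type='mean'` argument suggests that frame-level features are averaged over time into a single utterance-level vector. The pooling step itself can be sketched on a dummy frame-level tensor (the shapes below are illustrative assumptions, not taken from the model):

```python
import torch

# Dummy frame-level feature map: (channels, time frames).
# Both sizes are illustrative, not taken from the ECAPA2 model.
frames = torch.randn(1024, 200)

# Mean pooling collapses the time axis into one utterance-level vector.
mean_feature = frames.mean(dim=1)
```

The resulting vector has one value per channel, independent of the utterance length.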
## Results
[More Information Needed]
## Citation

BibTeX:

```bibtex
@INPROCEEDINGS{xxxxx,
  author={Jenthe Thienpondt and Kris Demuynck},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title={ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings},
  year={2023}
}
```
APA:
[More Information Needed]
## Contact

Name: Jenthe Thienpondt
E-mail: [email protected]