---
license: cc-by-nc-4.0
---

# ECAPA2 Speaker Embedding Extractor

ECAPA2 is a hybrid neural network architecture and training strategy for speaker recognition. The provided model is pre-trained and comes with an easy-to-use API for extracting speaker embeddings.

## Model Details

## Usage Guide

### Download model

You need to install the huggingface_hub package to download the ECAPA2 model:

```bash
pip install --upgrade huggingface_hub
```

Or with Conda:

```bash
conda install -c conda-forge huggingface_hub
```

Now you can download the model by executing the following code:

```python
import torch
from huggingface_hub import hf_hub_download

# download the pre-trained TorchScript model from the Hugging Face Hub
model_file = hf_hub_download(repo_id='Jenthe/ECAPA2', filename='model.pt')
model = torch.jit.load(model_file, map_location='cpu')
```
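If a GPU is available, the TorchScript model can also be loaded there directly; a minimal sketch (fp16 support is an assumption, not confirmed by this card):

```python
# optional: load the model on a GPU instead of the CPU
if torch.cuda.is_available():
    model = torch.jit.load(model_file, map_location='cuda')
    model.half()  # assumption: fp16 inference is supported and faster
```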

Subsequent calls to `hf_hub_download` will reuse the locally cached model file automatically.
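The cache location can be overridden with the standard `hf_hub_download` arguments; a sketch using a project-local directory (the `models/` path is a hypothetical example):

```python
# keep the model file in a project-local cache instead of the
# default ~/.cache/huggingface location
model_file = hf_hub_download(
    repo_id='Jenthe/ECAPA2',
    filename='model.pt',
    cache_dir='models/',  # hypothetical cache directory
)
```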

### Speaker Embedding Extraction

Extracting speaker embeddings is easy and only requires a few lines of code:

```python
import torch
import torchaudio

# torchaudio.load returns a (waveform, sample_rate) tuple; 16 kHz audio is expected
audio, sr = torchaudio.load('sample.wav')
embedding = model.extract_embedding(audio)
```
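A common use of these embeddings is speaker verification by cosine scoring; the following is a minimal sketch, assuming two hypothetical recordings `enroll.wav` and `test.wav` of the speakers to compare:

```python
import torch.nn.functional as F

# hypothetical enrollment and test recordings
enroll_audio, _ = torchaudio.load('enroll.wav')
test_audio, _ = torchaudio.load('test.wav')

enroll_emb = model.extract_embedding(enroll_audio)
test_emb = model.extract_embedding(test_audio)

# cosine similarity as the verification score; the accept/reject
# threshold has to be tuned on a held-out development set
score = F.cosine_similarity(enroll_emb.flatten(), test_emb.flatten(), dim=0)
print(f'similarity: {score.item():.3f}')
```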

### Hierarchical Feature Extraction

For the extraction of other hierarchical features, a separate model function is provided:

```python
# pass the loaded audio together with the desired feature label
feature = model.extract_feature(audio, label='gfe1', type='mean')
```

The following table describes the available features:

| Feature Type | Description | Usage | Labels |
|---|---|---|---|
| Local Feature | Non-uniform effective receptive field in the frequency dimension of each frame-level feature. | Abstract features, probably useful in tasks less related to speaker characteristics. | `lfe1`, `lfe2`, `lfe3`, `lfe4` |
| Global Feature | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Generally capture intra-speaker variance better than speaker embeddings. E.g. speaker profiling, emotion recognition. | `gfe1`, `gfe2`, `gfe3`, `pool` |
| Speaker Embedding | Uniform effective receptive field of each frame-level feature in the frequency dimension. | Best for tasks directly depending on the speaker identity (as opposed to speaker characteristics). E.g. speaker verification, speaker diarization. | `embedding` |
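As a sketch, every label from the table can be extracted with the same call (reusing the `audio` tensor from above; the output shapes depend on the label and are not documented here):

```python
# extract each hierarchical feature listed in the table above
labels = ['lfe1', 'lfe2', 'lfe3', 'lfe4',
          'gfe1', 'gfe2', 'gfe3', 'pool', 'embedding']
for label in labels:
    feature = model.extract_feature(audio, label=label, type='mean')
    print(label, tuple(feature.shape))
```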

## Results

[More Information Needed]

## Citation

BibTeX:

```bibtex
@INPROCEEDINGS{xxxxx,
  author={Jenthe Thienpondt and Kris Demuynck},
  booktitle={2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
  title={ECAPA2: A Hybrid Neural Network Architecture and Training Strategy for Robust Speaker Embeddings},
  year={2023}
}
```

APA:

[More Information Needed]

## Contact

Name: Jenthe Thienpondt

E-mail: [email protected]