Papers
arxiv:2104.01541

Attention Back-end for Automatic Speaker Verification with Multiple Enrollment Utterances

Published on Apr 4, 2021
Authors:
,
,
,
,

Abstract

Probabilistic linear discriminant analysis (PLDA) or cosine similarity have been widely used in traditional speaker verification systems as back-end techniques to measure pairwise similarities. To make better use of multiple enrollment utterances, we propose a novel attention back-end model, which can be used for both text-independent (TI) and text-dependent (TD) speaker verification, and employ scaled-dot self-attention and feed-forward self-attention networks as architectures that learn the intra-relationships of the enrollment utterances. In order to verify the proposed attention back-end, we conduct a series of experiments on CNCeleb and VoxCeleb datasets by combining it with several sate-of-the-art speaker encoders including TDNN and ResNet. Experimental results using multiple enrollment utterances on CNCeleb show that the proposed attention back-end model leads to lower EER and minDCF score than the PLDA and cosine similarity counterparts for each speaker encoder and an experiment on VoxCeleb indicate that our model can be used even for single enrollment case.

Community

Your need to confirm your account before you can post a new comment.

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2104.01541 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2104.01541 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2104.01541 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.