Article

Generalization Ability Improvement of Speaker Representation and Anti-Interference for Speaker Verification

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Journal

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume 31, Pages 486-499

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TASLP.2022.3221042

Keywords

Speaker verification; parent embedding learning; partial adaptive score normalization

Automated Summary

In this paper, two novel approaches, parent embedding learning (PEL) and partial adaptive score normalization (PAS-Norm), are proposed to improve the generalization ability of speaker verification and to reduce interference from other speakers. Experimental results show that both methods significantly improve system performance.

Abstract

The ability to generalize to mismatches between training and testing conditions and to resist interference from other speakers is crucial to the performance of speaker verification. In this paper, we propose two novel approaches: one improves generalization to recording scenarios and languages that are mismatched in the test conditions, and the other reduces the influence of interference from other speakers on the similarity measurement between two speaker embeddings. First, parent embedding learning (PEL) is used for model training; it exploits the generalization ability of the shared structure to improve the representation of speaker embeddings. Second, partial adaptive score normalization (PAS-Norm) is used to reduce the influence of interference from other speakers on embedding-based similarity measures. In the experiments, the speaker embedding models are trained on the VoxCeleb2 dataset, and performance is evaluated under different conditions on four other datasets: VoxCeleb1, Librispeech, SITW, and CN-Celeb. On VoxCeleb1, in evaluations with a large number of verification speakers and with identity restrictions, the proposed PEL-based system reduces the equal error rate (EER) by 6.0% and 4.9%, respectively, compared with the state-of-the-art (SOTA) system. Furthermore, in experiments evaluating speaker verification under mismatched conditions on SITW and CN-Celeb, the proposed PEL-based system also outperforms the SOTA system; under language-mismatched conditions, the EER is reduced by 8.3%. In the evaluation of interference from other speakers, replacing the baseline AS-Norm score normalization method with PAS-Norm significantly reduces the EER, by 24.4%.
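The abstract reports its results as equal error rates (EER) and compares PAS-Norm against the AS-Norm baseline. The following is a minimal NumPy sketch of cosine scoring between speaker embeddings, standard AS-Norm, and EER computation, assuming cosine similarity as the embedding-based score and a top-k cohort of 300. The function names, the `top_k` default, and the random embeddings in the usage section are illustrative assumptions. It implements only the baseline AS-Norm, not the paper's PAS-Norm, whose partial cohort handling is not described here.

```python
import numpy as np


def cosine_scores(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of `a` and every row of `b`."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return a @ b.T


def as_norm(score: float, enroll: np.ndarray, test: np.ndarray,
            cohort: np.ndarray, top_k: int = 300) -> float:
    """Baseline adaptive symmetric score normalization (AS-Norm), not PAS-Norm.

    Each side of the trial is normalized with the mean and standard deviation
    of its top-k cohort scores; the two normalized scores are then averaged.
    `top_k=300` is an assumed value, not one taken from the paper.
    """
    e_cohort = cosine_scores(enroll[None, :], cohort)[0]
    t_cohort = cosine_scores(test[None, :], cohort)[0]
    k = min(top_k, len(cohort))
    # Keep only the k cohort embeddings most similar to each side.
    e_top = np.sort(e_cohort)[-k:]
    t_top = np.sort(t_cohort)[-k:]
    z = (score - e_top.mean()) / e_top.std()
    t = (score - t_top.mean()) / t_top.std()
    return float(0.5 * (z + t))


def equal_error_rate(scores, labels) -> float:
    """EER: the threshold where false-accept and false-reject rates meet."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)  # True = target (same speaker)
    best_gap, eer = np.inf, 1.0
    for thr in np.unique(scores):
        far = np.mean(scores[~labels] >= thr)  # impostor trials accepted
        frr = np.mean(scores[labels] < thr)    # target trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return float(eer)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cohort = rng.normal(size=(1000, 192))  # hypothetical cohort embeddings
    enroll, test = rng.normal(size=192), rng.normal(size=192)
    raw = cosine_scores(enroll[None, :], test[None, :])[0, 0]
    print("raw:", raw, "AS-Norm:", as_norm(raw, enroll, test, cohort))
    print("EER:", equal_error_rate([0.9, 0.4, 0.6, 0.1], [1, 1, 0, 0]))  # 0.5
```

Keeping only the top-k cohort scores per side is what makes the normalization adaptive: the statistics come from the cohort speakers closest to each side of the trial, and averaging the two normalized scores keeps the result symmetric in enrollment and test.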