Article

ANSD-MA-MSE: Adaptive Neural Speaker Diarization Using Memory-Aware Multi-Speaker Embedding

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2023)

Journal

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING

Volume 31, Pages 1561-1573

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TASLP.2023.3265199

Keywords

Feature extraction; Adaptation models; Task analysis; Neural networks; Adaptive systems; Voice activity detection; Recording; Speaker diarization; neural networks; memory-aware speaker embedding; dictionary learning; attention network; adaptive refinement

Categories

Acoustics; Engineering, Electrical & Electronic

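Summary

This paper proposes a neural speaker diarization (NSD) network architecture built from three key components: a memory-aware multi-speaker embedding mechanism, a speaker selection procedure, and an adaptive refinement procedure. The method outperforms other state-of-the-art techniques in diarization error rate on the AMI corpus and the DIHARD-III evaluation sets.

Abstract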

In this paper, we propose a neural speaker diarization (NSD) network architecture consisting of three key components. First, a memory-aware multi-speaker embedding (MA-MSE) mechanism enables dynamic refinement of speaker embeddings, reducing a potential data mismatch between speaker embedding extraction and the NSD network. Next, a speaker selection procedure handles situations where the detected number of speakers differs from the number of speakers assumed by the NSD network. Finally, an adaptive procedure improves, at each iteration, the required prior information on the non-overlapping speech segments in a given utterance. We call our proposed framework adaptive neural speaker diarization with memory-aware multi-speaker embedding (ANSD-MA-MSE). Our method improves diarization performance in realistic operating scenarios, such as adverse acoustic environments, domain mismatches, and a varying, rather than fixed, number of speakers. Tested on both the AMI corpus and the DIHARD-III evaluation sets, our approach consistently outperforms other state-of-the-art techniques in diarization error rate, including the results reported by the best single-model system in the DIHARD-III challenge.
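The abstract reports its results as diarization error rates (DER). As a rough illustration of what that metric measures, the sketch below computes a frame-level DER as (missed speech + false alarm + speaker confusion) divided by total reference speaker time. It is not the scoring setup used in the paper: the function name `frame_der`, the per-frame set representation, and the toy labels are all assumptions made for illustration, and the sketch assumes hypothesis speaker labels have already been mapped to reference labels and that no forgiveness collar is applied.

```python
from typing import List, Set


def frame_der(reference: List[Set[str]], hypothesis: List[Set[str]]) -> float:
    """Frame-level DER for two equally long sequences of active-speaker sets.

    Assumption: hypothesis labels are already mapped to reference labels.
    """
    if len(reference) != len(hypothesis):
        raise ValueError("reference and hypothesis must have the same number of frames")

    missed = false_alarm = confusion = total = 0
    for ref, hyp in zip(reference, hypothesis):
        n_ref, n_hyp = len(ref), len(hyp)
        n_correct = len(ref & hyp)                   # speakers correctly attributed
        missed += max(0, n_ref - n_hyp)              # reference speakers left undetected
        false_alarm += max(0, n_hyp - n_ref)         # extra speakers in the hypothesis
        confusion += min(n_ref, n_hyp) - n_correct   # detected, but wrong identity
        total += n_ref                               # total reference speaker frames

    if total == 0:
        raise ValueError("reference contains no speech frames")
    return (missed + false_alarm + confusion) / total


# Toy example (hypothetical labels): five frames, including one overlap frame.
reference = [{"A"}, {"A"}, {"A", "B"}, {"B"}, set()]
hypothesis = [{"A"}, {"B"}, {"A"}, {"B"}, {"B"}]
der = frame_der(reference, hypothesis)
print(f"DER = {der:.2f}")
```

In the toy example, frame 2 contributes one confusion error, frame 3 one missed speaker in the overlap region, and frame 5 one false alarm, over five reference speaker frames in total. Overlap frames count once per active reference speaker, which is why handling overlapped speech matters for this metric.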
