Proceedings Paper

Channel Interdependence Enhanced Speaker Embeddings for Far-Field Speaker Verification

Publisher

IEEE
DOI: 10.1109/ISCSLP49672.2021.9362108

Keywords

Far-field speaker verification; speaker embedding; Res2Net; Squeeze-and-excitation; channel-dependent attention

Funding

  1. Huawei Technologies Co., Ltd [YBN2019095008]
  2. National Natural Science Foundation of China (NSFC) [61971371]

This work addresses the challenge of recognizing speakers at a distance with far-field microphones by strengthening the frame-level processing and feature aggregation of x-vector networks with Res2Net blocks, squeeze-and-excitation (SE) units, and a channel-dependent attention mechanism. The proposed CE-Res2Net architecture achieves a relative improvement of about 16% in EER and 17% in minDCF on the VOiCES 2019 Challenge evaluation set.
Recognizing speakers at a distance with far-field microphones is difficult because of environmental noise and reverberation. In this work, we tackle these problems by strengthening the frame-level processing and feature aggregation of x-vector networks. Specifically, we restructure the dilated convolutional layers into Res2Net blocks to generate multi-scale frame-level features. To exploit the relationships between channels, we introduce squeeze-and-excitation (SE) units that rescale the channel activations, and we investigate the best places to put these SE units in the Res2Net blocks. Based on the hypothesis that layers at different depths contain speaker information at different levels of granularity, we introduce multi-block feature aggregation to propagate and aggregate features from various depths. To optimally weight the channels and frames during feature aggregation, we propose a channel-dependent attention mechanism. Combining all of these enhancements leads to a network architecture called channel-interdependence enhanced Res2Net (CE-Res2Net). Results show that the proposed network achieves a relative improvement of about 16% in EER and 17% in minDCF on the VOiCES 2019 Challenge evaluation set.
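
The abstract names four building blocks without giving their equations. The NumPy sketch below illustrates commonly used formulations of each one (the channel-dependent attentive pooling follows the style used in ECAPA-TDNN), not the paper's exact method. All function names, the shapes, the reduction ratio `r`, the `tanh` non-linearity, and the choice to apply SE after the Res2Net block (only one of the placements the paper compares) are assumptions. Random matrices stand in for learned weights, and a 1x1 channel mix followed by ReLU stands in for the dilated convolutions.

```python
import numpy as np


def relu(x):
    return np.maximum(x, 0.0)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def softmax(x, axis):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def res2net_multiscale(x, convs):
    """Res2Net-style hierarchical split of a (C, T) feature map.

    Channels are split into len(convs) + 1 equal groups: y1 = x1,
    y2 = K2(x2), and yi = Ki(xi + y_{i-1}) for the remaining groups, so later
    groups see progressively larger receptive fields.
    """
    groups = np.split(x, len(convs) + 1, axis=0)
    outputs = [groups[0]]  # first group is passed through unchanged
    prev = None
    for group, conv in zip(groups[1:], convs):
        prev = conv(group if prev is None else group + prev)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)


def se_unit(x, w1, b1, w2, b2):
    """Squeeze-and-excitation on a (C, T) map: returns x rescaled per channel."""
    z = x.mean(axis=1)                          # squeeze: average over frames -> (C,)
    s = sigmoid(w2 @ relu(w1 @ z + b1) + b2)    # excite: bottleneck FC layers -> (C,) in (0, 1)
    return x * s[:, None]


def channel_dependent_attentive_pooling(h, W, b, V, k):
    """Attentive statistics pooling with one attention distribution per channel.

    h: (C, T). Scores e[c, t] = V[c] . tanh(W h_t + b) + k[c]; softmax over t
    for each channel; returns [weighted mean, weighted std] of shape (2C,).
    """
    e = V @ np.tanh(W @ h + b[:, None]) + k[:, None]  # (C, T)
    alpha = softmax(e, axis=1)                         # each row sums to 1
    mu = (alpha * h).sum(axis=1)
    var = (alpha * h ** 2).sum(axis=1) - mu ** 2
    sigma = np.sqrt(np.clip(var, 1e-8, None))
    return np.concatenate([mu, sigma])


# Toy forward pass with random weights standing in for learned parameters.
rng = np.random.default_rng(0)
C, T, scale, r, A = 16, 50, 4, 4, 8
width = C // scale
x = rng.standard_normal((C, T))

# Hypothetical stand-in for per-scale dilated convolutions: a 1x1 channel mix + ReLU.
mixes = [0.5 * rng.standard_normal((width, width)) for _ in range(scale - 1)]
convs = [lambda g, m=m: relu(m @ g) for m in mixes]
y = res2net_multiscale(x, convs)

y_se = se_unit(y,
               0.1 * rng.standard_normal((C // r, C)), np.zeros(C // r),
               0.1 * rng.standard_normal((C, C // r)), np.zeros(C))

# Multi-block aggregation: concatenate features from two depths
# (the block input and the SE-rescaled block output) along the channel axis.
agg = np.concatenate([x, y_se], axis=0)  # (2C, T)
D = agg.shape[0]
emb = channel_dependent_attentive_pooling(
    agg,
    0.1 * rng.standard_normal((A, D)), np.zeros(A),
    0.1 * rng.standard_normal((D, A)), np.zeros(D))
print(emb.shape)  # (64,)
```

Giving each channel its own softmax over frames lets channels that carry different kinds of speaker information attend to different frames. With all attention parameters set to zero, the weights become uniform and the pooling reduces to plain mean and standard-deviation statistics pooling.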
