Article

Blind Speech Separation and Dereverberation using neural beamforming

Journal

SPEECH COMMUNICATION
Volume 140, Pages 29-41

Publisher

ELSEVIER
DOI: 10.1016/j.specom.2022.03.004

Keywords

Multi-channel speaker separation; Beamforming; Dereverberation; Speaker identification; Triplet mining

Funding

  1. Austrian Science Fund (FWF) [P27803-N15]
  2. Austrian Science Fund (FWF) [P27803]

The paper introduces the BSSD network, which achieves speaker separation, dereverberation, and speaker identification simultaneously. Various techniques like predefined spatial cues, neural beamforming, embedding vectors, and triplet mining are utilized for these tasks. The system is evaluated based on SI-SDR, WER, and EER metrics.
In this paper, we present the Blind Speech Separation and Dereverberation (BSSD) network, which performs simultaneous speaker separation, dereverberation and speaker identification in a single neural network. Speaker separation is guided by a set of predefined spatial cues. Dereverberation is performed by using neural beamforming, and speaker identification is aided by embedding vectors and triplet mining. We introduce a frequency-domain model which uses complex-valued neural networks, and a time-domain variant which performs beamforming in latent space. Further, we propose a block-online mode to process longer audio recordings, as they occur in meeting scenarios. We evaluate our system in terms of Scale Independent Signal to Distortion Ratio (SI-SDR), Word Error Rate (WER) and Equal Error Rate (EER).
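For readers unfamiliar with the first evaluation metric, the following is a minimal sketch of how SI-SDR is commonly computed for a single-channel estimate against a clean reference. It is not taken from the paper: the function name `si_sdr`, the mean removal step, and the stabilizing constant `eps` are assumptions made for illustration, and the authors' evaluation code may differ in detail.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Return the SI-SDR in dB between two 1-D signals of equal length.

    Illustrative sketch only; `eps` is an assumed constant that guards
    against division by zero and log of zero.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Remove the mean so that a DC offset does not influence the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Project the estimate onto the reference to obtain the scaled target.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference

    # Everything in the estimate that is not explained by the target.
    residual = estimate - target

    ratio = (np.dot(target, target) + eps) / (np.dot(residual, residual) + eps)
    return 10.0 * np.log10(ratio)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.standard_normal(16000)                   # hypothetical 1 s signal at 16 kHz
    noisy = clean + 0.1 * rng.standard_normal(16000)     # additive noise
    print(f"SI-SDR: {si_sdr(noisy, clean):.2f} dB")
    print(f"SI-SDR after rescaling: {si_sdr(3.0 * noisy, clean):.2f} dB")
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score essentially unchanged, which is the property the metric's name refers to.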
