4.5 Article

Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TETCI.2020.3014934

Keywords

Deep learning; deep neural network (DNN); drone; ego-noise reduction; microphone array

Funding

  1. U.K. Engineering and Physical Sciences Research Council (EPSRC) [EP/K007491/1]
  2. ARTEMIS-JU
  3. UK Technology Strategy Board (Innovate UK) through the COPCAMS Project [332913]
  4. EPSRC [EP/K007491/1] Funding Source: UKRI

This article introduces a novel approach that uses a deep neural network (DNN) for speech enhancement on drones. It presents three candidate speech enhancement systems and demonstrates that one of them, DNN-TF, suppresses strong ego-noise effectively in environments with a low signal-to-noise ratio.
This article fills the gap between the growing interest in signal processing based on Deep Neural Networks (DNN) and the new application of enhancing speech captured by microphones on a drone. In this context, the quality of the target sound is degraded significantly by the strong ego-noise from the rotating motors and propellers. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate the ideal ratio masks at individual time-frequency bins, which are subsequently used to design three potential speech enhancement systems, namely single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF), and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm, which infers the noise-dominance probabilities at individual time-frequency bins from the DNN-estimated soft masks, and then incorporates them into a time-frequency spatial filtering framework for ego-noise reduction. By jointly exploiting the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise) and the time-frequency noise-dominance probability, DNN-TF can suppress the ego-noise effectively in scenarios with very low signal-to-noise ratios (e.g. SNR lower than -15 dB), especially when the direction of the target sound is close to that of a source of the ego-noise. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and the state-of-the-art time-frequency spatial filtering.
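
To make the notion of a ratio mask at individual time-frequency bins concrete, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`ideal_ratio_mask`, `apply_mask`, `snr_db`), the exponent `beta`, and the toy power values are all assumptions made for illustration. The mask uses one common definition, (S / (S + N)) ** beta with beta = 0.5, which may differ from the definition used in the paper. In the approach described above a DNN estimates the mask from the noisy recording; here the mask is computed directly from known speech and noise powers, which is only possible with simulated data.

```python
import math


def ideal_ratio_mask(speech_power, noise_power, beta=0.5, eps=1e-12):
    """Return a soft mask in [0, 1] for every time-frequency bin.

    Assumed definition (one common choice): (S / (S + N)) ** beta,
    where S and N are the speech and noise powers in that bin.
    eps only guards against division by zero in empty bins.
    """
    return [
        [(s / (s + n + eps)) ** beta for s, n in zip(s_row, n_row)]
        for s_row, n_row in zip(speech_power, noise_power)
    ]


def apply_mask(mask, mixture_magnitude):
    """Scale each mixture magnitude bin by the corresponding mask value."""
    return [
        [m * x for m, x in zip(m_row, x_row)]
        for m_row, x_row in zip(mask, mixture_magnitude)
    ]


def snr_db(speech_power, noise_power):
    """Global signal-to-noise ratio in dB over all bins."""
    s = sum(sum(row) for row in speech_power)
    n = sum(sum(row) for row in noise_power)
    return 10.0 * math.log10(s / n)


# Hypothetical toy spectrogram powers: 2 frames x 3 frequency bins.
speech = [[1.0, 0.0, 4.0], [0.5, 2.0, 0.0]]
noise = [[1.0, 9.0, 0.0], [0.5, 2.0, 8.0]]

mask = ideal_ratio_mask(speech, noise)
# Bins dominated by noise get a mask near 0, speech-dominated bins near 1.
noise_only_bin = mask[0][1]
speech_only_bin = mask[0][2]
overall_snr = snr_db(speech, noise)
```

Because the mask is soft, bins where speech and ego-noise overlap are attenuated rather than discarded. For scale, an SNR of -15 dB means the noise power is 10 ** 1.5, roughly 31.6 times, the speech power.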
