Article

Deep Learning Assisted Time-Frequency Processing for Speech Enhancement on Drones

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TETCI.2020.3014934

Keywords

Deep learning; Deep neural network (DNN); drone; ego-noise reduction; microphone array

Funding

  1. U.K. Engineering and Physical Sciences Research Council (EPSRC) [EP/K007491/1] (Funding Source: UKRI)
  2. ARTEMIS-JU
  3. UK Technology Strategy Board (Innovate UK) through the COPCAMS Project [332913]

This article introduces a deep neural network (DNN) approach to speech enhancement on drones. It presents three candidate speech enhancement systems and shows that DNN-TF effectively suppresses strong ego-noise at low signal-to-noise ratios.
This article bridges the growing interest in signal processing based on deep neural networks (DNNs) and a new application: enhancing speech captured by microphones mounted on a drone. In this setting, strong ego-noise from the rotating motors and propellers significantly degrades the quality of the target sound. We present the first work that integrates single-channel and multi-channel DNN-based approaches for speech enhancement on drones. We employ a DNN to estimate ideal ratio masks at individual time-frequency bins. These masks are then used to design three candidate speech enhancement systems: single-channel ego-noise reduction (DNN-S), multi-channel beamforming (DNN-BF), and multi-channel time-frequency spatial filtering (DNN-TF). The main novelty lies in the proposed DNN-TF algorithm. It infers the noise-dominance probability at each time-frequency bin from the DNN-estimated soft masks and incorporates these probabilities into a time-frequency spatial filtering framework for ego-noise reduction. DNN-TF jointly exploits the direction of arrival of the target sound, the time-frequency sparsity of the acoustic signals (speech and ego-noise), and the time-frequency noise-dominance probability. As a result, it can suppress ego-noise effectively at very low signal-to-noise ratios (e.g., SNRs below -15 dB), especially when the target sound arrives from a direction close to that of an ego-noise source. Experiments with real and simulated data show the advantage of DNN-TF over competing methods, including DNN-S, DNN-BF and a state-of-the-art time-frequency spatial filtering method.
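The mask-based step that all three systems share can be sketched in code. The sketch computes an ideal ratio mask (IRM) per time-frequency bin and applies it to a noisy spectrogram, as a single-channel system like DNN-S would. It is a minimal illustration under assumptions that the article does not state. It uses the widely used IRM definition with exponent beta = 0.5 and a toy spectrogram stored as nested lists of complex STFT values. An oracle mask computed from known speech and noise stands in for the DNN estimate. The helper names `ideal_ratio_mask` and `apply_mask` are hypothetical. The sketch does not reproduce the authors' DNN, DNN-BF or DNN-TF.

```python
"""Hypothetical sketch: oracle ideal ratio mask (IRM) and mask-based
single-channel enhancement. Assumptions (not from the article): IRM exponent
beta = 0.5 and STFTs stored as nested lists indexed [frame][frequency_bin]."""

from typing import List

Spectrogram = List[List[complex]]
Mask = List[List[float]]


def ideal_ratio_mask(speech: Spectrogram, noise: Spectrogram,
                     beta: float = 0.5, eps: float = 1e-12) -> Mask:
    """Per-bin IRM: (|S|^2 / (|S|^2 + |N|^2)) ** beta.

    Needs the separate clean-speech and noise STFTs, so in practice it serves
    as a DNN training target rather than something computed at test time."""
    mask = []
    for s_frame, n_frame in zip(speech, noise):
        row = []
        for s, n in zip(s_frame, n_frame):
            ps, pn = abs(s) ** 2, abs(n) ** 2
            # eps avoids 0/0 in silent bins (mask becomes 0 there)
            row.append((ps / (ps + pn + eps)) ** beta)
        mask.append(row)
    return mask


def apply_mask(noisy: Spectrogram, mask: Mask) -> Spectrogram:
    """Scale each noisy time-frequency bin by its mask value (noisy phase kept)."""
    return [[m * y for m, y in zip(m_row, y_row)]
            for m_row, y_row in zip(mask, noisy)]


if __name__ == "__main__":
    # Toy 2-frame x 3-bin example: speech and noise mostly occupy different bins,
    # mimicking the time-frequency sparsity mentioned in the abstract.
    S = [[1 + 0j, 0j, 0.5 + 0.5j], [0j, 2 + 0j, 0j]]
    N = [[0j, 1 + 0j, 0.5 - 0.5j], [1j, 0j, 0j]]
    Y = [[s + n for s, n in zip(sr, nr)] for sr, nr in zip(S, N)]  # noisy mixture
    M = ideal_ratio_mask(S, N)
    print([[round(v, 3) for v in row] for row in M])
    # prints [[1.0, 0.0, 0.707], [0.0, 1.0, 0.0]]
    S_hat = apply_mask(Y, M)  # enhanced single-channel spectrogram
```

In the systems the abstract describes, a trained DNN would predict the mask from noisy features at test time. The oracle mask here only illustrates the training target and how a soft mask attenuates noise-dominated bins.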
