Article

Deep Learning Models for Single-Channel Speech Enhancement on Drones

IEEE ACCESS (2023)

Journal

IEEE ACCESS

Volume 11, Pages 22993-23007

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/ACCESS.2023.3253719

Keywords

Deep learning; drone audition; ego-noise reduction; single-channel; speech enhancement

Categories

Computer Science, Information Systems; Engineering, Electrical & Electronic; Telecommunications

Summary

In this paper, the authors extensively assess how well single-channel deep learning approaches reduce ego-noise on drones, where the rotating motors and propellers produce strong ego-noise. The study trains and compares twelve representative deep neural network models and finds that time-frequency complex-domain models and UNet encoder-decoder architectures outperform other approaches on speech enhancement measures while providing a good trade-off with other criteria. The best-performing model is a UNet operating in the time-frequency complex domain, which significantly improves speech quality at low input signal-to-noise ratios.

Abstract

Speech enhancement for drone audition is made challenging by the strong ego-noise from the rotating motors and propellers, which leads to extremely low signal-to-noise ratios (e.g., SNR $< -15$ dB) at the onboard microphones. In this paper, we extensively assess the ability of single-channel deep learning approaches to reduce ego-noise on drones. We train twelve representative deep neural network (DNN) models, covering three operation domains (time-frequency magnitude domain, time-frequency complex domain and end-to-end time domain) and three distinct architectures (sequential, encoder-decoder and generative). We critically discuss and compare the performance of these models in extremely low-SNR scenarios, ranging from -30 to 0 dB. We show that time-frequency complex-domain models and UNet encoder-decoder architectures outperform other approaches on speech enhancement measures while providing a good trade-off with other criteria, such as model size, computational complexity and context length. The best-performing model is a UNet operating in the time-frequency complex domain, which, at an input SNR of -15 dB, improves ESTOI from 0.1 to 0.4, PESQ from 1.0 to 1.9 and SI-SDR from -15 dB to 3.7 dB. Based on the insights drawn from these findings, we discuss future research in drone ego-noise reduction.
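To make the reported figures more concrete, the following is a minimal sketch of two quantities the abstract relies on: mixing a clean signal with noise at a target input SNR, and scoring a signal against the clean reference with SI-SDR. It is not the authors' evaluation code. The function names (`power`, `mix_at_snr`, `si_sdr`) are hypothetical, the SI-SDR formula is the commonly used scale-invariant definition, and the sine tone and white noise are toy stand-ins for speech and drone ego-noise.

```python
import math
import random


def power(x):
    """Mean squared amplitude of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB (signals assumed zero-mean and of equal length)."""
    dot = sum(e * t for e, t in zip(estimate, target))
    alpha = dot / sum(t * t for t in target)      # optimal scaling of the target
    projection = [alpha * t for t in target]      # part of the estimate explained by the target
    residual = [e - p for e, p in zip(estimate, projection)]
    return 10 * math.log10(sum(p * p for p in projection) / sum(r * r for r in residual))


# Toy stand-ins (assumptions): a 440 Hz tone for "speech", white noise for "ego-noise".
random.seed(0)
fs = 16000
speech = [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs)]
noise = [random.gauss(0.0, 1.0) for _ in range(fs)]

# Mixture at the -15 dB input SNR quoted in the abstract, scored before any enhancement.
noisy = mix_at_snr(speech, noise, snr_db=-15.0)
input_si_sdr = si_sdr(noisy, speech)
print(f"input SI-SDR: {input_si_sdr:.1f} dB")
```

Because the noise is nearly uncorrelated with the tone, the SI-SDR of the unprocessed mixture should land close to the mixing SNR, which is consistent with the abstract quoting -15 dB as the starting SI-SDR at an input SNR of -15 dB. An enhancement model would be scored by calling `si_sdr` on its output instead of on `noisy`.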
