4.7 Article

Characterization of Moving Sound Sources Direction-of-Arrival Estimation Using Different Deep Learning Architectures

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIM.2023.3241983

Keywords

Direction-of-arrival estimation; Acoustics; Estimation; Convolutional neural networks; Feature extraction; Task analysis; Deep learning; Direction-of-arrival (DOA) detection; machine learning; microphone arrays; moving acoustic sources; neural networks (NNs)

This article evaluates a deep learning classification system for localizing moving sound sources and investigates the impact of key parameters in feature extraction and model training. The results show that window size strongly affects localization performance for moving sources but not for static sources, that sequence length affects the performance of recurrent architectures, and that a temporal convolutional neural network outperforms recurrent and feedforward networks for moving sound sources.
Sound source localization is an important task for several applications, and the use of deep learning for this task has recently become a popular research topic. While a number of previous works have focused on static sound sources, in this article we evaluate the performance of a deep learning classification system for the localization of moving sound sources. In particular, we evaluate the effect of key parameters at the levels of feature extraction (e.g., short-time Fourier transform (STFT) parameters) and model training (e.g., neural network (NN) architectures). We evaluate the performance of different settings in terms of precision and F-score, in a multiclass multilabel classification framework. In our previous work on the localization of moving sound sources, we investigated feedforward NNs (FNNs) under different acoustic conditions and STFT parameters and showed that the presence of some reverberation in the training dataset can help achieve better detection of the direction of arrival of the sources. In this article, we extend that work to show that the window size does not affect localization performance for static sources but strongly affects it for moving sources, that the sequence length has a significant effect on the performance of recurrent architectures, and that a temporal convolutional NN can outperform both recurrent and feedforward networks for moving sound sources.
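
As a rough illustration of the evaluation described in the abstract, precision and F-score in a multilabel setting can be computed from binary indicator matrices in which each row is a time frame and each column is a direction-of-arrival class. The sketch below is not the authors' code. The function name `multilabel_precision_f1`, the six-class layout, and the choice of micro-averaging (pooling counts over all frames and classes) are assumptions made for illustration; the abstract does not state which averaging scheme was used.

```python
import numpy as np


def multilabel_precision_f1(y_true, y_pred):
    """Micro-averaged precision, recall, and F-score.

    y_true, y_pred: binary arrays of shape (frames, classes), where a 1
    marks an active source in that DOA class for that frame.
    Micro-averaging is an assumption, not taken from the article.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = np.sum(y_true & y_pred)    # active and detected
    fp = np.sum(~y_true & y_pred)   # detected but not active
    fn = np.sum(y_true & ~y_pred)   # active but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(f_score)


# Hypothetical data: 4 frames, 6 DOA classes, up to two active sources.
y_true = np.array([
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
])
y_pred = np.array([
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],   # lags behind the moving source
    [0, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],   # misses the second source
])

precision, recall, f_score = multilabel_precision_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_score:.3f}")
```

In this toy case the second frame shows the kind of error a moving source can cause: the prediction stays on the previous class, which costs one false positive and one false negative at the same time.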
