Journal
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT
Volume 72, Issue -, Pages -
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIM.2023.3241983
Keywords
Direction-of-arrival estimation; Acoustics; Estimation; Convolutional neural networks; Feature extraction; Task analysis; Deep learning; Direction-of-arrival (DOA) detection; machine learning; microphone arrays; moving acoustic sources; neural networks (NNs)
This article evaluates the performance of a deep learning classification system for localizing moving sound sources and investigates the impact of key parameters in feature extraction and model training. The results show that window size strongly affects localization performance for moving sources but not for static sources, that sequence length affects the performance of recurrent architectures, and that a temporal convolutional neural network outperforms recurrent and feedforward networks for moving sound sources.
Sound source localization is an important task for several applications, and the use of deep learning for this task has recently become a popular research topic. While a number of previous works have focused on static sound sources, in this article we evaluate the performance of a deep learning classification system for localization of moving sound sources. In particular, we evaluate the effect of key parameters at the levels of feature extraction (e.g., short-time Fourier transform (STFT) parameters) and model training (e.g., neural network (NN) architectures). We evaluate the performance of different settings in terms of precision and F-score, in a multiclass multilabel classification framework. In our previous work on localization of moving sound sources, we investigated feedforward NNs (FNNs) under different acoustic conditions and STFT parameters and showed that the presence of some reverberation in the training dataset can help in achieving better detection of the direction of arrival of the sources. In this article, we extend that work to show that the window size does not affect performance for static sources but strongly affects performance for moving sources, that the sequence length has a significant effect on the performance of recurrent architectures, and that a temporal convolutional NN can outperform both recurrent and feedforward networks for moving sound sources.
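As a rough illustration of the evaluation described above, the sketch below computes precision and F-score for a multilabel direction-of-arrival (DOA) classifier, where each time frame may have several active direction classes. The abstract does not state how scores are averaged, so micro-averaging over all (frame, class) pairs is an assumption here, as are the function name `multilabel_precision_f1`, the 8-sector angular grid, and the toy labels.

```python
import numpy as np


def multilabel_precision_f1(y_true, y_pred):
    """Micro-averaged precision, recall and F-score.

    Both inputs are binary arrays of shape (frames, classes), where a 1
    marks a DOA class as active in that frame. Micro-averaging is an
    assumption; the source does not specify the averaging scheme.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)

    tp = np.sum(y_true & y_pred)    # active and detected
    fp = np.sum(~y_true & y_pred)   # detected but not active
    fn = np.sum(y_true & ~y_pred)   # active but missed

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return float(precision), float(recall), float(f1)


# Hypothetical toy data: 4 frames, 8 DOA classes (e.g. 45-degree sectors).
# One source moves from class 1 to class 2; a second moves from 5 to 6.
y_true = np.zeros((4, 8), dtype=int)
y_true[0, [1, 5]] = 1
y_true[1, [1, 5]] = 1
y_true[2, [2, 5]] = 1
y_true[3, [2, 6]] = 1

# Hypothetical predictions that lag behind the first source in frame 2
# and miss the second source after it moves in frame 3.
y_pred = np.zeros((4, 8), dtype=int)
y_pred[0, [1, 5]] = 1
y_pred[1, [1, 5]] = 1
y_pred[2, [1, 5]] = 1
y_pred[3, [2]] = 1

precision, recall, f1 = multilabel_precision_f1(y_true, y_pred)
print(f"precision={precision:.3f} recall={recall:.3f} F-score={f1:.3f}")
```

In this toy case the errors occur only in the frames where a source changes direction class, which is the kind of situation where frame-level parameters such as the STFT window size could plausibly matter more for moving sources than for static ones.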