Article

Characterization of Moving Sound Sources Direction-of-Arrival Estimation Using Different Deep Learning Architectures

Journal

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TIM.2023.3241983

Keywords

Direction-of-arrival estimation; Acoustics; Estimation; Convolutional neural networks; Feature extraction; Task analysis; Deep learning; Direction-of-arrival (DOA) detection; machine learning; microphone arrays; moving acoustic sources; neural networks (NNs)


This article evaluates the performance of a deep learning classification system for localizing moving sound sources and investigates the impact of key parameters in feature extraction and model training. The results show that window size has a significant effect on localization performance for moving sources but not for static sources, that sequence length affects the performance of recurrent architectures, and that a temporal convolutional neural network outperforms recurrent and feedforward networks for moving sound sources.
Sound source localization is an important task for several applications, and the use of deep learning for this task has recently become a popular research topic. While a number of previous works have focused on static sound sources, in this article, we evaluate the performance of a deep learning classification system for localization of moving sound sources. In particular, we evaluate the effect of key parameters at the levels of feature extraction (e.g., short-time Fourier transform (STFT) parameters) and model training (e.g., neural network (NN) architectures). We evaluate the performance of different settings in terms of precision and F-score, in a multiclass multilabel classification framework. In our previous work on localization of moving sound sources, we investigated feedforward NNs (FNNs) under different acoustic conditions and STFT parameters and showed that the presence of some reverberation in the training dataset can help in achieving better detection of the direction of arrival of the sources. In this article, we extend that work to show that the window size does not affect the performance for static sources but strongly affects the performance for moving sources, that the sequence length has a significant effect on the performance of recurrent architectures, and that a temporal convolutional NN can outperform both recurrent and feedforward networks for moving sound sources.
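
The abstract reports precision and F-score in a multiclass multilabel setting. As a rough illustration only, the sketch below computes micro-averaged precision, recall, and F-score over per-frame DOA class indicators. The micro-averaging scheme, the 0.5 decision threshold, the four-class layout, and the function name `micro_precision_recall_f1` are all assumptions made for this example; the abstract does not state how the authors aggregate their scores.

```python
def micro_precision_recall_f1(y_true, y_pred):
    """Micro-averaged precision, recall and F-score for multilabel indicators.

    y_true, y_pred: lists of rows (one row per frame), each row holding one
    0/1 entry per DOA class. Several entries per row may be 1 (multilabel).
    """
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if p and t:
                tp += 1      # class predicted active and truly active
            elif p and not t:
                fp += 1      # class predicted active but actually inactive
            elif t and not p:
                fn += 1      # active class that was missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical data: 3 frames, 4 DOA classes (values are made up).
y_true = [[1, 0, 0, 0],
          [0, 1, 0, 0],
          [0, 1, 1, 0]]
scores = [[0.9, 0.2, 0.1, 0.0],
          [0.6, 0.7, 0.1, 0.0],
          [0.1, 0.8, 0.3, 0.2]]

# Assumed decision rule: a class is active when its score reaches 0.5.
y_pred = [[1 if s >= 0.5 else 0 for s in row] for row in scores]

precision, recall, f1 = micro_precision_recall_f1(y_true, y_pred)
print(precision, recall, f1)
```

Micro-averaging pools the counts over all frames and classes before dividing, so frequently active directions weigh more; a macro average (per-class scores averaged afterwards) would be an equally plausible reading of the abstract.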
