Article

Investigation of the effectiveness of time-frequency domain images and acoustic features in urban sound classification

Journal

APPLIED ACOUSTICS
Volume 211, Issue -, Pages -

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.apacoust.2023.109564

Keywords

Smart city; Urban sound recognition; Deep learning; Audio-visual feature set; Acoustic analysis; Cepstral features; Environmental sound classification; Sound event recognition


Abstract

Rapid urbanization and population growth worldwide pose serious challenges to building livable and sustainable cities. This growth increases and diversifies urban sounds, which, rather than being treated merely as noise, can be transformed into information that plays an important role in the smart-city concept. Two basic methods are used to classify urban sounds: in the first, the sounds are processed with signal-processing methods to obtain handcrafted features; in the second, the sounds are represented visually and classified with deep learning models. This study investigated the effect of the individual and hybrid use of features from both approaches on the classification of urban sounds. In addition, a CNN model was created to classify the hybrid features. The results showed that both approaches classify successfully. Among the visual representation methods, mel-spectrogram, scalogram, and spectrogram images achieved the highest classification success, and combining mel-spectrogram and acoustic features with an SVM classifier further improved accuracy. Experiments were performed on the ESC-10 and UrbanSound8k datasets. The highest accuracy on ESC-10, 98.33%, was obtained using scalogram and acoustic features with the AVCNN model; the highest accuracy on UrbanSound8k, 97.70%, was obtained by classifying the mel-spectrogram and acoustic features extracted from the AVCNN model with an SVM classifier.
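The mel-spectrogram images central to the abstract are log-compressed, mel-weighted power spectrograms. The paper does not give its extraction parameters, so the following is only a minimal sketch of the standard pipeline, with all parameter values (16 kHz sample rate, 512-sample frames, 40 mel bands) chosen for illustration rather than taken from the paper:

```python
import numpy as np
from scipy.signal import spectrogram

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels=40):
    # Triangular filters with centers evenly spaced on the mel scale
    # between 0 Hz and the Nyquist frequency sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):          # rising slope
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):         # falling slope
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb

def log_mel_spectrogram(y, sr, n_fft=512, n_mels=40):
    # Magnitude STFT -> power -> mel weighting -> dB-style log compression.
    _, _, S = spectrogram(y, fs=sr, nperseg=n_fft, mode="magnitude")
    mel = mel_filterbank(sr, n_fft, n_mels) @ (S ** 2)
    return 10.0 * np.log10(mel + 1e-10)

# Hypothetical input: a 1-second 440 Hz tone at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
y = np.sin(2 * np.pi * 440.0 * t)
M = log_mel_spectrogram(y, sr)
print(M.shape)  # (n_mels, n_frames)
```

In the pipeline the paper describes, an image rendered from a matrix like `M` would be fed to a CNN, while handcrafted acoustic features (e.g. cepstral features) could be concatenated with the learned representation before the final classifier.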

