Article

Spectral images based environmental sound classification using CNN with meaningful data augmentation

Journal

APPLIED ACOUSTICS
Volume 172

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.apacoust.2020.107581

Keywords

Environmental sound classification; Convolutional neural network; Spectrogram; Data augmentation; Transfer learning

The study proposes an effective approach to environmental sound classification based on spectral images, using meaningful data augmentation and achieving high accuracy across multiple datasets. Transfer learning models, namely ResNet-152 and DenseNet-161, gave the best results on the ESC-10, ESC-50, and Us8k datasets.
In this study, an effective approach to environmental sound classification based on spectral images, using Convolutional Neural Networks (CNN) with meaningful data augmentation, is proposed. The feature used in this approach is the Mel spectrogram: features are extracted from audio clips in the form of spectrogram images. The randomly selected CNN models used in the experiments are a 7-layer and a 9-layer CNN trained from scratch. In addition, various well-known deep learning architectures are considered with transfer learning, following a scheme of freezing the initial layers, training the model, unfreezing the layers, and training the model again with discriminative learning. Three datasets are considered: ESC-10, ESC-50, and Us8k. For the transfer learning methodology, 11 pre-trained deep learning architectures are used. Instead of using the data augmentation schemes available for images, we propose meaningful data augmentation that applies variations directly to the audio clips. The results show the effectiveness, robustness, and high accuracy of the proposed approach. With transfer learning models, the meaningful data augmentation achieves the highest accuracy, with a lower error rate, on all datasets. Among the models used, ResNet-152 attained 99.04% on ESC-10 and 99.49% on Us8k, and DenseNet-161 attained 97.57% on ESC-50. To our knowledge, these are the best results achieved on these datasets. © 2020 Elsevier Ltd. All rights reserved.
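To illustrate what applying variations "to the audio clips directly" might look like, here is a minimal sketch in plain Python. The abstract does not say which variations the authors used, so the three shown here (circular time shift, additive white noise, and gain change) are assumptions chosen for illustration. The function names, the 20 dB signal-to-noise ratio, and the ±6 dB gain range are likewise hypothetical.

```python
import math
import random


def time_shift(samples, shift):
    """Circularly shift the clip by `shift` samples (positive = later)."""
    if not samples:
        return []
    shift %= len(samples)
    return samples[-shift:] + samples[:-shift] if shift else list(samples)


def add_noise(samples, snr_db, rng):
    """Add white Gaussian noise at roughly the requested signal-to-noise ratio."""
    power = sum(s * s for s in samples) / len(samples)
    noise_std = math.sqrt(power / (10 ** (snr_db / 10)))
    return [s + rng.gauss(0.0, noise_std) for s in samples]


def change_gain(samples, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)
    return [s * factor for s in samples]


def augment(samples, rng):
    """Return three perturbed copies of one clip (assumed variations, not
    the paper's list); each copy would then get its own Mel spectrogram."""
    return [
        time_shift(samples, rng.randrange(len(samples))),
        add_noise(samples, snr_db=20.0, rng=rng),
        change_gain(samples, gain_db=rng.uniform(-6.0, 6.0)),
    ]


# Toy clip: 0.1 s of a 440 Hz tone sampled at 8 kHz (illustrative values only).
rate = 8000
clip = [math.sin(2 * math.pi * 440 * n / rate) for n in range(800)]
variants = augment(clip, random.Random(0))
```

Each entry in `variants` has the same length as the original clip. In a pipeline like the one described above, every variant would be converted to a Mel spectrogram image before training, so the augmentation corresponds to plausible changes in the sound itself rather than to image operations such as flips or rotations.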
