4.6 Article

Attention based convolutional recurrent neural network for environmental sound classification

Journal

NEUROCOMPUTING
Volume 453, Pages 896-903

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2020.08.069

Keywords

Environmental sound classification; Convolutional recurrent neural network; Attention mechanism

Funding

  1. National Science and Technology Major Project [2018ZX03001009]
  2. Shanghai Institute for Advanced Communication and Data Science (SICS)

Environmental sound classification is a challenging problem that heavily relies on the effectiveness of representative features. The proposed frame-level attention model focuses on semantically relevant and salient parts to improve ESC results. Experimental results showed the effectiveness of the method in achieving state-of-the-art or competitive classification accuracy.
Environmental sound classification (ESC) is a challenging problem due to the complexity of sounds. Classification performance depends heavily on the effectiveness of the representative features extracted from the environmental sounds. However, ESC often suffers from semantically irrelevant frames and silent frames. To deal with this, we employ a frame-level attention model to focus on the semantically relevant frames and salient frames. Specifically, we first propose a convolutional recurrent neural network to learn spectro-temporal features and temporal correlations. Then, we extend our convolutional RNN model with a frame-level attention mechanism to learn discriminative feature representations for ESC. We investigated the classification performance when using different attention scaling functions and when applying attention at different layers. Experiments were conducted on the ESC-50 and ESC-10 datasets. Experimental results demonstrated the effectiveness of the proposed method, which achieved state-of-the-art or competitive classification accuracy with lower computational complexity. We also visualized our attention results and observed that the proposed attention mechanism was able to lead the network to focus on the semantically relevant parts of environmental sounds. (c) 2020 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
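The abstract does not give the exact form of the frame-level attention, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes each frame has a feature vector (for example, the output of a recurrent layer), scores each frame with a hypothetical learned vector `w`, and converts the scores to weights with a scaling function. Softmax and sigmoid are used here as two plausible choices of scaling function. The names `frame_attention`, `w`, `b`, and `scaling` are illustrative assumptions.

```python
import numpy as np


def softmax(x):
    # Numerically stable softmax over a 1-D array of frame scores.
    e = np.exp(x - np.max(x))
    return e / np.sum(e)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def frame_attention(features, w, b=0.0, scaling="softmax"):
    """Weight frames by an attention score and pool them over time.

    features: array of shape (T, D), one D-dimensional vector per frame.
    w, b:     hypothetical learned scoring parameters (assumption).
    scaling:  "softmax" or "sigmoid" (assumed examples of scaling functions).
    """
    scores = features @ w + b              # one scalar score per frame, shape (T,)
    if scaling == "softmax":
        weights = softmax(scores)          # weights sum to 1 across frames
    elif scaling == "sigmoid":
        weights = sigmoid(scores)          # each weight lies in (0, 1) independently
    else:
        raise ValueError(f"unknown scaling function: {scaling}")
    pooled = (weights[:, None] * features).sum(axis=0)  # weighted sum, shape (D,)
    return pooled, weights


# Toy input: 10 frames with 4 features each (random stand-in for real features).
rng = np.random.default_rng(0)
features = rng.normal(size=(10, 4))
w = rng.normal(size=4)

pooled, weights = frame_attention(features, w, scaling="softmax")
pooled_sig, weights_sig = frame_attention(features, w, scaling="sigmoid")
```

With softmax scaling the frames compete for a fixed total weight, while sigmoid scaling lets each frame be emphasized or suppressed independently. In either case, frames with low scores (such as silent or irrelevant ones) contribute little to the pooled representation.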
