Proceedings Paper

End-to-end Convolutional Neural Networks for Sound Event Detection in Urban Environments


We present a novel approach to sound event detection (SED) in urban environments using end-to-end convolutional neural networks (CNNs). The system consists of a 1D CNN that extracts the energy in mel-frequency bands from the audio signal using a simple filter bank, followed by a 2D CNN for classification. The main goal of this two-stage architecture is to make the first layers of the network more interpretable and to allow them to be reused in other problems of the same domain. We present a novel neural network model for calculating the mel-spectrogram that outperforms existing work in both simplicity and matching performance. We also implement a recently proposed method for normalizing the energy of the mel-spectrogram, per-channel energy normalization (PCEN), as a layer of the neural network. We show how the parameters of this normalization can be learned by the network and why this is useful for SED in urban environments. We study how training modifies the filter bank as well as the PCEN parameters. The resulting system achieves classification results comparable to the state of the art while reducing the number of parameters.
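
The front end described above (mel-band energies computed by a 1D convolution, then PCEN) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 1D convolution stage is assumed here to use windowed cosine/sine kernels applied with a stride (a DFT expressed as a strided convolution), followed by a triangular HTK-style mel filter bank; the paper's actual kernels and layer layout may differ. The function names (`dft_kernels`, `conv1d_strided`, `mel_filterbank`, `mel_energies`, `pcen`) and the default values (n_fft=1024, hop=512, 64 mel bands, and the PCEN constants s, alpha, delta, r, eps) are illustrative choices, not values taken from the paper.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style mel scale (assumed; other mel formulas exist)
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def dft_kernels(n_fft):
    """Hann-windowed cosine and sine kernels, shape (n_fft//2+1, n_fft).
    Correlating the signal with them gives the real/imaginary STFT parts."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    window = np.hanning(n_fft)
    angle = 2.0 * np.pi * k * n / n_fft
    return window * np.cos(angle), -window * np.sin(angle)

def conv1d_strided(x, kernels, hop):
    """'Valid' 1D correlation of x with every kernel row, with stride hop.
    Returns shape (n_frames, n_kernels), like a Conv1d layer without bias."""
    n_fft = kernels.shape[1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] for i in range(n_frames)])
    return frames @ kernels.T

def mel_filterbank(n_mels, n_fft, sr):
    """Triangular mel filters, shape (n_mels, n_fft//2+1)."""
    n_bins = n_fft // 2 + 1
    freqs = np.linspace(0.0, sr / 2.0, n_bins)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    hz_pts = mel_to_hz(mel_pts)
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        lo, center, hi = hz_pts[m], hz_pts[m + 1], hz_pts[m + 2]
        rising = (freqs - lo) / (center - lo)
        falling = (hi - freqs) / (hi - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def mel_energies(x, sr, n_fft=1024, hop=512, n_mels=64):
    """Mel-band energies, shape (n_frames, n_mels)."""
    cos_k, sin_k = dft_kernels(n_fft)
    re = conv1d_strided(x, cos_k, hop)
    im = conv1d_strided(x, sin_k, hop)
    power = re ** 2 + im ** 2                      # (n_frames, n_bins)
    return power @ mel_filterbank(n_mels, n_fft, sr).T

def pcen(E, s=0.025, alpha=0.98, delta=2.0, r=0.5, eps=1e-6):
    """Per-channel energy normalization along the time axis (axis 0):
    M[t] = (1 - s) * M[t-1] + s * E[t]
    PCEN = (E / (eps + M)**alpha + delta)**r - delta**r"""
    M = np.empty_like(E)
    M[0] = E[0]                                    # initialize smoother with first frame
    for t in range(1, E.shape[0]):
        M[t] = (1.0 - s) * M[t - 1] + s * E[t]
    return (E / (eps + M) ** alpha + delta) ** r - delta ** r

# Usage: one second of a 1 kHz tone at 16 kHz
sr = 16000
x = np.sin(2 * np.pi * 1000.0 * np.arange(sr) / sr)
features = pcen(mel_energies(x, sr))              # shape (30, 64)
```

In this NumPy sketch the filter-bank weights and the PCEN constants are fixed. In a network of the kind the abstract describes, the convolution kernels and the PCEN parameters (s, alpha, delta, r) would instead be trainable tensors updated by backpropagation, which is what allows training to reshape the filter bank and adapt the normalization.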
