Proceedings Paper

Convolutional Gated Recurrent Neural Network Incorporating Spatial Features for Audio Tagging

Publisher

IEEE
DOI: 10.1109/IJCNN.2017.7966291

Keywords

-

Funding

  1. Engineering and Physical Sciences Research Council (EPSRC) of the UK [EP/N014111/1] (Funding Source: UKRI)
  2. China Scholarship Council (CSC)

Abstract

Environmental audio tagging is a newly proposed task to predict the presence or absence of a specific audio event in an audio chunk. Deep neural network (DNN) based methods have been successfully adopted for predicting audio tags in the domestic audio scene. In this paper, we propose to use a convolutional neural network (CNN) to extract robust features from mel-filter banks (MFBs), spectrograms, or even raw waveforms for audio tagging. Gated recurrent unit (GRU) based recurrent neural networks (RNNs) are then cascaded to model the long-term temporal structure of the audio signal. To complement the input information, an auxiliary CNN is designed to learn spatial features from the stereo recordings. We evaluate our proposed methods on Task 4 (audio tagging) of the Detection and Classification of Acoustic Scenes and Events 2016 (DCASE 2016) challenge. Compared with our recent DNN-based method, the proposed structure reduces the equal error rate (EER) from 0.13 to 0.11 on the development set, and the spatial features further reduce the EER to 0.10. The performance of end-to-end learning on raw waveforms is also comparable. Finally, on the evaluation set, we achieve state-of-the-art performance with an EER of 0.12, while the best existing system has an EER of 0.15.
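To make the architecture concrete, below is a minimal PyTorch sketch of the convolutional gated recurrent structure the abstract describes: a CNN front end extracts features from mel-filter bank (MFB) input, a bidirectional GRU models the long-term temporal structure, and per-frame tag scores are pooled into a chunk-level prediction. All layer sizes, the seven-tag output, and the mean-over-time pooling are illustrative assumptions, not the authors' published configuration; the auxiliary spatial-feature CNN for stereo recordings is omitted for brevity.

```python
import torch
import torch.nn as nn

class ConvGRUTagger(nn.Module):
    """Hypothetical sketch of a CNN -> GRU audio tagger in the spirit of
    the paper. Hyperparameters are illustrative assumptions."""

    def __init__(self, n_mels=40, n_tags=7, hidden=128):
        super().__init__()
        # CNN front end: learns robust local time-frequency features from
        # MFB input of shape (batch, 1, time, n_mels).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),   # pool along frequency only, keep time resolution
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        # Bidirectional GRU models long-term temporal structure of the chunk.
        self.gru = nn.GRU(64 * (n_mels // 4), hidden,
                          batch_first=True, bidirectional=True)
        # One sigmoid output per tag: tagging is a multi-label task.
        self.classifier = nn.Linear(2 * hidden, n_tags)

    def forward(self, x):
        # x: (batch, 1, time, n_mels)
        f = self.cnn(x)                              # (batch, 64, time, n_mels // 4)
        b, c, t, m = f.shape
        f = f.permute(0, 2, 1, 3).reshape(b, t, c * m)
        h, _ = self.gru(f)                           # (batch, time, 2 * hidden)
        # Average per-frame tag scores over time -> chunk-level probabilities.
        return torch.sigmoid(self.classifier(h)).mean(dim=1)

model = ConvGRUTagger()
dummy = torch.randn(8, 1, 240, 40)   # 8 chunks, 240 frames, 40 mel bins
print(model(dummy).shape)            # torch.Size([8, 7])
```

Averaging the sigmoid scores over time is only one simple pooling choice for turning frame-level activations into chunk-level tags; the same recurrent back end could equally be fed by a CNN operating on spectrograms or raw waveforms, as the abstract notes.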

