Proceedings Paper

ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) (2021)

Journal

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN)

Volume -, Issue -, Pages -

Publisher

IEEE

DOI: 10.1109/IJCNN52387.2021.9533654

Keywords

audio; classification; ESC; Fourier transform; fbsp-wavelet

Categories

Computer Science, Artificial Intelligence; Computer Science, Hardware & Architecture; Engineering, Electrical & Electronic

Funding

TU Kaiserslautern CS PhD scholarship program
BMBF project ExplAINN [01IS19074]

Summary

Environmental Sound Classification (ESC) is a rapidly evolving field that has benefited from applying visual-domain techniques to audio tasks. The proposed fbsp-layer, combined with a high-performance audio classification model, outperforms previous methods, reaching 95.20% accuracy on ESC-50 and 89.14% on UrbanSound8K. The study also evaluates different pre-training strategies and the model's robustness against signal perturbations.

Abstract

Environmental Sound Classification (ESC) is a rapidly evolving field that has recently demonstrated the advantages of applying visual-domain techniques to audio-related tasks. Previous studies indicate that domain-specific modification of cross-domain approaches shows promise in pushing the whole area of ESC forward. In this paper, we present a new time-frequency transformation layer that is based on complex frequency B-spline (fbsp) wavelets. Used with a high-performance audio classification model, the proposed fbsp-layer provides an accuracy improvement over the previously used Short-Time Fourier Transform (STFT) on standard datasets. We also investigate the influence of different pre-training strategies, including the joint use of two large-scale datasets for weight initialization: ImageNet and AudioSet. Our proposed model outperforms other approaches by achieving accuracies of 95.20% on the ESC-50 and 89.14% on the UrbanSound8K datasets. Additionally, we assess the increase in model robustness against additive white Gaussian noise and reduction of the effective sample rate introduced by the proposed layer, and demonstrate that the fbsp-layer improves the model's ability to withstand signal perturbations in comparison to STFT-based training. For the sake of reproducibility, our code is made available.
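The two technical ingredients named in the abstract, the complex frequency B-spline wavelet and additive white Gaussian noise, can be illustrated with a minimal NumPy sketch. This is not the authors' layer (which is learned inside the network); it only evaluates the textbook fbsp definition, psi(t) = sqrt(fb) * sinc(fb * t / m)^m * exp(2*pi*i*fc*t), and adds noise at a chosen signal-to-noise ratio. The function names `fbsp_wavelet` and `add_awgn`, the default parameter values, and the SNR-based noise scaling are assumptions made for illustration and do not come from the paper.

```python
import numpy as np


def fbsp_wavelet(t, m=2, fb=1.0, fc=1.0):
    """Evaluate the textbook complex frequency B-spline wavelet at times t.

    m  : spline order (positive integer)
    fb : bandwidth parameter
    fc : center frequency
    np.sinc is the normalized sinc, sin(pi * x) / (pi * x).
    """
    t = np.asarray(t, dtype=float)
    envelope = np.sqrt(fb) * np.sinc(fb * t / m) ** m
    carrier = np.exp(2j * np.pi * fc * t)
    return envelope * carrier


def add_awgn(signal, snr_db, rng=None):
    """Return the signal plus white Gaussian noise at the requested SNR (dB).

    Hypothetical helper: the noise variance is chosen so that
    mean(signal**2) / noise_variance equals 10 ** (snr_db / 10).
    """
    rng = np.random.default_rng() if rng is None else rng
    signal = np.asarray(signal, dtype=float)
    signal_power = np.mean(signal ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise
```

As a usage example, `fbsp_wavelet(np.linspace(-4, 4, 257), m=2, fb=1.0, fc=1.5)` yields a complex-valued kernel that could be correlated with a waveform, and `add_awgn(x, snr_db=10)` perturbs a waveform `x` for a robustness check of the kind the abstract describes.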
