Proceedings Paper

AudioCLIP: Extending CLIP to Image, Text and Audio

2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) (2022)

AST: Audio Spectrogram Transformer

Yuan Gong et al.

Summary: This paper introduces the Audio Spectrogram Transformer (AST), a convolution-free, purely attention-based model for audio classification. The authors report state-of-the-art results at the time of publication on AudioSet, ESC-50, and Speech Commands V2.

INTERSPEECH 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Andrey Guzhov et al.

Summary: Environmental Sound Classification (ESC) is a rapidly evolving field that has benefited from applying visual-domain techniques to audio tasks. The proposed fbsp-layer, combined with a high-performance audio classification model, outperforms previous methods and achieves high accuracy on standard datasets. The study also evaluates different pre-training strategies and the model's robustness to signal perturbations.

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Andrey Guzhov et al.

Summary: Environmental Sound Classification (ESC) is a hot research topic in the audio domain, but existing methods have difficulty benefiting from advances in other fields. This study introduces a model compatible with mono and stereo sound inputs and outperforms previous approaches in fair comparisons.

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) (2021)

Article Acoustics

Zero-Shot Audio Classification Via Semantic Embeddings

Huang Xie et al.

Summary: This paper investigates zero-shot learning in audio classification using semantic embeddings extracted from textual labels and sentence descriptions. It demonstrates that a bilinear compatibility framework and deep acoustic embeddings improve classification performance. Including semantically close sound classes in training and concatenating label and sentence embeddings from different language models improves the results further.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2021)

Proceedings Paper Acoustics