3.8 Proceedings Paper

AudioCLIP: Extending CLIP to Image, Text and Audio

Related References

Note: Only a subset of the references is listed here; see the original paper for the complete list.
Proceedings Paper Audiology & Speech-Language Pathology

AST: Audio Spectrogram Transformer

Yuan Gong et al.

Summary: This paper introduces AST, a convolution-free, purely attention-based model for audio classification, which achieved state-of-the-art results on AudioSet, ESC-50, and Speech Commands V2 at the time of publication.

INTERSPEECH 2021 (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ESResNe(X)t-fbsp: Learning Robust Time-Frequency Transformation of Audio

Andrey Guzhov et al.

Summary: Environmental Sound Classification (ESC) is a rapidly evolving field that has shown benefits in applying visual domain techniques to audio tasks. The proposed fbsp-layer, combined with a high-performance audio classification model, outperforms previous methods, achieving high accuracy on standard datasets. The study also evaluates different pre-training strategies and the model's robustness against signal perturbations.

2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) (2021)

Proceedings Paper Computer Science, Artificial Intelligence

ESResNet: Environmental Sound Classification Based on Visual Domain Models

Andrey Guzhov et al.

Summary: Environmental Sound Classification (ESC) is a hot research topic in the audio domain, but existing methods have difficulty benefiting from advances in other fields. This study introduces a model compatible with mono and stereo sound inputs and outperforms previous approaches in fair comparisons.

2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) (2021)

Article Acoustics

Zero-Shot Audio Classification Via Semantic Embeddings

Huang Xie et al.

Summary: This paper investigates zero-shot learning in audio classification using semantic embeddings extracted from textual labels and sentence descriptions, demonstrating the effectiveness of a bilinear compatibility framework and deep acoustic embeddings in improving classification performance. By involving semantically close sound classes in training and concatenating label/sentence embeddings from different language models, the results are further enhanced.

IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING (2021)

Proceedings Paper Acoustics

Zero-Shot Audio Classification Based on Class Label Embeddings

Huang Xie et al.

2019 IEEE WORKSHOP ON APPLICATIONS OF SIGNAL PROCESSING TO AUDIO AND ACOUSTICS (WASPAA) (2019)

Article Engineering, Electrical & Electronic

Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification

Justin Salamon et al.

IEEE SIGNAL PROCESSING LETTERS (2017)

Proceedings Paper Computer Science, Artificial Intelligence

Look, Listen and Learn

Relja Arandjelovic et al.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV) (2017)

Proceedings Paper Computer Science, Information Systems

ESC: Dataset for Environmental Sound Classification

Karol J. Piczak

MM'15: PROCEEDINGS OF THE 2015 ACM MULTIMEDIA CONFERENCE (2015)

Article Computer Science, Artificial Intelligence

Attribute-Based Classification for Zero-Shot Visual Object Categorization

Christoph H. Lampert et al.

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2014)