Proceedings Paper

TRANSFORMER-BASED QUALITY ASSESSMENT MODEL FOR GENERALIZED USER-GENERATED MULTIMEDIA AUDIO CONTENT

INTERSPEECH 2022 (2022)

Journal

INTERSPEECH 2022

Pages 674-678

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC

DOI: 10.21437/Interspeech.2022-10386

Keywords

Non-intrusive Audio Quality Assessment; Transformer-based Learning; User-generated Multimedia

Automated Summary

This paper proposes a computational measure of audio quality in user-generated multimedia (UGM) and validates it on an extended audio dataset. The results show that the transformer-based model outperforms the recurrent baselines (Simple RNN, LSTM, and GRU) in audio quality assessment.

Abstract

In this paper, we propose a computational measure for the quality of audio in user-generated multimedia (UGM) in accordance with the human perceptual system. To this end, we first extend the previously proposed IIT-JMU-UGM Audio dataset by including samples with more diverse context, content, distortion types, and intensities, along with implicitly distorted audio that reflects realistic scenarios. We conduct subjective testing on the extended database containing 2075 audio clips to obtain the mean opinion scores for each sample. We then introduce transformer-based learning to the domain of audio quality assessment, which is trained on three vital audio features: Mel-frequency cepstral coefficients, chroma, and Mel-scaled spectrogram. The proposed non-intrusive transformer-based model is compared against state-of-the-art methods and found to outperform Simple RNN, LSTM, and GRU models by over 4%. The database and the source code will be made public upon acceptance.
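As an illustration of the subjective-testing step, the sketch below shows one common way per-clip mean opinion scores can be computed from raw listener ratings. It is not the authors' procedure: the function name `mean_opinion_scores`, the clip identifiers, the rating values, and the 1-to-5 rating scale are all assumptions made for this example, since the listing does not describe the test protocol.

```python
from collections import defaultdict
from statistics import mean, stdev


def mean_opinion_scores(ratings):
    """Aggregate (clip_id, score) pairs into per-clip statistics.

    Returns {clip_id: (mos, sample_std, n_ratings)}.
    Hypothetical helper; a 1-5 rating scale is assumed, not taken from the paper.
    """
    by_clip = defaultdict(list)
    for clip_id, score in ratings:
        by_clip[clip_id].append(float(score))

    result = {}
    for clip_id, scores in by_clip.items():
        # Sample standard deviation needs at least two ratings.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        result[clip_id] = (mean(scores), spread, len(scores))
    return result


# Made-up ratings for two hypothetical clips.
ratings = [
    ("clip_001", 4), ("clip_001", 5), ("clip_001", 3),
    ("clip_002", 2), ("clip_002", 1),
]
mos = mean_opinion_scores(ratings)
for clip_id, (score, spread, n) in sorted(mos.items()):
    print(f"{clip_id}: MOS={score:.2f} std={spread:.2f} n={n}")
```

A non-intrusive model such as the one described would then be trained to predict these per-clip scores from the audio features alone, without access to a clean reference signal.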
