Proceedings Paper

TRANSFORMER-BASED QUALITY ASSESSMENT MODEL FOR GENERALIZED USER-GENERATED MULTIMEDIA AUDIO CONTENT

Journal

INTERSPEECH 2022
Pages 674-678

Publisher

ISCA-INT SPEECH COMMUNICATION ASSOC
DOI: 10.21437/Interspeech.2022-10386

Keywords

Non-intrusive Audio Quality Assessment; Transformer-based Learning; User-generated Multimedia

This paper proposes a computational measure for the quality of audio in user-generated multimedia (UGM) and verifies it using an extended audio dataset. The results show that the transformer-based model outperforms other models in audio quality assessment.
In this paper, we propose a computational measure for the quality of audio in user-generated multimedia (UGM) in accordance with the human perceptual system. To this end, we first extend the previously proposed IIT-JMU-UGM Audio dataset by including samples with more diverse context, content, distortion types, and intensities, along with implicitly distorted audio that reflects realistic scenarios. We conduct subjective testing on the extended database, which contains 2075 audio clips, to obtain the mean opinion score for each sample. We then introduce transformer-based learning to the domain of audio quality assessment, training the model on three vital audio features: Mel-frequency cepstral coefficients, chroma, and Mel-scaled spectrogram. The proposed non-intrusive transformer-based model is compared against state-of-the-art methods and found to outperform Simple RNN, LSTM, and GRU models by over 4%. The database and the source code will be made public upon acceptance.
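The subjective-testing step mentioned above can be illustrated with a minimal sketch of how a mean opinion score (MOS) is commonly derived: the ratings that listeners give one clip are averaged. The clip identifiers, the 1-5 rating scale, and the function name `mean_opinion_scores` are assumptions made for illustration; the abstract does not describe the test protocol, so this is not the authors' procedure.

```python
from statistics import fmean, stdev


def mean_opinion_scores(ratings_by_clip):
    """Map each clip id to (MOS, sample standard deviation of its ratings).

    ratings_by_clip: dict of clip id -> list of numeric listener ratings.
    """
    scores = {}
    for clip_id, ratings in ratings_by_clip.items():
        if not ratings:
            raise ValueError(f"no ratings for clip {clip_id!r}")
        # stdev needs at least two ratings; report 0.0 for a single rating.
        spread = stdev(ratings) if len(ratings) > 1 else 0.0
        scores[clip_id] = (fmean(ratings), spread)
    return scores


# Hypothetical ratings on an assumed 1-5 scale (not data from the paper).
ratings = {
    "clip_001": [4, 5, 4, 3, 4],
    "clip_002": [2, 1, 2, 3, 2],
}
scores = mean_opinion_scores(ratings)
for clip_id, (mos, spread) in scores.items():
    print(f"{clip_id}: MOS={mos:.2f}, std={spread:.2f}")
```

In a non-intrusive setting such as the one described, these per-clip scores would serve as regression targets for a model that sees only the degraded audio, with no clean reference signal.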
