Article

Nonintrusive Perceptual Audio Quality Assessment for User-Generated Content Using Deep Learning

Journal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Volume 18, Issue 11, Pages 7780-7789

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TII.2021.3139010

Keywords

Quality assessment; Streaming media; Measurement; Spectrogram; Speech recognition; Bit rate; Background noise; Audio quality assessment; deep learning; gated recurrent unit (GRU); non-intrusive quality metric; user-generated multimedia (UGM)

Funding

  1. [SRG/2020/001871]

With the rise of social media communication, teleconferencing, and online classes, audiovisual communication has become a crucial part of our lives. This article addresses the need for algorithms that measure and enhance user experience, focusing on the quality assessment of user-generated multimedia (UGM). To address the lack of a standard dataset and the significant differences between speech and UGM audio properties, the authors develop the IIT-JMU-UGM audio dataset. The proposed non-intrusive audio quality assessment metric, based on a deep learning framework, outperforms several baseline methods and achieves a Pearson's correlation coefficient of 0.834, indicating that it reflects human auditory perception well.
With the boom of social media communication, teleconferencing, and online classes, audiovisual communication over bandwidth-strained networks has become an integral part of our lives. Consequently, the growing demand for quality of experience necessitates developing algorithms to measure and enrich user experience. Prior studies on audio quality assessment have mainly focused on speech quality and intelligibility, while other categories of user-generated multimedia (UGM) are less explored. Moreover, the frequency-domain properties of speech and UGM audio differ significantly. Furthermore, there is no standard dataset for the quality assessment of UGM. Considering these limitations, in this article, we first develop the IIT-JMU-UGM audio dataset, consisting of 1150 audio clips with diverse context, content, and types of degradation commonly observed in real-world scenarios, annotated with subjective quality scores. We then propose a non-intrusive audio quality assessment metric using a stacked gated-recurrent-unit-based deep learning framework. The proposed model outperforms several baseline methods, including state-of-the-art non-intrusive and intrusive approaches. The resulting Pearson's correlation coefficient of 0.834 indicates that the proposed method effectively mirrors human auditory perception.
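
For readers unfamiliar with the evaluation measure, the following is a minimal sketch of how a Pearson's correlation coefficient between a metric's predicted quality scores and subjective quality scores could be computed. It is not the authors' evaluation code; the function name `pearson_r` and the toy score lists are hypothetical and do not come from the paper or from the IIT-JMU-UGM dataset.

```python
import math


def pearson_r(predicted, subjective):
    """Pearson's correlation between two equal-length score sequences.

    Hypothetical helper for illustration only; not taken from the paper.
    """
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    # Sums of squared deviations (unnormalized variances).
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_s)


if __name__ == "__main__":
    # Toy values chosen only to exercise the function (assumed, not real data).
    model_scores = [1.8, 2.9, 3.1, 4.2, 4.6]
    listener_scores = [2.0, 2.5, 3.4, 4.0, 4.8]
    print(f"Pearson r = {pearson_r(model_scores, listener_scores):.3f}")
```

A value near 1 means the predicted scores rise and fall linearly with the subjective scores, which is how the reported 0.834 can be read.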
