Article

Nonintrusive Perceptual Audio Quality Assessment for User-Generated Content Using Deep Learning

Journal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS
Volume 18, Issue 11, Pages 7780-7789

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TII.2021.3139010

Keywords

Quality assessment; Streaming media; Measurement; Spectrogram; Speech recognition; Bit rate; Background noise; Audio quality assessment; deep learning; gated recurrent unit (GRU); non-intrusive quality metric; user-generated multimedia (UGM)

Funding

  1. [SRG/2020/001871]

With the rise of social media communication, teleconferencing, and online classes, audiovisual communication has become a crucial part of our lives. This article addresses the need for algorithms that measure and enhance user experience, focusing on the quality assessment of user-generated multimedia (UGM). Two challenges, the lack of a standard dataset and the significant differences between the audio properties of speech and UGM, are addressed by developing the IIT-JMU-UGM audio dataset. The proposed non-intrusive audio quality assessment metric, based on a deep learning framework, outperforms competing methods and closely reflects human auditory perception.
With the boom of social media communication, teleconferencing, and online classes, audiovisual communication over bandwidth-strained networks has become an integral part of our lives. Consequently, the growing demand for quality of experience (QoE) calls for algorithms that measure and enrich user experience. In audio quality assessment, prior studies have mainly focused on speech quality and intelligibility, while other categories of user-generated multimedia (UGM) remain less explored. Moreover, the frequency-domain properties of speech and UGM audio differ significantly, and there is no standard dataset for the quality assessment of UGM. To address these limitations, in this article, we first develop the IIT-JMU-UGM audio dataset, which consists of 1150 audio clips with diverse context, content, and types of degradation commonly observed in real-world scenarios, annotated with subjective quality scores. We then propose a non-intrusive audio quality assessment metric using a deep learning framework based on stacked gated recurrent units (GRUs). The proposed model outperforms several baseline methods, including state-of-the-art non-intrusive and intrusive approaches. The resulting Pearson's correlation coefficient of 0.834 indicates that the proposed method closely mirrors human auditory perception.
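The Pearson's correlation coefficient (PCC) reported above measures how linearly the metric's predicted quality scores agree with the subjective scores. For predictions $p_i$ and subjective scores $s_i$ it is $r = \frac{\sum_i (p_i - \bar{p})(s_i - \bar{s})}{\sqrt{\sum_i (p_i - \bar{p})^2 \sum_i (s_i - \bar{s})^2}}$. A minimal Python sketch follows; the function name `pearson_r` and the toy score lists are hypothetical illustrations, and the article's exact evaluation protocol is not described in this abstract.

```python
import math


def pearson_r(predicted, subjective):
    """Pearson's linear correlation between predicted and subjective scores.

    Hypothetical helper for illustration; not taken from the article.
    """
    if len(predicted) != len(subjective) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n
    # Covariance numerator: sum of products of deviations from the means.
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    # Sums of squared deviations for each sequence.
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_p * var_s)


if __name__ == "__main__":
    # Toy scores (hypothetical, not from the IIT-JMU-UGM dataset).
    print(pearson_r([1, 2, 3, 4], [1, 3, 2, 4]))
```

For these toy lists the covariance term is 4 and each sum of squared deviations is 5, so the sketch returns 4/5 = 0.8. A value of 1.0 would mean the predictions are a perfect increasing linear function of the subjective scores.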