Article

Nonintrusive Perceptual Audio Quality Assessment for User-Generated Content Using Deep Learning

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS (2022)

Journal

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS

Volume 18, Issue 11, Pages 7780-7789

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TII.2021.3139010

Keywords

Quality assessment; Streaming media; Measurement; Spectrogram; Speech recognition; Bit rate; Background noise; Audio quality assessment; deep learning; gated recurrent unit (GRU); non-intrusive quality metric; user-generated multimedia (UGM)

Categories

Automation & Control Systems; Computer Science, Interdisciplinary Applications; Engineering, Industrial

Funding

[SRG/2020/001871]

AI Summary

With the rise of social media communication, teleconferencing, and online classes, audiovisual communication has become a crucial part of our lives. This article addresses the need for algorithms to measure and enhance user experience, focusing on the quality assessment of user-generated multimedia (UGM). The lack of a standard dataset and the significant differences between speech and UGM audio properties are challenges that are overcome with the development of the IIT-JMU-UGM audio dataset. The proposed non-intrusive audio quality assessment metric, based on a deep learning framework, outperforms other methods and effectively reflects human auditory perception.

Abstract

With the boom of social media communication, teleconferencing, and online classes, audiovisual communication over bandwidth-strained networks has become an integral part of our lives. Consequently, the growing demand for quality of experience necessitates developing algorithms to measure and enrich user experience. Prior studies on audio quality assessment have mainly focused on assessing speech quality and intelligibility, while other categories of user-generated multimedia (UGM) are less explored. Moreover, the frequency-domain properties of speech and UGM audio differ significantly. Furthermore, there is a lack of a standard dataset for the quality assessment of UGM. Considering these limitations, in this article, we first develop the IIT-JMU-UGM audio dataset consisting of 1150 audio clips, with diverse context, content, and types of degradation commonly observed in real-world scenarios, annotated with subjective quality scores. We then propose a non-intrusive audio quality assessment metric using a stacked gated-recurrent-unit-based deep learning framework. The proposed model outperforms several baseline methods, including state-of-the-art non-intrusive and intrusive approaches. The resulting Pearson's correlation coefficient of 0.834 indicates that the proposed method effectively mirrors human auditory perception.
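The abstract reports agreement with human perception as a Pearson's correlation coefficient between the model's predictions and the subjective quality scores. As a rough illustration of how such a figure is typically computed, the sketch below implements the standard Pearson formula in plain Python. The function name `pearson_correlation` and the score lists are hypothetical and chosen only for illustration; they are not taken from the paper or from the IIT-JMU-UGM dataset.

```python
import math


def pearson_correlation(predicted, subjective):
    """Return Pearson's r between two equal-length sequences of scores."""
    if len(predicted) != len(subjective):
        raise ValueError("both score lists must have the same length")
    n = len(predicted)
    if n < 2:
        raise ValueError("at least two score pairs are required")

    mean_p = sum(predicted) / n
    mean_s = sum(subjective) / n

    # Sum of cross-products and sums of squares of the centred scores.
    cov = sum((p - mean_p) * (s - mean_s) for p, s in zip(predicted, subjective))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_s = sum((s - mean_s) ** 2 for s in subjective)

    if var_p == 0 or var_s == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_p * var_s)


# Hypothetical example values (NOT from the paper): subjective ratings
# on a 1-5 scale and the corresponding model predictions.
subjective_scores = [1.8, 2.5, 3.1, 3.9, 4.6]
predicted_scores = [2.0, 2.4, 3.4, 3.7, 4.4]

plcc = pearson_correlation(predicted_scores, subjective_scores)
print(f"Pearson correlation: {plcc:.3f}")
```

A value close to 1 means the predictions rise and fall linearly with the listeners' ratings. Pearson's coefficient measures linear agreement only, so it says nothing about whether the predicted scores match the ratings in absolute terms.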
