4.6 Article

The Effect of Partial Time-Frequency Masking of the Direct Sound on the Perception of Reverberant Speech

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TASLP.2021.3084742

Keywords

Time-frequency analysis; Reverberation; Speech enhancement; Power measurement; Speech coding; Audio coding; Wavelength measurement; Spatial perception; reverberant speech; direct-to-reverberant ratio; binaural reproduction

Funding

  1. ISRAEL SCIENCE FOUNDATION [966/18]

Research has shown the importance of the direct sound in auditory perception, particularly in reverberant environments. Experimental findings suggest that the percentage of high-DRR time-frequency bins masked in a reverberant speech signal may indicate the quality of spatial perception better than the specific DRR threshold used. These insights could inform spatial audio techniques that reproduce the direct sound of reverberant speech, and could improve spatial perception.
The perception of sound in real-life acoustic environments, such as enclosed rooms or open spaces with reflective objects, is affected by reverberation. Hence, reverberation is extensively studied in the context of auditory perception, with many studies highlighting the importance of the direct sound for perception. Based on this insight, speech processing methods often use time-frequency (TF) analysis to detect TF bins that are dominated by the direct sound, and then use the detected bins to reproduce or enhance the speech signals. The detection of bins dominated by the direct sound is typically based on an objective measure, such as the direct-to-reverberant ratio (DRR). However, the relation between the DRR in the TF bins and the spatial perception of the reverberant sound reproduced from these bins is still not clear. The aim of this paper is to provide some insights into this relation, specifically for reverberant speech, focusing on bins with high DRR. This is done using a listening experiment in which high-DRR bins within a reverberant speech signal were masked in the TF domain, based on various DRR thresholds. The results show that the percentage of high-DRR TF bins that were masked may indicate the quality of spatial perception better than the specific value of the DRR threshold. The insights from this work could be incorporated into spatial audio techniques that reproduce the direct sound of reverberant speech, and could potentially improve spatial perception. This was illustrated with an implementation of directional audio coding, which was studied in an additional listening experiment that supported the previously described results.
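
The threshold-based masking described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the direct and reverberant components are available separately as complex TF arrays (for example, from simulated room impulse responses), that the per-bin DRR is the power ratio of those two components in dB, and that masking means setting a bin to zero. The function names `drr_db` and `mask_high_drr` are hypothetical, and the reported percentage is taken over all bins, which may differ from the paper's exact definition.

```python
import numpy as np


def drr_db(direct_tf, reverb_tf, eps=1e-12):
    """Per-bin direct-to-reverberant ratio in dB (assumed definition).

    direct_tf, reverb_tf: complex arrays of shape (freq_bins, time_frames)
    holding the TF representations of the direct and reverberant parts.
    eps guards against division by zero and log of zero.
    """
    direct_power = np.abs(direct_tf) ** 2
    reverb_power = np.abs(reverb_tf) ** 2
    return 10.0 * np.log10((direct_power + eps) / (reverb_power + eps))


def mask_high_drr(direct_tf, reverb_tf, threshold_db):
    """Zero out TF bins of the mixture whose DRR exceeds threshold_db.

    Returns the masked mixture and the percentage of all bins that were
    masked (an illustrative statistic, assumed here for the sketch).
    """
    high = drr_db(direct_tf, reverb_tf) > threshold_db
    mixture = direct_tf + reverb_tf
    masked = np.where(high, 0.0, mixture)
    percent_masked = 100.0 * high.mean()
    return masked, percent_masked


if __name__ == "__main__":
    # Toy data only: random complex "TF bins", not real speech.
    rng = np.random.default_rng(0)
    shape = (257, 100)
    direct = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    reverb = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
    for thr in (0.0, 5.0, 10.0):
        _, pct = mask_high_drr(direct, reverb, thr)
        print(f"threshold {thr:4.1f} dB: {pct:.1f}% of bins masked")
```

Raising the threshold in this sketch masks fewer bins, so the threshold value and the masked percentage vary together. This is why an experiment has to separate the two to tell which one better tracks perceived spatial quality.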
