4.6 Article

Autonomous In-Situ Soundscape Augmentation via Joint Selection of Masker and Gain

Journal

IEEE SIGNAL PROCESSING LETTERS
Volume 29, Issue -, Pages 1749-1753

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LSP.2022.3194419

Keywords

Recording; Psychoacoustic models; Feature extraction; Predictive models; Computational modeling; Acoustics; Convolution; Affective computing; attention; deep learning; soundscape augmentation

Funding

  1. National Research Foundation
  2. Ministry of National Development, Singapore [COT-V4-2020-1]
  3. Google Cloud Research Credits Program [GCP205559654]
  4. AWS Singapore Cloud Innovation Center

The selection of appropriate maskers and gain levels is crucial for the effectiveness of a soundscape augmentation system. Traditional methods rely on expert opinions or time-consuming listening tests. In this study, a deep learning model was used to jointly select the optimal masker and gain level. The proposed system allows for autonomous and real-time soundscape augmentation, continuously adapting to changes in the acoustic environment.
The selection of maskers and playback gain levels in an in-situ soundscape augmentation system is crucial to its effectiveness in improving the overall acoustic comfort of a given environment. Traditionally, the selection of appropriate maskers and gain levels has been informed by expert opinion, which may not be representative of the target population, or by listening tests, which can be time- and labor-intensive. Furthermore, the resulting static choices of masker and gain often cannot adapt to dynamic real-world soundscapes. In this work, we utilized a deep learning model to perform joint selection of the optimal masker and its gain level for a given soundscape. The proposed model was designed with highly modular building blocks, allowing for an optimized inference process that can quickly search through a large number of masker-gain combinations. In addition, we introduced the use of feature-domain soundscape augmentation conditioned on the digital gain level, eliminating the computationally expensive waveform-domain mixing process during inference, as well as the tedious gain adjustment process required for new maskers. The proposed system was evaluated on a large-scale dataset of subjective responses to augmented soundscapes with 442 participants, with the best model achieving a mean squared error of 0.122 ± 0.005 on the pleasantness score, validating the ability of the model to predict the combined effect of the masker and its gain level on the perceptual pleasantness level. The proposed system thus allows in-situ or mixed-reality soundscape augmentation to be performed autonomously with near real-time latency while continuously accounting for changes in acoustic environments.
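The joint search over masker-gain combinations described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the function `select_masker_and_gain`, the stand-in `toy_predictor`, the feature vectors, and the gain grid are all hypothetical, and the toy scoring rule merely takes the place of a trained pleasantness predictor. The sketch only shows the general idea of scoring every candidate pair in one batch from precomputed features, without any waveform mixing, and keeping the best pair.

```python
import numpy as np


def select_masker_and_gain(soundscape_feat, masker_feats, gains_db, predictor):
    """Return (masker_index, gain_db, score) for the highest predicted pleasantness.

    All names here are illustrative assumptions, not the paper's API.
    """
    masker_feats = np.asarray(masker_feats, dtype=float)
    gains_db = np.asarray(gains_db, dtype=float)

    # Enumerate every masker-gain pair so the predictor sees one batch.
    m_idx, g_idx = np.meshgrid(
        np.arange(len(masker_feats)), np.arange(len(gains_db)), indexing="ij"
    )
    m_idx, g_idx = m_idx.ravel(), g_idx.ravel()

    # Score all candidates from features and gain values only.
    scores = predictor(soundscape_feat, masker_feats[m_idx], gains_db[g_idx])

    best = int(np.argmax(scores))
    return int(m_idx[best]), float(gains_db[g_idx[best]]), float(scores[best])


def toy_predictor(soundscape_feat, masker_batch, gain_batch):
    """Hypothetical stand-in for a trained model.

    Rewards maskers whose features align with the soundscape and penalizes
    gains far from an arbitrary -6 dB reference.
    """
    similarity = masker_batch @ soundscape_feat
    return similarity - 0.01 * (gain_batch + 6.0) ** 2


if __name__ == "__main__":
    soundscape = np.array([1.0, 0.0])
    maskers = np.array([[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]])
    gains = np.array([-12.0, -6.0, 0.0])
    print(select_masker_and_gain(soundscape, maskers, gains, toy_predictor))
```

In a deployed system the predictor would be a trained network and the search would be rerun as the ambient soundscape changes; both of those aspects are outside this sketch.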
