Article

Self-Supervised Sound Promotion Method of Sound Localization from Video

Journal

ELECTRONICS
Volume 12, Issue 17, Pages -

Publisher

MDPI
DOI: 10.3390/electronics12173558

Keywords

audiovisual learning; self-supervised; sound localization; multi-model


Compared with traditional unimodal methods, multimodal audio-visual correspondence learning offers many advantages for video understanding, but it also faces significant challenges. To fully exploit the feature information from both modalities, the semantic information of each modality must be accurately aligned rather than simply concatenated, which raises the question of how to design fusion networks that perform this alignment well. Current algorithms rely heavily on the network's output for sound-object localization while neglecting the possibility that feature information is suppressed by the network's internal structure. We therefore propose a sound promotion method (SPM), a self-supervised framework that increases the contribution of sound in order to improve audiovisual learning performance. We first cluster the audio on its own to generate pseudo-labels and then use these pseudo-labels as targets to train the audio backbone. Finally, we evaluate the effect of our method on several existing approaches using the MUSIC datasets, and the results show that the proposed method improves their performance.
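The abstract describes the audio branch only at a high level: the audio is clustered on its own, and the cluster assignments then serve as pseudo-labels for training the audio backbone. The Python sketch here illustrates that two-stage idea under stated assumptions; it is not the authors' implementation. Assumptions: each audio clip is already represented as a fixed-length feature vector (synthetic data here), k-means is the clustering algorithm (the abstract does not name one), and a linear softmax classifier trained with cross-entropy stands in for the audio backbone. The helper names `kmeans` and `train_head` are hypothetical.

```python
import numpy as np


def kmeans(features, k, iters=100, seed=0):
    """Plain k-means with farthest-point initialisation; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Farthest-point init: start from a random sample, then repeatedly add the
    # sample that is farthest from every centroid chosen so far.
    centroids = [features[rng.integers(len(features))]]
    for _ in range(1, k):
        d2 = ((features[:, None, :] - np.array(centroids)[None]) ** 2).sum(-1).min(1)
        centroids.append(features[d2.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        # Assign each sample to its nearest centroid (squared Euclidean distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


def train_head(features, pseudo_labels, k, epochs=300, lr=0.1):
    """Fit a linear softmax classifier (hypothetical stand-in for the audio backbone)."""
    n, d = features.shape
    W, b = np.zeros((d, k)), np.zeros(k)
    targets = np.eye(k)[pseudo_labels]  # one-hot pseudo-labels from clustering
    for _ in range(epochs):
        logits = features @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - targets) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return W, b


# Hypothetical stand-in for audio embeddings: three well-separated "sound types".
rng = np.random.default_rng(42)
k, per_cluster, dim = 3, 50, 8
centres = 10.0 * np.eye(dim)[:k]
true_ids = np.repeat(np.arange(k), per_cluster)
audio_feats = centres[true_ids] + 0.3 * rng.standard_normal((k * per_cluster, dim))
# Centre, then rescale by one global factor so the cluster geometry is preserved.
audio_feats = (audio_feats - audio_feats.mean(axis=0)) / audio_feats.std()

# Stage 1: cluster the audio alone to obtain pseudo-labels.
pseudo, _ = kmeans(audio_feats, k)
# Stage 2: train the (stand-in) audio backbone to predict those pseudo-labels.
W, b = train_head(audio_feats, pseudo, k)
accuracy = float(((audio_feats @ W + b).argmax(axis=1) == pseudo).mean())
print("accuracy on pseudo-labels:", accuracy)
```

In a real pipeline the features would come from the audio network itself, and self-supervised methods such as DeepCluster alternate the clustering and training stages over many rounds; whether SPM does so is not stated in the abstract.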

