Article

Maximizing mutual information inside intra- and inter-modality for audio-visual event retrieval

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL (2023)

Journal

INTERNATIONAL JOURNAL OF MULTIMEDIA INFORMATION RETRIEVAL

Volume 12, Issue 1, Pages -

Publisher

SPRINGER

DOI: 10.1007/s13735-023-00276-7

Keywords

Audio-visual retrieval; Variational autoencoder; Mutual information; InfoMax-VAE

Categories

Computer Science, Artificial Intelligence; Computer Science, Software Engineering

Summary

This article introduces a method to improve the performance of audio-visual event retrieval by simulating the processing function of the human brain. The proposed InfoIIM network enhances feature representation and alignment, and the InfoMax-VAE model improves feature learning and intra-modal retrieval performance. The effectiveness of the method is verified on the AVE dataset, where it shows superior performance compared to existing algorithms. Future research directions are also suggested to inspire relevant researchers.

Abstract

The human brain can process sound and visual information in overlapping areas of the cerebral cortex, which means that audio and visual information are deeply correlated with each other when we explore the world. To simulate this function of the human brain, audio-visual event retrieval (AVER) has been proposed. AVER uses data from one modality (e.g., audio data) to query data from another. In this work, we aim to improve the performance of audio-visual event retrieval. To achieve this goal, first, we propose a novel network, InfoIIM, which enhances the accuracy of intra-modal feature representation and inter-modal feature alignment. The backbone of this network is a parallel connection of two VAE models with two different encoders and a shared decoder. Second, to enable the VAE to learn better feature representations and to improve intra-modal retrieval performance, we use InfoMax-VAE instead of the vanilla VAE model. Additionally, we study the influence of modality-shared features on the effectiveness of audio-visual event retrieval. To verify the effectiveness of our proposed method, we validate our model on the AVE dataset, and the results show that our model outperforms several existing algorithms on most of the metrics. Finally, we present our future research directions, hoping to inspire relevant researchers.
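
The cross-modal query step described in the abstract (using one modality to query another) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not state which similarity measure or which metrics the authors use, so cosine similarity and average precision are stand-ins, the function names `cosine`, `retrieve`, and `average_precision` are hypothetical, and the embeddings are made-up toy vectors rather than outputs of InfoIIM.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, gallery):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, item) for item in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)

def average_precision(ranking, relevant):
    """Average precision of a ranking, given the set of relevant gallery indices."""
    hits, total = 0, 0.0
    for rank, idx in enumerate(ranking, start=1):
        if idx in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

# Toy embeddings (made up): one audio query and four visual items, assumed to
# already live in a shared latent space produced by some pair of encoders.
audio_query = [0.9, 0.1, 0.0]
visual_gallery = [
    [0.0, 1.0, 0.0],  # index 0: different event
    [1.0, 0.0, 0.0],  # index 1: same event as the query
    [0.7, 0.3, 0.1],  # index 2: same event as the query
    [0.0, 0.0, 1.0],  # index 3: different event
]

ranking = retrieve(audio_query, visual_gallery)
ap = average_precision(ranking, relevant={1, 2})
print(ranking, ap)
```

In this toy setup the two visual items that share the query's event rank first, so the average precision is perfect. Swapping the roles of the two lists gives the visual-to-audio direction.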
