Article

Robust Multimodal Representation Learning With Evolutionary Adversarial Attention Networks

Journal

IEEE Transactions on Evolutionary Computation

Publisher

IEEE - Institute of Electrical and Electronics Engineers, Inc.
DOI: 10.1109/TEVC.2021.3066285

Keywords

Visualization; Correlation; Training; Data models; Task analysis; Knowledge discovery; Generative adversarial networks; Adversarial networks; attention model; evolutionary; multimodal; representation learning

Funding

  1. National Natural Science Foundation of China [61906075, 62002068, 61932010, 61932011, 61972178, 61906074]
  2. Guangdong Basic and Applied Basic Research Foundation [2019A1515011920, 2019B1515120010, 2019A1515011276, 2019A1515011753]
  3. Guangdong Provincial Key R&D Plan [2019B1515120010, 202020022911500032, 2019B010136003]
  4. Science and Technology Program of Guangzhou, China [202007040004]
  5. National Key R&D Plan 2020 [2020YFB1005600]

Summary

Multimodal representation learning is important for multimedia applications; the proposed evolutionary adversarial attention networks combine an attention mechanism with adversarial networks for robust learning. The approach introduces a two-branch visual-textual attention model and achieves substantial performance improvements on image classification and tag recommendation tasks.

Abstract

Multimodal representation learning is beneficial for many multimedia-oriented applications, such as social image recognition and visual question answering. The different modalities of the same instance (e.g., a social image and its corresponding description) are usually correlated and complementary. Most existing approaches for multimodal representation learning are not effective at modeling the deep correlation between different modalities. Moreover, it is difficult for these approaches to deal with the noise within social images. In this article, we propose a deep learning-based approach named evolutionary adversarial attention networks (EAANs), which combines the attention mechanism with adversarial networks through evolutionary training, for robust multimodal representation learning. Specifically, a two-branch visual-textual attention model is proposed to correlate visual and textual content into a joint representation. Adversarial networks are then employed to regularize the representation by matching its posterior distribution to given priors. Finally, the attention model and adversarial networks are integrated into an evolutionary training framework for robust multimodal representation learning. Extensive experiments have been conducted on four real-world datasets: PASCAL, MIR, CLEF, and NUS-WIDE. Substantial performance improvements on the tasks of image classification and tag recommendation demonstrate the superiority of the proposed approach.
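As a rough illustration of the two components described in the abstract, the PyTorch sketch below pairs a two-branch visual-textual attention module with an adversarial regularizer that pushes the joint representation toward a given prior, in the style of an adversarial autoencoder. The layer sizes, the specific attention form, and names such as VisualTextualAttention and Discriminator are assumptions for illustration only, not the authors' EAAN implementation.

```python
# Minimal sketch, assuming generic visual/textual feature vectors as inputs.
# Dimensions and module names are hypothetical, not taken from the paper.
import torch
import torch.nn as nn

class VisualTextualAttention(nn.Module):
    """Fuse a visual feature and a textual feature with learned attention weights."""
    def __init__(self, visual_dim=4096, text_dim=300, joint_dim=512):
        super().__init__()
        self.visual_proj = nn.Linear(visual_dim, joint_dim)
        self.text_proj = nn.Linear(text_dim, joint_dim)
        # Scores how much each branch contributes to the joint representation.
        self.attn = nn.Linear(joint_dim, 1)

    def forward(self, visual_feat, text_feat):
        v = torch.tanh(self.visual_proj(visual_feat))        # (B, joint_dim)
        t = torch.tanh(self.text_proj(text_feat))            # (B, joint_dim)
        branches = torch.stack([v, t], dim=1)                # (B, 2, joint_dim)
        weights = torch.softmax(self.attn(branches), dim=1)  # (B, 2, 1)
        return (weights * branches).sum(dim=1)               # (B, joint_dim)

class Discriminator(nn.Module):
    """Judges whether a code was drawn from the prior or produced by the encoder."""
    def __init__(self, joint_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(joint_dim, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1))

    def forward(self, z):
        return self.net(z)  # raw logit; paired with BCEWithLogitsLoss below

def adversarial_regularization_loss(disc, z_post, z_prior):
    """Regularize the joint representation by matching its posterior to a prior."""
    bce = nn.BCEWithLogitsLoss()
    real_logits = disc(z_prior)
    fake_logits = disc(z_post.detach())
    # Discriminator: separate prior samples from encoder outputs.
    d_loss = bce(real_logits, torch.ones_like(real_logits)) + \
             bce(fake_logits, torch.zeros_like(fake_logits))
    # Encoder/attention side: fool the discriminator so the posterior matches the prior.
    enc_logits = disc(z_post)
    g_loss = bce(enc_logits, torch.ones_like(enc_logits))
    return d_loss, g_loss
```

In use, z_post would be the output of VisualTextualAttention for a batch of image-text pairs and z_prior a batch drawn from the chosen prior (e.g., a standard Gaussian of the same dimension); the two losses are then optimized in alternation as in standard adversarial training.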
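The evolutionary training component can likewise be sketched only at a high level: maintain a small population of encoder variants, perturb them, and keep the fittest according to some task-driven score. The mutation scheme, population size, and fitness function below (mutate, evolve, fitness_fn) are hypothetical placeholders; the paper's actual evolutionary operators are not reproduced here.

```python
# Hedged sketch of an evolutionary selection loop over candidate encoders.
# The variation operator (Gaussian parameter noise) and the fitness are assumptions.
import copy
import random
import torch

def mutate(encoder, scale=0.01):
    """Return a perturbed copy of an encoder (simple Gaussian parameter noise)."""
    child = copy.deepcopy(encoder)
    with torch.no_grad():
        for p in child.parameters():
            p.add_(scale * torch.randn_like(p))
    return child

def evolve(population, fitness_fn, n_survivors=2, n_children=4):
    """One generation: mutate, evaluate, and keep the fittest encoders."""
    children = [mutate(random.choice(population)) for _ in range(n_children)]
    candidates = population + children
    survivors = sorted(candidates, key=fitness_fn, reverse=True)[:n_survivors]
    return survivors

# Hypothetical usage with the classes sketched above and a task-specific fitness
# (e.g., validation accuracy minus an adversarial regularization penalty):
# population = [VisualTextualAttention() for _ in range(2)]
# for generation in range(10):
#     population = evolve(population, fitness_fn=my_fitness)
```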
