Article

Multi-Output Learning Based on Multimodal GCN and Co-Attention for Image Aesthetics and Emotion Analysis

Journal

MATHEMATICS
Volume 9, Issue 12, Pages -

Publisher

MDPI
DOI: 10.3390/math9121437

Keywords

image aesthetics and emotion analysis; multi-output learning; image semantic recognition; multimodal learning; Graph Convolutional Network; co-attention; intermediate supervision

Funding

  1. National Key R&D Program of China [2018YFB1004700]
  2. National Natural Science Foundation of China [61772122, 61872074]

Abstract

With the development of social networks and intelligent terminals, it has become increasingly convenient to share and acquire images. The massive growth in the number of social images creates higher demands for automatic image processing, especially from the aesthetic and emotional perspectives. Both aesthetics assessment and emotion recognition require the computer to simulate high-level visual perception and understanding, which belongs to the field of image processing and pattern recognition. However, existing methods often ignore the prior knowledge of images and the intrinsic relationship between the aesthetic and emotional perspectives. Recently, machine learning and deep learning have become powerful tools for solving mathematical problems in computing, such as image processing and pattern recognition. Both images and abstract concepts can be converted into numerical matrices, and the mapping relations between them can then be established mathematically on computers. In this work, we propose an end-to-end multi-output deep learning model based on a multimodal Graph Convolutional Network (GCN) and co-attention for joint aesthetic and emotion analysis. In our model, a stacked multimodal GCN encodes the features under the guidance of a correlation matrix, and a co-attention module helps the aesthetics and emotion feature representations learn from each other interactively. Experimental results indicate that the proposed model achieves competitive performance on the IAE dataset. Results on the AVA and ArtPhoto datasets further demonstrate the generalization ability of our model.
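The two building blocks named in the abstract can be sketched minimally in NumPy. This is an illustrative reconstruction under standard formulations, not the authors' implementation: the GCN step uses the common symmetrically normalized propagation rule, and the co-attention step uses a bilinear affinity matrix between the two branches. All names here (`gcn_layer`, `co_attention`, `Wb`) are assumptions for illustration.

```python
import numpy as np

def gcn_layer(H, A, W):
    """One GCN propagation step: H' = ReLU(D^-1/2 (A + I) D^-1/2 H W).

    H: (n, d) node features; A: (n, n) adjacency (e.g. a correlation
    matrix over label/semantic nodes); W: (d, d') learnable weights.
    """
    A_hat = A + np.eye(A.shape[0])            # add self-loops
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))    # symmetric normalization
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

def _softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def co_attention(Fa, Fe, Wb):
    """Bilinear co-attention between aesthetics and emotion features.

    Fa: (n, d) aesthetics features; Fe: (m, d) emotion features;
    Wb: (d, d) learnable bilinear weights. Each branch attends over
    the other, so the two representations inform each other.
    """
    C = np.tanh(Fa @ Wb @ Fe.T)               # (n, m) affinity matrix
    Fa_att = _softmax(C) @ Fe                 # aesthetics attends to emotion
    Fe_att = _softmax(C.T) @ Fa               # emotion attends to aesthetics
    return Fa_att, Fe_att
```

In the full model these modules would be stacked and trained end to end; the sketch only shows the forward computation of a single layer of each.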
