Article

A social emotion classification approach using multi-model fusion

Publisher

Elsevier
DOI: 10.1016/j.future.2019.07.007

Keywords

Multimodal fusion; Emotion analysis; 3D convolutional neural network; Recurrent neural network

Funding

  1. Beijing Institute of Technology Research Fund Program for Young Scholars, China
  2. National Natural Science Foundation of China [61772099]
  3. Program for Innovation Team Building at Institutions of Higher Education in Chongqing, China [CXTDG201602010]
  4. University Outstanding Achievements Transformation Funding Project of Chongqing, China [KJZH17116]
  5. Artificial Intelligence Technology Innovation Important Subject Projects of Chongqing, China [cstc2017rgzn-zdyf0140]
  6. Innovation and Entrepreneurship Demonstration Team Cultivation Plan of Chongqing, China [cstc2017kjrc-cxcytd0063]
  7. China Postdoctoral Science Foundation [2014M562282]
  8. Postdoctoral Project Supported in Chongqing, China [Xm2014039]
  9. Wenfeng Leading Top Talent Project in CQUPT, China
  10. New Research Area Development Programme [A201544]
  11. Science and Technology Research Project of Chongqing Municipal Education Committee, China [KJ1400422, KJ1500441, KJ1704089, KJ1704081]
  12. Chongqing Research Program of Basic Research and Frontier Technology, China [cstc2017jcyjAX0270, cstc2018jcyjA0672, cstc2017jcyjAX0071]
  13. Industry Important Subject Projects of Chongqing, China [CSTC2018JSZX-CYZTZX0178, CSTC2018JSZX-CYZTZX0185]

Abstract

With the proliferation of online video publishing, the amount of multimodal content on the Internet has grown exponentially, and research on emotion analysis has developed from traditional single-modality analysis to complex multimodal analysis. Even among recent studies that consider multiple modalities, however, most have paid little attention to the emotion information obtained by merging visual and audio cues at the feature or decision level. In this paper, we extract visual, textual, and audio information from video and propose a multimodal emotion classification framework to capture the emotions of users in social networks. We design a 3DCLS (3D Convolutional-Long Short-Term Memory) hybrid model that classifies visual emotions, as well as a CNN-RNN hybrid model that classifies text-based emotions; the visual, audio, and text modalities are then combined to generate the final classification results. Experiments on the MOUD and IEMOCAP emotion datasets show that the proposed framework outperforms existing models in multimodal emotion analysis. (C) 2019 Elsevier B.V. All rights reserved.
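
The paper itself specifies the exact architectures; as a rough illustration of the kind of pipeline the abstract describes, the PyTorch sketch below wires together a 3D-CNN + LSTM ("3DCLS"-style) visual branch, a CNN-RNN text branch, and decision-level fusion of per-modality class scores. All layer sizes, the four-class label set, the stand-in audio scores, and the equal fusion weights are illustrative assumptions, not the authors' configuration.

```python
# Minimal sketch of a 3D-CNN+LSTM visual branch, a CNN-RNN text branch,
# and decision-level fusion. Hyperparameters are assumptions for illustration.
import torch
import torch.nn as nn

NUM_CLASSES = 4  # assumption: a 4-class emotion setup


class Visual3DCLS(nn.Module):
    """3D convolutions over a video clip, then an LSTM over clip features."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d((1, 2, 2)),
            nn.Conv3d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((None, 1, 1)),  # keep time axis, pool space away
        )
        self.lstm = nn.LSTM(input_size=64, hidden_size=128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, clips):                     # clips: (batch, 3, time, H, W)
        feats = self.conv(clips)                  # (batch, 64, time, 1, 1)
        feats = feats.flatten(2).transpose(1, 2)  # (batch, time, 64)
        _, (h_n, _) = self.lstm(feats)
        return self.head(h_n[-1])                 # per-class scores


class TextCNNRNN(nn.Module):
    """1D convolution over word embeddings, then a GRU over conv features."""
    def __init__(self, vocab_size=20000, emb_dim=100, num_classes=NUM_CLASSES):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, 64, kernel_size=3, padding=1)
        self.rnn = nn.GRU(64, 128, batch_first=True)
        self.head = nn.Linear(128, num_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        x = self.emb(tokens).transpose(1, 2)          # (batch, emb_dim, seq_len)
        x = torch.relu(self.conv(x)).transpose(1, 2)  # (batch, seq_len, 64)
        _, h_n = self.rnn(x)
        return self.head(h_n[-1])                     # per-class scores


def decision_fusion(score_list, weights=None):
    """Late fusion: weighted average of per-modality softmax scores."""
    probs = [torch.softmax(s, dim=-1) for s in score_list]
    if weights is None:
        weights = [1.0 / len(probs)] * len(probs)  # assumption: equal weights
    fused = sum(w * p for w, p in zip(weights, probs))
    return fused.argmax(dim=-1)


if __name__ == "__main__":
    video = torch.randn(2, 3, 16, 64, 64)        # 2 clips of 16 RGB frames
    text = torch.randint(0, 20000, (2, 30))      # 2 token-id sequences
    audio_scores = torch.randn(2, NUM_CLASSES)   # stand-in for an audio branch
    v, t = Visual3DCLS()(video), TextCNNRNN()(text)
    print(decision_fusion([v, t, audio_scores])) # fused emotion labels
```

Feature-level fusion, which the abstract also mentions as an alternative, would instead concatenate the branch features before a shared classifier rather than averaging per-branch softmax scores.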
