4.7 Article

Micro-Video Popularity Prediction Via Multimodal Variational Information Bottleneck

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 25, Issue -, Pages 24-37

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3120537

Keywords

Social networking (online); Feature extraction; Visualization; Hidden Markov models; Uncertainty; Fuses; Task analysis; Micro-video analysis; multimodal learning; popularity prediction; deep variational information bottleneck; product-of-experts system

In this paper, a Hierarchical Multimodal Variational Encoder-Decoder (HMMVED) is proposed to predict the popularity of micro-videos by leveraging user information and micro-video content. The multimodal variational encoder-decoder encodes input modalities to a lower dimensional stochastic embedding to decode the popularity of micro-videos. A user encoder-decoder is designed to learn the prior Gaussian embedding of the micro-video from user information, while a micro-video encoder-decoder encodes the refined posterior distribution of the micro-video embedding from content features.
In this paper, we propose a Hierarchical Multimodal Variational Encoder-Decoder (HMMVED) to predict the popularity of micro-videos by comprehensively leveraging the user information and the micro-video content in a hierarchical fashion. In particular, the multimodal variational encoder-decoder framework encodes the input modalities to a lower dimensional stochastic embedding, from which the popularity of micro-videos can be decoded. Considering the leading role of the user's social influence in social media for information dissemination, a user encoder-decoder is designed to learn the prior Gaussian embedding of the micro-video from the user information, which is informative about the coarse-grained popularity. In order to incorporate the fluctuation around the coarse-grained popularity caused by the diverse multimodal content, in the micro-video encoder-decoder, the refined posterior distribution of the micro-video embedding is encoded from the content features while encouraged to be close to the learned prior distribution. The fine-grained popularity of each micro-video is decoded from the posterior embedding of the micro-video. Based on the multimodal extension of variational information bottleneck theory, we show that the learned latent embeddings of micro-videos are maximally expressive about the popularity whilst maximally compressing the information from input modalities. Extensive experiments conducted on two real-world datasets demonstrate the effectiveness of the proposed method.
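The abstract describes a posterior embedding (from content features) that is encouraged to stay close to a prior embedding (from user information) while remaining predictive of popularity. The following is a minimal sketch of that kind of objective, assuming diagonal Gaussian embeddings, a squared-error prediction term, and a trade-off weight `beta`; none of these details are given on this page. The function names `kl_diag_gaussians` and `vib_style_loss` are hypothetical, and this is not the authors' implementation.

```python
import math
from typing import Sequence


def kl_diag_gaussians(
    mu_q: Sequence[float],
    var_q: Sequence[float],
    mu_p: Sequence[float],
    var_p: Sequence[float],
) -> float:
    """KL(q || p) between two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total


def vib_style_loss(
    y_true: float,
    y_pred: float,
    mu_q: Sequence[float],
    var_q: Sequence[float],
    mu_p: Sequence[float],
    var_p: Sequence[float],
    beta: float = 0.1,  # assumed trade-off weight, not taken from the paper
) -> float:
    """Prediction error plus a weighted penalty for drifting from the prior."""
    prediction_term = (y_true - y_pred) ** 2  # assumed squared-error decoder loss
    compression_term = kl_diag_gaussians(mu_q, var_q, mu_p, var_p)
    return prediction_term + beta * compression_term


# Illustrative numbers only: a prior standing in for the user encoder output
# and a posterior standing in for the micro-video encoder output.
prior_mu, prior_var = [0.0, 0.0], [1.0, 1.0]
post_mu, post_var = [0.5, -0.5], [0.5, 0.5]

kl_value = kl_diag_gaussians(post_mu, post_var, prior_mu, prior_var)
loss_value = vib_style_loss(1.0, 0.8, post_mu, post_var, prior_mu, prior_var)
```

The KL term is zero when the posterior equals the prior and grows as the content-based embedding moves away from the user-based one, which is one way to read "encouraged to be close to the learned prior distribution".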
