Article

Multimodal Hierarchical Graph Collaborative Filtering for Multimedia-Based Recommendation

Journal

IEEE Transactions on Computational Social Systems

Publisher

IEEE - Institute of Electrical and Electronics Engineers, Inc.
DOI: 10.1109/TCSS.2022.3226862

Keywords

Collaborative filtering (CF); graph convolution network (GCN); multimodal user preference; recommender system

Funding

  1. National Natural Science Foundation of China [62272143]
  2. University Synergy Innovation Program of Anhui Province [GXXT-2022-054]
  3. Anhui Provincial Major Science and Technology Project [202203a05020025]
  4. Seventh Special Support Plan for Innovation and Entrepreneurship in Anhui Province


Multimedia-based recommendation (MMRec) is a challenging task that goes beyond the collaborative filtering (CF) schema, which only captures collaborative signals from interactions, and explores multimodal user preference cues hidden in complex multimedia content. Despite the significant progress of current solutions for MMRec, we argue that they are limited by multimodal noise contamination. Specifically, a considerable amount of preference-irrelevant multimodal noise (e.g., the background, layout, and brightness in a product image) is incorporated into the representation learning of items, which contaminates the modeling of multimodal user preferences. Moreover, most recent studies are based on graph convolution networks (GCNs), which further amplify multimodal noise contamination because noisy information is continuously propagated over the user-item interaction graph as recursive neighbor aggregations are performed. To address this problem, instead of the common MMRec paradigm that learns user preferences in an integrated manner, we propose a hierarchical framework that separately learns collaborative signals and multimodal preference cues, thus preventing multimodal noise from flowing into collaborative signals. Then, to alleviate noise contamination in multimodal user preference modeling, we propose extracting semantic entities from multimodal content that are more relevant to user interests, which models semantic-level multimodal preferences and thus removes a large fraction of noise. Furthermore, we use the full multimodal features to model content-level multimodal preferences like existing MMRec solutions, which ensures sufficient utilization of multimodal information. Overall, we develop a novel model, multimodal hierarchical graph CF (MHGCF), which consists of three types of GCN modules tailored to capture collaborative signals, semantic-level preferences, and content-level preferences, respectively.
We conduct extensive experiments to demonstrate the effectiveness of MHGCF and its components. The complete data and code of MHGCF are available at.
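The abstract's core observation is that recursive neighbor aggregation over the user-item interaction graph propagates item-side noise into user representations. The sketch below illustrates that mechanism with LightGCN-style propagation (symmetric degree normalization, layer averaging) on a toy interaction matrix; it is a minimal, illustrative sketch assuming dense NumPy arrays and random embeddings, not the authors' MHGCF implementation.

```python
import numpy as np

# Toy user-item interaction matrix R (3 users x 4 items); 1 = interaction.
R = np.array([
    [1, 0, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 1],
], dtype=float)

d = 8  # embedding size (illustrative choice)
rng = np.random.default_rng(0)
U = rng.normal(size=(3, d))  # user embeddings
V = rng.normal(size=(4, d))  # item embeddings

# Symmetric degree normalization of the bipartite adjacency.
Du = R.sum(axis=1, keepdims=True)  # user degrees, shape (3, 1)
Dv = R.sum(axis=0, keepdims=True)  # item degrees, shape (1, 4)
R_norm = R / np.sqrt(Du) / np.sqrt(Dv)

def propagate(U, V, layers=2):
    """Recursive neighbor aggregation over the user-item graph.

    Each layer mixes a node's embedding with its neighbors', which is
    exactly how preference-irrelevant noise in item features would also
    spread to users over successive layers.
    """
    U_l, V_l = U, V
    outs = [(U_l, V_l)]
    for _ in range(layers):
        U_l, V_l = R_norm @ V_l, R_norm.T @ U_l
        outs.append((U_l, V_l))
    # Average the per-layer embeddings into the final representations.
    U_out = np.mean([u for u, _ in outs], axis=0)
    V_out = np.mean([v for _, v in outs], axis=0)
    return U_out, V_out

U_final, V_final = propagate(U, V)
scores = U_final @ V_final.T  # user-item preference scores for ranking
print(scores.shape)  # (3, 4)
```

In a hierarchical design like the one described, one such propagation module would run on ID embeddings alone (collaborative signals) while separate modules handle semantic-level and content-level multimodal features, so that any noise spreading during aggregation stays confined to its own channel.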
