Article

Learning Feature Representation and Partial Correlation for Multimodal Multi-Label Data

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 23, Pages 1882-1894

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2020.3004963

Keywords

Semantics; Correlation; Task analysis; Data models; Learning systems; Kernel; Deep learning; Cross-modal retrieval; correlation learning; feature learning; partial correlation

Funding

  1. National Key R&D Program of China [2018AAA0102003]
  2. National Natural Science Foundation of China [61672497, 61836002, 61620106009, U1636214, 61931008]
  3. Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-SYS013]

Abstract

User-provided annotations in existing multimodal datasets are sometimes inappropriate for model learning and can hinder cross-modal retrieval. To handle this issue, we propose a discriminative and noise-robust cross-modal retrieval method, called FLPCL, which consists of deep feature learning and partial correlation learning. Deep feature learning uses label supervision to guide the training of a deep neural network for each modality, aiming to find modality-specific deep feature representations that preserve the similarity and discrimination information among multimodal data. Building on deep feature learning, partial correlation learning infers the direct association between different modalities by removing the effect of the common underlying semantics from each modality. This is achieved by maximizing the canonical correlation of the feature representations of different modalities conditioned on the label modality. Unlike existing works that build an indirect association between modalities by incorporating semantic labels, FLPCL learns more effective and robust multimodal latent representations by explicitly preserving both intra-modal and inter-modal relationships among multimodal data. Extensive experiments on three cross-modal datasets show that our method outperforms state-of-the-art methods on cross-modal retrieval tasks.
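
As a rough illustration of the partial correlation learning step described above (canonical correlation between two modalities conditioned on the label modality), the following NumPy sketch residualizes each feature matrix against the labels by least squares and then runs classical CCA on the residuals. The function name partial_cca, the toy data, and the closed-form solver are illustrative assumptions; the paper itself optimizes this objective end-to-end through modality-specific deep networks, which this sketch does not reproduce.

```python
# Minimal sketch of partial canonical correlation analysis (partial CCA):
# regress the label modality Z out of feature matrices X and Y, then run
# classical CCA on the residuals. This is a generic illustration, not the
# authors' FLPCL implementation.
import numpy as np

def _residualize(A, Z):
    """Remove the linear effect of Z (with intercept) from A via least squares."""
    Z1 = np.hstack([Z, np.ones((Z.shape[0], 1))])      # add intercept column
    coef, *_ = np.linalg.lstsq(Z1, A, rcond=None)       # regress A on Z
    return A - Z1 @ coef                                 # residuals of A given Z

def partial_cca(X, Y, Z, n_components=2, reg=1e-4):
    """Canonical correlations of X and Y conditioned on Z."""
    Xr, Yr = _residualize(X, Z), _residualize(Y, Z)
    Xr -= Xr.mean(axis=0)
    Yr -= Yr.mean(axis=0)
    n = X.shape[0]
    Cxx = Xr.T @ Xr / n + reg * np.eye(Xr.shape[1])      # regularized covariances
    Cyy = Yr.T @ Yr / n + reg * np.eye(Yr.shape[1])
    Cxy = Xr.T @ Yr / n
    # Whiten both modalities; the singular values of the whitened cross-covariance
    # are the canonical correlations of the residuals.
    Mx = np.linalg.inv(np.linalg.cholesky(Cxx)).T
    My = np.linalg.inv(np.linalg.cholesky(Cyy)).T
    U, s, Vt = np.linalg.svd(Mx.T @ Cxy @ My)
    Wx = Mx @ U[:, :n_components]                        # projection for modality X
    Wy = My @ Vt.T[:, :n_components]                     # projection for modality Y
    return s[:n_components], Wx, Wy

# Toy usage: one-hot labels Z, plus a latent factor shared by X and Y beyond the labels.
rng = np.random.default_rng(0)
Z = np.eye(5)[rng.integers(0, 5, size=200)]
shared = rng.normal(size=(200, 3))
X = Z @ rng.normal(size=(5, 10)) + shared @ rng.normal(size=(3, 10)) + 0.1 * rng.normal(size=(200, 10))
Y = Z @ rng.normal(size=(5, 8)) + shared @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))
corrs, Wx, Wy = partial_cca(X, Y, Z)
print(corrs)  # high values indicate a direct X-Y association not explained by the labels
```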
