Article

Learning Feature Representation and Partial Correlation for Multimodal Multi-Label Data

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 23, Pages 1882-1894

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TMM.2020.3004963

Keywords

Semantics; Correlation; Task analysis; Data models; Learning systems; Kernel; Deep learning; Cross-modal retrieval; correlation learning; feature learning; partial correlation

Funding

  1. National Key R&D Program of China [2018AAA0102003]
  2. National Natural Science Foundation of China [61672497, 61836002, 61620106009, U1636214, 61931008]
  3. Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-SYS013]

Abstract

The proposed FLPCL method utilizes deep feature learning and partial correlation learning to infer relationships between modalities and learn effective multimodal representations. It outperforms state-of-the-art methods on cross-modal retrieval tasks.
User-provided annotations in existing multimodal datasets are sometimes inappropriate for model learning and can hinder cross-modal retrieval. To address this issue, we propose a discriminative and noise-robust cross-modal retrieval method, called FLPCL, which consists of deep feature learning and partial correlation learning. Deep feature learning uses label supervision to guide the training of a deep neural network for each modality, aiming to find modality-specific deep feature representations that preserve the similarity and discrimination information among multimodal data. Building on these features, partial correlation learning infers direct associations between different modalities by removing the effect of the common underlying semantics from each modality. This is achieved by maximizing the canonical correlation of the feature representations of different modalities conditioned on the label modality. Unlike existing works that build indirect associations between modalities by incorporating semantic labels, our FLPCL method learns more effective and robust multimodal latent representations by explicitly preserving both intra-modal and inter-modal relationships among multimodal data. Extensive experiments on three cross-modal datasets show that our method outperforms state-of-the-art methods on cross-modal retrieval tasks.
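As a rough illustration of the partial correlation step described above (a sketch under assumed inputs, not the authors' implementation), the Python snippet below computes partial canonical correlations between two modality feature matrices X and Y after regressing out a label matrix Z, mirroring the idea of conditioning the canonical correlation on the label modality. All names, shapes, and the ridge regularizer are hypothetical.

    import numpy as np

    def residualize(A, Z):
        # Remove the component of A linearly explained by Z:
        # A_res = A - Z (Z^T Z)^{-1} Z^T A, computed via least squares.
        beta, *_ = np.linalg.lstsq(Z, A, rcond=None)
        return A - Z @ beta

    def sqrtm_psd(C):
        # Symmetric PSD matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

    def partial_cca(X, Y, Z, reg=1e-4):
        # Center all views, then regress the label modality out of X and Y.
        Zc = Z - Z.mean(axis=0)
        Xr = residualize(X - X.mean(axis=0), Zc)
        Yr = residualize(Y - Y.mean(axis=0), Zc)
        n = X.shape[0]
        # Regularized covariance blocks of the residualized views.
        Cxx = Xr.T @ Xr / n + reg * np.eye(Xr.shape[1])
        Cyy = Yr.T @ Yr / n + reg * np.eye(Yr.shape[1])
        Cxy = Xr.T @ Yr / n
        # Singular values of the whitened cross-covariance are the
        # partial canonical correlations of X and Y given Z.
        T = np.linalg.inv(sqrtm_psd(Cxx)) @ Cxy @ np.linalg.inv(sqrtm_psd(Cyy))
        return np.linalg.svd(T, compute_uv=False)

    # Toy example: two views that are related only through the labels Z,
    # so the partial correlations given Z should be close to zero.
    rng = np.random.default_rng(0)
    Z = rng.integers(0, 2, size=(200, 10)).astype(float)
    X = Z @ rng.normal(size=(10, 64)) + 0.1 * rng.normal(size=(200, 64))
    Y = Z @ rng.normal(size=(10, 64)) + 0.1 * rng.normal(size=(200, 64))
    print(partial_cca(X, Y, Z)[:5])

In the full FLPCL method these correlations would be maximized with respect to the per-modality deep networks during training; the snippet only evaluates them for fixed feature matrices.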
