Harmonized Multimodal Learning with Gaussian Process Latent Variable Models

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE (2021)

Journal

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE

Volume 43, Issue 3, Pages 858-872

Publisher

IEEE COMPUTER SOC

DOI: 10.1109/TPAMI.2019.2942028

Keywords

Multimodal learning; Gaussian process; latent variable modeling; cross-modal retrieval

Funding

National Basic Research Program of China (973 Program) [2015CB351802]
National Natural Science Foundation of China [61672497, 61931008, 61620106009, U1636214, 61836002]
Key Research Program of Frontier Sciences of CAS [QYZDJ-SSW-SYS013]
China Postdoctoral Science Foundation [119103S291]

Automated Summary

This paper introduces a novel multimodal learning scheme called "Harmonization," which jointly learns latent representations and kernel hyperparameters to address modality heterogeneity. The proposed method outperforms traditional individual learning schemes and shows superior performance in cross-modal retrieval tasks.

Abstract

Multimodal learning aims to discover the relationship between multiple modalities. It has become an important research topic due to extensive multimodal applications such as cross-modal retrieval. This paper addresses the modality heterogeneity problem with Gaussian process latent variable models (GPLVMs), which represent multimodal data in a common space. Previous multimodal GPLVM extensions generally learn latent representations and kernel hyperparameters individually, which ignores their intrinsic relationship. To exploit the strong complementarity among different modalities and GPLVM components, we develop a novel learning scheme called Harmonization, in which latent representations and kernel hyperparameters are jointly learned from each other. Going beyond the correlation fitting and intra-modal structure preservation paradigms widely used in existing studies, harmonization is derived in a model-driven manner to encourage agreement between the modality-specific GP kernels and the similarity of latent representations. We present a range of multimodal learning models by incorporating the harmonization mechanism into several representative GPLVM-based approaches. Experimental results on four benchmark datasets show that the proposed models outperform strong baselines on cross-modal retrieval tasks, and that the harmonized multimodal learning method is superior in discovering semantically consistent latent representations.
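
The agreement idea in the abstract can be made concrete with a small sketch. The abstract does not state the harmonization terms, so everything below is an assumption for illustration only: the penalty is taken to be a squared Frobenius distance between each modality-specific kernel and a fixed similarity matrix, both computed on the shared latent points, and the names `rbf_kernel` and `harmonization_penalty` are hypothetical. Plain Python is used so that the example has no dependencies.

```python
import math


def rbf_kernel(points, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix over a list of equal-length vectors."""
    n = len(points)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            kernel[i][j] = variance * math.exp(-0.5 * sq_dist / lengthscale ** 2)
    return kernel


def harmonization_penalty(latent, modality_hyperparams):
    """Assumed penalty: sum over modalities of ||K_m - S||_F^2.

    S is a unit-scale RBF similarity on the shared latent points (an
    assumption), and K_m is the same kernel family evaluated with the
    (lengthscale, variance) pair of modality m.
    """
    similarity = rbf_kernel(latent)
    total = 0.0
    for lengthscale, variance in modality_hyperparams:
        k_m = rbf_kernel(latent, lengthscale, variance)
        total += sum(
            (k - s) ** 2
            for k_row, s_row in zip(k_m, similarity)
            for k, s in zip(k_row, s_row)
        )
    return total


# Three shared latent points and two modalities (e.g. image and text).
latent = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
matched = harmonization_penalty(latent, [(1.0, 1.0), (1.0, 1.0)])
mismatched = harmonization_penalty(latent, [(0.5, 1.0), (2.0, 1.5)])
print(matched, mismatched)
```

Because the penalty depends on both the latent points and the per-modality hyperparameters, minimizing it alongside a GPLVM likelihood would couple the two sets of parameters, which is the kind of joint learning the abstract describes. The specific terms the authors use may differ from this stand-in.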
