Article

Reconstruction regularized low-rank subspace learning for cross-modal retrieval

Journal

PATTERN RECOGNITION
Volume 113

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2020.107813

Keywords

Cross-modal retrieval; Low-rank subspace learning; Reconstruction regularization

Funding

  1. National Natural Science Foundation (NSF) of China [62006140]
  2. NSF of Shandong Province [ZR2020QF106]
  3. Future Talents Research Funds of Shandong University
  4. Fundamental Research Funds of Shandong University
  5. NSF of China [61772310, 61702300, 61702302, 61802231, U1836216, 61625301, 61731018, 61632003, 61771026]
  6. Project of Thousand Youth Talents 2016
  7. Shandong Provincial Natural Science Foundation [ZR2019JQ23, ZR2019QF001]
  8. Major Scientific Research Project of Zhejiang Lab [2019KB0AC01, 2019KB0AB02]
  9. Beijing Academy of Artificial Intelligence
  10. Qualcomm
  11. National Key Research and Development Program of China [2017YFB1002601]


In this study, a latent subspace learning approach is proposed for cross-modal matching and retrieval tasks. The method learns a shared subspace for multi-modal data so that cross-modal similarity can be measured efficiently, incorporating both a reconstruction term on the original data and a low-rank regularization term. The proposed method handles both supervised and unsupervised tasks, and shows high efficiency and strong performance in experiments.
With the rapid growth of multi-modal data on the internet, cross-modal matching or retrieval has received much attention recently. It aims to use one type of data as the query and retrieve results from a database of another type. The most popular approach to this task is latent subspace learning, which learns a shared subspace for multi-modal data so that cross-modal similarity can be measured efficiently. Instead of adopting traditional regularization terms, we require the latent representation to recover the original multi-modal information, which acts as a reconstruction regularization term. In addition, we assume that the features of different views for samples of the same category share the same representation in the latent space. Since the number of classes is generally smaller than both the number of samples and the feature dimension, the latent feature matrix of the training instances should be low-rank. To learn the optimal latent representation, we propose a reconstruction-based term that recovers the original multi-modal data and a low-rank term that regularizes the subspace learning. Our method can deal with both supervised and unsupervised cross-modal retrieval tasks, so it also works well in situations where semantic labels are hard to obtain. We propose an efficient algorithm to optimize our framework. To evaluate the performance of our method, we conduct extensive experiments on various datasets. The experimental results show that our method is very efficient and outperforms state-of-the-art subspace learning approaches. (c) 2021 Elsevier Ltd. All rights reserved.
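The ingredients described above (a shared latent representation, a reconstruction regularizer that recovers each modality, and a low-rank penalty on the latent matrix) can be sketched in a few lines. The code below is a minimal illustrative sketch, not the authors' algorithm: the objective, hyperparameters, and the alternating proximal-gradient update are assumptions chosen for clarity. It learns encoders `W1, W2` and decoders `P1, P2` by ridge regression, and updates the latent matrix `Z` with a gradient step on the smooth terms followed by singular value thresholding, the proximal operator of the nuclear norm that encourages low rank.

```python
import numpy as np

def svt(M, tau):
    # Singular value thresholding: prox of tau * (nuclear norm).
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt

def ridge(A, B, lam=1e-3):
    # Closed-form solution of min_W ||A W - B||_F^2 + lam ||W||_F^2.
    d = A.shape[1]
    return np.linalg.solve(A.T @ A + lam * np.eye(d), A.T @ B)

def fit_subspace(X1, X2, k=5, lam_rec=0.1, lam_rank=0.1,
                 iters=50, step=0.01, seed=0):
    """Toy alternating scheme (hypothetical, for illustration):
    min_{Z,W,P}  ||X1 W1 - Z||^2 + ||X2 W2 - Z||^2
               + lam_rec (||Z P1 - X1||^2 + ||Z P2 - X2||^2)
               + lam_rank ||Z||_*
    """
    n = X1.shape[0]
    Z = np.random.default_rng(seed).standard_normal((n, k)) * 0.01
    for _ in range(iters):
        W1, W2 = ridge(X1, Z), ridge(X2, Z)   # modality -> latent encoders
        P1, P2 = ridge(Z, X1), ridge(Z, X2)   # latent -> modality decoders
        # Gradient of the smooth (projection + reconstruction) terms w.r.t. Z.
        g = 2 * (Z - X1 @ W1) + 2 * (Z - X2 @ W2) \
            + 2 * lam_rec * ((Z @ P1 - X1) @ P1.T + (Z @ P2 - X2) @ P2.T)
        # Proximal step: the nuclear-norm penalty keeps Z low-rank.
        Z = svt(Z - step * g, step * lam_rank)
    return W1, W2, Z
```

At retrieval time, a query from one modality would be projected as `x @ W1` and matched against database items projected as `X2 @ W2` by nearest-neighbor search in the shared space. The paper's actual formulation and optimizer may differ substantially from this sketch.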

