Article

Fusion-Based Correlation Learning Model for Cross-Modal Remote Sensing Image Retrieval

Journal

IEEE Geoscience and Remote Sensing Letters
Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/LGRS.2021.3131592

Keywords

Feature extraction; Semantics; Image retrieval; Fuses; Correlation; Buildings; Representation learning; Correlation learning; cross-modal retrieval; multimodal fusion; text-remote sensing (RS) image matching

Funding

  1. National Natural Science Foundation of China [61790550, 61790554, 91538201]

Abstract

In this study, a fusion-based correlation learning model is proposed to address the heterogeneity gap in remote sensing image-text retrieval. By designing a cross-modal fusion network and applying knowledge distillation, the model improves the discriminative ability of the feature representations and enhances intermodality semantic consistency.
With the increase in cross-modal data, cross-modal retrieval has attracted growing attention in remote sensing (RS), since it provides a more flexible and convenient way to obtain information of interest than traditional retrieval. However, existing methods cannot fully exploit the semantic information: they focus only on semantic consistency and ignore the complementary information between different modalities. In this letter, to bridge the modality gap, we propose a novel fusion-based correlation learning model (FCLM) for image-text retrieval in RS. Specifically, a cross-modal fusion network is designed to capture the intermodality complementary information and produce a fused feature. The fused knowledge is further transferred to supervise the learning of the modality-specific networks by knowledge distillation, which helps improve the discriminative ability of the feature representations and enhance intermodality semantic consistency, thereby alleviating the heterogeneity gap. Finally, extensive experiments conducted on a public dataset show that FCLM is effective for cross-modal retrieval and outperforms several baseline methods.
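
The letter itself contains no code; the sketch below is a minimal PyTorch illustration of the two mechanisms the abstract describes: a fusion branch that combines image and text features into a joint embedding, and a distillation term that uses that fused embedding as a teacher signal for the modality-specific networks. All module names, dimensions, the InfoNCE matching loss, and the 0.5 loss weight are assumptions for illustration, not the authors' actual FCLM design.

# Minimal sketch of fusion-based correlation learning in PyTorch.
# Feature extractors are assumed to yield 512-D image/text features;
# every name, size, and weight here is illustrative, not the authors'
# actual FCLM implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionBranch(nn.Module):
    """Teacher: fuses image and text features into a joint embedding."""
    def __init__(self, dim=512):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(),
            nn.Linear(dim, dim),
        )

    def forward(self, img_feat, txt_feat):
        # Concatenation is the simplest stand-in for a cross-modal
        # fusion network that captures complementary information.
        joint = torch.cat([img_feat, txt_feat], dim=-1)
        return F.normalize(self.fuse(joint), dim=-1)

class ModalityHead(nn.Module):
    """Student: projects one modality into the shared retrieval space."""
    def __init__(self, dim=512):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, feat):
        return F.normalize(self.proj(feat), dim=-1)

def training_step(img_feat, txt_feat, fusion, img_head, txt_head):
    fused = fusion(img_feat, txt_feat)        # teacher embedding
    img_emb = img_head(img_feat)              # student embeddings
    txt_emb = txt_head(txt_feat)

    # Cross-modal matching term: symmetric InfoNCE over in-batch pairs,
    # a common retrieval loss (the letter may use a different one).
    logits = img_emb @ txt_emb.t() / 0.07
    targets = torch.arange(logits.size(0), device=logits.device)
    match = (F.cross_entropy(logits, targets)
             + F.cross_entropy(logits.t(), targets)) / 2

    # Knowledge distillation: pull each modality-specific embedding
    # toward the (detached) fused teacher embedding.
    distill = (F.mse_loss(img_emb, fused.detach())
               + F.mse_loss(txt_emb, fused.detach()))

    return match + 0.5 * distill  # 0.5 is an arbitrary trade-off weight

Note that at retrieval time only the modality-specific heads embed queries and gallery items, so the fusion branch adds no cost at query time; the fused teacher only shapes training.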
