Article

Learning Deep Cross-Modal Embedding Networks for Zero-Shot Remote Sensing Image Scene Classification

Journal

IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING
Volume 59, Issue 12, Pages 10590-10603

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TGRS.2020.3047447

Keywords

Visualization; Semantics; Deep learning; Task analysis; Remote sensing; Feature extraction; Big Data; Latent space; locality-preservation deep cross-modal embedding networks (LPDCMENs); remote sensing (RS) imagery; transcendental knowledge; zero-shot RS scene classification (ZSRSSC)

Funding

  1. National Key Research and Development Program of China [2018YFB0505003]
  2. National Natural Science Foundation of China [41971284, 41601352]
  3. China Postdoctoral Science Foundation [2016M590716, 2017T100581]
  4. Hubei Provincial Natural Science Foundation of China [2018CFB501]

Abstract

Remote sensing image scene classification faces challenges such as annotation difficulty and the need for zero-shot classification in the era of RS big data. This article proposes a novel ZSRSSC method based on locality-preservation deep cross-modal embedding networks, which effectively alleviates the problem of class structure inconsistency and significantly outperforms existing methods.
Due to its wide applications, remote sensing (RS) image scene classification has attracted increasing research interest. When each category has a sufficient number of labeled samples, RS image scene classification is well addressed by deep learning. However, in the RS big data era, it is extremely difficult or even impossible to annotate RS scene samples for all categories at once, because RS scene classification often needs to be extended as new applications emerge and inevitably introduce new classes of RS images. Hence, the RS big data era requires a zero-shot RS scene classification (ZSRSSC) paradigm, in which the classification model learned from the training RS scene categories retains the inference ability to recognize RS image scenes from unseen categories, in line with humans' evolutionary perception ability. Unfortunately, zero-shot classification remains largely unexplored in the RS field. This article proposes a novel ZSRSSC method based on locality-preservation deep cross-modal embedding networks (LPDCMENs). The proposed LPDCMENs, which fully assimilate pairwise intramodal and intermodal supervision in an end-to-end manner, aim to alleviate the problem of class structure inconsistency between two hybrid spaces (i.e., the visual image space and the semantic space). To pursue the stable generalization ability that is highly desired for ZSRSSC, a set of explainable constraints is specially designed to optimize LPDCMENs. To fully verify the effectiveness of the proposed LPDCMENs, we collect a new large-scale RS scene data set that includes instance-level visual images and class-level semantic representations (RSSDIVCS), where general and domain knowledge is exploited to construct the class-level semantic representations. Extensive experiments show that the proposed ZSRSSC method based on LPDCMENs clearly outperforms the state-of-the-art methods, and that domain knowledge further improves the performance of ZSRSSC compared with general knowledge. The collected RSSDIVCS will be made publicly available along with this article.
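The abstract describes aligning a visual image space and a class-level semantic space under pairwise intramodal and intermodal supervision, then recognizing unseen classes in the shared latent space. The sketch below illustrates that general cross-modal embedding idea only; the encoder sizes, margin, loss weight, and function names are hypothetical assumptions for illustration and do not reproduce the LPDCMENs architecture or its locality-preservation constraints.

```python
# Minimal sketch of cross-modal embedding for zero-shot scene classification.
# All dimensions, names, and loss weights are illustrative assumptions,
# not the LPDCMENs implementation described in the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ModalityEncoder(nn.Module):
    """Maps one modality (visual features or class semantics) into a shared latent space."""
    def __init__(self, in_dim, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.ReLU(),
            nn.Linear(512, latent_dim),
        )

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def cross_modal_losses(img_z, sem_z, labels, margin=0.2):
    """Inter-modal term: each image embedding should be closer to its own class
    semantic embedding than to any other class (triplet-style margin).
    Intra-modal term: image embeddings sharing a class label stay close
    (a simple surrogate for locality preservation)."""
    # Inter-modal: cosine similarity between every image and every seen-class prototype.
    sim = img_z @ sem_z.t()                         # (batch, num_seen_classes)
    pos = sim.gather(1, labels.unsqueeze(1))        # similarity to the true class
    neg_mask = torch.ones_like(sim).scatter_(1, labels.unsqueeze(1), 0.0)
    inter = (F.relu(margin + sim - pos) * neg_mask).sum() / neg_mask.sum()

    # Intra-modal: pull together image pairs that share a label.
    same = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    pdist = torch.cdist(img_z, img_z)
    intra = (same * pdist).sum() / same.sum().clamp(min=1)
    return inter + 0.1 * intra

@torch.no_grad()
def zero_shot_predict(img_z, unseen_sem_z):
    """Assign each image to the nearest unseen-class semantic embedding."""
    return (img_z @ unseen_sem_z.t()).argmax(dim=1)
```

At test time, the semantic encoder embeds the unseen classes' semantic representations and each image is labeled by its nearest class embedding, which is the standard zero-shot inference setup the abstract alludes to.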

