Article

Index Your Position: A Novel Self-Supervised Learning Method for Remote Sensing Images Semantic Segmentation

Journal

IEEE Transactions on Geoscience and Remote Sensing

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/TGRS.2022.3177770

Keywords

Image segmentation; Semantics; Task analysis; Indexes; Remote sensing; Computer architecture; Crops; Remote sensing images (RSIs); self-supervised learning (SSL); semantic segmentation

Funding

  1. National Natural Science Foundation of China [42071297, 41871235]
  2. Fundamental Research Funds for the Central Universities [020914380095]
  3. High-Level Innovation and Entrepreneurship Talents Introduction Program of Jiangsu Province of China

Learning effective visual representations without human supervision is crucial for semantic segmentation of remote sensing images (RSIs). Current self-supervised learning (SSL) methods, trained on ImageNet, do not consider spatial position information between objects in RSIs, which is important for multiobject segmentation. In this study, we propose a novel self-supervised dense representation learning method (IndexNet) that takes into account object position changes and combines image-level and pixel-level contrast, outperforming state-of-the-art SSL methods.
Learning effective visual representations without human supervision is a critical problem for semantic segmentation of remote sensing images (RSIs), where pixel-level annotations are difficult to obtain. Self-supervised learning (SSL), which learns useful representations by constructing artificial supervised tasks, has recently emerged as an effective way to learn from unlabeled data. Current SSL methods are generally trained on ImageNet through image-level prediction tasks. We argue that this is suboptimal for the semantic segmentation of RSIs because it ignores the spatial position information between objects, which is critical for segmenting RSIs that typically contain many objects. In this study, we propose a novel self-supervised dense representation learning method, IndexNet, for the semantic segmentation of RSIs. On the one hand, considering the multiobject character of RSIs, IndexNet learns pixel-level representations by tracking object positions, remaining sensitive to object position changes so that no mismatched correspondences are introduced. On the other hand, by combining image-level contrast and pixel-level contrast, IndexNet can learn spatiotemporally invariant features. Experimental results show that our method works better than ImageNet pretraining and outperforms state-of-the-art (SOTA) SSL methods. Code and pretrained models will be available at https://github.com/pUmpKin-Co/offical-IndexNet.
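
The mechanism outlined in the abstract, an image-level contrastive term combined with a pixel-level term whose positive pairs are found by indexing each feature cell's position in the original image, can be illustrated with a short PyTorch-style example. The sketch below is a minimal illustration under assumed conventions, not the released IndexNet implementation: the function names (image_level_loss, pixel_level_loss, total_loss), the coordinate-radius rule for matching cells across crops, the temperature values, and the equal weighting of the two terms are all assumptions made for this example.

```python
# Minimal sketch of combined image-level + pixel-level contrast, in the spirit
# of the abstract above. All names and hyperparameters are illustrative
# assumptions, not the authors' released IndexNet code.
import torch
import torch.nn.functional as F


def image_level_loss(z1, z2, temperature=0.1):
    """InfoNCE over global embeddings of two augmented views, shape (B, D)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                      # (B, B) similarities
    targets = torch.arange(z1.size(0), device=z1.device)    # positives on the diagonal
    return F.cross_entropy(logits, targets)


def pixel_level_loss(f1, f2, coords1, coords2, temperature=0.1, radius=0.1):
    """Contrast dense features (B, D, H, W) from two crops of the same image.

    coords1/coords2 hold the normalized (x, y) position of every feature cell
    in the ORIGINAL image, shape (B, H*W, 2); cells closer than `radius` are
    treated as positives, so matches follow object positions across crops.
    """
    B, D, H, W = f1.shape
    f1 = F.normalize(f1.flatten(2), dim=1).transpose(1, 2)  # (B, HW, D)
    f2 = F.normalize(f2.flatten(2), dim=1).transpose(1, 2)  # (B, HW, D)
    sim = torch.bmm(f1, f2.transpose(1, 2)) / temperature   # (B, HW, HW) feature similarities
    dist = torch.cdist(coords1, coords2)                    # (B, HW, HW) position distances
    pos = (dist < radius).float()                           # 1 where two cells overlap spatially
    log_prob = F.log_softmax(sim, dim=-1)
    loss = -(log_prob * pos).sum(-1) / pos.sum(-1).clamp(min=1.0)
    return loss.mean()


def total_loss(z1, z2, f1, f2, coords1, coords2, alpha=1.0):
    """Combined objective: image-level + pixel-level contrast (weighting assumed)."""
    return image_level_loss(z1, z2) + alpha * pixel_level_loss(f1, f2, coords1, coords2)
```

In such a setup, the coordinates would be recorded from the crop and flip parameters applied during augmentation, which is what keeps the pixel-level term sensitive to where objects actually sit in the scene rather than only to their appearance.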
