Article

From center to surrounding: An interactive learning framework for hyperspectral image classification

Journal

ISPRS JOURNAL OF PHOTOGRAMMETRY AND REMOTE SENSING
Volume 197, Pages 145-166

Publisher

ELSEVIER
DOI: 10.1016/j.isprsjprs.2023.01.024

Keywords

Hyperspectral image classification; Deep learning; Transformer; Center-to-surrounding interactive learning

Abstract

Owing to its rich spectral and spatial information, a hyperspectral image (HSI) can be utilized for finely classifying different land covers. With the emergence of deep learning techniques, convolutional neural networks (CNNs), fully convolutional networks (FCNs), and recurrent neural networks (RNNs) have been widely applied in the field of HSI classification. Recently, transformer-based approaches represented by the Vision Transformer (ViT) have yielded promising performance on numerous tasks and have been introduced to classify HSI. However, existing methods based on the above architectures still face three crucial issues that limit classification performance: 1) geometric constraints caused by the input data, 2) contribution fuzziness of central pixels with details, and 3) an interaction gap between local areas and further environments. To tackle these problems, an interactive learning framework inspired by ViT is proposed from a center-to-surrounding perspective, namely the center-to-surrounding interactive learning (CSIL) framework. Different from existing works, the CSIL framework achieves multi-scale, detail-aware, and space-interactive classification based on a well-designed hierarchical region sampling strategy, a center transformer, and a surrounding transformer. Specifically, a hierarchical region sampling strategy is first proposed to flexibly generate the center region, neighbor region, and surrounding region, respectively; the resulting multi-scale input data break the geometric constraints. Second, a center transformer is presented to extract core characteristics in detail from the center region, so that central pixels are remarkably highlighted and details are easily perceived. Third, a surrounding transformer including interactive self-attention learning is formulated to let the locally fine-grained distributions in the neighbor region interact with the further coarse-grained environments in the surrounding region. With this structure, short- and long-term dependencies can be modeled, emphasized, and exchanged to bridge the interaction gap. Finally, the features from the center transformer and surrounding transformer are integrated and fed into a multi-layer perceptron to optimize the semantic representation. Extensive experiments on six HSI datasets covering small-, medium-, and large-scale scenes demonstrate the superiority of CSIL over state-of-the-art CNN-, FCN-, RNN-, and transformer-based approaches, even with very few training samples (for example, 0.19% in the complex HanChuan city scene). The source code will be available soon at https://github.com/jqyang22/CSIL.
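To make the pipeline described in the abstract concrete, the sketch below outlines one plausible reading of it: hierarchical region sampling around a labeled pixel, a center transformer on the smallest patch, a surrounding transformer whose cross-attention stands in for the "interactive self-attention learning" between neighbor and surrounding regions, and an MLP head on the fused features. It is a minimal illustration built on standard PyTorch modules; all patch sizes, dimensions, and module names are assumptions, not the authors' implementation (which will be released at https://github.com/jqyang22/CSIL).

```python
# Minimal sketch of the CSIL idea as described in the abstract (assumed design).
import torch
import torch.nn as nn
import torch.nn.functional as F


def hierarchical_region_sampling(hsi, row, col, sizes=(3, 7, 15)):
    """Crop center / neighbor / surrounding patches around a labeled pixel.

    hsi: (bands, H, W) tensor; sizes are assumed odd window widths.
    """
    patches = []
    for s in sizes:
        r = s // 2
        # Reflect-pad so border pixels still receive full windows.
        padded = F.pad(hsi.unsqueeze(0), (r, r, r, r), mode="reflect")[0]
        patches.append(padded[:, row:row + s, col:col + s])
    return patches  # [center, neighbor, surrounding]


class PatchTokenizer(nn.Module):
    """Turn a (bands, s, s) patch into a sequence of per-pixel tokens."""
    def __init__(self, bands, dim):
        super().__init__()
        self.proj = nn.Linear(bands, dim)

    def forward(self, patch):                        # (B, bands, s, s)
        tokens = patch.flatten(2).transpose(1, 2)    # (B, s*s, bands)
        return self.proj(tokens)                     # (B, s*s, dim)


class CSILSketch(nn.Module):
    def __init__(self, bands, dim=64, heads=4, classes=16):
        super().__init__()
        self.tok_c = PatchTokenizer(bands, dim)
        self.tok_n = PatchTokenizer(bands, dim)
        self.tok_s = PatchTokenizer(bands, dim)
        make_encoder = lambda: nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 2, batch_first=True),
            num_layers=2)
        self.center_tf = make_encoder()      # "center transformer"
        self.neighbor_tf = make_encoder()
        self.surround_tf = make_encoder()
        # Cross-attention standing in for "interactive self-attention learning":
        # neighbor tokens query the coarser surrounding environment.
        self.interact = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.head = nn.Sequential(nn.LayerNorm(2 * dim), nn.Linear(2 * dim, classes))

    def forward(self, center, neighbor, surrounding):
        c = self.center_tf(self.tok_c(center)).mean(1)   # core, detail-level features
        n = self.neighbor_tf(self.tok_n(neighbor))
        s = self.surround_tf(self.tok_s(surrounding))
        n2s, _ = self.interact(n, s, s)                  # local <-> environment exchange
        fused = torch.cat([c, n2s.mean(1)], dim=-1)      # integrate both branches
        return self.head(fused)                          # class logits


# Toy usage: a 200-band cube, one labeled pixel, batch of 1.
hsi = torch.randn(200, 64, 64)
center, neighbor, surrounding = [p.unsqueeze(0) for p in
                                 hierarchical_region_sampling(hsi, 10, 20)]
logits = CSILSketch(bands=200)(center, neighbor, surrounding)
print(logits.shape)  # torch.Size([1, 16])
```

The three window sizes (3, 7, 15) are only an example of how the center, neighbor, and surrounding regions could be nested; the paper's actual sampling strategy, attention design, and training details should be taken from the article and the released code.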

