Article

BS2T: Bottleneck Spatial-Spectral Transformer for Hyperspectral Image Classification

Journal

IEEE Transactions on Geoscience and Remote Sensing

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/TGRS.2022.3185640

Keywords

Feature extraction; Image classification; Transformers; Convolutional neural networks; Three-dimensional displays; Neural networks; Task analysis; 3-D convolutional neural network (CNN); bottleneck spatial-spectral transformer (BS2T); dual-branch neural network; hyperspectral (HS) image classification; multihead self-attention (MHSA); positional encoding

Funding

  1. National Natural Science Foundation of China [41971388]
  2. Innovation Team Support Program of Liaoning Higher Education Department [LT2017013]


Summary

This article proposes a novel bottleneck spatial-spectral transformer (BS2T) that captures long-range global dependencies among hyperspectral (HS) image pixels. BS2T replaces convolutional operations with multihead spatial-spectral self-attention (MHS2A) to overcome the limited receptive field of CNN-based HS image classification methods. On this basis, a dual-branch classification framework combining a 3-D CNN and BS2T extracts local and global HS image features jointly. Experimental results demonstrate a significant improvement over state-of-the-art methods.

Abstract

Convolutional neural networks (CNNs) have been extensively applied to hyperspectral (HS) image classification and have achieved promising performance. However, CNN-based methods struggle to model dependencies among HS image pixels at distant spatial positions and across distant bands. Moreover, the limited receptive field of convolutional layers severely constrains the design of CNN architectures. To tackle these problems, this article proposes the novel bottleneck spatial-spectral transformer (BS2T) to capture the long-range global dependencies of HS image pixels; it can serve as a feature extraction module in HS image classification networks. More specifically, inspired by the bottleneck transformer in computer vision, the proposed BS2T comprises a feature contraction module, a multihead spatial-spectral self-attention (MHS2A) module, and a feature expansion module. In this way, convolutional operations are replaced by MHS2A, which captures the long-range dependency of HS pixels regardless of their spatial position and distance. Meanwhile, to highlight the spectral characteristics of HS images, the MHS2A module introduces spectral information and content-dependent spatial positional information into classical multihead self-attention, making the attention both position-aware and spectrum-aware. On this basis, a dual-branch HS image classification framework based on a 3-D CNN and BS2T is defined to jointly extract the local and global features of HS images. Experimental results on three public HS image classification datasets show that the proposed framework achieves a significant improvement over state-of-the-art methods. The source code of the proposed framework can be downloaded from https://github.com/srxlnnu/BS2T.
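
The abstract describes BS2T as a bottleneck block whose middle convolution is replaced by MHS2A, flanked by a 1x1 feature contraction and a 1x1 feature expansion. Below is a minimal PyTorch sketch of that structure under stated assumptions: the tensor shapes, the learnable row/column positional embeddings, and the single per-channel spectral term are illustrative stand-ins for the paper's exact position- and spectrum-aware encodings, not the authors' formulation; their reference implementation is at https://github.com/srxlnnu/BS2T.

```python
# Minimal sketch of a BS2T-style bottleneck block with MHS2A (assumptions noted).
import torch
import torch.nn as nn


class MHS2A(nn.Module):
    """Multihead self-attention over the H*W spatial grid of an HS feature map.
    Learnable row/column positional embeddings plus a per-channel spectral term
    are added to the keys, as a stand-in for the paper's position- and
    spectrum-aware attention (illustrative, not the exact formulation)."""

    def __init__(self, dim, heads=4, height=9, width=9):
        super().__init__()
        assert dim % heads == 0
        self.heads = heads
        self.scale = (dim // heads) ** -0.5
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=False)
        self.pos_h = nn.Parameter(torch.randn(1, dim, height, 1) * 0.02)
        self.pos_w = nn.Parameter(torch.randn(1, dim, 1, width) * 0.02)
        self.spec = nn.Parameter(torch.randn(1, dim, 1, 1) * 0.02)

    def forward(self, x):                       # x: (B, C, H, W)
        B, C, H, W = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)
        # Make the keys position- and spectrum-aware before the dot product.
        k = k + self.pos_h + self.pos_w + self.spec

        def split(t):                            # (B, C, H, W) -> (B, heads, HW, d)
            return t.reshape(B, self.heads, C // self.heads, H * W).transpose(-1, -2)

        q, k, v = split(q), split(k), split(v)
        attn = (q @ k.transpose(-1, -2)) * self.scale   # (B, heads, HW, HW)
        out = attn.softmax(dim=-1) @ v                   # (B, heads, HW, d)
        return out.transpose(-1, -2).reshape(B, C, H, W)


class BS2TBlock(nn.Module):
    """Bottleneck block: 1x1 feature contraction -> MHS2A (replacing the middle
    convolution of a bottleneck) -> 1x1 feature expansion, with a residual."""

    def __init__(self, dim, bottleneck=64, heads=4, height=9, width=9):
        super().__init__()
        self.contract = nn.Sequential(
            nn.Conv2d(dim, bottleneck, kernel_size=1),
            nn.BatchNorm2d(bottleneck),
            nn.ReLU(),
        )
        self.attn = MHS2A(bottleneck, heads, height, width)
        self.expand = nn.Sequential(
            nn.Conv2d(bottleneck, dim, kernel_size=1),
            nn.BatchNorm2d(dim),
        )

    def forward(self, x):
        return torch.relu(x + self.expand(self.attn(self.contract(x))))


# Example: a 9x9 spatial patch with 128 fused spectral-feature channels
# (patch and channel sizes are arbitrary choices for this sketch).
feat = torch.randn(2, 128, 9, 9)
print(BS2TBlock(128)(feat).shape)  # torch.Size([2, 128, 9, 9])
```

In the full framework described above, a block like this would sit in the global branch, with a parallel 3-D CNN branch extracting local spatial-spectral features; the two branches' outputs are then fused for classification.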

