Article

CSDS: End-to-End Aerial Scenes Classification With Depthwise Separable Convolution and an Attention Mechanism

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSTARS.2021.3117857

Keywords

Feature extraction; Remote sensing; Convolutional neural networks; Semantics; Convolution; Correlation; Training; Channel-spatial attention; convolutional neural network (CNN); depthwise separable convolution (DS-Conv); scene classification

Funding

  1. New-Generation AI Major Scientific and Technological Special Project of Tianjin [18ZXZNGX00150]
  2. Special Foundation for Technology Innovation of Tianjin [21YDTPJC00250]

This article proposes a channel-spatial attention mechanism based on a depthwise separable convolution (CSDS) network for aerial scene classification. Experimental results on three public datasets show that the CSDS network achieves comparable performance to other state-of-the-art methods. Visualization of feature extraction results and ablation experiments demonstrate the powerful feature learning and representation capabilities of the proposed CSDS network.
Compared with natural scenes, aerial scenes are usually composed of numerous objects densely distributed within the aerial view, and thus, more key local semantic features are needed to describe them. However, when existing CNNs are used for remote sensing image classification, they typically focus on the global semantic features of the image, and especially for deep models, shallow and intermediate features are easily lost. This article proposes a channel-spatial attention mechanism based on a depthwise separable convolution (CSDS) network for aerial scene classification to solve these challenges. First, we construct a depthwise separable convolution (DS-Conv) and pyramid residual connection architecture. DS-Conv extracts features from each channel and merges them, effectively reducing the number of necessary calculations, and the pyramid residual connections connect the features from multiple layers and create associations. Then, the channel-spatial attention algorithm causes the model to obtain more effective features in the channel and spatial domains. Finally, an improved cross-entropy loss function is used to reduce the impact of similar categories on backpropagation. Comparative experiments on three public datasets show that the CSDS network can achieve results comparable to those of other state-of-the-art methods. In addition, visualization of feature extraction results by the Grad-CAM algorithm and ablation experiments for each module reflect the powerful feature learning and representation capabilities of the proposed CSDS network.
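
The claim that DS-Conv reduces the number of necessary calculations can be checked with simple arithmetic. The sketch below is not taken from the paper: it assumes the usual factorization of a k x k convolution into a per-channel (depthwise) k x k filter followed by a 1 x 1 (pointwise) convolution, ignores biases, and assumes the output keeps the input's spatial size. The function names and the layer sizes in the example are hypothetical and chosen only for illustration.

```python
def standard_conv_cost(k, c_in, c_out, h, w):
    """Parameters and multiply-accumulates (MACs) of a standard k x k convolution.

    Assumes no bias and an output feature map of size h x w.
    """
    params = k * k * c_in * c_out
    macs = params * h * w  # every weight is used once per output position
    return params, macs


def ds_conv_cost(k, c_in, c_out, h, w):
    """Parameters and MACs of a depthwise separable convolution.

    Depthwise step: one k x k filter per input channel.
    Pointwise step: a 1 x 1 convolution that mixes channels.
    """
    depthwise_params = k * k * c_in
    pointwise_params = c_in * c_out
    params = depthwise_params + pointwise_params
    macs = params * h * w
    return params, macs


def cost_ratio(k, c_in, c_out, h, w):
    """Ratio of DS-Conv MACs to standard-conv MACs; equals 1/c_out + 1/k**2."""
    _, std_macs = standard_conv_cost(k, c_in, c_out, h, w)
    _, ds_macs = ds_conv_cost(k, c_in, c_out, h, w)
    return ds_macs / std_macs


if __name__ == "__main__":
    # Hypothetical layer: 3 x 3 kernel, 64 -> 128 channels, 56 x 56 feature map.
    k, c_in, c_out, h, w = 3, 64, 128, 56, 56
    print("standard:", standard_conv_cost(k, c_in, c_out, h, w))
    print("DS-Conv: ", ds_conv_cost(k, c_in, c_out, h, w))
    print("ratio:   ", cost_ratio(k, c_in, c_out, h, w))
```

Under these assumptions the ratio simplifies to 1/c_out + 1/k^2, so for a 3 x 3 kernel with many output channels the separable form needs roughly one ninth of the multiply-accumulates of the standard convolution.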
