Article

A Multiscale Attention Network for Remote Sensing Scene Images Classification

Publisher

IEEE (Institute of Electrical and Electronics Engineers Inc.)
DOI: 10.1109/JSTARS.2021.3109661

Keywords

Feature extraction; Remote sensing; Task analysis; Convolutional neural networks; Semantics; Convolution; Licenses; Remote sensing scene; multi-scale; attention; feature fusion

Funding

  1. Shenzhen College Stability Support Plan [GXWD20201230155427003-20200824113231001]
  2. General Program of National Natural Science Foundation of China (NSFC) [62106063, 62102259]

The study introduces a multiscale attention network (MSA-Network) that combines a multiscale (MS) module with a channel and position attention (CPA) module to improve remote sensing scene image classification. By learning multiscale features and automatically focusing on critical regions, the network outperforms several state-of-the-art methods in the reported experiments.
Remote sensing scene image classification is of great value to civil and military fields. Deep learning models, especially convolutional neural networks (CNNs), have achieved great success in this task; however, they face two challenges. First, category objects usually vary in size, but a conventional CNN extracts features with a fixed convolution extractor and may therefore fail to learn multiscale features. Second, some image regions may not be useful during feature learning, so guiding the network to select and focus on the most relevant regions is vital for remote sensing scene image classification. To address these two challenges, we propose a multiscale attention network (MSA-Network), which integrates a multiscale (MS) module and a channel and position attention (CPA) module to boost remote sensing scene classification performance. The MS module learns multiscale features by applying sliding windows of various sizes to layers of different depths and receptive fields. The CPA module consists of two parts: a channel attention (CA) module and a position attention (PA) module. The CA module learns global attention features at the channel level, while the PA module extracts local attention features at the pixel level. By fusing these two attention features, the network automatically focuses on the most critical and salient regions. Extensive experiments on the UC Merced, AID, and NWPU-RESISC45 datasets demonstrate that the proposed MSA-Network outperforms several state-of-the-art methods.
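Since the abstract only outlines the two modules, the following is a minimal PyTorch sketch of one plausible reading of the design. Everything here is an illustrative assumption rather than the authors' implementation: the pyramid pooling scales standing in for the MS module's sliding windows, the SE-style channel branch, the 7x7 spatial kernel in the position branch, the additive fusion rule, and all class names and hyperparameters.

    # Hypothetical sketch of MS-style multiscale features and CPA-style
    # channel/position attention fusion. Not the paper's actual code; all
    # layer sizes and the fusion rule are assumptions for illustration.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class MSModule(nn.Module):
        """Multiscale feature sketch: pool one feature map at several window
        sizes (spatial-pyramid style), upsample back, and fuse by 1x1 conv.
        The paper draws features from layers of different depths; this
        single-layer pyramid is only a stand-in for that idea."""
        def __init__(self, channels: int, scales=(1, 2, 4)):
            super().__init__()
            self.scales = scales
            self.project = nn.Conv2d(channels * len(scales), channels, kernel_size=1)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            h, w = x.shape[-2:]
            feats = [
                F.interpolate(F.adaptive_avg_pool2d(x, s), size=(h, w),
                              mode="bilinear", align_corners=False)
                for s in self.scales
            ]
            return self.project(torch.cat(feats, dim=1))

    class ChannelAttention(nn.Module):
        """Global (channel-level) attention: squeeze spatial dims, then
        reweight channels, in the style of squeeze-and-excitation."""
        def __init__(self, channels: int, reduction: int = 16):
            super().__init__()
            self.pool = nn.AdaptiveAvgPool2d(1)  # B x C x 1 x 1
            self.fc = nn.Sequential(
                nn.Conv2d(channels, channels // reduction, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, kernel_size=1),
                nn.Sigmoid(),
            )

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * self.fc(self.pool(x))  # channel-wise reweighting

    class PositionAttention(nn.Module):
        """Local (pixel-level) attention: one weight per spatial position."""
        def __init__(self, channels: int):
            super().__init__()
            self.conv = nn.Conv2d(channels, 1, kernel_size=7, padding=3)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * torch.sigmoid(self.conv(x))  # position-wise reweighting

    class CPAModule(nn.Module):
        """Fuse channel and position attention features; summation here is
        an assumed fusion rule, not taken from the paper."""
        def __init__(self, channels: int):
            super().__init__()
            self.ca = ChannelAttention(channels)
            self.pa = PositionAttention(channels)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.ca(x) + self.pa(x)

    if __name__ == "__main__":
        x = torch.randn(2, 256, 32, 32)        # dummy backbone feature map
        y = CPAModule(256)(MSModule(256)(x))
        print(y.shape)                         # torch.Size([2, 256, 32, 32])

Both attention branches are multiplicative reweightings of the input, so their fused output keeps the input's shape and can be dropped between backbone stages without changing the rest of the network.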
