Article

ORCNN-X: Attention-Driven Multiscale Network for Detecting Small Objects in Complex Aerial Scenes

Journal

REMOTE SENSING
Volume 15, Issue 14

Publisher

MDPI
DOI: 10.3390/rs15143497

Keywords

deep learning; remote sensing; object detection; attention module; multiscale feature extraction

Abstract

Object detection in remote sensing images has drawn significant attention due to its extensive applications, including environmental monitoring, urban planning, and disaster assessment. However, detecting objects in aerial images captured by remote sensors presents unique challenges compared to natural images, such as low resolution, complex backgrounds, and large variations in scale and angle. Prior object detection algorithms are limited in their ability to identify oriented small objects, especially in aerial images where small objects are often obscured by background noise. To address these limitations, a novel framework (ORCNN-X) is proposed for oriented small object detection in remote sensing images by improving the Oriented R-CNN. The framework adopts a multiscale feature extraction network (ResNeSt+) with a dynamic attention module (DCSA) and an effective feature fusion mechanism (W-PAFPN) to enhance the model's perception ability and to handle variations in scale and angle. The proposed framework is evaluated on two public benchmark datasets, DOTA and HRSC2016. The experiments demonstrate state-of-the-art performance in terms of detection accuracy and speed, and feature visualization maps further show that the model captures spatial location information more accurately. Specifically, our model outperforms the baseline by 1.43% mAP50 on DOTA and 1.37% mAP(12) on HRSC2016.
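The abstract names the framework's three components but does not spell out their implementations. The PyTorch sketch below therefore illustrates the general ideas with well-known stand-ins: a CBAM-style channel-and-spatial attention block standing in for DCSA, and a BiFPN-style normalized learnable weighting of adjacent pyramid levels standing in for W-PAFPN. All class names, shapes, and hyperparameters here are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelSpatialAttention(nn.Module):
    """Channel + spatial attention (CBAM-style stand-in for the paper's DCSA)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled per-pixel channel statistics.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # Channel attention from average- and max-pooled descriptors.
        avg = self.channel_mlp(x.mean(dim=(2, 3)))
        mx = self.channel_mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial attention from channel-wise mean and max maps.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial_conv(stats))

class WeightedFusion(nn.Module):
    """Normalized learnable weighting of two pyramid levels (BiFPN-style
    stand-in for the paper's W-PAFPN fusion step)."""
    def __init__(self, channels):
        super().__init__()
        self.w = nn.Parameter(torch.ones(2))  # one weight per input level
        self.out_conv = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, shallow, deep):
        # Upsample the coarser map to the shallow map's resolution.
        deep = F.interpolate(deep, size=shallow.shape[-2:], mode="nearest")
        w = F.relu(self.w)
        w = w / (w.sum() + 1e-4)  # normalize so the weights sum to ~1
        return self.out_conv(w[0] * shallow + w[1] * deep)

if __name__ == "__main__":
    # Two feature maps from adjacent backbone stages (e.g. C3 and C4).
    c3 = torch.randn(1, 256, 64, 64)
    c4 = torch.randn(1, 256, 32, 32)
    attn = ChannelSpatialAttention(256)
    fuse = WeightedFusion(256)
    p3 = fuse(attn(c3), attn(c4))
    print(p3.shape)  # torch.Size([1, 256, 64, 64])

In the full ORCNN-X pipeline, features prepared this way would feed an oriented region proposal network and rotated RoI head, as in Oriented R-CNN; this sketch stops at feature extraction and fusion.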
