Article

Improved YOLOv5-S object detection method for optical remote sensing images based on contextual transformer

Journal

JOURNAL OF ELECTRONIC IMAGING
Volume 31, Issue 4

Publisher

SPIE-SOC PHOTO-OPTICAL INSTRUMENTATION ENGINEERS
DOI: 10.1117/1.JEI.31.4.043049

Keywords

deep learning; object detection; YOLOv5; attention mechanism; multiscale


This paper proposes an improved detection method for remote sensing images based on YOLOv5-S. It targets the false and missed detections caused by large variations in object scale and the abundance of small objects in remote sensing images. The method combines data enhancement, a contextual transformer module, an additional shallow detection scale, a multiscale complex fusion structure, and the efficient intersection over union loss. Experiments on two optical remote sensing image datasets show that the proposed method achieves higher detection efficiency than the other models compared and significantly improves the detection of small objects in remote sensing images.
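The efficient intersection over union (EIoU) loss is named but not defined here. As an illustration, the sketch below implements the commonly cited EIoU formulation, which adds three penalties to 1 - IoU: the squared distance between box centers normalized by the squared diagonal of the smallest enclosing box, and the squared width and height differences normalized by the enclosing box's squared width and height. This is an assumption about the variant used; the paper may apply extra weighting (for example a focal term). The function name `eiou_loss` and the (x1, y1, x2, y2) box format are hypothetical choices for this sketch.

```python
def eiou_loss(pred, target, eps=1e-9):
    """Sketch of the standard EIoU loss for one pair of boxes (x1, y1, x2, y2).

    Assumption: the plain EIoU form, not necessarily the paper's exact variant.
    """
    px1, py1, px2, py2 = pred
    tx1, ty1, tx2, ty2 = target
    pw, ph = px2 - px1, py2 - py1
    tw, th = tx2 - tx1, ty2 - ty1

    # Standard IoU from the overlap rectangle.
    iw = max(0.0, min(px2, tx2) - max(px1, tx1))
    ih = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = iw * ih
    iou = inter / (pw * ph + tw * th - inter + eps)

    # Width, height and squared diagonal of the smallest enclosing box.
    cw = max(px2, tx2) - min(px1, tx1)
    ch = max(py2, ty2) - min(py1, ty1)
    c2 = cw ** 2 + ch ** 2

    # Squared distance between the two box centers.
    dx = (px1 + px2) / 2 - (tx1 + tx2) / 2
    dy = (py1 + py2) / 2 - (ty1 + ty2) / 2
    rho2 = dx ** 2 + dy ** 2

    distance_term = rho2 / (c2 + eps)             # center-distance penalty
    width_term = (pw - tw) ** 2 / (cw ** 2 + eps)  # width-difference penalty
    height_term = (ph - th) ** 2 / (ch ** 2 + eps) # height-difference penalty
    return 1.0 - iou + distance_term + width_term + height_term


print(eiou_loss((0, 0, 2, 2), (1, 0, 3, 2)))
```

For the two 2 x 2 boxes offset by one unit, IoU is 1/3 and the center penalty is 1/13, so the loss is about 0.744. Unlike plain IoU loss, the separate width and height terms still give a useful gradient when boxes already overlap well but differ in shape.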
To address false and missed detections in remote sensing images, which are caused by large variations in object scale, a high proportion of small objects, and the global, dense distribution of remote sensing objects, an improved remote sensing image detection method based on YOLOv5-S is proposed. First, a data enhancement strategy tailored to the characteristics of remote sensing objects is adopted to expand the dataset samples and improve the generalization ability of the model. Second, the contextual transformer module is introduced into both the backbone feature extraction network and the feature fusion network. This preserves the model's local feature extraction capability while improving its ability to acquire global information: the module makes full use of the contextual information in the input and uses it to guide the learning of a dynamic attention matrix, which improves visual representation ability. Third, a shallow detection scale is added to the original model, and a multiscale complex fusion structure is adopted. In addition, the K-means++ algorithm replaces the original K-means algorithm to cluster 12 anchor box sizes. Fourth, the efficient intersection over union loss is used to improve the prediction accuracy for remote sensing objects. In experiments on two optical remote sensing image datasets, the method is compared with several object detection algorithms based on convolutional neural networks. The results show that its mAP@0.5 on the remote sensing datasets is higher than that of the original YOLOv5-S, that its detection efficiency is higher than that of the other models, and that the detection of small objects in remote sensing images is significantly improved.
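The anchor step replaces K-means with K-means++ to cluster 12 anchor sizes, but the distance metric and update rule are not given. A minimal sketch, assuming the 1 - IoU distance on (width, height) pairs that YOLO-style anchor clustering commonly uses, with K-means++ seeding followed by ordinary mean-based K-means iterations. The function names (`wh_iou`, `kmeanspp_init`, `kmeans_anchors`) are hypothetical. Twelve anchors would presumably be three per scale across four detection scales (YOLOv5-S's three plus the added shallow scale), but that split is an assumption.

```python
import random


def wh_iou(a, b):
    # IoU of two boxes aligned at the same corner, using only (w, h).
    inter = min(a[0], b[0]) * min(a[1], b[1])
    return inter / (a[0] * a[1] + b[0] * b[1] - inter)


def dist(a, b):
    # Assumed clustering distance: 1 - IoU, so shape matters, not raw pixels.
    return 1.0 - wh_iou(a, b)


def kmeanspp_init(boxes, k, rng):
    # K-means++ seeding: first center uniform, each next center sampled
    # with probability proportional to squared distance to nearest center.
    centers = [rng.choice(boxes)]
    while len(centers) < k:
        d2 = [min(dist(b, c) for c in centers) ** 2 for b in boxes]
        if sum(d2) == 0:  # every box already coincides with a center
            centers.append(rng.choice(boxes))
        else:
            centers.append(rng.choices(boxes, weights=d2, k=1)[0])
    return centers


def kmeans_anchors(boxes, k=12, iters=100, seed=0):
    rng = random.Random(seed)
    centers = kmeanspp_init(boxes, k, rng)
    for _ in range(iters):
        # Assign every box to its nearest center under the 1 - IoU distance.
        clusters = [[] for _ in range(k)]
        for b in boxes:
            j = min(range(k), key=lambda i: dist(b, centers[i]))
            clusters[j].append(b)
        # Move each center to the mean (w, h) of its cluster; keep empty ones.
        new = []
        for i, cl in enumerate(clusters):
            if cl:
                new.append((sum(w for w, _ in cl) / len(cl),
                            sum(h for _, h in cl) / len(cl)))
            else:
                new.append(centers[i])
        if new == centers:  # converged
            break
        centers = new
    # Sort by area so the smallest anchors can go to the shallowest scale.
    return sorted(centers, key=lambda c: c[0] * c[1])


rng = random.Random(1)
boxes = [(rng.uniform(5, 300), rng.uniform(5, 300)) for _ in range(200)]
print(kmeans_anchors(boxes, k=12))
```

The K-means++ seeding spreads the initial centers apart, which makes the final anchors less dependent on a poor random start than plain K-means. Some YOLO tooling updates centers with the median instead of the mean; either choice fits the sketch.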
