4.7 Article

A Low-Altitude Remote Sensing Inspection Method on Rural Living Environments Based on a Modified YOLOv5s-ViT

Journal

REMOTE SENSING
Volume 14, Issue 19

Publisher

MDPI
DOI: 10.3390/rs14194784

Keywords

Vision Transformer; attention mechanism; target detection; unmanned aerial vehicle (UAV); YOLOv5

Funding

  1. National Key Research and Development Program of China [2019YFD1101105]
  2. Natural Science Foundation of Hebei Province [F2022204004]
  3. Hebei Province Key Research and Development Program [20327402D, 19227210D]

This study proposes a low-altitude remote sensing method, based on a modified YOLOv5s-ViT model, for inspecting rural living environments. Modifying the BottleNeck structure, embedding the SimAM attention mechanism module, and incorporating a Vision Transformer component improve the model's ability to capture multi-scale features and to perceive global features. Experimental results show that the modified model improves Precision, Recall, and mAP over the original model while reducing the number of parameters and the computation volume. This study provides new ideas for enhancing the digital capability of governance in rural living environments.
The governance of rural living environments is one of the important tasks in implementing the rural revitalization strategy. At present, illegal random construction and random storage in public spaces seriously undermine the effectiveness of that governance. Supervision of such problems currently relies mainly on manual inspection. Because the rural areas to be inspected are numerous and widely distributed, manual inspection suffers from obvious disadvantages, such as low detection efficiency, long inspection times, and heavy consumption of human resources, so it struggles to meet the requirements of efficient and accurate inspection. In response to these difficulties, this paper proposes a low-altitude remote sensing inspection method for rural living environments based on a modified YOLOv5s-ViT (YOLOv5s-Vision Transformer). First, the BottleNeck structure was modified to enhance the multi-scale feature capture capability of the model. Then, the SimAM attention mechanism module was embedded to intensify the model's attention to key features without increasing the number of parameters. Finally, the Vision Transformer component was incorporated to improve the model's ability to perceive global features in the image. Testing of the established model showed that, compared with the original YOLOv5 network, the Precision, Recall, and mAP of the modified YOLOv5s-ViT model improved by 2.2%, 11.5%, and 6.5%, respectively; the total number of parameters was reduced by 68.4%; and the computation volume was reduced by 83.3%. Relative to other mainstream detection models, YOLOv5s-ViT achieved a good balance between detection performance and model complexity. This study provides new ideas for improving the digital capability of the governance of rural living environments.
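
To illustrate why an attention module such as SimAM adds no trainable parameters, the following is a minimal sketch of SimAM-style weighting as it is commonly formulated: each activation is scaled by a sigmoid of its inverse "energy", which is computed only from the channel's own mean and variance. This is an assumption-laden illustration, not the authors' implementation. The function names `simam_channel` and `simam`, the nested-list feature-map layout, and the default `e_lambda=1e-4` are chosen here for the example, and the abstract does not say where in YOLOv5s the module is embedded.

```python
import math


def simam_channel(channel, e_lambda=1e-4):
    """Apply SimAM-style weighting to one channel (a list of rows of floats).

    No weights are learned: the attention map is derived purely from the
    channel's statistics. e_lambda is a small regularizer (assumed default).
    """
    values = [v for row in channel for v in row]
    if len(values) < 2:
        raise ValueError("channel needs at least two activations")
    n = len(values) - 1  # number of "other" neurons in the channel
    mean = sum(values) / len(values)
    # Squared deviation of every activation from the channel mean.
    sq_dev = [[(v - mean) ** 2 for v in row] for row in channel]
    variance = sum(d for row in sq_dev for d in row) / n

    out = []
    for row, dev_row in zip(channel, sq_dev):
        out_row = []
        for v, d in zip(row, dev_row):
            # Inverse energy: larger for activations that stand out
            # from the rest of the channel.
            inv_energy = d / (4.0 * (variance + e_lambda)) + 0.5
            weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
            out_row.append(v * weight)
        out.append(out_row)
    return out


def simam(feature_map, e_lambda=1e-4):
    """Apply the weighting independently to each channel of a C x H x W map."""
    return [simam_channel(ch, e_lambda) for ch in feature_map]


if __name__ == "__main__":
    # One 2x2 channel in which a single activation stands out.
    demo = [[[1.0, 1.0], [1.0, 5.0]]]
    print(simam(demo))
```

In this sketch the outlying activation receives a larger attention weight than its neighbours, which is the sense in which the module directs attention to distinctive features while leaving the parameter count unchanged.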
