Article

A Low-Altitude Remote Sensing Inspection Method on Rural Living Environments Based on a Modified YOLOv5s-ViT

Journal

REMOTE SENSING
Volume 14, Issue 19

Publisher

MDPI
DOI: 10.3390/rs14194784

Keywords

Vision Transformer; attention mechanism; target detection; unmanned aerial vehicle (UAV); YOLOv5

Funding

  1. National Key Research and Development Program of China [2019YFD1101105]
  2. Natural Science Foundation of Hebei Province [F2022204004]
  3. Hebei Province Key Research and Development Program [20327402D, 19227210D]


This study proposes a low-altitude remote sensing method based on a modified YOLOv5s-ViT model for inspecting rural living environments. By modifying the BottleNeck structure, embedding the SimAM attention mechanism module, and incorporating a Vision Transformer component, the model's feature capture and perception capabilities are improved. Experimental results show that the modified model achieves higher Precision, Recall, and mAP than the original model while reducing the number of parameters and the computation volume. This study provides new ideas for enhancing the digital capability of rural living environment governance.
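Precision, Recall, and mAP are the standard object-detection metrics behind the reported gains. The sketch below shows one common way to compute them, assuming detections have already been matched to ground-truth boxes at some IoU threshold (the threshold used in the paper is not given in this record) and using all-point interpolated AP. The helper names are hypothetical, and this is not the paper's evaluation code.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for a single class.

    scores: confidence of each detection.
    is_tp:  True if that detection was matched to a ground-truth box
            (IoU matching is assumed to happen beforehand).
    num_gt: number of ground-truth objects of this class.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    # Rank detections from most to least confident.
    ranked = sorted(zip(scores, is_tp), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    recalls, precisions = [], []
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    # Pad the curve, then replace each precision with its running maximum from the right.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the step-shaped precision-recall envelope.
    return sum((mrec[i] - mrec[i - 1]) * mpre[i] for i in range(1, len(mrec)))


def mean_average_precision(per_class_ap):
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)
```

For example, four ranked detections labelled TP, FP, TP, FP against two ground-truth objects give an AP of 5/6. The envelope step makes AP insensitive to the dip in precision caused by the second, false detection.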
The governance of rural living environments is an important task in implementing the rural revitalization strategy. At present, illegal practices such as haphazard construction and random storage in public spaces seriously undermine the effectiveness of this governance. Supervision of such problems currently relies mainly on manual inspection. Because the rural areas to be inspected are numerous and widely distributed, manual inspection suffers from obvious disadvantages, such as low detection efficiency, long inspection times, and heavy consumption of human resources, which make it difficult to meet the requirements of efficient and accurate inspection. To address these difficulties, this paper proposes a low-altitude remote sensing inspection method for rural living environments based on a modified YOLOv5s-ViT (YOLOv5s-Vision Transformer). First, the BottleNeck structure was modified to enhance the model's multi-scale feature capture capability. Then, the SimAM attention mechanism module was embedded to strengthen the model's attention to key features without increasing the number of parameters. Finally, the Vision Transformer component was incorporated to improve the model's ability to perceive global features in the image. Test results showed that, compared with the original YOLOv5 network, the modified YOLOv5s-ViT model improved Precision, Recall, and mAP by 2.2%, 11.5%, and 6.5%, respectively, while reducing the total number of parameters by 68.4% and the computation volume by 83.3%. Compared with other mainstream detection models, YOLOv5s-ViT achieved a good balance between detection performance and model complexity. This study provides new ideas for improving the digital capability of rural living environment governance.
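SimAM can be embedded "without increasing the number of parameters" because it has no learnable weights. It scores each activation with a closed-form energy computed from the mean and variance of its channel and then rescales the activation by the sigmoid of that score. The minimal pure-Python sketch below assumes the formulation and the default regularizer λ = 1e-4 from the original SimAM paper. The function name `simam` and the nested-list `[channel][row][col]` layout are illustrative, and this record does not describe how the authors place the module inside YOLOv5s-ViT.

```python
import math


def simam(feature_map, e_lambda=1e-4):
    """SimAM-style parameter-free attention on a C x H x W feature map.

    feature_map: nested lists indexed as [channel][row][col].
    e_lambda: regularizer (lambda) in the energy function; 1e-4 is assumed.
    Nothing here is learned, so the module adds no parameters.
    """
    output = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        n = len(values) - 1  # neurons in the channel other than the target
        if n < 1:
            raise ValueError("each channel needs at least 2 spatial positions")
        mu = sum(values) / len(values)
        # Squared deviation of every activation from its channel mean.
        sq_dev = [[(v - mu) ** 2 for v in row] for row in channel]
        variance = sum(d for row in sq_dev for d in row) / n
        new_channel = []
        for row, dev_row in zip(channel, sq_dev):
            new_row = []
            for v, d in zip(row, dev_row):
                # Inverse minimal energy: activations that stand out from
                # the channel mean receive larger values.
                inv_energy = d / (4 * (variance + e_lambda)) + 0.5
                weight = 1.0 / (1.0 + math.exp(-inv_energy))  # sigmoid
                new_row.append(v * weight)
            new_channel.append(new_row)
        output.append(new_channel)
    return output
```

In a real network the same arithmetic would run on framework tensors. The sketch only shows that the output depends solely on the input statistics. In a 2 x 2 channel `[[1, 2], [3, 10]]`, for instance, the outlying value 10 is kept with a larger weight than its neighbours.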

