Article

Double-Attention YOLO: Vision Transformer Model Based on Image Processing Technology in Complex Environment of Transmission Line Connection Fittings and Rust Detection

Journal

MACHINES
Volume 10, Issue 11, Article 1002

Publisher

MDPI
DOI: 10.3390/machines10111002

Keywords

transmission line connection fittings; multi-scale target detection; Vision Transformer; image defogging technology; attention mechanism; model compression and optimization

Funding

  1. Natural Science Basic Research Plan in Shaanxi Province of China [2022JQ-568]
  2. Scientific Research Program of Shaanxi Provincial Education Department [21JK0661]
  3. Key Research and Development Projects in Shaanxi Province [2021GY-306]
  4. Key R&D Plan of Shaanxi Province [2021GY-320, 2020ZDLGY09-10]

Abstract

Transmission line fittings are exposed to complex environments for long periods. Because of haze and other environmental interference, cameras often struggle to capture high-quality on-site images, and traditional image processing techniques and convolutional neural networks have difficulty with the dense detection of small, partially occluded targets. This paper therefore proposes an image processing method that combines an improved dark channel defogging algorithm, a fused channel-spatial attention mechanism, a Vision Transformer, and GhostNet model compression. By capturing salient regions and enhancing the model's global receptive field, a small-target detection network for complex environments, Double-Attention YOLO, is constructed. The experimental results show that embedding a multi-head self-attention component into a convolutional neural network helps the model interpret the multi-scale global semantic information of images, so that it more easily learns distinguishable features in the image representation. Embedding an attention mechanism module makes the network focus on the salient regions of an image. Fusing the two forms of attention balances the model's global and local characteristics, improving its detection performance.
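
The abstract does not spell out the paper's improvement to the dark channel algorithm, but the classic dark channel prior that it builds on can be sketched as follows. The function names, patch size, and brightest-0.1%-pixels heuristic are illustrative assumptions for the baseline method, not the authors' implementation.

```python
# Minimal sketch of dark channel prior defogging (baseline, not the
# paper's improved variant). Assumes an RGB uint8 input image.
import cv2
import numpy as np

def dark_channel(img: np.ndarray, patch: int = 15) -> np.ndarray:
    """Per-pixel minimum over color channels, then a min filter over a patch."""
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def estimate_atmosphere(img: np.ndarray, dark: np.ndarray) -> np.ndarray:
    """Atmospheric light A: mean color of the brightest 0.1% dark-channel pixels."""
    n = max(1, int(dark.size * 0.001))
    idx = np.argpartition(dark.ravel(), -n)[-n:]
    return img.reshape(-1, 3)[idx].mean(axis=0)

def defog(img: np.ndarray, omega: float = 0.95, t0: float = 0.1) -> np.ndarray:
    """Recover scene radiance J from the haze model I = J*t + A*(1 - t)."""
    img = img.astype(np.float64) / 255.0
    A = estimate_atmosphere(img, dark_channel(img))
    t = 1.0 - omega * dark_channel(img / A)      # transmission estimate
    t = np.clip(t, t0, 1.0)[..., None]           # floor t to avoid blow-up
    J = (img - A) / t + A
    return (np.clip(J, 0.0, 1.0) * 255).astype(np.uint8)
```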
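The "fusion channel spatial attention mechanism" is read here as a CBAM-style module that gates features first per channel, then per spatial location. This is a plausible-reading sketch under that assumption, not the authors' exact design; the reduction ratio and kernel size are illustrative.

```python
# Sketch of sequential channel + spatial attention (CBAM-style).
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, excite per channel.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel statistics.
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))       # (B, C) from average pooling
        mx = self.mlp(x.amax(dim=(2, 3)))        # (B, C) from max pooling
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        s = torch.cat([x.mean(dim=1, keepdim=True),
                       x.amax(dim=1, keepdim=True)], dim=1)  # (B, 2, H, W)
        return x * torch.sigmoid(self.conv(s))
```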
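To illustrate how a multi-head self-attention component can be embedded into a convolutional backbone to give it a global receptive field, the sketch below flattens a feature map into a token sequence, applies standard multi-head self-attention, and folds the result back with a residual connection. The block name, head count, and placement inside the network are assumptions; the paper's exact integration into Double-Attention YOLO is not described in the abstract.

```python
# Sketch: multi-head self-attention over a CNN feature map.
import torch
import torch.nn as nn

class MHSABlock(nn.Module):
    def __init__(self, channels: int, heads: int = 4):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C) token sequence
        seq = self.norm(seq)
        out, _ = self.attn(seq, seq, seq)        # every token attends globally
        return x + out.transpose(1, 2).view(b, c, h, w)  # residual fold-back

# Hypothetical usage: drop the block after a late backbone stage, e.g.
# feats = MHSABlock(channels=256)(torch.randn(1, 256, 20, 20))
```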
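For the GhostNet compression step, the core idea is the ghost module: a thin regular convolution produces a few intrinsic feature maps, and cheap depthwise convolutions generate the remaining "ghost" maps, reducing parameters and FLOPs roughly by the ratio factor. The sketch follows the published GhostNet design; channel counts and the ratio are illustrative, not the authors' configuration.

```python
# Sketch of a GhostNet ghost module (intrinsic + cheap ghost features).
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, ratio: int = 2):
        super().__init__()
        init_ch = out_ch // ratio                # intrinsic feature maps
        self.primary = nn.Sequential(            # regular 1x1 convolution
            nn.Conv2d(in_ch, init_ch, 1, bias=False),
            nn.BatchNorm2d(init_ch), nn.ReLU(inplace=True),
        )
        self.cheap = nn.Sequential(              # depthwise "ghost" features
            nn.Conv2d(init_ch, out_ch - init_ch, 3, padding=1,
                      groups=init_ch, bias=False),
            nn.BatchNorm2d(out_ch - init_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```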
