Article

Multi-dimensional, multi-functional and multi-level attention in YOLO for underwater object detection

Journal

NEURAL COMPUTING & APPLICATIONS
Volume 35, Issue 27, Pages 19935-19960

Publisher

SPRINGER LONDON LTD
DOI: 10.1007/s00521-023-08781-w

Keywords

Underwater object detection; YOLO detector; Attention perception; Attention calibration; Synergistic advantage

Underwater object detection is essential for the autonomous operation of underwater robots and for ocean exploration. To address poor imaging quality, harsh underwater environments, and concealed targets, we propose a multi-dimensional, multi-functional, and multi-level attention module (mDFLAM). The module improves the robustness, flexibility, and diversity of attention perception by collecting information along multiple dimensions, calibrating both channel semantic information and spatial location information, and extracting intrinsic information under different receptive fields.
Underwater object detection is a prerequisite for autonomous operation and ocean exploration by underwater robots. However, poor imaging quality, harsh underwater environments, and concealed targets make detection considerably harder. To reduce background interference and improve object perception, we propose a multi-dimensional, multi-functional and multi-level attention module (mDFLAM). The multi-dimensional strategy improves the robustness of attention application by collecting valuable information along different target dimensions. The multi-functional strategy improves the flexibility of attention calibration by capturing both the importance of channel semantic information and the dependence of spatial location information. The multi-level strategy enriches the diversity of attention perception by extracting intrinsic information under different receptive fields. In the pre-processing and post-processing stages, cross-splitting and cross-linking redistribute channel dimensions and restore feature states, which brings out the synergistic calibration advantage of multi-dimensional and multi-functional attention. In the attention calibration stage, adaptive fusion assigns learnable parameters, which brings out the synergistic calibration advantage of multi-level attention. To meet the high-precision and real-time requirements of underwater object detection, we integrate the plug-and-play mDFLAM into YOLO detectors. Full-port embedding further strengthens semantic information expression by improving the quality of feature fusion between scales. On underwater detection tasks, ablation and comparison experiments demonstrate the rationality and effectiveness of our attention design. On other detection tasks, our work shows good robustness and generalization.
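
The adaptive fusion step mentioned in the abstract can be illustrated with a minimal sketch. The abstract states only that learnable parameters are assigned to the multi-level attention branches; the softmax normalization of those parameters, the function names `softmax` and `adaptive_fusion`, and the toy values below are assumptions made for illustration and are not taken from the paper.

```python
import math


def softmax(weights):
    """Turn raw (learnable) weights into positive coefficients that sum to 1."""
    peak = max(weights)  # subtract the maximum for numerical stability
    exps = [math.exp(w - peak) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fusion(branches, weights):
    """Fuse equally sized branch outputs as a weighted sum.

    `branches` holds one flattened attention map per receptive-field level.
    `weights` holds one raw scalar per branch; in a real detector these would
    be trained by backpropagation (assumption: softmax-normalized scalars).
    """
    if len(branches) != len(weights):
        raise ValueError("need exactly one weight per branch")
    length = len(branches[0])
    if any(len(branch) != length for branch in branches):
        raise ValueError("all branches must have the same size")
    coeffs = softmax(weights)
    return [
        sum(c * branch[i] for c, branch in zip(coeffs, branches))
        for i in range(length)
    ]


# Hypothetical attention maps from three levels, flattened to two values each.
levels = [[0.2, 0.8], [0.4, 0.6], [0.9, 0.1]]
raw_weights = [0.0, 0.0, 0.0]  # equal raw weights reduce to a plain average
fused = adaptive_fusion(levels, raw_weights)
```

With equal raw weights the fusion is a simple mean of the branches; as training pushes one weight above the others, the fused map moves toward that branch. Normalizing the weights keeps the fused output on the same scale as the individual branches, which is one common reason to prefer it over unconstrained scalars.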
