☆ 4.7 Article

CViTF-Net: A Convolutional and Visual Transformer Fusion Network for Small Ship Target Detection in Synthetic Aperture Radar Images

REMOTE SENSING (2023)

期刊

REMOTE SENSING

卷 15, 期 18, 页码 -

出版社

MDPI

DOI: 10.3390/rs15184373

关键词

synthetic aperture radar (SAR); small ship targets; transformer; convolutional and visual transformer fusion network (CViTF-Net); ship detection

类别

Environmental Sciences Geosciences, Multidisciplinary Remote Sensing Imaging Science & Photographic Technology

向作者/读者索取更多资源

Protocol

社区支持

Reagent

社区支持

智能总结 New
摘要

In this study, a novel two-stage detector called CViTF-Net is proposed to detect small ship targets in complex SAR images. The detection performance of CViTF-Net is enhanced through the design of a pyramid structured CViT backbone, a Gaussian prior discrepancy assigner, and a level synchronized attention mechanism. Experimental results show that CViTF-Net achieves the highest comprehensive evaluation results for the LS-SSDD-v1.0 dataset, demonstrating its effectiveness in enhancing the detection performance for small ship targets in complex scenes.

Detecting small ship targets in large-scale synthetic aperture radar (SAR) images with complex backgrounds is challenging. This difficulty arises due to indistinct visual features and noise interference. To address these issues, we propose a novel two-stage detector, namely a convolutional and visual transformer fusion network (CViTF-Net), and enhance its detection performance through three innovative modules. Firstly, we designed a pyramid structured CViT backbone. This design leverages convolutional blocks to extract low-level and local features, while utilizing transformer blocks to capture inter-object dependencies over larger image regions. As a result, the CViT backbone adeptly integrates local and global information to bolster the feature representation capacity of targets. Subsequently, we proposed the Gaussian prior discrepancy (GPD) assigner. This assigner employs the discrepancy of Gaussian distributions in two dimensions to assess the degree of matching between priors and ground truth values, thus refining the discriminative criteria for positive and negative samples. Lastly, we designed the level synchronized attention mechanism (LSAM). This mechanism simultaneously considers information from multiple layers in region of interest (RoI) feature maps, and adaptively adjusts the weights of diverse regions within the final RoI. As a result, it enhances the capability to capture both target details and contextual information. We achieved the highest comprehensive evaluation results for the public LS-SSDD-v1.0 dataset, with an mAP of 79.7% and an F1 of 80.8%. In addition, the robustness of the CViTF-Net was validated using the public SSDD dataset. Visualization of the experimental results indicated that CViTF-Net can effectively enhance the detection performance for small ship targets in complex scenes.

CViTF-Net: A Convolutional and Visual Transformer Fusion Network for Small Ship Target Detection in Synthetic Aperture Radar Images

期刊

REMOTE SENSING

出版社

MDPI

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

CViTF-Net: A Convolutional and Visual Transformer Fusion Network for Small Ship Target Detection in Synthetic Aperture Radar Images

期刊

REMOTE SENSING

出版社

MDPI

关键词

类别

向作者/读者索取更多资源

Protocol

Reagent

作者

我是这篇论文的作者

评论

主要评分

次要评分

新颖性

重要性

科学严谨性

评价这篇论文

推荐

导出引文

分享论文