Article

Spatial Pyramid Attention for Deep Convolutional Neural Networks

Journal

IEEE TRANSACTIONS ON MULTIMEDIA
Volume 23, Pages 3048-3058

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TMM.2021.3068576

Keywords

Object detection; Feature extraction; Convolutional codes; Computer architecture; Benchmark testing; Topology; Task analysis; Attention mechanism; convolutional neural network; image classification; object detection; spatial pyramid structure; structural regularization; structural information

Funding

  1. National Science Foundation [CNS-1852134, OAC-2017564, ECCS-2010332, CNS-2037982, CNS-1563750]
  2. Fujitsu Laboratories of America Inc.

Abstract

The study introduces a novel Spatial Pyramid Attention Network (SPANet) that enhances feature representation by combining structural information and channel relationships. SPANet can be flexibly applied to various CNN architectures and shows significant performance improvement on four benchmark datasets.
Attention mechanisms have shown great success in computer vision. However, the commonly used global average pooling in some implementations aggregates a three-dimensional feature map to a one-dimensional attention map, leading to a significant loss of structural information in the attention learning. In this article, we present a novel Spatial Pyramid Attention Network (SPANet), which exploits the structural information and channel relationships for better feature representation. SPANet enhances a base network by adding Spatial Pyramid Attention (SPA) blocks laterally. By rethinking the self-attention mechanism design, we further present three topology structures of attention path connection for our SPANet. They can be flexibly applied to various CNN architectures. SPANet is conceptually simple but practically powerful. It uses both structural regularization and structural information to achieve better learning capability. We have comprehensively evaluated the performance of SPANet on four benchmark datasets for different visual tasks. The experimental results show that SPANet significantly improves recognition accuracy without adding much computation overhead. Using SPANet, we achieve an improvement of 1.6% top-1 classification accuracy on the ImageNet 2012 benchmark based on ResNet50, and SPANet outperforms SENet and other attention methods. SPANet also improves object detection performance by a clear margin with negligible additional computation overhead. When applying SPANet to RetinaNet with a ResNet50 backbone, we improve the baseline model by 2.3 mAP, and the enhanced model outperforms SENet and GCNet by 1.1 mAP and 1.7 mAP, respectively. The code of SPANet is publicly available at https://github.com/13952522076/SPANet_TMM.
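To make the core idea concrete, below is a minimal PyTorch sketch of a spatial-pyramid channel attention block in the spirit described by the abstract: instead of collapsing the feature map with a single global average pool, it pools at several spatial resolutions, concatenates the pooled descriptors, and maps them to per-channel weights. The pooling scales, reduction ratio, and layer choices are illustrative assumptions, not the authors' exact SPA block or attention-path topologies; see the linked repository for the reference implementation.

import torch
import torch.nn as nn

class SpatialPyramidAttention(nn.Module):
    """Illustrative channel attention driven by spatial pyramid pooling
    rather than a single 1x1 global average pool (not the official SPA block)."""

    def __init__(self, channels, pool_sizes=(1, 2, 4), reduction=16):
        super().__init__()
        # Pool the feature map at several spatial resolutions so that coarse
        # structural information survives the aggregation step.
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(s) for s in pool_sizes)
        pooled_dim = channels * sum(s * s for s in pool_sizes)
        hidden = max(channels // reduction, 8)
        # Bottleneck MLP maps the pooled descriptor to per-channel weights.
        self.fc = nn.Sequential(
            nn.Linear(pooled_dim, hidden),
            nn.ReLU(inplace=True),
            nn.Linear(hidden, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        # Flatten and concatenate all pyramid levels into one descriptor per sample.
        desc = torch.cat([p(x).flatten(1) for p in self.pools], dim=1)
        w = self.fc(desc).view(b, c, 1, 1)
        # Recalibrate the channels of the input feature map.
        return x * w

if __name__ == "__main__":
    feat = torch.randn(2, 64, 32, 32)
    spa = SpatialPyramidAttention(64)
    print(spa(feat).shape)  # torch.Size([2, 64, 32, 32])

In the paper's setting, such a block would be attached laterally to stages of a backbone such as ResNet50 (the abstract's "lateral" SPA blocks), with the three proposed attention-path topologies governing how its output is combined with the main path; those connection patterns are not reproduced in this sketch.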

