Article

A multichannel location-aware interaction network for visual classification

APPLIED INTELLIGENCE (2023)

Journal

APPLIED INTELLIGENCE

Volume -, Issue -, Pages -

Publisher

SPRINGER

DOI: 10.1007/s10489-023-04734-x

Keywords

Feature enhancement; Global feature; Positional attention; Multiscale network; Channel clustering

Category

Computer Science, Artificial Intelligence


Abstract


Fine-grained visual classification aims to recognize images belonging to different subcategories of the same category. The task is challenging because subcategories differ only in subtle regional details. Most existing methods use neural networks to extract global image features and then rapidly locate local feature regions by adding various external attention mechanisms. Such approaches may ignore details that are inherent in the feature map itself. To address this problem, this paper proposes an efficient global channel position-aware interaction method. Specifically, we first hierarchically group the original features and, exploiting the linear, translation-invariant nature and local weight sharing of convolutional networks, build a hierarchical structure that enlarges the receptive field of the global features. Next, same-direction location attention interaction is performed on these global features with their rich field of view, encouraging the model to capture common areas of interest through the features' own learning ability. Finally, multiple attention feature maps are obtained from the relative position interactions of the global features. We again use convolutional networks to learn the discriminative features of the attended target regions, and we apply feature clustering optimization to these discriminative regions to guide the classification process. The proposed model performs well on three datasets: CUB-200-2011, Stanford Cars, and FGVC Aircraft.
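The abstract describes the architecture only at a high level, so the following is a minimal NumPy sketch under explicit assumptions, not the authors' implementation. It assumes the hierarchical grouping resembles the Res2Net-style scheme, in which each channel group is convolved together with the previous group's output, and it replaces learned convolutions with a fixed 3x3 mean filter. The direction-aware attention is a weight-free simplification loosely modeled on coordinate attention, used only to illustrate pooling along each spatial direction; the paper's same-direction location attention may differ. The names `box3x3`, `hierarchical_group_conv`, and `direction_attention` are hypothetical.

```python
import numpy as np


def box3x3(x: np.ndarray) -> np.ndarray:
    """Depthwise 3x3 mean filter with zero padding; x has shape (C, H, W).

    Assumption: stands in for a learned 3x3 convolution.
    """
    h, w = x.shape[1:]
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))  # zero-pad H and W only
    out = np.zeros(x.shape, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += padded[:, dy:dy + h, dx:dx + w]
    return out / 9.0


def hierarchical_group_conv(x: np.ndarray, groups: int) -> np.ndarray:
    """Hypothetical Res2Net-style hierarchical grouping over channels.

    y_1 = x_1, y_2 = K(x_2), y_i = K(x_i + y_{i-1}) for i > 2, so each later
    group sees a larger spatial neighborhood (3x3, 5x5, 7x7, ...).
    """
    if x.shape[0] % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    parts = np.split(x, groups, axis=0)
    outputs = [parts[0]]  # first group passes through unchanged
    prev = None
    for part in parts[1:]:
        # Feed the previous group's output into the current group's filter.
        prev = box3x3(part if prev is None else part + prev)
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)


def direction_attention(x: np.ndarray) -> np.ndarray:
    """Hypothetical direction-aware attention, loosely after coordinate attention.

    Pools along width and along height separately, squashes each descriptor
    with a sigmoid, and rescales x by both; no learned weights (assumption).
    """
    pooled_h = x.mean(axis=2, keepdims=True)  # (C, H, 1): one value per row
    pooled_w = x.mean(axis=1, keepdims=True)  # (C, 1, W): one value per column
    att_h = 1.0 / (1.0 + np.exp(-pooled_h))
    att_w = 1.0 / (1.0 + np.exp(-pooled_w))
    return x * att_h * att_w  # broadcasts back to (C, H, W)


# Usage: a single impulse in group 2 (channel 1) of a 4-group feature map.
x = np.zeros((4, 15, 15))
x[1, 7, 7] = 1.0
y = hierarchical_group_conv(x, groups=4)
print([int(np.count_nonzero(y[g])) for g in range(4)])  # -> [0, 9, 25, 49]
attended = direction_attention(y)
```

The impulse shows the receptive-field growth: the three processed groups respond over 3x3, 5x5, and 7x7 neighborhoods, which is the kind of enlargement the hierarchical structure is described as providing. In a trainable network the mean filter would be a learned convolution, and the pooled descriptors would typically pass through learned layers before the sigmoid.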
