Article

A multichannel location-aware interaction network for visual classification

Journal

APPLIED INTELLIGENCE
Volume -, Issue -, Pages -

Publisher

SPRINGER
DOI: 10.1007/s10489-023-04734-x

Keywords

Feature enhancement; Global feature; Positional attention; Multiscale network; Channel clustering

This paper proposes an efficient global channel position-aware interaction method for fine-grained visual classification. The method enlarges the receptive field of global features by hierarchically grouping the original features and exploiting the linear, translation-invariant nature and local weight sharing of convolutional networks. Same-direction location attention interaction is then performed on the global features to capture common regions of interest. Multiple attention feature maps are obtained from the relative-position interactions of the global features, and the discriminative feature regions they highlight are learned and optimized to guide classification. The proposed model performs well on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets.
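
The hierarchical grouping idea can be illustrated with a small sketch. It assumes a Res2Net-style split, in which channels are divided into groups and each group after the first is convolved together with the previous group's output, so later groups accumulate a wider receptive field. This is not the authors' implementation. The functions `conv3`, `hierarchical_groups`, and `support_width` are hypothetical. Features are reduced to 1D signals, and one 3-tap kernel is reused for every group for simplicity.

```python
from typing import List

Signal = List[float]


def conv3(x: Signal, k=(1.0, 1.0, 1.0)) -> Signal:
    """3-tap 1D convolution with zero padding; one kernel is shared at every position."""
    n = len(x)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            p = i + j - 1  # neighbours i-1, i, i+1
            if 0 <= p < n:
                acc += w * x[p]
        out.append(acc)
    return out


def hierarchical_groups(channels: List[Signal], num_groups: int) -> List[List[Signal]]:
    """Hypothetical Res2Net-style grouping, not the paper's exact block.

    Group 0 passes through unchanged, group 1 is convolved, and every later
    group is convolved after adding the previous group's output.
    """
    size = len(channels) // num_groups
    groups = [channels[g * size:(g + 1) * size] for g in range(num_groups)]
    outputs = [groups[0]]
    prev = None
    for g in range(1, num_groups):
        cur = []
        for c, x in enumerate(groups[g]):
            inp = x if prev is None else [a + b for a, b in zip(x, prev[c])]
            cur.append(conv3(inp))
        outputs.append(cur)
        prev = cur
    return outputs


def support_width(x: Signal) -> int:
    """Width of the nonzero region, used here as a receptive-field proxy."""
    nz = [i for i, v in enumerate(x) if v != 0]
    return nz[-1] - nz[0] + 1 if nz else 0


# Usage: 8 channels holding the same impulse, split into 4 groups.
impulse = [0.0] * 15
impulse[7] = 1.0
out = hierarchical_groups([list(impulse) for _ in range(8)], num_groups=4)
print([support_width(g[0]) for g in out])  # [1, 3, 5, 7]
```

Feeding an impulse through every channel shows the effect. The nonzero support of each group's output grows by two positions per group, which is the receptive-field enlargement the grouping is meant to provide.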

Fine-grained visual classification aims to distinguish images belonging to different subcategories of the same category. The task is challenging because subcategories differ only in subtle regional details. Most existing methods use neural networks to extract global image features and quickly locate local feature regions by adding various external attention mechanisms. Such approaches may overlook details inherent in the feature map itself. This paper proposes an efficient global channel position-aware interaction method to address this problem. Specifically, we first hierarchically group the original features. We then exploit the linear, translation-invariant nature and local weight sharing of convolutional networks to propose a hierarchical structure that enlarges the receptive field of global features. Next, same-direction location attention interaction is performed on these global features with their rich fields of view. This encourages the model to capture common regions of interest through the features' own learning ability. Finally, multiple attention feature maps are obtained from the relative-position interactions of the global features. We again use convolutional networks to learn the discriminative features of the attended target regions and apply feature clustering optimization to these discriminative regions to guide the classification process. The proposed model performs well on three datasets, namely CUB-200-2011, Stanford Cars, and FGVC Aircraft.
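
The same-direction location attention step is not specified in detail here. One plausible reading resembles coordinate attention, and the sketch below follows it as an assumption. Each feature map is pooled separately along its rows and its columns, and the two 1D descriptors are recombined into a 2D attention map. The weighting therefore keeps positional information along both directions. The names `directional_attention` and `peak` are hypothetical, and the learned transforms a real network would apply to the descriptors are omitted.

```python
import math
from typing import List, Tuple

Map = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def directional_attention(x: Map) -> Tuple[Map, Map]:
    """Coordinate-attention-style weighting of one H x W map (hypothetical sketch).

    Rows and columns are average-pooled separately; the two 1D descriptors are
    squashed with a sigmoid and multiplied back into an H x W attention map.
    """
    h, w = len(x), len(x[0])
    row_desc = [sum(row) / w for row in x]  # pooled along the width direction
    col_desc = [sum(x[i][j] for i in range(h)) / h for j in range(w)]  # along height
    attn = [[sigmoid(row_desc[i]) * sigmoid(col_desc[j]) for j in range(w)]
            for i in range(h)]
    out = [[x[i][j] * attn[i][j] for j in range(w)] for i in range(h)]
    return attn, out


def peak(m: Map) -> Tuple[int, int]:
    """(row, column) of the largest value in a 2D map."""
    cells = [(i, j) for i in range(len(m)) for j in range(len(m[0]))]
    return max(cells, key=lambda ij: m[ij[0]][ij[1]])


# Usage: a 3 x 3 map with one strong response at row 1, column 2.
feature = [[0.0, 0.0, 0.0],
           [0.0, 0.0, 9.0],
           [0.0, 0.0, 0.0]]
attn, weighted = directional_attention(feature)
print(peak(attn))  # (1, 2)
```

Applying the function to each channel of a feature tensor would yield one attention map per channel. This is only loosely analogous to the multiple attention feature maps described above.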
