Article

A multichannel location-aware interaction network for visual classification

Journal

APPLIED INTELLIGENCE

Publisher

SPRINGER
DOI: 10.1007/s10489-023-04734-x

Keywords

Feature enhancement; Global feature; Positional attention; Multiscale network; Channel clustering

This paper proposes an efficient global channel position-aware interaction method for fine-grained visual classification. The method enlarges the receptive field of global features by hierarchically grouping the original features and exploiting the translation-invariant linearity and local weight sharing of convolutional networks. Same-direction location attention interaction is then performed on the global features to capture common areas of interest. Multiple attention feature maps are obtained from the relative-position interactions of the global features, and the discriminative feature regions they highlight are learned and optimized to guide classification. The proposed model performs well on the CUB-200-2011, Stanford Cars, and FGVC Aircraft datasets.
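The hierarchical grouping step can be illustrated with a minimal sketch. It assumes a Res2Net-style scheme, which is a swapped-in technique since the abstract does not give the exact layer layout: the channels are split into groups, the first group passes through unchanged, and each later group is convolved with a 3x3 kernel after adding the previous group's output. The names `conv3x3_depthwise`, `hierarchical_group_features` and `support_size`, the depthwise kernels, and the use of NumPy are all hypothetical choices for illustration.

```python
import numpy as np


def conv3x3_depthwise(x, k):
    """3x3 depthwise convolution (cross-correlation, as in DL frameworks), zero padding.

    x: (C, H, W) feature map; k: (C, 3, 3), one kernel per channel.
    The same kernel is applied at every position (local weight sharing).
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x, dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += k[:, dy, dx][:, None, None] * padded[:, dy:dy + h, dx:dx + w]
    return out


def hierarchical_group_features(x, kernels):
    """Hypothetical Res2Net-style hierarchical grouping (not the paper's exact design).

    x: (C, H, W) with C divisible by s = len(kernels) + 1.
    kernels: list of s - 1 arrays of shape (C // s, 3, 3).
    Group 1 passes through; group i > 1 is convolved after adding the
    previous group's output, so each group's receptive field is 2 px wider.
    """
    s = len(kernels) + 1
    groups = np.split(x, s, axis=0)
    outputs = [groups[0]]
    prev = None
    for i in range(1, s):
        inp = groups[i] if prev is None else groups[i] + prev
        prev = conv3x3_depthwise(inp, kernels[i - 1])
        outputs.append(prev)
    return np.concatenate(outputs, axis=0)


def support_size(channel):
    """Number of rows containing a non-zero response (spatial extent)."""
    return int(np.any(channel != 0, axis=1).sum())


# Usage: a unit impulse reveals how far each group's response spreads.
x = np.zeros((4, 9, 9))
x[:, 4, 4] = 1.0
kernels = [np.ones((1, 3, 3)) for _ in range(3)]
y = hierarchical_group_features(x, kernels)
print([support_size(y[c]) for c in range(4)])  # [1, 3, 5, 7]
```

Because one small kernel is reused at every spatial position, chaining it across groups widens the field of view (1, 3, 5 and 7 pixels here) without introducing large kernels.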
Fine-grained visual classification aims to distinguish images belonging to different subcategories within the same category. This is a challenging task because subcategories differ only in subtle regional details. Most existing methods use neural networks to extract global image features and then quickly locate local feature regions by adding various external attention mechanisms. Such approaches may ignore details that are inherent in the feature map itself. This paper proposes an efficient global channel position-aware interaction method to address this problem. Specifically, we first hierarchically group the original features and exploit the translation-invariant linearity and local weight sharing of convolutional networks to build a hierarchical structure that enlarges the receptive field of global features. Then, same-direction location attention interaction is performed on these global features with their rich fields of view, encouraging the model to capture common areas of interest through the features' own learning ability. Finally, multiple attention feature maps are obtained from the relative-position interactions of the global features. We again use convolutional networks to learn the discriminative features of the attended target regions and perform feature clustering optimization on these discriminative regions to guide the classification process. The proposed model performs well on three datasets, i.e., CUB-200-2011, Stanford Cars, and FGVC Aircraft.
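Same-direction location attention can be pictured with a minimal sketch, assuming a coordinate-attention-style design; this is a substituted technique, not the authors' confirmed module. The feature map is averaged along each spatial direction, the two 1-D descriptors pass through a shared channel-mixing matrix (a stand-in for a learned 1x1 convolution), and sigmoid gates over rows and columns are multiplied back onto the features. The names `directional_position_attention` and `w_mix` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def directional_position_attention(x, w_mix):
    """Hypothetical coordinate-attention-style gate, not the paper's exact module.

    x: (C, H, W) feature map.
    w_mix: (C, C) channel-mixing matrix shared by both directions; it stands in
    for a learned 1x1 convolution.
    Returns the re-weighted features plus the row and column attention maps.
    """
    pooled_h = x.mean(axis=2)  # (C, H): average along the width direction
    pooled_w = x.mean(axis=1)  # (C, W): average along the height direction
    a_h = sigmoid(w_mix @ pooled_h)  # (C, H) gate for each row
    a_w = sigmoid(w_mix @ pooled_w)  # (C, W) gate for each column
    # Broadcasting turns the two 1-D gates into one weight per (row, column).
    y = x * a_h[:, :, None] * a_w[:, None, :]
    return y, a_h, a_w


# Usage: one strongly activated location at row 2, column 3.
x = np.zeros((2, 6, 6))
x[:, 2, 3] = 6.0
y, a_h, a_w = directional_position_attention(x, np.eye(2))
print(int(a_h[0].argmax()), int(a_w[0].argmax()))  # 2 3
```

Pooling along one direction at a time keeps the exact row or column position in each descriptor, which is why the gates peak at the activated location rather than being spread over the whole map.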
