☆ 4.6 Article

Deep Attention-Based Spatially Recursive Networks for Fine-Grained Visual Recognition

IEEE TRANSACTIONS ON CYBERNETICS (2019)

Journal

IEEE TRANSACTIONS ON CYBERNETICS

Volume 49, Issue 5, Pages 1791-1802

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC

DOI: 10.1109/TCYB.2018.2813971

Keywords

Bilinear pooling; convolutional neural networks (CNNs); fine-grained visual recognition; long-short term memory (LSTM) units; visual attention

Categories

Automation & Control Systems; Computer Science, Artificial Intelligence; Computer Science, Cybernetics

Funding

Australian Research Council [DP140102270]
University of Sydney Business School ARC Bridging Fund

Abstract

Fine-grained visual recognition is an important problem in pattern recognition applications. It is a challenging task, however, because interclass differences are subtle and intraclass variation is large. Recent visual attention models can automatically locate critical object parts and represent them robustly against appearance variations. Because they do not consider spatial dependencies in discriminative feature learning, however, these methods underperform in classifying fine-grained objects. In this paper, we present a deep attention-based spatially recursive model that learns to attend to critical object parts and encode them into spatially expressive representations. Our network is built on bilinear pooling, which enables local pairwise feature interactions between the outputs of two different convolutional neural networks (CNNs), one corresponding to distinct region detection and the other to relevant feature extraction. Spatial long short-term memory (LSTM) units are then introduced to generate spatially meaningful hidden representations through long-range dependencies on all features in two dimensions. The attention model sits between the bilinear outputs and the spatial LSTMs and performs dynamic selection on varied inputs. Our model, which is composed of two-stream CNN layers, bilinear pooling, and spatial recursive encoding with attention, is end-to-end trainable and serves as both part detector and feature extractor, so that relevant features are localized, extracted, and encoded spatially for recognition. We demonstrate the superiority of our method on two typical fine-grained recognition tasks: fine-grained image classification and person re-identification.
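
To make the "local pairwise feature interactions" of bilinear pooling concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes two feature maps of the same spatial size, stood in for by random arrays rather than real CNN outputs; the function names, shapes, and the final orderless sum pooling are illustrative assumptions. According to the abstract, the paper instead passes the per-location features through attention and spatial LSTM units, and that stage is not reproduced here.

```python
import numpy as np


def bilinear_features(feat_a, feat_b):
    """Per-location outer product of two feature maps (hypothetical helper).

    feat_a: (H, W, Ca), feat_b: (H, W, Cb)  ->  (H, W, Ca * Cb)
    """
    h, w, ca = feat_a.shape
    cb = feat_b.shape[-1]
    # Every channel of stream A interacts with every channel of stream B
    # at the same spatial location.
    outer = np.einsum("hwi,hwj->hwij", feat_a, feat_b)
    return outer.reshape(h, w, ca * cb)


def sum_pool(local):
    """Orderless sum pooling over all spatial locations (an assumption here)."""
    return local.sum(axis=(0, 1))


# Random stand-ins for the outputs of the two CNN streams.
rng = np.random.default_rng(0)
feat_a = rng.standard_normal((4, 4, 3))
feat_b = rng.standard_normal((4, 4, 5))

local = bilinear_features(feat_a, feat_b)  # one 15-dim descriptor per location
pooled = sum_pool(local)                   # single 15-dim image descriptor
print(local.shape, pooled.shape)
```

Sum pooling discards where each interaction occurred, which is the kind of lost spatial dependency the abstract argues against; keeping the `local` array and encoding it sequentially is the direction the paper describes.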
