Journal
PATTERN RECOGNITION
Volume 76, Issue -, Pages 704-714Publisher
ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2017.10.002
Keywords
Fine-grained image recognition; Deep descriptor selection; Part localization
Funding
- National Natural Science Foundation of China [61772256, 61422203]
- Collaborative Innovation Center of Novel Software Technology and Industrialization
Ask authors/readers for more resources
Fine-grained image recognition is a challenging computer vision problem, due to the small inter-class variations caused by highly similar subordinate categories, and the large intra-class variations in poses, scales and rotations. In this paper, we prove that selecting useful deep descriptors contributes well to fine-grained image recognition. Specifically, a novel Mask-CNN model without the fully connected layers is proposed. Based on the part annotations, the proposed model consists of a fully convolutional network to both locate the discriminative parts (e.g., head and torso), and more importantly generate weighted object/part masks for selecting useful and meaningful convolutional descriptors. After that, a three-stream Mask-CNN model is built for aggregating the selected object- and part-level descriptors simultaneously. Thanks to discarding the parameter redundant fully connected layers, our Mask-CNN has a small feature dimensionality and efficient inference speed by comparing with other fine-grained approaches. Furthermore, we obtain a new state-of-the-art accuracy on two challenging fine-grained bird species categorization datasets, which validates the effectiveness of both the descriptor selection scheme and the proposed Mask-CNN model. (C) 2017 Elsevier Ltd. All rights reserved.
Authors
I am an author on this paper
Click your name to claim this paper and add it to your profile.
Reviews
Recommended
No Data Available