P-CNN: Part-Based Convolutional Neural Networks for Fine-Grained Visual Categorization

Publisher

IEEE COMPUTER SOC
DOI: 10.1109/TPAMI.2019.2933510

Keywords

Visualization; Training; Detectors; Streaming media; Measurement; Feature extraction; Convolutional neural networks; Part localization network; part classification network; duplex focal loss; fine-grained visual categorization

Funding

  1. National Key Research and Development Plan of China [2018YFB1402600]
  2. National Science Foundation of China (NSFC) [61701415, 61772425, 61773315, 61790552]
  3. Young Star of Science and Technology in Shaanxi Province [2018KJXX-029]
  4. Fundamental Research Funds for the Central Universities [3102018zy023, 3102019AX09]
  5. Australian Research Council Future Fellowship [FT180100116]

This paper proposes a new end-to-end fine-grained visual categorization system called P-CNN, which consists of three modules: a Squeeze-and-Excitation (SE) block that recalibrates channel-wise feature responses, a Part Localization Network (PLN) that locates distinctive object parts, and a Part Classification Network (PCN) that classifies the parts and the joint feature. The paper also introduces a Duplex Focal Loss for metric learning and part classification. Experimental results on three benchmark datasets demonstrate the effectiveness of this approach.
This paper proposes an end-to-end fine-grained visual categorization system, termed Part-based Convolutional Neural Network (P-CNN), which consists of three modules. The first module is a Squeeze-and-Excitation (SE) block, which learns to recalibrate channel-wise feature responses by emphasizing informative channels and suppressing less useful ones. The second module is a Part Localization Network (PLN) used to locate distinctive object parts, through which a bank of convolutional filters is learned as discriminative part detectors. Thus, a group of informative parts can be discovered by convolving the feature maps with each part detector. The third module is a Part Classification Network (PCN) that has two streams. The first stream classifies each individual object part into image-level categories. The second stream concatenates the part features and the global feature into a joint feature for the final classification. To learn powerful part features and boost the joint feature capability, we propose a Duplex Focal Loss used for metric learning and part classification, which focuses training on hard examples. We further merge PLN and PCN into a unified network for an end-to-end training process via a simple training technique. Comprehensive experiments and comparisons with state-of-the-art methods on three benchmark datasets demonstrate the effectiveness of our proposed method.
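
The channel recalibration performed by the first module can be illustrated with a minimal NumPy sketch. It follows the commonly used Squeeze-and-Excitation design (global average pooling, a bottleneck of two fully connected layers, sigmoid gating), not necessarily the exact configuration used in P-CNN, which the abstract does not specify. The function name `se_recalibrate`, the reduction ratio `r`, the random weights, and the omission of bias terms are all assumptions made for illustration.

```python
import numpy as np


def se_recalibrate(features, w1, w2):
    """Recalibrate a (C, H, W) feature map channel by channel.

    w1 has shape (C // r, C) and w2 has shape (C, C // r); both are
    hypothetical weights (biases omitted for brevity).
    """
    # Squeeze: global average pooling gives one descriptor per channel.
    z = features.mean(axis=(1, 2))
    # Excitation: bottleneck layer with ReLU, then a layer with sigmoid.
    hidden = np.maximum(w1 @ z, 0.0)
    gates = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))
    # Scale: each channel is multiplied by its gate in (0, 1), so
    # channels with larger gates are emphasized relative to the rest.
    return features * gates[:, None, None], gates


# Illustrative sizes and random inputs (assumed, not from the paper).
rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
features = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C)) * 0.1
w2 = rng.standard_normal((C, C // r)) * 0.1

out, gates = se_recalibrate(features, w1, w2)
print(out.shape, gates.shape)
```

In a trained network the two weight matrices are learned jointly with the rest of the model, so the gates come to reflect which channels are informative for the task; here they are random and only demonstrate the data flow.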
