Article

Deeply Encoding Stable Patterns From Contaminated Data for Scenery Image Recognition

Journal

IEEE TRANSACTIONS ON CYBERNETICS
Volume 51, Issue 12, Pages 5671-5680

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/TCYB.2019.2951798

Keywords

Semantics; Noise measurement; Feature extraction; Image recognition; Probabilistic logic; Cybernetics; Robots; Deep learning; machine learning; probabilistic model; scene categorization; stable templates

Funding

  1. National Natural Science Foundation of China [61871470, 61761130079]
  2. National Key Research and Development Program of China [2018YFB1403600]


Abstract
Effectively recognizing different sceneries with complex backgrounds and varied lighting conditions plays an important role in modern AI systems. Competitive performance has recently been achieved by deep scene categorization models. However, these models implicitly hypothesize that the image-level labels are 100% correct, which is too restrictive. In practice, the image-level labels for massive-scale scenery sets are usually calculated by external predictors such as ImageNet-CN. These labels can easily become contaminated because no predictor is completely accurate. This article proposes a new deep architecture that calculates scene categories by hierarchically deriving stable templates, which are discovered using a generative model. Specifically, we first construct a semantic space by incorporating image-level labels using subspace embedding. We then observe that, in the semantic space, the superpixel distributions from identically labeled images remain unchanged regardless of image-level label noise. On the basis of this observation, a probabilistic generative model learns the stable templates for each scene category. To deeply represent each scenery category, a novel aggregation network is developed to statistically concatenate the CNN features learned from scene annotations predicted by HSA. Finally, the learned deep representations are integrated into an image kernel, which is subsequently incorporated into a multiclass SVM for distinguishing scene categories. Thorough experiments have demonstrated the effectiveness of our method. As a byproduct, an empirical study of 33 SIFT-flow categories shows that the learned stable templates remain almost unchanged under a nearly 36% image label contamination rate.
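The final stage described in the abstract, integrating learned deep representations into an image kernel that is then fed to a multiclass SVM, can be sketched in a few lines. The following is a minimal illustration with NumPy only; the RBF kernel choice, feature dimensions, and the `rbf_image_kernel` helper are placeholder assumptions for illustration, not the authors' actual kernel or implementation.

```python
import numpy as np

def rbf_image_kernel(X, Y, gamma=0.5):
    """Precomputed kernel matrix between two sets of image-level feature
    vectors (one row per image). An RBF kernel is used here purely as a
    stand-in; the paper's exact image kernel is not specified in the abstract."""
    sq_dists = (
        np.sum(X ** 2, axis=1)[:, None]
        + np.sum(Y ** 2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))

# Toy aggregated CNN-style features for two scene categories (synthetic data).
rng = np.random.default_rng(0)
train = np.vstack([rng.normal(0, 1, (10, 8)), rng.normal(3, 1, (10, 8))])
labels = np.array([0] * 10 + [1] * 10)  # would accompany K into the SVM

K = rbf_image_kernel(train, train)  # 20 x 20 precomputed image kernel
```

The resulting matrix `K` (together with `labels`) could then be passed to any kernel classifier that accepts a precomputed Gram matrix, for example scikit-learn's `SVC(kernel='precomputed')`, to realize the multiclass SVM stage.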

