Article

Image scene geometry recognition using low-level features fusion at multi-layer deep CNN

Journal

NEUROCOMPUTING
Volume 440, Pages 111-126

Publisher

ELSEVIER
DOI: 10.1016/j.neucom.2021.01.085

Keywords

Image Scene Geometry recognition; Multi-layer CNN features; Low-level features; GoogLeNet; ResNet

The proposed image scene geometry recognition model integrates low-level handcrafted features with deep CNN multi-stage features using feature-fusion and score-level fusion strategies. By combining the advantages of both types of fusion, it achieves higher recognition accuracy than existing models on the 12-Scene and 15-Scene image datasets.
Image scene geometry recognition is an important element in reconstructing the 3D scene geometry of a single image. It is useful for computer vision applications such as 3D TV, video categorization, and robot navigation systems. A 3D scene geometry with a unique depth represents the rough structure of a 2D image. Implementing 3D scene geometry recognition efficiently while achieving high recognition accuracy remains a significant challenge in the computer vision domain. Existing approaches use pre-trained deep convolutional neural network (CNN) models as feature extractors and explore the benefits of multi-layer feature representations for small or medium-sized datasets. However, these studies pay little attention to building a discriminative feature representation by fusing low-level features with multi-layer features from a single CNN model. To address this problem, we propose a novel model of image scene geometry recognition in which low-level handcrafted features are integrated with deep CNN multi-stage features (HF-MSF) using feature-fusion and score-level fusion strategies. The low-level features contain rich discriminative information about 3D scene geometry, including shape, color, and depth estimation. In feature-fusion, the multi-layer features at different stages and the handcrafted features are fused at an early phase. In score-level fusion, the handcrafted features are integrated with the multi-layer features of a single CNN model at different stages; each stage is connected to a classifier, and the scores of these classifiers are then fused automatically to recognize the scene geometry type. For validation and comparison purposes, two well-known deep learning architectures, namely GoogLeNet and ResNet, are employed as the backbone of the proposed model.
Experimental results show that, by taking advantage of both types of fusion, the proposed HF-MSF model improves recognition accuracy by 12.21% and 4.96% over the G-MS2F model on the 12-Scene and 15-Scene image datasets, respectively. Similarly, it improves accuracy by 3.85% over the FTOTLM model on the 15-Scene dataset. (c) 2021 Elsevier B.V. All rights reserved.
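
The two fusion strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that feature-fusion means L2-normalizing and concatenating the handcrafted vector with each stage's feature vector, and that score-level fusion means a weighted average of per-stage softmax probabilities. The abstract only says the scores are fused "automatically", so the averaging rule, the function names, and the toy values below are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm; an all-zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def feature_fusion(handcrafted: Sequence[float],
                   stage_features: Sequence[Sequence[float]]) -> List[float]:
    """Early fusion (assumed rule): normalize each source, then concatenate."""
    fused = l2_normalize(handcrafted)
    for feat in stage_features:
        fused.extend(l2_normalize(feat))
    return fused


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw classifier scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def score_level_fusion(stage_scores: Sequence[Sequence[float]],
                       weights: Optional[Sequence[float]] = None
                       ) -> Tuple[List[float], int]:
    """Late fusion (assumed rule): weighted mean of per-stage class probabilities.

    Returns the fused probability vector and the index of the winning class.
    """
    if weights is None:
        weights = [1.0] * len(stage_scores)  # equal weight per stage classifier
    weight_sum = sum(weights)
    num_classes = len(stage_scores[0])
    fused = [0.0] * num_classes
    for w, scores in zip(weights, stage_scores):
        for c, p in enumerate(softmax(scores)):
            fused[c] += (w / weight_sum) * p
    label = max(range(num_classes), key=lambda c: fused[c])
    return fused, label


# Toy usage with made-up numbers: one handcrafted vector, two CNN stages.
fused_vector = feature_fusion([3.0, 4.0], [[1.0, 0.0], [0.0, 2.0, 0.0]])
fused_probs, predicted = score_level_fusion([[2.0, 1.0, 0.0], [0.5, 2.5, 0.0]])
```

In this sketch the fused vector would feed a single classifier, while score-level fusion keeps one classifier per stage and combines only their outputs. Learned or accuracy-based stage weights could replace the equal weights used here.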
