Article

Aggregating Rich Hierarchical Features for Scene Classification in Remote Sensing Imagery

Publisher

IEEE - Institute of Electrical and Electronics Engineers Inc.
DOI: 10.1109/JSTARS.2017.2705419

Keywords

Convolutional neural networks (CNNs); mixed-resolution representation; remote sensing scene classification; vector of locally aggregated descriptors (VLAD)

Funding

  1. National Natural Science Foundation of China [61403375, 61472119, 61573352, 61375024, 91338202, 91646207]
  2. Priority Academic Program Development of Jiangsu Higher Education Institutions
  3. Jiangsu Collaborative Innovation Center on Atmospheric Environment and Equipment Technology


Scene classification is one of the most important problems in remote sensing image processing. To obtain a highly discriminative feature representation for an image to be classified, traditional methods usually densely accumulate hand-crafted low-level descriptors (e.g., scale-invariant feature transform) with feature encoding techniques. However, the performance is largely limited by these hand-crafted descriptors, as they cannot describe the rich semantic information contained in diverse remote sensing images. To alleviate this problem, we propose a novel method that extracts discriminative image features from the rich hierarchical information contained in convolutional neural networks (CNNs). Specifically, the low-level and middle-level intermediate convolutional features are each encoded with the vector of locally aggregated descriptors (VLAD) and then reduced by principal component analysis to obtain hierarchical global features; meanwhile, the fully connected features are average pooled and subsequently normalized to form additional global features. The proposed encoded mixed-resolution representation (EMR) is the concatenation of all of the above global features. Because of these encoding strategies (VLAD and average pooling), our method can handle images of different sizes. In addition, to reduce the computational cost of training, we extract EMR directly from VGG-VD and ResNet models pretrained on the ImageNet dataset. We show in this paper that CNNs pretrained on a natural image dataset transfer more readily to a remote sensing dataset when the local structural similarity between the two datasets is higher. Experimental evaluations on the UC-Merced and Brazilian Coffee Scenes datasets demonstrate that our method is superior to the state of the art.
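The abstract outlines the EMR pipeline: VLAD-encode low- and middle-level convolutional feature maps, reduce each encoding with PCA, normalize the pooled fully connected features, and concatenate everything into one vector. Below is a minimal sketch of that pipeline, assuming the intermediate feature maps and fully connected activations have already been extracted from a pretrained CNN (e.g., VGG-VD or ResNet). The function names (vlad_encode, build_emr), the codebook size, and the PCA dimension are illustrative assumptions, not the authors' implementation.

```python
# Sketch of an EMR-style feature pipeline (illustrative, not the paper's code).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def vlad_encode(descriptors, kmeans):
    """Aggregate local descriptors (N x D) into a VLAD vector of length K*D."""
    centers = kmeans.cluster_centers_                  # (K, D) codebook
    labels = kmeans.predict(descriptors)               # nearest center per descriptor
    vlad = np.zeros_like(centers)
    for k in range(centers.shape[0]):
        members = descriptors[labels == k]
        if len(members):
            vlad[k] = (members - centers[k]).sum(axis=0)  # residual accumulation
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))        # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)        # L2 normalization

def build_emr(conv_low, conv_mid, fc_feats, k=64, pca_dim=512):
    """conv_low / conv_mid: lists of (H*W, C) local descriptor arrays, one per image;
    fc_feats: (n_images, D) pooled fully connected activations."""
    emr_parts = []
    for level in (conv_low, conv_mid):
        km = KMeans(n_clusters=k, n_init=10).fit(np.vstack(level))
        encoded = np.stack([vlad_encode(x, km) for x in level])
        n_comp = min(pca_dim, encoded.shape[0], encoded.shape[1])
        emr_parts.append(PCA(n_components=n_comp).fit_transform(encoded))
    # L2-normalize the fully connected features before concatenation.
    fc_global = fc_feats / (np.linalg.norm(fc_feats, axis=1, keepdims=True) + 1e-12)
    emr_parts.append(fc_global)
    return np.hstack(emr_parts)                          # final EMR representation
```

Because VLAD aggregates an arbitrary number of local descriptors into a fixed-length vector, this construction is independent of the input image size, which is the property the abstract highlights.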
