Article

Iterative self-organizing SCEne-LEvel sampling (ISOSCELES) for large-scale building extraction

Journal

GISCIENCE & REMOTE SENSING
Volume 59, Issue 1, Pages 1-16

Publisher

TAYLOR & FRANCIS LTD
DOI: 10.1080/15481603.2021.2006433

Keywords

Deep learning; convolutional neural network; sampling; building extraction

Funding

  1. US Department of Energy (DOE) [DE-AC05-00OR22725]


Convolutional neural networks (CNN) excel in computer vision tasks, but achieving good generalization requires representative training samples. To address the variability of remote sensing imagery and the high cost of labeling it, the authors developed the ISOSCELES method, which shows promising results in training set selection.
Convolutional neural networks (CNN) provide state-of-the-art performance in many computer vision tasks, including those related to remote-sensing image analysis. Successfully training a CNN to generalize well to unseen data, however, requires training on samples that represent the full distribution of variation of both the target classes and their surrounding contexts. With remote sensing data, acquiring a sufficiently representative training set is a challenge due to both the inherent multi-modal variability of satellite or aerial imagery and the generally high cost of labeling data. To address this challenge, we have developed ISOSCELES, an Iterative Self-Organizing SCEne LEvel Sampling method for hierarchical sampling of large image sets. Using affinity propagation, ISOSCELES automates the selection of highly representative training images. Compared to random sampling or using available reference data, the distribution of the training set is principally data driven, reducing the chance of oversampling uninformative areas or undersampling informative ones. In comparison to manual sample selection by an analyst, ISOSCELES exploits descriptive features, spectral and/or textural, and eliminates human bias in sample selection. Using a hierarchical sampling approach, ISOSCELES can obtain a training set that reflects both between-scene variability, such as in viewing angle and time of day, and within-scene variability at the level of individual training samples. We verify the method by demonstrating its superiority to stratified random sampling in the challenging task of adapting a pre-trained model to a new image and spatial domain for country-scale building extraction. Using a pair of hand-labeled training sets comprising 1,987 sample image chips, a total of 496,000,000 individually labeled pixels, we show, across three distinct model architectures, an increase in accuracy, as measured by F1-score, of 2.2-4.2%.
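The abstract describes a two-level selection: affinity propagation first picks exemplar scenes from descriptive (spectral/textural) features, then representative chips are sampled within each selected scene. The sketch below is not the published ISOSCELES code; it is a minimal illustration of that idea using scikit-learn's AffinityPropagation, and the feature arrays, function names, and chip-selection heuristic are assumptions made purely for illustration.

```python
# Illustrative sketch of hierarchical, affinity-propagation-based sample selection.
# NOT the authors' ISOSCELES implementation; feature construction and the
# within-scene heuristic are placeholders.

import numpy as np
from sklearn.cluster import AffinityPropagation


def select_exemplar_scenes(scene_features: np.ndarray) -> np.ndarray:
    """Return indices of exemplar scenes chosen by affinity propagation.

    scene_features: (n_scenes, n_features) array of per-scene descriptors,
    e.g. band means/variances or texture statistics (assumed for illustration).
    """
    ap = AffinityPropagation(random_state=0).fit(scene_features)
    return ap.cluster_centers_indices_


def select_chips_within_scene(chip_features: np.ndarray, n_chips: int) -> np.ndarray:
    """Second level of the hierarchy: cluster chips within one exemplar scene
    and keep the cluster exemplars as candidate training chips (illustrative)."""
    ap = AffinityPropagation(random_state=0).fit(chip_features)
    exemplars = ap.cluster_centers_indices_
    # If more exemplars are found than requested, keep the first n_chips;
    # a real pipeline would rank or re-cluster instead of truncating.
    return exemplars[:n_chips]


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Fake per-scene descriptors for 50 scenes, 16 features each.
    scenes = rng.normal(size=(50, 16))
    print("exemplar scenes:", select_exemplar_scenes(scenes))

    # Fake per-chip descriptors for one selected scene.
    chips = rng.normal(size=(200, 16))
    print("chips to label:", select_chips_within_scene(chips, n_chips=10))
```

Because affinity propagation chooses the number of exemplars from the data rather than requiring it in advance, a selection of this form stays data driven at both levels, which is consistent with the abstract's contrast against stratified random sampling.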

