Article

Scale-Invariant Multi-Level Context Aggregation Network for Weakly Supervised Building Extraction

Journal

Remote Sensing
Volume 15, Issue 5, Article 1432

Publisher

MDPI
DOI: 10.3390/rs15051432

Keywords

building extraction; high-resolution remote sensing image; weakly supervised semantic segmentation; self-attentive aggregation; class activation map


Abstract

Weakly supervised semantic segmentation (WSSS) methods, which use only image-level annotations, are gaining popularity for automated building extraction because they eliminate the need for costly and time-consuming pixel-level labeling. Class activation maps (CAMs) are central to these methods: they generate the pseudo-pixel-level labels used to train the segmentation network. However, CAMs activate only the most discriminative regions, leading to inaccurate and incomplete results. To alleviate this, we propose a scale-invariant multi-level context aggregation network that improves both the fineness and the completeness of CAMs. The method integrates two novel modules into a Siamese network: (a) a self-attentive multi-level context aggregation module that generates and attentively aggregates multi-level CAMs to create fine-structured CAMs, and (b) a scale-invariant optimization module that combines mutual learning with coarse-to-fine optimization to improve the completeness of CAMs. Experiments on two open building datasets demonstrate that our method achieves new state-of-the-art building extraction results using only image-level labels, producing more complete and accurate CAMs with an IoU of 0.6339 on the WHU dataset and 0.5887 on the Chicago dataset.
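For readers outside the WSSS literature, the sketch below illustrates the standard class activation map computation that the paper builds on (a class-weighted sum of the final convolutional feature maps of a classifier trained with global average pooling), plus a toy scale-consistency term in the spirit of module (b). This is a minimal PyTorch illustration, not the authors' method or released code; the tiny network, the function names, and the L1 consistency loss are all assumptions made for demonstration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyCAMNet(nn.Module):
    """Minimal image-level classifier whose last conv features support CAMs."""
    def __init__(self, num_classes: int = 2):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # The classifier weights double as per-class CAM weights.
        self.fc = nn.Linear(64, num_classes)

    def forward(self, x):
        f = self.features(x)                  # (B, 64, H, W)
        logits = self.fc(f.mean(dim=(2, 3)))  # global average pooling + linear
        return logits, f

def compute_cam(model: TinyCAMNet, x: torch.Tensor, cls: int) -> torch.Tensor:
    """CAM for class `cls`: class-weighted sum of the final feature maps."""
    _, f = model(x)
    w = model.fc.weight[cls]                                  # (64,)
    cam = F.relu(torch.einsum("c,bchw->bhw", w, f))           # (B, H, W)
    cam = cam / (cam.amax(dim=(1, 2), keepdim=True) + 1e-8)   # normalize to [0, 1]
    return F.interpolate(cam.unsqueeze(1), size=x.shape[-2:],
                         mode="bilinear", align_corners=False).squeeze(1)

def scale_consistency_loss(model, x, cls, scale=0.5):
    """Toy scale-invariance term: CAMs of an image and its rescaled copy
    should agree after resizing (an assumption, not the paper's exact loss)."""
    cam_full = compute_cam(model, x, cls)
    x_small = F.interpolate(x, scale_factor=scale,
                            mode="bilinear", align_corners=False)
    cam_small = compute_cam(model, x_small, cls)
    cam_small = F.interpolate(cam_small.unsqueeze(1), size=cam_full.shape[-2:],
                              mode="bilinear", align_corners=False).squeeze(1)
    return F.l1_loss(cam_full, cam_small)

if __name__ == "__main__":
    model = TinyCAMNet()
    img = torch.randn(1, 3, 128, 128)
    cam = compute_cam(model, img, cls=1)   # class 1 = "building" (hypothetical)
    print(cam.shape, scale_consistency_loss(model, img, cls=1).item())
```

A thresholded CAM like this yields the pseudo-pixel-level labels that weakly supervised pipelines use in place of manual masks; the paper's contribution lies in aggregating such maps across feature levels and input scales so that the resulting pseudo-labels are finer and more complete.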

