Proceedings Paper

POP-NET: ENCODER-DUAL DECODER FOR SEMANTIC SEGMENTATION AND SINGLE-VIEW HEIGHT ESTIMATION

Publisher

IEEE
DOI: 10.1109/igarss.2019.8897927

Keywords

single-view semantic 3D challenge; pyramid on pyramid; semantic segmentation; height estimation

The single-view semantic 3D challenge of the 2019 Data Fusion Contest requires predicting both semantic labels and a normalized digital surface model (nDSM) for urban scenes from single-view satellite images. We propose a novel pyramid on pyramid network (Pop-Net), based on an encoder-dual decoder framework, for end-to-end multi-task learning. The encoder is a deformable ResNet-101 backbone. Two feature pyramid networks serve as decoders, one for semantic segmentation and one for height estimation. Because semantic information is crucial for estimating height, a regression pyramid is built on top of the semantic pyramid so that semantic features can support height estimation. To handle outliers in the heights, we use anchor-based regression and optimize with a smooth L1 loss, which yields more robust height estimates. Without bells and whistles, our single-model entry achieves 77.78% mIoU and 53.40% mIoU-3 on the test set, ranking 2nd in the Single-view Semantic 3D Challenge of the 2019 IEEE GRSS Data Fusion Contest. The code is available at https://github.com/Z-Zheng/PopNet.
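The abstract names smooth L1 loss and anchor-based regression as its tools against height outliers but does not give their exact form. The following is only a minimal sketch of the general ideas, not the authors' implementation: the threshold `beta`, the `ANCHORS` values, and the helpers `encode_height` and `decode_height` are all hypothetical and chosen for illustration.

```python
def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss for a single value.

    Quadratic for small errors, linear for large ones, so an outlier
    contributes far less than it would under a squared error.
    beta is an assumed threshold, not a value taken from the paper.
    """
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


def squared_error(pred, target):
    """Half squared error, shown only for comparison."""
    return 0.5 * (pred - target) ** 2


# Hypothetical height anchors in metres (assumption, not from the paper).
ANCHORS = [0.0, 5.0, 10.0, 20.0, 40.0]


def encode_height(height, anchors=ANCHORS):
    """Express a height as (index of nearest anchor, residual to it)."""
    idx = min(range(len(anchors)), key=lambda i: abs(height - anchors[i]))
    return idx, height - anchors[idx]


def decode_height(idx, residual, anchors=ANCHORS):
    """Recover the height from an anchor index and a residual."""
    return anchors[idx] + residual


if __name__ == "__main__":
    # A 10 m error costs 9.5 under smooth L1 but 50.0 under squared error.
    print(smooth_l1(10.0, 0.0), squared_error(10.0, 0.0))
    idx, res = encode_height(12.0)
    print(idx, res, decode_height(idx, res))
```

In this sketch the regression target becomes a small residual around an anchor rather than the raw height, which keeps the regressed values in a narrow range; the linear tail of smooth L1 then limits the influence of any remaining large errors.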
