Article

Dense context distillation network for semantic parsing of oblique UAV images

Publisher

Elsevier
DOI: 10.1016/j.jag.2022.103062

Keywords

UAV; Road scene; Semantic segmentation; Deep learning; Dense context

Funding

  1. Open fund of Key Laboratory of Urban Land Resources Monitoring and Simulation, Ministry of Natural Resources [KF202106084]
  2. National Natural Science Foundation of China [41871361, 42071370]
  3. Fundamental Research Funds for the Central Universities [2042022kf1203]


In this paper, a dense context distillation network (DCDNet) is proposed for semantic segmentation of oblique unmanned aerial vehicle (UAV) images. DCDNet learns distortion-robust feature representations by densely and selectively gathering useful context from dual-scale feature maps. It also incorporates joint supervision and multi-scale feature aggregation for better learning and prediction, achieving state-of-the-art segmentation performance on the challenging UAVid dataset with a mIoU score of 72.38%.
Semantic segmentation of oblique unmanned aerial vehicle (UAV) images serves as a foundation for many modern urban applications, such as road scene monitoring and semantic 3D modeling. However, objects in UAV images can vary greatly in size and undergo severe perspective distortion because of the oblique viewing angle. Existing general segmentation models designed for ground-level and remote sensing images rarely consider these challenges specific to UAV images; consequently, they struggle to learn discriminative representations for simultaneously reasoning about both extremely large and extremely small objects. In this paper, we propose a dense context distillation network (DCDNet) to learn distortion-robust feature representations for semantic segmentation of UAV images. The basic DCDNet is deployed as a dual-branch encoder-decoder architecture. To accomplish dense context distillation, DCDNet is first equipped with cross-scale context selectors at different encoding stages, which densely and selectively gather useful context from low- to high-level dual-scale feature maps. Joint supervision is then applied to reinforce the learning of shallower features, distilling more of the low-level context that is vital for reasoning about small or thin structures. A multi-scale feature aggregator is incorporated to adaptively fuse long-range context during decoding, absorbing the complementary merits of the dense context captured from feature maps of different levels. With dense context distillation, DCDNet is better able to provide objects of different scales with the context they require for learning and prediction. Extensive experiments on the challenging UAVid dataset demonstrate that DCDNet adapts well to oblique UAV images, achieving state-of-the-art segmentation performance with a mIoU score of 72.38%.
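To make the cross-scale selection idea concrete, the following is a minimal NumPy sketch of gated fusion between dual-scale feature maps. It is not the authors' implementation: the sigmoid-of-difference gate and the nearest-neighbour upsampling are simplifying assumptions standing in for the learned selectors described in the paper.

```python
import numpy as np

def cross_scale_select(fine, coarse):
    """Hypothetical sketch of a cross-scale context selector.

    fine:   (H, W, C) feature map from the fine-scale branch
    coarse: (H//2, W//2, C) feature map from the coarse-scale branch
    Returns an (H, W, C) fused map in which a per-pixel, per-channel
    gate selects between fine detail and upsampled coarse context.
    """
    H, W, _ = fine.shape
    # Nearest-neighbour upsample the coarse map to the fine resolution.
    up = coarse.repeat(2, axis=0).repeat(2, axis=1)[:H, :W, :]
    # Sigmoid gate computed from the feature difference -- a stand-in
    # for the learned 1x1-convolution gating of a real selector.
    gate = 1.0 / (1.0 + np.exp(-(fine - up)))
    # Convex combination: each output value lies between the fine
    # feature and the upsampled coarse feature.
    return gate * fine + (1.0 - gate) * up

rng = np.random.default_rng(0)
fine = rng.standard_normal((8, 8, 4))
coarse = rng.standard_normal((4, 4, 4))
fused = cross_scale_select(fine, coarse)
print(fused.shape)  # (8, 8, 4)
```

In a full network, such a selector would sit at each encoding stage, so that every stage can draw on both scales before the multi-scale aggregator fuses the results during decoding.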
