Article

Transformer-induced graph reasoning for multimodal semantic segmentation in remote sensing

Journal

ISPRS Journal of Photogrammetry and Remote Sensing

Publisher

Elsevier
DOI: 10.1016/j.isprsjprs.2022.08.010

Keywords

Graph reasoning; Hierarchical representation; Multimodal remote sensing; Semantic segmentation; Transformer

Funding

  1. National Key R&D Program of China [2021YFB3900504]
  2. National Natural Science Foundation of China [61725105, 62171436]


The article proposes GraFNet, a Transformer-induced hierarchical graph network for multimodal semantic segmentation in remote sensing scenes. It addresses the challenges of object diversity and the cross-modal gap by introducing a new modeling paradigm built on semantic topological graphs. Extensive experiments show that GraFNet outperforms existing methods on multiple multimodal remote sensing datasets.
With large amounts of Earth observation data available at a global scale, it has become possible to apply multimodal semantic segmentation to remote sensing scene analysis. However, the diversity of objects in large-scale scenes and the cross-modal gap between images from different modalities remain challenging in practical applications. To address these problems, we propose a Transformer-Induced Hierarchical Graph Network (GraFNet) for multimodal semantic segmentation in remote sensing scenes, which promotes the exploration of potential intra- and inter-modal relations by introducing a new modeling paradigm. Different from existing methods, GraFNet parses multimodal remote sensing images into semantic topological graphs and exploits the structural information of land cover categories to learn joint representations. Specifically, an attentive heterogeneous information aggregation mechanism is presented to parse diverse objects in remote sensing scenes into semantic entities and capture modality-specific object-object interaction patterns in a topology-aware environment. In addition, modality hierarchical dependency modeling is introduced to encode the interactive representation of cross-modal objects and distinguish the modality-specific contribution to improve cross-modal compatibility. Extensive experiments on several multimodal remote sensing datasets demonstrate that the proposed GraFNet outperforms state-of-the-art approaches, achieving F1/mIoU scores of 91.1%/82.4% on the ISPRS Vaihingen dataset, 93.4%/88.4% on the ISPRS Potsdam dataset, and 91.8%/84.0% on the MSAW dataset.
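
The abstract describes two mechanisms, per-modality graph reasoning over semantic entities and cross-modal dependency modeling, only at a high level. The PyTorch-style sketch below is an illustrative reading of that description, not the authors' released implementation: the class names (GraphReasoningBlock, CrossModalNodeFusion), the soft pixel-to-node assignment, and all dimensions (num_nodes, node_dim) are assumptions made for this example.

import torch
import torch.nn as nn


class GraphReasoningBlock(nn.Module):
    # Projects a modality-specific feature map onto a small set of semantic
    # nodes, runs attention-based message passing among the nodes, and maps
    # the reasoned nodes back to the pixel grid with a residual connection.
    def __init__(self, in_channels: int, num_nodes: int = 16, node_dim: int = 128):
        super().__init__()
        self.node_assign = nn.Conv2d(in_channels, num_nodes, kernel_size=1)  # soft pixel-to-node assignment
        self.node_proj = nn.Conv2d(in_channels, node_dim, kernel_size=1)     # pixel features -> node space
        self.message = nn.MultiheadAttention(node_dim, num_heads=4, batch_first=True)
        self.out_proj = nn.Conv2d(node_dim, in_channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        assign = self.node_assign(x).flatten(2).softmax(dim=-1)              # (B, K, H*W)
        feats = self.node_proj(x).flatten(2)                                 # (B, D, H*W)
        nodes = torch.bmm(assign, feats.transpose(1, 2))                     # (B, K, D) semantic graph nodes
        nodes, _ = self.message(nodes, nodes, nodes)                         # intra-modal graph reasoning
        pixels = torch.bmm(assign.transpose(1, 2), nodes)                    # (B, H*W, D) back to pixels
        pixels = pixels.transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out_proj(pixels)                                     # residual fusion


class CrossModalNodeFusion(nn.Module):
    # Lets the node set of one modality attend to the node set of the other
    # (e.g., optical vs. DSM/SAR) and gates how much each modality contributes.
    def __init__(self, node_dim: int = 128):
        super().__init__()
        self.cross = nn.MultiheadAttention(node_dim, num_heads=4, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * node_dim, node_dim), nn.Sigmoid())

    def forward(self, nodes_a: torch.Tensor, nodes_b: torch.Tensor) -> torch.Tensor:
        attended, _ = self.cross(nodes_a, nodes_b, nodes_b)                  # cross-modal message passing
        g = self.gate(torch.cat([nodes_a, attended], dim=-1))                # learned per-dimension modality weighting
        return g * nodes_a + (1.0 - g) * attended

Under the same assumptions, each modality branch (for instance, the optical and height/SAR streams) would run its own GraphReasoningBlock, with CrossModalNodeFusion exchanging information between the resulting node sets before decoding back to a segmentation map; the precise design used by GraFNet is specified in the paper, not in this sketch.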

