Article

Transformer-induced graph reasoning for multimodal semantic segmentation in remote sensing

Journal

ISPRS Journal of Photogrammetry and Remote Sensing

Publisher

ELSEVIER
DOI: 10.1016/j.isprsjprs.2022.08.010

Keywords

Graph reasoning; Hierarchical representation; Multimodal remote sensing; Semantic segmentation; Transformer

Funding

  1. National Key R&D Program of China [2021YFB3900504]
  2. National Natural Science Foundation of China [61725105, 62171436]

Abstract

The article proposes GraFNet, a hierarchical graph network for multimodal semantic segmentation of remote sensing scenes. It addresses the challenges of object diversity and the cross-modal gap by introducing a new modeling paradigm built on semantic topological graphs. Extensive experiments show that GraFNet outperforms existing methods on several datasets.
As a large amount of earth observation data is available on a global scale, it becomes possible to apply multimodal semantic segmentation technology to remote sensing scene analysis. However, the diversity of objects in large-scale scenes and the cross-modal gap between different images remain challenging in practical applications. To address these problems, we propose a Transformer-Induced Hierarchical Graph Network (GraFNet) for multimodal semantic segmentation in remote sensing scenes, which promotes the exploration of potential intra- and inter-modal relations by introducing a new modeling paradigm. Unlike existing methods, GraFNet parses multimodal remote sensing images into semantic topological graphs and exploits the structural information of land-cover categories to learn joint representations. Specifically, an attentive heterogeneous information aggregation mechanism is presented to parse diverse objects in remote sensing scenes into semantic entities and capture modality-specific object-object interaction patterns in a topology-aware environment. In addition, modality hierarchical dependency modeling is introduced to encode the interactive representation of cross-modal objects and distinguish each modality's contribution, improving cross-modal compatibility. Extensive experiments on several multimodal remote sensing datasets demonstrate that the proposed GraFNet outperforms state-of-the-art approaches, achieving F1/mIoU scores of 91.1%/82.4% on the ISPRS Vaihingen dataset, 93.4%/88.4% on the ISPRS Potsdam dataset, and 91.8%/84.0% on the MSAW dataset.
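The abstract outlines two ingredients that a short sketch can make concrete: (1) parsing dense feature maps into a small set of semantic "object" nodes and reasoning over their relations, and (2) exchanging information between modality-specific node sets with a learned gate over each modality's contribution. The PyTorch sketch below is purely illustrative and is not the authors' GraFNet code: all names (SemanticGraphReasoner, CrossModalGatedFusion, num_nodes) are hypothetical, and standard multi-head attention stands in for the paper's graph attention and hierarchical dependency modeling.

import torch
import torch.nn as nn


class SemanticGraphReasoner(nn.Module):
    """Parse a feature map into a few semantic nodes, reason over their
    relations, and project the result back to pixels (illustrative only)."""

    def __init__(self, channels: int, num_nodes: int = 16):
        super().__init__()
        # 1x1 conv producing pixel-to-node assignment logits ("semantic entities")
        self.assign = nn.Conv2d(channels, num_nodes, kernel_size=1)
        # Multi-head self-attention as a stand-in for graph attention over nodes
        self.node_attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)

    def to_nodes(self, x: torch.Tensor):
        a = self.assign(x).flatten(2).softmax(dim=-1)        # (B, K, HW) soft assignment
        nodes = torch.bmm(a, x.flatten(2).transpose(1, 2))   # (B, K, C) node features
        return nodes, a

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        nodes, a = self.to_nodes(x)
        nodes, _ = self.node_attn(nodes, nodes, nodes)       # intra-modal object relations
        out = torch.bmm(a.transpose(1, 2), nodes)            # (B, HW, C) back to pixels
        return x + out.transpose(1, 2).reshape(b, c, h, w)   # residual refinement


class CrossModalGatedFusion(nn.Module):
    """Let one modality's nodes attend to the other's, then gate how much of
    the cross-modal update each node accepts (modality-specific contribution)."""

    def __init__(self, channels: int):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * channels, 1)

    def forward(self, nodes_a: torch.Tensor, nodes_b: torch.Tensor) -> torch.Tensor:
        upd, _ = self.cross_attn(nodes_a, nodes_b, nodes_b)  # A queries B's nodes
        g = torch.sigmoid(self.gate(torch.cat([nodes_a, upd], dim=-1)))
        return g * nodes_a + (1.0 - g) * upd                 # gated modality mix


# Toy usage: 64-channel features from an optical branch and an elevation branch
# (a real model would use one modality-specific reasoner per branch).
x_rgb = torch.randn(2, 64, 32, 32)
x_dsm = torch.randn(2, 64, 32, 32)
reasoner = SemanticGraphReasoner(channels=64)
nodes_rgb, _ = reasoner.to_nodes(reasoner(x_rgb))
nodes_dsm, _ = reasoner.to_nodes(reasoner(x_dsm))
fused = CrossModalGatedFusion(channels=64)(nodes_rgb, nodes_dsm)
print(fused.shape)  # torch.Size([2, 16, 64])

The gated residual mix is the key design choice here: it lets each node fall back to its own modality when the other modality is uninformative, which is one plausible reading of the paper's "modality-specific contribution" for cross-modal compatibility.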
