Article

Transformer-induced graph reasoning for multimodal semantic segmentation in remote sensing

Journal

ISPRS Journal of Photogrammetry and Remote Sensing

Publisher

ELSEVIER
DOI: 10.1016/j.isprsjprs.2022.08.010

Keywords

Graph reasoning; Hierarchical representation; Multimodal remote sensing; Semantic segmentation; Transformer

Funding

  1. National Key R&D Program of China [2021YFB3900504]
  2. National Natural Science Foundation of China [61725105, 62171436]

Abstract

The article proposes GraFNet, a hierarchical graph network for multimodal semantic segmentation of remote sensing scenes. It addresses the challenges of object diversity and the cross-modal gap by introducing a new modeling paradigm built on semantic topological graphs. Extensive experiments show that GraFNet outperforms existing methods on several datasets.
As a large amount of earth observation data is available on a global scale, it becomes possible to apply multimodal semantic segmentation technology to remote sensing scene analysis. However, the diversity of objects in large-scale scenes and the cross-modal gap between different images remain challenging in practical applications. To address these problems, we propose a Transformer-Induced Hierarchical Graph Network (GraFNet) for multimodal semantic segmentation in remote sensing scenes, which promotes the exploration of potential intra- and inter-modal relations by introducing a new modeling paradigm. Unlike existing methods, GraFNet parses multimodal remote sensing images into semantic topological graphs and exploits the structural information of land-cover categories to learn joint representations. Specifically, an attentive heterogeneous information aggregation mechanism is presented to parse diverse objects in remote sensing scenes into semantic entities and capture modality-specific object-object interaction patterns in a topology-aware environment. In addition, modality hierarchical dependency modeling is introduced to encode the interactive representation of cross-modal objects and distinguish each modality's contribution, improving cross-modal compatibility. Extensive experiments on several multimodal remote sensing datasets demonstrate that the proposed GraFNet outperforms state-of-the-art approaches, achieving F1/mIoU scores of 91.1%/82.4% on the ISPRS Vaihingen dataset, 93.4%/88.4% on the ISPRS Potsdam dataset, and 91.8%/84.0% on the MSAW dataset.
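The abstract outlines two ingredients that a short sketch can make concrete: (1) parsing dense feature maps into a small set of semantic "object" nodes and reasoning over their relations, and (2) exchanging information between modality-specific node sets with a learned gate over each modality's contribution. The PyTorch sketch below is purely illustrative and is not the authors' GraFNet code: all names (SemanticGraphReasoner, CrossModalGatedFusion, num_nodes) are hypothetical, and standard multi-head attention stands in for the paper's graph attention and hierarchical dependency modeling.

import torch
import torch.nn as nn


class SemanticGraphReasoner(nn.Module):
    """Parse a feature map into a few semantic nodes, reason over their
    relations, and project the result back to pixels (illustrative only)."""

    def __init__(self, channels: int, num_nodes: int = 16):
        super().__init__()
        # 1x1 conv producing pixel-to-node assignment logits ("semantic entities")
        self.assign = nn.Conv2d(channels, num_nodes, kernel_size=1)
        # Multi-head self-attention as a stand-in for graph attention over nodes
        self.node_attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)

    def to_nodes(self, x: torch.Tensor):
        a = self.assign(x).flatten(2).softmax(dim=-1)        # (B, K, HW) soft assignment
        nodes = torch.bmm(a, x.flatten(2).transpose(1, 2))   # (B, K, C) node features
        return nodes, a

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        nodes, a = self.to_nodes(x)
        nodes, _ = self.node_attn(nodes, nodes, nodes)       # intra-modal object relations
        out = torch.bmm(a.transpose(1, 2), nodes)            # (B, HW, C) back to pixels
        return x + out.transpose(1, 2).reshape(b, c, h, w)   # residual refinement


class CrossModalGatedFusion(nn.Module):
    """Let one modality's nodes attend to the other's, then gate how much of
    the cross-modal update each node accepts (modality-specific contribution)."""

    def __init__(self, channels: int):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(channels, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * channels, 1)

    def forward(self, nodes_a: torch.Tensor, nodes_b: torch.Tensor) -> torch.Tensor:
        upd, _ = self.cross_attn(nodes_a, nodes_b, nodes_b)  # A queries B's nodes
        g = torch.sigmoid(self.gate(torch.cat([nodes_a, upd], dim=-1)))
        return g * nodes_a + (1.0 - g) * upd                 # gated modality mix


# Toy usage: 64-channel features from an optical branch and an elevation branch
# (a real model would use one modality-specific reasoner per branch).
x_rgb = torch.randn(2, 64, 32, 32)
x_dsm = torch.randn(2, 64, 32, 32)
reasoner = SemanticGraphReasoner(channels=64)
nodes_rgb, _ = reasoner.to_nodes(reasoner(x_rgb))
nodes_dsm, _ = reasoner.to_nodes(reasoner(x_dsm))
fused = CrossModalGatedFusion(channels=64)(nodes_rgb, nodes_dsm)
print(fused.shape)  # torch.Size([2, 16, 64])

The gated residual mix is the key design choice here: it lets each node fall back to its own modality when the other modality is uninformative, which is one plausible reading of the paper's "modality-specific contribution" for cross-modal compatibility.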
