Article

Multi-modal unsupervised domain adaptation for semantic image segmentation

Journal

PATTERN RECOGNITION
Volume 137

Publisher

ELSEVIER SCI LTD
DOI: 10.1016/j.patcog.2022.109299

Keywords

Unsupervised domain adaptation; Multi-modal learning; Self-supervised learning; Knowledge transfer; Semantic segmentation


In this paper, a novel multi-modal Unsupervised Domain Adaptation (UDA) method named MMADT is proposed for semantic segmentation. It takes both RGB and depth images as input and introduces a Depth Fusion Block (DFB) and Depth Adversarial Training (DAT) to address the depth discrepancy between the source and target domains. In addition, a self-supervised multi-modal depth estimation assistant network named Geo-Assistant (GA) is proposed to align the feature spaces of RGB and depth. Experimental results show significant performance improvements on multiple synthetic-to-real adaptation benchmarks.
We propose a novel multi-modal Unsupervised Domain Adaptation (UDA) method for semantic segmentation. Recently, depth has proven to be a relevant property for providing geometric cues that enhance the RGB representation. However, existing UDA methods either process RGB images alone or cultivate depth-awareness only through an auxiliary depth estimation task. We argue that geometric cues crucial to semantic segmentation, such as local shape and relative position, are difficult to recover from an auxiliary depth estimation task using color (RGB) information alone. In this paper, we propose a novel multi-modal UDA method named MMADT, which relies on both RGB and depth images as input. In particular, we design a Depth Fusion Block (DFB) to recalibrate depth information and leverage Depth Adversarial Training (DAT) to bridge the depth discrepancy between the source and target domains. In addition, we propose a self-supervised multi-modal depth estimation assistant network named Geo-Assistant (GA) to align the feature spaces of RGB and depth and shape the sensitivity of our MMADT to depth information. We experimentally observed significant performance improvements on multiple synthetic-to-real adaptation benchmarks, i.e., SYNTHIA-to-Cityscapes, GTA5-to-Cityscapes and SELMA-to-Cityscapes. Additionally, our multi-modal UDA scheme is easy to port to other UDA methods with a consistent performance boost. (c) 2023 Elsevier Ltd. All rights reserved.
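
The abstract names two depth-specific components, the Depth Fusion Block and Depth Adversarial Training, without detailing their internals here. The sketch below is one plausible PyTorch rendering, not the paper's actual implementation: it assumes a squeeze-and-excitation-style channel gate for the recalibration step and a gradient-reversal domain discriminator for the adversarial step; all module names, shapes, and hyperparameters are illustrative.

```python
# Minimal sketch of the two ideas named in the abstract. The paper's exact
# DFB/DAT designs are not specified here, so everything below is an assumption:
# channel gating for depth recalibration, gradient reversal for adversarial
# alignment of source and target depth features.
import torch
import torch.nn as nn


class GradientReversal(torch.autograd.Function):
    """Identity in the forward pass; flips the gradient sign in backward."""

    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad):
        return -ctx.lam * grad, None


class DepthFusionBlock(nn.Module):
    """Recalibrate depth features with a channel gate, then fuse with RGB."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        depth_feat = depth_feat * self.gate(depth_feat)  # recalibration
        return self.fuse(torch.cat([rgb_feat, depth_feat], dim=1))


class DepthDomainDiscriminator(nn.Module):
    """Predicts source vs. target from depth features through the GRL."""

    def __init__(self, channels: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels // 2, 3, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(channels // 2, 1, 3, padding=1),
        )

    def forward(self, depth_feat, lam: float = 1.0):
        return self.net(GradientReversal.apply(depth_feat, lam))


if __name__ == "__main__":
    rgb = torch.randn(2, 64, 32, 32)
    dep = torch.randn(2, 64, 32, 32)
    fused = DepthFusionBlock(64)(rgb, dep)       # (2, 64, 32, 32)
    logits = DepthDomainDiscriminator(64)(dep)   # (2, 1, 32, 32)
```

In a full training loop, the discriminator output would feed a binary source-vs-target loss alongside the supervised segmentation loss on source images; the gradient reversal pushes the feature extractor to align source and target depth features adversarially.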
