Article

DMFNet: Deep Multi-Modal Fusion Network for RGB-D Indoor Scene Segmentation

Journal

IEEE ACCESS
Volume 7, Pages 169350-169358

Publisher

IEEE (Institute of Electrical and Electronics Engineers, Inc.)
DOI: 10.1109/ACCESS.2019.2955101

Keywords

Image segmentation; Decoding; Convolution; Feature extraction; Data mining; Fuses; Computer architecture; Indoor scene segmentation; encoder-decoder architecture; squeeze-and-excitation ResNet; multiple average pooling layers; pyramid supervision

Funding

  1. National Natural Science Foundation of China [61502429, 61672337, 61971247]
  2. Zhejiang Key Research and Development Program [2019C03135]
  3. Zhejiang Provincial Natural Science Foundation of China [LY18F020012]

Abstract

Indoor scene segmentation is a challenging task in computer vision. We propose an indoor scene segmentation framework, called DMFNet, that incorporates RGB data and complementary depth information to perform indoor scene segmentation. We use the squeeze-and-excitation residual network as the encoder to extract features from the RGB and depth data simultaneously and fuse them in the decoder. Multiple average pooling layers and transposed convolution layers process the encoded outputs and fuse their results across several decoder layers. To optimize the network parameters, we use a pyramid supervision training scheme, which applies supervised learning at different layers of the decoder to prevent vanishing gradients. We evaluated the proposed DMFNet on the NYU Depth V2 dataset, which consists of 1,449 cluttered indoor scenes, and achieved competitive results compared with state-of-the-art methods.
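The following is a minimal PyTorch sketch of the ideas named in the abstract: two parallel squeeze-and-excitation encoder branches (RGB and depth), modality fusion feeding a transposed-convolution decoder, and pyramid supervision via a classifier head at each decoder scale. All layer names, channel widths, the element-wise-addition fusion rule, and the simplification of the paper's multiple average pooling layers are illustrative assumptions, not the authors' exact architecture.

```python
# Illustrative sketch only; channel sizes and the fusion rule are assumed.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-excitation: reweight channels by globally pooled statistics."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        w = self.fc(x.mean(dim=(2, 3)))   # squeeze: global average pool to (B, C)
        return x * w[:, :, None, None]    # excite: per-channel rescaling


class EncoderStage(nn.Module):
    """One downsampling stage: strided conv + SE, standing in for an SE-ResNet block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1),
            nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )
        self.se = SEBlock(out_ch)

    def forward(self, x):
        return self.se(self.conv(x))


class DMFNetSketch(nn.Module):
    def __init__(self, num_classes=40, widths=(64, 128, 256)):
        super().__init__()
        # Two parallel encoders: one for RGB (3 channels), one for depth (1 channel).
        self.rgb_enc, self.dep_enc = nn.ModuleList(), nn.ModuleList()
        in_r = in_d = None
        for i, w in enumerate(widths):
            self.rgb_enc.append(EncoderStage(3 if i == 0 else widths[i - 1], w))
            self.dep_enc.append(EncoderStage(1 if i == 0 else widths[i - 1], w))
        # Decoder: transposed convolutions upsample while fusing encoder outputs.
        self.decoder = nn.ModuleList([
            nn.ConvTranspose2d(widths[2], widths[1], 2, stride=2),
            nn.ConvTranspose2d(widths[1], widths[0], 2, stride=2),
            nn.ConvTranspose2d(widths[0], widths[0], 2, stride=2),
        ])
        # Pyramid supervision: a classifier head at every decoder scale.
        self.heads = nn.ModuleList([
            nn.Conv2d(c, num_classes, 1) for c in (widths[1], widths[0], widths[0])
        ])

    def forward(self, rgb, depth):
        fused, x_r, x_d = [], rgb, depth
        for er, ed in zip(self.rgb_enc, self.dep_enc):
            x_r, x_d = er(x_r), ed(x_d)
            fused.append(x_r + x_d)            # fuse modalities (assumed: addition)
        x, outs = fused[-1], []
        for i, (up, head) in enumerate(zip(self.decoder, self.heads)):
            x = F.relu(up(x))
            if i < len(fused) - 1:             # skip-fuse the shallower encoder output
                x = x + fused[-2 - i]
            outs.append(head(x))               # side output for pyramid supervision
        return outs                            # coarse-to-fine predictions
```

Continuing from the sketch, a hypothetical training step shows how pyramid supervision could be applied: a cross-entropy loss is computed at every decoder scale, with the label map resized to match each side output, and the losses are summed so that gradients reach the deeper layers directly.

```python
model = DMFNetSketch(num_classes=40)
rgb, depth = torch.randn(2, 3, 64, 64), torch.randn(2, 1, 64, 64)
target = torch.randint(0, 40, (2, 64, 64))

loss = torch.zeros(())
for out in model(rgb, depth):
    # Nearest-neighbor resize keeps labels discrete at the coarser scales.
    t = F.interpolate(target.unsqueeze(1).float(), size=out.shape[-2:], mode="nearest")
    loss = loss + F.cross_entropy(out, t.squeeze(1).long())
loss.backward()
```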
