Journal
COMPUTERS & GRAPHICS-UK
Volume 98, Issue -, Pages 37-47
Publisher
PERGAMON-ELSEVIER SCIENCE LTD
DOI: 10.1016/j.cag.2021.04.013
Keywords
2D; 3D scene understanding; Online RGB-D segmentation; Semantic segmentation; Multi-modal network
Funding
- Natural Science Foundation of China [61902210, 61521002]
- Tsinghua-Tencent Joint Laboratory for Internet Innovation Technology
LinkNet is a 2D-3D linked multi-modal network for online semantic segmentation of RGB-D videos. It exploits information from the 3D scene and from multiple views, combining RGB data with geometric features extracted from the 3D point cloud for semantic feature learning, and achieves stable and effective segmentation results.
This paper proposes LinkNet, a 2D-3D linked multi-modal network for online semantic segmentation of RGB-D videos, which is essential for real-time applications such as robot navigation. Existing methods for RGB-D semantic segmentation usually work in the regular image domain, which allows efficient processing using convolutional neural networks (CNNs). However, RGB-D videos are captured from a 3D scene, and different frames can contain useful information about the same local region seen from different views. Working solely in the image domain fails to exploit this crucial information. Our novel approach is based on joint 2D and 3D analysis. The online process runs simultaneously with 3D scene reconstruction, from which we set up 2D-3D links between consecutive RGB-D frames and the 3D point cloud. We combine image color and view-insensitive geometric features generated from the 3D point cloud for multi-modal semantic feature learning. LinkNet further uses a recurrent neural network (RNN) module to dynamically maintain hidden semantic states during 3D fusion and to refine the voxel-based labeling results. Experimental results on SceneNet [1] and ScanNet [2] demonstrate that the semantic segmentation results of our framework are stable and effective. (c) 2021 Elsevier Ltd. All rights reserved.
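The abstract describes two key ideas: fusing image-domain RGB features with point-cloud geometric features per voxel, and maintaining a hidden semantic state over time with an RNN that refines voxel labels as frames arrive. The following is a minimal NumPy sketch of that pattern, not the authors' implementation; all shapes, weight names, and the choice of a GRU-style update are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def gru_step(h, x, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU-style update of the per-voxel hidden semantic state."""
    z = 1.0 / (1.0 + np.exp(-(x @ Wz + h @ Uz)))   # update gate
    r = 1.0 / (1.0 + np.exp(-(x @ Wr + h @ Ur)))   # reset gate
    h_tilde = np.tanh(x @ Wh + (r * h) @ Uh)       # candidate state
    return (1.0 - z) * h + z * h_tilde

# Hypothetical sizes: voxels, RGB/geometric feature dims, hidden dim, classes.
V, D_RGB, D_GEO, H, C = 5, 8, 4, 16, 6

# Random weights stand in for learned parameters.
W_fuse = rng.normal(size=(D_RGB + D_GEO, H)) * 0.1
gru_w = [rng.normal(size=(H, H)) * 0.1 for _ in range(6)]
W_cls = rng.normal(size=(H, C)) * 0.1

h = np.zeros((V, H))  # hidden semantic state, carried across frames
for frame in range(3):  # three incoming RGB-D frames observing the voxels
    rgb_feat = rng.normal(size=(V, D_RGB))  # image-domain features (via 2D-3D links)
    geo_feat = rng.normal(size=(V, D_GEO))  # view-insensitive geometric features
    # Multi-modal fusion: concatenate both modalities, then project.
    x = np.tanh(np.concatenate([rgb_feat, geo_feat], axis=1) @ W_fuse)
    h = gru_step(h, x, *gru_w)              # refine state as views accumulate

# Per-voxel class probabilities from the final hidden state.
logits = h @ W_cls
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
labels = probs.argmax(axis=1)  # refined voxel-based labeling
print(labels.shape)  # one label per voxel
```

The recurrent state lets early, view-limited observations of a voxel be revised once later frames see the same region from a different angle, which is the motivation the abstract gives for working jointly in 2D and 3D.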