Article

Active Learning Based 3D Semantic Labeling From Images and Videos

Journal

IEEE Transactions on Circuits and Systems for Video Technology

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TCSVT.2021.3079991

Keywords

Three-dimensional displays; Semantics; Solid modeling; Image segmentation; Labeling; Image reconstruction; Annotations; Semantic segmentation; geometric constraint; 3D semantic mesh model; active learning

Funding

  1. National Natural Science Foundation of China [61991423, 61873265, 62073320, 61632003]

In this paper, an active learning based 3D semantic labeling method is proposed to generate accurate 3D semantic mesh models by integrating 2D semantic segmentation results with 3D mesh models. Through an iterative process of training, fusion, and selection, labeling quality is improved while the amount of annotation required is reduced.
3D semantic segmentation is one of the most fundamental problems in 3D scene understanding and has attracted much attention in the field of computer vision. In this paper, we propose an active learning based 3D semantic labeling method for large-scale 3D mesh models generated from images or videos. Taking as input a 3D mesh model reconstructed by an image-based 3D modeling system, together with the calibrated images, our method outputs a fine 3D semantic mesh model in which each facet is assigned a semantic label. There are three major steps in our framework: 2D semantic segmentation, 2D-3D semantic fusion, and batch image selection. A limited set of annotated images is first used to fine-tune a pre-trained semantic segmentation network to obtain pixel-wise semantic probability maps. All these maps are then back-projected into 3D space and fused on the 3D mesh model using Markov Random Field optimization, yielding a preliminary 3D semantic mesh model and a heat model that indicates each facet's labeling confidence. This 3D semantic model serves as a reliable supervisor for selecting poorly segmented regions for manual annotation, which boosts the performance of the 2D semantic segmentation network, as well as the 3D mesh labeling, in the next iteration. This Training-Fusion-Selection process continues until the label assignment of the 3D mesh model becomes stable. In this way, we significantly reduce the amount of annotation required without degrading the labeling quality of the 3D semantic models. Extensive experiments demonstrate the effectiveness and generalization ability of our method on a wide variety of datasets.
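
To make the Training-Fusion-Selection loop above more concrete, the Python sketch below (not from the paper) outlines one iteration: pixel-wise probability maps are back-projected onto mesh facets, averaged into per-facet class distributions, a per-facet confidence "heat" is computed from the label entropy, and the images covering the most low-confidence facets are selected for the next annotation round. All names such as fuse_probabilities_on_mesh, pixel_to_facet, and facet_to_images are hypothetical placeholders, and the paper's Markov Random Field optimization over the mesh is replaced here by a simple per-facet average.

```python
import numpy as np

def fuse_probabilities_on_mesh(prob_maps, pixel_to_facet, num_facets, num_classes):
    """Average back-projected pixel-wise class probabilities on mesh facets.

    prob_maps      -- list of (H, W, C) softmax outputs, one per calibrated image
    pixel_to_facet -- list of (H, W) integer arrays giving, for each pixel, the index
                      of the visible facet (or -1 if none); assumed to come from a
                      visibility / ray-casting step that is not shown here.
    """
    facet_probs = np.zeros((num_facets, num_classes))
    counts = np.zeros(num_facets)
    for probs, f_idx in zip(prob_maps, pixel_to_facet):
        valid = f_idx >= 0
        np.add.at(facet_probs, f_idx[valid], probs[valid])
        np.add.at(counts, f_idx[valid], 1.0)
    seen = counts > 0
    facet_probs[seen] /= counts[seen, None]
    # NOTE: the paper refines these per-facet distributions with a Markov Random
    # Field over neighbouring facets; a plain average is used here for brevity.
    return facet_probs

def facet_confidence(facet_probs, eps=1e-12):
    """Per-facet confidence 'heat': low label entropy means high confidence."""
    p = np.clip(facet_probs, eps, 1.0)
    entropy = -(p * np.log(p)).sum(axis=1)
    return -entropy  # higher value = more confident facet

def select_batch(confidence, facet_to_images, batch_size, uncertain_frac=0.1):
    """Select the images that cover the most low-confidence facets for annotation."""
    num_uncertain = max(1, int(uncertain_frac * len(confidence)))
    uncertain_facets = np.argsort(confidence)[:num_uncertain]  # least confident first
    scores = {}
    for facet in uncertain_facets:
        for img in facet_to_images.get(int(facet), []):
            scores[img] = scores.get(img, 0) + 1
    return sorted(scores, key=scores.get, reverse=True)[:batch_size]
```

A full implementation would additionally fine-tune the 2D segmentation network on the newly annotated batch and repeat the loop until the facet labels stabilize, as described in the abstract.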
