Article

WS-OPE: Weakly Supervised 6-D Object Pose Regression Using Relative Multi-Camera Pose Constraints

Journal

IEEE ROBOTICS AND AUTOMATION LETTERS
Volume 7, Issue 2, Pages 3703-3710

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/LRA.2022.3146924

Keywords

Pose estimation; Three-dimensional displays; Detectors; Training; Cameras; Pipelines; Feature extraction; Weak supervision; object pose estimation

Funding

  1. National Natural Science Foundation of China [61803375, 91948303]

A scalable, end-to-end 6-D pose regression method with weak supervision is proposed. It uses 2-D bounding boxes and object sizes as the only labels and constrains training with multiple images of known relative poses. This leads to better learning of 6-D pose embeddings than fully supervised methods, and direct pose regression ensures real-time performance.
Precise annotation of 6-D poses in real data is intricate and time-consuming, yet it is an essential requirement for training pose estimation pipelines. We propose a method for scalable, end-to-end 6-D pose regression with weak supervision to avoid this problem. Our method requires neither 3-D models nor 6-D object poses as ground truth. Instead, we use 2-D bounding boxes and object sizes as the only labels and constrain the problem with multiple images of known relative poses during training. A novel Rotated-IoU loss relates a pose prediction from one image to the labeled 2-D bounding boxes of the corresponding object in other views. Our rotation estimation combines an initial coarse pose classification with an offset regression using a continuous rotation parametrization that allows for direct pose estimation. At test time, the model still uses only a single image to predict a 6-D pose. We observe that the multi-view constraints and our rotation representation used during training lead to better learning of 6-D pose embeddings than fully supervised methods. Experiments on several datasets show that the proposed method is capable of predicting poses of good quality, despite being trained with only weak labels. Direct pose regression, without the need for a subsequent refinement stage, thereby ensures real-time performance.
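
The abstract does not state which continuous rotation parametrization is used. As an illustration only, the sketch below shows one widely used continuous choice: a 6-D vector mapped to a rotation matrix by Gram-Schmidt orthogonalization. Treating this as the paper's parametrization is an assumption, and the function name rotation_from_6d is hypothetical.

```python
import math


def _dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def _normalize(x):
    n = math.sqrt(_dot(x, x))
    return [p / n for p in x]


def _cross(x, y):
    return [
        x[1] * y[2] - x[2] * y[1],
        x[2] * y[0] - x[0] * y[2],
        x[0] * y[1] - x[1] * y[0],
    ]


def rotation_from_6d(v):
    """Map a 6-vector to a 3x3 rotation matrix (hypothetical helper).

    Assumption: the 6-D Gram-Schmidt parametrization stands in for the
    paper's unspecified continuous rotation representation. The two
    3-vectors in v must be non-zero and not parallel.
    """
    a1, a2 = list(v[:3]), list(v[3:6])
    b1 = _normalize(a1)                      # first basis vector
    proj = _dot(b1, a2)                      # component of a2 along b1
    b2 = _normalize([q - proj * p for p, q in zip(b1, a2)])
    b3 = _cross(b1, b2)                      # completes a right-handed basis
    # b1, b2, b3 become the columns of the rotation matrix.
    return [[b1[i], b2[i], b3[i]] for i in range(3)]


# Example: an unnormalized, non-orthogonal 6-vector still yields a valid rotation.
R = rotation_from_6d([2.0, 1.0, 0.5, -1.0, 3.0, 0.2])
```

Because every 6-vector with non-parallel halves maps to a valid rotation, a network can regress the six numbers directly without a separate normalization or refinement step, which is in the spirit of the direct pose estimation described above.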
