Article

Depth-Guided Progressive Network for Object Detection

Journal

IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS
Volume 23, Issue 10, Pages 19523-19533

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/TITS.2022.3156365

Keywords

Feature extraction; Object detection; Detectors; Interference; Signal to noise ratio; Semantics; Location awareness; multi-scale object; depth-guided; progressive sampling

Funding

  1. National Key Research and Development Program of China [2020AAA09701]
  2. National Science Fund for Distinguished Young Scholars [62125601]
  3. National Natural Science Foundation of China [62076024, 62006018]

In this paper, the authors propose a Depth-Guided Progressive Network (DGPNet) for multi-scale object detection. Estimated depth is used to guide the image features, enhancing the discrimination among multi-scale objects, and a progressive sampling strategy is employed to obtain high-quality predicted boxes. Experimental results show that the proposed method outperforms state-of-the-art methods on the KINS and Cityscapes datasets.
Multi-scale object detection in natural scenes is still challenging. To enhance multi-scale perception, some algorithms combine lower-level and higher-level information via multi-scale feature fusion strategies. However, the inherent spatial properties among instances and the relations between foreground and background are ignored. In addition, the human-defined "center-based" regression quality evaluation strategy, which predicts a high-to-low score based on a linear relationship with the distance to the center of the ground-truth box, is not robust to scale-variant objects. In this work, we propose a Depth-Guided Progressive Network (DGPNet) for multi-scale object detection. Specifically, besides predicting classification and localization, the network estimates depth and uses it to guide the image features in a weighted manner to obtain a better spatial representation. Depth estimation and 2D object detection are thus learned simultaneously in a unified network, where the depth features are merged as auxiliary information into the detection branch to enhance the discrimination among multi-scale objects. Moreover, to overcome the difficulty of empirically fitting the localization quality function, high-quality predicted boxes on scale-variant objects are obtained more adaptively by an IoU-aware progressive sampling strategy. We divide the sampling process into two stages, i.e., "statistical-aware" and "IoU-aware". The former selects thresholds for positive samples based on statistical characteristics of multi-scale instances, and the latter further selects high-quality samples by IoU on the basis of the former. As a result, the final ranking scores better reflect the quality of localization. Experiments verify that our method outperforms state-of-the-art methods on the KINS and Cityscapes datasets.
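
The two-stage sampling described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify which statistics are used or how the IoU stage is thresholded. The sketch assumes that the "statistical-aware" stage uses a mean-plus-standard-deviation IoU threshold over candidate boxes (similar in spirit to adaptive training sample selection), and that the "IoU-aware" stage keeps only candidates whose predicted box overlaps the ground truth by at least a fixed `min_iou`. All function names and the `min_iou` parameter are hypothetical.

```python
from statistics import mean, pstdev


def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def statistical_stage(candidates, gt_box):
    """Stage 1 (assumed form): per-instance threshold from IoU statistics.

    Keeps the indices of candidates whose IoU with the ground-truth box is
    at least mean + standard deviation of all candidate IoUs.
    """
    ious = [iou(c, gt_box) for c in candidates]
    threshold = mean(ious) + pstdev(ious)
    return [i for i, value in enumerate(ious) if value >= threshold]


def iou_aware_stage(positive_idx, predicted_boxes, gt_box, min_iou=0.5):
    """Stage 2 (assumed form): keep stage-1 positives whose *predicted* box
    still overlaps the ground truth by at least min_iou."""
    return [i for i in positive_idx
            if iou(predicted_boxes[i], gt_box) >= min_iou]


def progressive_sampling(candidates, predicted_boxes, gt_box, min_iou=0.5):
    """Run both stages in order; stage 2 only refines the output of stage 1."""
    stage1 = statistical_stage(candidates, gt_box)
    return iou_aware_stage(stage1, predicted_boxes, gt_box, min_iou)
```

Because the stage-1 threshold is computed per ground-truth instance, it adapts to object scale rather than relying on one global cutoff; the second stage then filters on the quality of the regressed boxes rather than on the initial candidates.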
