4.6 Article

Feature Learning Improved by Location Guidance and Supervision for Object Detection

Journal

IEEE ACCESS
Volume 9, Issue -, Pages 133335-133345

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2021.3110888

Keywords

Feature extraction; Detectors; Object detection; Convolution; Semantics; Data mining; Head; feature alignment; multiple detection; consistency supervision

Funding

  1. National Natural Science Foundation of China [61763033, 61662049, 61866028, 61663031, 61866025]

In recent years, single-stage detectors have developed rapidly but still lag behind multi-stage detectors in detection precision. This paper proposes a novel detection model, OptNet, to address deficiencies of single-stage detectors such as feature loss and inaccurate feature extraction. OptNet includes three modules: pyramid of attention features, feature alignment, and consistency supervision, which work together to improve detection accuracy on the MS COCO 2017 dataset.
In recent years, single-stage detectors have developed rapidly; however, their detection precision remains lower than that of multi-stage detectors. This paper analyzes and compares single-stage and multi-stage detectors in detail, revealing that single-stage detectors suffer from problems including feature loss and inaccurate feature extraction. To alleviate these deficiencies, this paper proposes a novel detection model, dubbed Optimized Network (OptNet). OptNet consists of three modules: pyramid of attention features, feature alignment, and consistency supervision (CS). The pyramid of attention features, based on feature pyramid networks (FPNs), introduces a novel branch named attention FPN (AtFPN), which aggregates the multi-layer features of the backbone network and optimizes the object features using lightweight attention modules. AtFPN alleviates the loss of feature pyramid information and the blocking of feature transmission between adjacent layers, and it provides global information for the model. The feature alignment module aligns the anchor box to the feature, using object location information to guide the network to extract precise object features. Finally, CS accelerates network optimization and reduces semantic differences between the features on different layers. In the detection stage, OptNet uses the first detection result to optimize the model's prediction and improve accuracy. Experiments on the MS COCO 2017 dataset demonstrate that OptNet yields a significant improvement in detection precision.
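
The abstract does not say how the first detection result is reused, so the following is only a minimal sketch of one common approach in refinement-based detectors, not OptNet's actual implementation. It assumes boxes in (cx, cy, w, h) form and the standard R-CNN-style offset parametrization; the names `decode` and `two_step_detect` are hypothetical and chosen for illustration.

```python
import math


def decode(box, deltas):
    """Apply (dx, dy, dw, dh) offsets to a (cx, cy, w, h) box.

    Standard R-CNN-style parametrization: centre shifts are scaled by the
    box size, and width/height are scaled exponentially.
    """
    cx, cy, w, h = box
    dx, dy, dw, dh = deltas
    return (cx + dx * w, cy + dy * h, w * math.exp(dw), h * math.exp(dh))


def two_step_detect(anchor, first_deltas, second_deltas):
    """Refine an anchor twice (assumed scheme, not taken from the paper).

    The first prediction moves the anchor onto the object; the second
    prediction is interpreted relative to that refined box, so features
    and regression targets refer to a better-aligned location.
    """
    refined = decode(anchor, first_deltas)   # first detection result
    final = decode(refined, second_deltas)   # prediction optimized from it
    return refined, final


# Example with made-up offsets; in a real detector a network predicts them.
anchor = (50.0, 50.0, 20.0, 40.0)
refined, final = two_step_detect(anchor, (0.5, 0.0, 0.0, 0.0), (0.0, 0.25, 0.0, 0.0))
print(refined, final)
```

In this toy example the first step shifts the box horizontally by half its width and the second shifts it vertically by a quarter of its height, which shows how the second prediction builds on the first rather than on the original anchor.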
