Article

CBNet: A Composite Backbone Network Architecture for Object Detection

Journal

IEEE Transactions on Image Processing
Volume 31, Pages 6893-6906

Publisher

Institute of Electrical and Electronics Engineers (IEEE)
DOI: 10.1109/TIP.2022.3216771

Keywords

Deep learning; object detection; backbone networks; composite architectures

Funding

  1. National Natural Science Foundation of China [62176007]


In this paper, a novel and flexible backbone framework called CBNet is proposed for constructing high-performance object detectors. CBNet integrates multiple identical backbones and gradually expands the receptive field to improve object detection. The experimental results show that CBNet is a more efficient and effective approach for building high-performance backbone networks compared to simply increasing network depth and width.
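To make the idea of integrating multiple identical backbones more concrete, here is a minimal pure-Python sketch on toy 1-D "feature maps". It assumes a dense higher-level composition rule: before each stage of a later backbone, the outputs of the same and all higher stages of the previous backbone are projected, upsampled to the current resolution, and added. The scalar `COMPOSITE_WEIGHT` is a hypothetical stand-in for a learned projection, and the helper names (`make_stage`, `backbone_forward`, `cbnet_forward`) are illustrative only, not the API of the authors' CBNetV2 repository.

```python
from typing import Callable, List, Optional

Feature = List[float]  # toy 1-D "feature map": one value per spatial position
Stage = Callable[[Feature], Feature]

# Hypothetical scalar standing in for the learned projection in a composite connection.
COMPOSITE_WEIGHT = 0.5


def downsample(x: Feature) -> Feature:
    """Halve the spatial length by averaging adjacent positions (stride 2)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x: Feature, size: int) -> Feature:
    """Nearest-neighbour upsampling; assumes size is a multiple of len(x)."""
    factor = size // len(x)
    return [v for v in x for _ in range(factor)]


def make_stage(weight: float, bias: float) -> Stage:
    """Toy backbone stage: stride-2 downsampling followed by affine + ReLU."""
    def stage(x: Feature) -> Feature:
        return [max(0.0, weight * v + bias) for v in downsample(x)]
    return stage


def backbone_forward(stages: List[Stage], x: Feature,
                     assist: Optional[List[Feature]] = None) -> List[Feature]:
    """Run one backbone, optionally fusing the previous backbone's stage outputs."""
    outputs: List[Feature] = []
    for level, stage in enumerate(stages):
        if assist is not None:
            # Composite connection (assumed dense higher-level form): add every
            # previous-backbone output at this level or above, projected and
            # upsampled to the resolution of the current stage input.
            for feat in assist[level:]:
                projected = [COMPOSITE_WEIGHT * v for v in feat]
                x = [a + b for a, b in zip(x, upsample(projected, len(x)))]
        x = stage(x)
        outputs.append(x)
    return outputs


def cbnet_forward(image: Feature, num_backbones: int = 2,
                  num_stages: int = 4) -> List[List[Feature]]:
    """Compose identical backbones; the last one plays the role of the lead backbone."""
    all_outputs: List[List[Feature]] = []
    assist: Optional[List[Feature]] = None
    for _ in range(num_backbones):
        # Each backbone gets its own stages with identical initial values,
        # mimicking copies initialised from the same pre-trained weights.
        stages = [make_stage(1.0, 0.1) for _ in range(num_stages)]
        outputs = backbone_forward(stages, image, assist)
        all_outputs.append(outputs)
        assist = outputs
    return all_outputs
```

For example, `cbnet_forward([1.0] * 16)` returns two lists of stage outputs with lengths 8, 4, 2 and 1. Every value in the lead (second) backbone's features is larger than the corresponding value in the assistant's, because each of its stages also receives the fused higher-level terms. This is one informal way to see how later backbones gather context from a wider effective receptive field.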
Modern top-performing object detectors depend heavily on backbone networks, whose advances bring consistent performance gains through exploring more effective network structures. In this paper, we propose a novel and flexible backbone framework, namely CBNet, to construct high-performance detectors using existing open-source pre-trained backbones under the pre-training and fine-tuning paradigm. In particular, the CBNet architecture groups multiple identical backbones, which are connected through composite connections. Specifically, it integrates the high- and low-level features of multiple identical backbone networks and gradually expands the receptive field to perform object detection more effectively. We also propose a better training strategy with auxiliary supervision for CBNet-based detectors. CBNet has strong generalization capabilities for different backbones and head designs of the detector architecture. Without additional pre-training of the composite backbone, CBNet can be adapted to various backbones (i.e., CNN-based vs. Transformer-based) and head designs of most mainstream detectors (i.e., one-stage vs. two-stage, anchor-based vs. anchor-free). Experiments provide strong evidence that, compared with simply increasing the depth and width of the network, CBNet introduces a more efficient, effective, and resource-friendly way to build high-performance backbone networks. In particular, our CB-Swin-L achieves 59.4% box AP and 51.6% mask AP on COCO test-dev under the single-model and single-scale testing protocol, which are significantly better than the state-of-the-art results (i.e., 57.7% box AP and 50.2% mask AP) achieved by Swin-L, while reducing the training time by 6×. With multi-scale testing, we push the current best single-model result to a new record of 60.1% box AP and 52.3% mask AP without using extra training data. Code is available at https://github.com/VDIGPKU/CBNetV2/.
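The auxiliary supervision strategy mentioned in the abstract can be read as a weighted sum of detection losses. The sketch below assumes that the lead (last) backbone's features produce the main detection loss and that each assistant backbone's features produce an extra loss scaled by its own weight; the function name `cbnet_total_loss` and the weight value 0.5 are hypothetical, not taken from the paper.

```python
from typing import Sequence


def cbnet_total_loss(lead_loss: float,
                     assist_losses: Sequence[float],
                     assist_weights: Sequence[float]) -> float:
    """Lead-backbone loss plus weighted auxiliary losses from assistant backbones.

    In this sketch the auxiliary terms are meant for training only; at
    inference the detector would use the lead backbone's features alone.
    """
    if len(assist_losses) != len(assist_weights):
        raise ValueError("need one weight per assistant backbone")
    return lead_loss + sum(w * loss for w, loss in zip(assist_weights, assist_losses))


# Two-backbone composite (one assistant) with a hypothetical weight of 0.5.
total = cbnet_total_loss(1.25, [1.5], [0.5])
print(total)  # 2.0
```

Keeping one weight per assistant backbone lets the auxiliary signal be tuned or switched off (weight 0) without touching the lead branch.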
