4.7 Article

Capsule-inferenced Object Detection for Remote Sensing Images

Publisher

IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSTARS.2023.3266794

Keywords

CapsNet; object detection; remote sensing image; transformer

Frequent and accurate object detection in remote sensing images makes it possible to monitor dynamic objects on the earth's surface. The detection transformer (DETR) falls short in complex remote sensing scenes, where entity information and the relative positions of objects are critical. In this article, we propose CI_DETR, a DETR-based detection model that uses capsule inference to improve remote sensing object detection. Our approach combines a multilevel feature fusion module, a capsule reasoning module, and a sausage model, and our experiments on public datasets show better detection performance than many current detectors.
Frequent and accurate object detection in remote sensing images makes it possible to monitor dynamic objects on the earth's surface effectively. The detection transformer (DETR) offers a simple encoder-decoder structure and a direct set prediction approach to object detection, but it falls short in complex remote sensing scenes where entity information and the relative positions of objects are critical to target reasoning. Notably, the feedforward neural network (FFN) in DETR relies on weighted summation for target reasoning and disregards interactive feature information, which is a major factor limiting detection effectiveness. To address these shortcomings, in this article we propose a DETR-based detection model called CI_DETR, which uses capsule inference to improve remote sensing object detection. Our approach adds a multilevel feature fusion module to the DETR network, allowing the network to learn how to spatially alter features at different levels so that only beneficial information is preserved for combination. In addition, we introduce a capsule reasoning module to mine entity information during inference and to model more effectively the hierarchical correlation of internal knowledge representation in the neural network, consistent with the thinking model of the human brain. Lastly, we employ a sausage model to measure the similarities and differences of capsules, projecting them onto a curved surface for nonlinear function approximation and maximum preservation of the local responsiveness of capsule entities. Our experiments on public datasets confirm the superior detection performance of the proposed algorithm relative to many current detectors.
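
The abstract does not describe the internals of the capsule reasoning module, so the following is not the CI_DETR implementation. As a rough illustration of how capsule inference differs from the plain weighted summation of an FFN, here is a minimal pure-Python sketch of the generic routing-by-agreement procedure commonly used in capsule networks. The function names (`squash`, `softmax`, `route`), the three routing iterations, and the toy prediction vectors are all assumptions made for this example.

```python
import math

def squash(v, eps=1e-9):
    """Capsule nonlinearity: keeps the direction of v, maps its length into [0, 1)."""
    sq = sum(x * x for x in v)
    scale = sq / (1.0 + sq) / math.sqrt(sq + eps)
    return [scale * x for x in v]

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def route(u_hat, iterations=3):
    """Routing-by-agreement (generic sketch, hypothetical helper).

    u_hat[i][j] is the prediction vector that input capsule i makes
    for output capsule j. Returns (output capsules, coupling coefficients).
    """
    n_in, n_out, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    b = [[0.0] * n_out for _ in range(n_in)]  # routing logits, start uniform
    for _ in range(iterations):
        # Each input capsule distributes its vote across the output capsules.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            s = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            v.append(squash(s))
        # Raise the logit where a prediction agrees with the resulting output.
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(u_hat[i][j][d] * v[j][d] for d in range(dim))
    return v, c

# Toy input: three input capsules, two output capsules, 2-D vectors.
# The inputs agree on output 0 and contradict each other on output 1.
u_hat = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[0.9, 0.1], [0.1, 0.0]],
]
v, c = route(u_hat)
```

In this toy case the coupling coefficients shift toward output capsule 0, where the input predictions agree. A fixed weighted sum cannot do this, because its weights do not depend on agreement among the inputs.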
