Journal
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Volume 16, Pages 5260-5270
Publisher
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/JSTARS.2023.3266794
Keywords
CapsNet; object detection; remote sensing image; transformer
Frequent and accurate object detection based on remote sensing images is important. The DETR model falls short in complex remote sensing scenes where entity information and relative positions between objects are critical. In this article, we propose CI_DETR, a detection model that uses capsule inference to improve remote sensing object detection. Our approach incorporates a multilevel feature fusion module, a capsule reasoning module, and a sausage model, resulting in superior detection performance compared to current detectors.
Frequent and accurate object detection based on remote sensing images makes it possible to monitor dynamic objects on the earth's surface effectively. The detection transformer (DETR) offers a simple encoder-decoder structure and a direct set prediction approach to object detection. However, it falls short in complex remote sensing scenes, where entity information and the relative positions of objects are critical to target reasoning. In particular, the feedforward neural network (FFN) in DETR relies on weighted summation for target reasoning and disregards the interactions between features, which is a major factor limiting detection effectiveness. To address these shortcomings, in this article we propose CI_DETR, a DETR-based detection model that uses capsule inference to improve remote sensing object detection. First, we add a multilevel feature fusion module to the DETR network, which lets the network learn how to spatially adjust features from different levels so that only beneficial information is kept for combination. Second, we introduce a capsule reasoning module that mines entity information during inference and more effectively models the hierarchical correlation of internal knowledge representations in the neural network, consistent with the thinking model of the human brain. Lastly, we employ a sausage model to measure the similarities and differences between capsules, projecting them onto a curved surface for nonlinear function approximation while preserving as much of the local responsiveness of capsule entities as possible. Experiments on public datasets show that the proposed algorithm achieves better detection performance than many current detectors.
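DETR's direct set prediction pairs each ground-truth object with exactly one query prediction through a minimum-cost bipartite matching. The sketch below illustrates that idea for tiny sets by brute force over permutations; DETR itself solves the same assignment with the Hungarian algorithm and adds a generalized IoU term to the box cost. The simplified cost (negative class probability plus an L1 box distance) and the names match_cost and set_match are assumptions for illustration only.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (cx, cy, w, h), normalized to [0, 1]


def match_cost(probs: Sequence[float], pred_box: Box, tgt_class: int, tgt_box: Box,
               box_weight: float = 1.0) -> float:
    """Simplified DETR-style cost (assumption): low when the target class is
    likely and the predicted box is close to the target box in L1 distance."""
    l1 = sum(abs(p - t) for p, t in zip(pred_box, tgt_box))
    return -probs[tgt_class] + box_weight * l1


def set_match(pred_probs: List[Sequence[float]], pred_boxes: List[Box],
              tgt_classes: List[int], tgt_boxes: List[Box]) -> List[int]:
    """Brute-force minimum-cost one-to-one matching; result[t] = prediction for target t.

    Assumes len(pred_boxes) >= len(tgt_boxes), as with DETR's fixed set of queries.
    Exponential in set size, so only suitable for illustration.
    """
    best: List[int] = []
    best_cost = float("inf")
    # Each perm assigns a distinct prediction index to every target, in target order.
    for perm in permutations(range(len(pred_boxes)), len(tgt_boxes)):
        cost = sum(match_cost(pred_probs[p], pred_boxes[p], tgt_classes[t], tgt_boxes[t])
                   for t, p in enumerate(perm))
        if cost < best_cost:
            best, best_cost = list(perm), cost
    return best
```

Predictions left unmatched by this assignment would be trained toward DETR's "no object" class.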
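The abstract does not specify how CI_DETR's multilevel feature fusion module is built. One common way to let a network learn per-location weights for features from different levels is adaptive spatial feature fusion: each level gets a weight map, the maps are normalized with a softmax across levels at every pixel, and the fused map is the weighted sum. The sketch below assumes that scheme and works on single-channel maps already resized to a common height and width; in a real network the logit maps would come from learned layers, and the names fuse_levels and softmax are hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]  # one single-channel feature map, indexed [row][col]


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a short list."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_levels(features: List[Grid], logits: List[Grid]) -> Grid:
    """Fuse same-sized maps from several levels with per-pixel softmax weights.

    features[l] and logits[l] belong to level l. The logits stand in for the
    output of learned layers (an assumption of this sketch).
    """
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            # Weights across levels sum to 1 at every location, so a level with a
            # low logit contributes little there (its information is suppressed).
            weights = softmax([level[i][j] for level in logits])
            fused[i][j] = sum(w * f[i][j] for w, f in zip(weights, features))
    return fused
```

With equal logits each level gets weight 0.5, so fusing the maps [[1.0, 2.0]] and [[3.0, 4.0]] gives [[2.0, 3.0]]; strongly unequal logits let one level dominate at a given location.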
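The abstract does not describe how the capsule reasoning module routes information between capsules. As a stand-in, the sketch below implements the squash nonlinearity and routing by agreement from Sabour et al. (2017), a standard capsule routing scheme: lower capsules send prediction vectors to upper capsules, and the coupling coefficients grow for predictions that agree with an upper capsule's output. This is not necessarily the paper's module; vectors are plain Python lists and the function names are illustrative.

```python
import math
from typing import List

Vector = List[float]


def squash(s: Vector) -> Vector:
    """Capsule nonlinearity: keeps the direction, maps the length into [0, 1)."""
    norm_sq = sum(x * x for x in s)
    if norm_sq == 0.0:
        return [0.0] * len(s)
    scale = norm_sq / (1.0 + norm_sq) / math.sqrt(norm_sq)
    return [scale * x for x in s]


def softmax(values: List[float]) -> List[float]:
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dynamic_routing(u_hat: List[List[Vector]], iterations: int = 3) -> List[Vector]:
    """Routing by agreement (Sabour et al., 2017).

    u_hat[i][j] is lower capsule i's prediction vector for upper capsule j.
    Returns one output vector per upper capsule.
    """
    n_lower, n_upper, dim = len(u_hat), len(u_hat[0]), len(u_hat[0][0])
    logits = [[0.0] * n_upper for _ in range(n_lower)]  # b_ij, start uniform
    outputs: List[Vector] = []
    for step in range(iterations):
        # Coupling coefficients c_ij: each lower capsule distributes over upper ones.
        coupling = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_upper):
            s_j = [sum(coupling[i][j] * u_hat[i][j][d] for i in range(n_lower))
                   for d in range(dim)]
            outputs.append(squash(s_j))
        if step < iterations - 1:
            # Agreement is the dot product between a prediction and the current output.
            for i in range(n_lower):
                for j in range(n_upper):
                    logits[i][j] += sum(p * v for p, v in zip(u_hat[i][j], outputs[j]))
    return outputs
```

For example, with two lower capsules that agree on upper capsule 0 but contradict each other on upper capsule 1, routing strengthens capsule 0 and leaves capsule 1 at zero length.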
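The abstract does not give the sausage model's formula. Geometrically, a "sausage" is the set of points within a radius r of a line segment, that is, a segment thickened by a ball. The sketch below assumes that reading: it scores a capsule vector x against a sausage spanned by two hypothetical reference capsules a and b, using a Gaussian of the point-to-segment distance, so the response is 1 on the segment and decays smoothly away from it (a local, nonlinear response). The Gaussian form, the radius parameter, and the names distance_to_segment, in_sausage, and sausage_similarity are illustrative assumptions, not the paper's method.

```python
import math
from typing import List

Vector = List[float]


def distance_to_segment(x: Vector, a: Vector, b: Vector) -> float:
    """Euclidean distance from point x to the line segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ax = [xi - ai for ai, xi in zip(a, x)]
    length_sq = sum(d * d for d in ab)
    if length_sq == 0.0:
        t = 0.0  # degenerate segment: the sausage becomes a ball around a
    else:
        # Project x onto the line through a and b, then clamp to the segment.
        t = sum(p * q for p, q in zip(ax, ab)) / length_sq
        t = max(0.0, min(1.0, t))
    closest = [ai + t * d for ai, d in zip(a, ab)]
    return math.dist(x, closest)


def in_sausage(x: Vector, a: Vector, b: Vector, radius: float) -> bool:
    """True if x lies in the sausage, i.e. within `radius` of segment ab."""
    return distance_to_segment(x, a, b) <= radius


def sausage_similarity(x: Vector, a: Vector, b: Vector, radius: float) -> float:
    """Hypothetical local response: 1 on the segment, Gaussian decay with distance."""
    d = distance_to_segment(x, a, b)
    return math.exp(-((d / radius) ** 2))
```

Unlike a dot-product similarity, this response depends only on how far x lies from the segment, so points far from both reference capsules score near zero regardless of their direction.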