Article

A complementary dual-backbone transformer extracting and fusing weak cues for object detection in extremely dark videos

INFORMATION FUSION (2023)

Journal

INFORMATION FUSION

Volume 97, Issue -, Pages -

Publisher

ELSEVIER

DOI: 10.1016/j.inffus.2023.101822

Keywords

Low-light video; Object detection; Transformer; Feature aggregation; Feature fusion

Categories

Computer Science, Artificial Intelligence; Computer Science, Theory & Methods

Abstract

Reliable object detection in dark environments has wide applications but is severely challenged by heavy noise, which washes out informative features, and by uneven radiance caused by nighttime illumination. These characteristics of dark videos largely degrade the performance of existing detectors. Addressing this issue therefore requires algorithms specially designed to extract and fuse the weak features buried in low-quality videos. With this in mind, we propose illumination-aware spatio-temporal feature fusion modules for low-light video object detection and implement a Dark Video Detector under a TRansformer network structure, dubbed DVD-TR. First, we use a dual-backbone Transformer to extract separate, complementary features and fuse them to strengthen the network's feature extraction capability. Second, we incorporate a spatio-temporal sampling mechanism that aggregates features from multiple frames to enhance detection accuracy in dark videos. Third, we use a small encoder-decoder network to obtain the irradiance distribution, which is then incorporated for illumination-aware feature fusion. Extensive experiments on a large-scale multi-illuminance dark video benchmark show that DVD-TR outperforms state-of-the-art video detectors by a large margin and validate the effectiveness of the proposed approach.
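The abstract describes two fusion ideas without giving formulas. The first fuses features from two backbones under an irradiance estimate. The second aggregates features across frames. The sketch below illustrates both under simplifying assumptions. Feature maps are single-channel and stored as nested Python lists. The irradiance map is assumed to be given, with values in [0, 1], and stands in for the output of the encoder-decoder network. The two backbone maps are blended per pixel by a convex combination weighted by irradiance. A softmax-weighted average over frames replaces the paper's spatio-temporal sampling. The names `illumination_aware_fuse` and `temporal_aggregate`, the blending rule, and the per-frame scores are all hypothetical. They are not DVD-TR's actual modules.

```python
import math
from typing import List

# Hypothetical representation: a single-channel H x W feature map.
Map = List[List[float]]


def illumination_aware_fuse(feat_a: Map, feat_b: Map, irradiance: Map) -> Map:
    """Blend two backbone feature maps pixel by pixel (hypothetical rule).

    Where the irradiance estimate is high, the result leans on feat_a;
    where it is low, it leans on feat_b. Irradiance is clamped to [0, 1].
    """
    fused = []
    for row_a, row_b, row_i in zip(feat_a, feat_b, irradiance):
        fused_row = []
        for a, b, i in zip(row_a, row_b, row_i):
            w = min(max(i, 0.0), 1.0)  # clamp the irradiance weight
            fused_row.append(w * a + (1.0 - w) * b)
        fused.append(fused_row)
    return fused


def temporal_aggregate(frames: List[Map], scores: List[float]) -> Map:
    """Average per-frame feature maps with softmax weights over scores.

    A simplified stand-in for multi-frame aggregation, not the paper's
    spatio-temporal sampling mechanism.
    """
    if not frames or len(frames) != len(scores):
        raise ValueError("need at least one frame and one score per frame")
    m = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    h, w = len(frames[0]), len(frames[0][0])
    out = [[0.0] * w for _ in range(h)]
    for weight, frame in zip(weights, frames):
        for y in range(h):
            for x in range(w):
                out[y][x] += weight * frame[y][x]
    return out


if __name__ == "__main__":
    print(illumination_aware_fuse([[1.0, 1.0]], [[0.0, 0.0]], [[1.0, 0.25]]))
    print(temporal_aggregate([[[2.0]], [[4.0]]], [0.0, 0.0]))
```

For example, with irradiance values 1.0 and 0.25, the fused pixels take all and a quarter of `feat_a`, respectively. Two frames with equal scores are averaged evenly. A learned model would replace both fixed rules with trainable attention or gating.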