Journal
IEEE ACCESS
Volume 10, Pages 120329-120342
Publisher: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
DOI: 10.1109/ACCESS.2022.3218456
Keywords
Visibility estimation; CNN; Swin-T; multi-feature stream; DDT matrix
Low visibility frequently leads to serious traffic accidents worldwide; although extensive work has been devoted to visibility estimation in meteorology, it remains a difficult problem. Deep learning-based visibility estimation methods suffer from low accuracy because they lack features specific to foggy images, while physical model-based methods are applicable only to certain scenes owing to their demanding requirements for extra auxiliary parameters. Therefore, this paper proposes a novel end-to-end framework named STCN-Net for visibility estimation, which combines engineered features and learned features to achieve higher accuracy. Specifically, a novel 3D multi-feature stream matrix, named DDT, is designed for visibility estimation; it consists of a transmittance matrix, a dark channel matrix, and a depth matrix. Unlike traditional deep learning methods, which use only convolutional neural networks (CNN) to process the input data or images, our method combines a CNN and a Transformer. In STCN-Net, the Swin Transformer (Swin-T) module takes the original image as input while the CNN module takes the DDT matrix as input. Moreover, to integrate the different feature information from the CNN and Swin-T branches, a Coordinate Attention (CA) module is embedded in STCN-Net. Finally, two visibility datasets, Visibility Image Dataset I (VID I) and Visibility Image Dataset II (VID II), were constructed for evaluation; VID I is a real-scene visibility dataset and VID II is a synthetic one. The experimental results show that our method outperforms classical methods on both datasets, surpassing the runner-up in accuracy by 2.1% on VID I and 0.5% on VID II.
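The abstract describes the DDT matrix as a stack of transmittance, dark channel, and depth maps. The paper does not give the exact construction, but a minimal sketch can be written from the standard dark channel prior: the dark channel is a patch-wise minimum over colour channels, and transmittance is commonly estimated as t = 1 - ω·darkchannel(I/A) with atmospheric light A and ω ≈ 0.95. The function names, the choice of A, the patch size, and the use of a precomputed depth map are all assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def dark_channel(img, patch=15):
    """Dark channel prior: patch-wise minimum over the three colour channels.
    img is an H x W x 3 float array with values in [0, 1]."""
    mins = img.min(axis=2)                      # per-pixel minimum over channels
    pad = patch // 2
    padded = np.pad(mins, pad, mode="edge")     # replicate borders for the window
    h, w = mins.shape
    out = np.empty_like(mins)
    for i in range(h):                          # naive sliding-window minimum
        for j in range(w):
            out[i, j] = padded[i:i + patch, j:j + patch].min()
    return out

def transmittance(img, A, omega=0.95, patch=15):
    """Dark-channel-prior transmittance estimate: t = 1 - omega * dark(I / A),
    clipped to [0.1, 1.0] as is customary in dehazing work."""
    t = 1.0 - omega * dark_channel(img / A, patch)
    return np.clip(t, 0.1, 1.0)

def build_ddt(img, depth, A=np.array([0.9, 0.9, 0.9]), patch=15):
    """Stack transmittance, dark channel, and depth into an H x W x 3 tensor,
    analogous to the paper's DDT multi-feature stream matrix.
    depth is an H x W map, e.g. from a monocular depth estimator."""
    d = dark_channel(img, patch)
    t = transmittance(img, A, patch=patch)
    return np.stack([t, d, depth], axis=2)
```

In STCN-Net this three-channel tensor would feed the CNN branch while the raw image feeds the Swin-T branch; the sketch above only shows the engineered-feature side.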