Article

Axial Cross Attention Meets CNN: Bibranch Fusion Network for Change Detection

Publisher

IEEE (Institute of Electrical and Electronics Engineers)
DOI: 10.1109/JSTARS.2022.3224081

Keywords

Attention; change detection; remote sensing image; vision transformer


This work proposes ACABFNet, a bibranch fusion network based on axial cross attention, for change detection. The network combines local features extracted by a CNN branch with global features extracted by a transformer branch through a bidirectional fusion approach. Experimental results on three datasets show that ACABFNet outperforms existing change detection algorithms.
In recent years, vision transformers have demonstrated a capability for global information extraction in computer vision that convolutional neural networks (CNNs) lack. However, because vision transformers lack the inductive biases of CNNs, they require large amounts of training data, and in remote sensing it is costly to obtain large numbers of high-resolution images. Most existing deep-learning-based change detection networks rely heavily on CNNs, which cannot effectively exploit long-range dependencies between pixels for difference discrimination. This work therefore aims to apply a high-performance vision transformer to change detection with limited data.

A bibranch fusion network based on axial cross attention (ACABFNet) is proposed. The network extracts local and global information through a CNN branch and a transformer branch, respectively, and then fuses the local and global features with a bidirectional fusion approach. In the upsampling stage, similar-feature and difference-feature information from the two branches is generated explicitly by feature addition and feature subtraction.

Because the standard self-attention mechanism is not efficient enough for global attention on small datasets, we propose axial cross attention: global attention is first performed along the height and width dimensions of the image separately, and cross attention then fuses the global feature information from the two dimensions. Compared with original self-attention, this structure is more GPU-friendly and efficient. Experimental results on three datasets show that ACABFNet outperforms existing change detection algorithms.
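The explicit similar/difference fusion in the upsampling stage can be illustrated with a minimal PyTorch sketch. This is not the authors' implementation: the abstract only states that similar and difference features are generated by feature addition and feature subtraction, so the absolute value on the subtraction and the channel-wise concatenation below are assumptions.

import torch

def explicit_fusion(f_cnn, f_trans):
    # Similar information: cues that both branches agree on.
    similar = f_cnn + f_trans
    # Difference information: where the two branches disagree. Taking the
    # absolute value is an assumption; the paper only says "subtraction".
    difference = torch.abs(f_cnn - f_trans)
    # Concatenating along the channel axis is also an assumption about how
    # the two cues are handed to the upsampling stage.
    return torch.cat([similar, difference], dim=1)

# Usage: two same-shaped branch features in, doubled channels out.
fused = explicit_fusion(torch.randn(2, 64, 32, 32),
                        torch.randn(2, 64, 32, 32))  # -> (2, 128, 32, 32)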
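Similarly, the axial cross attention can be made concrete with a short sketch. The PyTorch module below is a minimal illustration under stated assumptions, not the authors' implementation: the head count, the use of nn.MultiheadAttention, and the choice to let the height-attended features query the width-attended ones in the fusion step are all assumptions. The abstract specifies only that attention runs along the height and width axes separately and that a cross-attention step fuses the two results.

import torch
import torch.nn as nn

class AxialCrossAttention(nn.Module):
    """Minimal sketch of axial cross attention (hypothetical layout)."""

    def __init__(self, dim, num_heads=4):
        super().__init__()
        self.height_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.width_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        # x: (B, C, H, W) feature map from the transformer branch.
        b, c, h, w = x.shape

        # Axial attention along the height axis: each column is a sequence.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)       # (B*W, H, C)
        h_out, _ = self.height_attn(cols, cols, cols)
        h_out = h_out.reshape(b, w, h, c).permute(0, 3, 2, 1)   # (B, C, H, W)

        # Axial attention along the width axis: each row is a sequence.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)       # (B*H, W, C)
        w_out, _ = self.width_attn(rows, rows, rows)
        w_out = w_out.reshape(b, h, w, c).permute(0, 3, 1, 2)   # (B, C, H, W)

        # Cross attention fuses the two axial results. Letting the
        # height-attended features query the width-attended ones is one
        # plausible choice; the paper's exact fusion may differ.
        q = h_out.flatten(2).transpose(1, 2)                    # (B, H*W, C)
        kv = w_out.flatten(2).transpose(1, 2)                   # (B, H*W, C)
        fused, _ = self.cross_attn(q, kv, kv)
        return fused.transpose(1, 2).reshape(b, c, h, w)

# Usage: dim must be divisible by num_heads.
attn = AxialCrossAttention(dim=64)
y = attn(torch.randn(2, 64, 32, 32))  # -> (2, 64, 32, 32)

Splitting global attention into two axial passes reduces the quadratic cost of full self-attention over all H*W positions to attention over sequences of length H and W, which is consistent with the efficiency claim in the abstract.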
