Article

Building Extraction from Remote Sensing Images with Sparse Token Transformers

REMOTE SENSING (2021)

Journal

REMOTE SENSING

Volume 13, Issue 21, Pages -

Publisher

MDPI

DOI: 10.3390/rs13214441

Keywords

remote sensing images; building extraction; transformers; sparse token sampler

Funding

National Key Research and Development Program of China [2019YFC1510905]
National Natural Science Foundation of China [62125102]
Beijing Natural Science Foundation [4192034]

Automated Summary

This paper explores the potential of using transformers for efficient building extraction and designs an efficient dual-pathway transformer structure that achieves state-of-the-art accuracy on benchmark datasets.

Abstract

Deep learning methods have made considerable progress in building extraction from remote sensing images. Most building extraction methods are based on Convolutional Neural Networks (CNNs). Recently, vision transformers have offered a better way to model long-range context in images, but they usually suffer from high computational complexity and memory usage. In this paper, we explore the potential of transformers for efficient building extraction. We design an efficient dual-pathway transformer structure that learns the long-range dependencies of tokens in both their spatial and channel dimensions and achieves state-of-the-art accuracy on benchmark building extraction datasets. Because individual buildings in remote sensing images usually occupy only a very small fraction of the image pixels, we represent buildings as a set of sparse feature vectors in their feature space by introducing a new module called the sparse token sampler. With this design, the computational complexity of the transformers can be reduced by more than an order of magnitude. We refer to our method as Sparse Token Transformers (STT). Experiments on the Wuhan University Aerial Building Dataset (WHU) and the Inria Aerial Image Labeling Dataset (INRIA) suggest the effectiveness and efficiency of our method. Compared with several widely used segmentation methods and state-of-the-art building extraction methods, STT achieves the best performance at a low time cost.
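
To make the sparse-token idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each spatial position already has a scalar score (a hypothetical "building-likeness" value) and that sampling means keeping the k highest-scoring positions; the abstract does not specify how the paper's sampler scores or selects tokens. The names `sample_sparse_tokens` and `attention_pairs` and all numeric values are made up for illustration. The second helper only counts query-key pairs, to show why attending over a few tokens is much cheaper than attending over every position.

```python
import heapq


def sample_sparse_tokens(features, scores, k):
    """Keep the k feature vectors with the highest scores.

    features: list of N feature vectors, one per spatial position.
    scores:   list of N floats (assumed per-position importance).
    Returns (indices, tokens), ordered by descending score.
    """
    if len(features) != len(scores):
        raise ValueError("features and scores must have the same length")
    k = min(k, len(scores))
    # Indices of the k largest scores, highest first.
    top = heapq.nlargest(k, range(len(scores)), key=scores.__getitem__)
    return top, [features[i] for i in top]


def attention_pairs(num_tokens):
    """Query-key pairs evaluated by full self-attention over num_tokens tokens."""
    return num_tokens * num_tokens


# Toy 4x4 feature map flattened to 16 positions with 2 channels each.
features = [[float(i), float(i) * 0.5] for i in range(16)]
scores = [0.1] * 16
scores[5], scores[6], scores[9] = 0.9, 0.8, 0.7  # pretend a small building is here

indices, tokens = sample_sparse_tokens(features, scores, k=3)
print(indices)                                  # [5, 6, 9]
print(attention_pairs(16), attention_pairs(3))  # 256 9
```

In this toy case, attention over the 3 sampled tokens evaluates 9 pairs instead of 256. Because the cost grows quadratically with the token count, keeping only a small fraction of positions is one way a reduction of more than an order of magnitude can arise.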
