Article

Semantic Segmentation of UAV Images Based on Transformer Framework with Context Information

Journal

MATHEMATICS
Volume 10, Issue 24

Publisher

MDPI
DOI: 10.3390/math10244735

Keywords

semantic segmentation; UAV street scene images; transformer; global and local context

Funding

  1. National Research Foundation of Korea (NRF) - Korean Government (MSIT) [2021R1C1C1012590, NRF-2022R1A4A1023248]
  2. Project BK21 FOUR and the Information Technology Research Center (ITRC) support program - Korean Government (MSIT) [IITP-2022-2020-0-01808]
  3. National Research Foundation of Korea [2021R1C1C1012590] Funding Source: Korea Institute of Science & Technology Information (KISTI), National Science & Technology Information Service (NTIS)

This paper proposes a Transformer-based encoder-decoder architecture for precise segmentation of UAV images. By utilizing self-attention and convolution mechanisms, the architecture captures global and local contextual information to generate semantically rich feature representations. The effectiveness of the architecture is demonstrated on UAVid and Urban Drone datasets with promising results.
With advances in Unmanned Aerial Vehicle (UAV) technology, aerial images with large variations in object appearance and complex backgrounds have opened a new direction of work for researchers. Semantic segmentation becomes more challenging when inherent features must be captured in both the global and local context of UAV images. In this paper, we propose a Transformer-based encoder-decoder architecture to address this issue for the precise segmentation of UAV images. The inherent feature representation of the UAV images is exploited in the encoder network using a self-attention-based transformer framework to capture long-range global contextual information. A Token Spatial Information Fusion (TSIF) module is proposed to take advantage of a convolution mechanism that can capture local details. It fuses the local contextual details about the neighboring pixels with the encoder network and produces semantically rich feature representations. We propose a decoder network that processes the output of the encoder network for the final semantic-level prediction of each pixel. We demonstrate the effectiveness of this architecture on the UAVid and Urban Drone datasets, where we achieved mIoU of 61.93% and 73.65%, respectively.
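The core idea in the abstract — fusing long-range global context from self-attention with local neighborhood context from convolution — can be sketched in a minimal, illustrative way. The following NumPy sketch is not the authors' TSIF implementation: the single-head attention uses identity projections, the convolutional branch is stood in for by a 3x3 mean filter, and the fusion is simple addition; all of these are hypothetical simplifications chosen only to make the global/local split concrete.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def global_self_attention(tokens):
    # tokens: (N, C). Single-head attention with identity Q/K/V
    # projections -- a stand-in for the transformer encoder branch.
    scores = tokens @ tokens.T / np.sqrt(tokens.shape[1])
    return softmax(scores, axis=-1) @ tokens

def local_context(feat_map, k=3):
    # feat_map: (H, W, C). A k x k mean filter as a stand-in for the
    # convolutional branch that captures neighboring-pixel details.
    H, W, C = feat_map.shape
    pad = k // 2
    padded = np.pad(feat_map, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(feat_map)
    for i in range(H):
        for j in range(W):
            out[i, j] = padded[i:i + k, j:j + k].mean(axis=(0, 1))
    return out

def tsif_fusion(feat_map):
    # Hypothetical fusion in the spirit of TSIF: flatten the feature map
    # into tokens for global attention, compute local context on the 2D
    # grid, and fuse the two branches additively.
    H, W, C = feat_map.shape
    tokens = feat_map.reshape(H * W, C)
    global_feat = global_self_attention(tokens).reshape(H, W, C)
    local_feat = local_context(feat_map)
    return global_feat + local_feat
```

In this sketch the fused output keeps the input's spatial resolution and channel count, so it could feed a per-pixel decoder head directly; the real architecture's projections, token handling, and fusion operator are described in the paper itself.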

