Article

Aerial scene understanding in the wild: Multi-scene recognition via prototype-based memory networks

Journal

Publisher

ELSEVIER
DOI: 10.1016/j.isprsjprs.2021.04.006

Keywords

Convolutional neural network (CNN); Multi-scene recognition in single images; Memory network; Multi-scene aerial image dataset; Multi-head attention-based memory retrieval; Prototype learning

Funding

  1. European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme [ERC-2016-StG-714087]
  2. Helmholtz Association through the Helmholtz AI - Local Unit Munich Unit @Aeronautics, Space and Transport (MASTr) [ZT-I-PF-5-01]
  3. Helmholtz Association through Helmholtz Excellent Professorship Data Science in Earth Observation - Big Data Fusion for Urban Research
  4. German Federal Ministry of Education and Research (BMBF) of the international future AI lab AI4EO - Artificial Intelligence for Earth Observation: Reasoning, Uncertainties, Ethics and Beyond [01DD20001]

This paper proposes a method for recognizing multiple scenes in a single image by leveraging prototype learning, an external memory, and a multi-head attention mechanism. Experimental results demonstrate the effectiveness of this approach in aerial scene recognition.
Aerial scene recognition is a fundamental visual task that has attracted increasing research interest in the last few years. Most current research focuses on assigning a single scene-level label to an aerial image, whereas in real-world scenarios a single image often contains multiple scenes. In this paper, we therefore take a step toward a more practical and challenging task, namely multi-scene recognition in single images. We also note that manually producing annotations for such a task is extraordinarily time- and labor-consuming. To address this, we propose a prototype-based memory network that recognizes multiple scenes in a single image by leveraging massive well-annotated single-scene images. The proposed network consists of three key components: 1) a prototype learning module, 2) a prototype-inhabiting external memory, and 3) a multi-head attention-based memory retrieval module. More specifically, we first learn the prototype representation of each aerial scene from single-scene aerial image datasets and store it in an external memory. Afterwards, a multi-head attention-based memory retrieval module retrieves the scene prototypes relevant to a query multi-scene image for the final predictions. Notably, only a limited number of annotated multi-scene images are needed in the training phase. To facilitate progress in aerial scene recognition, we produce a new multi-scene aerial image (MAI) dataset. Experimental results on various dataset configurations demonstrate the effectiveness of our network. Our dataset and code are publicly available.
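The three components named in the abstract can be illustrated with a minimal, self-contained sketch. This is not the authors' implementation: it assumes that a prototype is simply the mean feature vector of a class, that the memory is a plain list of such vectors, and that each attention head works on its own slice of the feature vector without learned projections. The function names `mean_prototypes` and `retrieve` and the toy features are hypothetical.

```python
import math


def mean_prototypes(features, labels, num_classes):
    """Build one prototype per scene class by averaging its feature vectors.

    Assumption: class means stand in for learned prototypes.
    """
    dim = len(features[0])
    sums = [[0.0] * dim for _ in range(num_classes)]
    counts = [0] * num_classes
    for vec, lab in zip(features, labels):
        counts[lab] += 1
        for i, value in enumerate(vec):
            sums[lab][i] += value
    return [[s / counts[c] for s in sums[c]] for c in range(num_classes)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(query, memory, num_heads):
    """Multi-head scaled dot-product attention of a query over the memory.

    Assumption: head h attends using only its own slice of the feature
    dimension; a trained network would use learned projections instead.
    Returns the concatenated retrieved vector and per-head attention weights.
    """
    head_dim = len(query) // num_heads
    retrieved, weights = [], []
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        scores = [
            sum(q * k for q, k in zip(query[lo:hi], proto[lo:hi]))
            / math.sqrt(head_dim)
            for proto in memory
        ]
        attn = softmax(scores)
        weights.append(attn)
        # Weighted sum of the prototype slices seen by this head.
        for i in range(lo, hi):
            retrieved.append(sum(a * proto[i] for a, proto in zip(attn, memory)))
    return retrieved, weights


# Toy single-scene features (4-D) for two hypothetical scene classes.
features = [[1.0, 0.0, 1.0, 0.0], [0.8, 0.2, 1.0, 0.0],
            [0.0, 1.0, 0.0, 1.0], [0.2, 0.8, 0.0, 1.0]]
labels = [0, 0, 1, 1]
memory = mean_prototypes(features, labels, num_classes=2)

# A query that mixes cues from both classes, as a multi-scene image might.
query = [0.9, 0.1, 0.0, 1.0]
retrieved, weights = retrieve(query, memory, num_heads=2)
```

In this toy setup the first head puts more weight on the class-0 prototype and the second head on the class-1 prototype, which hints at why several heads can help when one image contains several scenes. A full model would pass the retrieved vector to a multi-label classifier trained on the annotated multi-scene images.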
