Article

Sparse Attention Module for optimizing semantic segmentation performance combined with a multi-task feature extraction network

Journal

VISUAL COMPUTER
Volume 38, Issue 7, Pages 2473-2488

Publisher

SPRINGER
DOI: 10.1007/s00371-021-02124-3

Keywords

Semantic segmentation; Sparse Attention Module; Class attention features; Multi-task

Funding

  1. Fundamental Research Funds for the Central Universities [JUSRP41908]
  2. National Natural Science Foundation of China [61201429, 61362030]
  3. China Postdoctoral Science Foundation [2015M581720, 2016M600360]
  4. Jiangsu Postdoctoral Science Foundation [1601216C]

The paper proposes a Sparse Attention Module combined with a powerful multi-task feature extraction network to reduce computing resource consumption in semantic segmentation. By using a Class Attention Module, the model ensures that query vectors capture dense contextual information efficiently.
In the task of semantic segmentation, researchers often use self-attention modules to capture long-range contextual information, and these methods are often effective. However, the self-attention module has a drawback that cannot be ignored: its huge consumption of computing resources. How to reduce the resource consumption of the self-attention module while preserving performance is therefore a meaningful research topic. In this paper, we propose a Sparse Attention Module combined with a powerful multi-task feature extraction network for semantic segmentation. Unlike the classic self-attention model, our Sparse Attention Module does not compute the inner product between every pair of vectors. Instead, we first sparsify the Query and Key feature blocks defined in the self-attention module using a credit matrix generated from the pre-output, and then perform similarity modeling on the two sparse feature blocks. Meanwhile, to ensure that the vectors in Query can capture dense contextual information, we design a Class Attention Module and embed it into the Sparse Attention Module. Note that, compared with the Dual Attention Network for scene segmentation, our attention module greatly reduces the consumption of computing resources while maintaining accuracy. Furthermore, in the feature extraction stage, downsampling causes a serious loss of detailed information and degrades the segmentation performance of the network, so we adopt a multi-task feature extraction network. It learns semantic features and edge features in parallel, and we feed the learned edge features into the deep layers of the network to help restore detailed information and capture high-quality semantic features. Rather than using pure concatenation, we extract the edge features related to each channel by element-wise multiplication before concatenation. Finally, we conduct experiments on three datasets, Cityscapes, PASCAL VOC2012 and ADE20K, and obtain competitive results.
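The abstract describes two mechanisms concretely enough to sketch in code: (i) sparsifying the Query and Key blocks with a credit matrix derived from the pre-output before similarity modeling, and (ii) fusing edge features with semantic features by channel-wise element-wise multiplication before concatenation. The following PyTorch sketch is only an illustration of these ideas under stated assumptions: the top-k selection rule, the keep_ratio parameter, the 1x1 projections and the residual connection are hypothetical choices rather than the paper's exact formulation, and the Class Attention Module is omitted.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseAttentionSketch(nn.Module):
    """Illustrative sketch of sparse attention guided by a credit matrix.

    Positions with low prediction confidence in the pre-output (coarse
    segmentation logits) are treated as uncertain and selected for
    attention; unselected positions simply keep their value-projected
    features. The top-k rule, keep_ratio and 1x1 projections are
    assumptions for illustration only.
    """

    def __init__(self, in_channels, key_channels, keep_ratio=0.25):
        super().__init__()
        self.query_proj = nn.Conv2d(in_channels, key_channels, 1)
        self.key_proj = nn.Conv2d(in_channels, key_channels, 1)
        self.value_proj = nn.Conv2d(in_channels, in_channels, 1)
        self.keep_ratio = keep_ratio

    def forward(self, x, pre_logits):
        # x:          (B, C, H, W) backbone features
        # pre_logits: (B, num_classes, H, W) coarse pre-output
        b, c, h, w = x.shape
        n = h * w
        k = max(1, int(n * self.keep_ratio))

        # Credit matrix: low maximum class probability = uncertain pixel.
        prob = F.softmax(pre_logits, dim=1)
        credit = 1.0 - prob.max(dim=1).values              # (B, H, W)
        idx = credit.flatten(1).topk(k, dim=1).indices     # (B, k)

        q = self.query_proj(x).flatten(2)                  # (B, Ck, N)
        kf = self.key_proj(x).flatten(2)                   # (B, Ck, N)
        v = self.value_proj(x).flatten(2)                  # (B, C,  N)

        # Gather sparse Query/Key/Value at the selected positions only.
        gather = idx.unsqueeze(1)
        q_s = torch.gather(q, 2, gather.expand(-1, q.size(1), -1))    # (B, Ck, k)
        k_s = torch.gather(kf, 2, gather.expand(-1, kf.size(1), -1))  # (B, Ck, k)
        v_s = torch.gather(v, 2, gather.expand(-1, c, -1))            # (B, C,  k)

        # Similarity modeling among the k selected positions: O(k^2), not O(N^2).
        attn = torch.softmax(
            q_s.transpose(1, 2) @ k_s / q_s.size(1) ** 0.5, dim=-1)   # (B, k, k)
        out_s = v_s @ attn.transpose(1, 2)                            # (B, C, k)

        # Scatter the refined vectors back into the full feature map.
        out = v.clone()
        out.scatter_(2, gather.expand(-1, c, -1), out_s)
        return out.view(b, c, h, w) + x    # residual connection (assumption)


def fuse_edge_features(semantic_feats, edge_feats):
    """Channel-wise edge gating before concatenation (illustrative).

    Each semantic channel is multiplied element-wise by the edge map
    (assumed single-channel and broadcastable) to pick out its
    edge-related response, and the result is concatenated with the
    original semantic features instead of using pure concatenation.
    """
    gated = semantic_feats * edge_feats
    return torch.cat([semantic_feats, gated], dim=1)

Because similarity is modeled only among the k selected positions, the attention cost in this sketch scales with k^2 instead of N^2, which reflects the resource savings the abstract claims over a dense self-attention module such as the one in the Dual Attention Network.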
