Article

Sparse Attention Module for optimizing semantic segmentation performance combined with a multi-task feature extraction network

Journal

VISUAL COMPUTER
Volume 38, Issue 7, Pages 2473-2488

Publisher

SPRINGER
DOI: 10.1007/s00371-021-02124-3

Keywords

Semantic segmentation; Sparse Attention Module; Class attention features; Multi-task

Funding

  1. Fundamental Research Funds for the Central Universities [JUSRP41908]
  2. National Natural Science Foundation of China [61201429, 61362030]
  3. China Postdoctoral Science Foundation [2015M581720, 2016M600360]
  4. Jiangsu Postdoctoral Science Foundation [1601216C]

Abstract

The paper proposes a Sparse Attention Model combined with a powerful multi-task feature extraction network to reduce computing resource consumption in semantic segmentation. By using a Class Attention Module, the model ensures that query vectors capture dense contextual information efficiently.
In semantic segmentation, researchers often use self-attention modules to capture long-range contextual information. These methods are effective, but self-attention incurs a cost that cannot be ignored: its enormous consumption of computing resources. How to reduce the resource consumption of the self-attention module while preserving performance is therefore a meaningful research topic. In this paper, we propose a Sparse Attention Model combined with a powerful multi-task feature extraction network for semantic segmentation. Unlike the classic self-attention model, our Sparse Attention Model does not compute inner products between all pairs of vectors. Instead, we first sparsify the Query and Key feature blocks defined in the self-attention module using a credit matrix generated from the pre-output, and then perform similarity modeling on the two sparse feature blocks. Meanwhile, to ensure that the vectors in Query capture dense contextual information, we design a Class Attention Module and embed it into the Sparse Attention Module. Compared with the Dual Attention Network for scene segmentation, our attention module greatly reduces the consumption of computing resources while maintaining accuracy.
Furthermore, in the feature extraction stage, downsampling causes serious loss of detailed information and degrades the segmentation performance of the network, so we adopt a multi-task feature extraction network. It learns semantic features and edge features in parallel, and the learned edge features are fed into the deep layers of the network to help restore detailed information and capture high-quality semantic features. Rather than using pure concatenation, we extract the edge features related to each channel by element-wise multiplication before concatenation.
Finally, we conduct experiments on three datasets: Cityscapes, PASCAL VOC2012 and ADE20K, and obtain competitive results.
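The abstract's core idea, attending only between sparsified Query and Key blocks selected by a per-position credit score from a preliminary prediction, can be sketched as follows. This is a minimal toy illustration, not the authors' implementation: the selection rule (low-credit positions as queries, high-credit positions as keys), the `keep_ratio` parameter, and the function name are all assumptions made for the example; the Class Attention Module is omitted.

```python
import numpy as np

def sparse_attention(query, key, value, credit, keep_ratio=0.25):
    """Toy sketch of credit-guided sparse attention.

    query, key, value: (N, C) feature vectors for N spatial positions.
    credit: (N,) per-position confidence from a preliminary ("pre-output")
        prediction. Here we let only the least-confident positions act as
        queries and only the most-confident positions act as keys -- one
        plausible reading of the credit-matrix sparsification (assumed).
    """
    n, c = query.shape
    k = max(1, int(n * keep_ratio))
    order = np.argsort(credit)
    q_idx = order[:k]    # low-credit positions: need extra context
    k_idx = order[-k:]   # high-credit positions: reliable keys

    # Similarity modeling only between the two sparse blocks: O(k^2 * C)
    # instead of the O(N^2 * C) of full self-attention.
    scores = query[q_idx] @ key[k_idx].T / np.sqrt(c)
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)

    out = value.copy()            # unselected positions pass through
    out[q_idx] = w @ value[k_idx] # selected positions get aggregated context
    return out
```

With `keep_ratio` fixed, the attention cost grows linearly in the number of retained positions rather than quadratically in all positions, which is the resource saving the abstract claims relative to dense pairwise attention.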
